Berlin Buzzwords 2023 - notes for data engineers

This is a conference I heard about only recently. What a huge mistake! Despite the lack of the word "data" in its name, it covers many interesting data topics. Before I share my notes from this year's Data+AI Summit, let me do the same for Berlin Buzzwords!

Streaming

A Crash Course in Error Handling for Streaming Data Pipeline by Stefan Sprenger

Stefan also shows, with code, how to implement dead-lettering and retries with Kafka Streams.
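To make the two patterns concrete, here is a minimal sketch in plain Python. It is my own illustration and not Stefan's Kafka Streams code: the names `process_with_retries`, `DeadLetter` and `parse_amount` are hypothetical, and an in-memory list stands in for the dead-letter topic.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class DeadLetter:
    """A record that could not be processed, kept with its failure context."""
    record: Any
    error: str
    attempts: int


def process_with_retries(
    records: Iterable[Any],
    handler: Callable[[Any], Any],
    dead_letters: List[DeadLetter],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> List[Any]:
    """Apply handler to each record, retrying failures up to max_attempts.

    Records that still fail after the last attempt go to dead_letters
    instead of stopping the whole pipeline.
    """
    results = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(handler(record))
                break  # success, move on to the next record
            except Exception as exc:
                if attempt == max_attempts:
                    # Retries exhausted: park the record for later inspection.
                    dead_letters.append(DeadLetter(record, repr(exc), attempt))
                else:
                    # Linear backoff before the next attempt.
                    time.sleep(backoff_seconds * attempt)
    return results


def parse_amount(raw: str) -> int:
    """Hypothetical handler that fails on malformed input."""
    return int(raw)


dead_letter_queue: List[DeadLetter] = []
parsed = process_with_retries(["10", "oops", "30"], parse_amount, dead_letter_queue)
```

In this sketch the malformed record ends up in `dead_letter_queue` while the valid ones flow through. A real pipeline would usually retry only transient errors (a timeout, an unavailable service) and dead-letter a malformed record immediately, since retrying it cannot succeed.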

Minimizing the memory footprint of Apache Flink by Robert Metzger

Thank you, Robert! I haven't seen such a technically detailed investigation of the JVM in the data context for years! In summary:

A Kafka Client's Request: There and Back Again by Danica Fine

I didn't expect to see an in-depth talk about Apache Kafka anywhere other than Kafka Summit. Good lord, how wrong I was! Danica shared a great deep dive into Kafka client requests. Put differently, she explained this picture in detail:

The talk has a lot of details; here I'm summarizing my discoveries and important reminders:

Data engineering

Apache Airflow in Production - Bad vs Best Practices by Bhavani Ravi

Because it's always good to have a handy list of best practices, I couldn't miss Bhavani's talk about them in Apache Airflow!

When Probably is Good Enough by Savannah Norem

Savannah gave a great talk about probabilistic data structures. How good it was to refresh my memory and discover the structures I hadn't covered in my exploration back in 2018. Some takeaways from the talk:
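To make the "probably is good enough" trade-off concrete, here is a minimal sketch of a Bloom filter, one well-known probabilistic data structure. It is my own illustration and not code from the talk: the class name, the default sizes and the SHA-256-based hashing scheme are assumptions chosen for readability, not for performance.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership sketch: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "probably present", False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen_topics = BloomFilter()
for topic in ["kafka", "flink", "airflow"]:
    seen_topics.add(topic)
```

An added item is always reported as present, while an item that was never added is usually, but not always, reported as absent. The memory stays fixed at `size_bits` regardless of how many items are added, and the price is a false-positive rate that grows as the filter fills up.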

Hadoop Vectored IO: your data just got faster! by Steve Loughran

Steve Loughran explained the new Hadoop Vectored IO API, which improves reading from cloud object stores:

I also planned to watch the talks about column lineage, Kaldb, ClickHouse, and Data Mesh migration, but in the end I had to postpone them because of other topics waiting in my head. I'll surely watch them one day, but in the meantime I prefer to share the notes for the first 6 presentations I watched. Hopefully you'll find them useful!

