Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

Data Engineering Podcast

In this episode Justin Cunningham joins me to discuss the decisions they made and the lessons they learned in the process, including what worked, what didn’t, and what he would do differently if he were starting over today. Can you start by giving an overview of your pipeline and the type of workload that you are optimizing for?

Building A Real Time Event Data Warehouse For Sentry

Data Engineering Podcast

Your host is Tobias Macey and today I’m interviewing Ted Kaemming and James Cunningham about Snuba, the new open source search service at Sentry implemented on top of Clickhouse. How did you get involved in the area of data management? How have you found the operational aspects of Clickhouse?

Building Real Time Applications On Streaming Data With Eventador

Data Engineering Podcast

Eventador is a managed platform designed to let you focus on using the data that you collect, without worrying about how to make it reliable. This was an interesting inside look at building a business on top of open source stream processing frameworks and how to reduce the burden on end users.

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

Data Engineering Podcast

Apache Spark is a popular and widely used tool for a variety of data-oriented projects. With the large array of capabilities, and the complexity of the underlying system, it can be difficult to understand how to get started using it. How does it compare to some of the other streaming frameworks such as Flink, Kafka, or Storm?

Metadata Management And Integration At LinkedIn With DataHub

Data Engineering Podcast

LinkedIn has gone through several iterations on the most maintainable and scalable approach to metadata, leading them to their current work on DataHub. I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Can you describe how DataHub is architected?

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

Data Engineering Podcast

In this episode the founders of TimescaleDB, Ajay Kulkarni and Mike Freedman, discuss how Timescale was started, the problems that it solves, and how it works under the covers. They also explain how you can start using it in your infrastructure and their plans for the future. What impact has the 10.0

Data Engineering Weekly #124

Data Engineering Weekly

Come and hear talks from companies like StarTree, Confluent, LinkedIn, DoorDash, Imply, and Uber on how they are advancing the state of the art in user-facing analytics delivered instantly. If you follow Data Engineering Weekly, you know we actively talk about data contracts and how data is a collaboration problem, not just an ETL problem.