Blog - Data Engineering Digest

Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5

Data Engineering Podcast

JUNE 17, 2017

In this episode Justin Cunningham joins me to discuss the decisions they made and the lessons they learned in the process, including what worked, what didn’t, and what he would do differently if he were starting over today. Can you start by giving an overview of your pipeline and the type of workload that you are optimizing for?

Data Pipeline, Kafka, Business Intelligence, Architecture

Building A Real Time Event Data Warehouse For Sentry

Data Engineering Podcast

NOVEMBER 26, 2019

Your host is Tobias Macey and today I’m interviewing Ted Kaemming and James Cunningham about Snuba, the new open source search service at Sentry implemented on top of ClickHouse. Interview questions include: How did you get involved in the area of data management? How have you found the operational aspects of ClickHouse?

Data Warehouse, Building, PostgreSQL, Kafka

Webinars

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

How To Get Promoted In Product Management

Building Real Time Applications On Streaming Data With Eventador

Data Engineering Podcast

APRIL 19, 2020

Eventador is a managed platform designed to let you focus on using the data that you collect, without worrying about how to make it reliable. This was an interesting inside look at building a business on top of open source stream processing frameworks and how to reduce the burden on end users.

Building, PostgreSQL, MongoDB, SQL

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

Data Engineering Podcast

DECEMBER 9, 2018

Summary: Apache Spark is a popular and widely used tool for a variety of data-oriented projects. With its large array of capabilities and the complexity of the underlying system, it can be difficult to understand how to get started using it. How does it compare to some of the other streaming frameworks such as Flink, Kafka, or Storm?

Scala, MySQL, Kafka, Hadoop

Metadata Management And Integration At LinkedIn With DataHub

Data Engineering Podcast

AUGUST 24, 2020

LinkedIn has gone through several iterations on the most maintainable and scalable approach to metadata, leading them to their current work on DataHub. I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Can you describe how DataHub is architected?

Metadata, Management, Kafka, Data Engineering

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

Data Engineering Podcast

FEBRUARY 11, 2018

In this episode the founders of TimescaleDB, Ajay Kulkarni and Mike Freedman, discuss how Timescale was started, the problems that it solves, and how it works under the covers. They also explain how you can start using it in your infrastructure and their plans for the future. What impact has the 10.0

PostgreSQL, NoSQL, Google Cloud, MongoDB

Data Engineering Weekly #124

Data Engineering Weekly

MARCH 26, 2023

Come and hear talks from companies like StarTree, Confluent, LinkedIn, DoorDash, Imply, and Uber on how they are advancing the state of the art in user-facing analytics delivered instantly. If you follow Data Engineering Weekly, you know we actively talk about data contracts and how data is a collaboration problem, not just an ETL problem.

Data Engineering, Data Engineer, Engineering, Lambda Architecture

Fast Analytics On Semi-Structured And Structured Data In The Cloud

Data Engineering Podcast

OCTOBER 7, 2019

In this episode CEO Venkat Venkataramani and SVP of Product Shruti Bhat explain the origins of Rockset, how it is architected to allow for fast and flexible SQL analytics on your data, and how their serverless platform can save you the time and effort of implementing portions of your own infrastructure.

Structured Data, Cloud, SQL, Programming Language

How to Use KSQL Stream Processing and Real-Time Databases to Analyze Streaming Data in Kafka

Rockset

MARCH 19, 2020

Intro: In recent years, Kafka has become synonymous with “streaming,” and with features like Kafka Streams, KSQL, joins, and integrations into sinks like Elasticsearch and Druid, there are more ways than ever to build a real-time analytics application around streaming data in Kafka.

Kafka, Database, Process, SQL

The Good and the Bad of Apache Airflow Pipeline Orchestration

AltexSoft

NOVEMBER 7, 2022

But apparently, things were much more difficult before Apache Airflow appeared. How data engineering works. What is Apache Airflow? Apache Airflow is an open-source Python-based workflow orchestrator that enables you to design, schedule, and monitor data pipelines.

PostgreSQL, Metadata, Python, MySQL
