Data Engineering Digest

Change Data Capture For All Of Your Databases With Debezium

Data Engineering Podcast

JANUARY 5, 2020

How has the tight coupling with Kafka impacted the direction and capabilities of Debezium? Pulsar, Bookkeeper, Pravega)? What are some of the design tensions that exist in the Debezium community between acting as a simple pipe vs. adding functionality for interpreting/aggregating/formatting the information contained in the changesets?

Database

Database Kafka PostgreSQL MySQL

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

MAY 15, 2022

With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. How do you approach the build vs. buy problem and quantify the tradeoffs?

Data Lake

Data Lake Building BI Architecture

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

NOVEMBER 18, 2018

How does Flink compare to other streaming engines such as Spark, Kafka, Pulsar, and Storm? What are the comparative challenges of working with bounded vs unbounded streams of data? How is Flink architected?

Process

Process Scala Google Cloud Kafka

Adopting Real-Time Data At Organizations Of Every Size

Data Engineering Podcast

DECEMBER 4, 2022

types of organizations/teams who are adopting real-time; consumers of real-time data; locations in data/application stacks where real-time needs to be integrated; challenges (technical/infrastructure/talent) involved in adopting/supporting streaming/real-time; lessons learned working with early customers that influenced design/implementation of Materialize (..)

Data Lake

Data Lake MongoDB MySQL Data Warehouse

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

AUGUST 3, 2021

With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

Data Engineering Podcast

AUGUST 6, 2022

an image vs. an audio file, etc.) What are some of the characteristics of vector embeddings that might make them immune or susceptible to confusion of similarity across different source data types that share some implicit relationship due to specifics of their vectorized representation?

Machine Learning

Machine Learning Database MySQL PostgreSQL

Data Engineering Annotated Monthly – July 2021

Big Data Tools

AUGUST 3, 2021

Rack-aware Kafka streams – Kafka has already been rack-aware for a while, which gives its users more confidence. However, a part of Kafka called Kafka Streams, a stream processing framework and a competitor to other streaming solutions, is currently not rack-aware. Articles: This section is about inspiration.

Data Engineering

Data Engineering Data Engineer Engineering Big Data Tools

Managing The DoorDash Data Platform

Data Engineering Podcast

MARCH 15, 2021

In this episode the head of data platform for DoorDash, Sudhir Tonse, discusses the technologies that they are using, the approach that they take to adding new systems, and how they think about priorities for what to support for the whole company vs what to leave as a specialized concern for a single team.

Management

Management Data Warehouse PostgreSQL Kafka

A Candid Exploration Of Timeseries Data Analysis With InfluxDB

Data Engineering Podcast

JUNE 28, 2021

With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. What are you optimizing for on the consistency vs. availability spectrum of CAP?

Data Analysis

Data Analysis Scala Data Warehouse Kafka

Realtime Data Applications Made Easier With Meroxa

Data Engineering Podcast

APRIL 23, 2023

How does this differ when talking about internal vs. consumer/end-user facing applications? What are some of the technical controls that are available for organizations that are risk-averse?

Data Lake

Data Lake Kafka Machine Learning Data Warehouse

A quick tour of data distribution technologies by David Hope

Scott Logic

NOVEMBER 14, 2023

Be aware that not every solution fits exactly in these categories, and there are some newer options such as Pulsar that attempt to bridge traditional queues and the high-volume data streaming world. This contrasts with fairly basic filtering and routing on traditional queue message brokers and none on Kafka.

Technology

Technology Kafka AWS Data
