Change Data Capture For All Of Your Databases With Debezium

Data Engineering Podcast

How has the tight coupling with Kafka impacted the direction and capabilities of Debezium? (..) Pulsar, Bookkeeper, Pravega)? What are some of the design tensions that exist in the Debezium community between acting as a simple pipe vs. adding functionality for interpreting/aggregating/formatting the information contained in the changesets?

Database 100

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. How do you approach the build vs. buy problem and quantify the tradeoffs?

Data Lake 100

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

How does Flink compare to other streaming engines such as Spark, Kafka, Pulsar, and Storm? What are the comparative challenges of working with bounded vs unbounded streams of data? How is Flink architected?

Process 100

Adopting Real-Time Data At Organizations Of Every Size

Data Engineering Podcast

types of organizations/teams who are adopting real-time; consumers of real-time data; locations in data/application stacks where real-time needs to be integrated; challenges (technical/infrastructure/talent) involved in adopting/supporting streaming/real-time; lessons learned working with early customers that influenced design/implementation of Materialize (..)

Data Lake 100

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Data Lake 130

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

Data Engineering Podcast

an image vs. an audio file, etc.) What are some of the characteristics of vector embeddings that might make them immune or susceptible to confusion of similarity across different source data types that share some implicit relationship due to specifics of their vectorized representation?


Data Engineering Annotated Monthly – July 2021

Big Data Tools

Rack-aware Kafka streams – Kafka has already been rack-aware for a while, which gives its users more confidence. However, a part of Kafka called Kafka Streams, a stream processing framework and a competitor to other streaming solutions, is currently not rack-aware. Articles This section is about inspiration.