Data News — Week 23.08

Christophe Blefari

This is something I struggle with. I really like writing, I really like this newsletter, and I really like the blog, but it takes me one day per week to get it done. If I want to continue for years, I have to find a way to make it sustainable for me, and if I want to go further in this direction, I have to find a model that works.

Data Engineering Weekly #157

Data Engineering Weekly

Joe went on to define data modeling as follows: A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions. The user journey, sales process, marketing campaign, everything falls under a state machine.

Streams Replication Manager Prefixless Replication

Cloudera

It is also important to have multiple options (like normal and prefixless replication) to do the replication process, since every solution has its own advantages.

Building Real-time Machine Learning Foundations at Lyft

Lyft Engineering

However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others. While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort.

Data Engineering Weekly #155

Data Engineering Weekly

Dan Luu: How bad are search results? A thorough quickstart guide, created in partnership with Snowflake, is available, complete with a sample dataset so you can test-drive the tool. Grab: Kafka on Kubernetes: Reloaded for fault tolerance. Visit rudderstack.com to learn more.

The Importance of Distributed Tracing for Apache-Kafka-Based Applications

Confluent

Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. Distributed tracing has been key for helping us create a clear understanding of how applications are related to each other. Distributed tracing with Zipkin. Let’s imagine a “Hello, World!”

Addressing the Challenges of Sample Ratio Mismatch in A/B Testing

DoorDash Engineering

For example, if two reasonably sized groups are expected to be split 50/50, but instead show a 55/45 split, the assignment process likely is compromised. The term itself conjures a sense of rigor, validity, and trust. Yet as powerful as experimentation is, its integrity can be compromised by overlooked details and unforeseen challenges.
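To make the 55/45 example concrete, here is a minimal sketch of one common way to check for sample ratio mismatch: a chi-square goodness-of-fit test on the two group counts. This is not necessarily the method the DoorDash article uses. The function names, the group sizes, and the 0.001 significance threshold are all assumptions chosen for illustration.

```python
import math


def srm_p_value(count_a, count_b, expected_share_a=0.5):
    """P-value of a chi-square goodness-of-fit test for a two-group split."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # With two groups there is 1 degree of freedom, and the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(count_a, count_b, expected_share_a=0.5, alpha=0.001):
    """Flag a mismatch when the observed split is very unlikely by chance.

    alpha=0.001 is an assumed threshold, not one taken from the article.
    """
    return srm_p_value(count_a, count_b, expected_share_a) < alpha


# Hypothetical counts: 10,000 users expected to split 50/50.
print(has_srm(5500, 4500))  # a 55/45 split
print(has_srm(5050, 4950))  # a 50.5/49.5 split
```

With 10,000 users, a 55/45 split gives a chi-square statistic of 100, far beyond what random assignment would plausibly produce, while a 50.5/49.5 split gives a statistic of 1 and is unremarkable. The same percentage gap in much smaller groups could easily be noise, which is why the excerpt specifies "reasonably sized" groups.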