Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Cloudera

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. It allows multiple data processing engines, such as Flink, NiFi, Spark, Hive, and Impala, to access and analyze data in simple, familiar SQL tables. Iceberg support in Cloudera Stream Processing (CSP) is currently in technical preview.

Druid Deprecation and ClickHouse Adoption at Lyft

Lyft Engineering

At Lyft, we have used systems like ClickHouse and Apache Druid for near real-time and sub-second analytics. This is crucial for use cases like market signaling and forecasting, which benefit from, and depend upon, the most up-to-date information.

Running Unified PubSub Client in Production at Pinterest

Pinterest Engineering

A central component of data ingestion infrastructure at Pinterest is our PubSub stack, and the Logging Platform team currently runs deployments of Apache Kafka and MemQ. In the years since our previous blog post, PSC (the unified PubSub Client) has been battle-tested at large scale at Pinterest, with notably positive feedback and results.

Building Real-time Machine Learning Foundations at Lyft

Lyft Engineering

However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others. While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort.

Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

Cloudera

In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion.

Data Engineering Weekly #167

Data Engineering Weekly

With the 1-bit LLM model, the researchers suggest that instead of FP16 (16-bit half-precision floating point) or FP32 (32-bit single-precision floating point) weights, you can build an equally efficient model using weights drawn from the ternary set {-1, 0, 1}. GitHub shares some insights on how GitHub engineers use GitHub Copilot.

Lessons from debugging a tricky direct memory leak

Pinterest Engineering

Sanchay Javeria | Software Engineer, Ads Data Infrastructure

To support metrics reporting for ads from external advertisers and real-time ad budget calculations at Pinterest, we run streaming pipelines using Apache Flink. This was intentionally generous to buy us enough time to fix the issue.