Remove apache-spark-streaming
article thumbnail

What's new in Apache Spark 3.5.0 - Structured Streaming

Waitingforcode

It's time to start the series covering Apache Spark 3.5.0 As the first topic I'm going to cover Structured Streaming which has got a lot of RocksDB improvements and some major API changes.

IT 130
article thumbnail

Multiple queries running in Apache Spark Structured Streaming

Waitingforcode

That's often a dilemma, whether we should put multiple sinks working on the same data source in the same or in different Apache Spark Structured Streaming applications? Both solutions may be valid depending on your use case but let's focus here on the former one including multiple sinks together.

Data 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming

databricks

Introduction Apache Sparkâ„¢ Structured Streaming is a popular open-source stream processing platform that provides scalability and fault tolerance, built on top of the S.

Process 104
article thumbnail

What's new in Apache Spark 3.4.0 - Async progress tracking for Structured Streaming

Waitingforcode

Finally, the time has come to start the analysis of the new features in Apache Spark. The first of them that grabbed my attention was the Async progress tracking from Structured Streaming.

130
130
article thumbnail

Latency goes subsecond in Apache Spark Structured Streaming

databricks

Apache Spark Structured Streaming is the leading open source stream processing platform. It is also the core technology that powers streaming on the.

article thumbnail

Watermark and input data filtering in Apache Spark Structured Streaming

Waitingforcode

I've already written about watermarks in a few places in the blog but despite that, I still find things to refresh. One of them is the watermark used to filter out the late data, which will be the topic of this blog post.

Data 130
article thumbnail

_spark_metadata in Apache Spark Structured Streaming issue is no more!

Waitingforcode

There are probably not that many people working today on the flat files with Structured Streaming than 5 years ago thanks to the table file formats. However, if you are in this group and are still generating CSVs or JSONs with the streaming sink, brace yourself, the memory problems are coming if you don't take action!

130
130