article thumbnail

Unlocking Real-Time Mainframe Data Replication with the Precisely Data Integrity Suite and Confluent Data Streams

Precisely

Used by more than 75% of the Fortune 500, Apache Kafka has emerged as a powerful open source data streaming platform to meet these challenges. But harnessing and integrating Kafka’s full potential into enterprise environments can be complex. This is where Confluent steps in.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!

article thumbnail

97 things every data engineer should know

Grouparoo

This provided a nice overview of the breadth of topics that are relevant to data engineering including data warehouses/lakes, pipelines, metadata, security, compliance, quality, and working with other teams. For example, grouping the ones about metadata, discoverability, and column naming might have made a lot of sense.

article thumbnail

Kafka Connect Deep Dive – Error Handling and Dead Letter Queues

Confluent

Kafka Connect is part of Apache Kafka ® and is a powerful framework for building streaming pipelines between Kafka and other technologies. Since Apache Kafka 2.0, This is the default behavior of Kafka Connect, and it can be set explicitly with the following: errors.tolerance = none. jq -c -M '[.name,tasks[].state]'

Kafka 111
article thumbnail

Kafka Listeners – Explained

Confluent

Put another way, courtesy of Spencer Ruport: LISTENERS are what interfaces Kafka binds to. Apache Kafka ® is a distributed system. When a client (producer/consumer) starts, it will request metadata about which broker is the leader for a partition—and it can do this from any broker. Is anyone listening? Brokers in the cloud (e.g.,

Kafka 100
article thumbnail

Kafka to Delta Lake, as fast as possible

Scribd Technology

Streaming data from Apache Kafka into Delta Lake is an integral part of Scribd’s data platform, but has been challenging to manage and scale. We use Spark Structured Streaming jobs to read data from Kafka topics and write that data into Delta Lake tables. To serve this need, we created kafka-delta-ingest.

Kafka 52