Remove Blog Remove Bytes Remove Kafka Remove Metadata
article thumbnail

Improving Efficiency Of Goku Time Series Database at Pinterest (Part?—?1)

Pinterest Engineering

In the first blog, we will share a short summary on the GokuS and GokuL architecture, data format for Goku Long Term, and how we improved the bootstrap time for our storage and serving components. This is because the synthetic data points would be present in the retry kafka waiting to be pushed into the recovering host by the retry ingestor.

article thumbnail

Kafka Listeners – Explained

Confluent

Put another way, courtesy of Spencer Ruport: LISTENERS are what interfaces Kafka binds to. Apache Kafka ® is a distributed system. When a client (producer/consumer) starts, it will request metadata about which broker is the leader for a partition—and it can do this from any broker. Is anyone listening? Brokers in the cloud (e.g.,

Kafka 100
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Kafka to Delta Lake, as fast as possible

Scribd Technology

Streaming data from Apache Kafka into Delta Lake is an integral part of Scribd’s data platform, but has been challenging to manage and scale. We use Spark Structured Streaming jobs to read data from Kafka topics and write that data into Delta Lake tables. To serve this need, we created kafka-delta-ingest.

Kafka 52
article thumbnail

Optimizing Kafka Streams Applications

Confluent

With the release of Apache Kafka ® 2.1.0, Kafka Streams introduced the processor topology optimization framework at the Kafka Streams DSL layer. In what follows, we provide some context around how a processor topology was generated inside Kafka Streams before 2.1, Kafka Streams topology generation 101.

Kafka 90
article thumbnail

Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

In part 1 , we discussed an event streaming architecture that we implemented for a customer using Apache Kafka ® , KSQL from Confluent, and Kafka Streams. In part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices. Sample repository. gradlew composeUp.

Kafka 95
article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!