Remove apache-kafka-pros-cons
article thumbnail

An Engineering Guide to Data Quality - A Data Contract Perspective - Part 2

Data Engineering Weekly

In the first part of this series, we talked about design patterns for data creation and the pros & cons of each system from the data contract perspective. I won’t bore you with the importance of data quality in the blog. The Fronting Kafka pattern follows a two-cluster approach. Why is Data Quality Expensive?

article thumbnail

Data Engineering Weekly #123

Data Engineering Weekly

link] Uber: Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi Uber writes a comprehensive guide on running incremental ETL using Apache Hudi. The blog discusses implementing Type-2 SCD modeling and strategies to generate surrogate keys and bridge tables to handle many-to-many relationships.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

How to configure clients to connect to Apache Kafka Clusters securely – Part 2: LDAP

Cloudera

In the previous post, we talked about Kerberos authentication and explained how to configure a Kafka client to authenticate using Kerberos credentials. In this post we will look into how to configure a Kafka client to authenticate using LDAP, instead of Kerberos. We use the Kafka-console-consumer for all the examples below.

Kafka 52
article thumbnail

Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

The blog posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ® ecosystem as a central, scalable and mission-critical nervous system.

article thumbnail

How to Use KSQL Stream Processing and Real-Time Databases to Analyze Streaming Data in Kafka

Rockset

Intro In recent years, Kafka has become synonymous with “streaming,” and with features like Kafka Streams, KSQL, joins, and integrations into sinks like Elasticsearch and Druid, there are more ways than ever to build a real-time analytics application around streaming data in Kafka.

Kafka 40
article thumbnail

How AI may impact software architecture by Andrew Carr

Scott Logic

Predictions around this are very hard to make, especially taking into account how fast this field is changing, so it will be interesting to revisit this blog in a couple of years to see how things are. In the rest of this blog post, I wish to consider the viability of this and its potential impact on how software architecture is designed.

article thumbnail

Java vs Python for Data Science in 2023-What's your choice?

ProjectPro

This blog aims to answer all questions on how Java vs Python compare for data science and which should be the programming language of your choice for doing data science in 2021. Apache Spark is an open-source analytics engine that is used by data scientists for large-scale data processing.

Java 52