Blog - Data Engineering Digest

Data News — Week 23.08

Christophe Blefari

FEBRUARY 24, 2023

This is something I struggle with. I really like writing, I really like this newsletter, and I really like the blog, but it takes me one day per week to produce. If I want to continue for years, I have to find a way to make it sustainable for me, and if I want to go further in this direction, I have to find a model that works.

Kafka

Kafka Data Lake Data Storage Data

Data Engineering Weekly #157

Data Engineering Weekly

FEBRUARY 4, 2024

Joe went on to define data modeling as follows: A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions. The user journey, sales process, marketing campaign: everything falls under a state machine.

Data Engineering

Data Engineering Data Engineer Engineering PostgreSQL

Streams Replication Manager Prefixless Replication

Cloudera

JANUARY 31, 2024

It is also important to have multiple options (like normal and prefixless replication) to do the replication process, since every solution has its own advantages.

Management

Management Kafka Big Data Cloud

Building Real-time Machine Learning Foundations at Lyft

Lyft Engineering

JUNE 28, 2023

However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others. While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort.

Machine Learning

Machine Learning Building Metadata Kafka

Data Engineering Weekly #155

Data Engineering Weekly

JANUARY 21, 2024

Dan Luu: How bad are search results? Grab: Kafka on Kubernetes: Reloaded for fault tolerance.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

The Importance of Distributed Tracing for Apache-Kafka-Based Applications

Confluent

MARCH 26, 2019

Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. Distributed tracing has been key for helping us create a clear understanding of how applications are related to each other. Distributed tracing with Zipkin. Let’s imagine a “Hello, World!”

Kafka

Kafka Transportation Metadata Consulting

Addressing the Challenges of Sample Ratio Mismatch in A/B Testing

DoorDash Engineering

OCTOBER 17, 2023

For example, if two reasonably sized groups are expected to be split 50/50, but instead show a 55/45 split, the assignment process is likely compromised. The term itself conjures a sense of rigor, validity, and trust. Yet as powerful as experimentation is, its integrity can be compromised by overlooked details and unforeseen challenges.

Education

Education Kafka Algorithm Data Warehouse
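
The 55/45 example in the excerpt above can be checked with a chi-square goodness-of-fit test. The sketch below is only an illustration, not DoorDash's method: the function name `srm_check`, the group sizes (5,500 and 4,500 users), and the 0.001 alert threshold are assumptions chosen for the example.

```python
import math


def srm_check(count_a, count_b, expected_ratio_a=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for a two-group split.

    Returns the chi-square statistic and its p-value.
    """
    total = count_a + count_b
    expected_a = total * expected_ratio_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical experiment: 10,000 users, observed 55/45 instead of 50/50.
chi2, p_value = srm_check(5500, 4500)
# Assumed alert threshold; teams pick their own.
mismatch_detected = p_value < 0.001
print(chi2, p_value, mismatch_detected)
```

With groups this large, a 55/45 split is very unlikely to arise by chance, which is why it points to a problem in assignment rather than to random noise.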

Top 20+ Big Data Certifications and Courses in 2023

Knowledge Hut

SEPTEMBER 6, 2023

This influx of data is handled by robust big data systems which are capable of processing, storing, and querying data at scale. It would immensely help people who are working with big data technologies, want to switch into big data technologies, and even other software professionals in terms of technological-awareness.

Big Data

Big Data Certification Hadoop Scala

Data Engineering Annotated Monthly – April 2022

Big Data Tools

MAY 19, 2022

Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. RocketMQ Streams 1.0.1: Virtually every technology seems to be adding some kind of streaming API these days. Kafka was the first, and soon enough, everybody was trying to grab their own share of the market.

Data Engineering

Data Engineering Data Engineer Engineering Big Data Tools

Data Engineering Annotated Monthly – June 2022

Big Data Tools

JULY 13, 2022

Hi, I’m Pasha Finkelshteyn, and I’ll be your guide today through this month’s news. The process of returning to active maintenance is not even described in the docs. How is it possible to support distributed transactions and solve the other complex problems of distributed systems? However, a miracle happened!

Data Engineering

Data Engineering Data Engineer Engineering Kafka

How to Connect KSQL to Confluent Cloud using Kubernetes with Helm

Confluent

JUNE 12, 2019

Confluent Cloud, a fully managed, cloud-native event streaming service that extends the value of Apache Kafka®, is simple, resilient, secure, and performant, allowing you to focus on what is important: building contextual event-driven applications, not infrastructure. and Helm/Tiller 2.8.2+ Click on the “Clients” option.

Cloud

Cloud Kafka Healthcare Software Engineer

Top 8 Data Engineering Books [Beginners to Advanced]

Knowledge Hut

JUNE 30, 2023

Whether you're a beginner looking to dive into the foundations or an experienced practitioner seeking advanced techniques, the right books can be your guiding light. Books on data engineering serve as essential resources to guide you through the vast terrain of data engineering. What is Data Engineering?

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Data Engineering Annotated Monthly – September 2022

Big Data Tools

OCTOBER 10, 2022

I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. This time I learned about Brooklin, a LinkedIn service for streaming data in a heterogeneous environment. It’s been a very bustling two months in Berlin. Indeed, it’s been so busy that I had to skip the digests. Greetings from sunny Berlin!

Data Engineering

Data Engineering Data Engineer Engineering Kafka

Data Pipeline- Definition, Architecture, Examples, and Use Cases

ProjectPro

DECEMBER 7, 2021

Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them.

Data Pipeline

Data Pipeline Architecture Kafka AWS

The Top 25 Data Engineering Influencers and Content Creators on LinkedIn

Databand.ai

DECEMBER 13, 2022

2) Charles Mendelson, Associate Data Engineer at PitchBook Data. Charles is a skilled data engineer focused on telling stories with data and building tools to empower others to do the same, all in the pursuit of guiding a variety of audiences and stakeholders to make meaningful decisions.

Data Engineering

Data Engineering Data Engineer Engineering AWS

Azure Data Engineer Certification Path (DP-203): 2023 Roadmap

Knowledge Hut

SEPTEMBER 26, 2023

Data engineers work on the data to organize and make it usable with the aid of cloud services. Overall, because we are in charge of making sure that data is accurately gathered, saved, and made accessible for analysis and reporting, Azure Data Engineers play a critical role in organizations that depend on data to guide business choices.

Certification

Certification Data Engineering Data Engineer Engineering

The Kafka Connect Plugin for Rockset and How It Works

Rockset

AUGUST 21, 2019

Rockset continuously ingests data streams from Kafka, without the need for a fixed schema, and serves fast SQL queries on that data. We created the Kafka Connect Plugin for Rockset to export data from Kafka and send it to a collection of documents in Rockset. This blog covers how we implemented the plugin.

Kafka

Kafka IT Data Storage Relational Database

Data Science Course Fees, Eligibility & Duration

Knowledge Hut

JANUARY 22, 2024

As you stand on the precipice of this exhilarating venture, one question looms large—how does one embark on the path to data mastery? This makes it easier to balance learning with work and other commitments. Welcome to the world of data science. The reputation of the faculty and alumni network also matters.

Data Science

Data Science Certification Education Data Lake

Data Engineering Annotated Monthly – September 2021

Big Data Tools

OCTOBER 5, 2021

I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. Kafka 3.0.0 – The Apache Software Foundation needed less than one month to go from Kafka version 3.0.0-rc0 Burton the same person?

Data Engineering

Data Engineering Data Engineer Engineering Big Data Tools

Real-Time Analytics on Connected Car IoT Data Streams from Apache Kafka

Rockset

FEBRUARY 7, 2020

In this IoT example, we examine how to enable complex analytic queries on real-time Kafka streams from connected car sensors. Building Real-Time Analytics on Connected Car IoT Data For our example, we have a fleet of connected vehicles that send the sensor data they generate to a Kafka cluster.

Kafka

Kafka Transportation Data SQL

Forge Your Career Path with Best Data Engineering Certifications

ProjectPro

FEBRUARY 21, 2023

Whether you are just starting your career as a Data Engineer or looking to take the next step, this blog will walk you through the most valuable data engineering certifications and help you make an informed decision about which one to pursue. Certifications can be a successful alternative for work experience for beginner-level data engineers.

Certification

Certification Data Engineering Data Engineer Engineering

7 Lessons From GoCardless’ Implementation of Data Contracts

Monte Carlo

JULY 7, 2022

You can read more about Convoy’s approach from our blog with their Head of Product, Data Platform, Chad Sanderson, “The modern data warehouse is broken.” This post will answer that question and how asking it led to the implementation of data contracts at our organization. Image courtesy of Andrew Jones.

Data Warehouse

Data Warehouse Software Engineer Software Engineering Data

Using Streams Replication Manager Prefixless Replication for Kafka Topic Aggregation

Cloudera

FEBRUARY 28, 2024

Businesses often need to aggregate topics because it is essential for organizing, simplifying, and optimizing the processing of streaming data. It enables efficient analysis, facilitates modular development, and enhances the overall effectiveness of streaming applications. If not, you can check out this related blog post.

Kafka

Kafka Management Big Data Architecture

Stream Rows and Kafka Topics Directly into Snowflake with Snowpipe Streaming

Snowflake

MARCH 2, 2023

That proves to be a difficult task for data engineering teams that have to manage separate infrastructure for batch data and streaming data. To address this challenge, we are happy to announce the public preview of Snowpipe Streaming as the latest addition to our Snowflake ingestion offerings.

Kafka

Kafka Data Ingestion Data Pipeline Cloud Storage

Improving Recruiting Efficiency with a Hybrid Bulk Data Processing Framework

LinkedIn Engineering

JANUARY 19, 2024

Similarly, when a recruiter transitions to their next opportunity, internally or externally, all their work is transferred to another recruiter. This multi-entity handover process involves huge amounts of data updating and cloning. Any disruptions in the transfer blocks the recruiter from carrying out the day-to-day recruiting process.

Recruitment

Recruitment Data Process Process Kafka

Top 30 Machine Learning Skills for ML Engineer in 2024

Knowledge Hut

JANUARY 16, 2024

However, transitioning from being interested to working in the field requires more than just accumulating theoretical knowledge. What Is Machine Learning?

Machine Learning

Machine Learning Engineering Programming Language Algorithm

The Evolution of Enforcing our Professional Community Policies at Scale

LinkedIn Engineering

JANUARY 16, 2024

LinkedIn is always working hard to make sure that its platform is a safe and trusted place for its members. In a previous blog post, we talked about how we built our anti-abuse platform using CASAL.

Kafka

Kafka Relational Database Java Architecture

Making Sense of Real-Time Analytics on Streaming Data, Part 1: The Landscape

Rockset

FEBRUARY 24, 2023

Let’s get this out of the way at the beginning: understanding effective streaming data architectures is hard, and understanding how to make use of streaming data for analytics is really hard. Kafka or Kinesis? Stream processing or an OLAP database? What Is Streaming Data?

Kafka

Kafka AWS Amazon Web Services Programming Language

Unified Streaming And Batch Pipelines At LinkedIn: Reducing Processing time by 94% with Apache Beam

LinkedIn Engineering

MARCH 23, 2023

Co-Authors: Yuhong Cheng, Shangjin Zhang, Xinyu Liu, and Yi Pan. Efficient data processing is crucial in reducing learning curves, simplifying maintenance efforts, and decreasing operational complexity. By unifying these pipelines, we have saved 94% of processing time. Samza, Spark and Apache Flink).

Process

Process Lambda Architecture Kafka Datasets

Spring for Apache Kafka Deep Dive – Part 3: Apache Kafka and Spring Cloud Data Flow

Confluent

MAY 30, 2019

Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow, which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka®.

Kafka

Kafka Cloud Data Pipeline PostgreSQL

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

OCTOBER 21, 2022

Kafka can continue the list of brand names that became generic terms for the entire type of technology. Similar to Google in web browsing and Photoshop in image processing, it became a gold standard in data streaming, preferred by 70 percent of Fortune 500 companies. What is Kafka? What Kafka is used for.

Kafka

Kafka Hadoop ETL Tools Big Data

How to Become a Data Engineer in 2024?

Knowledge Hut

DECEMBER 26, 2023

Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. How does one become a Data engineer, and what skills are required? However, several questions may arise for an individual. What is Data Science? And many more.

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Schemas, Contracts, and Compatibility

Confluent

MAY 21, 2019

This leads us to event streaming microservices patterns. Not only that, but it could also be received by a risk evaluation service, by Kafka Connect that will write the update to the profile database and perhaps by a real-time event streaming application that updates a dashboard showing the number of customers in each sales region.

Kafka

Kafka Insurance Architecture Database

20+ Data Engineering Projects for Beginners with Source Code

ProjectPro

AUGUST 24, 2021

Data professionals who work with raw data like data engineers, data analysts, machine learning scientists, and machine learning engineers also play a crucial role in any data science project. And, out of these professions, this blog will discuss the data engineering job role. Nevertheless, that is not the only job in the data world.

Data Engineering

Data Engineering Data Engineer Coding Project

Data Engineering Weekly #111

Data Engineering Weekly

DECEMBER 11, 2022

Maxime Beauchemin wrote an influential article, Functional Data Engineering — a modern paradigm for batch data processing. Should I write another prediction?

Data Engineering

Data Engineering Data Engineer Engineering Kafka

Streamline Data Pipelines: How to Use WhyLogs with PySpark for Data Profiling and Validation

Towards Data Science

JANUARY 7, 2024

Data pipelines, made by data engineers or machine learning engineers, do more than just prepare data for reports or training models. It’s crucial to not only process the data but also ensure its quality.

Data Pipeline

Data Pipeline Hospitality Data Validation Datasets

Azure Data Engineer Resume

Edureka

FEBRUARY 9, 2023

Azure Data Engineering is a rapidly growing field that involves designing, building, and maintaining data processing systems using Microsoft Azure technologies. As a certified Azure Data Engineer, you have the skills and expertise to design, implement and manage complex data storage and processing solutions on the Azure cloud platform.

Data Engineering

Data Engineering Data Engineer Engineering Amazon Web Services

Evolution of Streaming Pipelines in Lyft’s Marketplace

Lyft Engineering

SEPTEMBER 27, 2022

The journey of evolving our streaming platform and pipeline to better scale and support new use cases at Lyft. MVP After much deliberation, we decided that streaming engines would be a better fit for our requirements and selected Apache Beam. Decrease development time and increase product iteration speed.

Kafka

Kafka Aggregated Data Machine Learning Architecture

How to Become an Azure Data Engineer in 2023?

ProjectPro

JANUARY 19, 2022

Read this blog till the end to learn more about the roles and responsibilities, necessary skillsets, average salaries, and various important certifications that will help you build a successful career as an Azure Data Engineer. Planning to land a successful job as an Azure Data Engineer?

Data Engineering

Data Engineering Data Engineer Engineering Scala

The Good and the Bad of Apache Airflow Pipeline Orchestration

AltexSoft

NOVEMBER 7, 2022

How data engineering works. The tool represents processes in the form of directed acyclic graphs which visualize causal relationships between tasks and the order of their execution. Other tech professionals working with the tool are solution architects, software developers, DevOps specialists, and data scientists.

PostgreSQL

PostgreSQL Metadata Python MySQL
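
To make the directed-acyclic-graph idea from the Airflow excerpt above concrete, here is a minimal sketch that uses only Python's standard library (`graphlib`, Python 3.9+) rather than Airflow itself. The task names and the `execution_order` helper are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A topological sort like this is the basic property a DAG gives an orchestrator: as long as the graph has no cycles, there is always at least one order in which every task runs after the tasks it depends on. `TopologicalSorter` raises `graphlib.CycleError` if a cycle is present.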
