
Getting Started with Apache Kafka in Python

Confluent

Welcome Pythonistas to the streaming data world centered around Apache Kafka®! If you’re using Python and ready to get hands-on with Kafka, then you’re in the right place. This blog […].
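As a taste of what the post covers, here is a minimal sketch of producing to Kafka from Python with the confluent-kafka client; the broker address, topic name, and payload below are placeholder assumptions, not values taken from the article.

```python
# Minimal sketch: produce a message to Kafka from Python with confluent-kafka.
# Assumptions: a broker at localhost:9092 and a topic named "demo-topic".
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once per message after the broker acknowledges (or rejects) it.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce("demo-topic", key="user-1", value="hello, kafka", callback=on_delivery)
producer.flush()  # Block until all queued messages are delivered.
```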


Brief History of Data Engineering

Jesse Anderson

In the beginning, there was Google. Doug Cutting took those papers and created Apache Hadoop in 2005. Cloudera was started in 2008, and Hortonworks in 2011. Hadoop was hard to program, so Apache Hive came along in 2010 to add SQL. Apache Pig arrived in 2008 too, but it never saw as much adoption.



Data News — Week 23.09

Christophe Blefari

I'll try to think about it in the following weeks to understand where I'm going for the third year of the newsletter and the blog. So thank you for that. Stay tuned and let's jump to the content.


Data News — Week 23.11

Christophe Blefari

On my side I'm slowly starting to get on top of the things I had in my queue, which means I get easily distracted by a notification—or even a thought—and end up doing something I did not plan to do at first. That probably explains why you always get the newsletter late on Fridays—or Saturdays.


Druid Deprecation and ClickHouse Adoption at Lyft

Lyft Engineering

At Lyft, we have used systems like ClickHouse and Apache Druid for near-real-time and sub-second analytics. In this blog post, we explain how Druid has been used at Lyft and what led us to adopt ClickHouse for our sub-second analytics system. Written by Ritesh Varyani and Jeana Choi at Lyft.
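For readers who want a feel for the sub-second query pattern the post describes, here is a minimal sketch using the clickhouse-connect Python client; the host, the rides table, and its columns are hypothetical placeholders and not taken from Lyft's setup.

```python
# Minimal sketch of a sub-second aggregate query against ClickHouse.
# Assumptions: a local ClickHouse server and a hypothetical "rides" table
# with "city" and "requested_at" columns; none of this reflects Lyft's schema.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

result = client.query(
    """
    SELECT city, count() AS ride_count
    FROM rides
    WHERE requested_at >= now() - INTERVAL 1 HOUR
    GROUP BY city
    ORDER BY ride_count DESC
    LIMIT 10
    """
)

for city, ride_count in result.result_rows:
    print(city, ride_count)
```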


Machine Learning with Python, Jupyter, KSQL and TensorFlow

Confluent

It takes much more effort than just building an analytic model with Python and your favorite machine learning framework. After all, machine learning with Python relies on algorithms that let programs keep learning from data, but building the infrastructure around them is several levels higher in complexity.
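To make the "model is the easy part" point concrete, here is a minimal TensorFlow/Keras sketch of the analytic-model step; the synthetic data and layer sizes are illustrative assumptions, and the Kafka/KSQL serving infrastructure the post actually focuses on is not shown.

```python
# Minimal sketch of the "easy part": training a small TensorFlow/Keras model.
# The synthetic data and layer sizes are placeholders; the surrounding
# streaming infrastructure (Kafka, KSQL, serving) is where the real effort goes.
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data standing in for real feature vectors.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```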


How to Become a Databricks Certified Apache Spark Developer?

ProjectPro

With around 35k stars and over 26k forks on GitHub, Apache Spark is one of the most popular big data frameworks, used by 22,760 companies worldwide. Apache Spark is an efficient, scalable, and widely used in-memory computation engine capable of handling batch, real-time, and analytics workloads.
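As a quick illustration of the batch-style work a Spark certification covers, here is a minimal PySpark sketch; the file path and column names are placeholder assumptions.

```python
# Minimal PySpark sketch: read a CSV and run a simple aggregation.
# The path "rides.csv" and the "city" column are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-cert-warmup").getOrCreate()

df = spark.read.option("header", True).csv("rides.csv")

(df.groupBy("city")
   .agg(F.count("*").alias("ride_count"))
   .orderBy(F.desc("ride_count"))
   .show(10))

spark.stop()
```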
