Top Data Engineering Digest Content (Kafka, Process) for May 2020

May, 2020

Change Data Capture Using Debezium Kafka and Pg

Start Data Engineering

MAY 9, 2020

Change data capture (CDC) is a software design pattern used to capture changes to data and take corresponding action based on each change. The change is usually an insert, update, or delete, and the corresponding action is usually meant to occur in another system, in response to the change made in the source system.
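As a rough illustration of the "corresponding action" side, the sketch below applies Debezium-style change events to an in-memory replica. It assumes Debezium's change-event envelope, which carries an op code (c for create/insert, u for update, d for delete, r for a snapshot read) plus before and after row images. The id and email columns are hypothetical, and a real consumer would read these events from a Kafka topic rather than a Python list.

```python
from typing import Any, Dict, List

# Hypothetical replica: primary key -> row, standing in for the target system.
Replica = Dict[int, Dict[str, Any]]


def apply_change(replica: Replica, event: Dict[str, Any]) -> None:
    """Apply one Debezium-style change event (op/before/after envelope)."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads and updates carry the new row state in "after".
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the last known row state in "before".
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def apply_all(events: List[Dict[str, Any]]) -> Replica:
    """Replay a sequence of change events into an empty replica."""
    replica: Replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


if __name__ == "__main__":
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
        {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
        {"op": "u", "before": {"id": 1, "email": "a@example.com"},
         "after": {"id": 1, "email": "a2@example.com"}},
        {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
    ]
    print(apply_all(events))
```

Keying the replica by primary key makes replaying the same event twice harmless, which matters because Kafka consumers can see duplicates after a restart.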

Kafka

Kafka Data Designing Systems

Tips on Data Science Masters in Germany

Team Data Science

MAY 26, 2020

Should you do a master's degree in data science in Germany? Why not, but keep the following in mind. In general, it is very practical to study in Germany because it doesn't cost much, unlike in the USA, for example. So if you are interested, you should first think about what the specific Master's programme covers.

Data Science

Data Science Computer Science Data Data Engineering



Trending Sources

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

MAY 15, 2020

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka

Kafka Metadata IT Project


Mapping The Customer Journey For B2B Companies At Dreamdata

Data Engineering Podcast

MAY 25, 2020

Summary: Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals involved and the myriad ways that they interface with the business. Dreamdata integrates data from the multitude of platforms that are used by these organizations so that they can get a comprehensive view of their customer lifecycle.

Machine Learning

Machine Learning Portfolio Deep Learning Data Engineering


COVID-19: Risk Analytics for Building an Early Warning System

Teradata

MAY 5, 2020

Advanced analytics & AI techniques can help curtail the COVID-19 pandemic. This post describes an analytics prototype to build an early warning system for COVID-19.

Systems

Systems Building

Pull the Data you Actually Want

Grouparoo

MAY 21, 2020

There’s an underlying pattern prevalent today in many digital marketing tools that is causing problems. Wasted time, overpaying, slow velocity, and privacy issues for your customers are some of the results of this pattern. The problem is the over-reliance on Events. Specifically, the problem is that many marketing tools live in a world where they expect to be “pushed” data, when it would be so much better if they were “pulling” data when they needed it.
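A minimal sketch of the contrast, assuming a hypothetical in-memory "source of truth" for customer profiles: in the push model the marketing tool has to store every event it is sent and replay them to know a customer's current state, while in the pull model it asks the source for the current values of only the fields it needs. All names and data here are illustrative, not Grouparoo's API.

```python
from typing import Any, Dict, List, Optional

# Hypothetical source of truth (e.g. your application database), keyed by user id.
PROFILES: Dict[str, Dict[str, Any]] = {
    "u1": {"email": "ann@example.com", "plan": "pro", "ltv": 1200},
    "u2": {"email": "bob@example.com", "plan": "free", "ltv": 0},
}


class PushTool:
    """Push model: the tool keeps every event it receives and derives state itself."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def receive(self, event: Dict[str, Any]) -> None:
        self.events.append(event)  # storage grows with every event, needed or not

    def current(self, user_id: str, field: str) -> Optional[Any]:
        value = None
        for e in self.events:  # replay the history to find the latest value
            if e["user_id"] == user_id and e["field"] == field:
                value = e["value"]
        return value


class PullTool:
    """Pull model: the tool fetches only the fields it needs, when it needs them."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields

    def fetch(self, user_id: str) -> Dict[str, Any]:
        source = PROFILES[user_id]
        return {f: source[f] for f in self.fields}


if __name__ == "__main__":
    print(PullTool(["email", "plan"]).fetch("u1"))
```

The pull side never sees the ltv field, which is the privacy and cost argument the post makes: data the tool does not need is never copied into it.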

Data

Data Database Data Warehouse Building

Data Engineering Project for Beginners - Batch edition

Start Data Engineering

MAY 23, 2020

Contents: Introduction; Approach; Project overview; Engineering Design; Airflow Primer; Setup; Code and explanation (Stage 1. pg -> file -> s3; Stage 2. file -> s3 -> EMR -> s3; Stage 3. movie_review_stage, user_purchase_stage -> Redshift table -> quality check data); Monitoring; ETL Design Review; Common Scenarios; Next Steps; Conclusion.

Introduction: Starting out in data engineering can be a little intimidating, especially because data engineering involves a lot of moving parts.
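Stage 1 of that project moves data from Postgres to a file and then to S3. The sketch below imitates the shape of that step using only the standard library so it runs anywhere: sqlite3 stands in for Postgres and a local directory stands in for the S3 bucket. The user_purchase table layout and the extract_to_csv/upload helpers are hypothetical; in the actual project these steps would run as Airflow tasks with real Postgres and S3 clients.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path


def extract_to_csv(conn: sqlite3.Connection, query: str, out_path: Path) -> int:
    """Run a query and write its result set, with a header row, to a CSV file."""
    cur = conn.execute(query)
    header = [col[0] for col in cur.description]
    rows = cur.fetchall()
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def upload(local_path: Path, bucket_dir: Path, key: str) -> Path:
    """Stand-in for an S3 upload: copy the file to bucket_dir/key."""
    target = bucket_dir / key
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_path, target)
    return target


if __name__ == "__main__":
    # sqlite3 plays the role of the Postgres source; the table layout is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_purchase (invoice_no TEXT, quantity INTEGER, unit_price REAL)")
    conn.executemany("INSERT INTO user_purchase VALUES (?, ?, ?)",
                     [("inv-1", 6, 2.55), ("inv-2", 2, 1.85)])
    with tempfile.TemporaryDirectory() as tmp:
        local_file = Path(tmp) / "user_purchase.csv"
        n_rows = extract_to_csv(conn, "SELECT * FROM user_purchase", local_file)
        dest = upload(local_file, Path(tmp) / "bucket", "raw/user_purchase/user_purchase.csv")
        print(f"copied {n_rows} rows to {dest}")
```

Keeping extract and upload as separate functions mirrors the pg -> file -> s3 split, so each step can be retried on its own.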

Data Engineering

Data Engineering Data Engineer Project Engineering

More Trending


Jupyter Notebooks or Standalone Scripts?

Team Data Science

MAY 25, 2020

Lots of people like notebooks, and so do I. Jupyter Notebooks, for instance, are great for quickly exploring some data or trying something out. If you want to bring code into production, however, you should (and most likely will have to) write standalone scripts. When you are building something for production and running it there, Jupyter notebooks are not ideal.
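One common way to turn notebook exploration into a standalone script is to move the logic into functions, take inputs from the command line, and add an entry point guarded by if __name__ == "__main__". The sketch below assumes hypothetical CSV inputs with an amount column; it shows one possible layout, not the author's prescribed structure.

```python
import argparse
import csv
from typing import Iterable, List, Optional


def column_total(lines: Iterable[str], column: str = "amount") -> float:
    """Sum one numeric column of CSV text (the logic that used to live in a notebook cell)."""
    return sum((float(row[column]) for row in csv.DictReader(lines)), 0.0)


def main(argv: Optional[List[str]] = None) -> float:
    """Parse command-line arguments, sum the column across all given files, print the total."""
    parser = argparse.ArgumentParser(description="Sum a numeric column across CSV files.")
    parser.add_argument("paths", nargs="*", help="input CSV files")
    parser.add_argument("--column", default="amount", help="name of the column to sum")
    args = parser.parse_args(argv)
    total = 0.0
    for path in args.paths:
        with open(path, newline="") as f:
            total += column_total(f, args.column)
    print(total)
    return total


if __name__ == "__main__":
    main()
```

Saved as, say, sum_column.py, it runs as python sum_column.py sales.csv --column amount. Because column_total takes any iterable of lines, it can be unit-tested without touching the file system, which is hard to do with code left in notebook cells.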

Coding

Coding Data Engineering Data Engineer Engineering

Building a Telegram Bot Powered by Apache Kafka and ksqlDB

Confluent

MAY 12, 2020

Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, […].

Kafka

Kafka Building Big Data MongoDB

Power Up Your PostgreSQL Analytics With Swarm64

Data Engineering Podcast

MAY 18, 2020

Summary: The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choice for high performance analytics. Swarm64 aims to change that by adding support for advanced hardware capabilities like FPGAs and optimized usage of modern SSDs. In this episode CEO and co-founder Thomas Richter discusses his motivation for creating an extension to optimize Postgres hardware usage, the benefits of running your analytics on the same

PostgreSQL

PostgreSQL Database Data Warehouse Machine Learning

Introducing Teradata’s Incoming CEO Steve McMillan

Teradata

MAY 6, 2020

Teradata's Board of Directors has selected the company's next President and Chief Executive Officer: Steve McMillan. Read more from interim President and CEO, Vic Lund.


New Course: NumPy for Data Engineers

Dataquest

MAY 21, 2020

Python programming is a critical skill for data engineers. When it comes to working with data, there’s a powerful library that can increase your code’s efficiency dramatically, especially when you’re working with large datasets: NumPy. That’s why we’ve added a NumPy for Data Engineers course to our Data Engineering path!
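To illustrate the kind of efficiency gain the course description refers to, here is a small comparison of a pure-Python loop and the equivalent vectorized NumPy expression. The data and the prices/quantities names are made up, and exact timings depend on your machine, so the snippet only checks that both versions agree and prints the times.

```python
import time

import numpy as np


def revenue_loop(prices, quantities):
    """Pure Python: one interpreted multiply-add per element."""
    total = 0.0
    for p, q in zip(prices, quantities):
        total += p * q
    return total


def revenue_numpy(prices: np.ndarray, quantities: np.ndarray) -> float:
    """Vectorized: the element-wise multiply and the sum run in compiled code."""
    return float(np.dot(prices, quantities))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    prices = rng.uniform(1.0, 100.0, size=200_000)
    quantities = rng.integers(1, 10, size=200_000).astype(np.float64)

    start = time.perf_counter()
    slow = revenue_loop(prices.tolist(), quantities.tolist())
    loop_s = time.perf_counter() - start

    start = time.perf_counter()
    fast = revenue_numpy(prices, quantities)
    numpy_s = time.perf_counter() - start

    # Summing in a different order can change the last few floating-point digits.
    assert np.isclose(slow, fast)
    print(f"loop: {loop_s:.4f}s  numpy: {numpy_s:.4f}s")
```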

Data Engineering

Data Engineering Data Engineer Engineering Datasets

What Does It Mean for a Column to Be Indexed

Start Data Engineering

MAY 1, 2020

When optimizing queries on a database table, most developers tend to just create an index on the field to be queried.
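A quick way to see what an index changes is to ask the database for its query plan before and after creating one. The sketch uses SQLite from Python's standard library (the article may well use a different database); the users table and idx_users_email index are hypothetical. Without the index the planner has to scan the whole table; with it, it searches the index directly. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(10_000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, lookup, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("user42@example.com",))

print("before:", before)  # expect a full-table SCAN of users
print("after: ", after)   # expect a SEARCH using idx_users_email
```

The index is not free: it takes extra storage and must be updated on every insert and update to the email column, which is why "just create an index" deserves a closer look.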

IT Database

How to develop Spark applications with Zeppelin notebooks

Team Data Science

MAY 23, 2020

I love working with Zeppelin notebooks. It's so simple, and you can just try something out. Working with dataframes and SparkSQL in particular is a blast. What is Zeppelin? Zeppelin is a notebook tool, just like Jupyter. You can run it on a server or on your Hadoop cluster, and it can run Spark jobs in the background.

Hadoop

Hadoop Data Engineering Data Engineer Coding

Project Metamorphosis Part 1: Elastic Apache Kafka Clusters in Confluent Cloud

Confluent

MAY 6, 2020

A few weeks ago when we talked about our new fundraising, we also announced we’d be kicking off Project Metamorphosis. What is Project Metamorphosis? Let me try to explain. I […].

Project

Project Kafka Cloud


StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Data Engineering Podcast

MAY 11, 2020

Summary: There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community.

Lambda Architecture

Lambda Architecture Cloud Kafka Hadoop

How to Balance Efficiency and Risk in Your Supply Chain

Teradata

MAY 25, 2020

Supply Chain organizations need visibility now to leverage data for making decisions and taking action, both in times of crisis and in relative stability.

Data

Continuous Deployment for NPM Packages

Grouparoo

MAY 6, 2020

A guide to the Grouparoo Monorepo Automated Release Process. Coming from more traditional web & app development, I’m a big fan of a git-flow style workflow, specifically the following features: there are feature branches, an integration branch where features are merged together (usually called main), and finally the "live" branch that customers are using (often called stable, release or production). The main branch is always deployable (and should be deployed automatically with a CI/C

MySQL

MySQL Project Process Building



Build a Full Big Data Platform Right Away?

Team Data Science

MAY 20, 2020

Should companies build a full-blown big data/data science platform right away? In my opinion, you should first look at which stage you are in. Are you in the Proof-of-Concept phase, where you are working only with offline data to prove your concepts? Or are you in the MVP phase, building an MVP and bringing in the first users and customers?

Big Data

Big Data Building AWS Kafka

Learning All About Wi-Fi Data with Apache Kafka and Friends

Confluent

MAY 27, 2020

Recently, I’ve been looking at what’s possible with streams of Wi-Fi packet capture (pcap) data. I was prompted after initially setting up my Raspberry Pi to capture pcap data and […].

Kafka

Kafka Data Aggregated Data Process

Enterprise Data Operations And Orchestration At Infoworks

Data Engineering Podcast

MAY 4, 2020

Summary: Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a platform built to provide a unified set of tooling for managing the full lifecycle of data in large businesses. By reducing the barrier to entry with a graphical interface for defining data transformations and analysis, it makes it easier to bring the domain experts into the process.

Data Pipeline

Data Pipeline Hadoop Big Data Data

How to Operationalize Enterprise Analytics in the Telco Industry

Teradata

MAY 21, 2020

Operationalizing world class analytics into day-to-day processes can help solve some of the greatest challenges in the telecommunications industry. Find out more.

Telecommunication

Telecommunication Process


Getting Started - Installing Additional Drivers

Preset

MAY 17, 2020

Now that you have Apache Superset installed locally, here's how to hook it up to your favorite database.
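Superset talks to databases through SQLAlchemy, so hooking one up generally means installing the matching Python driver (psycopg2 for PostgreSQL, for example) and giving Superset a SQLAlchemy connection URI. The helper below only assembles such a URI; the sqlalchemy_uri name, host, database and credentials are placeholders. It URL-encodes the credentials because characters such as @ or / in a password would otherwise break the URI.

```python
from urllib.parse import quote_plus


def sqlalchemy_uri(dialect: str, user: str, password: str, host: str,
                   port: int, database: str) -> str:
    """Build a URI of the form dialect://user:password@host:port/database."""
    # Escape credentials so characters such as '@', ':' or '/' don't break parsing.
    return (f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


if __name__ == "__main__":
    # Placeholder values: substitute your own host, database and credentials.
    print(sqlalchemy_uri("postgresql", "superset", "p@ss/word", "localhost", 5432, "analytics"))
```

The resulting string is what goes into the SQLAlchemy URI field when adding a database in Superset.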

Database

Database IT

Keeping Customers Streaming: The Centralized Site Reliability Practice at Netflix

Netflix Tech

MAY 27, 2020

Keeping Customers Streaming: The Centralized Site Reliability Practice at Netflix, by Hank Jacobs, Senior Site Reliability Engineer on CORE. We’re privileged to be in the business of bringing joy to our customers at Netflix. Whether it’s a compelling new series or an innovative product feature, we strive to provide a best-in-class service that people love and can enjoy anytime, anywhere.

Consulting

Consulting Engineering Management Systems

Job Opportunities For Data Science Proof Of Concepts and MVPs

Team Data Science

MAY 20, 2020

What are the job opportunities in the field of data science? Several, of course! The four phases of a data science project are a good way to work out the possibilities, although this blog post discusses only two of them. But let's start at the beginning. The four phases are: Proof of Concept, MVP, Validation and Scaling. The Proof of Concept (PoC) phase: starting at the PoC phase, you could say, okay, I'm getting a research data scientist here.

Data Science

Data Science Algorithm Data Engineering Data Engineer

Building a Clickstream Dashboard Application with ksqlDB and Elasticsearch

Confluent

MAY 26, 2020

Using a powerful, event-driven application can help you unlock insights contained in the event streams of your business. Before we get into the technology, let’s go over some questions you […].

Building

Building Technology Kafka Process


Create APIs for Aggregations and Joins on MongoDB in Under 15 Minutes

Rockset

MAY 19, 2020

Rockset has teamed up with MongoDB so you can build real-time apps with data across MongoDB and other sources. If you haven’t heard of Rockset or don't know what it does, you will by the end of this guide! We’ll create an API to determine air quality using ClimaCell data on the weather and air pollutants. Air quality has been documented to affect human health (resources at the bottom).

MongoDB

MongoDB Python Database SQL

COVID-19: The Perfect Storm

Teradata

MAY 13, 2020

The COVID-19 pandemic has brought with it a Perfect Storm of disruption that impacts all of us -- from our health to the economy to the supply chain. Read more.

Getting Started - Connect Superset To Google Sheets

Preset

MAY 31, 2020

This tutorial shows you how to connect your local deployment of Apache Superset with Google Sheets, so you can query any publicly available Google Sheet.

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

Netflix Tech

MAY 26, 2020

How Netflix is able to enrich VPC Flow Logs at Hyper Scale to provide Network Insight, by Hariharan Ananthakrishnan and Angela Ho. The Cloud Network Infrastructure that Netflix utilizes today is a large distributed ecosystem that consists of specialized functional tiers and services such as DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc.
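The core idea of enrichment, attaching metadata such as the owning application to the raw IP addresses in each flow record, can be sketched in a few lines. This is not Netflix's pipeline: it parses records in the default AWS VPC Flow Logs (version 2) field order and joins source and destination addresses against a small hypothetical IP-to-application map. The addresses, account id and application names are made up.

```python
from typing import Dict, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Hypothetical metadata: which application owns which private IP.
IP_TO_APP: Dict[str, str] = {
    "10.0.1.15": "api-gateway",
    "10.0.2.27": "playback-service",
}


def parse_record(line: str) -> Dict[str, str]:
    """Split one space-separated flow log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def enrich(record: Dict[str, str], ip_to_app: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Attach the owning application of each endpoint (None if the IP is unknown)."""
    enriched: Dict[str, Optional[str]] = dict(record)
    enriched["src_app"] = ip_to_app.get(record["srcaddr"])
    enriched["dst_app"] = ip_to_app.get(record["dstaddr"])
    return enriched


if __name__ == "__main__":
    line = ("2 123456789012 eni-0a1b2c3d 10.0.1.15 10.0.2.27 "
            "49152 443 6 10 8400 1590000000 1590000060 ACCEPT OK")
    flow = enrich(parse_record(line), IP_TO_APP)
    print(flow["src_app"], "->", flow["dst_app"], flow["bytes"], "bytes")
```

At Netflix's scale the hard part is keeping the IP-to-application map accurate as instances and containers come and go; the join itself stays this simple.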

AWS

AWS Bytes Metadata Cloud

