May, 2020

Change Data Capture Using Debezium Kafka and Pg

Start Data Engineering

Change data capture is a software design pattern used to capture changes to data and take corresponding action based on that change. The change to data is usually one of insert, update or delete. The corresponding action usually occurs in another system in response to the change that was made in the source system.

Kafka 246

Tips on Data Science Masters in Germany

Team Data Science

Should you do a master's degree in data science in Germany? Why not, but keep the following in mind! In general, studying in Germany is very practical because it doesn't cost a lot of money, unlike, for example, in the USA. So if you are interested, you should first think about what the corresponding master's programme is about.

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka 145

Mapping The Customer Journey For B2B Companies At Dreamdata

Data Engineering Podcast

Summary: Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals involved and the myriad ways that they interface with the business. Dreamdata integrates data from the multitude of platforms that are used by these organizations so that they can get a comprehensive view of their customer lifecycle.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

COVID-19: Risk Analytics for Building an Early Warning System

Teradata

Advanced analytics & AI techniques can help in curtailing the COVID-19 pandemic. This post describes an analytics prototype to build an early warning system for COVID-19.

Systems 118

Pull the Data you Actually Want

Grouparoo

There’s an underlying pattern prevalent today in many digital marketing tools that is causing problems. Wasted time, overpaying, slow velocity, and privacy issues for your customers are some of the results of this pattern. The problem is the over-reliance on Events. Specifically, the problem is that many marketing tools live in a world where they expect to be “pushed” data, when it would be so much better if they were “pulling” data when they needed it.

Data 52

More Trending

Jupyter Notebooks or Standalone Scripts?

Team Data Science

Lots of people like notebooks, and so do I. Jupyter Notebooks, for instance, are great for quickly exploring some data or trying something out. If you want to bring code into production, however, you should, or most likely have to, write standalone scripts. If you want to create something for production and then run it in production, Jupyter notebooks are not ideal.

Coding 130

Building a Telegram Bot Powered by Apache Kafka and ksqlDB

Confluent

Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, […].

Kafka 141

Power Up Your PostgreSQL Analytics With Swarm64

Data Engineering Podcast

Summary: The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choice for high performance analytics. Swarm64 aims to change that by adding support for advanced hardware capabilities like FPGAs and optimized usage of modern SSDs. In this episode CEO and co-founder Thomas Richter discusses his motivation for creating an extension to optimize Postgres hardware usage, the benefits of running your analytics on the same

Introducing Teradata’s Incoming CEO Steve McMillan

Teradata

Teradata's Board of Directors has selected the company's next President and Chief Executive Officer: Steve McMillan. Read more from interim President and CEO, Vic Lund.

108

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

New Course: NumPy for Data Engineers

Dataquest

Python programming is a critical skill for data engineers. When it comes to working with data, there’s a powerful library that can increase your code’s efficiency dramatically, especially when you’re working with large datasets: NumPy. That’s why we’ve added a NumPy for Data Engineers course to our Data Engineering path!

What Does It Mean for a Column to Be Indexed

Start Data Engineering

When optimizing queries on a database table, most developers tend to just create an index on the field to be queried.

IT 130

How to develop Spark applications with Zeppelin notebooks

Team Data Science

I love working with Zeppelin notebooks. It's so simple and you can just try something out. Working with dataframes and SparkSQL in particular is a blast. What is Zeppelin? Zeppelin is a notebook tool, just like Jupyter. You can run it on a server or on your Hadoop cluster, and it can run Spark jobs in the background.

Hadoop 130

Project Metamorphosis Part 1: Elastic Apache Kafka Clusters in Confluent Cloud

Confluent

A few weeks ago when we talked about our new fundraising, we also announced we’d be kicking off Project Metamorphosis. What is Project Metamorphosis? Let me try to explain. I […].

Project 123

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Data Engineering Podcast

Summary: There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community.

How to Balance Efficiency and Risk in Your Supply Chain

Teradata

Supply Chain organizations need visibility now to leverage data for making decisions and taking action, both in times of crisis and in relative stability.

Data 111

Continuous Deployment for NPM Packages

Grouparoo

A guide to the Grouparoo Monorepo Automated Release Process. Coming from more traditional web & app development, I’m a big fan of a git-flow style workflow. Specifically the following features: There are feature branches, an integration branch where features are merged together (usually called main), and finally the "live" branch that customers are using (often called stable, release or production). The main branch is always deployable (and should be deployed automatically with a CI/C

MySQL 52

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Build a Full Big Data Platform Right Away?

Team Data Science

Should companies build a full-blown big data/data science platform right away? In my opinion, you should first look at which stage you are in. Are you in the Proof-of-Concept phase, where you are just working with offline data and proving your concepts? Or are you in the MVP phase, where you are bringing in the first users, the first customers?

Big Data 130

Learning All About Wi-Fi Data with Apache Kafka and Friends

Confluent

Recently, I’ve been looking at what’s possible with streams of Wi-Fi packet capture (pcap) data. I was prompted after initially setting up my Raspberry Pi to capture pcap data and […].

Kafka 122

Enterprise Data Operations And Orchestration At Infoworks

Data Engineering Podcast

Summary: Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a platform built to provide a unified set of tooling for managing the full lifecycle of data in large businesses. By reducing the barrier to entry with a graphical interface for defining data transformations and analysis, it makes it easier to bring the domain experts into the process.

How to Operationalize Enterprise Analytics in the Telco Industry

Teradata

Operationalizing world class analytics into day-to-day processes can help solve some of the greatest challenges in the telecommunications industry. Find out more.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Getting Started - Installing Additional Drivers

Preset

Now that you have Apache Superset installed locally, here's how to hook it up to your favorite database.

Keeping Customers Streaming: The Centralized Site Reliability Practice at Netflix

Netflix Tech

By Hank Jacobs, Senior Site Reliability Engineer on CORE. We’re privileged to be in the business of bringing joy to our customers at Netflix. Whether it’s a compelling new series or an innovative product feature, we strive to provide a best-in-class service that people love and can enjoy anytime, anywhere.

Job Opportunities For Data Science Proof Of Concepts and MVPs

Team Data Science

What are the job opportunities in the field of Data Science? Several, of course! They can be worked out well from the 4 phases of a Data Science project. In this blog post only two of the four phases will be discussed. But let's start from the beginning. The four phases are: Proof-of-Concept, MVP, Validation and Scaling. The Proof of Concept Phase (PoC): Starting at the PoC phase, you could say: okay, I'm getting a research data scientist here.

Building a Clickstream Dashboard Application with ksqlDB and Elasticsearch

Confluent

Using a powerful, event-driven application can help you unlock insights contained in the event streams of your business. Before we get into the technology, let’s go over some questions you […].

Building 117

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Create APIs for Aggregations and Joins on MongoDB in Under 15 Minutes

Rockset

Rockset has teamed up with MongoDB so you can build real-time apps with data across MongoDB and other sources. If you haven’t heard of Rockset or don’t know what Rockset does, you will by the end of this guide! We’ll create an API to determine air quality using ClimaCell data on the weather and air pollutants. Air quality has been documented to affect human health (resources at the bottom).

MongoDB 40

COVID-19: The Perfect Storm

Teradata

The COVID-19 pandemic has brought with it a Perfect Storm of disruption that impacts all of us -- from our health to the economy to the supply chain. Read more.

IT 105

Getting Started - Connect Superset To Google Sheets

Preset

This tutorial shows you how to connect your local deployment of Apache Superset with Google Sheets, so you can query any publicly available Google Sheet.

40

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

Netflix Tech

How Netflix is able to enrich VPC Flow Logs at Hyper Scale to provide Network Insight. By Hariharan Ananthakrishnan and Angela Ho. The Cloud Network Infrastructure that Netflix utilizes today is a large distributed ecosystem that consists of specialized functional tiers and services such as DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc.

AWS 60

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, and learn to identify and de-risk the different kinds of hypotheses built into your roadmap.