Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

Summary: Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. What are some of the challenges and mistakes that are common among engineers and analysts with regard to versioning and evolving schemas and the accompanying data?

Easier Stream Processing On Kafka With ksqlDB

Data Engineering Podcast

The ksqlDB project was created to make stream processing on Kafka easier by building a unified layer on top of the Kafka ecosystem. Developers can work with the SQL constructs they are familiar with while automatically getting the durability and reliability that Kafka offers. How is ksqlDB architected?
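
As a rough illustration of the "SQL on top of Kafka" idea, the sketch below wraps a ksqlDB `CREATE STREAM` statement in a request for a ksqlDB server's `/ksql` REST endpoint. It assumes a server listening on the default `http://localhost:8088` and a hypothetical Kafka topic named `pageviews` with JSON values; it only builds the request, so it runs without a server.

```python
import json
import urllib.request

# Assumption: a ksqlDB server on its default listener port (8088).
KSQLDB_URL = "http://localhost:8088/ksql"

# Hypothetical stream declared over a hypothetical "pageviews" Kafka topic.
CREATE_STREAM = (
    "CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR) "
    "WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON', PARTITIONS=1);"
)


def build_ksql_request(statement: str, url: str = KSQLDB_URL) -> urllib.request.Request:
    """Wrap a ksqlDB statement in a POST request for the /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_ksql_request(CREATE_STREAM)
    print(req.get_method(), req.full_url)
    # Against a running server (not executed here):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

The point of the design is that the stream definition is plain SQL; Kafka still owns storage and durability of the underlying topic.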

Building A Real Time Event Data Warehouse For Sentry

Data Engineering Podcast

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Links: Sentry, Podcast.__init__

How to Use ChatGPT ETL Prompts For Your ETL Game

Monte Carlo

ChatGPT prompts for the loading step of ETL can help you write scripts that load data into different databases, data lakes, or data warehouses. For example: "I'd like to import this data into my MySQL database, into a table called products_table. The data is currently in a pandas DataFrame. I've heard about the UPSERT functionality."
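
The kind of script such a prompt asks for can be sketched by hand. MySQL expresses an upsert as `INSERT ... ON DUPLICATE KEY UPDATE`, which only works if `products_table` has a primary or unique key. The sketch below assumes hypothetical columns `product_id` (the key), `name`, and `price`, and builds the statement plus parameter tuples for a DB-API driver; it does not open a database connection.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_mysql_upsert(
    table: str, columns: Sequence[str], key_columns: Sequence[str]
) -> str:
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement.

    Rows whose primary or unique key already exists are updated in place;
    rows with new keys are inserted. Placeholders use the %s style expected
    by DB-API drivers such as PyMySQL and mysql-connector-python.
    Table and column names are interpolated directly, so they must be trusted.
    """
    non_keys = [c for c in columns if c not in key_columns]
    if not non_keys:
        raise ValueError("need at least one non-key column to update")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # On a key collision, overwrite only the non-key columns.
    # VALUES(col) refers to the value that would have been inserted.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in non_keys)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def rows_as_params(
    records: Sequence[Dict[str, Any]], columns: Sequence[str]
) -> List[Tuple[Any, ...]]:
    """Order each record's values to match the column list."""
    return [tuple(r[c] for c in columns) for r in records]


if __name__ == "__main__":
    columns = ["product_id", "name", "price"]  # hypothetical schema
    sql = build_mysql_upsert("products_table", columns, key_columns=["product_id"])
    params = rows_as_params([{"product_id": 1, "name": "Widget", "price": 9.99}], columns)
    print(sql)
    print(params)
    # With a live connection (not shown): cursor.executemany(sql, params); conn.commit()
```

For a pandas DataFrame, `df.to_dict(orient="records")` yields the list of dicts that `rows_as_params` expects; some drivers may need NumPy scalar types converted to plain Python values first. Note that `VALUES()` inside `ON DUPLICATE KEY UPDATE` is deprecated since MySQL 8.0.20 in favor of a row alias, though it is still accepted.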

Data News — Week 23.24

Christophe Blefari

Why data consumers do not trust your reporting: a good illustration of the data journey manifesto. Stakeholders often notice data issues before the data team does. Data warehouses are mutable, which is one of the many root causes Lucas proposes: numbers that were already reported can change after the fact. This is metrics drift.
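
One way to make metrics drift visible is to keep a snapshot of the values that were reported and periodically compare it with a fresh recomputation from the warehouse. The sketch below is a hypothetical check, not a method from the linked article; the 1% relative tolerance and the sample revenue figures are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def find_metric_drift(
    reported: Dict[str, float],
    recomputed: Dict[str, float],
    rel_tolerance: float = 0.01,
) -> List[Tuple[str, float, Optional[float]]]:
    """Return (period, reported, recomputed) for every period that drifted.

    A period drifts when the recomputed value differs from the originally
    reported value by more than rel_tolerance (as a relative change), or
    when the period is missing from the recomputation (recomputed = None).
    """
    drifted: List[Tuple[str, float, Optional[float]]] = []
    for period, old in sorted(reported.items()):
        new = recomputed.get(period)
        if new is None:
            drifted.append((period, old, None))
            continue
        # Fall back to absolute change when the reported value was zero.
        scale = abs(old) if old != 0 else 1.0
        if abs(new - old) / scale > rel_tolerance:
            drifted.append((period, old, new))
    return drifted


if __name__ == "__main__":
    # Illustrative numbers only: monthly revenue as first reported vs. today.
    reported = {"2024-04": 1000.0, "2024-05": 1200.0}
    recomputed = {"2024-04": 1000.0, "2024-05": 1150.0}
    print(find_metric_drift(reported, recomputed))  # [('2024-05', 1200.0, 1150.0)]
```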

Making Analytical APIs Fast With Tinybird

Data Engineering Podcast

RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more.

Change Data Capture For All Of Your Databases With Debezium

Data Engineering Podcast

Debezium is an open source platform for reliable change data capture that you can use to build supplemental systems for everything from maintaining audit trails to real-time updates of your data warehouse. What are some of the other options on the market for handling change data capture? … Pulsar, BookKeeper, Pravega)?
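
To show how a downstream copy can be kept up to date from change events, the sketch below replays events shaped like Debezium's change-event envelope, where `op` marks the operation and `before`/`after` hold the row state. It assumes the envelope has already been unwrapped (with the JSON converter and schemas enabled it sits under a top-level `payload` key), that the primary key column is a hypothetical `id`, and uses a plain dict as a stand-in for the target system; this is an illustration, not part of Debezium's API.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_change_events(
    events: Iterable[Optional[Row]],
    replica: Optional[Dict[Any, Row]] = None,
    key: str = "id",
) -> Dict[Any, Row]:
    """Replay Debezium-style change events into a dict keyed by primary key.

    op codes handled: "c" = create, "r" = snapshot read, "u" = update, "d" = delete.
    """
    replica = {} if replica is None else replica
    for event in events:
        if event is None:
            # Tombstone records (null value after a delete) carry no row data.
            continue
        op = event["op"]
        if op in ("c", "r", "u"):
            row = event["after"]
            replica[row[key]] = row
        elif op == "d":
            replica.pop(event["before"][key], None)
        # Other op codes are ignored in this sketch.
    return replica


if __name__ == "__main__":
    # Hypothetical events for a "customers" table, in commit order.
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
        {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
        {"op": "u", "before": {"id": 1, "email": "a@example.com"},
         "after": {"id": 1, "email": "a2@example.com"}},
        {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
        None,  # tombstone following the delete
    ]
    print(apply_change_events(events))  # {1: {'id': 1, 'email': 'a2@example.com'}}
```

Because events are applied in commit order, the same loop works for an audit trail (append every event) or a warehouse table (upsert and delete by key); only the target changes.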