article thumbnail

Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

Data Engineering Podcast

Summary Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means for documenting data assets, but those introduce a secondary system of record in order to find the necessary information.

article thumbnail

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. Join in with the event for the global data community, Data Council Austin.

Data Lake 262
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Modern Customer Data Platform Principles

Data Engineering Podcast

A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Data projects are notoriously complex.

Data Lake 147
article thumbnail

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Netflix Tech

By Tianlong Chen and Ioannis Papapanagiotou Netflix has more than 195 million subscribers that generate petabytes of data everyday. Data scientists and engineers collect this data from our subscribers and videos, and implement data analytics models to discover customer behaviour with the goal of maximizing user joy.

article thumbnail

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

SQL 173
article thumbnail

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

Summary One of the reasons that data work is so challenging is because no single person or team owns the entire process. This introduces friction in the process of collecting, processing, and using data. In order to reduce the potential for broken pipelines some teams have started to adopt the idea of data contracts.

Metadata 130
article thumbnail

Upgrade your Modern Data Stack

Christophe Blefari

Make your data stack take-off ( credits ) Hello, another edition of Data News. This week, we're going to take a step back and look at the current state of data platforms. What are the current trends and why are people fighting around the concept of the modern data stack. Is the modern data stack dying?