Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. You listen to this show to learn about all of the latest tools, patterns, and practices that power data engineering projects across every domain. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

Data Lake 130

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

Data Engineering Podcast

In this episode, Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team builds and contributes to. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services.

Data Lake 100

Data Cloud Cost Optimization With Bluesky Data

Data Engineering Podcast

Along with those benefits, they have also introduced a new consumption model that can lead to incredibly expensive bills at the end of the month. To ensure that you can explore and analyze your data without spending money on inefficient queries, Mingsheng Hong and Zheng Shao created Bluesky Data.

Cloud 100

Large Scale Ad Data Systems at Booking.com using the Public Cloud

Booking.com Engineering

From a technical perspective, solving this requires machine learning and operational infrastructure at scale, which involves processing performance feedback, assessing historical performance and, after running algorithms, communicating results back to a search engine provider. Booking Holdings, as a whole, spent $4.7 What are PPC’s Challenges?

Systems 52

Simplify Your Data Architecture With The Presto Distributed SQL Engine

Data Engineering Podcast

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. Summary Databases are limited in scope to the information that they directly contain.

SnowflakeDB: The Data Warehouse Built For The Cloud

Data Engineering Podcast

And for your machine learning workloads, they just announced dedicated CPU instances. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management.

Solving Data Discovery At Lyft

Data Engineering Podcast

And for your machine learning workloads, they just announced dedicated CPU instances. Summary Data is only valuable if you use it for something, and the first step is knowing that it is available. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai.