article thumbnail

Data News — Week 24.08

Christophe Blefari

JVM vs. SQL data engineer — There's a big discussion in the community about what real data engineering is. Is it Java/Scala or Python? Is it lake or warehouse? It's a sterile debate: both are useful and can serve different organisations with different service level for data users and stakeholders.

Data Lake 130
article thumbnail

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

Data Engineering Podcast

Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.

Data Lake 100
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

How to Become an Azure Data Engineer? 2023 Roadmap

Knowledge Hut

To provide end users with a variety of ready-made models, Azure Data engineers collaborate with Azure AI services built on top of Azure Cognitive Services APIs. You ought to be able to create a data model that is performance- and scalability-optimized. Automation : Automation is key for managing large datasets efficiently.

article thumbnail

How Software Bill of Materials change the dependency game

Zalando Engineering

Some teams use tools like dependabot , scala-steward that create pull requests in repositories when new library versions are available. We publish a curated data set containing dependency data from the SBOM for every application we deploy, based on its Container image. maven-versions-plugin , gradle-versions-plugin ).

Java 98
article thumbnail

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

Data Engineering Podcast

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control.

MongoDB 130
article thumbnail

Discover And De-Clutter Your Unstructured Data With Aparavi

Data Engineering Podcast

Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Signup for the SaaS product at dataengineeringpodcast.com/acryl RudderStack helps you build a customer data platform on your warehouse or data lake.

article thumbnail

Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast

Data Engineering Podcast

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control.