article thumbnail

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data there has been a struggle to merge fast, incremental updates with large, historical analysis.

Data Lake 130
article thumbnail

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional layers of difficulty. What are the elements that are still cumbersome or intractable?

Data Lake 100
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Add Version Control To Your Data Lake With LakeFS

Data Engineering Podcast

Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. What do you have planned for the future of the project?

Data Lake 100
article thumbnail

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

Summary The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.

Data Lake 100
article thumbnail

Strategies And Tactics For A Successful Master Data Management Implementation

Data Engineering Podcast

Summary The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Data Management (MDM) is the process of building consensus around what the information actually means in the context of the business and then shaping the data to match those semantics.

article thumbnail

How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Data Engineering Podcast

Summary A data lake can be a highly valuable resource, as long as it is well built and well managed. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.

Data Lake 100
article thumbnail

Top 7 AWS Cloud Practitioner Projects in 2023 [With Source Code]

Knowledge Hut

As an AWS Cloud Practitioner with experience in delivering multiple AWS cloud practitioner projects, I vividly recall assisting a startup to prove the scalability of their AI solution on AWS during one of my early projects. In this dynamic profession, there is no alternative to project-based learning.

AWS 52