A Data Lake, You Call It? It’s a Data Swamp
KDnuggets
FEBRUARY 5, 2024
How and why the data lake architecture often fails to meet its promises. And how better governance helps mitigate such challenges.
AltexSoft
AUGUST 29, 2023
In 2010, a transformative concept took root in the realm of data storage and analytics: the data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?
KDnuggets
OCTOBER 30, 2023
A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.
Data Engineering Podcast
AUGUST 3, 2021
Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data there has been a struggle to merge fast, incremental updates with large, historical analysis.
Monte Carlo
JANUARY 5, 2024
You know what they always say: data lakehouse architecture is like an onion. …ok, Data lakehouse architecture combines the benefits of data warehouses and data lakes, bringing together the structure and performance of a data warehouse with the flexibility of a data lake.
Data Engineering Podcast
MAY 15, 2022
Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional layers of difficulty. Can you describe your current platform architecture?
Monte Carlo
APRIL 24, 2023
Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in their rawest state. Traditionally, after being stored in a data lake, raw data was often moved to various destinations like a data warehouse for further processing, analysis, and consumption.
DareData
JULY 5, 2023
Learn how we build data lake infrastructures and help organizations all around the world achieve their data goals. In today's data-driven world, organizations are faced with the challenge of managing and processing large volumes of data efficiently.
Cloudyard
DECEMBER 13, 2022
In this post we discuss how integrating the AWS S3 service with Snowflake can serve as a data lake in current organizations, and how a customer migrated an on-premises EDW to Snowflake to leverage Snowflake's data lake capabilities.
Snowflake
SEPTEMBER 14, 2023
More than 50% of data leaders recently surveyed by BCG said the complexity of their data architecture is a significant pain point in their enterprise. “As a result,” says BCG, “many companies find themselves at a tipping point, at risk of drowning in a deluge of data, overburdened with complexity and costs.”
Data Engineering Podcast
JULY 22, 2019
Summary The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.
François Nguyen
MARCH 22, 2021
With this third platform generation, you have more real-time data analytics and a cost reduction, because it is easier to manage this infrastructure in the cloud thanks to managed services. The mindset shifts from "we have to patch the server with the latest version and run the tests" to "we are data teams". The list of things to automate is not short.
phData: Data Engineering
APRIL 4, 2023
Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into the modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with Apache Iceberg data lake format.
Acceldata
DECEMBER 19, 2022
Avoid these three data pitfalls when attempting to modernize your data architecture with a data lake or data warehouse.
Precisely
JUNE 6, 2023
At Precisely’s Trust ’23 conference, Chief Operating Officer Eric Yau hosted an expert panel discussion on modern data architectures. The group kicked off the session by exchanging ideas about what it means to have a modern data architecture. Watch the full Modern Data Architectures session and learn more.
Data Engineering Podcast
NOVEMBER 11, 2018
Summary A data lake can be a highly valuable resource, as long as it is well built and well managed. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.
Data Engineering Podcast
SEPTEMBER 7, 2020
In this episode he explains how it is designed to allow for querying and combining data where it resides, the use cases that such an architecture unlocks, and the innovative ways that it is being employed at companies across the world.
Data Engineering Podcast
AUGUST 27, 2021
By making the software be the owner of the data that it generates, we have to go through the trouble of extracting the information to then be used elsewhere. The team at Cinchy are working to bring about a new paradigm of software architecture that puts the data as the central element. What is it used for? How does that work?
Monte Carlo
FEBRUARY 21, 2023
Despite its prevalence, data can be messy, siloed, ungovernable, and inaccessible—especially to the non-technical employees who rely on it. Enter data fabric: a data management architecture designed to serve the needs of the business, not just those of data engineers. Table of Contents What is a data fabric?
Rockset
APRIL 26, 2023
We’ve noticed many common patterns across streaming data architectures and we’ll be sharing a blueprint for three of the most popular: anomaly detection, IoT, and recommendations. The majority of anomaly detectors require streaming data, real-time data and historical data in order to generate inferences.
Databand.ai
AUGUST 30, 2023
DataOps Architecture: 5 Key Components and How to Get Started, by Ryan Yackel. What Is DataOps Architecture? DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. As a result, they can be slow, inefficient, and prone to errors.
Hevo
FEBRUARY 2, 2024
Data-driven organizations are searching for different storage solutions to manage the latency, volume, and resilience of big data and analytics. Initially, businesses used existing data lakes and warehouses in their tech stack to make the most out of data assets.
ProjectPro
DECEMBER 7, 2021
Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. As data is expanding exponentially, organizations struggle to harness digital information's power for different business use cases. What is a Big Data Pipeline?
Scott Logic
APRIL 18, 2024
In this episode, Oliver Cronk, Andrew Carr and David Hope talk about the ever-changing world of data, with conversations moving from data warehouse to data lake, and data mesh to data fabric.
Data Engineering Podcast
MAY 22, 2022
Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Signup for the SaaS product at dataengineeringpodcast.com/acryl RudderStack helps you build a customer data platform on your warehouse or data lake. Stop struggling to speed up your data lake.
Christophe Blefari
JANUARY 20, 2024
You'll be seen as the most technical person on a data team and you'll need to help your team with "low-level" stuff. You'll also be asked to put a data infrastructure in place. That means a data warehouse, a data lake or other concepts starting with data.
Cloudera
JUNE 18, 2022
Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.
Data Engineering Podcast
OCTOBER 16, 2022
Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.
Towards Data Science
JANUARY 16, 2024
My personal take on justifying the existence of Data Mesh. A senior stakeholder at one of my projects mentioned that they wanted to decentralise their data platform architecture and democratise data across the organisation. When I heard the words ‘decentralised data architecture’, I was left utterly confused at first!
ProjectPro
JULY 22, 2016
For the same cost, organizations can now store 50 times as much data in a Hadoop data lake as in a data warehouse. The data lake is gaining momentum across various organizations, and everyone wants to know how to implement a data lake and why.
Knowledge Hut
MARCH 28, 2024
Role Level: Intermediate. Responsibilities: Design and develop data pipelines to ingest, process, and transform data. Implemented and managed data storage solutions using Azure services like Azure SQL Database, Azure Data Lake Storage, and Azure Cosmos DB.
Data Engineering Weekly
MARCH 3, 2024
The migration enhanced data quality, lineage visibility, performance improvements, cost reductions, and better reliability and scalability, setting a robust foundation for future expansions and onboarding.
Data Engineering Podcast
JANUARY 28, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
Christophe Blefari
FEBRUARY 24, 2023
When it comes to data storage, the real-time ecosystem has also changed a lot in the last few years, and a lot of tooling has come out to ease the burden of managing Kafka clusters. Materialize, a real-time platform, detailed their architecture. Data Economy 💰 Qbeast raises €2.5m
Ascend.io
AUGUST 31, 2023
In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts. Let’s take a closer look.
Data Engineering Podcast
AUGUST 20, 2021
Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage. Can you describe what Cuelake is and the story behind it?
Snowflake
DECEMBER 4, 2023
Apache Iceberg’s ecosystem of diverse adopters, contributors and commercial support continues to grow, establishing itself as the industry standard table format for an open data lakehouse architecture. Are you using Snowflake on AWS and already using Glue Data Catalog for your data lake?
Data Engineering Weekly
SEPTEMBER 9, 2023
Full Agenda: [link] Vinoth Chandar - Founder & CEO of One House: Vinoth will be talking about the evolution of the Data Engineering stack from Data Warehouse → Data Lake → Lake House architecture and how the Lake House architecture will shape the data engineering landscape in the future.
Data Engineering Weekly
DECEMBER 25, 2023
Integrating AI into data workflows is not just a trend but a paradigm shift, making data processes more efficient and intelligent. Lake House Architectures: The New Frontier Lakehouse architectures have been at the forefront of data engineering discussions this year.
Cloudera
MARCH 17, 2023
As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.
Monte Carlo
MARCH 4, 2024
Made For The Lakehouse: Iceberg and Delta There was a point in time not too long ago that data lake vs. data warehouse was a heated and important debate. Today, these concepts have largely merged into a lakehouse architecture. Most major data cloud providers support both use cases.
Cloudera
SEPTEMBER 15, 2022
Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. Luke: What is a modern data platform?