article thumbnail

How to get started with dbt

Christophe Blefari

This switch has been lead by modern data stack vision. In terms of paradigms before 2012 we were doing ETL because storage was expensive, so it became a requirement to transform data before the data storage—mainly a data warehouse, to have the most optimised data for querying.

article thumbnail

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

That’s why it’s essential for teams to choose the right architecture for the storage layer of their data stack. But, the options for data storage are evolving quickly. So let’s get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

In 2010, a transformative concept took root in the realm of data storage and analytics — a data lake. The term was coined by James Dixon , Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. Raw data store section.

article thumbnail

Data Vault Architecture, Data Quality Challenges, And How To Solve Them

Monte Carlo

For those unfamiliar, data vault is a data warehouse modeling methodology created by Dan Linstedt (you may be familiar with Kimball or Imon models ) created in 2000 and updated in 2013. Data vault collects and organizes raw data as underlying structure to act as the source to feed Kimball or Inmon dimensional models.

article thumbnail

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.

Metadata 100
article thumbnail

Iceberg, Right Ahead! 7 Apache Iceberg Best Practices for Smooth Data Sailing

Monte Carlo

It’s designed to improve upon the performance and usability challenges of older data storage formats such as Apache Hive and Apache Parquet. For example, Monte Carlo can monitor Apache Iceberg tables for data quality incidents, where other data observability platforms may be more limited.

article thumbnail

Ready or Not. The Post Modern Data Stack Is Coming.

Monte Carlo

Zero-ETL What it is : A misnomer for one thing; the data pipeline still exists. Today, data is often generated by a service and written into a transactional database. An automatic pipeline is deployed which not only moves the raw data to the analytical data warehouse, but modifies it slightly along the way.