Implementing Data Contracts in the Data Warehouse

Monte Carlo

In this article, Chad Sanderson, Head of Product, Data Platform, at Convoy and creator of Data Quality Camp, introduces a new place to apply data contracts: your data warehouse. As he puts it, “In the last couple of posts, I’ve focused on implementing data contracts in production services.”

Monte Carlo Announces Delta Lake, Unity Catalog Integrations To Bring End-to-End Data Observability to Databricks

Monte Carlo

Traditionally, data lakes held raw data in its native format and were known for their flexibility, speed, and open source ecosystem. By design, data was less structured with limited metadata and no ACID properties. Since then, Databricks has aggressively moved toward allowing users to add more structure to their data.

AWS Glue: Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

Another leading European company, Claranet, has adopted Glue to migrate its data loads from an existing on-premises solution to the cloud. AWS Glue, a popular data integration tool, enables data analytics users to quickly acquire, analyze, migrate, and integrate data from multiple sources.

How Does AWS Glue Work?

Top Data Catalog Tools

Monte Carlo

A data catalog is a constantly updated inventory of the universe of data assets within an organization. It uses metadata to build a picture of the data, of the relationships between data assets from diverse sources, and of the processing that takes place as data moves through systems.
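
The description above boils down to two pieces: an inventory of assets with their metadata, and a lineage graph recording which process moves data from one asset to another. Here is a minimal Python sketch of that idea; the `DataAsset` and `DataCatalog` classes and the table names are hypothetical and are not the API of any of the tools the article reviews.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One entry in the catalog's inventory (hypothetical structure)."""
    name: str
    source: str                                            # system the asset lives in
    columns: dict[str, str] = field(default_factory=dict)  # column name -> type


class DataCatalog:
    def __init__(self) -> None:
        self.assets: dict[str, DataAsset] = {}
        # Lineage edges: upstream asset -> [(downstream asset, process that moves the data)]
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def register(self, asset: DataAsset) -> None:
        # Re-registering overwrites the metadata, keeping the inventory current.
        self.assets[asset.name] = asset

    def add_lineage(self, upstream: str, downstream: str, process: str) -> None:
        for name in (upstream, downstream):
            if name not in self.assets:
                raise KeyError(f"unregistered asset: {name}")
        self._edges[upstream].append((downstream, process))

    def downstream_of(self, name: str) -> list[str]:
        """Every asset reachable from `name` via lineage, in breadth-first order."""
        seen = {name}
        order: list[str] = []
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for child, _process in self._edges.get(current, []):
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order


catalog = DataCatalog()
catalog.register(DataAsset("raw.orders", "postgres", {"order_id": "int", "amount": "numeric"}))
catalog.register(DataAsset("staging.orders", "warehouse"))
catalog.register(DataAsset("mart.revenue", "warehouse"))
catalog.add_lineage("raw.orders", "staging.orders", "nightly load")
catalog.add_lineage("staging.orders", "mart.revenue", "revenue model")
print(catalog.downstream_of("raw.orders"))  # ['staging.orders', 'mart.revenue']
```

Walking the lineage edges downstream is what lets a catalog answer questions such as "which reports are affected if this source table changes?"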

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

Before going into further detail on Delta Lake, we need to revisit the concept of a data lake, so let’s take a quick trip through history. The main player among the first data lakes was Hadoop, which combined a distributed file system (HDFS) with MapReduce, a processing paradigm built around minimal data movement and high parallelism.
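
The MapReduce paradigm can be simulated in a single Python process. The sketch below is a hypothetical word count, not Hadoop's own Java API: each input split is mapped independently, the emitted pairs are grouped by key (Hadoop's shuffle step), and a reducer sums each group. On a real cluster the mappers run on the nodes that already store each split, which is where the minimal data movement comes from.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    # Each mapper works on its own input split and emits (key, value) pairs.
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    # Group all values emitted for the same key, as Hadoop's shuffle/sort does.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    # Each reducer aggregates the values for the keys assigned to it.
    return {key: sum(values) for key, values in grouped.items()}


splits = ["the lake holds raw data", "the warehouse holds modeled data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["the"], counts["data"], counts["lake"])  # 2 2 1
```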

Monte Carlo + Databricks Doubles Mutual Customer Count—and We’re Just Getting Started

Monte Carlo

Over the last several years, Databricks has given users the ability to add more structure to the data inside their data lakes. Monte Carlo can automatically monitor for, and alert on, schema, volume, freshness, and distribution anomalies within the data lake environment.
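
Two of those checks, volume and freshness, are simple enough to sketch. The following is an illustrative assumption, not Monte Carlo's method: it flags a day's row count that lies more than three sample standard deviations from recent history, and flags a table whose last update is older than an agreed staleness window. The function names, thresholds, and row counts are hypothetical.

```python
import statistics
from datetime import datetime, timedelta, timezone
from typing import Optional


def volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it is more than z_threshold sample std devs from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # requires at least two data points
    if stdev == 0:
        return today != mean  # perfectly flat history: any change is unusual
    return abs(today - mean) / stdev > z_threshold


def freshness_anomaly(last_updated: datetime, max_staleness: timedelta,
                      now: Optional[datetime] = None) -> bool:
    """Flag a table whose most recent update is older than max_staleness."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_staleness


# Hypothetical daily row counts for one table over the past week.
history = [10_120, 9_980, 10_050, 10_200, 9_900, 10_010, 10_090]
print(volume_anomaly(history, today=10_070))  # False
print(volume_anomaly(history, today=2_300))   # True

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(freshness_anomaly(datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), timedelta(hours=24), now))    # False
print(freshness_anomaly(datetime(2023, 12, 31, 12, 0, tzinfo=timezone.utc), timedelta(hours=24), now))  # True
```

Schema and distribution monitoring follow the same pattern of comparing the current state against a baseline, but are left out here for brevity.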

11 Ways To Stop Data Anomalies Dead In Their Tracks

Monte Carlo

Otherwise, you may produce more data anomalies than you prevent.

Data Contracts

Image courtesy of Andrew Jones.

You can think of data contracts as circuit breakers, but for data schemas instead of the data itself.

Data SLAs

You can’t improve what you don’t measure.
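
The circuit-breaker analogy for data contracts can be made concrete with a small sketch. Assuming a hypothetical `ORDERS_CONTRACT` that maps column names to type names, the hypothetical `publish` function checks an incoming batch's schema against the contract and raises before loading if a contracted column is missing or has changed type. Only the schema is inspected, never the batch's values.

```python
from collections.abc import Callable


class ContractViolation(Exception):
    """Raised to 'trip the breaker' so a non-conforming batch never lands."""


# Hypothetical contract agreed between producer and consumers: column -> type name.
ORDERS_CONTRACT = {
    "order_id": "int",
    "customer_id": "int",
    "amount": "float",
    "created_at": "timestamp",
}


def check_schema(contract: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the schema conforms."""
    problems = []
    for column, expected_type in contract.items():
        if column not in observed:
            problems.append(f"missing column: {column}")
        elif observed[column] != expected_type:
            problems.append(f"type change on {column}: {expected_type} -> {observed[column]}")
    return problems  # columns not named in the contract are ignored


def publish(batch_schema: dict[str, str], load: Callable[[], None]) -> None:
    """Check the schema first; call load() only if the contract holds."""
    problems = check_schema(ORDERS_CONTRACT, batch_schema)
    if problems:
        raise ContractViolation("; ".join(problems))
    load()


loaded: list[str] = []
ok_schema = dict(ORDERS_CONTRACT, coupon_code="str")  # additive change
publish(ok_schema, lambda: loaded.append("batch 1"))

bad_schema = {"order_id": "int", "customer_id": "str", "amount": "float"}
try:
    publish(bad_schema, lambda: loaded.append("batch 2"))
except ContractViolation as err:
    print(err)  # type change on customer_id: int -> str; missing column: created_at

print(loaded)  # ['batch 1']
```

In this sketch extra columns pass the check, on the assumption that additive changes do not break consumers; a stricter contract could reject them as well.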
