Data News — Week 24.11

Christophe Blefari

Cognition AI introduced Devin, which it presents as the first AI software engineer. Devin can, unassisted, perform software engineering tasks such as fixing GitHub issues (13% success rate; the previous best was ~5%), applying to jobs on Upwork, and training and fine-tuning its own models. Pandera, a data validation library for dataframes, now supports Polars.


Data Engineering Weekly #162

Data Engineering Weekly

Pradheep Arjunan shared insights on AZ's journey from on-prem to cloud data warehouses. Google: Croissant, a metadata format for ML-ready datasets. Google Research introduced Croissant, a new metadata format designed to make datasets ML-ready by standardizing the format, which makes them easier to use in machine learning projects.


An Engineering Guide to Data Quality - A Data Contract Perspective - Part 2

Data Engineering Weekly

I won’t bore you with the importance of data quality in this blog. Instead, let’s examine the current data pipeline architecture and ask why data quality is expensive. Rather than looking at the implementation of data quality frameworks, let’s examine the architectural patterns of the data pipeline.


Build A Common Understanding Of Your Data Reliability Rules With Soda Core and Soda Checks Language

Data Engineering Podcast

Summary: Regardless of how data is being used, it is critical that the information is trusted. The practice of data reliability engineering has gained momentum recently to address that need. Atlan is the metadata hub for your data ecosystem.


Data Engineering Weekly #105

Data Engineering Weekly

Data Engineering Weekly is brought to you by RudderStack. RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


97 things every data engineer should know

Grouparoo

It was a fun experience, and I think we made a good choice by picking 97 Things Every Data Engineer Should Know. The book provided a nice overview of the breadth of topics relevant to data engineering, including data warehouses/lakes, pipelines, metadata, security, compliance, quality, and working with other teams.


DataOps Tools: Key Capabilities & 5 Tools You Must Know About

Databand.ai

Each type of tool plays a specific role in the DataOps process, helping organizations manage and optimize their data pipelines more effectively. Poor data quality can lead to incorrect or misleading insights, which can have significant consequences for an organization. In this article: Why Are DataOps Tools Important?