Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Sketch of the end-to-end data pipeline. Apache Atlas as a fundamental part of SDX. Assets: Files. RDBMS Database Table.

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

This episode is brought to you by Starburst, an end-to-end data lakehouse platform for data engineers who are battling to build and scale high-quality data pipelines on the data lake. Want to see Starburst in action?

Data Testing Tools: Key Capabilities and 6 Tools You Should Know

Databand.ai

Data profiling tools: Profiling plays a crucial role in understanding your dataset’s structure and content. This is part of a series of articles about data quality. In this article: Why Are Data Testing Tools Important?

GPT-based data engineering accelerators

RandomTrees

GPT-Based Data Engineering Accelerators: Given below is a list of some GPT-based data engineering accelerators. 1. DataGPT: OpenAI developed DataGPT for performing data engineering tasks. DataGPT creates code for data pipelines and transformations. Its technology is based on the transformer architecture.

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

TL;DR: After setting up and organizing the teams, we describe four topics that make data mesh a reality. How do we build data products? How can we interoperate between the data domains? We want interoperability for any stored data, yet we also have to think about how to store the data in a specific node to optimize processing.

Who Is Responsible For Data Quality? 5 Different Answers From Real Data Teams

Monte Carlo

This post will focus on the most common team ownership models, including data engineering, data reliability engineering, analytics engineering, data quality analysts, and data governance teams. Table of Contents: Why is it important to answer who is responsible for data quality?

Data testing tools: Key capabilities you should know

Databand.ai

Data profiling tools: Profiling plays a crucial role in understanding your dataset’s structure and content. This is part of a series of articles about data quality. In this article: Why are data testing tools important?