An Engineering Guide to Data Quality - A Data Contract Perspective - Part 2

Data Engineering Weekly

In the first part of this series, we talked about design patterns for data creation and the pros & cons of each system from the data contract perspective. I won’t bore you with the importance of data quality in this blog. Why is Data Quality Expensive? In the Iceberg case, it is a simple two-step config change.

Data Engineering Weekly #123

Data Engineering Weekly

Uber: Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi. Uber writes a comprehensive guide on running incremental ETL using Apache Hudi. The blog discusses implementing Type-2 SCD modeling and strategies to generate surrogate keys and bridge tables to handle many-to-many relationships.

Is dbt a Good Tool for Implementing Data Models?

phData: Data Engineering

Data engineers who are more comfortable and familiar with Apache Spark may favor using Snowpark. A data engineer who needs more control over the orchestration of their data pipelines and models may instead run queries from Apache Airflow.

The Good and the Bad of Apache Airflow Pipeline Orchestration

AltexSoft

But apparently, things were much more difficult before Apache Airflow appeared. This article covers Airflow’s pros and gives a clue why, despite all its virtues, it’s not a silver bullet.

Data Engineering Weekly #112

Data Engineering Weekly

The author writes an exciting blog, Modern data stack in a Box!! Data Engineering Central: Why is everyone trying to kill Airflow? Airflow is probably one of the Top 5 breakthrough data technologies in the last ten years. Looking at the test results, the Polars implementation performs much better than Apache Spark.

Data Engineering Weekly #115

Data Engineering Weekly

Editor’s Note: Update on our blog series. One of the promises I made toward the end of 2022 was to publish more of my thoughts and industry observations on data engineering trends. Data Catalog - A broken promise: a classic blog triggers a few conversations about Data Catalog and its future. Two key lessons out of the blog.

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

These are easier to solve as the pros and cons are much simpler to calculate. Now part of the Apache Foundation, it was originally developed by CollabNet, Inc. Challenges with Source Control Management: While the pros heavily outweigh the cons, it is important to talk about the challenges associated with version control.
