Surrogate keys in dbt: Integers or hashes?

dbt Developer Hub

Those who have been building data warehouses for a long time have undoubtedly encountered the challenge of building surrogate keys on their data models. How were surrogate keys managed in the past? How can you do this with dbt? Let’s dive in.
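The integers-or-hashes question in the title can be illustrated with a minimal sketch. It assumes a hash-based key is the MD5 digest of the row's business-key fields joined with a separator, similar in spirit to macros such as dbt_utils.generate_surrogate_key. The function name, the separator, and the null placeholder below are assumptions made for illustration, not taken from the article.

```python
import hashlib


def hash_surrogate_key(*fields, sep="-", null_placeholder="_null_"):
    """Build a deterministic surrogate key from business-key fields.

    Hypothetical helper: sep and null_placeholder are illustrative choices.
    """
    # Replace None with a placeholder so NULL and empty string hash differently.
    parts = [null_placeholder if f is None else str(f) for f in fields]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


def integer_surrogate_keys(business_keys):
    """Assign sequential integers in arrival order (1, 2, 3, ...)."""
    return {bk: i for i, bk in enumerate(business_keys, start=1)}


# The hash key depends only on the field values, not on load order.
key_a = hash_surrogate_key("customer_42", "2023-01-01")
key_b = hash_surrogate_key("customer_42", "2023-01-01")

# The integer key depends on the order in which rows arrive.
first_load = integer_surrogate_keys(["customer_42", "customer_7"])
second_load = integer_surrogate_keys(["customer_7", "customer_42"])
```

Here key_a and key_b are identical across runs and environments, while first_load and second_load assign different integers to the same business key. That order dependence is the usual argument against sequence-based keys when tables are rebuilt from scratch.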

Data Engineering Weekly #123

Data Engineering Weekly

The blog discusses implementing Type-2 SCD modeling, along with strategies for generating surrogate keys and bridge tables to handle many-to-many relationships. Netflix: Building a Media Understanding Platform for ML Innovations. Netflix recently wrote a series of blog posts about its media ML platform.

The Rise of the Data Engineer

Maxime Beauchemin

In larger environments, there tends to be specialization and the creation of a formal role to manage this workload, as the need for a data infrastructure team grows. It’s also fairly common for engineers to develop and manage their own job orchestrator/scheduler.

Dynamic Tables for Data Vault

Snowflake

We covered this in depth in a previous blog post. For that second bullet point, we infer the matching load dates and keys by using the LAG window function with the “IGNORE NULLS” option, like so: “, coalesce(s1.dv_loaddate …”. Return the matched load date and surrogate key from the adjacent satellite table for a snapshot date.
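The SQL in the excerpt is truncated, so the following is only a sketch of the general technique it names, not a reconstruction of the original query. It assumes the pattern is "take the current row's load date if present, otherwise the most recent non-null load date from an earlier row". The function names and sample dates are hypothetical.

```python
def lag_ignore_nulls(values):
    """Emulate LAG(value) IGNORE NULLS over an already ordered sequence.

    For each position, return the most recent non-null value strictly
    before it, or None if there is none.
    """
    result = []
    last_seen = None
    for v in values:
        result.append(last_seen)
        if v is not None:
            last_seen = v
    return result


def coalesce(*args):
    """Return the first argument that is not None, like SQL COALESCE."""
    for a in args:
        if a is not None:
            return a
    return None


# Hypothetical load dates per snapshot date; None means no satellite row matched.
load_dates = ["2023-01-01", None, None, "2023-01-04", None]

matched = [
    coalesce(current, previous)
    for current, previous in zip(load_dates, lag_ignore_nulls(load_dates))
]
```

With these sample values, every snapshot date without its own satellite row inherits the load date of the closest earlier row that has one.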

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Eliminating side effects, i.e., changes in state that do not depend on the function inputs, can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming. To put it simply, immutable data along with versioned logic are key to reproducibility.
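One way to picture a side-effect-free batch task is a pure function that derives a partition only from its inputs, paired with a load step that replaces that partition instead of mutating it. The sketch below rests on that assumption. The names PIPELINE_VERSION, build_partition, and load_partition are hypothetical, and a plain dictionary stands in for a partitioned table.

```python
PIPELINE_VERSION = "v1"  # hypothetical label for the versioned logic


def build_partition(source_rows, ds):
    """Pure function: the output depends only on source_rows and ds."""
    rows = [r for r in source_rows if r["ds"] == ds]
    return {
        "ds": ds,
        "logic_version": PIPELINE_VERSION,
        "row_count": len(rows),
        "total_cents": sum(r["amount_cents"] for r in rows),
    }


def load_partition(table, source_rows, ds):
    """Return a new table with partition ds overwritten; the input is not mutated."""
    updated = dict(table)
    updated[ds] = build_partition(source_rows, ds)
    return updated


# Hypothetical immutable source data.
source = [
    {"ds": "2023-01-01", "amount_cents": 500},
    {"ds": "2023-01-01", "amount_cents": 250},
    {"ds": "2023-01-02", "amount_cents": 100},
]

empty_table = {}
first_run = load_partition(empty_table, source, "2023-01-01")
second_run = load_partition(first_run, source, "2023-01-01")
```

Re-running the task for the same date yields the same table, so a backfill or retry is safe to repeat.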

A Deep Dive into the Power and Principles of Data Vault Modeling

RandomTrees

Auditability and traceability become easier here, which facilitates the use of business keys. The model can accommodate a wide variety of data formats and standards while keeping data profiling and cleansing straightforward. A dimension table usually contains multiple columns, one of which is a primary key column.