How to design a dbt model from scratch

A simple framework for building dbt models that actually get used.

Taylor Brownlow
Towards Data Science

--

When I was researching the Ultimate Guide to dbt, I was shocked by the lack of material around actually building models from scratch. Not the exact steps to take in the tool — that is all covered in innumerable blogs and tutorials. I mean how do you know the right design? How do you make sure your stakeholders will use that model? How can you make sure it will be trusted and understood?

When we deploy new models without taking these steps, there can be significant consequences:

  • we face a deluge of questions and follow-up requests from stakeholders
  • we get suggestions for code improvements from other Data Engineers or Analytics Engineers
  • we have to go back and add in the new features, make the improvements, and answer all the questions before our work is done

If we repeat this process over and over trust between data and business teams begins to deteriorate as each side gets progressively more exhausted from this feedback frenzy, something that can be very challenging to build back up.

This underscores the importance of thinking carefully about how we design models, not just on our own in dbt, but collectively with all of our stakeholders, to make sure the model is accurate and effective, and we don’t waste our time building each model 4–5 times before its useful.

This article is the result of research and experiments into how best to design and implement a dbt model. It won’t have any commands to execute in dbt, but it will talk through how to think about your model, and how to structure your workflow to make sure you’re not wasting your time.

Photo by Med Badr Chemmaoui on Unsplash

A different approach

Lucky for me, I’m not the first to think about this problem. Many other fields have faced similar challenges and have created their own frameworks and processes that I can leverage when thinking about how to approach data modeling. For example:

Agile principles discourage software engineers from a waterfall development approach which is antithetical to an environment of rapidly-changing requirements [1]. Instead, Agile embraces rapid iteration and acknowledges the competitive advantage of being able to respond to changing requirements quickly.

Design principles similarly acknowledge the need to be deliberate about how you work with multiple stakeholders on a design project [2]. The framework prioritizes people and encourages feedback and each stage of development so the best solution can be found as quickly as possible.

Image Source: This work by the Design Council is licensed under a CC BY 4.0 license.

Even the data modeling godfather Ralph Kimball nods to the importance of getting quality input from stakeholders early in a modeling process in his 4-step process to data modeling [3]. Step 1 of which is to go learn as much about the business process as you can before you even think about building a model.

However, the most influential source I found when thinking about this problem was the System Engineering Heuristics — a set of truisms about working on a complex problem with many stakeholders [4]:

  • Don’t assume that the original statement of the problem is necessarily the best, or even the right one.
  • In the early stages of a project, unknowns are a bigger issue than known problems.
  • Model before build, wherever possible.
  • Most of the serious mistakes are made early on.

These sources helped shape the following process for designing data models from scratch.

A data modeling design process

And so I wanted to build a process that was true to those principles, that was repeatable, and that would actually make sure my models were built well the first time.

Here’s what I came up with:

We’ll walk through each step in more detail below.

The following examples will show screenshots from count.co, a data canvas, where I am Head of Product. It’s important to note, however, this process is tool-agnostic. You can follow along with the example in the screenshots here.

Image by author. All 5 steps of the process in action. See the full canvas here.

Step 1: Discover

Objective: Understand the business process you are modeling.

Players: You, Business stakeholders

Activities:

  • Map out the business process
  • Identify what the stakeholder wants to do with the final table (e.g. which metrics they need to calculate, which filters they need to add, etc.)
  • Understand how they are doing this today (if they are). What is wrong with that solution?
  • Who else will use this? Any secondary stakeholders you should talk to as well?
  • Any other relevant business context? e.g. someone has a big presentation on this topic next week
Image by author. Click here to see the example.

Step 2: Design (& iterate!)

Objective: Map out possible approaches to building your model

Players: You

Activities:

  • Map out the final table
  • what granularity will you choose?
  • which columns will you include?
  • If there are multiple options for a final table design, map them out and get feedback from stakeholders before continuing
Image by author. Click here to see the example.

Step 3: Prototype (& iterate!)

Objective: Map out how you’re going to get to the agreed-upon final table. Include code and explanation.

Players: You, members of the data team, stakeholders

Activities:

  • map out each step of the model, including code and results at each stage
  • make sure another member of the data team reviews your code
  • go through logic with stakeholders to make sure it aligns with their expectations and context
  • validate results of the prototyped model
  • iterate until both business stakeholders and the data team understand and approve the approach and results
Image by author. Click here to see the example.

Step 4: Deploy

Objective: Deploy model in dbt

Players: You

Activities:

  • take the final prototyped code and deploy it into your dbt codebase
  • makes sure it passes all tests
Image by author. This is how you export the prototyped SQL to dbt-friendly syntax.

Step 5: Deliver

Objective: Let stakeholders know the table is now available and how to interact with the table

Players: You, Business Stakeholders

Activities:

  • create documentation (either in dbt or elsewhere)
  • send link to table and docs to stakeholders
  • [optional] Create sample analysis with the table so they have a starting point
  • [optional] Hold a quick session for anyone who wants an introduction to the new table
Image by author. Click here to see the full example.

Next Steps

Try this process the next time you start building a dbt model from scratch. It will be a big change for both you and your stakeholders, but it has proven to significantly decrease the time taken to deploy new models and improve the overall uptake of those models.

The simple act of bringing more people into your data modeling process, and demonstrating transparency helps to promote trust and deliver valuable data models quickly.

And if you do give this a spin, please drop me a comment and let me know how it went and any ideas for improvements! These things have to be constantly iterated after all…

Resources

[1] Agile Manifesto. (2001). Principles behind the Agile Manifesto. Retrieved July 1, 2023, from https://agilemanifesto.org/principles.html

[2] Design Council. (2004). Framework for Innovation. Retrieved July 1, 2023, from https://www.designcouncil.org.uk/our-resources/framework-for-innovation/

[3] Holistics. Kimball’s Dimensional Data Modeling. Retrieved July 1, 2023, from https://www.holistics.io/books/setup-analytics/kimball-s-dimensional-data-modeling/

[4] Peter Brook. “Systems Enegineering Heuristics.” in SEBoK Editorial Board. 2023. The Guide to the Systems Engineering Body of Knowledge (SEBoK), v. 2.8, R.J. Cloutier (Editor in Chief). Hoboken, NJ: The Trustees of the Stevens Institute of Technology. Accessed [DATE]. www.sebokwiki.org. BKCASE is managed and maintained by the Stevens Institute of Technology Systems Engineering Research Center, the International Council on Systems Engineering, and the Institute of Electrical and Electronics Engineers Systems Council.

--

--