Simon Späti

Data Orchestration Trends: The Shift From Data Pipelines to Data Products

Simon Späti

Data consumers, such as data analysts and business users, care mostly about the production of data assets. Data engineers, on the other hand, have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds? This article reviews open-source data orchestration tools (Airflow, Prefect, Dagster) and discusses how they introduce data assets as first-class objects.

Data Lake / Lakehouse Guide: Powered by Data Lake Table Formats (Delta Lake, Iceberg, Hudi)

Simon Späti

Image by Rachel Claire on Pexels.

Ever wanted or been asked to build an open-source Data Lake offloading data for analytics? Asked yourself what components and features that would include? Didn’t know the difference between a Data Lakehouse and a Data Warehouse? Or did you just want to govern your hundreds to thousands of files and have more database-like features, but didn’t know how?

The Rise of the Semantic Layer

Simon Späti

A semantic layer is something we use every day. We build dashboards with yearly and monthly aggregations. We design dimensions for drilling down reports by region, product, or whatever metrics we are interested in. What has changed is that we no longer use a single business intelligence tool; different teams use different visualizations (BI, notebooks, and embedded analytics).

Why Vim Is More than Just an Editor – Vim Language, Motions, and Modes Explained

Simon Späti

Throughout my time as a developer, I’ve used VS Code, Sublime, Notepad++, TextMate, and others. But at some point, shortcuts like cmd(+shift)+end and jumping from word to word with option+arrow keys were no longer fast enough. I was hitting my limits. Everything I was doing I did decently fast, but I didn’t get any faster. I’ve since learned that Vim is the only editor you keep getting faster with over time.

Modern Data Stack: The Struggle of Enterprise Adoption

Simon Späti

In Part I, The Open Data Stack Distilled into Four Core Tools, we discussed how to quickly set up a data stack, tackling end-to-end data analytics challenges. As a manager or developer working with data at a mid- to large-sized enterprise, you might ask why you aren’t using any of these tools. In this article, we dive into what mid- to large-sized companies are using instead, the struggle of setting up a Modern Data Stack (MDS) at enterprise scale, and the opportunities of a free-of-charge and

Data Modeling – The Unsung Hero of Data Engineering: An Introduction to Data Modeling (Part 1)

Simon Späti

Amidst the excitement and hype surrounding artificial intelligence, the significance of data engineering and its critical foundation—data modeling—can often be overlooked. This article is the first in a three-part series that will shine a spotlight on the fascinating world of data modeling, delving into its crucial importance within the broader context of data engineering.

Data Modeling – The Unsung Hero of Data Engineering: Modeling Approaches and Techniques (Part 2)

Simon Späti

In case you missed Part 1, An Introduction to Data Modeling, make sure to read it first. There we discussed the importance of data modeling in data engineering, its history, and the increasing complexity of data. We also touched upon the significance of understanding the data landscape, its challenges, and much more. As we delve deeper into this topic, Part 2 will focus on data modeling approaches and techniques.