
Data News — Week 23.42

Data News #23.42 — dbt Mesh and a new dbt alternative, a few fundraising, OpenAI crazy number, Meta banning Python ads, and more.

Christophe Blefari
Writing about dbt like a sheep (credits)

Hey, this week Coalesce, the dbt Labs annual conference, took place. For 3 days, people shared how they use dbt around the world. As usual, I'll write a takeaway post after binge-watching all the keynotes, but that's for next week. Still, dbt Labs' announcements mainly targeted dbt Cloud, with great features to drive adoption of the paid product.

They announced dbt Mesh, a product enabling cross-project dependencies for teams with multiple dbt projects. In addition, they released an Explorer view that lets you navigate through all your projects and see models, macros and more directly in one nice graph.

Does this mean that you have to use dbt Cloud to have a multi-project setup? No, you can activate multi-project collaboration with dbt Core. I've written a guide that helps you do it.

📺 On the content side I'll also present next week the Fancy Data Stack project at the Data Engineering And Machine Learning Summit 2023 organised by Seattle Data Guy. I'll be online on Thursday 26 at 5PM CEST. Add it to your calendar and sign up for the conference—the list of speakers is insane.

Data News is packed this week, so take time to enjoy it. Rainy days are coming, so you can see it as a gift 🎁.

Enough dbt, use lea 🥰

Max, the first Data News member 🤗, open-sourced carbonfact/lea this week. lea aims to be a minimalist alternative to dbt by fixing a few flaws that come with dbt. You can even see the traditional Jaffle shop example done in lea.

What are the main differences?

  • You configure lea with env variables.
  • A lea prepare command creates the database objects that need to exist (dataset, schema, etc.). Schemas are derived from the folder structure (with DuckDB).
  • lea understands the relationships between views, so you don't need a ref. Jinja templating is still supported though.
  • Tests are added directly in the SQL code, on the column they target. For instance, to test uniqueness on a column you add the @UNIQUE decorator. Singular tests are still supported.
  • lea generates documentation as Markdown in the workdir.
  • Other cool features: lea teardown deletes database objects, lea diff shows table schema differences, and you can write Python models as long as they return a DataFrame.
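To make the "no ref needed" idea concrete, here is a minimal sketch of how a tool could infer view dependencies straight from the SQL and derive a run order. This is not lea's actual implementation: the view names, the SQL, the regex and the helper functions are all made up for illustration, and a real tool would likely rely on a proper SQL parser rather than a regex.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical views: names and SQL are invented for this example.
VIEWS = {
    "staging.orders": "SELECT * FROM raw.orders",
    "staging.customers": "SELECT * FROM raw.customers",
    "core.customer_orders": """
        SELECT c.customer_id, COUNT(*) AS n_orders
        FROM staging.customers AS c
        JOIN staging.orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
    """,
}

# Naive assumption: a dependency is a schema.table identifier
# that directly follows FROM or JOIN.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE
)


def infer_dependencies(views):
    """Map each view to the other known views its SQL reads from."""
    deps = {}
    for name, sql in views.items():
        referenced = set(TABLE_REF.findall(sql))
        # Ignore tables that are not views we manage (e.g. raw sources).
        deps[name] = {ref for ref in referenced if ref in views and ref != name}
    return deps


def run_order(views):
    """Return the views sorted so that dependencies come first."""
    return list(TopologicalSorter(infer_dependencies(views)).static_order())


DEPENDENCIES = infer_dependencies(VIEWS)
ORDER = run_order(VIEWS)
```

With these three views, core.customer_orders ends up depending on both staging views and is therefore scheduled after them, without any explicit ref call in the SQL.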

Max also wrote a nice post about downstream data issues, the main problem driving the data contracts space: Sh*t flows downhill, but not at Carbonfact. You should read it because it gives another perspective on how to fix the problem.

Gen AI 🤖

  • Can you run it? There is a HuggingFace app that takes your specs and tells you what you need to run an LLM for inference or training.
  • 25 million Creative Commons image dataset released: Fondant, an open-source processing framework, released a dataset of publicly available images, collected by web crawling, along with their associated licenses.
  • New Vertex AI Feature Store: GCP Vertex AI is the place to do "serverless" AI. It's awesome to see this directly integrated within BigQuery, as it obviously brings simplicity. In public preview.

Fast News ⚡️

Pandas appreciation post (credits)

Engineering stuff

Data Economy 💰


See you next week ❤️.
