
Reflections On Designing A Data Platform From Scratch

Data Engineering Podcast

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. I'm your host, Tobias Macey, and today I'm sharing the approach that I'm taking while designing a data platform. Interview introduction: How did you get involved in the area of data management?

On-Premise vs Cloud: Where Does the Future of Data Storage Lie?

Monte Carlo

Regardless, the important thing to understand is that the modern data stack doesn't just allow you to store and process bigger data faster; it allows you to handle data fundamentally differently in order to accomplish new goals and extract different types of value. It's just a matter of picking a flavor.



How to get started with dbt

Christophe Blefari

This switch has been led by the modern data stack vision. In terms of paradigms, before 2012 we were doing ETL because storage was expensive, so it was a requirement to transform data before storing it (mainly in a data warehouse) in order to have the most optimised data for querying.


The Evolution of Table Formats

Monte Carlo

At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files. Table formats incorporate aspects like columns, rows, data types, and relationships, but can also include information about the structure of the data itself.
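To make the "metadata layer over data files" idea concrete, here is a minimal, hypothetical Python sketch: the class names (`TableMetadata`, `ColumnDef`) and file paths are illustrative assumptions, not the API of any real table format like Iceberg, Delta, or Hudi.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnDef:
    # Logical schema information the table format tracks.
    name: str
    dtype: str

@dataclass
class TableMetadata:
    # A table is defined by its schema plus the underlying data files
    # it organizes; the metadata layer interprets those files as one table.
    columns: list
    data_files: list = field(default_factory=list)

    def add_file(self, path: str, row_count: int) -> None:
        # Per-file statistics let engines plan queries without opening files.
        self.data_files.append({"path": path, "rows": row_count})

    def total_rows(self) -> int:
        return sum(f["rows"] for f in self.data_files)

events = TableMetadata(columns=[ColumnDef("user_id", "bigint"),
                                ColumnDef("event_ts", "timestamp")])
events.add_file("s3://bucket/events/part-000.parquet", 1_000)
events.add_file("s3://bucket/events/part-001.parquet", 2_500)
print(events.total_rows())  # 3500
```

Real table formats track far more (snapshots, partition specs, column-level stats), but the shape is the same: metadata that defines and organizes many files as a single logical table.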


What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

A data lake is a centralized repository that provides extensive storage for raw, unfiltered data entering a company's data storage system. This data can be structured, semi-structured, or unstructured, and comes from various sources such as databases, IoT devices, log files, etc.


Data Engineering Weekly #164

Data Engineering Weekly

The logging engine for debugging AI workflow logs is an excellent system design study if you're interested. The APIs support emitting unstructured log lines along with typed metadata key-value pairs per line, and the extracted key-value pairs are written to the line's metadata.
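The described API shape (an unstructured message plus typed per-line metadata) can be sketched as follows; this is a hypothetical illustration, not the actual system's interface, and the names `Logger`, `emit`, and `query` are assumptions.

```python
class LogLine:
    """One log line: a free-form message plus typed metadata key-value pairs."""
    def __init__(self, message, **metadata):
        self.message = message
        self.metadata = metadata  # e.g. {"step": "retrieval", "latency_ms": 120}

class Logger:
    def __init__(self):
        self.lines = []

    def emit(self, message, **metadata):
        # Extracted key-value pairs are stored on the line's metadata.
        self.lines.append(LogLine(message, **metadata))

    def query(self, **filters):
        # Debugging a workflow means filtering lines by their typed metadata.
        return [line for line in self.lines
                if all(line.metadata.get(k) == v for k, v in filters.items())]

log = Logger()
log.emit("model inference started", step="inference", latency_ms=120)
log.emit("retrieval failed", step="retrieval", error=True)
print(log.query(step="retrieval")[0].message)  # retrieval failed
```

The design point is that structured metadata rides alongside unstructured text, so workflow debugging can filter on typed fields without parsing the message itself.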


Taking Charge of Tables: Introducing OpenHouse for Big Data Management

LinkedIn Engineering

Open source data lakehouse deployments are built on the foundations of compute engines (like Apache Spark, Trino, Apache Flink), distributed storage (HDFS, cloud blob stores), and metadata catalogs / table formats (like Apache Iceberg, Delta, Hudi, Apache Hive Metastore). Tables are governed according to agreed-upon company standards.