Data Engineering Digest

Taking Charge of Tables: Introducing OpenHouse for Big Data Management

LinkedIn Engineering

JULY 19, 2023

Open source data lakehouse deployments are built on compute engines (such as Apache Spark, Trino, and Apache Flink), distributed storage (HDFS or cloud blob stores), and metadata catalogs and table formats (such as Apache Iceberg, Delta Lake, Apache Hudi, and Apache Hive Metastore).

Big Data

Big Data, Data Management, Management, Metadata

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Compute tasks might run on Apache Pig, Hive, Presto, or Spark. Data formats such as JSON, Apache Parquet, and Apache Avro are common in these environments. Data lakes can also support sophisticated non-SQL programming models through frameworks such as Apache Hadoop, Apache Spark, and PySpark.

Data Lake

Data Lake, Data Warehouse, Unstructured Data, Raw Data
