Taking Charge of Tables: Introducing OpenHouse for Big Data Management

LinkedIn Engineering

Open-source data lakehouse deployments are built on compute engines (such as Apache Spark, Trino, and Apache Flink), distributed storage (HDFS or cloud blob stores), and metadata catalogs and table formats (such as Apache Iceberg, Delta Lake, Apache Hudi, and Apache Hive Metastore).

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

Compute tasks might run on Apache Pig, Hive, Presto, or Spark. Commonly, you’ll find data formats such as JSON, Apache Parquet, and Apache Avro in these environments. Data lakes can also support sophisticated non-SQL programming models through frameworks such as Apache Hadoop, Apache Spark, and PySpark.