article thumbnail

Build an Open Data Lakehouse with Iceberg Tables, Now in Public Preview

Snowflake

With this public preview, those external catalog options are either “GLUE”, where Snowflake can retrieve table metadata snapshots from AWS Glue Data Catalog, or “OBJECT_STORE”, where Snowflake retrieves metadata snapshots directly from the specified cloud storage location. Now, Snowflake can make changes to the table.

article thumbnail

Cost Conscious Data Warehousing with Cloudera Data Platform

Cloudera

With the separation of compute and storage, CDW engines leverage newer techniques such as compute-only scaling and efficient caching of shared data. These techniques range from distributing concurrent query for overall throughput to metadata caching, data caching, and results caching. 2,300 / month for the cloud hardware costs.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Migrate Hive data from CDH to CDP public cloud

Cloudera

Replication Manager can be used to migrate Apache Hive, Apache Impala, and HDFS objects from CDH clusters to CDP Public Cloud clusters. This blog post outlines detailed step by step instructions to perform Hive Replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. This blog post is not a substitute for that.

Cloud 68
article thumbnail

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

Performance is one of the key, if not the most important deciding criterion, in choosing a Cloud Data Warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9

article thumbnail

Discover and Explore Data Faster with the CDP DDE Template

Cloudera

YARN allows you to use various data processing engines for batch, interactive, and real-time stream processing of data stored in HDFS or cloud storage like S3 and ADLS. Coordinates distribution of data and metadata, also known as shards. For the examples presented in this blog, we assume you have a CDP account already.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is. If you haven’t found your perfect metadata management system just yet, maybe it’s time to try DataHub!