Remove Blog Remove Cloud Storage Remove Hadoop Remove Metadata
article thumbnail

Migrate Hive data from CDH to CDP public cloud

Cloudera

Replication Manager can be used to migrate Apache Hive, Apache Impala, and HDFS objects from CDH clusters to CDP Public Cloud clusters. This blog post outlines detailed step by step instructions to perform Hive Replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. Hadoop SQL Policies overview.

Cloud 69
article thumbnail

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

It serves as a foundation for the entire data management strategy and consists of multiple components including data pipelines; , on-premises and cloud storage facilities – data lakes , data warehouses , data hubs ;, data streaming and Big Data analytics solutions ( Hadoop , Spark , Kafka , etc.);

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Discover and Explore Data Faster with the CDP DDE Template

Cloudera

YARN allows you to use various data processing engines for batch, interactive, and real-time stream processing of data stored in HDFS or cloud storage like S3 and ADLS. Coordinates distribution of data and metadata, also known as shards. For the examples presented in this blog, we assume you have a CDP account already.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.

article thumbnail

HDFS Data Encryption at Rest on Cloudera Data Platform

Cloudera

To prevent the management of these keys (which can run in the millions) from becoming a performance bottleneck, the encryption key itself is stored in the file metadata. Each file will have an EDEK which is stored in the file’s metadata. hdfs dfs -cat” on the file triggers a hadoop KMS API call to validate the “DECRYPT” access.

MySQL 69
article thumbnail

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

popular SQL and NoSQL database management systems including Oracle, SQL Server, Postgres, MySQL, MongoDB, Cassandra, and more; cloud storage services — Amazon S3, Azure Blob, and Google Cloud Storage; message brokers such as ActiveMQ, IBM MQ, and RabbitMQ; Big Data processing systems like Hadoop ; and.

Kafka 93