Remove Events Remove Hadoop Remove Kafka Remove Metadata
article thumbnail

Deployment of Exabyte-Backed Big Data Components

LinkedIn Engineering

Co-authors: Arjun Mohnot , Jenchang Ho , Anthony Quigley , Xing Lin , Anil Alluri , Michael Kuchenbecker LinkedIn operates one of the world’s largest Apache Hadoop big data clusters. Historically, deploying code changes to Hadoop big data clusters has been complex.

article thumbnail

What’s New in CDP Private Cloud Base 7.1.7?

Cloudera

Apache Ozone enhancements deliver full High Availability providing customers with enterprise-grade object storage and compatibility with Hadoop Compatible File System and S3 API. . Deep Dive 2: Atlas / Kafka integration. To enable the Atlas Hook, the Atlas service needs to be deployed on the Kafka cluster or the data context cluster.

Cloud 96
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.

Metadata 100
article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.

article thumbnail

Generating and Viewing Lineage through Apache Ozone

Cloudera

This integration mechanism does not provide a direct Atlas Hook or Atlas Bridge option to listen to the entity events in Ozone. Using the Hadoop CLI. If you’re bringing your own, it’s as simple as creating the bucket in Ozone using the Hadoop CLI and putting the data you want there: hdfs dfs -mkdir ofs://ozone1/data/tpc/test.

Hadoop 104
article thumbnail

Operational Database Security – Part 2

Cloudera

Access audits are mastered centrally in Apache Ranger which provides comprehensive non-repudiable audit log for every access event to every resource with rich access event metadata such as: IP. Cloudera’s platform can support piping of audit data to HDFS, Kafka, Syslog or to SIEM systems for long-term retention and archival.