Remove Blog Remove Hadoop Remove Kafka Remove Metadata
article thumbnail

How to learn data engineering

Christophe Blefari

Hadoop initially led the way with Big Data and distributed computing on-premise to finally land on Modern Data Stack — in the cloud — with a data warehouse at the center. In order to understand today's data engineering I think that this is important to at least know Hadoop concepts and context and computer science basics.

article thumbnail

What’s New in CDP Private Cloud Base 7.1.7?

Cloudera

Apache Ozone enhancements deliver full High Availability providing customers with enterprise-grade object storage and compatibility with Hadoop Compatible File System and S3 API. . We expand on this feature later in this blog. Deep Dive 2: Atlas / Kafka integration. This will expose newly created Kafka topics to Atlas.

Cloud 96
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Deployment of Exabyte-Backed Big Data Components

LinkedIn Engineering

Co-authors: Arjun Mohnot , Jenchang Ho , Anthony Quigley , Xing Lin , Anil Alluri , Michael Kuchenbecker LinkedIn operates one of the world’s largest Apache Hadoop big data clusters. Historically, deploying code changes to Hadoop big data clusters has been complex.

article thumbnail

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

Cloudera

This blog post provides an overview of best practice for the design and deployment of clusters incorporating hardware and operating system configuration, along with guidance for networking and security as well as integration with existing enterprise infrastructure. Introduction and Rationale. Networking .

article thumbnail

Generating and Viewing Lineage through Apache Ozone

Cloudera

Using the Hadoop CLI. If you’re bringing your own, it’s as simple as creating the bucket in Ozone using the Hadoop CLI and putting the data you want there: hdfs dfs -mkdir ofs://ozone1/data/tpc/test. Then you can import Kafka lineage using the Atlas Kafka import tool provided with CDP. ozone sh bucket list /data.

Hadoop 104
article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.

article thumbnail

Data Engineering Annotated Monthly – May 2022

Big Data Tools

DataHub 0.8.36 – Metadata management is a big and complicated topic. On top of that, it’s a part of the Hadoop platform, which created additional work that we otherwise would not have had to do. DataHub is a completely independent product by LinkedIn, and the folks there definitely know what metadata is and how important it is.