Data Workflow, Google Cloud and Metadata

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR After setting up and organizing the teams, we describe four topics that make data mesh a reality. How can the data domains interoperate with one another? How do we govern all these data products and domains? We illustrate the answers with our technical choices and the services we use on Google Cloud Platform.

Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

Trending Sources

Data Engineering Zoomcamp – Data Ingestion (Week 2)

Hepta Analytics

FEBRUARY 14, 2022

Disadvantages of a data lake: it can easily become a data swamp; the data has no versioning; without versioning, the same data stored with incompatible schemas becomes a problem; it has no associated metadata; and the data is difficult to join. A data warehouse, by contrast, stores processed, mostly structured data.
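The versioning problem in this list can be illustrated with a toy schema registry. This is a minimal sketch, not taken from the Zoomcamp material: the `SchemaRegistry` class, its compatibility rule (existing columns must keep their names and types, new columns may be added), and the representation of a schema as a column-to-type mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical representation: a schema maps column name -> type name.
Schema = dict[str, str]


@dataclass
class SchemaRegistry:
    """Toy, in-memory registry that keeps every accepted schema version (illustrative only)."""

    versions: list[Schema] = field(default_factory=list)

    def is_compatible(self, new: Schema) -> bool:
        # Assumed rule: every column of the latest version must still exist
        # with the same type; adding new columns is allowed.
        if not self.versions:
            return True
        latest = self.versions[-1]
        return all(new.get(col) == typ for col, typ in latest.items())

    def register(self, new: Schema) -> int:
        """Store `new` as the next version and return its 1-based version number."""
        if not self.is_compatible(new):
            raise ValueError(f"incompatible with version {len(self.versions)}")
        self.versions.append(dict(new))
        return len(self.versions)


registry = SchemaRegistry()
print(registry.register({"id": "int", "name": "str"}))                  # 1
print(registry.register({"id": "int", "name": "str", "email": "str"}))  # 2
print(registry.is_compatible({"id": "str", "name": "str"}))             # False
```

Rejecting an incompatible schema at write time is one way to keep the same dataset from silently splitting into incompatible shapes; real table formats and schema registries define their compatibility rules in far more detail.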

The Good and the Bad of Apache Airflow Pipeline Orchestration

AltexSoft

NOVEMBER 7, 2022

DevOps tasks, for example creating scheduled backups and restoring data from them. Airflow is especially useful for orchestrating Big Data workflows. It is not a data processing tool itself but rather an instrument for managing the multiple components of data processing. Metadata database.
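To illustrate the difference between orchestrating and processing, here is a minimal Python sketch of what an orchestrator does: run tasks in dependency order and record each task's state. It does not use Airflow's API; `run_dag`, the task names, and the in-memory dict standing in for the metadata database are hypothetical, and the state names are only loosely modeled on Airflow's task states.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(tasks: dict[str, Callable[[], None]],
            deps: dict[str, set[str]]) -> dict[str, str]:
    """Run each task after its upstream tasks and return a task -> state mapping.

    Hypothetical sketch, not Airflow's API: the returned dict stands in
    for an orchestrator's metadata database. Assumes every dependency
    name is also a key in `tasks`.
    """
    # Include every task, even ones with no upstream dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    state: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        # Skip a task if any of its upstream tasks did not succeed.
        if any(state.get(up) != "success" for up in graph[name]):
            state[name] = "upstream_failed"
            continue
        try:
            tasks[name]()  # the task itself does the actual data work
            state[name] = "success"
        except Exception:
            state[name] = "failed"
    return state


steps = []
tasks = {
    "extract": lambda: steps.append("extract"),
    "transform": lambda: steps.append("transform"),
    "load": lambda: steps.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
print(run_dag(tasks, deps))  # {'extract': 'success', 'transform': 'success', 'load': 'success'}
print(steps)                 # ['extract', 'transform', 'load']
```

The tasks do the work; `run_dag` only decides when each one runs and records the outcome, which mirrors the point that Airflow manages the components of data processing rather than processing data itself.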

Big Data (Quality), Small Data Team: How Prefect Saved 20 Hours Per Week with Data Observability

Monte Carlo

SEPTEMBER 20, 2022

Here’s how Prefect, a Series B startup and the creator of the popular data orchestration tool, harnessed the power of data observability to preserve headcount, improve data quality, and reduce time to detection and resolution for data incidents. This left Dylan’s team with a gap to fill.

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

Apache Spark: labeled a unified analytics engine for large-scale data processing, this open-source solution is used by many for streaming use cases, often in conjunction with Databricks. Data orchestration. Airflow: the most common data orchestrator used by data teams.

The Good and the Bad of the Elasticsearch Search and Analytics Engine

AltexSoft

SEPTEMBER 21, 2023

Accessible via a unified API, these new features enhance search relevance and are available on Elastic Cloud. The Elastic Stack: Elasticsearch is integral to analytics stacks, working seamlessly with other tools developed by Elastic to manage the entire data workflow, from ingestion to visualization.

The Top Data Strategy Influencers and Content Creators on LinkedIn

Databand.ai

DECEMBER 29, 2022

In his current role as Senior Director of Product Management at Google, he focuses on BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Dataprep, Cloud Pub/Sub, and Cloud Composer. She also posts frequently on LinkedIn about data analytics, data strategy, data governance, and data engineering.

Data Engineering Digest
