Blog, Data Ingestion, Data Storage and Metadata

DataOps Architecture: 5 Key Components and How to Get Started

Databand.ai

AUGUST 30, 2023

DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows. As a result, they can be slow, inefficient, and prone to errors.

Architecture

Architecture Data Ingestion Data Governance Data Cleanse

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

With many data modeling methodologies and processes available, choosing the right approach can be daunting. This blog will guide you through the best data modeling methodologies and processes for your data lake, helping you make informed decisions and optimize your data management practices. What is a Data Lake?

Data Lake

Data Lake Process Metadata Data Warehouse

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

How to learn data engineering

Christophe Blefari

JANUARY 20, 2024

formats — This is a huge part of data engineering. Picking the right format for your data storage. The main difference between both is the fact that your computation resides in your warehouse with SQL rather than outside with a programming language loading data in memory. workflows (Airflow, Prefect, Dagster, etc.)

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Accelerate your Data Migration to Snowflake

RandomTrees

SEPTEMBER 6, 2020

The architecture has three layers. Database Storage: Snowflake reorganizes data into its internal optimized, compressed, columnar format and stores it in cloud storage. These data objects are accessible only through SQL queries run in Snowflake.

Cloud Storage

Cloud Storage Data Ingestion Data Cleanse Data Warehouse

Building Netflix’s Distributed Tracing Infrastructure

Netflix Tech

OCTOBER 19, 2020

In our previous blog post we introduced Edgar, our troubleshooting tool for streaming sessions. We could also get contextual information about the streaming session by joining relevant traces with account metadata and service logs. The high data ingestion rate eventually degraded both read and write operations.

Building

Building Transportation Metadata Java

Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Cloudera

AUGUST 31, 2021

While this “data tsunami” may pose a new set of challenges, it also opens up opportunities for a wide variety of high value business intelligence (BI) and other analytics use cases that most companies are eager to deploy. Traditional data warehouse vendors may have maturity in data storage, modeling, and high-performance analysis.

Data Warehouse

Data Warehouse Database-centric Metadata Cloud

Data Pipeline Observability: A Model For Data Engineers

Databand.ai

JUNE 28, 2023

Data observability works with your data pipeline by providing insights into how your data flows and is processed from start to end. Here is a more detailed explanation of how data observability works within the data pipeline: Data ingestion: Observability begins from the point where data is ingested into the pipeline.
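
As a rough illustration of observability starting at the point of ingestion, the sketch below records a few basic signals for each incoming batch: row count, missing required values, and freshness. The names `IngestionReport` and `observe_ingestion`, and the specific checks, are hypothetical and are not taken from the Databand.ai article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class IngestionReport:
    """Hypothetical summary of one ingested batch."""
    row_count: int
    missing_fields: int
    stale: bool
    issues: list = field(default_factory=list)


def observe_ingestion(rows, required_fields, newest_event_time, max_lag, now=None):
    """Collect basic health signals for a batch as it enters the pipeline."""
    now = now or datetime.now(timezone.utc)

    # Count required values that are absent or None across all rows.
    missing = sum(
        1 for row in rows for name in required_fields if row.get(name) is None
    )

    # The batch is stale if its newest event is older than the allowed lag.
    stale = (now - newest_event_time) > max_lag

    issues = []
    if not rows:
        issues.append("empty batch")
    if missing:
        issues.append(f"{missing} missing required values")
    if stale:
        issues.append("data older than allowed lag")

    return IngestionReport(len(rows), missing, stale, issues)


if __name__ == "__main__":
    now = datetime(2023, 6, 28, 12, 0, tzinfo=timezone.utc)
    batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
    report = observe_ingestion(
        batch, ["id", "amount"], now - timedelta(hours=3), timedelta(hours=1), now=now
    )
    print(report)
```

In practice these signals would be sent to a monitoring system instead of printed, and the thresholds would depend on the pipeline.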

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

Azure Data Engineer (DP-203) Certification Cost in 2023

Knowledge Hut

SEPTEMBER 29, 2023

Moreover, what benefits can you expect from a career in Azure Data Engineering? This blog aims to answer these questions, providing a straightforward and professional insight into the world of Azure Data Engineering. Join us on this journey through the exciting realm of Azure Data Engineering.

Certification

Certification Data Engineering Data Engineer Engineering

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

FEBRUARY 8, 2023

Do ETL and data integration activities seem complex to you? Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. Did you know the global big data market will likely reach $268.4 Businesses are leveraging big data now more than ever.

AWS

AWS Scala Metadata Data Lake

Costwiz: Saving cost for LinkedIn enterprise on Azure

LinkedIn Engineering

JULY 27, 2023

Costwiz provides a unified experience that helps leaders drive more accurate forecasting of Azure budgets at LinkedIn with resource ownership detection, accountability, expedited remedies, and holistic data visibility (via custom dashboards). ETL processes must determine where to pick up the next batch of data.
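
One common way for an ETL job to determine where to pick up the next batch is a persisted high-watermark. The sketch below is a minimal illustration of that general technique, not a description of how Costwiz works; the file format, the `updated_at` field, and the function names are all assumptions.

```python
import json
from pathlib import Path


def load_watermark(path):
    """Return the last processed timestamp, or None on the first run."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())["last_seen"]


def save_watermark(path, value):
    """Persist the timestamp of the last record that was processed."""
    Path(path).write_text(json.dumps({"last_seen": value}))


def next_batch(records, watermark, batch_size):
    """Select the oldest unprocessed records, up to batch_size.

    Assumes 'updated_at' holds ISO-8601 strings that are unique and
    sort chronologically; ties would need an extra tiebreaker column.
    """
    pending = [
        r for r in records if watermark is None or r["updated_at"] > watermark
    ]
    pending.sort(key=lambda r: r["updated_at"])
    return pending[:batch_size]
```

A run would call `load_watermark`, process the result of `next_batch`, and only then call `save_watermark` with the last record's `updated_at`, so that a failed run is retried from the same position.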

Metadata

Metadata Utilities Cloud Data Lake

New Snowflake Features Released in April 2023

Snowflake

MAY 22, 2023

Cross-Cloud Snowgrid: Account Replication expands replication beyond databases (general availability). Account Replication, now generally available, expands replication beyond databases to account metadata and integrations, making business continuity truly turnkey. Read our announcement blog post for more.

Healthcare

Healthcare Scala Medical Transportation

How Rockset Separates Compute and Storage Using RocksDB

Rockset

JUNE 6, 2023

Real-time systems such as Elasticsearch were designed to work off of directly attached storage to allow for fast access in the face of real-time updates. In this blog, we’ll walk through how Rockset provides compute-storage separation while making real-time data available to queries.

Metadata

Metadata Datasets Architecture Algorithm

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

NOVEMBER 15, 2021

There are thousands of open-source projects in action today. This blog will walk through the most popular and fascinating open source big data projects.

Big Data

Big Data Project Metadata Programming Language

100+ Big Data Interview Questions and Answers 2023

ProjectPro

JANUARY 31, 2023

If you're looking to break into the exciting field of big data or advance your big data career, being well-prepared for big data interview questions is essential. Get ready to expand your knowledge and take your big data career to the next level! But the concern is - how do you become a big data professional?

Big Data

Big Data Hadoop AWS Relational Database

Implementing the Netflix Media Database

Netflix Tech

DECEMBER 14, 2018

In the previous blog posts in this series, we introduced the Netflix Media DataBase (NMDB) and its salient “Media Document” data model. A fundamental requirement for any lasting data system is that it should scale along with the growth of the business applications it wishes to serve.

Media

Media Database Metadata Data Schemas

Data Vault on Snowflake: Feature Engineering and Business Vault

Snowflake

MARCH 30, 2023

Data Vault as a practice does not stipulate how you transform your data, only that you follow the same standards to populate business vault link and satellite tables as you would to populate raw vault link and satellite tables. Feature engineering: Data is transformed to support ML model training. ML workflow, ubr.to/3EJHjvm

Engineering

Engineering Raw Data Data Science Scala

Big Data Fabric Weaves Together Automation, Scalability, and Intelligence

Cloudera

JANUARY 22, 2019

Forrester describes Big Data Fabric as, “A unified, trusted, and comprehensive view of business data produced by orchestrating data sources automatically, intelligently, and securely, then preparing and processing them in big data platforms such as Hadoop and Apache Spark, data lakes, in-memory, and NoSQL.”

Big Data

Big Data NoSQL Data Lake Hadoop

50 PySpark Interview Questions and Answers For 2023

ProjectPro

NOVEMBER 22, 2021

StructType is a collection of StructField objects that determines column name, column data type, field nullability, and metadata. To define the columns, PySpark offers the StructField class (imported from pyspark.sql.types), which takes the column name (a string), the column type (a DataType), whether the column is nullable (a Boolean), and optional metadata (a dictionary).

Hadoop

Hadoop Python Datasets Metadata

The Ultimate Modern Data Stack Migration Guide

phData: Data Engineering

JULY 18, 2023

Zero Copy Cloning: Create multiple ‘copies’ of tables, schemas, or databases without actually copying the data. This noticeably saves time on copying and drastically reduces data storage costs. Data Source Tool: A multipurpose tool that collects, compares, analyzes, and acts on data source metadata and profile metrics.

Data Warehouse

Data Warehouse Pipeline-centric Government Data

Data Engineering Digest
