Data Warehouse, ETL Tools, Events and Metadata

From Big Data to Better Data: Ensuring Data Quality with Verity

Lyft Engineering

OCTOBER 3, 2023

In this post we will define data quality at a high level and explore our motivation to achieve better data quality. We will then introduce our in-house product, Verity, and showcase how it serves as a central platform for ensuring data quality in our Hive Data Warehouse. What and Where is Data Quality?

Big Data

Big Data, Metadata, Data Warehouse, Data

Mastering the Art of ETL on AWS for Data Management

ProjectPro

FEBRUARY 16, 2023

With so much riding on the efficiency of ETL processes for data engineering teams, it is essential to take a deep dive into the complex world of ETL on AWS to take your data management to the next level. ETL has typically been carried out utilizing data warehouses and on-premise ETL tools.

AWS

AWS, Data Management, ETL Tools, Management

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

The Rise of the Data Engineer

Maxime Beauchemin

JANUARY 20, 2017

The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own. Let’s highlight the fact that the abstractions exposed by traditional ETL tools are off-target.

Data Engineering

Data Engineering, Data Engineer, Engineering, ETL Tools

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

Instead of relying on traditional hierarchical structures and predefined schemas, as in the case of data warehouses, a data lake utilizes a flat architecture. This structure is made efficient by data engineering practices that include object storage. Data warehouse vs. data lake in a nutshell.

Data Lake

Data Lake, Architecture, IT, Amazon Web Services

Demystifying event streams: Transforming events into tables with dbt

dbt Developer Hub

NOVEMBER 3, 2022

Let’s discuss how to convert events from an event-driven microservice architecture into relational tables in a warehouse like Snowflake. We use Snowflake as our data warehouse where we build dashboards both for internal use and for customers. However, BI tools and dbt models aren’t typically written this way.
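As a rough illustration of the idea in this excerpt, the sketch below groups events by type and flattens each payload into a row, giving one table-like list per event type. It is plain Python, not the dbt approach the linked post describes, and the event shape (`type`, `id`, `payload`) and event names are hypothetical.

```python
from collections import defaultdict

# Hypothetical events, shaped the way a microservice might emit them.
events = [
    {"type": "order_created", "id": "o1", "payload": {"customer": "c1", "total": 30}},
    {"type": "order_shipped", "id": "o1", "payload": {"carrier": "ups"}},
    {"type": "order_created", "id": "o2", "payload": {"customer": "c2", "total": 12}},
]


def events_to_tables(events):
    """Group events by type and flatten each payload into one row."""
    tables = defaultdict(list)
    for event in events:
        # Each row carries the entity id plus the payload fields as columns.
        row = {"id": event["id"], **event["payload"]}
        tables[event["type"]].append(row)
    return dict(tables)


tables = events_to_tables(events)
```

Each key of `tables` plays the role of a relational table whose columns come from that event type's payload.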

Kafka

Kafka, ETL Tools, BI, Database

Sqoop vs. Flume Battle of the Hadoop ETL tools

ProjectPro

OCTOBER 28, 2015

Some of the common challenges with data ingestion in Hadoop are parallel processing, data quality, machine data arriving at a scale of several gigabytes per minute, multiple-source ingestion, real-time ingestion, and scalability. Sqoop can also be used for exporting data from HDFS into an RDBMS.

ETL Tools

ETL Tools, Hadoop, Relational Database, Unstructured Data

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

OCTOBER 21, 2022

The technology was written in Java and Scala at LinkedIn to solve the internal problem of managing continuous data flows. This scenario involves three main characters: publishers, subscribers, and a message or event broker. A subscriber is a receiving program such as an end-user app or business intelligence tool.
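The three roles named in this excerpt can be sketched with a toy in-memory broker. This is only an illustration of the publish/subscribe pattern, not Kafka's API; the `Broker` class, the topic names, and the message fields are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory event broker: routes messages from publishers to subscribers."""

    def __init__(self):
        # Maps a topic name to the list of subscriber callbacks for that topic.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of this topic.
        for callback in self._subscribers[topic]:
            callback(message)


broker = Broker()
received = []

# The subscriber here is just a list's append method standing in for an app or BI tool.
broker.subscribe("page_views", received.append)

# The publisher sends events without knowing who consumes them.
broker.publish("page_views", {"user": "u1", "page": "/home"})
broker.publish("clicks", {"user": "u1"})  # no subscribers, so nothing is delivered
```

The publisher and subscriber never reference each other directly; only the broker knows both, which is the decoupling the pattern is meant to provide.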

Kafka

Kafka, Hadoop, ETL Tools, Big Data

IBM InfoSphere vs Oracle Data Integrator vs Xplenty and Others: Data Integration Tools Compared

AltexSoft

OCTOBER 8, 2021

Data integration defines the process of collecting data from a number of disparate source systems and presenting it in a unified form within a centralized location like a data warehouse. So, why is data integration such a big deal? Connections to both data warehouses and data lakes are possible in any case.

Data Integration

Data Integration, Hadoop, Data Warehouse, Data Lake

What is ETL Pipeline? Process, Considerations, and Examples

ProjectPro

NOVEMBER 30, 2021

That's where the ETL (Extract, Transform, and Load) pipeline comes into the picture! First, we will start by understanding data pipelines with a straightforward layman's example. Now let us try to understand ETL data pipelines in more detail.

Process

Process, Data Pipeline, Data Warehouse, AWS

Data Vault on Snowflake: Feature Engineering and Business Vault

Snowflake

MARCH 30, 2023

A 2016 data science report from data enrichment platform CrowdFlower found that data scientists spend around 80% of their time in data preparation (collecting, cleaning, and organizing of data) before they can even begin to build machine learning (ML) models to deliver business value.

Engineering

Engineering, Raw Data, Data Science, Scala

Sqoop Interview Questions and Answers for 2023

ProjectPro

JUNE 23, 2016

Thus, this solution is not recommended in practice, and this is where Apache Sqoop comes to the rescue, allowing users to import data into HDFS. Apache Sqoop is a lifesaver for people facing challenges with moving data out of a data warehouse into the Hadoop environment. Data import in Sqoop is not event-driven.

Hadoop

Hadoop, MySQL, Relational Database, Java

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

Reduce ingest latency and complexity: Multiple point solutions were needed to move data from different data sources to downstream systems. The DevOps/app dev team wants to know how data flows between such entities and understand the key performance metrics (KPMs) of these entities.

Kafka

Kafka, Manufacturing, Data Lake, SQL

Data Engineering Digest
