Data Lake, Data Workflow, Database and SQL

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.

Non-relational Database

Non-relational Database Relational Database Database Designing

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. Your first 30 days are free!

Database

Database Data Lake High Quality Data Data Workflow

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?

SQL

SQL Data Lake High Quality Data Data Pipeline

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Summary: Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Data lakes are notoriously complex.

Database

Database Technology Data Lake High Quality Data

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Project

Project Data Lake High Quality Data Data Workflow

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Systems

Systems Designing Data Lake SQL

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Architecture

Architecture Data Lake High Quality Data SQL

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Summary: Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

PostgreSQL

PostgreSQL Data Lake High Quality Data SQL

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Designing

Designing Data Lake High Quality Data SQL

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Summary: Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Process

Data Process Process Data Lake High Quality Data

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Project

Project Data Lake High Quality Data SQL

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Data Lake

Data Lake High Quality Data SQL Architecture

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. You shouldn't have to throw away the database to build with fast-changing data. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Data Lake

Data Lake High Quality Data SQL Architecture

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

It was designed to support high-volume data exchange and compatibility across different system versions, which is essential for streaming architectures such as Apache Kafka. This development was crucial for enabling both batch and streaming data workflows in dynamic environments, ensuring consistency and durability in big data processing.

Data Lake

Data Lake Metadata Hadoop Data Governance

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing these constraints to your data workflows. Atlan is the metadata hub for your data ecosystem.

Metadata

Metadata Business Intelligence Data Lake BI

Top 10 Azure Data Engineer Job Opportunities in 2024 [Career Options]

Knowledge Hut

MARCH 28, 2024

Implemented and managed data storage solutions using Azure services like Azure SQL Database, Azure Data Lake Storage, and Azure Cosmos DB. Education & Skills Required: Proficiency in SQL, Python, or other programming languages. Familiarity with ETL tools and techniques for data integration.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Azure Data Engineer Job Description [Roles and Responsibilities]

Knowledge Hut

SEPTEMBER 25, 2023

Skill Requirements for Azure Data Engineer Job Description: Here are some important skill requirements that you may find in a job description for Azure Data Engineers: 1. Azure Data Engineers work with these and other solutions. They guarantee that the data is efficiently cleaned, converted, and loaded.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Data Engineering Podcast

JULY 3, 2022

In order to quickly identify if and how two data systems are out of sync, Gleb Mezhanskiy and Simon Eskildsen partnered to create the open source data-diff utility. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services.

Data Integration

Data Integration MongoDB Scala MySQL

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. With this 3rd platform generation, you get more real-time data analytics and a cost reduction, because this infrastructure is easier to manage in the cloud thanks to managed services.

Architecture

Architecture Technology Google Cloud Metadata

10 Essential Azure Data Engineer Skills to Improve in 2023

Knowledge Hut

NOVEMBER 17, 2023

They enhance data pipelines, transform data, and guarantee the accuracy, integrity, and compliance of the data. Their job entails Azure data engineer skills like using big data, databases, data lakes, and analytics to help firms make efficient data-driven decisions.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Top 20 Azure Data Engineering Projects in 2023 [Source Code]

Knowledge Hut

NOVEMBER 2, 2023

Azure Data Ingestion Pipeline: Create an Azure Data Factory data ingestion pipeline to extract data from a source (e.g., CSV, SQL Server), transform it, and load it into a target storage (e.g., Azure SQL Database, Azure Data Lake Storage). A strong understanding of data sourcing with SQL.

Data Engineering

Data Engineering Data Engineer Coding Project

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Sifflet also offers a 2-week free trial.

Data Pipeline

Data Pipeline Building MongoDB Scala

A Complete Guide to Azure Data Engineer Certification (DP-203)

Knowledge Hut

DECEMBER 28, 2023

This certification, often referred to as the Azure Data Engineer Associate certification, validates the competency of individuals in implementing Azure data solutions. It’s a testament to their ability to create scalable, efficient and secure data pipelines. What is the Azure Data Engineer Certification?

Certification

Certification Data Engineering Data Engineer Engineering

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

AltexSoft

MARCH 14, 2023

As the volume and complexity of data continue to grow, organizations seek faster, more efficient, and cost-effective ways to manage and analyze data. In recent years, cloud-based data warehouses have revolutionized data processing with their advanced massively parallel processing (MPP) capabilities and SQL support.

IT

IT Data Warehouse Data Governance Data Lake

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

5 Data pipeline architecture designs and their evolution: The Hadoop era, roughly 2011 to 2017, arguably ushered in big data processing capabilities to mainstream organizations. Data then, and even today for some organizations, was primarily hosted in on-premises databases with non-scalable storage.

Data Pipeline

Data Pipeline Architecture Data Lake Data Warehouse

Unleashing the Power of CDC With Snowflake

Workfall

JUNE 12, 2023

Change Data Capture (CDC) is a powerful technique that revolutionises data engineering by capturing and applying incremental changes to databases or data sources. It bridges gaps in data ecosystems, ensuring consistency and synchronisation across systems.

Telecommunication

Telecommunication Metadata Healthcare Finance

Data Transformations Using the Data Build Tool

Ripple Engineering

MAY 27, 2021

Our data analysts used to schedule queries on BigQuery for transformation workflows and test the transformed data manually. We did not have a single tool that would automate the building, compiling, testing and documenting of SQL models, so we had no way to scale the process. SQL Models: A model is a single .sql file.

Building

Building Raw Data SQL Data

Testing Data Applications is Hard

Meltano

FEBRUARY 22, 2023

Testing a data application is similar to testing any software application in many ways, just with a strong focus on testing data-related issues. But testing problems like failing data workflows, mismatches in data reconciliation after ETL, and data quality issues means that you’re not only testing the code but also the data itself.

Data

Data Transportation Data Workflow Data Lake

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

For example, teams working under the VP/Directors of Data Analytics may be tasked with accessing data, building databases, integrating data, and producing reports. Data scientists derive insights from data while business analysts work closely with and tend to the data needs of business units.

Business Analyst

Business Analyst Data Lake Consulting Data Analytics

Build vs Buy Data Pipeline Guide

Monte Carlo

APRIL 24, 2023

During data ingestion, raw data is extracted from sources and ferried to either a staging server for transformation or directly into the storage level of your data stack—usually in the form of a data warehouse or data lake. There are two primary types of raw data.

Data Pipeline

Data Pipeline Building Data Ingestion BI

Azure Data Engineer (DP-203) Certification Cost in 2023

Knowledge Hut

SEPTEMBER 29, 2023

You must have a solid grasp of ideas in parallel processing, data architecture, and data computation languages like SQL, Python, or Scala in order to become a Microsoft Certified Azure Data Engineer. Why Should You Get an Azure Data Engineer Certification? Then, you can create analytical layer serving designs.

Certification

Certification Data Engineering Data Engineer Engineering

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

JANUARY 3, 2022

This makes you work for your data instead of your data working for you. This commonly introduces: a database or data warehouse, API/EDI integrations, ETL software, and business intelligence tooling. By leveraging off-the-shelf tooling, your company separates disciplines by technology. Databases, schemas, tables, views, etc.

IT

IT AWS Software Engineer Software Engineering
