Database and High Quality Data - Data Engineering Digest

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.

Non-relational Database

Non-relational Database Relational Database Database Designing

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Summary: Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Data lakes are notoriously complex.

Database

Database Technology Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. Data lakes are notoriously complex.

Management

Management Data Lake High Quality Data Machine Learning

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Kafka

Kafka Data Lake High Quality Data SQL

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.

SQL

SQL Data Lake High Quality Data Data Pipeline

Monte Carlo Announces Support for Kafka and Vector Databases at IMPACT 2023

Monte Carlo

NOVEMBER 8, 2023

Kafka and Vector Database support: According to Databricks’ State of Data and AI report, the number of companies using SaaS LLM APIs has grown more than 1300% since November 2022, with a nearly 411% increase in the number of AI models put into production during that same period. Both integrations will be available early 2024.

Kafka

Kafka Database High Quality Data Datasets

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Summary: Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Summary: Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

PostgreSQL

PostgreSQL Data Lake High Quality Data SQL

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Architecture

Architecture Data Lake High Quality Data SQL

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.

Project

Project Data Lake High Quality Data Data Workflow

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data.

Designing

Designing Data Lake High Quality Data SQL

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Process

Data Process Process Data Lake High Quality Data

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Data Lake

Data Lake High Quality Data SQL Architecture

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Systems

Systems Designing Data Lake SQL

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Project

Project Data Lake High Quality Data SQL

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. You shouldn't have to throw away the database to build with fast-changing data. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Data Lake

Data Lake High Quality Data SQL Architecture

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Data lakes are notoriously complex. You shouldn't have to throw away the database to build with fast-changing data.

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Building ETL Pipelines With Generative AI

Data Engineering Podcast

OCTOBER 1, 2023

Summary: Artificial intelligence applications require substantial amounts of high-quality data, which is provided through ETL pipelines. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. With Materialize, you can!

Building

Building BI SQL Machine Learning

7 Essential Data Cleaning Best Practices

Monte Carlo

APRIL 1, 2024

Implement Routine Data Audits: Build a data cleaning cadence into your data teams’ schedule. Routine data quality checks will not only help reduce the risk of discrepancies in your data but also help fortify a culture of high-quality data throughout your organization.

High Quality Data

High Quality Data Datasets Data Data Pipeline

5 Skills Data Engineers Should Master to Keep Pace with GenAI

Monte Carlo

FEBRUARY 27, 2024

Organizations need to connect LLMs with their proprietary data and business context to actually create value for their customers and employees. They need robust data pipelines, high-quality data, well-guarded privacy, and cost-effective scalability. Who can deliver? Data engineers.

Data Engineering

Data Engineering Data Engineer Engineering High Quality Data

Is Prompt Engineering Overhyped? No—But Learn These 3 GenAI Skills Too

Monte Carlo

MARCH 7, 2024

Table of Contents: Why prompt engineering isn’t all that and a bag of SQL queries; Understand vector databases; Create AI differentiation with RAG; Find and solve real business problems; High-quality data always lives up to the hype; What is prompt engineering?; Why is prompt engineering important?

Engineering

Engineering High Quality Data Database Architecture

6 Pillars of Data Quality and How to Improve Your Data

Databand.ai

MAY 30, 2023

Data quality refers to the degree of accuracy, consistency, completeness, reliability, and relevance of the data collected, stored, and used within an organization or a specific context. High-quality data is essential for making well-informed decisions, performing accurate analyses, and developing effective strategies.

Data Cleanse

Data Cleanse Datasets Data Governance Data Validation

5 Hard Truths About Generative AI for Technology Leaders

Monte Carlo

JANUARY 3, 2024

But RAG development comes with a learning curve, even for your most talented data engineers. They need to know prompt engineering, vector databases and embedding vectors, data modeling, data orchestration, data pipelines and all for RAG. ... away from your data infrastructure being GenAI ready.

Technology

Technology Database Data Governance Data Engineering

Innovating Operations in Agriculture: Kramp’s Real-Time Analytics Journey

Striim

APRIL 30, 2024

Striim’s Solution: Kramp adopted Striim for its powerful, mature real-time data integration, seamlessly connecting diverse databases like Oracle, Microsoft, and Postgres to ensure the continuous, high-quality data replication essential for forecasting and order management.

Google Cloud

Google Cloud High Quality Data Business Intelligence Data Warehouse

Data Consistency vs Data Integrity: Similarities and Differences

Databand.ai

AUGUST 30, 2023

By Joseph Arnold. What Is Data Consistency? Data consistency refers to the state of data in which all copies or instances are the same across all systems and databases.

Data Integration

Data Integration Data Cleanse Data Validation High Quality Data
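The Databand.ai excerpt above defines data consistency as all copies of the data being the same across systems. A minimal sketch of how such a check could look, assuming two in-memory copies of a table keyed by a hypothetical `id` column (the function names and sample rows are illustrative and do not come from the article):

```python
import hashlib


def fingerprint(rows, key):
    """Map each primary-key value to a SHA-256 digest of the row's contents."""
    digests = {}
    for row in rows:
        # Sort the items so the digest does not depend on column order.
        canonical = repr(sorted(row.items()))
        digests[row[key]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digests


def diff_copies(source, target, key="id"):
    """Report keys that are missing, extra, or different in the target copy."""
    src = fingerprint(source, key)
    tgt = fingerprint(target, key)
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical sample data: the target copy has drifted from the source.
source = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]
target = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "B@example.com"},
    {"id": 4, "email": "d@example.com"},
]

report = diff_copies(source, target)
print(report)
```

An empty report in all three lists would mean the two copies agree row for row. Real replication checks usually compare hashes computed inside each database rather than pulling rows into memory.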

Our Top 5 Generative AI Articles in 2023

Monte Carlo

DECEMBER 22, 2023

She also explores the key considerations for organizations looking to implement these use cases, including vector databases, fine-tuning models, and unstructured or streaming data processing. Organizing Generative AI: 5 Lessons Learned From Data Science Teams. So, your business leadership is all in on generative AI?

High Quality Data

High Quality Data Data Science Data Engineering Data Engineer

Ripple's Centralized Data Platform

Ripple Engineering

JANUARY 29, 2024

To support Ripple's product capabilities, the Payments team, for example, ingests millions of transactional records into databases and performs analytics to generate invoices, reports, and other related payment operations. The lack of a centralized system makes building a single source of high-quality data difficult.

Database-centric

Database-centric Pipeline-centric NoSQL High Quality Data

B2B Data Enrichment for Beginners

Precisely

MARCH 12, 2024

In this blog post, we’ll explain what data enrichment is, why you need it, how it works, and how B2B companies can use enriched data to drive results. What is data enrichment? Third-party data could come from various sources, including social media platforms, mobile devices, and IoT devices.

Insurance

Insurance Telecommunication Retail High Quality Data

Intrinsic Data Quality: 6 Essential Tactics Every Data Engineer Needs to Know

Monte Carlo

JANUARY 10, 2024

Extrinsic data, meanwhile, is more about the context — it’s how your data interacts with the world outside and how it fits into the larger picture of your project or organization. Consider a database that holds customer details. ... usability) would be about extrinsic data quality.

Data Cleanse

Data Cleanse Data Engineering Data Engineer Engineering

Data Collection And Management To Power Sound Recognition At Audio Analytic

Data Engineering Podcast

JUNE 29, 2020

This was a great conversation about the complexities of working in a niche domain of data analysis and how to build a pipeline of high quality data from collection to analysis. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology.

Data Collection

Data Collection Management High Quality Data Metadata

5 Layers of Data Lakehouse Architecture Explained

Monte Carlo

JANUARY 5, 2024

Ingestion layer: The ingestion layer in data lakehouse architecture extracts data from various sources, including transactional and relational databases, APIs, real-time data streams, CRM applications, NoSQL databases, and more, and brings it into the data lake.

Architecture

Architecture Data Lake Metadata Unstructured Data

Data Lakehouse Architecture Explained: 5 Layers

Monte Carlo

JANUARY 5, 2024

Ingestion layer: The ingestion layer in data lakehouse architecture extracts data from various sources, including transactional and relational databases, APIs, real-time data streams, CRM applications, NoSQL databases, and more, and brings it into the data lake.

Architecture

Architecture Data Lake Metadata Unstructured Data

Visionary Data Quality Paves the Way to Data Integrity

Precisely

MARCH 14, 2023

Quality data you can depend on – today, tomorrow, and beyond. For many years, Precisely customers have ensured the accuracy of data across their organizations by leveraging our leading data solutions, including Trillium Quality, Spectrum Quality, and Data360 DQ+. What does all this mean for your business?

Data Integration

Data Integration High Quality Data BI Data

Fueling Data-Driven Decision-Making with Data Validation and Enrichment Processes

Precisely

SEPTEMBER 25, 2023

Data validation performs a check against existing values in a database to ensure that they fall within valid parameters. Data enrichment is the process of enhancing your data by appending relevant context from additional sources – improving its overall value, accuracy, and usability.

Data Validation

Data Validation Process Raw Data Data Cleanse
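The Precisely excerpt above describes data validation as checking existing values to confirm they fall within valid parameters. A minimal sketch of that idea, assuming rows are plain dictionaries; the `Rule` class, column names, and thresholds are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    column: str                      # column the rule applies to
    check: Callable[[object], bool]  # returns True when the value is valid
    description: str                 # human-readable statement of the rule


def validate(rows, rules):
    """Return (row_index, column, description) for every failed check."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.column)
            # A missing value counts as a failure, as does a value that fails the check.
            if value is None or not rule.check(value):
                failures.append((i, rule.column, rule.description))
    return failures


# Hypothetical rules and rows.
rules = [
    Rule("age", lambda v: isinstance(v, int) and 0 <= v <= 120,
         "age must be between 0 and 120"),
    Rule("country", lambda v: v in {"US", "DE", "NL"},
         "country must be a known code"),
]
rows = [
    {"age": 34, "country": "US"},
    {"age": -5, "country": "NL"},
    {"age": 51, "country": "XX"},
]

failures = validate(rows, rules)
for index, column, description in failures:
    print(f"row {index}: {column}: {description}")
```

Keeping each rule as data (column, predicate, description) lets the same loop serve many tables. In a production pipeline these checks would more likely be expressed as database constraints or as tests in a dedicated data quality tool.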

Experts Share the 5 Pillars Transforming Data & AI in 2024

Monte Carlo

JANUARY 23, 2024

RAG involves integrating a real-time database into the LLM’s response generation process, while fine-tuning trains models on targeted datasets to improve domain-specific responses. That implies working with new patterns like vector databases.” Those who don’t embrace it will be left behind. With the right prompt (this is key!)

Pipeline-centric

Pipeline-centric Database-centric Metadata Unstructured Data

Data Engineering Weekly #161

Data Engineering Weekly

MARCH 3, 2024

Here is the agenda: 1) Data Application Lifecycle Management - Harish Kumar (PayPal): Hear from the team at PayPal on how they build the data product lifecycle management (DPLM) systems. 3) DataOps at AstraZeneca: The AstraZeneca team talks about the DataOps best practices they established internally, and what worked and what didn’t.

Data Engineering

Data Engineering Data Engineer Pipeline-centric Engineering

Four Vs Of Big Data

Knowledge Hut

APRIL 23, 2024

Each data point provides a specific value or attribute that contributes to the overall understanding and analysis of the data. On the other hand, data sources pertain to the origins or locations from which the data is collected. Data sources provide the context and environment from which the data points are collected.

Big Data

Big Data Media Datasets Unstructured Data

How to Use DBT to Get Actionable Insights from Data?

Workfall

JULY 4, 2023

With DBT, they weave powerful SQL spells to create data models that capture the essence of their organization’s information. DBT’s superpowers include seamlessly connecting with databases and data warehouses, performing amazing transformations, and effortlessly managing dependencies to ensure high-quality data.

Data Warehouse

Data Warehouse SQL PostgreSQL Database

5 ETL Best Practices You Shouldn’t Ignore

Monte Carlo

OCTOBER 5, 2023

Ensure data quality: Even if there are no errors during the ETL process, you still have to make sure the data meets quality standards. High-quality data is crucial for accurate analysis and informed decision-making.

Data Cleanse

Data Cleanse ETL Tools Datasets High Quality Data

Six Books that Have Shaped My Data Career

Towards Data Science

MARCH 29, 2023

Great reads on modeling, processes, and leadership. At the very start of my journey in data, I thought I was going to be a data scientist, and my first foray into data was centered on studying statistics and linear algebra, not software engineering or database management.

Data Warehouse

Data Warehouse BI Healthcare Database

Best of 2022: Top 5 PropTech Blog Posts

Precisely

DECEMBER 19, 2022

The PropTech industry has been booming – and data holds the key to continuous transformation and competitive edge. High-quality data and analytics help PropTech companies gain deeper context on properties and locations, build richer models with accurate information, and more.

Data Governance

Data Governance Retail Government High Quality Data
