
Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. In this episode Yingjun Wu explains how RisingWave is architected to power analytical workflows on continuous data flows, and the challenges of making a SQL-first streaming engine responsive and scalable.
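To make the SQL-first idea concrete, here is a minimal sketch of interacting with RisingWave from Python. RisingWave speaks the PostgreSQL wire protocol, so a standard Postgres driver works; the host settings and the "page_views" source are assumptions for illustration, not details from the episode.

```python
# Minimal sketch: RisingWave is PostgreSQL-wire-compatible, so psycopg2
# can issue streaming-SQL statements against it. Connection details and
# the "page_views" source are hypothetical.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True

with conn.cursor() as cur:
    # A materialized view in RisingWave is maintained incrementally as
    # new events arrive, which is what makes SQL viable for streaming.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS views_per_page AS
        SELECT page_id, COUNT(*) AS view_count
        FROM page_views
        GROUP BY page_id
    """)
    # Reads see the continuously updated result, not a one-off snapshot.
    cur.execute("SELECT * FROM views_per_page ORDER BY view_count DESC LIMIT 10")
    for row in cur.fetchall():
        print(row)
```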


Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease

Snowflake

In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. Snowflake’s new Python API lets data engineers define and manage pipeline resources directly in Python, allowing applications to handle large data sets and complex workflows efficiently.
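As a hedged sketch of what managing Snowflake objects from Python looks like with the snowflake.core package: the credentials and warehouse name below are placeholders, and exact class names and arguments may differ across API versions.

```python
# Hedged sketch of the resource-oriented Snowflake Python API
# (snowflake.core). Credentials and "pipeline_wh" are placeholders;
# exact classes/arguments may vary by version.
from snowflake.snowpark import Session
from snowflake.core import Root
from snowflake.core.warehouse import Warehouse

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
}).create()
root = Root(session)

# Warehouses, databases, tasks, etc. are exposed as Python collections,
# so pipeline infrastructure can be managed without hand-built SQL strings.
root.warehouses.create(
    Warehouse(name="pipeline_wh", warehouse_size="SMALL", auto_suspend=60)
)
```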


Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). This episode explores how Trino and Iceberg can serve as that foundation.
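For a sense of the division of labor, here is a minimal sketch of querying Iceberg tables through Trino from Python using the trino client package; the coordinator address, the "iceberg" catalog name, and the table are assumptions for illustration.

```python
# Minimal sketch: Trino provides the warehouse-style SQL interface;
# Iceberg supplies transactional table metadata over cheap object storage.
# Host, catalog, schema, and table names are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="localhost", port=8080, user="analyst",
    catalog="iceberg", schema="analytics",
)
cur = conn.cursor()
cur.execute("""
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
for row in cur.fetchall():
    print(row)
```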


Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

Summary Maintaining a single source of truth for your data is one of the biggest challenges in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. A semantic layer addresses this by giving every consumer a shared set of metric and dimension definitions.
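The core idea is easy to sketch. The following is a hypothetical illustration, not any specific vendor's API: a metric is defined once, centrally, and every consumer renders its query from that same definition instead of re-deriving its own SQL.

```python
# Hypothetical sketch of a semantic layer's core idea: define a metric
# once, and render identical SQL for every dashboard, notebook, or API
# consumer. All names here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    table: str
    expression: str   # aggregation over a column
    dimension: str    # how consumers are allowed to slice it

METRICS = {
    "monthly_revenue": Metric(
        name="monthly_revenue",
        table="orders",
        expression="SUM(amount)",
        dimension="DATE_TRUNC('month', order_date)",
    ),
}

def render_sql(metric_name: str) -> str:
    """Every consumer gets the same SQL for the same metric."""
    m = METRICS[metric_name]
    return (
        f"SELECT {m.dimension} AS dim, {m.expression} AS {m.name} "
        f"FROM {m.table} GROUP BY dim"
    )

print(render_sql("monthly_revenue"))
```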


Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

Summary Any business that wants to understand their operations and customers through data requires some form of pipeline. Building reliable data pipelines is a complex and costly undertaking with many layered requirements. This episode looks at how Hevo Data builds pipelines that run from source to analysis and activation.


Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. This episode covers how Datafold reconciles the data across your databases.
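One common reconciliation technique, and the rough idea behind diffing tools like Datafold's, is to hash each row's key and payload on both sides and compare the digests to find rows that are missing or divergent. The sketch below is a hypothetical illustration of that technique, not Datafold's implementation; the row format is assumed.

```python
# Hypothetical sketch of checksum-based table reconciliation across two
# engines: hash each row on both sides, then compare digests.
import hashlib

def row_digest(row: dict) -> str:
    # Normalize values to strings so equivalent rows hash identically
    # even when the two engines use different native types.
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()

def diff_tables(source_rows, dest_rows, key="id"):
    src = {r[key]: row_digest(r) for r in source_rows}
    dst = {r[key]: row_digest(r) for r in dest_rows}
    missing = [k for k in src if k not in dst]
    divergent = [k for k in src if k in dst and src[k] != dst[k]]
    return missing, divergent

source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 7.5}]
dest = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 8.0}]
print(diff_tables(source, dest))  # ([], [2])
```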


Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software-defined assets as a means of building declarative workflows. This episode digs into Dagster+ and its declarative, collaborative approach to running data platforms and pipelines.
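To illustrate what software-defined assets look like, here is a minimal sketch using Dagster's asset decorator: each asset declares what it produces, and dependencies come from function parameter names, yielding a declarative graph rather than imperative task wiring. The asset bodies are placeholders for illustration.

```python
# Minimal sketch of Dagster software-defined assets. Asset bodies are
# placeholders; the dependency graph comes from parameter names.
from dagster import asset, materialize

@asset
def raw_orders():
    # Stand-in for an extract step (API pull, file load, etc.).
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 7.5}]

@asset
def order_totals(raw_orders):
    # Naming the parameter "raw_orders" is what wires the dependency.
    return sum(r["amount"] for r in raw_orders)

if __name__ == "__main__":
    # materialize() runs the asset graph in dependency order.
    result = materialize([raw_orders, order_totals])
    assert result.success
```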
