Data Lake, Data Workflow, Project and SQL

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.

SQL

SQL Data Lake High Quality Data Data Pipeline

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. How have the scope and goals of the project changed since you started working on it?

Project

Project Data Lake High Quality Data Data Workflow

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects. Introducing RudderStack Profiles.

Project

Project Data Lake High Quality Data SQL

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Data Pipeline Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features.

Data Lake

Data Lake High Quality Data Hadoop Data Pipeline

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Systems

Systems Designing Data Lake SQL

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Process

Data Process Process Data Lake High Quality Data

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. What is involved in integrating Nessie into a given data stack?

Data Lake

Data Lake High Quality Data Data Pipeline Architecture

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments. In this episode he shares some of the valuable lessons that he learned about how to make those projects successful. Can you start by sharing some of your experiences with data migration projects?

Systems

Systems Data Lake High Quality Data Google Cloud

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming

Programming Data Lake High Quality Data Data Pipeline

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake High Quality Data Government Data Pipeline

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Technology Data Lake High Quality Data

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. Find simplicity in your most complex projects with Miro.

Data Lake

Data Lake High Quality Data SQL Architecture

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Architecture

Architecture Data Lake High Quality Data SQL

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Designing

Designing Data Lake High Quality Data SQL

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Data Lake

Data Lake High Quality Data SQL Architecture

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

PostgreSQL

PostgreSQL Data Lake High Quality Data SQL

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Top 20 Azure Data Engineering Projects in 2023 [Source Code]

Knowledge Hut

NOVEMBER 2, 2023

Azure Data engineering projects are complicated and require careful planning and effective team participation for a successful completion. While many technologies are available to help data engineers streamline their workflows and guarantee that each aspect meets its objectives, ensuring that everything works properly takes time.

Data Engineering

Data Engineering Data Engineer Coding Project

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing these constraints to your data workflows. If you've learned something or tried out a project from the show then tell us about it!

Metadata

Metadata Business Intelligence Data Lake BI

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

The “legacy” table formats The data landscape has evolved so quickly that table formats pioneered within the last 25 years are already achieving “legacy” status. It was designed to support high-volume data exchange and compatibility across different system versions, which is essential for streaming architectures such as Apache Kafka.

Data Lake

Data Lake Metadata Hadoop Data Governance

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. With this 3rd platform generation, you have more real time data analytics and a cost reduction because it is easier to manage this infrastructure in the cloud thanks to managed services.

Technology

Technology Architecture Google Cloud Metadata

Top 10 Azure Data Engineer Job Opportunities in 2024 [Career Options]

Knowledge Hut

MARCH 28, 2024

Implemented and managed data storage solutions using Azure services like Azure SQL Database , Azure Data Lake Storage, and Azure Cosmos DB. Education & Skills Required Proficiency in SQL, Python, or other programming languages. Develop predictive models and data-driven solutions to address business challenges.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

A Reflection On The Data Ecosystem For The Year 2021

Data Engineering Podcast

JANUARY 1, 2022

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. No more scripts, just SQL.

Data Warehouse

Data Warehouse Data Lake SQL Hadoop

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Data Engineering Podcast

JULY 3, 2022

In this episode they explain how the utility is implemented to run quickly and how you can start using it in your own data workflows to ensure that your data warehouse isn’t missing any records from your source systems. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.

Data Integration

Data Integration MongoDB Scala MySQL

Doing DataOps For External Data Sources As A Service at Demyst

Data Engineering Podcast

NOVEMBER 27, 2021

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. No more scripts, just SQL.

Data Warehouse

Data Warehouse Data Lake BI Business Intelligence

10 Essential Azure Data Engineer Skills to Improve in 2023

Knowledge Hut

NOVEMBER 17, 2023

They enhance data pipelines, transform data, and guarantee the accuracy, integrity, and compliance of the data. Their job entails Azure data engineer skills like using big data, databases, data lakes, and analytics to help firms make efficient data-driven decisions.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Data Pipeline

Data Pipeline Building MongoDB Scala

A Complete Guide to Azure Data Engineer Certification (DP-203)

Knowledge Hut

DECEMBER 28, 2023

This certification, often referred to as the Azure Data Engineer Associate certification, validates the competency of individuals in implementing Azure data solutions. It’s a testament to their ability to create scalable, efficient and secure data pipelines. What is the Azure Data Engineer Certification?

Certification

Certification Data Engineering Data Engineer Engineering

Data Transformations Using the Data Build Tool

Ripple Engineering

MAY 27, 2021

Our data analysts used to schedule queries on BigQuery for transformation workflows and test the transformed data manually. We did not have a single tool that would automate the building, compiling, testing and documenting of SQL models, so we had no way to scale the process. SQL Models A model is a single.sql file.

Building

Building Raw Data SQL Data

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

If the IT or data engineering team can’t respond with an enabling data platform in the required time frame, the business analyst does the necessary data work themselves. This ad hoc data engineering work often means coping with numerous data tables and diverse data sets using Alteryx, SQL, Excel or similar tools. .

Business Analyst

Business Analyst Data Lake Consulting Data Analytics

Build vs Buy Data Pipeline Guide

Monte Carlo

APRIL 24, 2023

During data ingestion, raw data is extracted from sources and ferried to either a staging server for transformation or directly into the storage level of your data stack—usually in the form of a data warehouse or data lake. There are two primary types of raw data.

Data Pipeline

Data Pipeline Building Data Ingestion BI

Top 25 Data Science Tools To Use in 2024

Knowledge Hut

MAY 23, 2024

This vast stream of interdisciplinary domains deals with data in different ways. It helps companies understand data and obtain meaningful insights from it. According to the GlobeNewswire report , the projected growth of the data science market will hike up to a CAGR of 25 percent by 2030.

Data Science

Data Science MongoDB Programming Language Hadoop

Azure Data Engineer (DP-203) Certification Cost in 2023

Knowledge Hut

SEPTEMBER 29, 2023

You must have a solid grasp of ideas in parallel processing, data architecture, and data computation languages like SQL, Python, or Scala in order to become a Microsoft Certified Azure Data Engineer. Why Should You Get an Azure Data Engineer Certification? Then, you can create analytical layer serving designs.

Certification

Certification Data Engineering Data Engineer Engineering

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

JANUARY 3, 2022

However, defining how each of these is going to be accomplished is core to any software engineering product/project. The product owner is responsible for envisioning what the product/project would look like and accomplish, and then determining the team members and funding involved.

IT

IT AWS Software Engineer Software Engineering

Build A Data Lake For Your Security Logs With Scanner

Tackling Real Time Streaming Data With SQL Using RisingWave

Webinars

Trending Sources

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Webinars

Unlocking Your dbt Projects With Practical Advice For Practitioners

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Designing A Non-Relational Database Engine

Troubleshooting Kafka In Production

Making Email Better With AI At Shortwave

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Build Your Second Brain One Piece At A Time

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Designing Data Transfer Systems That Scale

Reconciling The Data In Your Databases With Datafold

Modern Customer Data Platform Principles

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Version Your Data Lakehouse Like Your Software With Nessie

Data Migration Strategies For Large Scale Systems

When And How To Conduct An AI Program

Data Sharing Across Business And Platform Boundaries

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Addressing The Challenges Of Component Integration In Data Platform Architectures

Designing Data Platforms For Fintech Companies

Adding An Easy Mode For The Modern Data Stack With 5X

Shining Some Light In The Black Box Of PostgreSQL Performance

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Top 20 Azure Data Engineering Projects in 2023 [Source Code]

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

The Evolution of Table Formats

Toward a Data Mesh (part 2) : Architecture & Technologies

Top 10 Azure Data Engineer Job Opportunities in 2024 [Career Options]

A Reflection On The Data Ecosystem For The Year 2021

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Doing DataOps For External Data Sources As A Service at Demyst

10 Essential Azure Data Engineer Skills to Improve in 2023

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

A Complete Guide to Azure Data Engineer Certification (DP-203)

Data Transformations Using the Data Build Tool

DataOps For Business Analytics Teams

Build vs Buy Data Pipeline Guide

Top 25 Data Science Tools To Use in 2024

Azure Data Engineer (DP-203) Certification Cost in 2023

DataOps: What Is It, Core Principles, and Tools For Implementation

Stay Connected