Accessibility, Data Governance, Data Workflow and Process

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Summary: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up.

Data Process, Process, Data Lake, High Quality Data

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

When that system is responsible for the data layer, the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high-traffic environments. Can you start by sharing some of your experiences with data migration projects?

Systems, Data Lake, High Quality Data, Google Cloud

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Management, Data Lake, High Quality Data, Machine Learning

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

To simplify the integration of AI capabilities into developer workflows, Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use.

Building, Data Lake, High Quality Data, Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode, Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. How does that change as a function of the type of data?

Data Lake, High Quality Data, Government, Data Pipeline

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Summary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date.

Project, Data Lake, High Quality Data, Data Workflow

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?

SQL, Data Lake, High Quality Data, Data Pipeline

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL.

Database, Data Lake, High Quality Data, Data Workflow

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Summary: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge.

Architecture, Data Lake, High Quality Data, SQL

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka, Data Lake, High Quality Data, SQL

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

What does a typical build vs. buy decision process look like? How does that influence the architectural design/capabilities for data platforms in those organizations?

Designing, Data Lake, High Quality Data, SQL

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming, Data Lake, High Quality Data, Data Pipeline

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Summary: The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor.

Systems, Designing, Data Lake, SQL

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process.

Database, Technology, Data Lake, High Quality Data

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake, High Quality Data, Data Pipeline, Architecture

Toward a Data Mesh (part 2): Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe four topics that make data mesh a reality. How do we build data products? How can we interoperate between the data domains? We want interoperability for any data stored, versus having to think about how to store the data in a specific node to optimize the processing.

Technology, Architecture, Google Cloud, Metadata

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake, High Quality Data, SQL, Architecture

DataOps Framework: 4 Key Components and How to Implement Them

Databand.ai

AUGUST 30, 2023

The DataOps framework is a set of practices, processes, and technologies that enables organizations to improve the speed, accuracy, and reliability of their data management and analytics operations. The core philosophy of DataOps is to treat data as a valuable asset that must be managed and processed efficiently.

Data Governance, Data Pipeline, Government, Data Cleanse

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Project, Data Lake, High Quality Data, SQL

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

Depending on the quantity of data flowing through an organization’s pipeline — or the format the data typically takes — the right modern table format can help to make workflows more efficient, increase access, extend functionality, and even offer new opportunities to activate your unstructured data.

Data Lake, Metadata, Hadoop, Data Governance

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

PostgreSQL, Data Lake, High Quality Data, SQL

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake, High Quality Data, SQL, Architecture

DataOps Tools: Key Capabilities & 5 Tools You Must Know About

Databand.ai

AUGUST 30, 2023

DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. By using DataOps tools, organizations can break down silos, reduce time-to-insight, and improve the overall quality of their data analytics processes.

Data Cleanse, Data Pipeline, Data Ingestion, Data Validation

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Data lakes are notoriously complex. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Go to materialize.com today and get 2 weeks free!

Software Engineer, Software Engineering, Engineering, Data Lake

Unified DataOps: Components, Challenges, and How to Get Started

Databand.ai

AUGUST 30, 2023

Unified DataOps represents a fresh approach to managing and synchronizing data operations across several domains, including data engineering, data science, DevOps, and analytics. The goal of this strategy is to streamline the entire process of extracting insights from raw data by removing silos between teams and technologies.

Data Governance, Data Cleanse, Government, Data Pipeline

What is Data Orchestration?

Monte Carlo

MAY 25, 2023

While organizations across industries are sitting on mountains of un-mined decision-making gold, scattered and siloed data precludes them from delivering data-driven alchemy. What these companies need—what you might need—is data orchestration. Improved data governance. Automating data workflows.

Data Pipeline, Data Workflow, Data, Data Governance

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

AltexSoft

MARCH 14, 2023

As the volume and complexity of data continue to grow, organizations seek faster, more efficient, and cost-effective ways to manage and analyze data. In recent years, cloud-based data warehouses have revolutionized data processing with their advanced massively parallel processing (MPP) capabilities and SQL support.

IT, Data Warehouse, Data Governance, Data Lake

Top 10 Azure Data Engineer Job Opportunities in 2024 [Career Options]

Knowledge Hut

MARCH 28, 2024

They use many data storage, computation, and analytics technologies to develop scalable and robust data pipelines. Role level: Intermediate. Responsibilities: Design and develop data pipelines to ingest, process, and transform data. Develop data models, data governance policies, and data integration strategies.

Data Engineering, Data Engineer, Engineering, Data Warehouse

ETL for Snowflake: Why You Need It and How to Get Started

Ascend.io

DECEMBER 19, 2023

We’ll talk about when and why ETL becomes essential in your Snowflake journey and walk you through the process of choosing the right ETL tool. Our focus is to make your decision-making process smoother, helping you understand how to best integrate ETL into your data strategy. That’s what we call a data pipeline.

ETL Tools, IT, Data Pipeline, Data Warehouse

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Process Analytics. We have also included vendors for the specific use cases of ModelOps, MLOps, DataGovOps, and DataSecOps, which apply DataOps principles to machine learning, AI, data governance, and data security operations. Reflow: a system for incremental data processing in the cloud. Azure DevOps.

Consulting, Machine Learning, Data Science, Data Pipeline

Better Data Quality Through Observability With Monte Carlo

Data Engineering Podcast

OCTOBER 19, 2020

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. What is "data downtime"?

Machine Learning, Data Engineering, Data Engineer, Data

Data Migration Risks and the Checklist You Need to Avoid Them

Monte Carlo

MARCH 24, 2023

Sure, terabytes or even petabytes of data are involved, but generally it’s not the size of the data but everything surrounding the data (workflows, access permissions, layers of dependencies) that poses data migration risks. Data governance, compliance and access management: Moving a table is relatively simple.

Data Warehouse, AWS, Cloud, Data Governance

Big Data (Quality), Small Data Team: How Prefect Saved 20 Hours Per Week with Data Observability

Monte Carlo

SEPTEMBER 20, 2022

Here’s how Prefect, a Series B startup and creator of the popular data orchestration tool, harnessed the power of data observability to preserve headcount, improve data quality and reduce time to detection and resolution for data incidents. “Our data analyst uses Monte Carlo’s lineage almost every day,” Dylan said.

Big Data, Data Warehouse, Data, Data Governance

The Top Data Strategy Influencers and Content Creators on LinkedIn

Databand.ai

DECEMBER 29, 2022

Follow Scott on LinkedIn. 2) Mico Yuk, Chief Data Evangelist at Count and Co-founder of BI Brainz Group. Mico Yuk is the Chief Data Evangelist at Count.co, Co-founder of BI Brainz, Host of the Analytics on Fire Podcast, and the mastermind behind the BI/Analytics Data Storytelling Framework (BIDS).

BI, Consulting, Data Science, Data Governance

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

Data pipeline architecture is the process of designing how data is surfaced from its source system to the consumption layer. It’s important to understand most data pipelines aren’t a linear movement of data from source A to target B, but rather consist of a series of highly complex and interdependent processes.

Data Pipeline, Architecture, Data Lake, Data Warehouse

The Advantages Of Live Data-Streaming In The Competitive Financial Services Sector (Part I)

Cloudera

AUGUST 21, 2020

Businesses need to be able to ingest huge volumes of data from these data points as well as handle, process, and store this vast amount of data. Then they need to move to data separation so that they not only ingest the data but prepare the data so that it becomes processable.

Banking, Kafka, Cloud Storage, Government

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

JANUARY 3, 2022

In this post, we will explore the complexities involved with software engineering with a focus on data engineering and data operations (DataOps). We’ll work through the different facets of taking your data and extracting business value with the same rigor and process companies apply to product development. No problem!

IT, AWS, Software Engineer, Software Engineering
