
Streamline Data Pipelines: How to Use WhyLogs with PySpark for Effective Data Profiling and Validation

Towards Data Science

Data pipelines, built by data engineers or machine learning engineers, do more than just prepare data for reports or model training. They also let you log all sorts of data.
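The article centers on whylogs' ability to profile data as it moves through a PySpark pipeline. As a rough illustration of the idea, here is a minimal profiling sketch, assuming the open-source whylogs v1 package with its Spark extra installed and a running SparkSession; the input path is hypothetical:

    # Profile a Spark DataFrame with whylogs (assumes: pip install "whylogs[spark]").
    from pyspark.sql import SparkSession
    from whylogs.api.pyspark.experimental import collect_dataset_profile_view

    spark = SparkSession.builder.appName("profiling-demo").getOrCreate()
    df = spark.read.parquet("orders.parquet")  # hypothetical input dataset

    # One distributed pass over the data yields per-column summary statistics.
    profile_view = collect_dataset_profile_view(input_df=df)
    print(profile_view.to_pandas())  # counts, null ratios, type info, etc.

A profile like this can be stored alongside each pipeline run and checked against constraints to validate new batches before they reach downstream consumers.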


Data Pipeline- Definition, Architecture, Examples, and Use Cases

ProjectPro

Data pipelines are a significant part of the big data domain, and every professional working, or aspiring to work, in this field must have extensive knowledge of them. As data expands exponentially, organizations struggle to harness the power of digital information for different business use cases. So what exactly is a big data pipeline?



7 Lessons From GoCardless’ Implementation of Data Contracts

Monte Carlo

At the time, he was one of the very few people using the term “data contract.” Data contracts have since become one of the most discussed topics in data engineering. For posterity, we have preserved Barr’s foreword, which examines what was then a very nascent trend, but we have also added an updated data contract FAQ as an addendum.
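Though the excerpt is historical, the underlying idea is concrete: a data contract is an explicit, machine-checkable schema agreed between a data producer and its consumers. As a minimal sketch (not GoCardless' actual implementation), a contract for an order event could be enforced at the pipeline boundary with pydantic; the field names here are hypothetical:

    # A toy data contract: producers promise this shape, consumers validate it.
    from datetime import datetime
    from pydantic import BaseModel

    class OrderEvent(BaseModel):
        order_id: str
        customer_id: str
        amount_cents: int
        created_at: datetime

    # Payloads that violate the contract fail loudly here, not downstream.
    event = OrderEvent(order_id="o-1", customer_id="c-9",
                       amount_cents=1250, created_at="2024-01-01T00:00:00")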


A Machine Learning Pipeline with Real-Time Inference

Zalando Engineering

Everything started with a simple Python and scikit-learn setup. You can read about this transition on our engineering blog. We receive a JSON request with order data and return a response in JSON format. The preprocessing applied to incoming requests in production must be identical to that applied to the training data.
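One common way to keep train-time and serve-time preprocessing identical is to bundle both into a single fitted artifact. The following is a minimal sketch of that pattern using a scikit-learn Pipeline; the features are synthetic stand-ins, not Zalando's actual setup:

    # Bundle preprocessing and model so serving reuses the exact fitted transforms.
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    X_train = np.random.rand(100, 3)          # stand-in for order features
    y_train = np.random.randint(0, 2, 100)    # stand-in for labels

    pipeline = Pipeline([
        ("scale", StandardScaler()),          # fitted once on training data
        ("model", LogisticRegression()),
    ])
    pipeline.fit(X_train, y_train)

    # At inference, features parsed from the JSON request pass through the
    # same fitted steps, so production preprocessing cannot drift from training.
    request_features = np.array([[0.2, 0.7, 0.1]])
    print(pipeline.predict(request_features))

Serializing the whole pipeline (for example with joblib) means the serving process loads one object and never re-implements the preprocessing by hand.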