July, 2024

What are the types of data quality checks?

Start Data Engineering

1. Introduction
2. Data Quality (DQ) checks are run as part of your pipeline
2.1. Ensure your consumers don’t get incorrect data with output DQ checks
2.2. Catch upstream issues quickly with input DQ checks
2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks.
2.4. Track incoming and outgoing row counts with Audit logs
3.
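As a rough illustration of the outline above, the sketch below puts an input check, an output check, and a row-count audit log around a single transformation step. It is not taken from the article: the function name `run_pipeline`, the columns `order_id` and `amount_cents`, and the cents-to-dollars transformation are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def run_pipeline(rows: list[Row]) -> tuple[list[Row], dict[str, int]]:
    """Transform rows, guarding the step with input and output DQ checks."""
    # Audit log: record how many rows came in.
    audit = {"rows_in": len(rows)}

    # Input DQ check: fail fast on upstream problems before any work is done.
    missing_ids = [r for r in rows if r.get("order_id") is None]
    if missing_ids:
        raise ValueError(f"input check failed: {len(missing_ids)} rows lack order_id")

    # Transformation step (hypothetical): convert cents to dollars.
    output = [
        {"order_id": r["order_id"], "amount_usd": r["amount_cents"] / 100}
        for r in rows
    ]

    # Output DQ check: never hand consumers data that breaks a known rule.
    if any(r["amount_usd"] < 0 for r in output):
        raise ValueError("output check failed: negative amount_usd")

    # Audit log: record how many rows went out.
    audit["rows_out"] = len(output)
    return output, audit
```

In a longer pipeline, a mid-pipeline check would be the same kind of guard placed between two expensive steps, so a bad batch fails before the costly work rather than after it.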

DAIS 2024: Testing framework from the Dataflow model for Apache Spark Structured Streaming

Waitingforcode

With this blog post I'm starting a follow-up series for my Data+AI Summit 2024 talk. I missed this family of blog posts a lot, as the previous DAIS where I was a speaker was 4 years ago! As before, I'll be writing several blog posts that should help you remember the talk and also cover some of the topics left aside because of time constraints.

The software engineering industry in 2024: what changed, why, and what is next

The Pragmatic Engineer

The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.

Data News — Week 24.28

Christophe Blefari

Dear members, it's been a few weeks since I last caught up with you in a proper Data News with a collection of links. Here we are. This week, I attended EuroPython in Prague. Since I spent most of my time at the dltHub booth in the sponsors hall, I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato, which pairs well with dlt.

Demystifying DAPs: A Practical Guide to Digital Adoption Success

Speaker: Pulkit Agrawal

Digital Adoption Platforms (DAPs) are revolutionizing the way organizations interact with and optimize their software applications. As digital transformation continues to accelerate, DAPs have become essential tools for enhancing user engagement and software efficiency. This session is your guide into the robust world of DAPs, exploring their origins, evolution, and the current trends shaping their development.

9 Habits Of Effective Data Managers – Running A Data Team

Seattle Data Guy

Running a successful data team is hard. Data teams are expected to juggle a combination of ad-hoc requests, big bet projects, migrations, etc., all while keeping up with the latest changes in technology. In the past few years I have gotten to work with dozens of teams and see how various directors and managers deal…

Landing a Data Engineer Role: Free Courses and Certifications

KDnuggets

Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.

More Trending

Data+AI Summit 2024 - Retrospective - Streaming

Waitingforcode

Welcome to the first Data+AI Summit 2024 retrospective blog post. I'm opening the series with the topic close to my heart at the moment, stream processing!

The Abstractions Are Making You Dumb (rise of the Shallow Expert)

Confessions of a Data Guy

When I was young and full of myself, writing Perl and PHP, while your ma was still reading you a bedtime story and giving you a stuffy to fall asleep with, I had to program uphill, both ways, in the rain and snow. Not like you milk toast Data Engineers clickty clicking around Databricks and […]

AI Lab: The secrets to keeping machine learning engineers moving fast

Engineering at Meta

The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.

Welcoming Prodvana to Databricks: Investing in Next-Gen Infrastructure

databricks

The Prodvana team joins Databricks to support new innovations in the Data Intelligence Platform infrastructure. Learn more about the vision and what's ahead.

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

Top 8 GenAI Courses for AWS to Take Now

KDnuggets

This article is for anyone looking to maximize their use of Amazon Web Services (AWS) generative AI (GenAI) services. Here are eight courses that range from beginner to expert level.

Generative AI in Urban Planning

ArcGIS

Planning a city block, a neighborhood, or maybe a whole new city is a multifaceted task with no universal recipe to use. How can Generative AI help Urban Planners?

Data Engineering Weekly #180

Data Engineering Weekly

Canva: How Canva collects 25 billion events per day
Canva writes about its event collection infrastructure capabilities, handling 25 billion events per day (800 billion events per month) with 99.999% uptime.
At our team’s inception, a key decision we made, one we still believe to be a big part of our success, was that every collected event must have a machine-readable, well-documented schema.

Snowflake’s Summer of Sports and AI

Snowflake

All eyes are on sports this summer, with blockbuster events happening in everything from soccer and cycling to cricket and car racing. Snowflake is excited to join the action with a virtual “relay race,” where Snowflake sports and data experts, customers and partners will demonstrate how the sports industry can win big with data and AI. Industry leaders already know that sports runs on data analytics: from individual athlete performance and team statistics, to marketing and fan engagement, to ti

Entity Resolution: Your Guide to Deciding Whether to Build It or Buy It

Adding high-quality entity resolution capabilities to enterprise applications, services, data fabrics or data pipelines can be daunting and expensive. Organizations often invest millions of dollars and years of effort to achieve subpar results. This guide will walk you through the requirements and challenges of implementing entity resolution. By the end, you'll understand what to look for, the most common mistakes and pitfalls to avoid, and your options.

Delivering Reliable Data and AI Pipelines with Monte Carlo and MotherDuck

Monte Carlo

The DuckDB hype is real — this in-process analytical database has skyrocketed in popularity over the last few years. Known for its columnar storage, vectorized query execution, and scale-up approach to SQL analytics, DuckDB fans proclaim it’s faster, more efficient, and more affordable than other databases. DuckDB is also becoming a must-have layer in many AI stacks.

Announcing Mosaic AI Agent Framework and Agent Evaluation

databricks

Databricks announced the public preview of Mosaic AI Agent Framework & Agent Evaluation alongside our Generative AI Cookbook at the Data + AI.

The Role of AI in Digital Marketing

KDnuggets

Artificial intelligence (AI) has revolutionized numerous sectors, including digital marketing. This field leverages online platforms to promote products and services.

Understand flooding using ArcGIS Pro with new flood simulation workflows, Arc Hydro and the Flood Impact Analysis solution

ArcGIS

Learn more about the collection of data models, workflows, and planning tools tailored for flooding available in ArcGIS Pro 3.3.

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance when it’s a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr

Data Engineering Weekly #179

Data Engineering Weekly

Experience Enterprise-Grade Apache Airflow
Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more.
Notion: Building and scaling Notion’s data lake
Notion writes about scaling the data lake by bringing critical data ingestion operations in-house.

Snowflake Expands Leading AI Data Cloud into Global Regulated and Sovereign Markets

Snowflake

Regulated and sovereign markets across the world have stringent requirements stipulating certain important data be kept within geographical borders or even for certain workloads to have dedicated environments, separate from those of other customers. In these markets, organizations need a secure and well-governed data foundation with effective controls to help comply with regulatory requirements.

Modernizing Logging at Uber with CLP (Part II)

Uber Engineering

Modernizing the fundamentals of log management at Uber: How we used CLP to build a new logging infra that lets users view and analyze their logs seamlessly, at scale!

Databricks Named a Leader in Stream Processing and Cloud Data Pipelines

databricks

We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Convert Bytes to String in Python: A Tutorial for Beginners

KDnuggets

Strings are common built-in data types in Python. But sometimes, you may need to work with bytes instead. Let’s learn how to convert bytes to string in Python.
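A minimal sketch of the conversion this teaser describes, using Python's built-in `bytes.decode`. The helper name `bytes_to_text` and the choice of `errors="replace"` are illustrative assumptions, not taken from the tutorial.

```python
def bytes_to_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes into a str, substituting U+FFFD for undecodable bytes."""
    # errors="replace" avoids raising UnicodeDecodeError on malformed input;
    # use the default errors="strict" if bad bytes should fail loudly instead.
    return data.decode(encoding, errors="replace")


raw = "café".encode("utf-8")   # str -> bytes
text = bytes_to_text(raw)      # bytes -> str
same = str(raw, "utf-8")       # equivalent built-in spelling (strict mode)
```

The encoding has to match the one used to produce the bytes. UTF-8 is a common default, but it is an assumption here.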

How to best create large 3D web layers in ArcGIS

ArcGIS

You can host scene layers and 3D tiles layers in ArcGIS Online or reference datasets in cloud storage in ArcGIS Enterprise.

How Google Security Operations Integration Protects Your IBM i and Z Data

Precisely

Key Takeaways: IBM mainframes present unique security challenges that make comprehensive visibility a must-have for modern IT security strategies. A siloed approach to security solutions doesn’t work anymore; strategic business-driven security is essential. Precisely Ironstream facilitates seamless real-time data integration to Google Security Operations, for faster and more effective threat management.

16 Ways Insurance Companies Can Use Data and AI

Snowflake

How insurance leaders can use the power of data and AI to transform the industry, from claims analytics to risk selection and beyond
There is a growing recognition that insurers can introduce data, analytics and AI into virtually all of the important insurance functions and workflows, including product development, pricing and risk selection, underwriting, claims management, contact center optimization, distribution management, reinsurance, and understanding and shaping customer journeys.

Deliver Mission Critical Insights in Real Time with Data & Analytics

In the fast-moving manufacturing sector, delivering mission-critical data insights to empower your end users or customers can be a challenge. Traditional BI tools can be cumbersome and difficult to integrate - but it doesn't have to be this way. Logi Symphony offers a powerful and user-friendly solution, allowing you to seamlessly embed self-service analytics, generative AI, data visualization, and pixel-perfect reporting directly into your applications.

Introduction to Kafka Tiered Storage at Uber

Uber Engineering

Kafka Tiered Storage, developed in collaboration with the Apache Kafka community, introduces the separation of storage and processing in brokers, significantly improving the scalability, reliability, and efficiency of Kafka clusters.

Harnessing Enterprise AI: Innovations & Wins at Databricks

databricks

Discover how Databricks unlocks the transformative power of enterprise AI, from fraud detection to financial forecasting, and learn to harness AI's potential in your business.

5 Free Online Courses to Learn Data Science Fundamentals

KDnuggets

Learn SQL, Python, statistics, mathematics, and data analysis—everything you need to learn before you start the journey of becoming a professional data scientist.

Introducing Cloudera Observability Premium

Cloudera

There’s nothing worse than wasting money on unnecessary costs. In on-premises data estates, these costs appear as wasted person-hours waiting for inefficient analytics to complete, or troubleshooting jobs that have failed to execute as expected, or at all. They manifest as idle hardware waiting for urgent workloads to come in, ensuring sufficient spare capacity to run them amidst noisy neighbors and resource-hungry, lower-priority workloads.

Using Data & Analytics for Improving Healthcare Innovation and Outcomes

In the rapidly evolving healthcare industry, delivering data insights to end users or customers can be a significant challenge for product managers, product owners, and application team developers. The complexity of healthcare data, the need for real-time analytics, and the demand for user-friendly interfaces can often seem overwhelming. But with Logi Symphony, these challenges become opportunities.