Sat. Apr 26, 2025 - Fri. May 02, 2025

Chief Data Officers: You need to be a Marketing Pro, Too

Precisely

Throughout my career I’ve traveled down two paths, and I’m passionate about both. My first path centered on data strategy and management, teaching me that trusted data delivers great business outcomes. As a data management practitioner, I built and scaled data quality, master data management, and data governance solutions for a variety of organizations.

5 Open-Source AI Tools That Are Worth Your Time

KDnuggets

Learn five powerful open-source AI tools to boost your projects, save time, and stay ahead in AI innovation.

The Best Data Dictionary Tools in 2025

Monte Carlo

Different teams love using the same data in totally different ways. Eventually, it gets to the point where everyone has their own secret nickname for the same customer field, like Sales calling it cust_id while Marketing goes with user_ref. And yeah… that's kind of a problem. That's where data dictionary tools come in. A data dictionary tool helps define and organize your data so everyone's speaking the same language.
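
To make the naming problem concrete, here is a minimal sketch of the idea behind a data dictionary: one canonical name per field, with team-specific aliases resolved to it. The canonical name `customer_id`, the `DATA_DICTIONARY` structure, and both helper functions are hypothetical illustrations, not the API of any tool the article covers.

```python
# Hypothetical data dictionary: one canonical field name, its description,
# and the aliases different teams use for it (cust_id and user_ref come
# from the blurb above; everything else is assumed for illustration).
DATA_DICTIONARY = {
    "customer_id": {
        "description": "Unique identifier for a customer",
        "aliases": {"cust_id", "user_ref"},
    },
}


def canonical_name(column):
    """Return the canonical name for a column, or None if it is unknown."""
    for name, entry in DATA_DICTIONARY.items():
        if column == name or column in entry["aliases"]:
            return name
    return None


def standardize(record):
    """Rename known aliases to canonical names; leave other keys untouched."""
    return {canonical_name(key) or key: value for key, value in record.items()}
```

With this in place, `standardize({"user_ref": 7, "region": "EU"})` and `standardize({"cust_id": 7, "region": "EU"})` yield the same record, which is the "same language" effect the blurb describes.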

Snowflake Data Quality Framework: Validate, Monitor, and Trust Your Data

Cloudyard

In today’s cloud-first landscape, the integrity of data pipelines is crucial for operational success, regulatory compliance, and business decision-making. This blog, “Snowflake Data Quality Framework: Validate, Monitor, and Trust Your Data,” will walk you through a Snowflake-native, dynamic, and extensible Data Quality (DQ) Framework capable of automatically validating data pipelines, logging results, and monitoring anomalies in near real-time.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

10 Essential Data Cleaning Techniques Explained in 12 Minutes

KDnuggets

Clean your data like a pro with these 10 essential techniques packed into a 12-minute crash course.

How Meta understands data at scale

Engineering at Meta

Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale. To address these challenges, we made substantial investments in advanced data understanding technologies, as part of our Privacy Aware Infrastructure (PAI).

More Trending

Network dynamics in the age of AI

databricks

In our highly (inter)connected world, with the growing impact of AI on almost every facet of business, organizations must redefine, cement, and extend not only

How To Migrate From SQL Server To Snowflake

Seattle Data Guy

Over the past three years our teams have noticed a pattern. Many companies looking to migrate to the cloud go from SQL Server to Snowflake. There are many reasons this makes sense. One of the reasons and common benefits was that teams found it far easier to manage than SQL Server and in almost every…

Building Private Processing for AI tools on WhatsApp

Engineering at Meta

We are inspired by the possibilities of AI to help people be more creative, productive, and stay closely connected on WhatsApp, so we set out to build a new technology that allows our users around the world to use AI in a privacy-preserving way. We're sharing an early look into Private Processing, an optional capability that enables users to initiate a request to a confidential and secure environment and use AI for processing messages where no one, including Meta and WhatsApp, can access them.

Snowflake Ransomware Guardrails

Snowflake

Ransomware is a type of malicious software that encrypts a victim's data or locks their device, demanding a ransom to restore access or to not expose the data. It poses significant risks to companies, including financial losses from ransom payments and data restoration, operational disruptions, legal consequences and reputational damage. Additionally, sensitive data may be stolen and leaked, leading to further harm.

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Accelerating AI Ambitions in the Nuclear Industry

databricks

Nuclear energy ranks among the world's most regulated industries.

AI Evaluation in dbt

dbt Developer Hub

The AI revolution is here, but are we ready? Across the world, the excitement around AI is undeniable. Discussions on large language models, agentic workflows, and how AI is set to transform every industry abound, yet real-world use cases of AI in production remain few and far between. A common issue blocking people from moving AI use cases to production is the inability to evaluate the validity of AI responses in a systematic and well-governed way.

Data Link for Dun & Bradstreet is a Game-Changer: Here’s Why

Precisely

Earlier this year, Precisely announced Data Link: an ecosystem of pre-linked datasets from leading data providers. And now, I get to say that Dun & Bradstreet, a leading global provider of business decisioning data and analytics, has joined this groundbreaking program. I've previously shared how Precisely's Data Link program will streamline the biggest challenges that businesses face today when it comes to onboarding and reconciling third-party datasets across multiple providers, think com

Data Engineering Weekly #218

Data Engineering Weekly

Try Apache Airflow® 3 on Astro: Airflow 3 is here and has never been easier to use or more secure. Spin up a new 3.0 deployment on Astro to test DAG versioning, backfills, event-driven scheduling, and more. Chip Huyen: Exploring three strategies - functional correctness, AI-as-a-judge, and comparative evaluation. As AI development becomes mainstream, so does the need to adopt all the best practices in software engineering.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Data quality on Databricks - DQX

Waitingforcode

In the last blog post of the data quality on Databricks series we're going to discover a Databricks Labs product, the DQX library.

Databricks Invests in LlamaIndex to Advance Knowledge Agents over Enterprise Data

databricks

While companies today increasingly recognize the potential of custom AI agents, many still struggle to build and scale these applications.

Scaling Data Pipelines for a Growth-Stage Fintech with Incremental Models

dbt Developer Hub

Building scalable data pipelines in a fast-growing fintech can feel like fixing a bike while riding it. You must keep insights flowing even as data volumes explode. At Kuda (a Nigerian neo-bank), we faced this problem as our user base surged. Traditional batch ETL (rebuilding entire tables each run) started to buckle; pipelines took hours, and costs ballooned.
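
The contrast between a full rebuild and an incremental run can be sketched in a few lines. This is a generic illustration in plain Python, not dbt code and not the pipeline described in the article: the row shape (`id`, `updated_at`, `balance`) and both function names are assumptions made for the example.

```python
def full_rebuild(source_rows):
    """Rebuild the whole target table from scratch: every run touches every row.

    If an id appears more than once, the last row in source order wins.
    """
    return {row["id"]: row for row in source_rows}


def incremental_update(target, source_rows, last_run):
    """Apply only rows changed since the previous run (hypothetical schema).

    Rows are upserted by id, so the work done is proportional to what
    changed rather than to the size of the table. Returns the number of
    rows processed.
    """
    changed = [row for row in source_rows if row["updated_at"] > last_run]
    for row in changed:
        target[row["id"]] = row  # insert new ids, overwrite existing ones
    return len(changed)
```

Both approaches end with the same table contents; the incremental version simply skips rows it has already processed, which is why it tends to scale better as volumes grow.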

Infrastructure as Code (IaC)

WeCloudData

Infrastructure as Code (IaC) offers an efficient, reproducible, and error-resistant approach to managing infrastructure. IaC has become a vital strategy for modern IT teams that seek scalability and agility. This blog explores Infrastructure as Code (IaC), its use cases, benefits, tools, and how AWS and Azure are enhancing IaC practices. Let's start learning with WeCloudData!

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Breaking Out of Beginner: Python Patterns for Intermediate Data Scientists

KDnuggets

Learn to leverage Python patterns like a professional.

SQL Gets Easier: Announcing New Pipe Syntax

databricks

SQL has been the lingua franca for structured data analysis for multiple decades, and we have done a lot of work in the last few

What are Vision Language Models and how do they work?

Edureka

Vision Language Models (VLMs) represent a substantial development in machine learning by merging computer vision with natural language processing (NLP) capabilities. By combining them, VLMs enable robots to do activities that require both visual and textual inputs. These models have been useful in a variety of applications, including picture captioning, visual question answering (VQA), and cross-modal search engines.

Cloud Storage

WeCloudData

Our digital lives would be much different without cloud storage, which makes it easy to share, access, and protect data across platforms and devices. The cloud market has huge potential and is continuously evolving with the advancement in technology and time. This blog highlights cloud storage mechanisms, cost models, trends, service providers, and the benefits […]

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

OpenRouter: A Unified Interface for LLMs

KDnuggets

Explore a marketplace for LLM APIs where you can effortlessly access and pay for top-tier AI models without the usual hassle.

Is Your Team in Denial of Data Quality? Here’s How to Tell

DataKitchen

In many organizations, data quality problems fester in the shadows: ignored, rationalized, or swept aside with confident-sounding statements that mask a deeper dysfunction. While the cost of poor data quality is well documented (wasted time, lost trust, and flawed decisions), teams often operate in collective denial.

What is the Inception Score (IS)?

Edureka

Imagine you’re generating synthetic fashion designs using a GAN, and you want to assess whether your AI is producing realistic and varied outfits. How do you measure that, especially without human judgment? This is where the Inception Score (IS) becomes incredibly valuable. Widely used in evaluating Generative Adversarial Networks (GANs), IS quantifies how realistic and diverse your AI-generated images are.
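
For readers who want the arithmetic, the standard definition is IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's predicted class distribution for one generated image and p(y) is the average of those distributions over all images. The sketch below computes it from class probabilities that are already available; in practice those would come from a pretrained Inception network, which is out of scope here. The function name and the toy inputs are illustrative assumptions, not code from the article.

```python
import math


def inception_score(probs):
    """Compute IS from per-image class probabilities.

    probs: list of lists, one probability vector p(y|x) per image,
    each summing to 1 (assumed to come from some pretrained classifier).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): the mean of p(y|x) over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Mean KL(p(y|x) || p(y)); terms with zero probability contribute 0.
    mean_kl = sum(
        sum(pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0)
        for p in probs
    ) / n
    return math.exp(mean_kl)
```

Confident, varied predictions such as `[[1.0, 0.0], [0.0, 1.0]]` give the maximum score for two classes (2), while uniformly uncertain predictions such as `[[0.5, 0.5], [0.5, 0.5]]` give the minimum (1), matching the "realistic and diverse" reading above.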

Announcing the General Availability of SAP Databricks on SAP Business Data Cloud

databricks

SAP Databricks in SAP Business Data Cloud is now generally available.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

FireDucks: An Accelerated Fully Compatible Pandas Library

KDnuggets

Swiftly process your data without hassle with Pandas-like APIs.

Understanding Wildfire Risk: Smarter Data for Better Coverage and Risk Management

Precisely

In my experience working with insurers, accurately assessing wildfire risk has long been a challenge, and today that challenge is more pressing than ever. Research shows that the risk of extreme wildfires has doubled in the past 20 years alone, which makes increasing the accuracy of risk assessments a top priority. Wildfires were once thought of as more of a seasonal threat confined to forests and undeveloped areas, but unfortunately, this perception no longer holds up.

Taking the plunge: The engineering journey of building a subsea cable

Engineering at Meta

Meta develops infrastructure all across the globe to transport information and content for the billions of people using our services around the world. At the core of this infrastructure are aggregation points like data centers and the digital cables that connect them. Subsea cables, the unseen digital highways of the internet, are critical for Meta to serve people wherever they are in the world.

Announcing the General Availability of SAP Databricks on AWS

databricks

SAP Databricks in SAP Business Data Cloud is now generally available on AWS.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m