Tue. Jun 10, 2025

7 Python Errors That Are Actually Features

KDnuggets

You never expected these Python errors to help your work, but they do!

Python 86

Using Joins and Group Bys the right way for data warehousing

Start Data Engineering

1. Introduction
2. Joins & Group bys are two of the most commonly used operations in data warehousing
2.1. Joins are used to create denormalized dimension tables & to enrich fact tables with dimensions for reporting
2.1.1. When to use joins
2.1.2. How to use joins
2.1.3. Things to watch out for when joining
2.2. Group bys are the cornerstone of reporting
2.

Data 130

Integrating DuckDB & Python: An Analytics Guide

KDnuggets

Learn how to run lightning-fast SQL queries on local files with ease.

Python 107

Apache Iceberg v3 Table Spec: Celebrating the OSS Community’s Shared Success

Snowflake

The Apache Iceberg™ project exemplifies the spirit of open source and shows what’s possible when a community comes together with a common goal: to drive a technology forward. With a mission to bring reliability, performance and openness to large-scale analytics, the Iceberg project continues to evolve and offer many benefits thanks to the diverse voices and efforts of its contributors.

What's New in Apache Airflow 3.0 – And How Will It Reshape Your Data Workflows?

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

From AI Chaos to Control: A Flexible Data Integrity Ecosystem

Precisely

If you’re leading any kind of AI initiative right now, you already know the opportunities are vast – but so is the complexity. Between widespread generative AI adoption, a wide variety of LLM options, and compelling visions of agentic AI-fueled automation, the pace of innovation is extraordinary. But the fact is this: we won’t get the most from our AI initiatives unless we have full control: control over the technologies we use, how we use them, where, and – most importantly – the data tha

Build Better Data Pipelines with SQL and Python in Snowflake

Snowflake

Data transformations are the engine room of modern data operations — powering innovations in AI, analytics and applications. As the core building blocks of any effective data strategy, these transformations are crucial for constructing robust and scalable data pipelines. Today, we're excited to announce the latest product advancements in Snowflake to build and orchestrate data pipelines.

More Trending

Snowflake Postgres: Built for Developers, Ready for the Enterprise

Snowflake

PostgreSQL has become the undisputed choice for developers worldwide, celebrated for its open source flexibility, vibrant ecosystem and growing AI capabilities like vector support. But as companies race to build the next generation of AI agents and scale their critical operational systems, a fundamental question emerges: Is your Postgres truly ready for the enterprise, or does it come with hidden compromises?

Data Observability vs. Monitoring: What’s the Difference, Really?

Monte Carlo

Data engineering is full of buzzwords—data mesh, reverse ETL, lakehouse, you name it. It’s easy to tune them out. So when someone drops “data observability,” it’s fair to ask: what’s data observability vs. monitoring? If you’ve ever wrestled with broken dashboards, missing data, or a pipeline that quietly failed overnight, you know how frustrating it is to figure out what went wrong.

Data 52

Beyond the Hype: Event-Driven Architecture – The only data integration approach you need? by Oliver Cronk

Scott Logic

In this episode, I dive into the world of Event-Driven Architecture (EDA) with Tom Fairbairn from Solace and Scott Logic’s Gordon Campbell. The discussion explores whether EDA has matured beyond the hype into a practical strategy for modern systems integration, or if it’s just another architectural buzzword. Together, we unpack the core principles of EDA, its role in taming point-to-point integration chaos, and how asynchronous processing can help smooth demand spikes.

Monte Carlo Expands Databricks Partnership with Support for AI/BI and Unity Catalog

Monte Carlo

Monte Carlo, the leader in data + AI observability, today announced extended support for the Databricks Data Intelligence Platform through new integrations with Databricks AI/BI and Unity Catalog Metrics. These enhancements, unveiled ahead of the Databricks Data + AI Summit 2025, represent a major milestone in enabling AI-ready data at scale for joint customers of Databricks and Monte Carlo.

BI 52

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

From Data to Decisions: Market Basket Analysis for Retailers Using Python

WeCloudData

In today’s data-driven world, understanding customer purchasing behavior plays a crucial role for businesses aiming to enhance sales and customer satisfaction. Market Basket Analysis is a powerful technique that helps in discovering associations between products purchased together, enabling retailers to make informed decisions on product placements, promotions, and recommendations.

Retail 52

Selling Your Side Project? 10 Marketplaces Data Scientists Need to Know

KDnuggets

That app collecting dust on your GitHub?

Project 79

Monte Carlo Named 2025 Databricks Data Governance Partner of the Year

Monte Carlo

Monte Carlo was officially named the 2025 Databricks Data Governance Partner of the Year. This award highlights Monte Carlo’s continued innovation as we strive to help enterprise teams ensure reliable data and AI systems through end-to-end data + AI observability. Read on to learn more! The award was presented at the annual Dat

Announcing the New Enterprise Tier for Databricks on Google Cloud

databricks

To help organizations meet growing demands around data security and compliance, we’re excited to introduce a new platform tier, the Enterprise tier for Databricks on

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Joe Reis on Staying Grounded in a Fast-Moving Data World

Striim

Joe Reis joins us to reflect on life after Fundamentals of Data Engineering, what makes data content worth consuming, and why good taste matters as much as technical skill. We talk about burnout in big tech, the myth of AI replacing everyone, and how Discord communities, DJ sets, and a sense of humor are helping shape the future of data. This one’s part industry pulse check, part real talk.

Data 52

Empower Your AI Efforts with Data Governance

Elder Research

Before implementing some of the most advanced technology in the world, take the time to get your data in order and understand the best ways to leverage AI for your goals.

What’s new in security and compliance at Data + AI Summit 2025

databricks

Over the past year, we’ve continued to expand our security and compliance offerings to meet the evolving needs of regulated industries, privately connect to external

Data 52

From Tweets To Trust: How Banks Are Winning the Social Media Game

Teradata

See how banks are leveraging social media and customer journey analytics to stay one step ahead in t

Banking 52

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Snowflake Summit 2025 recap: Launches, live demos, and real-time data

RudderStack

This article recaps RudderStack's participation at the Snowflake Summit 2025.

Data 40

Use the Feature Preserving Smoothing tool on elevation surfaces

ArcGIS

Jun 10, 2025. By Xuguang Wang. With the ever-increasing availability of digital elevation model (DEM) data, and its resolution becoming finer and finer, some data can become overly detailed for the phenomena we are trying to model.