Mon. Feb 03, 2025

How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?

Start Data Engineering

1. Introduction
2. Split your SQL into smaller parts
2.1. Start with a baseline validation to ensure that your changes do not change the output too much
2.2. Split your CTEs/subqueries into separate functions (or models if using dbt)
2.3. Unit test your functions for maintainability and evolution of logic
3. Conclusion
4. Required reading

1. Introduction

If you’ve been in the data space long enough, you have probably come across really long SQL scripts that someone wrote years ago.

SQL 147

5 AI Agent Frameworks Compared

KDnuggets

Check out this comparison of 5 AI frameworks to determine which you should choose.

Trending Sources

Why Pivot Tables Never Die

Simon Späti

While everyone’s talking about AI revolutionizing business, there’s a quiet renaissance happening with one of the most influential business tools created: the pivot table. In 2025, we’re witnessing something remarkable - modern data tools are bringing pivot tables back to the forefront. But why would cutting-edge platforms invest in a decades-old spreadsheet feature?

Coding 130

How to Fine-Tune DeepSeek-R1 for Your Custom Dataset (Step-by-Step)

KDnuggets

Fine-tune the DeepSeek model step by step, even if you're new to LLMs!

Datasets 125

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

How Precision Time Protocol handles leap seconds

Engineering at Meta

We've previously described why we think it's time to leave the leap second in the past. In today's rapidly evolving digital landscape, introducing new leap seconds to account for the long-term slowdown of the Earth's rotation is a risky practice that, frankly, does more harm than good. This is particularly true in the data center space, where new protocols like Precision Time Protocol (PTP) are allowing systems to be synchronized down to nanosecond precision.

Algorithm 103

More Trending

Building AI Application with Gemini 2.0

KDnuggets

Learn to create a document-based chatbot with memory, powered by one of the world's top-performing LLMs.

Building 107

Unlock Cost Savings with Freight Clusters–Now in General Availability

Confluent

Confluent Cloud Freight clusters are now Generally Available on AWS.

AWS 78

Search Query Understanding with LLMs: From Ideation to Production

Yelp Engineering

How we bring LLM intelligence to millions of daily searches at Yelp. From the moment a user enters a search query to when we present a list of results, understanding the user's intent is crucial for meeting their needs. Were they looking for a general category of business for that evening, a particular dish or service, or one specific business nearby?

IT 77

Integrating Address Data Management Solution with ArcGIS Roads and Highways

ArcGIS

Streamline data management, reduce duplication, enhance data quality, and provide a single source of truth for address and road data.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Top Ranked, Flexible MS in Applied Data Science

KDnuggets

Start Advancing Your Career this Year. Online Courses Start on March 3rd.

ArcGIS CityEngine: Procedural Urban Design for a Waterfront Destination in Jeddah

ArcGIS

Explore how Dar used ArcGIS CityEngine for procedural urban design to transform a brownfield into a vibrant waterfront destination in Jeddah.

Faster, Smarter Customer Experiences Begin Here

Precisely

Key Takeaways: A unified customer communication management (CCM) solution eliminates reliance on IT for communication updates, which empowers business users to create and deploy content quickly. Fast, personalized, and seamless customer communications help you build customer trust and drive loyalty. Save time and money with streamlined processes and automation that increase operational efficiency and improve the customer experience.

Driving Real-Time Innovation: Meet the Five New Build with Confluent Partners

Confluent

Jump-start a new use case with our new Build with Confluent partners and solutions.

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.

Confluent Partner Awards 2025

Confluent

Confluent Announces Global Partner Awards 2025

Advancements in Embedding-Based Retrieval at Pinterest Homefeed

Pinterest Engineering

Zhibo Fan | Machine Learning Engineer, Homefeed Candidate Generation; Bowen Deng | Machine Learning Engineer, Homefeed Candidate Generation; Hedi Xia | Machine Learning Engineer, Homefeed Candidate Generation; Yuke Yan | Machine Learning Engineer, Homefeed Candidate Generation; Hongtao Lin | Machine Learning Engineer, ATG Applied Science; Haoyu Chen | Machine Learning Engineer, ATG Applied Science; Dafang He | Machine Learning Engineer, Homefeed Relevance; Jay Adams | Principal Engineer, Pinner

Turning AI Ambitions into ROI with Snowflake Partners

Snowflake

Generative AI's potential to drive innovation, improve efficiency and create competitive advantages is enormous. However, the ability to fully realize the benefits of generative AI hinges on one crucial factor: data strategy. Data Strategies for AI Leaders, a report co-written by MIT and Snowflake, underscores how organizations must invest in robust data foundations to succeed in the AI era.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Will 2025 Be the Year Real-Time Analytics Finally Goes Mainstream?

Towards Data Science

A deep dive into how (and why) streaming is becoming more accessible in the data space.

What is Data Science

WeCloudData

Data is the new gold. Every day we use and generate more data than we often realize. Data is shaping our decisions, from scrolling through personalized social media feeds to checking weather forecasts before leaving home. Behind the scenes, Data Science powers banking apps to detect suspicious activity or when you get personalized recommendations on […]

Introducing the Pro Football Championship Market

Robinhood

Robinhood is launching event contracts for the upcoming Kansas City vs. Philadelphia championship game through Robinhood Derivatives, LLC. Today, Robinhood Derivatives, LLC (RHD) is launching event contracts for the Pro Football Championship, allowing eligible customers to place trades on the outcome of the upcoming showdown between Kansas City and Philadelphia.