Sat. Dec 14, 2024 - Fri. Dec 20, 2024

Top 10 Data & AI Trends for 2025

Towards Data Science

Agentic AI, small data, and the search for value in the age of the unstructured data stack. Image credit: Monte Carlo. According to industry experts, 2024 was destined to be a banner year for generative AI. Operational use cases were rising to the surface, technology was reducing barriers to entry, and general artificial intelligence was obviously right around the corner.

Queues in Apache Kafka®: Enhancing Message Processing and Scalability

Confluent

Queue support in Apache Kafka 4.0, enabled by share groups, lets you accommodate traditional queue-type workloads through cooperative consumption.

Kafka 136

Designing a Declarative Data Stack: From Theory to Practice

Simon Späti

What started as a straightforward implementation guide for a declarative data stack quickly evolved into something more fundamental. While attempting to build a system that could define an entire data stack through a single YAML file, I encountered architectural questions that challenged my initial assumptions: Should we generate production-ready code from templates or create a boilerplate repository with best-in-class tools?

Designing 130

How to reference a seed from a different dbt project?

Start Data Engineering

1. Introduction 2. Ways to reuse seed data across multiple dbt projects 2.1. Code setup 2.1.1. Prerequisites 2.1.2. Setup project environment 2.2. Turn the source repo into a dbt package 2.2.1. Define package version in dbt_project.yml 2.2.2. Store your package for other dbt projects to reference 2.3. Use project dependencies (dbt enterprise only) 2.4.

Project 130

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.

How to Use Docker for Local Development Environments

KDnuggets

Using Docker for local development brings stability, flexibility, and ease of environment management, no matter what operating system you're using. Learn how to use Docker on Windows, Linux, and macOS to simplify your development setup, from creating your first container to managing complex environments with Docker Compose.

Systems 110

Integrating Microservices with Confluent Cloud Using Micronaut® Framework

Confluent

Real-time data streaming and messaging are essential for building scalable, resilient, event-driven microservices. Explore integrating the Micronaut framework with Confluent Cloud.

Cloud 115

More Trending

Introducing Configurable Metaflow

Netflix Tech

David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*, Shashank Srikanth*, Chaoying Wang*, Regina Wang*, Darin Yu* (*: Model Development Team, Machine Learning Platform; ^: Content Demand Modeling Team). A month ago at QConSF, we showcased how Netflix utilizes Metaflow to power a diverse set of ML and AI use cases, managing thousands of unique Metaflow flows.

15 Useful Python One-Liners for String Manipulation

KDnuggets

In this article, we'll explore 15 Python one-liners that make string manipulation not just efficient but also fun.

Python 108

LLMs vs Advent of Code, AI is winning by Colin Eberhardt

Scott Logic

Advent of Code (AoC) is an annual, Christmas-themed coding competition that has been running for the past years and is something that I participate in at times. This year, while ~~subjecting myself to~~ learning Rust, I decided to see how OpenAI's latest model fared at the challenge. I quickly knocked together a script and, to my astonishment, found that o1-mini gave correct answers to all but one part of the first six days.

Coding 98

Benchmarking Domain Intelligence

databricks

Large language models are improving rapidly; to date, this improvement has largely been measured via academic benchmarks. These benchmarks, such as MMLU and.

107

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Power of Predictive Analytics in Healthcare: Using Generative AI and Confluent

Confluent

Learn how predictive analytics, powered by generative AI and Confluent, transforms healthcare by improving outcomes, reducing costs, and enabling real-time decisions.

How to Get Addicted to Machine Learning

KDnuggets

A simple guide for getting hooked to machine learning and building a successful career in the field.

Monte Carlo Recognized as the #1 Leader in Data Observability and Data Quality by G2

Monte Carlo

As we turn the corner into 2025, we're excited to announce that for the 7th quarter in a row, Monte Carlo has been named G2's #1 Data Observability Platform, as well as #1 in the Data Quality category. This recognition never gets old because G2 bases its rankings on feedback and insights from real customers who work in these tools every day to add value to their business.

Introducing Git Support for Queries in Databricks

databricks

We're excited to announce the Public Preview of Query Git integration as part of the new SQL Editor. Git support for queries.

SQL 104

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Key Takeaways from AWS re:Invent 2024

Cloudera

AWS re:Invent is one of my favorite trade shows. It is one of the biggest technology conferences of the year and is an opportunity to have hundreds of conversations with customers and prospects, listen to their priorities, challenges, and hopes, and give them a Cloudera tote bag or a pair of orange sunglasses. What follows is a collection of just a few things I learned and observed during my week in Las Vegas.

AWS 74

How to Write Clean Python Code as a Beginner

KDnuggets

Writing Python code that's clean and easy to understand isn't just for experts: learn how to avoid common pitfalls and write like a pro from the start!

Coding 99

Indexing code at scale with Glean

Engineering at Meta

We're sharing details about Glean, Meta's open source system for collecting, deriving and working with facts about source code. In this blog post we'll talk about why a system like Glean is important, explain the rationale for Glean's design, and run through some of the ways we're using Glean to supercharge our developer tooling at Meta. In August 2021 we open-sourced our code indexing system Glean.

Coding 70

Philadelphia Union: Streamlining MLS Roster Planning with GenAI

databricks

Staying competitive in Major League Soccer (MLS) demands building and maintaining a strong squad through strategic roster planning and smart, effective navigation of.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Cloudera’s Take: What’s in Store for Data and AI in 2025

Cloudera

In the last year, we've seen the explosion of AI in the enterprise, leaving organizations to consider the infrastructure and processes for AI to successfully and securely deploy across an organization. As we head into 2025, it's clear that next year will be just as exciting as past years. Here, Cloudera experts share their insights on what to expect in data and AI for the enterprise in 2025.

10 Essential Pandas Commands for Data Preprocessing

KDnuggets

Check out this beginner's guide to cleaning and preparing data efficiently with Python.

Python 98

Translating Java to Kotlin at Scale

Engineering at Meta

Meta has been on a years-long undertaking to translate our entire Android codebase from Java to Kotlin. Today, despite having one of the largest Android codebases in the world, we’re well past the halfway point and still going. We’re sharing some of the tradeoffs we’ve made to support automating our transition to Kotlin, seemingly simple transformations that are surprisingly tricky, and how we’re collaborating with other companies to capture hundreds more corner cases.

Java 83

Česká spořitelna: How GenAI is Transforming Call Centers in the Financial Services Industry

databricks

Czech savings bank Česká spořitelna, a division of Austria's Erste Group, recently collaborated with AI solution builder DataSentics to explore the.

Banking 80

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Telco Enterprise Data Platforms: Key Success Factors in Building for an AI Future

Cloudera

Since 5G networks began rolling out commercially in 2019, telecom carriers have faced a wide range of new challenges: managing high-velocity workloads, reducing infrastructure costs, and adopting AI and automation. Because data management is a key variable for overcoming these challenges, carriers are turning to hybrid cloud solutions, which provide the flexibility and scalability needed to adapt to the evolving landscape 5G enables.

An Introduction to Dask: The Python Data Scientist’s Power Tool

KDnuggets

Ever wondered how to handle large data without slowing down your computer? Let's learn about Dask, a tool that helps you work with large data quickly.

Python 97

How we think about Threads’ iOS performance

Engineering at Meta

How did the Threads iOS team maintain the app’s performance during its incredible growth? Here’s how Meta’s Threads team thinks about performance, including the key metrics we monitor to keep the app healthy. We’re also diving into some case studies that impact publish reliability and navigation latency. When Meta launched Threads in 2023, it became the fastest-growing app in history, gaining 100 million users in only five days.

Media 79

Databricks at NRF 2025: The future of retail runs on the Data Intelligence Platform

databricks

Book a meeting with Databricks at NRF 2025! As we approach January 2025, the retail industry is gearing up for another groundbreaking Retail's.

Retail 76

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Integrating Salesforce with Snowflake

Confluent

Learn how to integrate Salesforce with Snowflake to streamline CRM data, improve analytics, and explore common use cases.

Data 69

Multimodal RAG Implementation with Hugging Face

KDnuggets

Learn how to enhance RAG models by combining text and visual inputs using Hugging Face Transformers.

95

Redefining AIOps IT Workflows with Legacy System Visibility

Precisely

Key Takeaways: Centralized visibility of data is key. Modern IT environments require comprehensive data for successful AIOps, which includes incorporating data from legacy systems like IBM i and IBM Z into ITOps platforms. Predictive AIOps capabilities will revolutionize IT operations. The shift from reactive to proactive IT operations is driven by AI-powered analysis, automation and insights.

Systems 59

Elevating Global Health with Databricks and The Virtue Foundation

databricks

Databricks has joined forces with the Virtue Foundation through Databricks for Good, a grassroots initiative providing pro bono professional services to drive.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.