Sat.Nov 16, 2024 - Fri.Nov 22, 2024


8 Essential Data Pipeline Design Patterns You Should Know

Monte Carlo

Let’s set the scene: your company collects data, and you need to do something useful with it. Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That’s where data pipeline design patterns come in.


From IC to Data Leader: Key Strategies for Managing and Growing Data Teams

Seattle Data Guy

There are plenty of statistics about the speed at which we are creating data in today’s modern world. On the flip side of all that data creation is the need to manage it all, and that’s where data teams come in. But leading these data teams is challenging, and yet many new data…


Secrets of Spark to Snowflake Migration Success: Customer Stories

Snowflake

Today’s business landscape is increasingly competitive — and the right data platform can be the difference between teams that feel empowered and teams that feel impaired. I love talking with leaders across industries and organizations to hear about what’s top of mind for them as they evaluate various data platforms. In these conversations, there are a number of questions that I hear time and time again: Will my data platform be scalable and reliable enough?


Expert Insights for Your 2025 Data, Analytics, and AI Initiatives

Precisely

Key Takeaways: Data integrity is required for AI initiatives, better decision-making, and more – but data trust is on the decline. Data quality and data governance are the top data integrity challenges and priorities. A long-term approach to your data strategy is key to success as business environments and technologies continue to evolve. The rapid pace of technological change has made data-driven initiatives more crucial than ever within modern business strategies.


Going Beyond Chatbots: Connecting AI to Your Tools, Systems, & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

If AI agents are going to deliver ROI, they need to move beyond chat and actually do things. But, turning a model into a reliable, secure workflow agent isn’t as simple as plugging in an API. In this new webinar, Alex Salazar and Nate Barbettini will break down the emerging AI architecture that makes action possible, and how it differs from traditional integration approaches.


DuckDB … reading from s3 … with AWS Credentials and more.

Confessions of a Data Guy

In my never-ending quest to plumb the most boring depths of every single data tool on the market, I found myself annoyed when recently using DuckDB for a benchmark that was reading Parquet files from S3. What was not clear, or easy, was figuring out how DuckDB would LIKE to read default AWS […]
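For readers hitting the same wall, a minimal sketch of one way to point DuckDB at S3, using its secrets manager with the default AWS credential chain, looks like this (the secret name and bucket path are hypothetical, and this is not necessarily the configuration the post settles on):

```python
import duckdb

con = duckdb.connect()

# httpfs adds S3 support; the aws extension adds the CREDENTIAL_CHAIN provider,
# which resolves credentials the way the AWS SDK does (env vars,
# ~/.aws/credentials, instance profiles, ...).
for stmt in ("INSTALL httpfs", "LOAD httpfs", "INSTALL aws", "LOAD aws"):
    con.execute(stmt)

# Let DuckDB pick up default AWS credentials instead of hard-coding keys.
con.execute("""
    CREATE OR REPLACE SECRET s3_default (
        TYPE S3,
        PROVIDER CREDENTIAL_CHAIN,
        REGION 'us-east-1'
    )
""")

# Hypothetical bucket and prefix -- substitute your own.
rows = con.execute(
    "SELECT count(*) FROM read_parquet('s3://my-benchmark-bucket/data/*.parquet')"
).fetchone()
print(rows)
```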

AWS 113

Challenges You Will Face When Parsing PDFs With Python – How To Parse PDFs With Python

Seattle Data Guy

Scraping data from PDFs is a rite of passage if you work in data. Someone somewhere always needs help getting invoices parsed, contracts read through, or dozens of other use cases handled. Most of us will turn to Python and our trusty list of Python libraries and start plugging away. Of course, there are many challenges…
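As a starting point, a basic text-extraction loop with the pypdf library looks roughly like the sketch below; the file name is hypothetical, and pypdf is only one of several libraries the article may cover:

```python
from pypdf import PdfReader

reader = PdfReader("invoice.pdf")  # hypothetical file

for page_number, page in enumerate(reader.pages, start=1):
    # extract_text() returns an empty string (or very messy text) for scanned
    # or oddly encoded PDFs -- exactly the kind of challenge the article covers.
    text = page.extract_text() or ""
    print(f"--- page {page_number} ---")
    print(text)
```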

Python 130


Automation and Data Integrity: A Duo for Digital Transformation Success

Precisely

Key Takeaways: Harness automation and data integrity to unlock the full potential of your data, powering sustainable digital transformation and growth. Data and processes are deeply interconnected. Successful digital transformation requires you to optimize both so that they work together seamlessly. Simplify complex SAP® processes with automation solutions that drive efficiency, reduce costs, and empower your teams to act quickly.


How Skyscanner Enabled Data & AI Governance with Monte Carlo

Monte Carlo

For over 20 years, Skyscanner has been helping travelers plan and book trips with confidence, including airfare, hotels, and car rentals. As a digital native, the organization is no stranger to staggering volume. Over the years, Skyscanner has grown organically into a vast network of high-volume data producers and consumers: serving over 110 million monthly users, partnering with hundreds of travel providers, operating in 30+ languages and 180 countries, and fulfilling over 5,000


10 Python Libraries Every Data Analyst Should Know

KDnuggets

Interested in data analytics? Here's a list of Python libraries you cannot do without.

Python 135

Celebrating Innovation: Announcing the Finalists of the Databricks Generative AI Startup Challenge

databricks

We are thrilled to unveil the finalists for the Databricks Generative AI Startup Challenge, a competition designed to spotlight innovative early-stage startups.

Designing 133

Smart Tech + Human Expertise = How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri


Mirroring SQL Server Database to Microsoft Fabric

Striim

SQL2Fabric Mirroring is a new fully managed service offered by Striim to mirror on-premises SQL Server databases. It’s a collaborative service between Striim and Microsoft, based on Fabric Open Mirroring, that enables real-time data replication from on-premises SQL Server databases to Azure Fabric OneLake. This fully managed service leverages Striim Cloud’s integration with the Microsoft Fabric stack for seamless data mirroring to Fabric Data Warehouse and Lakehouse.

SQL 52

How to present and share your Notebook insights in AI/BI Dashboards

databricks

We’re excited to announce a new integration between Databricks Notebooks and AI/BI Dashboards, enabling you to effortlessly transform insights from your notebooks into.

BI 132

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to:
- Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to
- Write DAGs that adapt to your data at runtime and set up alerts and notifications
- Scale you
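For a taste of those building blocks, a minimal TaskFlow-style DAG in Airflow 2.4+ syntax might look like the sketch below; the task logic is hypothetical and the eBook's own examples will differ:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_pipeline():
    @task
    def extract():
        # stand-in for pulling rows from a source system
        return [1, 2, 3]

    @task
    def load(values):
        print(f"loaded {len(values)} rows")

    # the dependency is inferred from the data passed between tasks
    load(extract())

example_pipeline()
```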


5 Essential Resources for Learning R

KDnuggets

Learn R from top institutions like Harvard, Stanford, and Codecademy.


What do Snowflake, Databricks, Redshift, BigQuery actually do?

Start Data Engineering

1. Introduction
2. Analytical databases aggregate large amounts of data
3. Most platforms enable you to do the same thing but have different strengths
3.1. Understand how the platforms process data
3.1.1. A compute engine is a system that transforms data
3.1.2. Metadata catalog stores information about datasets
3.1.3. Data platform support for SQL, Dataframe, and Dataset APIs
3.1.4.
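To make the point about SQL and DataFrame APIs concrete, here is an illustrative sketch of the same aggregation expressed both ways, with DuckDB and pandas standing in for a warehouse engine; the table and columns are made up:

```python
import duckdb
import pandas as pd

orders = pd.DataFrame(
    {"customer": ["a", "a", "b"], "amount": [10.0, 20.0, 5.0]}
)

# SQL API: DuckDB can query the in-memory pandas DataFrame by name.
sql_result = duckdb.sql(
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
).df()

# DataFrame API: the same aggregation expressed programmatically.
df_result = orders.groupby("customer", as_index=False)["amount"].sum()

print(sql_result)
print(df_result)
```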

Metadata 130

Exploring the Semantic Layer Through the Lens of MVC

Simon Späti

MVC is an interesting concept from the late 70s that separates the View (presentation) from the Controller via the Model. It has been used in designing web applications and is still heavily used, for example, in Ruby on Rails or Laravel, a popular PHP framework. This design pattern got me thinking: Wouldn’t it be convenient to separate the presentation from the storage through a data modeling layer, similar to the model layer?
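Purely as an illustration of that idea, and not the post's actual design, a toy model layer that owns metric definitions and keeps presentation code away from storage details could look like this (every name below is invented):

```python
# Model / semantic layer: metric definitions live in one place.
METRICS = {
    "revenue": {
        "table": "fact_orders",
        "expression": "SUM(amount)",
        "dimensions": ["order_date", "country"],
    }
}

def compile_metric(name, group_by):
    """Controller-like step: turn a metric definition into a storage query."""
    metric = METRICS[name]
    if group_by not in metric["dimensions"]:
        raise ValueError(f"{group_by} is not a dimension of {name}")
    return (
        f"SELECT {group_by}, {metric['expression']} AS {name} "
        f"FROM {metric['table']} GROUP BY {group_by}"
    )

# View / presentation: a dashboard or notebook only asks for the metric.
print(compile_metric("revenue", "country"))
```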

Designing 130

Choosing Between Star Schema and Snowflake Schema: A Comprehensive Guide

Hevo

In today’s data-driven world, choosing the right schema to store data is just as important as collecting it. Schema design plays a crucial role in the performance, scalability, and usability of your data systems. Different data use cases call for different schema designs.
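As a rough sketch of the two layouts such a guide compares, the DDL below keeps category attributes inline in the product dimension for a star schema and normalizes them into their own table for a snowflake schema; it runs on DuckDB here, and all names are illustrative:

```python
import duckdb

con = duckdb.connect()

statements = [
    # Star schema: one wide, denormalized dimension joined straight to the fact.
    """CREATE TABLE dim_product_star (
           product_id INTEGER PRIMARY KEY,
           product_name VARCHAR,
           category_name VARCHAR,
           category_manager VARCHAR
       )""",
    """CREATE TABLE fact_sales (
           sale_id INTEGER,
           product_id INTEGER REFERENCES dim_product_star (product_id),
           amount DECIMAL(10, 2)
       )""",
    # Snowflake schema: category attributes move into their own table,
    # trading an extra join for less redundancy.
    """CREATE TABLE dim_category (
           category_id INTEGER PRIMARY KEY,
           category_name VARCHAR,
           category_manager VARCHAR
       )""",
    """CREATE TABLE dim_product_snowflake (
           product_id INTEGER PRIMARY KEY,
           product_name VARCHAR,
           category_id INTEGER REFERENCES dim_category (category_id)
       )""",
]

for ddl in statements:
    con.execute(ddl)
```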


Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
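For context on the two features named above, a minimal sketch that combines data-driven scheduling (Datasets) with dynamic task mapping, using Airflow 2.4+ syntax, might look like this; the dataset URI, partitions, and task bodies are hypothetical:

```python
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

raw_orders = Dataset("s3://example-bucket/raw/orders/")  # hypothetical URI

# Data-driven scheduling: this DAG runs whenever another DAG updates raw_orders.
@dag(schedule=[raw_orders], start_date=datetime(2024, 1, 1), catchup=False)
def process_orders():
    @task
    def list_partitions():
        return ["2024-11-16", "2024-11-17", "2024-11-18"]

    @task
    def process(partition):
        print(f"processing partition {partition}")

    # Dynamic task mapping: one mapped task instance per partition at runtime.
    process.expand(partition=list_partitions())

process_orders()
```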


7 Advanced SQL Techniques for Data Manipulation in Data Science

KDnuggets

Can SQL be used for advanced data manipulation in data science? It sure can with these seven techniques.
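One technique commonly counted among advanced SQL features, whether or not it is among the article's seven, is the window function; here is a tiny self-contained example run with DuckDB on made-up data:

```python
import duckdb

duckdb.sql("""
    WITH sales(region, rep, amount) AS (
        VALUES ('EU', 'ana', 120), ('EU', 'bo', 90), ('US', 'cy', 200)
    )
    SELECT
        region,
        rep,
        amount,
        -- rank reps within their own region without collapsing rows
        RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
""").show()
```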


Introducing Predictive Optimization for Statistics

databricks

We are excited to introduce the gated Public Preview of Predictive Optimization for statistics. Announced at the Data + AI Summit, Predictive Optimization.

Data 114

Connect with Confluent Q4 Update: New Program Entrants and SAP Datasphere Hydration

Confluent

Confluent’s CwC partner program introduces bidirectional data streaming for SAP Datasphere, powered by Apache Kafka and Apache Flink; CwC Q4 2024 new entrants.


BigQuery Partitioning vs Clustering: Make the Right Choice for Your Workloads

Hevo

In the modern field of data analytics, proper data management is the only way to maximize performance while minimizing costs. Google BigQuery, one of the leading cloud-based data warehouses, excels at managing huge datasets through partitioning and clustering.
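As a concrete reference point, a table that is both partitioned and clustered can be created with BigQuery DDL through the official Python client, roughly as sketched below; the project, dataset, table, and column names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

ddl = """
CREATE TABLE IF NOT EXISTS `my-project.analytics.events` (
  event_ts   TIMESTAMP,
  user_id    STRING,
  event_name STRING
)
PARTITION BY DATE(event_ts)      -- lets BigQuery prune whole partitions
CLUSTER BY user_id, event_name   -- co-locates data within each partition
"""

client.query(ddl).result()
```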


How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m


Run Local LLMs with Cortex

KDnuggets

Check out this local AI model manager similar to Ollama, but better.


Introducing an exclusively Databricks-hosted Assistant

databricks

We’re excited to announce that the Databricks Assistant, now fully hosted and managed within Databricks, is available in public preview! This version.


Your Data Quality Checks Are Worth Less (Than You Think)

Towards Data Science

How to deliver outsized value on your data quality program.


CDC and Data Streaming: Capture Database Changes in Real Time with Debezium PostgreSQL Connector

Confluent

CDC has evolved to become a key component of data streaming platforms, and is easily enabled by managed connectors such as the Debezium PostgreSQL CDC connector.
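For orientation, registering the open-source Debezium PostgreSQL connector against a self-managed Kafka Connect cluster looks roughly like the sketch below; Confluent Cloud's fully managed connector is configured through its UI and CLI instead, and every host, credential, and table name here is a placeholder:

```python
import json

import requests

connector = {
    "name": "inventory-postgres-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",  # logical decoding plugin built into Postgres 10+
        "database.hostname": "postgres.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",  # Debezium 2.x prefix for per-table change topics
        "table.include.list": "public.orders,public.customers",
    },
}

# Kafka Connect REST API: POST a new connector definition.
resp = requests.post(
    "http://connect.internal:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```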


Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.


Exploring Ethics and Morality Through Machine Intelligence

KDnuggets

This article examines the challenges of aligning machine behavior with human values, and the role of ethical frameworks in shaping responsible AI.


Databricks training invests in closing the data + AI skills gap across enterprises

databricks

The Data + AI Skills Gap: The “skills gap” has been a concern for CEOs and leaders for many years, and the gap.

Data 105

Collision Risk in Hash-Based Surrogate Keys

Towards Data Science

Various aspects and real-life analogies of the odds of having a hash collision when computing Surrogate Keys using MD5, SHA-1, and SHA-256.
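The odds in question follow the standard birthday-problem approximation, P(collision) ≈ 1 - exp(-n^2 / 2^(b+1)) for n keys and a b-bit hash; here is a quick sanity check in Python, with an arbitrarily chosen key count:

```python
import math

def collision_probability(n_keys, hash_bits):
    # Birthday bound: 1 - exp(-n^2 / 2^(b+1)); expm1 keeps precision
    # for the astronomically small probabilities involved.
    return -math.expm1(-(n_keys ** 2) / (2 ** (hash_bits + 1)))

# e.g. 10 billion surrogate keys hashed with MD5, SHA-1, and SHA-256
for bits, name in [(128, "MD5"), (160, "SHA-1"), (256, "SHA-256")]:
    p = collision_probability(10_000_000_000, bits)
    print(f"{name}: ~{p:.2e} chance of at least one collision")
```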


Seamlessly Connect IoT Data Streams: Integrating Confluent Cloud with AWS IoT Core

Confluent

Combine AWS IoT Core with Confluent Cloud to contextualize your IoT data using your other data sources. Learn more and get a full setup tutorial.

AWS 59

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.