Sat. Sep 14, 2024 - Fri. Sep 20, 2024

Paying down tech debt: further learnings

The Pragmatic Engineer

This is a follow-up to the article Paying down tech debt, written by industry veteran Lou Franco. Lou has been in the software business for over 30 years as an engineer, EM, and executive. He’s also worked at four startups and the companies that later acquired them, most recently at Atlassian as a Principal Engineer on the Trello iOS app. Later this year, he’s publishing a book on tech debt.

How to build a data project with step-by-step instructions

Start Data Engineering

1. Introduction
2. Setup
3. Parts of data engineering
  3.1. Requirements
    3.1.1. Understand input datasets available
    3.1.2. Define what the output dataset will look like
    3.1.3. Define SLAs so stakeholders know what to expect
    3.1.4. Define checks to ensure the output dataset is usable
  3.2. Identify what tool to use to process data
  3.3. Data flow architecture
  3.

Project 240
Insiders

Data Teams Survey 2020-2024 Analysis

Jesse Anderson

Survey Changes Over Time
Between 2020 and 2024 (see 2020, 2023, and 2024 for each year’s information), I’ve been conducting a data teams survey. I wanted to dedicate an entire post to examining the change in data teams over time.
Total Value Creation
The most important question I ask each year concerns data team value creation. I break the question into two parts: “How successful would the business say your projects are?

Partial Functions in Python: A Guide for Developers

KDnuggets

In Python, functions often require multiple arguments, and you may find yourself repeatedly passing the same values for certain parameters. This is where partial functions can help. Python’s built-in functools module allows you to create partial functions.
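As a minimal sketch of the idea, the example below uses `functools.partial` to fix one argument of a function so callers no longer have to repeat it. The `power` function and the names `square`, `cube`, and `int_from_binary` are illustrative assumptions, not taken from the article.

```python
from functools import partial


def power(base, exponent):
    """Raise base to the given exponent (hypothetical helper for illustration)."""
    return base ** exponent


# Fix the exponent once; the resulting callables only need the base.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(5))  # 25
print(cube(2))    # 8

# partial also works with built-ins: here the keyword argument base=2 is fixed.
int_from_binary = partial(int, base=2)
print(int_from_binary("1010"))  # 10
```

A partial object keeps the wrapped function and the frozen arguments available as `square.func` and `square.keywords`, which can help when debugging.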

Python 142

Going Beyond Chatbots: Connecting AI to Your Tools, Systems, & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

If AI agents are going to deliver ROI, they need to move beyond chat and actually do things. But, turning a model into a reliable, secure workflow agent isn’t as simple as plugging in an API. In this new webinar, Alex Salazar and Nate Barbettini will break down the emerging AI architecture that makes action possible, and how it differs from traditional integration approaches.

Unleash Your Innovation: Announcing the Databricks Generative AI Startup Challenge with Over $1 Million in Credits, Prizes, and Potential Venture Funding

databricks

The Databricks Generative AI Startup Challenge offers $1M+ in prizes for innovative startups building Generative AI use cases on Databricks. Apply by November 1, 2024!

Building 138

How To Modernize Your Data Strategy And Infrastructure For 2025

Seattle Data Guy

We are still in the early days of data and the value it can add to companies. You’ll read plenty of statistics about how much value data can drive and how far behind companies that aren’t using data are. And as a data consultant, I have helped companies find that value in their data.

More Trending

5 YouTube Channels to Master LLMs

KDnuggets

If you’re in the tech industry (or are attempting to transition into the field), LLMs are a must-learn. Companies have started integrating language models into their workflows to improve efficiencies and cut costs. Due to this, there have been a number of new AI job openings. New roles have begun to.

135

Fine-tuning Llama 3.1 with Long Sequences

databricks

Mosaic AI Model Training now supports fine-tuning up to 131K context length for Llama 3.1 models. More efficient training at long sequence lengths is made possible by several optimizations highlighted in this post.

138

Data Modeling in the Brave New Lakehouse World

Confessions of a Data Guy

It is a Brave New World out there these days. The new tools and features come out faster than your mom on Sunday morning getting you ready for church. The same goes for the context and advice being produced on a myriad of platforms, the ole’ Like and Subscribe, and all that bit. It does […]

Data 113

Run pandas on 1TB+ Enterprise Data Directly in Snowflake

Snowflake

As one of the most widely used libraries in the Python ecosystem, pandas helps developers analyze, load and transform data across data science, data engineering and machine learning. The flexibility and ease of use of the pandas API have driven rapid growth in popularity, with pandas being used by one in every five developers, according to the StackOverflow 2024 Developer Survey.

Python 109

Smart Tech + Human Expertise = How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

10 GitHub Repositories for Deep Learning Enthusiasts

KDnuggets

The 10 GitHub Repository Education Series has been a hit among readers, so here is another list to help you master the basics of deep learning. This collection will guide you through understanding popular deep learning frameworks and various model architectures. In short, you.

Introducing Databricks Assistant Quick Fix

databricks

Today, we're excited to introduce Databricks Assistant Quick Fix, a powerful new feature designed to automatically correct common, single-line errors such as.

Designing 122

Introducing Netflix’s Key-Value Data Abstraction Layer

Netflix Tech

Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, Vinay Chella
Introduction
At Netflix, our ability to deliver seamless, high-quality streaming experiences to millions of users hinges on robust, global backend infrastructure. Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra, a NoSQL database known for its high availability and scalability.

Bytes 104

Snowflake Acquires Night Shift Development, Inc. to Accelerate Growth in US Public Sector

Snowflake

Data is increasingly becoming critical for the public sector — from guiding decisions in higher education to enhancing citizen services and streamlining government operations. Government agencies are overwhelmed with data, whether it be structured, like incident logs, or unstructured, like satellite images. Harnessing the vast amount of data can become a burden for any organization, yet the insights have the potential to significantly improve quality of life and strengthen national security.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

VoiceChat with Your LLMs using AlwaysReddy

KDnuggets

Rapid development is happening around us, and one of the most interesting aspects of this evolution is artificial intelligence's ability to communicate through natural language with humans. Suppose you want to communicate with some LLM running on your computer without switching between applications or windows, just by using a voice hotkey. This is exactly what.

130

Announcing GA of AI Model Sharing

databricks

Special thanks to Daniel Benito (CTO, Bitext), Antonio Valderrabanos (CEO, Bitext), Chen Wang (Lead Solution Architect, AI21 Labs), Robbin Jang (Alliance Manager, AI21 Labs).

Inside Bento: Jupyter Notebooks at Meta

Engineering at Meta

This episode of the Meta Tech Podcast is all about Bento, Meta’s internal distribution of Jupyter Notebooks, an open-source web-based computing platform. Bento allows our engineers to mix code, text, and multimedia in a single document and serves a wide range of use cases at Meta from prototyping to complex machine learning workflows. Pascal Hartig (@passy) is joined by Steve, whose team has built several features on top of Jupyter, including scheduled notebooks, sharing with colleagues, and

Key Takeaways from Snowflake Industry Day 2024

Snowflake

Building on the momentum from Snowflake Summit, where Snowflake announced the rollout of dozens of new features, this year’s Industry Day showcased the numerous ways these capabilities can be put to use, particularly in an AI- and ML-driven world. In his keynote address, Snowflake CEO Sridhar Ramaswamy explained how the AI Data Cloud aligns with customers’ AI and data strategies, highlighting the platform’s unique position to achieve enterprise AI goals.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to Perform Data Aggregation Over Time Series Data with Pandas

KDnuggets

Let’s learn how to perform time series data aggregation in pandas.
Preparation
We need the pandas and NumPy packages installed, so we can install them using the following command:
    pip install pandas numpy
With the packages installed, let’s jump into the article.
Time Series.
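For orientation, here is a minimal sketch of one common way to aggregate a time series with pandas, using `Series.resample`. The teaser is cut off before the article's own method appears, so this is not necessarily the approach it takes. The six daily readings and the variable names are hypothetical, chosen only to make the binning easy to follow.

```python
import pandas as pd

# Hypothetical data: six daily readings starting on 2024-09-14.
index = pd.date_range("2024-09-14", periods=6, freq="D")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=index)

# Group the readings into consecutive 2-day bins and aggregate each bin.
two_day_sum = readings.resample("2D").sum()
two_day_mean = readings.resample("2D").mean()

print(two_day_sum)   # sums per bin: 3, 7, 11
print(two_day_mean)  # means per bin: 1.5, 3.5, 5.5
```

Other aggregations such as `max()`, `min()`, or `agg([...])` can be applied to the same resampler object in the same way.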

Data 130

Security best practices for the Databricks Data Intelligence Platform

databricks

At Databricks, we know that data is one of your most valuable assets. Our product and security teams work together to deliver an enterprise-grade Data Intelligence Platform that enables you to defend against security risks and meet your compliance obligations. In this blog, we'll explain how you can leverage our platform's security features to establish a robust defense-in-depth posture that protects your data and AI assets from risks.

Data 121

Evaluating Data Observability Tools: A Comprehensive Guide

Data Engineering Weekly

Click the Link Below to Get Your Free Data Observability Buyers Guide: [link] The Buyer Guide for Data Observability is out. Please feel free to make a copy or comment to add more criteria. I want to extend my gratitude to the Data Heroes Community for their valuable insights and discussions, which served as the foundation for this piece. The points and thoughts shared here are largely drawn from the community's collective knowledge and contributions.

Introducing Confluent’s OEM Program: Deliver Data Streaming Faster and Unlock Revenue Growth

Confluent

Bring data streaming to your product or service quickly and confidently with unified Apache Kafka® and Apache Flink®, backed by the original creators of Kafka.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

How to Import Data into BigQuery

KDnuggets

Data come from everywhere, and the number of origins, sources, and formats under which valuable data may appear underscores the need for database management tools capable of loading data from multiple sources. This tutorial illustrates how to load datasets from different formats and sources into Google BigQuery. All the prerequisites we need are having registered.

Datasets 128

Unifying Parameters Across Databricks

databricks

Today, we are excited to announce the support for named parameter markers in the SQL editor. This feature allows you to write parameterized.

SQL 119

9 Ways AI Can Uplevel Your Business Right Now

Snowflake

As the frenzied hype around generative AI cools off and as we get into the year of ideation, earlier adopters of AI are starting to see the results of initial experimentation. And these conversations are increasingly shifting to a more problem-oriented mentality. A lot of people were understandably swept up in the excitement of all that AI can do, only to find that some use cases were too risky or that those problems could be solved with traditional methods that were less costly.

Cloudera Evaluates Integrated Data and AI Exchange Business Line to Optimize Data-Driven Generative AI Use Cases

Cloudera

According to recent survey data from Cloudera, 88% of companies are already utilizing AI for the tasks of enhancing efficiency in IT processes, improving customer support with chatbots, and leveraging analytics for better decision-making. More and more enterprises are leveraging pre-trained models for various applications, from natural language processing to computer vision.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Deep Learning Approaches in Medical Image Segmentation

KDnuggets

Medical imaging has been revolutionized by the adoption of deep learning techniques. The use of this branch of machine learning has ushered in a new era of precision and efficiency in medical image segmentation, a central analytical process in modern healthcare diagnostics and treatment planning. By harnessing neural networks, deep learning algorithms are able.

Medical 127

Establish your Generative AI expertise with the latest Databricks certification

databricks

The value of Generative AI, the deepened investment Databricks has made in the space, and how customers have benefited from the certification.

Streamlining Financial Market Intelligence with Time-Series Innovations

Snowflake

Why now is the time for data leaders in financial services to address the challenges of tick data analysis, and how Snowflake can help
The financial services industry has been facing plenty of challenges lately. The rising cost of capital means leaders need to be smart about finding places to reduce total cost of ownership and scale technology. The number of market and technology regulations around data, infrastructure and reporting, like DORA and GDPR, can be overwhelming; complying to them m

Banking 96

Ensuring Even Ad Spend on the Zalando Homepage: How Our New Bidding Algorithm Maximizes Value for Advertisers and Shoppers

Zalando Engineering

Introduction
Zalando Marketing Services (ZMS) is Zalando's advertising platform. It helps brands create and manage campaigns on Zalando, increasing their visibility and improving performance at every stage of the marketing funnel, from awareness to purchase, within the Zalando marketplace. At ZMS, we're constantly innovating to optimize the advertising experience on the Zalando homepage.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.