Data Engineering Digest

2024

How does ChatGPT work? As explained by the ChatGPT team.

The Pragmatic Engineer

APRIL 21, 2024

See a longer version of this article here: Scaling ChatGPT: Five Real-World Engineering Challenges. Sometimes the best explanations of how a technology solution works come from the software engineers who built it. To explain how ChatGPT (and other large language models) operate, I turned to the ChatGPT engineering team. "How does ChatGPT work, under the hood?

Engineering

Engineering Software Engineer Software Engineering Programming

A Collection Of Free Data Science Courses From Harvard, Stanford, MIT, Cornell, and Berkeley

KDnuggets

MARCH 27, 2024

Learn everything about data science by exploring our curated collection of free courses from top universities, covering essential topics from math and programming to machine learning, and mastering the nine steps to become a job-ready data scientist.

Data Science

Data Science Machine Learning Programming Data

Webinars

Demystifying DAPs: A Practical Guide to Digital Adoption Success

The AI Superhero Approach to Product Management

Trending Sources

OpenAI Acquires Rockset

Rockset

JUNE 21, 2024

I’m excited to share that OpenAI has completed the acquisition of Rockset. We are thrilled to join the OpenAI team and bring our technology and expertise to building safe and beneficial AGI. From the start, our vision at Rockset was to fundamentally transform the way data-driven applications were built. We developed our search and analytics database, taking full advantage of the cloud, to eliminate the complexity inherent in the data infrastructure needed for these apps.

Database

Database Cloud Accessible Accessibility

Snowflake Arctic: The Best LLM for Enterprise AI — Efficiently Intelligent, Truly Open

Snowflake

APRIL 24, 2024

Building top-tier enterprise-grade intelligence using LLMs has traditionally been prohibitively expensive and resource-hungry, and often costs tens to hundreds of millions of dollars. As researchers, we have grappled with the constraints of efficiently training and inferencing LLMs for years. Members of the Snowflake AI Research team pioneered systems such as ZeRO and DeepSpeed, PagedAttention / vLLM, and LLM360, which significantly reduced the cost of LLM training and inference, and open sourc

Amazon Web Services

Amazon Web Services SQL AWS Architecture

The AI Superhero Approach to Product Management

Speaker: Conrado Morlan

In this engaging and witty talk, we’ll explore how artificial intelligence can transform the daily tasks of product managers into streamlined, efficient processes. Using the lens of a superhero narrative, we’ll uncover how AI can be the ultimate sidekick, aiding in decision-making, enhancing productivity, and boosting innovation. Attendees will leave with practical tools and actionable insights, motivated to embrace AI and leverage its potential in their work. 🦸 🏢 Key objectives:

Management

Introducing the Robinhood Crypto Trading API

Robinhood

MAY 30, 2024

Robinhood Crypto customers in the United States can now use our API to view crypto market data, manage portfolios and account information, and place crypto orders programmatically. Today, we are excited to announce the Robinhood Crypto trading API, ushering in a new era of convenience, efficiency, and strategy for our most seasoned crypto traders. Robinhood Crypto customers in the United States can use our new trading API to set up advanced and automated trading strategies that allow them to st

Insurance

Insurance Portfolio Algorithm Coding

WebSockets in Scala, Part 2: Integrating Redis and PostgreSQL

Rock the JVM

MAY 22, 2024

By Herbert Kateu. 1. Introduction: This article is a follow-up to the previously published WebSocket article. To recap, we created an in-memory chat application using WebSockets with the help of the Http4s library. The chat application had a variety of features implemented through commands typed directly in the chat window, such as the ability to create users, create chat rooms, and switch between chat rooms.

Scala

Scala PostgreSQL Database SQL

What’s New in ArcGIS Pro 3.3

ArcGIS

MAY 7, 2024

Discover the exciting new features of ArcGIS Pro 3.3. From water flow modeling to direct PDF support, this release has it all. Read our blog to learn more.

More Trending

How FactSet Implemented an Enterprise Generative AI Platform with Databricks and MLflow

databricks

JUNE 12, 2024

“FactSet’s mission is to empower clients to make data-driven decisions and supercharge their workflows and productivity. To deliver AI-driven solutions across our entire.

Machine Learning

Machine Learning Data

Data Engineering Weekly #181

Data Engineering Weekly

JULY 21, 2024

Editor’s Note: A New Series on Data Engineering Tools Evaluation. There are plenty of data tools and vendors in the industry. But how can we choose a tool for a specific need? The traditional approach of running a PoC on every shortlisted vendor tool is time-consuming and practically unviable for growth-driven companies. Data Engineering Weekly is launching a new series on software evaluation focused on data engineering to better guide data engineering leaders in evaluating data tools.

Data Engineering

Data Engineering Data Engineer Engineering Generalist

How Meta trains large language models at scale

Engineering at Meta

JUNE 12, 2024

As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs). Traditionally, our AI model training has involved training a massive number of models that required a comparatively smaller number of GPUs.

Algorithm

Algorithm Data Storage Technology Building

Introducing Confluent Cloud Freight Clusters

Confluent

MAY 1, 2024

Confluent Cloud Freight clusters are now available in Early Access. In this blog, learn how Freight clusters can save you up to 90% at GBps+ scale.

Cloud

Cloud Accessible Accessibility

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

Raw Data

The Pulse: Will US companies hire fewer engineers due to Section 174?

The Pragmatic Engineer

JANUARY 4, 2024

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

Engineering

Engineering Software Engineer Software Engineering Media

Where to Go Next in Your Data Career

KDnuggets

MAY 22, 2024

We are all looking for the right opportunities in our career. In the landscape of data-related careers, the roles can be grouped into classes, and future opportunities tend to follow natural migration paths between the class groups.

Data

Introducing DoorDash’s In-House Search Engine

DoorDash Engineering

FEBRUARY 27, 2024

We reviewed the architecture of our global search at DoorDash in early 2022 and concluded that our rapid growth meant within three years we wouldn’t be able to scale the system efficiently, particularly as global search shifted from store-only to a hybrid item-and-store search experience. Our analysis identified Elasticsearch as our architecture’s primary bottleneck.

Engineering

Engineering Systems Designing Architecture

A Breakthrough AI-Powered SQL Assistant

Snowflake

APRIL 11, 2024

Data is the lifeblood of modern businesses, but unlocking its true insights often requires complex SQL queries. These queries can be time-consuming to write and challenging to maintain. At Snowflake, we believe in making the power of data accessible to all. That’s why we prioritize simplicity, governance and quality in everything we build – including our AI-powered tools.

SQL

SQL AWS Data Analysis High Quality Data

Entity Resolution: Your Guide to Deciding Whether to Build It or Buy It

Adding high-quality entity resolution capabilities to enterprise applications, services, data fabrics or data pipelines can be daunting and expensive. Organizations often invest millions of dollars and years of effort to achieve subpar results. This guide will walk you through the requirements and challenges of implementing entity resolution. By the end, you'll understand what to look for, the most common mistakes and pitfalls to avoid, and your options.

Robinhood Acquires Pluto, AI Investment Research Platform

Robinhood

JULY 1, 2024

Robinhood Markets, Inc. is excited to announce the acquisition of Pluto Capital Inc., an artificial intelligence (AI) powered investment research platform that delivers highly-customized investment strategies based on customer needs and financial goals. With this strategic acquisition, investors can look forward to a new era of intelligent, data-driven investing at Robinhood.

Portfolio

Portfolio Finance Retail Algorithm

A look under GHC's hood: desugaring linear types

Tweag

JANUARY 17, 2024

I recently merged linear let- and where-bindings in GHC. Which means that we’ll have these in GHC 9.10, which is cause for celebration for me. Though they are much overdue, so maybe I should instead apologise to you. Anyway, I thought I’d take the opportunity to discuss some of GHC’s inner workings and how they explain some of the features of linear types in Haskell.

Algorithm

Algorithm AWS Designing Systems

Totally Eclipsed

ArcGIS

JANUARY 31, 2024

Exploring the value of critique as part of the process of creating a new map of the total solar eclipse that will cross the United States on April 8th.

Process

Process Designing

Mosaic AI: Build and deploy production-quality Compound AI Systems

databricks

JUNE 12, 2024

Over the last year, we have seen a surge of commercial and open-source foundation models showing strong reasoning abilities on general knowledge tasks.

Systems

Systems Building Data Science Data

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance in a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr

Data Collection

A Brief History of Modern Data Stack

Data Engineering Weekly

JULY 10, 2024

The origin: a legend. The origin of the modern data stack is a topic of intense debate, shrouded in uncertainty and mystery. Some attribute its incubation to Snowflake, Redshift, or Airflow, while others propose different theories. Rather than being the result of a single event, the term "modern data stack" emerged from a series of innovations and industry shifts, adding to the intrigue of its history.

Hadoop

Hadoop Data Warehouse Data Big Data

Making messaging interoperability with third parties safe for users in Europe

Engineering at Meta

MARCH 6, 2024

To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as far as possible.

Media

Media Architecture Metadata Data Storage

Introducing Tableflow

Confluent

MARCH 19, 2024

Seamlessly integrate Apache Kafka data into your lakehouse as Apache Iceberg tables, bridging the operational and analytical divide, with Tableflow. Read more in our blog post.

Kafka

Kafka Data

Generative AI vs. Predictive AI: Understanding the Differences

Edureka

JUNE 7, 2024

Is AI taking over the world? Umm, not yet, at least. However, according to a recently published report, almost 35% of global companies report using AI to optimize their business. In this article, we will take a closer look at two of the most talked about and widely used AI technologies of 2024: generative AI and predictive AI. Table of Contents: Generative AI vs Predictive AI – Comparison Table; Generative AI 101: A Revolutionary Cocktail of Technology and Art; How Does Generative AI

Deep Learning

Deep Learning Media Manufacturing Algorithm

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

Deploying Machine Learning Models: A Step-by-Step Tutorial

KDnuggets

JUNE 20, 2024

Model deployment is the process of integrating trained models into practical applications. This includes defining the necessary environment, specifying how input data is fed into the model and what output it produces, and ensuring the capacity to analyze new data and provide relevant predictions or categorizations.

Machine Learning

Machine Learning Process Data

Monte Carlo Releases Mastering Data Quality And Your ABCs, World’s First-Ever Children’s Book on Data Quality

Monte Carlo

APRIL 1, 2024

Good Night Moon. Where The Wild Things Are. The Cat in the Hat. And now, from the mind of Barr Moses, comes the historic next children’s literary classic: Mastering Data Quality And Your ABCs. A follow-up to 2022’s Data Quality Fundamentals: A Practical Guide to Building Reliable Data Pipelines, published by O’Reilly Media, Mastering Data Quality And Your ABCs educates the next generation of data and AI engineers about the importance of highly reliable data.

Data Pipeline

Data Pipeline Media Education Data

Snowflake Admins Can Now Enforce Mandatory MFA

Snowflake

JULY 9, 2024

Snowflake is committed to helping customers protect their accounts and data. That’s why we have been working on product capabilities that allow Snowflake admins to make multifactor authentication (MFA) mandatory and monitor compliance with this new policy. As part of that effort, today we’re announcing several key features: 1. A new authentication policy that requires MFA for all users in a Snowflake account 2.

Technology

Technology Accessible Accessibility Management

Robinhood to Acquire Bitstamp

Robinhood

JUNE 6, 2024

This acquisition will bring Bitstamp’s globally-scaled crypto exchange to Robinhood, with retail and institutional customers across the EU, UK, US and Asia. This strategic combination better positions Robinhood to expand outside of the US and will bring a trusted and reputable institutional business to Robinhood. The acquisition is expected to close in the first half of 2025, subject to customary closing conditions, including regulatory approvals.

Retail

Retail Systems Process Management

Demystifying DAPs: A Practical Guide to Digital Adoption Success

Speaker: Pulkit Agrawal

Digital Adoption Platforms (DAPs) are revolutionizing the way organizations interact with and optimize their software applications. As digital transformation continues to accelerate, DAPs have become essential tools for enhancing user engagement and software efficiency. This session is your guide into the robust world of DAPs, exploring their origins, evolution, and the current trends shaping their development.

Certification

Extending destination-passing style programming to arbitrary data types in Linear Haskell

Tweag

MARCH 6, 2024

Three years ago, a blog post introduced destination-passing style (DPS) programming in Haskell, focusing on array processing, for which the API was made safe thanks to Linear Haskell. Today, I’ll present a slightly different API to manipulate arbitrary data types in a DPS fashion, and show why it can be useful for some parts of your programs. The present blog post is mostly based on my recent paper Destination-passing style programming: a Haskell implementation, published at JFLA 2024.

Programming

Programming Data Programming Language Coding

ArcGIS Pro 3.3 Moves to .NET 8

ArcGIS

FEBRUARY 21, 2024

ArcGIS Pro 3.3 is planned to be available in May 2024. Install .NET 8 before attempting to install ArcGIS Pro 3.3 for the best user experience!

Announcing DBRX: A new standard for efficient open source LLMs

databricks

MARCH 27, 2024

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their.

Building

Building Data

A Notebook is all I want or Don't

Data Engineering Weekly

MAY 3, 2024

The tweet received strong reactions on LinkedIn and Twitter. To clarify, I described it as Notebook-style development, but it is not exactly a Notebook. There is a lot of context missing in that tweet, so I decided to write a blog about it. People have reservations about using tools like Jupyter Notebook for production pipelines, and for good reason.

Programming Language

Programming Language ETL Tools Data Pipeline Coding

Deliver Mission Critical Insights in Real Time with Data & Analytics

In the fast-moving manufacturing sector, delivering mission-critical data insights to empower your end users or customers can be a challenge. Traditional BI tools can be cumbersome and difficult to integrate - but it doesn't have to be this way. Logi Symphony offers a powerful and user-friendly solution, allowing you to seamlessly embed self-service analytics, generative AI, data visualization, and pixel-perfect reporting directly into your applications.

Data Analytics
