Top Data Engineering Digest Data Engineer Data Engineering Content for Week of Oct 14

Sat.Oct 14, 2023 - Fri.Oct 20, 2023

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

OCTOBER 15, 2023

Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams.

Process

Process Building SQL BI

Data News — Week 23.42

Christophe Blefari

OCTOBER 20, 2023

Writing about dbt like a sheep ( credits ) Hey, this week Coalesce—the dbt Labs annual conference—took place. During 3 days, people shared how they used dbt around the world. I'll, as usual, write a takeaway post after binge watching all keynotes, but this is for next week. Still dbt Labs announcements were mainly towards dbt Cloud with great features to drive adoption of the paid product.

Generalist

Generalist Entertainment NoSQL Datasets

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Building Your BI Strategy: How to Choose a Solution That Scales and Delivers

Improving the Accuracy of Generative AI Systems: A Structured Approach

Changing the Game with MES: Cut Costs, Drive Efficiency, & Achieve Sustainability Goals!

MORE WEBINARS

Trending Sources

How to use Airflow templates and macros

Marc Lamberti

OCTOBER 20, 2023

Templates and Macros in Apache Airflow allow passing data to your DAGs at runtime. Imagine that you want to execute an SQL request with the execution date of your DAG. How can you do that? How can you use the DAG ID when you send notifications to know which DAG to look at? Or what if you need to know when the next DAG run will be? Well, macros and templates answer these questions.

SQL

SQL Python Coding Metadata

Webinars

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Building Your BI Strategy: How to Choose a Solution That Scales and Delivers

Improving the Accuracy of Generative AI Systems: A Structured Approach

Changing the Game with MES: Cut Costs, Drive Efficiency, & Achieve Sustainability Goals!

MORE WEBINARS

Watermark and input data filtering in Apache Spark Structured Streaming

Waitingforcode

OCTOBER 17, 2023

I've already written about watermarks in a few places in the blog but despite that, I still find things to refresh. One of them is the watermark used to filter out the late data, which will be the topic of this blog post.

Data

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Data

7 Steps to Mastering Large Language Models (LLMs)

KDnuggets

OCTOBER 18, 2023

Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

Building

Building Process

Revolutionizing Real-Time Streaming Processing: 4 Trillion Events Daily at LinkedIn

LinkedIn Engineering

OCTOBER 19, 2023

Authors: Bingfeng Xia and Xinyu Liu Background At LinkedIn, Apache Beam plays a pivotal role in stream processing infrastructures that process over 4 trillion events daily through more than 3,000 pipelines across multiple production data centers. This robust framework empowers near real-time data processing for critical services and platforms, ranging from machine learning and notifications to anti-abuse AI modeling.

Process

Process Lambda Architecture Kafka Machine Learning

How Meta is creating custom silicon for AI

Engineering at Meta

OCTOBER 18, 2023

With the recent launches of MTIA v1 , Meta’s first-generation AI inference accelerator, and Llama 2 , the next generation of Meta’s publicly available large language model, it’s clear that Meta is focused on advancing AI for a more connected world. Fueling the success of these products are world-class infrastructure teams, including Meta’s custom AI silicon team, led by Olivia Wu, a leader in the silicon industry for 30 years.

Designing

Designing Deep Learning Media Architecture

More Trending

How Meta is creating custom silicon for AI

Engineering at Meta

OCTOBER 18, 2023

Designing

Designing Deep Learning Media Architecture

Prepare your data for the National Spatial Reference System modernization of 2022 in the U.S.

ArcGIS

OCTOBER 17, 2023

The new U.S. datums of 2022 will soon be released. This article covers what is coming and how you should prepare your data.

Systems

Systems Data Data Management Government

5 Free Books to Master Data Science

KDnuggets

OCTOBER 16, 2023

Want to break into data science? Check this list of free books for learning Python, statistics, linear algebra, machine learning and deep learning.

Data Science

Data Science Deep Learning Machine Learning Python

JSON Schemas to Nickel contracts

Tweag

OCTOBER 18, 2023

At Tweag we have been cooking up a JSON Schema to Nickel contract converter , that we’re excited to announce! Background Nickel is a configuration language being developed at Tweag. You can get some deep dives into its design from previous blog posts. I’ll summarize it here as JSON, plus functions, plus types and contracts. One of its main use-cases is generating JSON configurations for other programs (Terraform, GitHub actions, etc).

Coding

Coding Programming Designing Accessible

The benefits of modern data architecture

InData Labs

OCTOBER 17, 2023

Big data is central to the efficient running of all modern organizations, but to be of use, raw data must be suitably organized. The way that businesses organize data assets is commonly known as data architecture, with the benefits of modern data architecture enabling teams to respond to changing demands with improved agility when compared. Запись The benefits of modern data architecture впервые появилась InData Labs.

Data Architecture

Data Architecture Architecture Raw Data Big Data

Changing the Game with MES: Cut Costs, Drive Efficiency, & Achieve Sustainability Goals!

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

In an era where efficiency is king, are you leveraging the right tools to transform your manufacturing processes? A Manufacturing Execution System (MES) is critical for enhancing operational efficiency, reducing waste, and optimizing energy usage—key factors for improving your bottom line and lowering your carbon footprint. Join Nikhil Joshi, a manufacturing technology expert with 18+ years of hands-on experience, in this new webinar as he uncovers the secrets of MES and how to best utilize thes

Manufacturing

Analysis of the XLS-30 AMM Amendment

Ripple Engineering

OCTOBER 19, 2023

RippleX has enabled its validator to vote in support of the XLS-30 amendment, introducing innovative AMM capabilities to the XRPL. We, at RippleX, place great emphasis on the strength that collaborative effort and shared responsibility bring to the enhancement and security of the XRPL. Today, we earnestly request the community's consideration of the XLS-30 amendment —a proposal poised to offer numerous advantages by bolstering liquidity, offering yield opportunities for liquidity pro

Utilities

Utilities Algorithm Coding Engineering

7 Best Cloud Database Platforms

KDnuggets

OCTOBER 18, 2023

Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.

Database

Database Cloud IT Data Engineering

Simplifying Production MLOps with Lakehouse AI

databricks

OCTOBER 19, 2023

Machine learning (ML) is more than just developing models; it's about bringing them to life in real-world, production systems. But transitioning from prototype.

Machine Learning

Machine Learning Systems IT

Product-Led Growth: 6 Secrets for Success

Snowflake

OCTOBER 18, 2023

Product-led growth (PLG) is a business model that emerged in the last decade with the enormous success of vendors like Slack and Datadog. Unlike traditional sales-led models, PLG models cut out the middlemen (sales reps, for example) and let customers just download and use the product without third-party onboarding. The relative novelty of the pricing model and its demonstrably successful application in growing these companies attracted a lot of attention.

Designing

Designing Accessible Accessibility Systems

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Systems

Sounds Like a Better Plan: USA Transportation Noise, Revised and Updated

ArcGIS

OCTOBER 17, 2023

The Living Atlas of the World just updated the tiled, hosted image service featuring transportation noise, from the USDOT.

Transportation

Transportation Designing

ChatGPT vs. BARD

KDnuggets

OCTOBER 17, 2023

Large language models (LLMs) are transforming the way we process and produce information. But, before considering either one of these models as a one-stop-solution, one must consider their key differences.

Process

Tools for measuring Cloud Carbon Emissions by Darren Smith

Scott Logic

OCTOBER 18, 2023

Introduction In my previous blog post I discussed how migrating to the Cloud could help your organisation reach its Net Zero goals. I discussed how shifting your workloads away from on-premises data centres can reduce emissions by allowing you to leverage the expertise of cloud providers and their greater efficiency of scale. It should be noted this isn’t always clear cut - do consider how energy efficient your current hosting is and the embodied carbon of any hardware you’d be decommissioning.

Cloud

Cloud AWS BI Accessible

Connecting with Clouderans

Cloudera

OCTOBER 20, 2023

There are some who believe that growing in your professional career and a desire to travel the world don’t mix well. I am not one of those people – in fact, I’m proof that these two ambitions can blend together to create a beautiful life. I’m Kinga Kamaras. My title at Cloudera is a Strategic Customer Success Manager. It’s a role I enjoy and growing in my career is a big ambition of mine.

Process

Process Management IT

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Project

Automating Reality Mapping: Accelerate Your Drone Workflows with ArcGIS Reality for ArcGIS Pro

ArcGIS

OCTOBER 16, 2023

Streamline GIS workflows with ArcGIS Reality for ArcGIS Pro. Automate reality mapping, generate accurate geospatial products.

Semantic Layer: The Backbone of AI-powered Data Experiences

KDnuggets

OCTOBER 19, 2023

Looking to understand the semantic layer and how it can improve the AI-powered data experience? Read more to learn why a semantic layer can be the backbone of LLMs and reduce hallucinations.

Data

Data IT

Addressing the Challenges of Sample Ratio Mismatch in A/B Testing

DoorDash Engineering

OCTOBER 17, 2023

Experimentation isn’t just a cornerstone for innovation and sound decision-making; it’s often referred to as the gold standard for problem-solving, thanks in part to its roots in the scientific method. The term itself conjures a sense of rigor, validity, and trust. Yet as powerful as experimentation is, its integrity can be compromised by overlooked details and unforeseen challenges.

Education

Education Kafka Algorithm Data Warehouse

How the Lakehouse can optimize provider networks and improve member care

databricks

OCTOBER 20, 2023

Check out our Nearest Neighborhood Search Solution Accelerator to get started quickly. The Member Experience An insured member typically experiences their healthcare in.

Insurance

Insurance Healthcare

Building Your BI Strategy: How to Choose a Solution That Scales and Delivers

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

How Snowflake Helps Confront Data Challenges and Ensure Program Integrity in Healthcare and Human Services

Snowflake

OCTOBER 16, 2023

U.S. Health and Human Services agencies can solve data issues to break down data silos, improve disease surveillance and lower costs From February 2020 to the end of March 2023, Congress’s Families First Coronavirus Response Act (FFCRA) required the provision of continuous enrollment for people with Medicaid throughout the COVID-19 public health emergency (PHE), causing enrollment in Medicaid to grow by 23.2 million to nearly 95 million.

Healthcare

Healthcare Programming Hospitality Food

Gradient Descent: The Mountain Trekker’s Guide to Optimization with Mathematics

KDnuggets

OCTOBER 19, 2023

Gradient descent is an optimization technique used to minimise errors in machine learning models. By iteratively adjusting parameters in the steepest direction of decrease, it seeks the lowest error value.

Machine Learning

Machine Learning IT

Getting Started With Cloudera Open Data Lakehouse on Private Cloud

Cloudera

OCTOBER 16, 2023

Cloudera recently released a fully featured Open Data Lakehouse , powered by Apache Iceberg in the private cloud, in addition to what’s already been available for the Open Data Lakehouse in the public cloud since last year. This release signified Cloudera’s vision of Iceberg everywhere. Customers can deploy Open Data Lakehouse wherever the data resides — any public cloud, private cloud, or hybrid cloud, and port workloads seamlessly across deployments.

Cloud

Cloud Kafka SQL Data

LLM Training on Unity Catalog data with MosaicML Streaming Dataset

databricks

OCTOBER 17, 2023

Introduction Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to.

Datasets

Datasets Healthcare Data

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Government

Real-Time Inventory in Retail with Confluent Cloud

Confluent

OCTOBER 17, 2023

Use data streaming and stream processing (Flink, ksqlDB) to integrate data from store returns, purchases, exchanges, shipments, interstore transfers, etc., to produce a consistent, real-time view of inventory.

Retail

Retail Cloud Process Data

How To Fine-Tune ChatGPT 3.5 Turbo

KDnuggets

OCTOBER 16, 2023

This article has outlined how you can fine tune your GPT 3.5 Turbo models. You can do this by preparing your data, uploading your files, and then setting up a custom OpenAI session to handle the fine tuning.

Data

Accelerating Cost Reduction: AI Making an Impact on Financial Services

Cloudera

OCTOBER 18, 2023

In the ever-evolving landscape of the financial services Industry, change is a constant and transformation is a requirement — to stay at pace with new regulations, risk mitigation, and the technological developments that support transformation. And just as financial services experiences its cycles, this time of year I find myself returning to the topic of cost reduction.

Portfolio

Portfolio Banking Algorithm Machine Learning

Fastest way to get SAP HANA data into Databricks using SAP FedML

databricks

OCTOBER 20, 2023

SAP's recent announcement of a strategic partnership with Databricks has generated significant excitement among SAP customers. Databricks, the data and AI experts, presents.

Data

Launching LLM-Based Products: From Concept to Cash in 90 Days

Speaker: Christophe Louvion, Chief Product & Technology Officer of NRC Health and Tony Karrer, CTO at Aggregage

Christophe Louvion, Chief Product & Technology Officer of NRC Health, is here to take us through how he guided his company's recent experience of getting from concept to launch and sales of products within 90 days. In this exclusive webinar, Christophe will cover key aspects of his journey, including: LLM Development & Quick Wins 🤖 Understand how LLMs differ from traditional software, identifying opportunities for rapid development and deployment.

Designing

Sat.Oct 14, 2023 - Fri.Oct 20, 2023

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data News — Week 23.42

Webinars

Trending Sources

How to use Airflow templates and macros

Webinars

Watermark and input data filtering in Apache Spark Structured Streaming

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

7 Steps to Mastering Large Language Models (LLMs)

Revolutionizing Real-Time Streaming Processing: 4 Trillion Events Daily at LinkedIn

How Meta is creating custom silicon for AI

Sign up to get articles personalized to your interests!

More Trending

How Meta is creating custom silicon for AI

Prepare your data for the National Spatial Reference System modernization of 2022 in the U.S.

5 Free Books to Master Data Science

JSON Schemas to Nickel contracts

The benefits of modern data architecture

Changing the Game with MES: Cut Costs, Drive Efficiency, & Achieve Sustainability Goals!

Analysis of the XLS-30 AMM Amendment

7 Best Cloud Database Platforms

Simplifying Production MLOps with Lakehouse AI

Product-Led Growth: 6 Secrets for Success

Improving the Accuracy of Generative AI Systems: A Structured Approach

Sounds Like a Better Plan: USA Transportation Noise, Revised and Updated

ChatGPT vs. BARD

Tools for measuring Cloud Carbon Emissions by Darren Smith

Connecting with Clouderans

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Automating Reality Mapping: Accelerate Your Drone Workflows with ArcGIS Reality for ArcGIS Pro

Semantic Layer: The Backbone of AI-powered Data Experiences

Addressing the Challenges of Sample Ratio Mismatch in A/B Testing

How the Lakehouse can optimize provider networks and improve member care

Building Your BI Strategy: How to Choose a Solution That Scales and Delivers

How Snowflake Helps Confront Data Challenges and Ensure Program Integrity in Healthcare and Human Services

Gradient Descent: The Mountain Trekker’s Guide to Optimization with Mathematics

Getting Started With Cloudera Open Data Lakehouse on Private Cloud

LLM Training on Unity Catalog data with MosaicML Streaming Dataset

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Real-Time Inventory in Retail with Confluent Cloud

How To Fine-Tune ChatGPT 3.5 Turbo

Accelerating Cost Reduction: AI Making an Impact on Financial Services

Fastest way to get SAP HANA data into Databricks using SAP FedML

Launching LLM-Based Products: From Concept to Cash in 90 Days

Stay Connected