Data Engineering Digest: Top Data Engineering Content for Week of Jan 01

Sat.Jan 01, 2022 - Fri.Jan 07, 2022

Why Do Machine Learning Models Die In Silence?

KDnuggets

JANUARY 5, 2022

A critical problem for companies integrating machine learning into their business processes is not knowing why their models stop performing well after a while. The reason is called concept drift. Here's an informational guide to understanding the concept well.

Machine Learning

Machine Learning Process

Data Observability Out Of The Box With Metaplane

Data Engineering Podcast

JANUARY 7, 2022

Summary: Data observability is a set of technical and organizational capabilities related to understanding how your data is being processed and used so that you can proactively identify and fix errors in your workflows. In this episode, Metaplane founder Kevin Hu shares his working definition of the term and explains the work that he and his team are doing to cut down on the time to adoption for this new set of practices.

BI Data Warehouse Metadata SQL

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Trending Sources

The Link To Cloud: How to Build a Seamless and Secure Hybrid Data Bridge with Cluster Linking

Confluent

JANUARY 5, 2022

Chances are your business is migrating to the cloud. But if you operate business applications in an on-premises datacenter, you know firsthand that the journey to the cloud is fraught […].

Cloud

Cloud Building Data

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

Business analysts often find themselves in a no-win situation with constraints imposed from all sides. Their business unit colleagues ask an endless stream of urgent questions that require analytic insights. Business analysts must rapidly deliver value and simultaneously manage fragile and error-prone analytics production pipelines. Data tables from IT and other data sources require a large amount of repetitive, manual work to be used in analytics.

Business Analyst

Business Analyst Data Lake Consulting Data Analytics

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data

Why are More Developers Using Python for Their Machine Learning Projects?

KDnuggets

JANUARY 4, 2022

To support the creation of new and exciting ML and artificial intelligence (AI) applications, developers need a robust programming language. That's where the Python programming language comes in.

Machine Learning

Machine Learning Python Programming Language Project

A Reflection On The Data Ecosystem For The Year 2021

Data Engineering Podcast

JANUARY 1, 2022

Summary: This has been an active year for the data ecosystem, with a number of new product categories and substantial growth in existing areas. In an attempt to capture the zeitgeist, Maura Church, David Wallace, Benn Stancil, and Gleb Mezhanskiy join the show to reflect on the past year and share their thoughts on the year to come. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to

Data Warehouse

Data Warehouse Data Lake SQL Hadoop

Auto-Balance and Optimize Apache Kafka Clusters with Improved Observability and Elasticity in Confluent Platform 7.0

Confluent

JANUARY 6, 2022

While Self-Balancing Clusters (SBC) perform effectively in balancing Apache Kafka® clusters, one of the common themes we hear from our users is that they would love some visibility into the […].

Kafka

More Trending

Trend-Setting Products in Data and Information Management for 2022

DataKitchen

JANUARY 7, 2022

The post Trend-Setting Products in Data and Information Management for 2022 first appeared on DataKitchen.

Management

Management Data

Misconceptions About Semantic Segmentation Annotation

KDnuggets

JANUARY 6, 2022

Semantic segmentation is a computer vision problem that entails putting related elements of an image into the same class. Read on to discover more, including the difficulties associated with annotation.

TypeScript Types from Class Properties

Grouparoo

JANUARY 5, 2022

At Grouparoo, we use a lot of TypeScript. We are always striving to enhance our usage of strong TypeScript types to make better software, and to make it easier to develop Grouparoo. Strong types make it easy for team members to get quick validation about new code, and see hints and tips in their IDEs - a double win! Recently, I found myself repeating a lot of metadata when defining a new API endpoint as I was working to enable noImplicitAny within the @grouparoo/core project.

Utilities

Utilities Metadata Coding Building

Top Retail Predictions for 2022

Teradata

JANUARY 4, 2022

From supply chain to inflation, our top retail industry consultants weigh in on what the retail & CPG industry will experience in 2022 and beyond.

Retail

Retail Consulting

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Engineering

Data Science and AI Predictions for 2022

DataKitchen

JANUARY 3, 2022

The post Data Science and AI Predictions for 2022 first appeared on DataKitchen.

Data Science

Data Science Data

What is Transfer Learning?

KDnuggets

JANUARY 5, 2022

In transfer learning, the knowledge gained from a source task is used to improve and speed up learning on a new target task. Read on for a deeper dive on the subject.

Machine Learning

Check out my first course on LinkedIn Learning: Security in Fintech – Essential Training

Hepta Analytics

JANUARY 4, 2022

Today my first LinkedIn Learning course on securing fintech solutions went live! (Securing fintech solutions, from Security in Fintech Essential Training by Emmanuel Chebukati.) It was an exciting surprise to wake up to the notifications of the course’s release, and to see the initial reactions it elicited. This demonstrative course covers the essentials that fintech providers and professionals in the industry ought to implement to arrive at a baseline security posture.

Media

Media Designing Building Cloud Computing

Implementing GFANZ Requirements Needs Granular Data at Scale – Here’s How to Prepare

Teradata

JANUARY 6, 2022

Learn more about the pressures and some of the potential responses for banks in the rapidly evolving area of climate risk.

Banking

Banking Data

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building

The State of Data Engineering in 2022

RudderStack

JANUARY 5, 2022

In 2021, we wrote about trends we saw emerging in data engineering and made a few predictions. Here, we revisit those predictions and make a few for 2022.

Data Engineering

Data Engineering Data Engineer Engineering Data

Hands-on Reinforcement Learning Course Part 3: SARSA

KDnuggets

JANUARY 3, 2022

This is part 3 of my hands-on course on reinforcement learning, which takes you from zero to HERO. Today we will learn about SARSA, a powerful RL algorithm.

Algorithm

Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

Data Engineering Podcast

JANUARY 1, 2022

Summary: Communication and shared context are the hardest part of any data system. In recent years the focus has been on data catalogs as the means for documenting data assets, but those introduce a secondary system of record in order to find the necessary information. In this episode, Emily Riederer shares her work to create a controlled vocabulary for managing the semantic elements of the data managed by her team and encoding it in the schema definitions in her data warehouse.

Data Warehouse

Data Warehouse BI Data Workflow Coding

10 Python Data Visualization Libraries to Win Over Your Insights

ProjectPro

JANUARY 6, 2022

Can you believe that the human brain takes only 13 milliseconds to process an image? Humans crave stories, and visualizations allow us to create one from data. The majority of data that data scientists and machine learning engineers work with is in a structured or unstructured format that is challenging for humans to analyze and comprehend. Understanding data requires the use of data visualizations, and this is because visuals are processed 60,000 times faster than text inside the human brain.

Python

Python Datasets Programming Language Data Science

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Project

Refactoring RudderStack's High-performance JavaScript SDK

RudderStack

JANUARY 6, 2022

This post details our engineering team's decision-making process for optimizing our JavaScript SDK and highlights the results of their work.

Engineering

Engineering Process

How I Tripled My Income With Data Science in 18 Months

KDnuggets

JANUARY 3, 2022

Over a year ago, I lost my job due to the COVID-19 pandemic. During this time, I taught myself data science and tripled my income.

Data Science

Data Science Data

What is Data Engineering? Everything You Need to Know in 2022

phData: Data Engineering

JANUARY 3, 2022

By Nick Goble, January 4, 2022. It’s easy to overlook the amount of data that’s being generated every day, from your smartphone and your Zoom calls to your Wi-Fi-connected dishwasher. It is estimated that the world will have created and stored 200 Zettabytes of data by the year 2025.

Data Engineering

Data Engineering Data Engineer Engineering Data Governance

Open Source Reverse ETL For Everyone With Grouparoo

Data Engineering Podcast

JANUARY 7, 2022

Summary: Reverse ETL is a product category that evolved from the landscape of customer data platforms, with a number of companies offering their own implementation of it. While struggling with the work of automating data integration workflows with marketing, sales, and support tools, Brian Leonard accidentally discovered this need himself and turned it into the open source framework Grouparoo.

ETL System

ETL System Data Pipeline Data Warehouse Architecture

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Building

RudderStack and Braze Power Advanced Customer Engagement

RudderStack

JANUARY 7, 2022

Leveraging RudderStack with Braze, you can effortlessly sync data in and out of the customer engagement platform.

Data

Learn Deep Learning by Building 15 Neural Network Projects in 2022

KDnuggets

JANUARY 4, 2022

Here are 15 neural network projects you can take on in 2022 to build your skills, your know-how, and your portfolio.

Deep Learning

Deep Learning Project Building Portfolio

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

JANUARY 3, 2022

By Nick Goble, January 3, 2022. When building a successful company, it’s critical to have a strategy around how you build and scale your business from a technology and data perspective. Your business likely has competitors that are trying to beat you to market, technology is constantly evolving, and so are your customers.

IT AWS Software Engineer Software Engineering

Monte Carlo Announces dbt Core Integration to Help Companies Ship Reliable Data Faster

Monte Carlo

JANUARY 5, 2022

When it comes to trusting your data, Monte Carlo, the leading data observability platform, and dbt Core are better together. “Why didn’t my job run?” “What happened to this dashboard?” “Why is this column missing?” “What went wrong with my data?!” If you’ve been on the receiving end of a broken data pipeline, these questions probably look familiar to you.

Retail

Retail Metadata Data Pipeline Software Engineer

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Certification

Mythbusting: The Venerable SQL Database and Today’s Real-Time Analytics

Rockset

JANUARY 5, 2022

Rockset is the real-time analytics database in the cloud for modern data teams. Get faster analytics on fresher data, at lower costs, by exploiting indexing over brute-force scanning. It's not your father’s Oracle cluster, but better.* We all know the lightning pace of software innovation. Show me a technology or platform that’s been around for a decade, and I’ll show you an outmoded relic that’s been leapfrogged by faster, more efficient competitors.

Database

Database SQL NoSQL Raw Data

How to Build a Logistic Regression Model in R?

ProjectPro

JANUARY 4, 2022

Whether it is predicting the likelihood of having a heart attack based on weight and workout routine or predicting the probability of an email being spam based on the country of origin and word count, logistic regression is widely used because of its remarkable results. It is a machine learning method to solve a classification problem by differentiating one class from another in a given dataset.

Building

Building Datasets Machine Learning Algorithm

SQL Interview Questions for Experienced Professionals

KDnuggets

JANUARY 7, 2022

This article will show you what SQL concepts you should know as an experienced professional.

SQL

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

Building

Top Stories, Dec 20 – Jan 2: 3 Tools to Track and Visualize the Execution of Your Python Code
