Top Data Engineering Digest Data Collection Data Integration Content for Week of Jun 01

Sat. Jun 01, 2019 - Fri. Jun 07, 2019

Evolving An ETL Pipeline For Better Productivity

Data Engineering Podcast

JUNE 3, 2019

Summary: Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components to a paid service.

Media, Data Pipeline, Machine Learning, Data Science

Predictive CPU isolation of containers at Netflix

Netflix Tech

JUNE 4, 2019

By Benoit Rostykus, Gabriel Hartmann

Noisy Neighbors

We’ve all had noisy neighbors at one point in our lives. Whether it’s at a cafe or through a wall of an apartment, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too. When you’re running in the cloud, your containers are in a shared space; in particular, they share the CPU’s memory hierarchy of the host instance.

Machine Learning, Metadata, Systems, Data Collection

Why Hadoop Failed and Where We Go from Here

Teradata

JUNE 6, 2019

Chad Meley delves into the demise of Hadoop distribution vendors and how they got there.

Hadoop

IValue: efficient representation of dynamic types in C++

Rockset

JUNE 6, 2019

Introduction

In traditional SQL systems, a column's type is determined when the table is created, and never changes while executing a query. If you create a table with an integer-valued column, the values in that column will always be integers (or possibly NULL). Rockset, however, is dynamically typed, which means that we often don't know the type of a value until we actually execute the query.

Bytes, Programming Language, SQL, Database

Reliable, Fast Access to On-Chain Data Insights

Confluent

JUNE 7, 2019

At TokenAnalyst, we are building the core infrastructure to integrate, clean, and analyze blockchain data. Data on a blockchain is also known as on-chain data. We offer both historical and low-latency data streams of on-chain data across multiple blockchains.

How we use Apache Kafka and the Confluent Platform

Apache Kafka® is the central data hub of our company.

Accessible, Accessibility, Kafka, Scala

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

Noisy Neighbors in Large, Multi-Tenant Clusters

The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Once configured and secured, the cluster administrator (admin) gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants start onboarding more workloads, and admins start onboarding more tenants.

Metadata, Data Lake, Cloud, Big Data
