Top Data Engineering Digest Data Integration Metadata Content for June, 2019

June, 2019

The Workflow Engine For Data Engineers And Data Scientists

Data Engineering Podcast

JUNE 24, 2019

Summary: Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of both roles. Data engineering platforms have a strong focus on stateful execution and tasks that are strictly ordered based on dependency graphs. Data science platforms provide an environment that is conducive to rapid experimentation and iteration, with data flowing directly between stages.

Data Engineering

Data Engineering Data Engineer Engineering Data Science

Designing the .NET API for Apache Kafka

Confluent

JUNE 27, 2019

Confluent’s clients for Apache Kafka® recently passed a major milestone: the release of version 1.0. This has been a long time in the making. Magnus Edenhill first started developing librdkafka about seven years ago, later joining Confluent in the very early days to help foster the community of Kafka users outside the Java ecosystem. Since then, the clients team has been on a mission to build a set of high-quality librdkafka bindings for different languages (initially Python, Go, and .NET

Kafka

Kafka Designing Java Coding

Trending Sources

Predictive CPU isolation of containers at Netflix

Netflix Tech

JUNE 4, 2019

By Benoit Rostykus and Gabriel Hartmann. Noisy Neighbors: We’ve all had noisy neighbors at one point in our life. Whether it’s at a cafe or through a wall of an apartment, it is always disruptive. The need for good manners in shared spaces turns out to be important not just for people, but for your Docker containers too. When you’re running in the cloud your containers are in a shared space; in particular they share the CPU’s memory hierarchy of the host instance.

Machine Learning

Machine Learning Metadata Systems Data Collection

What Working “at Scale” Really Means

Teradata

JUNE 25, 2019

Rob Armstrong discusses the challenges of moving from a departmental solution to operational and production systems working at scale, and how Teradata Vantage can solve for them.

Systems

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

Should you have an ETL window in your Modern Data Warehouse?

Advancing Analytics: Data Engineering

JUNE 21, 2019

Ah the ETL (Extract-Transform-Load) Window, the schedule by which the Business Intelligence developer sets their clock, the nail-biting nightly period during which the on-call support hopes their phone won’t ring. It’s a cornerstone of the data warehousing approach… and we shouldn’t have one. There, I said it. Hear me out – back in the on-premises days we had data loading processes that connect directly to our source system databases and perform huge data extract queries as the start of one long

Data Warehouse

Data Warehouse Business Intelligence Data Data Validation

Building a SQL Development Environment for Messy, Semi-Structured Data

Rockset

JUNE 13, 2019

Why build a new SQL development environment? We love SQL — our mission is to bring fast, real-time queries to messy, semi-structured real-world data and SQL is a core part of our effort. A SQL API allows our product to fit neatly into the stacks of our users without any workflow re-architecting. Our users can easily integrate Rockset with a multitude of existing tools for SQL development (e.g.

SQL

SQL Structured Data Building Raw Data

Managing The Machine Learning Lifecycle

Data Engineering Podcast

JUNE 9, 2019

Summary: Building a machine learning model can be difficult, but that is only half of the battle. Having a perfect model is only useful if you are able to get it into production. In this episode Stepan Pushkarev, founder of Hydrosphere, explains why deploying and maintaining machine learning projects in production is different from regular software projects and the challenges that they bring.

Machine Learning

Machine Learning Management Scala Data Science

More Trending

How to Connect KSQL to Confluent Cloud using Kubernetes with Helm

Confluent

JUNE 12, 2019

Confluent Cloud, a fully managed, cloud-native event streaming service that extends the value of Apache Kafka®, is simple, resilient, secure, and performant, allowing you to focus on what is important: building contextual event-driven applications, not infrastructure. If you are using Confluent Cloud as your managed Apache Kafka cluster, you probably also want to start using other Confluent Platform components like the Confluent Schema Registry, Kafka Connect, KSQL, and Confluent REST Proxy.

Cloud

Cloud Kafka Healthcare Software Engineer

Netflix Studio Hack Day – May 2019

Netflix Tech

JUNE 20, 2019

Netflix Studio Hack Day – May 2019. By Tom Richards, Carenina Garcia Motion, and Marlee Tart. Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies. For the most recent hack day, we channeled our creative energy towards our studio efforts.

Java

Java AWS Project Technology

Why Hadoop Failed and Where We Go from Here

Teradata

JUNE 6, 2019

Chad Meley delves into the demise of Hadoop distribution vendors and how they got there.

Hadoop

Unlock the Value of Data Faster Through Modern Data Warehousing

Advancing Analytics: Data Engineering

JUNE 10, 2019

Data has value – I think we’ve finally got to the point where most people agree on this. The problem we face is how long it takes to unlock that value, and it’s a frustration that most businesses I speak to are having. Let’s think about why this is. After the horror that was the “data silo” days, with clumps of data living in Access databases, Excel spreadsheets and isolated data stores, we’ve had a pretty good run with the classic Kimball data warehouse.

Data Lake

Data Lake Data Warehouse Data Data Validation

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Engineering

How We Use RocksDB at Rockset

Rockset

JUNE 27, 2019

In this blog post, I'll describe how we use RocksDB at Rockset and how we tuned it to get the most performance out of it. I assume that the reader is generally familiar with how log-structured merge-tree (LSM) storage engines like RocksDB work. At Rockset, we want our users to be able to continuously ingest their data into Rockset with sub-second write latency and query it in tens of milliseconds.

Bytes

Bytes Metadata Cloud Engineering

Evolving An ETL Pipeline For Better Productivity

Data Engineering Podcast

JUNE 3, 2019

Summary: Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components onto a paid service.

Media

Media Data Pipeline Machine Learning Data Science

Microservices, Apache Kafka, and Domain-Driven Design

Confluent

JUNE 26, 2019

Microservices have a symbiotic relationship with domain-driven design (DDD), a design approach where the business domain is carefully modeled in software and evolved over time, independently of the plumbing that makes the system work. I see this pattern coming up more and more in the field in conjunction with Apache Kafka®. In these projects, microservice architectures use Kafka as an event streaming platform.

Kafka

Kafka Designing Architecture ETL Tools

Cloudera Provides First Look at Cloudera Data Platform, the Industry’s First Enterprise Data Cloud

Cloudera

JUNE 25, 2019

Cloudera Unveils Industry’s First Enterprise Data Cloud in Webinar. How do you take a mission-critical on-premises workload and rapidly burst it to the cloud? Can you instantly auto-scale resources as demand requires and just as easily pause your work so you don’t run up your cloud bill? On June 18th, Cloudera provided an exclusive preview of these capabilities, and more, with the introduction of Cloudera Data Platform (CDP), the industry’s first enterprise data cloud.

Cloud

Cloud Entertainment Government Machine Learning

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building

How Teradata and Oxford Saïd are Modernizing Analytics for Academic Research

Teradata

JUNE 26, 2019

Oxford and Teradata partner to modernize analytics for academic research, shape new bodies of research and find answers to pressing business challenges.

Modern Data Warehousing with Azure Databricks at the #PASSSummit in Seattle

Advancing Analytics: Data Engineering

JUNE 10, 2019

Hey everyone, Advancing Analytics are heading to Seattle in November for the PASS Summit. We will be delivering a full-day training session on Azure Databricks, titled Practical Azure Databricks: Engineering & Warehousing at Scale. The session will focus on using Azure Databricks for Modern Data Warehousing. Not sure if the day is for you? Well, take a look at the video we recorded.

Data Science

Data Science Data Engineering

IValue: efficient representation of dynamic types in C++

Rockset

JUNE 6, 2019

Introduction: In traditional SQL systems, a column's type is determined when the table is created, and never changes while executing a query. If you create a table with an integer-valued column, the values in that column will always be integers (or possibly NULL). Rockset, however, is dynamically typed, which means that we often don't know the type of a value until we actually execute the query.

Bytes

Bytes Programming Language SQL Database

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

JUNE 16, 2019

Summary: Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms that incorporates the lessons learned at Databricks from countless cu

Data Lake

Data Lake Lambda Architecture Data Warehouse Hadoop

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Project

Building a Scalable Search Architecture

Confluent

JUNE 18, 2019

Software projects of all sizes and complexities have a common challenge: building a scalable solution for search. Who has never seen an application use RDBMS SQL statements to run searches? You might be wondering, is this a good solution? As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search,

Architecture

Architecture Building Kafka Database-centric

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

Noisy Neighbors in Large, Multi-Tenant Clusters. The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Once configured and secured, the cluster administrator (admin) gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants.

Metadata

Metadata Data Lake Cloud Big Data

Why Vantage Is Our Most Popular Release Ever

Teradata

JUNE 30, 2019

Teradata Vantage is busting through analytic silos and raising the bar. Find out what drove these innovations and led to Vantage becoming our most popular release yet.

Four Reasons Why Upgrading to Vantage is Worth It

Teradata

JUNE 16, 2019

Older versions of Teradata analytics software may not support the latest innovations of Vantage, and running them could cost you more than upgrading. Learn more.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Building

Swedbank Delivers Superior Customer Experience by Illuminating the Customer Journey

Teradata

JUNE 25, 2019

Find out how Swedbank has partnered with Teradata to illuminate the customer journey, delivering answers to the business and a superior customer experience.

New As-a-Service Offers on Vantage Bring Simplicity, Modernization

Teradata

JUNE 9, 2019

Analytics as a service lets you offload IT infrastructure tasks so you can focus on solving your toughest business problems. Learn more about options for Teradata Vantage.

The Data Lake is Dead; Long Live the Data Lake!

Teradata

JUNE 13, 2019

Martin Wilcox examines the failure of data lakes.

Data Lake

Data Lake Data

How Moving to the Cloud Helped Craft the Ideal Fan Experience for Ticketmaster

Teradata

JUNE 23, 2019

Learn how moving to the cloud in 10 weeks enabled Ticketmaster to gain greater visibility into their data and respond to business needs more quickly.

Cloud

Cloud Data

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Certification

AI for Industrials: Why is it different?

Teradata

JUNE 18, 2019

Cheryl Wiebe examines the challenges of using AI in industrial situations.

What Tableau Customers Should Expect Post-Salesforce Acquisition

Teradata

JUNE 11, 2019

Chad Meley examines how Salesforce's acquisition of Tableau will impact customer choice and flexibility.

Reliable, Fast Access to On-Chain Data Insights

Confluent

JUNE 7, 2019

At TokenAnalyst, we are building the core infrastructure to integrate, clean, and analyze blockchain data. Data on a blockchain is also known as on-chain data. We offer both historical and low-latency data streams of on-chain data across multiple blockchains. How we use Apache Kafka and the Confluent Platform: Apache Kafka® is the central data hub of our company.

Accessible

Accessible Accessibility Kafka Scala

Spring for Apache Kafka Deep Dive – Part 4: Continuous Delivery of Event Streaming Pipelines

Confluent

JUNE 11, 2019

For event streaming application developers, it is important to continuously update the streaming pipeline based on the need for changes in the individual applications in the pipeline. It is also important to understand some of the common streaming topologies that streaming developers use to build an event streaming pipeline. Here in part 4 of the Spring for Apache Kafka Deep Dive blog series, we will cover: Common event streaming topology patterns supported in Spring Cloud Data Flow.

Kafka

Kafka Cloud Java MongoDB

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.

Certification
