Top Data Engineering Digest Data Schemas Data Preparation Content for October, 2019

October, 2019

Build Maintainable And Testable Data Applications With Dagster

Data Engineering Podcast

OCTOBER 28, 2019

Summary Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and maintaining that information still leaves much to be desired. In an effort to create a better abstraction for building data applications Nick Schrock created Dagster. In this episode he explains his motivation for creating a product for data management, how the programming model simplifies the work of building testable and maintainable pipelines, and

Building

Building Data Pipeline Programming Language Metadata

Everything a Data Scientist Should Know About Data Management

KDnuggets

OCTOBER 22, 2019

For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.

Data Management

Data Management Management Data Storage Machine Learning

Trending Sources

Evolving Michelangelo Model Representation for Flexibility at Scale

Uber Engineering

OCTOBER 16, 2019

Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep …

Machine Learning

Machine Learning Engineering Designing Systems

Delta: A Data Synchronization and Enrichment Platform

Netflix Tech

OCTOBER 15, 2019

Part I: Overview. By Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, and Tongliang Liu. It is a commonly observed pattern for applications to utilize multiple datastores, where each serves a specific need such as storing the canonical form of data (MySQL etc.), providing advanced search capabilities (ElasticSearch etc.), caching (Memcached etc.), and more.

Transportation

Transportation MySQL Kafka Data

Why You Should Learn Data Engineering

Dataquest

OCTOBER 16, 2019

Exciting news: we just launched a totally revamped Data Engineering path that offers from-scratch training for anyone who wants to become a data engineer or learn some data engineering skills. Looks cool, right? But it raises the question: why learn data engineering in the first place? Typically, data science teams are composed of data analysts, data scientists, and data engineers.

Data Engineering

Data Engineering Data Engineer Engineering Data Science

Teradata is Moving the Cloud Forward

Teradata

OCTOBER 21, 2019

With four new offerings, Teradata is helping companies move from analytics to answers wherever they are on their cloud journey. Read more.

Cloud

Data Orchestration For Hybrid Cloud Analytics

Data Engineering Podcast

OCTOBER 21, 2019

Summary The scale and complexity of the systems that we build to satisfy business requirements is increasing as the available tools become more sophisticated. In order to bridge the gap between legacy infrastructure and evolving use cases it is necessary to create a unifying set of components. In this episode Dipti Borkar explains how the emerging category of data orchestration tools fills this need, some of the existing projects that fit in this space, and some of the ways that they can work to

Cloud

Cloud Data Lake Hadoop Programming Language

More Trending

10 Free Top Notch Natural Language Processing Courses

KDnuggets

OCTOBER 7, 2019

Are you looking to learn natural language processing? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to learning NLP and its varied topics.

Process

Process IT

ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning

Netflix Tech

OCTOBER 18, 2019

By Faisal Siddiqi. Infrastructure for Contextual Bandits and Reinforcement Learning was the theme of the ML Platform meetup hosted at Netflix, Los Gatos on Sep 12, 2019. Contextual and Multi-armed Bandits enable faster and adaptive alternatives to traditional A/B Testing. They enable rapid learning and better decision-making for product rollouts. Broadly speaking, these approaches can be seen as a stepping stone to full-on Reinforcement Learning (RL) with closed-loop, on-policy evaluation and model objec

Algorithm

Algorithm Architecture Machine Learning Deep Learning

Go From Total Beginner to Data Engineer with Our New Path

Dataquest

OCTOBER 16, 2019

We’ve got some really exciting news: we’ve just launched a total revamp of our Data Engineering learning path! This revamped path is designed to be more like our other course paths. You can start it even if you have no prior experience with coding, and it’ll take you from total beginner to experienced practitioner with all of the core skills needed to become a data engineer.

Data Engineering

Data Engineering Data Engineer Engineering SQL

How to Deliver Better Business Outcomes with Predictive Modeling

Teradata

OCTOBER 3, 2019

Predict the future faster with predictive modeling. Learn more about use cases and how to get more value out of your data.

Data

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

OCTOBER 14, 2019

Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps you apply engineering principles to your data transformations and table definitions, including unit testing SQL scripts, defining repeatable pipelines, and adding metadata to your warehouse to improve your team’s communication.

Data Warehouse

Data Warehouse PostgreSQL AWS Programming Language

Choosing the Right Clustering Algorithm for your Dataset

KDnuggets

OCTOBER 2, 2019

Applying a clustering algorithm is much easier than selecting the best one. Each type offers pros and cons that must be considered if you’re striving for a tidy cluster structure.

Algorithm

Algorithm Datasets

Machine Learning and Real-Time Analytics in Apache Kafka Applications

Confluent

OCTOBER 31, 2019

The relationship between Apache Kafka® and machine learning (ML) is an interesting one that I’ve written about quite a bit in How to Build and Deploy Scalable Machine Learning in […].

Machine Learning

Machine Learning Kafka Building Process

ALL SYSTEMS GO.

Preset

OCTOBER 8, 2019

Preset Announcement

Systems

Embracing the Darkness: Vantage Developer

Teradata

OCTOBER 22, 2019

With our renewed focus on user experience, we’re applying user-centered design principles & conducting ethnographic research on key personas, starting with developers.

Designing

Fast Analytics On Semi-Structured And Structured Data In The Cloud

Data Engineering Podcast

OCTOBER 7, 2019

Summary The process of exposing your data through a SQL interface has many possible pathways, each with their own complications and tradeoffs. One of the recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. In this episode CEO Venkat Venkataramani and SVP of Product Shruti Bhat explain the origins of Rockset, how it is architected to allow for fast and flexible SQL analytics on your data, and how their serverless platform can save you th

Structured Data

Structured Data Cloud SQL Programming Language

Data Preparation for Machine learning 101: Why it’s important and how to do it

KDnuggets

OCTOBER 2, 2019

As data scientists who are the brains behind the AI-based innovations, you need to understand the significance of data preparation to achieve the desired level of cognitive capability for your models. Let’s begin.

Data Preparation

Data Preparation Machine Learning IT Data

How to Run Apache Kafka with Spring Boot on Pivotal Application Service (PAS)

Confluent

OCTOBER 7, 2019

This tutorial describes how to set up a sample Spring Boot application in Pivotal Application Service (PAS), which consumes and produces events to an Apache Kafka® cluster running in Pivotal Container Service (PKS). With this tutorial, you can set up your PAS and PKS configurations so that they work with Kafka. For a tutorial on how to set up a Kafka cluster in PKS, please see How to Deploy Confluent Platform on Pivotal Container Service (PKS) with Confluent Operator.

Kafka

Kafka Java Coding Accessible

Open-sourcing Polynote: an IDE-inspired polyglot notebook

Netflix Tech

OCTOBER 23, 2019

By Jeremy Smith, Jonathan Indig, and Faisal Siddiqi. We are pleased to announce the open-source launch of Polynote: a new, polyglot notebook with first-class Scala support, Apache Spark integration, multi-language interoperability including Scala, Python, and SQL, as-you-type autocomplete, and more. Polynote provides data scientists and machine learning researchers with a notebook environment that allows them the freedom to seamlessly integrate our JVM-based ML platform …

Scala

Scala Machine Learning Python Coding

How We Analyze and Visualize Kubernetes Events in Real Time at Rockset

Rockset

OCTOBER 1, 2019

Kubernetes at Rockset. At Rockset, we use Kubernetes (k8s) for cluster orchestration. It runs all our production microservices, from our ingest workers to our query-serving tier. In addition to hosting all the production infrastructure, each engineer has their own Kubernetes namespace and dedicated resources that we use to locally deploy and test new versions of code and configuration.

SQL

SQL Systems Metadata Accessible

A Renewed Focus on User Experience at Teradata

Teradata

OCTOBER 15, 2019

Find out how our UX team is going to radically simplify the Teradata user experience. To be unveiled at Teradata Universe!

The 4 Quadrants of Data Science Skills and 7 Principles of Marie Kondo approach to Data Visualization

KDnuggets

OCTOBER 7, 2019

As a data scientist, your most important skill is creating meaningful visualizations to disseminate knowledge and impact your organization or client. These seven principles will guide you toward developing charts with clarity, as exemplified with data from a recent KDnuggets poll.

Data Science

Data Science Data Java Python

A European Approach to Master’s Degrees in Data Science

KDnuggets

OCTOBER 1, 2019

Data science education in Europe has been reevaluated and new recommendations are leading the way to the next generation of data science Master's courses to better support and train students.

Data Science

Data Science Education Data

How to Become a (Good) Data Scientist – Beginner Guide

KDnuggets

OCTOBER 16, 2019

A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.

Business Intelligence

Business Intelligence Machine Learning Programming Data

How YouTube is Recommending Your Next Video

KDnuggets

OCTOBER 21, 2019

If you are interested in learning more about the latest YouTube recommendation algorithm paper, read this post for details on its approach and improvements.

Algorithm

Algorithm IT Systems Engineering

Introduction to Natural Language Processing (NLP)

KDnuggets

OCTOBER 25, 2019

Have you ever wondered how your personal assistant (e.g., Siri) is built? Do you want to build your own? Perfect! Let’s talk about Natural Language Processing.

Process

Process Building

Three Things to Know About Reinforcement Learning

KDnuggets

OCTOBER 14, 2019

As an engineer, scientist, or researcher, you may want to take advantage of this new and growing technology, but where do you start? The best place to begin is to understand what the concept is, how to implement it, and whether it’s the right approach for a given problem.

Technology

Technology Engineering IT

Artificial Intelligence: Salaries Heading Skyward

KDnuggets

OCTOBER 17, 2019

While the average salary for a Software Engineer is around $100,000 to $150,000, to make the big bucks you want to be an AI or Machine Learning specialist, scientist, or engineer.

Machine Learning

Machine Learning Software Engineer Software Engineering Engineering
