Top Data Engineering Digest MySQL PostgreSQL Content for Week of Dec 14

Sat. Dec 14, 2019 - Fri. Dec 20, 2019

Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction

Uber Engineering

DECEMBER 19, 2019

Every day around the world, millions of trips take place across the Uber network, giving users more reliable transportation through ridesharing, bikes, and scooters, drivers and truckers additional opportunities to earn, employees and employers more convenient business travel, and hungry …

Transportation

Transportation Engineering Architecture

Interpretability part 3: LIME and SHAP

KDnuggets

DECEMBER 19, 2019

The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.

Trending Sources

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

DECEMBER 16, 2019

Summary: Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. When it is properly integrated into the rest of your tools, the metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets.

Metadata

Metadata PostgreSQL Datasets Data Warehouse

How Dataquest Made the Difference for Stacey’s Data Job

Dataquest

DECEMBER 18, 2019

Today, Stacey Ustian is a data engineer. But the path that led her here wasn’t always easy, and there were a few bumps and twists along the way. Her journey to data science started in a rather unusual place: the law library. After earning her Master’s degree in Library and Information Science, Stacey had taken a job working in the library of a law firm.

SQL

SQL Python Data Engineering Data Engineer

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Database

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber Engineering

DECEMBER 17, 2019

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the …

Data

Data Engineering Building Big Data

The 4 fastest ways not to get hired as a data scientist

KDnuggets

DECEMBER 18, 2019

Ready to try to get hired as a data scientist for the first time? Avoiding these common mistakes won’t guarantee an offer, but failing to avoid them is a surefire way to get your application tossed into the trash bin.

Data

Keeping a Lid on Concurrency within the Vantage Platform

Teradata

DECEMBER 18, 2019

Carrie Ballinger discusses the techniques for managing concurrency inside the Advanced SQL Engine and the benefits provided.

SQL

SQL Engineering Management

More Trending

Superset Announces Elasticsearch Support!

Preset

DECEMBER 15, 2019

Announcing Elasticsearch in Superset, powered by a new open-source Python library from Preset

Python

Google’s New Explainable AI Service

KDnuggets

DECEMBER 20, 2019

Google has started offering a new service for “explainable AI” or XAI, as it is fashionably called. Presently offered tools are modest, but the intent is in the right direction.

6 Practices to Realize a Long-Term Data Vision Through Near-Term Work

Teradata

DECEMBER 16, 2019

Enterprises either have no data strategy at all or an overcomplicated one that underdelivers. Find out how to create an effective data strategy by striking a balance.

Data

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Certification

Apache Kafka Producer Improvements with the Sticky Partitioner

Confluent

DECEMBER 18, 2019

The amount of time it takes for a message to move through a system plays a big role in the performance of distributed systems like Apache Kafka®. In Kafka, the […].

Kafka

Kafka Systems IT

DBLog: A Generic Change-Data-Capture Framework

Netflix Tech

DECEMBER 17, 2019

By Andreas Andreakis and Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4].

MySQL

MySQL PostgreSQL Database Transportation
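
To make the CDC idea in the DBLog blurb concrete, here is a minimal sketch of replaying committed changes into a downstream store. It is not DBLog's design or API: the `ChangeEvent` shape, the `apply_changes` helper, and the in-memory `search_index` dictionary are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One committed change, in commit order (hypothetical event shape)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(events: List[ChangeEvent],
                  replica: Dict[int, Dict[str, Any]]) -> None:
    """Replay committed changes, in order, into a downstream store."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert the latest row image under its primary key.
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")


# Hypothetical change log captured from the source database.
change_log = [
    ChangeEvent("insert", 1, {"title": "Dune"}),
    ChangeEvent("insert", 2, {"title": "Emma"}),
    ChangeEvent("update", 1, {"title": "Dune (1965)"}),
    ChangeEvent("delete", 2),
]

# Stand-in for a downstream consumer such as a search index.
search_index: Dict[int, Dict[str, Any]] = {}
apply_changes(change_log, search_index)
```

Because the consumer only ever replays changes the source has already committed, the application writes to one system and the second store follows it, which is the contrast with dual-writes that the blurb draws.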

Alternative Cloud Hosted Data Science Environments

KDnuggets

DECEMBER 19, 2019

Over the years, new alternative providers have arisen to provide a single cloud-hosted data science environment where data scientists can analyze, host, and share their work.

Data Science

Data Science Cloud Data Cloud Computing

Automatic Text Summarization in a Nutshell

KDnuggets

DECEMBER 18, 2019

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about Automatic Text Summarization and the various ways it is used.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

The Ultimate Guide to Model Retraining

KDnuggets

DECEMBER 16, 2019

Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.

Machine Learning

Machine Learning Process Data IT

How to Convert an RGB Image to Grayscale

KDnuggets

DECEMBER 18, 2019

This post is about working with a mixture of color and grayscale images and needing to transform them into a uniform format - all grayscale. We'll be working in Python using the Pillow, Numpy, and Matplotlib packages.

Python

Python Process
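
As a rough illustration of the conversion the grayscale post describes, the sketch below normalizes a mix of color and grayscale pixels into one grayscale format. It is a dependency-free approximation, not the post's own code (which works with Pillow, NumPy, and Matplotlib); the `to_grayscale` helper and the nested-list image layout are assumptions made for this example. The weights are the ITU-R BT.601 luma coefficients.

```python
# ITU-R BT.601 luma weights for red, green, and blue.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image):
    """Return a 2-D list of 0-255 ints from a 2-D list of pixels.

    A pixel may be an (R, G, B) tuple or an int that is already grayscale
    (hypothetical layout chosen for this sketch).
    """
    gray = []
    for row in image:
        gray_row = []
        for pixel in row:
            if isinstance(pixel, int):
                # Already grayscale: pass the value through unchanged.
                gray_row.append(pixel)
            else:
                r, g, b = pixel[:3]
                value = sum(w * c for w, c in zip(LUMA_WEIGHTS, (r, g, b)))
                gray_row.append(min(255, int(round(value))))
        gray.append(gray_row)
    return gray


mixed = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), 128],
]
uniform = to_grayscale(mixed)
```

A weighted sum is used rather than a plain average of the three channels because the eye is most sensitive to green and least sensitive to blue.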

Industry AI, Analytics, Machine Learning, Data Science Predictions for 2020

KDnuggets

DECEMBER 16, 2019

Predictions for 2020 from a dozen innovative companies in AI, Analytics, Machine Learning, Data Science, and Data industry.

Machine Learning

Machine Learning Data Science Data

5 Ways to Apply Ethics to AI

KDnuggets

DECEMBER 19, 2019

Here are six more lessons based on real life examples that I think we should all remember as people working in machine learning, whether you’re a researcher, engineer, or a decision-maker.

Machine Learning

Machine Learning Engineering Algorithm

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building

The Most In Demand Tech Skills for Data Scientists

KDnuggets

DECEMBER 20, 2019

By the end of this article you’ll know which technologies are becoming more popular with employers and which are becoming less popular.

Technology

Technology Data Data Science

Let’s Build an Intelligent Chatbot

KDnuggets

DECEMBER 17, 2019

Check out this step by step approach to building an intelligent chatbot in Python.

Building

Building Python

How To “Ultralearn” Data Science: removing distractions and finding focus, Part 2

KDnuggets

DECEMBER 17, 2019

This second part in a series about how to "ultralearn" data science will guide you through several techniques to remove those distractions -- because your focus needs more focus.

Data Science

Data Science Data Education

Ontotext Platform 3.0 for Enterprise Knowledge Graphs Released

KDnuggets

DECEMBER 18, 2019

Ontotext Platform 3.0 features significant technology improvements to enable simpler and faster graph navigation, including GraphQL interfaces to make it easier for application developers to access knowledge graphs without tedious development of back-end APIs or complex SPARQL.

Accessible

Accessible Accessibility Technology IT

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Project

How To “Ultralearn” Data Science: optimization learning, Part 3

KDnuggets

DECEMBER 20, 2019

This third part in a series about how to "ultralearn" data science will guide you through how to optimize your learning through five valuable techniques.

Data Science

Data Science Data

Building an Analytics Career at UChicago

KDnuggets

DECEMBER 17, 2019

Michael Collela describes how UChicago’s Master of Science in Analytics has helped him define his career path. Michael currently works as a data scientist at dunnhumby.

Building

Building Data

The ravages of concept drift in stream learning applications and how to deal with it

KDnuggets

DECEMBER 18, 2019

Stream data processing has gained momentum with the arrival of new stream applications and big data scenarios. These streams of data generally evolve over time and may occasionally be affected by a change (concept drift). Handling this change with detection and adaptation mechanisms is crucial in many real-world systems.

IT Big Data Data Process Systems
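
One simple way to picture a drift detection mechanism is to compare a model's recent error rate against an earlier reference window. The sketch below does exactly that and nothing more; it is an assumption-laden illustration, not the method from the article. The `WindowDriftDetector` class, its window size, and its threshold are all hypothetical.

```python
from collections import deque


class WindowDriftDetector:
    """Flag drift when the recent error rate exceeds the reference rate."""

    def __init__(self, window_size=50, threshold=0.2):
        self.reference = deque(maxlen=window_size)  # first errors observed
        self.recent = deque(maxlen=window_size)     # sliding window of latest errors
        self.threshold = threshold

    def update(self, error):
        """Record 0 (correct) or 1 (wrong); return True if drift is flagged."""
        if len(self.reference) < self.reference.maxlen:
            # Still building the reference window.
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.recent.maxlen:
            return False
        reference_rate = sum(self.reference) / len(self.reference)
        recent_rate = sum(self.recent) / len(self.recent)
        return recent_rate - reference_rate > self.threshold


# The model is right five times, then the concept changes and it is always wrong.
detector = WindowDriftDetector(window_size=5, threshold=0.5)
flags = [detector.update(e) for e in [0] * 5 + [1] * 5]
```

A flag from a detector like this would typically trigger the adaptation step, for example retraining on recent data.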

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Engineering

KDnuggets™ News 19:n48, Dec 18: Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; Poll on AutoML

KDnuggets

DECEMBER 18, 2019

Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; New Poll: Does AutoML work? Ultralearn Data Science; Python Dictionary How-To; Top stories of 2019 and more.

Technology

Technology Building Data Science Python

Xavier Amatriain’s Machine Learning and Artificial Intelligence 2019 Year-end Roundup

KDnuggets

DECEMBER 16, 2019

It is an annual tradition for Xavier Amatriain to write a year-end retrospective of advances in AI/ML, and this year is no different. Gain an understanding of the important developments of the past year, as well as insights into what to expect in 2020.

Machine Learning

Machine Learning Deep Learning Healthcare IT

Top KDnuggets tweets, Dec 11-17: Idiot’s Guide to Precision, Recall and Confusion

KDnuggets

DECEMBER 20, 2019

Idiot's Guide to Precision, Recall and Confusion Matrix; 10 Free Must-Read Books for Machine Learning and Data Science; How to Speed up Pandas by 4x with one line of code; #Math for Programmers teaches you the math you need to know.

Machine Learning

Machine Learning Data Science Coding Data

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Building
