October, 2019

Build Maintainable And Testable Data Applications With Dagster

Data Engineering Podcast

Summary: Despite the fact that businesses have relied on useful and accurate data to succeed for decades now, the state of the art for obtaining and maintaining that information still leaves much to be desired. In an effort to create a better abstraction for building data applications, Nick Schrock created Dagster. In this episode he explains his motivation for creating a product for data management, how the programming model simplifies the work of building testable and maintainable pipelines, and

Building 100

Everything a Data Scientist Should Know About Data Management

KDnuggets

For full-stack data science mastery, you must understand data management along with all the bells and whistles of machine learning. This high-level overview is a road map for the history and current state of the expansive options for data storage and infrastructure solutions.

Evolving Michelangelo Model Representation for Flexibility at Scale

Uber Engineering

Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep …

Delta: A Data Synchronization and Enrichment Platform

Netflix Tech

Part I: Overview. By Andreas Andreakis, Falguni Jhaveri, Ioannis Papapanagiotou, Mark Cho, Poorna Reddy, and Tongliang Liu. It is a commonly observed pattern for applications to use multiple datastores, each serving a specific need such as storing the canonical form of data (MySQL etc.), providing advanced search capabilities (ElasticSearch etc.), caching (Memcached etc.), and more.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Why You Should Learn Data Engineering

Dataquest

Exciting news: we just launched a totally revamped Data Engineering path that offers from-scratch training for anyone who wants to become a data engineer or learn some data engineering skills. Looks cool, right? But it raises the question: why learn data engineering in the first place? Typically, data science teams are composed of data analysts, data scientists, and data engineers.

Teradata is Moving the Cloud Forward

Teradata

With four new offerings, Teradata is helping companies move from analytics to answers wherever they are on their cloud journey. Read more.

Cloud 66

More Trending

10 Free Top Notch Natural Language Processing Courses

KDnuggets

Are you looking to learn natural language processing? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to learning NLP and its varied topics.

Process 123

ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning

Netflix Tech

By Faisal Siddiqi. Infrastructure for Contextual Bandits and Reinforcement Learning: theme of the ML Platform meetup hosted at Netflix, Los Gatos on Sep 12, 2019. Contextual and Multi-armed Bandits enable faster and adaptive alternatives to traditional A/B Testing. They enable rapid learning and better decision-making for product rollouts. Broadly speaking, these approaches can be seen as a stepping stone to full-on Reinforcement Learning (RL) with closed-loop, on-policy evaluation and model objec

Go From Total Beginner to Data Engineer with Our New Path

Dataquest

We’ve got some really exciting news: we’ve just launched a total revamp of our Data Engineering learning path! This revamped path is designed to be more like our other course paths. You can start it even if you have no prior experience with coding, and it’ll take you from total beginner to experienced practitioner with all of the core skills needed to become a data engineer.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

How to Deliver Better Business Outcomes with Predictive Modeling

Teradata

Predict the future faster with predictive modeling. Learn more about use cases and how to get more value out of your data.

Data 69

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

Summary: Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps you apply engineering principles to your data transformations and table definitions, including unit testing SQL scripts, defining repeatable pipelines, and adding metadata to your warehouse to improve your team’s communication.

Choosing the Right Clustering Algorithm for your Dataset

KDnuggets

Applying a clustering algorithm is much easier than selecting the best one. Each type offers pros and cons that must be considered if you’re striving for a tidy cluster structure.

Algorithm 121

Machine Learning and Real-Time Analytics in Apache Kafka Applications

Confluent

The relationship between Apache Kafka® and machine learning (ML) is an interesting one that I’ve written about quite a bit in How to Build and Deploy Scalable Machine Learning in […].

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

ALL SYSTEMS GO.

Preset

Preset Announcement

Systems 40

Embracing the Darkness: Vantage Developer

Teradata

With our renewed focus on user experience, we’re applying user-centered design principles & conducting ethnographic research on key personas, starting with developers.

Fast Analytics On Semi-Structured And Structured Data In The Cloud

Data Engineering Podcast

Summary: The process of exposing your data through a SQL interface has many possible pathways, each with its own complications and tradeoffs. One of the recent options is Rockset, a serverless platform for fast SQL analytics on semi-structured and structured data. In this episode CEO Venkat Venkataramani and SVP of Product Shruti Bhat explain the origins of Rockset, how it is architected to allow for fast and flexible SQL analytics on your data, and how their serverless platform can save you th

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Data Preparation for Machine learning 101: Why it’s important and how to do it

KDnuggets

As a data scientist, one of the brains behind AI-based innovations, you need to understand the significance of data preparation to achieve the desired level of cognitive capability for your models. Let’s begin.

How to Run Apache Kafka with Spring Boot on Pivotal Application Service (PAS)

Confluent

This tutorial describes how to set up a sample Spring Boot application in Pivotal Application Service (PAS), which consumes and produces events to an Apache Kafka® cluster running in Pivotal Container Service (PKS). With this tutorial, you can set up your PAS and PKS configurations so that they work with Kafka. For a tutorial on how to set up a Kafka cluster in PKS, please see How to Deploy Confluent Platform on Pivotal Container Service (PKS) with Confluent Operator.

Kafka 18

Open-sourcing Polynote: an IDE-inspired polyglot notebook

Netflix Tech

By Jeremy Smith, Jonathan Indig, and Faisal Siddiqi. We are pleased to announce the open-source launch of Polynote: a new, polyglot notebook with first-class Scala support, Apache Spark integration, multi-language interoperability including Scala, Python, and SQL, as-you-type autocomplete, and more. Polynote provides data scientists and machine learning researchers with a notebook environment that allows them the freedom to seamlessly integrate our JVM-based ML platform …

Scala 93

How We Analyze and Visualize Kubernetes Events in Real Time at Rockset

Rockset

Kubernetes at Rockset: At Rockset, we use Kubernetes (k8s) for cluster orchestration. It runs all our production microservices, from our ingest workers to our query-serving tier. In addition to hosting all the production infrastructure, each engineer has their own Kubernetes namespace and dedicated resources that we use to locally deploy and test new versions of code and configuration.

SQL 40

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

A Renewed Focus on User Experience at Teradata

Teradata

Find out how our UX team is going to radically simplify the Teradata user experience. To be unveiled at Teradata Universe!

65

The 4 Quadrants of Data Science Skills and 7 Principles of Marie Kondo approach to Data Visualization

KDnuggets

As a data scientist, your most important skill is creating meaningful visualizations to disseminate knowledge and impact your organization or client. These seven principles will guide you toward developing charts with clarity, as exemplified with data from a recent KDnuggets poll.

A European Approach to Master’s Degrees in Data Science

KDnuggets

Data science education in Europe has been reevaluated and new recommendations are leading the way to the next generation of data science Master's courses to better support and train students.

How to Become a (Good) Data Scientist – Beginner Guide

KDnuggets

A guide covering the things you should learn to become a data scientist, including the basics of business intelligence, statistics, programming, and machine learning.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

How YouTube is Recommending Your Next Video

KDnuggets

If you are interested in learning more about the latest YouTube recommendation algorithm paper, read this post for details on its approach and improvements.

Algorithm 123

Introduction to Natural Language Processing (NLP)

KDnuggets

Have you ever wondered how your personal assistant (e.g., Siri) is built? Do you want to build your own? Perfect! Let’s talk about Natural Language Processing.

Process 122

Three Things to Know About Reinforcement Learning

KDnuggets

As an engineer, scientist, or researcher, you may want to take advantage of this new and growing technology, but where do you start? The best place to begin is to understand what the concept is, how to implement it, and whether it’s the right approach for a given problem.

Artificial Intelligence: Salaries Heading Skyward

KDnuggets

While the average salary for a Software Engineer is around $100,000 to $150,000, to make the big bucks you want to be an AI or Machine Learning Specialist, Scientist, or Engineer.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, and learn to identify and de-risk the different kinds of hypotheses built into your roadmap.