Sat.Nov 02, 2019 - Fri.Nov 08, 2019

Automating Your Production Dataflows On Spark

Data Engineering Podcast

Summary: As data engineers, the health of our pipelines is our highest priority. Unfortunately, there are countless ways that our dataflows can break or degrade that have nothing to do with the business logic or data transformations that we write and maintain. Sean Knapp founded Ascend to address the operational challenges of running production-grade, scalable Spark infrastructure, allowing data engineers to focus on the problems that power their business.

10 Free Must-read Books on AI

KDnuggets

Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.

Introducing Confluent Cloud on Microsoft Azure

Confluent

Today, we are proud to make Confluent Cloud available to companies leveraging the Microsoft Azure ecosystem of services, in addition to the previous rollouts on Google Cloud Platform (GCP) and […].

Three Distinctly Different Customer Experience Strategies

Teradata

Improving the customer experience is the top priority for CMOs. Find out what the top 3 distinct CX strategies are to drive customer loyalty.

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

GraphQL Search Indexing

Netflix Tech

By Artem Shtatnov and Ravi Srinivas Ranganathan. Almost a year ago, we described our learnings from adopting GraphQL on the Netflix Marketing Tech team. We have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

How to Create a Vocabulary for NLP Tasks in Python

KDnuggets

This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a form that is useful for later NLP tasks.
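
The linked post's own code is not reproduced here. As a rough sketch of what such a vocabulary class might look like, the example below assumes naive whitespace tokenization and two special tokens; the names `Vocabulary`, `add_sentence`, and `encode` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class Vocabulary:
    """Maps tokens to integer indices and tracks token counts (hypothetical design)."""

    PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, not taken from the post

    def __init__(self):
        self.token_to_index = {}
        self.index_to_token = []
        self.counts = Counter()
        for token in (self.PAD, self.UNK):
            self._register(token)

    def _register(self, token):
        # Assign the next free index the first time a token is seen.
        if token not in self.token_to_index:
            self.token_to_index[token] = len(self.index_to_token)
            self.index_to_token.append(token)

    def add_sentence(self, sentence):
        # Naive lowercase + whitespace tokenization; real preprocessing would do more.
        for token in sentence.lower().split():
            self._register(token)
            self.counts[token] += 1

    def encode(self, sentence):
        # Tokens never added to the vocabulary fall back to the <unk> index.
        unk = self.token_to_index[self.UNK]
        return [self.token_to_index.get(t, unk) for t in sentence.lower().split()]

    def __len__(self):
        return len(self.index_to_token)


vocab = Vocabulary()
vocab.add_sentence("The cat sat on the mat")
encoded = vocab.encode("the dog sat")
```

Keeping both a token-to-index dictionary and an index-to-token list makes lookups cheap in either direction, and the counter is one example of the "related metadata" such a class can carry.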

More Trending

Power to the People: Vantage Analyst in Action

Teradata

The people who drive real business innovation in your org may not all be coders. With Vantage Analyst, they can explore data to uncover insights that may lead to that next big thing.

Tutorial: Building An Analytics Data Pipeline In Python

Dataquest

If you’ve ever wanted to learn Python online with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our new Data Engineer Path.

Customer Segmentation Using K Means Clustering

KDnuggets

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

How to Use Single Message Transforms in Kafka Connect

Confluent

Kafka Connect is the part of Apache Kafka® that provides reliable, scalable, distributed streaming integration between Apache Kafka and other systems. Kafka Connect has connectors for many, many systems, and […].

Designing Your Neural Networks

KDnuggets

Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets to guide you in making smart decisions about your neural network architecture.

Set Operations Applied to Pandas DataFrames

KDnuggets

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
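
As a minimal sketch of the three operations named above (not necessarily the approach taken in the tutorial), the example below uses made-up `left` and `right` DataFrames with identical columns and treats each row as a set element.

```python
import pandas as pd

# Two illustrative datasets with the same columns (invented for this sketch).
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "c", "d"]})

# Union: stack both frames, then drop rows that appear more than once.
union = pd.concat([left, right]).drop_duplicates().reset_index(drop=True)

# Intersection: an inner merge on all shared columns keeps rows present in both.
intersection = left.merge(right, how="inner")

# Difference (left minus right): a left merge with indicator=True adds a
# "_merge" column, so rows found only in `left` can be filtered out.
marked = left.merge(right, how="left", indicator=True)
difference = marked[marked["_merge"] == "left_only"].drop(columns="_merge")
```

With no `on` argument, `merge` joins on every column the two frames share, which is what makes whole rows behave like set members here.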

Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch

KDnuggets

The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Understanding Boxplots

KDnuggets

A boxplot can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed.
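
To make the quantities behind a boxplot concrete, here is a small sketch that computes them with the standard library. It assumes the common 1.5 × IQR convention for flagging outliers and uses a made-up sample; the linked article may use different data or conventions.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # invented sample with one extreme value

# Quartiles define the box; the median is the line inside it.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: how tightly the middle half is grouped

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme points that are not outliers.
whisker_low = min(x for x in data if x >= lower_fence)
whisker_high = max(x for x in data if x <= upper_fence)
```

Comparing `median - q1` with `q3 - median`, or the two whisker lengths, gives a quick read on symmetry and skew.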

Data Cleaning and Preprocessing for Beginners

KDnuggets

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

Orchestrating Dynamic Reports in Python and R with Rmd Files

KDnuggets

Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case using both languages in one analysis.

3 Reasons to attend Data Natives, 25-26 November, Berlin

KDnuggets

Data Natives is an outstanding conference that lets you meet many talented Data Scientists and Data Professionals. Find your dream company or your dream employee and level up for 2020. Use code DN19_KDNuggets_50 to save.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

The Last Defense Against Another AI Winter

KDnuggets

My short answer is this: Yes, another AI Winter will be here if you don’t deploy more ML solutions. You and your Data Science teams are the last line of defense against the AI Winter. You need to solve five key challenges to keep the momentum up.

Probability Learning: Maximum Likelihood

KDnuggets

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.

Research Guide: Advanced Loss Functions for Machine Learning Models

KDnuggets

This guide explores research centered on a variety of advanced loss functions for machine learning models.

What is Data Science?

KDnuggets

Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI

KDnuggets

Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.

How to Become a Successful Healthcare Data Analyst

KDnuggets

Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.

An Eight-Step Checklist for An Analytics Project

KDnuggets

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

Meet Neebo: The Virtual Analytics Hub

KDnuggets

Neebo is a SaaS solution that enables analytics teams to connect to, find, combine and collaborate on trusted data assets in hybrid cloud landscapes, and provides a unified access point where they can more effectively leverage all their analytics assets and knowledge. In this blog, we will highlight some of the features of Neebo and how they can completely transform the way analytics teams operate.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Top KDnuggets tweets, Oct 30 – Nov 05: Everything a Data Scientist Should Know About Data Management

KDnuggets

Which Data Science Skills are core and which are hot/emerging ones?; The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral DataViz; Microsoft open sources #SandDance, a visual data exploration tool.

Monitoring Models at Scale

KDnuggets

Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT. It covers detecting changes in the patterns of real-world data your models see in production, tracking how model accuracy and other quality metrics change over time, and getting alerted when health checks fail so that resolution workflows can be triggered.

Practical Computer Vision Course with Real-Life Cases, Nov 18, Washington, DC

KDnuggets

This course, Practical Computer Vision Course with Real-Life Cases, Nov 18 in Washington, DC, will move you to the next step, providing you with practical means of solving business-specific tasks. Reserve your seat now.

Top Stories, Oct 28 – Nov 3: 5 Statistical Traps Data Scientists Should Avoid; Top Machine Learning Software Tools for Developers

KDnuggets

Also: Why is Machine Learning Deployment Hard?; Data Sources 101; 5 Statistical Traps Data Scientists Should Avoid; Everything a Data Scientist Should Know About Data Management; How to Become a (Good) Data Scientist — Beginner Guide.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.