Sat.Sep 28, 2019 - Fri.Oct 04, 2019

Ship Faster With An Opinionated Data Pipeline Framework

Data Engineering Podcast

Summary: Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways that you can structure it. Kedro is a framework that provides an opinionated workflow that lets you focus on the parts that matter, so that you don’t waste time on gluing the steps together. In this episode, Tom Goldenberg explains how it works, how it is being used at Quantum Black for customer projects, and how it can help you structure your own.

Choosing the Right Clustering Algorithm for your Dataset

KDnuggets

Applying a clustering algorithm is much easier than selecting the best one. Each type offers pros and cons that must be considered if you’re striving for a tidy cluster structure.

Teradata Certification Program Embraces Vantage

Teradata

The Teradata Certification program is celebrating its 20th anniversary! Find out how it can advance your career by making you a certified expert on Vantage.

Kafka Summit SF 2019: Day 1 Recap

Confluent

Day one of the event, summarized for your convenience. They say you never forget your first Kafka Summit. Mine was in New York City in 2017, and it had, what, 300 people? Today we welcomed nearly 2,000 to a giant ballroom in San Francisco. There were laser beams. There were two Tony Stark allusions in the first 60 seconds. And most importantly, there was a vibrant and ever-growing community present.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

How We Analyze and Visualize Kubernetes Events in Real Time at Rockset

Rockset

At Rockset, we use Kubernetes (k8s) for cluster orchestration. It runs all our production microservices, from our ingest workers to our query-serving tier. In addition to hosting all the production infrastructure, each engineer has their own Kubernetes namespace and dedicated resources that we use to locally deploy and test new versions of code and configuration.

Data Preparation for Machine learning 101: Why it’s important and how to do it

KDnuggets

As a data scientist, one of the brains behind AI-based innovations, you need to understand the significance of data preparation in achieving the desired level of cognitive capability for your models. Let’s begin.

How to Deploy Confluent Platform on Pivotal Container Service (PKS) with Confluent Operator

Confluent

This tutorial describes how to set up an Apache Kafka® cluster on Enterprise Pivotal Container Service (Enterprise PKS) using Confluent Operator, which allows you to deploy and run Confluent Platform at scale on virtually any Kubernetes platform, including Pivotal Container Service (PKS). With Enterprise PKS, you can deploy, scale, patch, and upgrade all the Kubernetes clusters in your system without downtime.

A European Approach to Master’s Degrees in Data Science

KDnuggets

Data science education in Europe has been reevaluated and new recommendations are leading the way to the next generation of data science Master's courses to better support and train students.

How AI will transform healthcare (and can it fix the US healthcare system?)

KDnuggets

This thorough review focuses on the impact of AI, 5G, and edge computing on the healthcare sector in the 2020s, and also looks at quantum computing's potential impact on AI, healthcare, and financial services.

The Last SQL Guide for Data Analysis You’ll Ever Need

KDnuggets

This is it: the last SQL guide for data analysis you'll ever need! OK, maybe it’s actually the first. But it’ll give you a solid head start.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

DeepMind Has Quietly Open Sourced Three New Impressive Reinforcement Learning Frameworks

KDnuggets

Three new releases that will help researchers streamline the implementation of reinforcement learning programs.

Overcoming Deep Learning Stumbling Blocks

KDnuggets

Find out what was presented at the 6th annual Deep Learning Summit in London, where industry leaders, academics, researchers, and innovative startups presented the latest technological advancements and industry application methods in the field of deep learning.

Training a Machine Learning Engineer

KDnuggets

There is no clear outline for how to study machine learning or deep learning, so many people apply every algorithm they have heard of and hope that one of them works for the problem at hand. Below, I've listed some of the steps to follow when solving a machine learning problem.

Sentiment and Emotion Analysis for Beginners: Types and Challenges

KDnuggets

There are three types of emotion AI, and their combinations. In this article, I’ll briefly go through these three types and the challenges of their real-life applications.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Know Your Data: Part 1

KDnuggets

This article introduces the different types of data sets, data objects, and attributes.

Will Machine Learning End Retail? Data Science Seattle Oct 17, 2019

KDnuggets

In advance of the Data Science Salon taking place in Seattle on Oct 17, we asked our speakers to shed some light on how Artificial Intelligence and Machine Learning are impacting one of America’s most disruptive industries. Read on for more insight, and then register with the KDnuggets exclusive link for 20% off tickets.

Clustering Metrics Better Than the Elbow Method

KDnuggets

We show which metric to use for visualizing and determining an optimal number of clusters, one that works much better than the usual practice, the elbow method.

Research Guide for Neural Architecture Search

KDnuggets

In this guide, we will explore a range of research papers that have sought to solve the challenging task of automating neural network design.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Multi-Task Learning – ERNIE 2.0: State-of-the-Art NLP Architecture Intuitively Explained

KDnuggets

The tech giant Baidu unveiled its state-of-the-art NLP architecture ERNIE 2.0 earlier this year, which scored significantly higher than XLNet and BERT on all tasks in the GLUE benchmark. This major breakthrough in NLP takes advantage of a new innovation called “Continual Incremental Multi-Task Learning”.

5 Fundamental AI Principles

KDnuggets

While AI may appear magical at times, these five principles will help guide you to avoid pitfalls when leveraging this tech.

6 Must See Deep Learning Experts at ODSC West 2019 – 20% Off Ends Friday

KDnuggets

You won’t want to miss the opportunity to learn about the future of deep learning first-hand at ODSC West in San Francisco, Oct 29 - Nov 1. So don’t forget to register soon for 20% off.

KDnuggets™ News 19:n37, Oct 2: The Future of Analytics & Data Science! Starting NLP with spaCy & Python

KDnuggets

This week, find out what the future of analytics and data science holds; get an introduction to spaCy for natural language processing; find out how to use time series analysis for baseball; get to know your data; read 6 bits of advice for data scientists; and much, much more!

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Top KDnuggets tweets, Sep 25 – Oct 01: Natural Language in Python using spaCy: An Introduction

KDnuggets

Also: Top KDnuggets tweets, Sep 18-24: Python Libraries for Interpretable Machine Learning; Scikit-Learn: A silver bullet for basic ML; Automatic Version Control for Data Scientists; My journey path from a Software Engineer to BI Specialist to a Data Scientist.

Recreating Imagination: DeepMind Builds Neural Networks that Spontaneously Replay Past Experiences

KDnuggets

DeepMind researchers created a model that can replay past experiences in a way that simulates the mechanisms in the hippocampus.

Statistical Thinking for Industrial Problem Solving: a free online course

KDnuggets

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.

Why Scrapinghub’s AutoExtract Chose Confluent Cloud for Their Apache Kafka Needs

Confluent

We recently launched a new artificial intelligence (AI) data extraction API called Scrapinghub AutoExtract, which turns article and product pages into structured data. At Scrapinghub, we specialize in web data extraction, and our products empower everyone from programmers to CEOs to extract web data quickly and effectively. Example of article extraction on “Introducing a Cloud-Native Experience for Apache Kafka® in Confluent Cloud.”

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Kafka Summit San Francisco 2019: Day 2 Recap

Confluent

If you looked at the Kafka Summits I’ve been a part of as a sequence of immutable events (and they are, unless you know something about time I don’t), it would look like this: New York City 2017, San Francisco 2017, London 2018, San Francisco 2018, New York City 2019, London 2019, San Francisco 2019. That makes this the seventh Summit I’ve attended.

Top Stories, Sep 23-29: The Future of Analytics and Data Science; 5 Famous Deep Learning Courses/Schools of 2019

KDnuggets

Also: 12 Deep Learning Researchers and Leaders; Natural Language in Python using spaCy: An Introduction; A Single Function to Streamline Image Classification with Keras; Which Data Science Skills are core and which are hot/emerging ones?; 6 bits of advice for Data Scientists.

Free Apache Kafka as a Service with Confluent Cloud

Confluent

Go from zero to production on Apache Kafka® without talking to sales reps or building infrastructure. Apache Kafka is the standard for event-driven applications. But it’s not without its challenges, and the ops burden can be heavy. Organizations that successfully build and run their own Kafka environment must make significant investments in engineering and operations to account for failover and security.
