Thu, Jul 28, 2022

K-nearest Neighbors in Scikit-learn

KDnuggets

Learn about the k-nearest neighbors algorithm, one of the most prominent workhorses of machine learning, and how to implement it using Scikit-learn in Python.

Algorithm 108

Modern Data Flow: A Better Way of Building Data Pipelines

Confluent

A complete guide to data pipelines, data integration, and modern data flow, the key to next-generation, data-driven applications, systems, and organizations.

What is Text Classification?

KDnuggets

We will define text classification, explain how it works, cover some of its best-known algorithms, and provide data sets that might help you start your text classification journey.

Algorithm 108

MongoDB CDC: When to Use Kafka, Debezium, Change Streams and Rockset

Rockset

MongoDB has grown from a basic JSON key-value store to one of the most popular NoSQL database solutions in use today. It is widely supported and provides flexible JSON document storage at scale. It also provides native querying and analytics capabilities. These attributes have led to MongoDB being widely adopted, especially alongside JavaScript web applications.

MongoDB 52

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

How do I do that in Python?

KDnuggets

This book from Manning is full of techniques and best practices for writing readable and maintainable Python code, with careful cross-referencing that reveals how the same concept can be used in different contexts.

Python 102

Data Contracts and 4 Other Ways to Overcome Schema Changes

Monte Carlo

There is a virtually unlimited number of ways data can break. It could be a bad JOIN statement, an untriggered Airflow job, or even just someone at a third-party provider who didn’t feel like hitting the send button that day. But perhaps one of the most common causes of data quality problems is software feature updates and other changes made upstream by software engineers.

More Trending

What is Data Lineage?

Databand.ai

By Niv Sluzki, 2022-07-28. The term “data lineage” has been thrown around a lot over the last few years. What started as the idea of connecting datasets quickly became a very confusing term that is now often misused. It’s time to bring order to the chaos and dig deep into what it really is, because the answer matters quite a lot.

Be prepared to manage the threat with an MS in Cybersecurity from Bay Path University

KDnuggets

Bay Path’s Master’s in Cybersecurity prepares students to step into the workforce and assume immediate responsibility for the management and oversight of such systems.