Sat. Sep 21, 2019 - Fri. Sep 27, 2019

Open Source Object Storage For All Of Your Data

Data Engineering Podcast

Summary: Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. Amazon S3 has quickly become the de facto API for interacting with this service, so the team at MinIO has built a production-grade, easy-to-manage storage engine that replicates that interface.

12 Deep Learning Researchers and Leaders

KDnuggets

Our list of deep learning researchers and industry leaders names the people you should follow to stay current with this rapidly expanding field of AI. From early practitioners and established academics to entrepreneurs and today’s top corporate influencers, this diverse group of individuals is leading the way into tomorrow’s deep learning landscape.

Real-Time Analytics and Monitoring Dashboards with Apache Kafka and Rockset

Confluent

In the early days, many companies simply used Apache Kafka ® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. The significant difference today is that companies use Apache Kafka as an event streaming platform for building mission-critical infrastructures and core operations platforms. Examples include microservice architectures, mainframe integration, instant payment, fraud detection, sensor analytics, real-time monitoring, and many more—dri

Time Series Analysis: Looking Back to See the Future

Teradata

Time series data is found everywhere from stock prices to public health. Vantage's Machine Learning Engine helps turn that data into answers. Find out how.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Evolving Regional Evacuation

Netflix Tech

By Niosha Behnam, Demand Engineering @ Netflix

At Netflix, we prioritize innovation and velocity in pursuit of the best experience for our 150+ million global customers. This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers.

5 Famous Deep Learning Courses/Schools of 2019

KDnuggets

Deep learning has become the hottest skill in data science at the moment. There is a plethora of articles, courses, technologies, influencers, and resources that we can use to gain deep learning skills.

More Trending

Why Clean Data is Critical for Your Business

Teradata

Clean data is critical to your business. Find out what three things you need to know about clean data for the health of your organization. Read more.

Scaling a Mature Data Pipeline: Managing Overhead

Airbnb Tech

Author: Zachary Ennenga. There is often a hidden performance cost tied to the complexity of data pipelines: the overhead. In this post, we introduce the concept and examine the techniques we use to avoid it in our data pipelines. Background: There is often a natural evolution in the tooling, organization, and technical underpinning of data pipelines.

A Single Function to Streamline Image Classification with Keras

KDnuggets

We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

Every Company is Becoming a Software Company

Confluent

In 2011, Marc Andreessen wrote an article called Why Software Is Eating the World. The central idea is that any process that can be moved into software will be. This has become a kind of shorthand for the investment thesis behind Silicon Valley’s current wave of unicorn startups. It’s also a unifying idea behind the larger set of technology trends we see today, such as machine learning, IoT, ubiquitous mobile connectivity, SaaS, and cloud computing.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products that our customers find valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are creating a shared understanding of those needs and sensing whether what we're doing is meeting those needs.

Automatic Version Control for Data Scientists

KDnuggets

How can you keep your machine learning models and data organized so you can collaborate effectively? Discover this new tool set available for better version control designed for the data scientist workflow.

The Future of Analytics and Data Science

KDnuggets

Learn about the current and future issues of data science and possible solutions from this interview with IADSS Co-founder, Dr. Usama Fayyad, following his keynote speech at ODSC Boston 2019.

6 bits of advice for Data Scientists

KDnuggets

As a data scientist, you can get lost in your daily dives into the data. Follow this advice in your work to be more diligent and more impactful for your organization.

Natural Language in Python using spaCy: An Introduction

KDnuggets

This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Customer Segmentation for R Users

KDnuggets

This article shows you how to separate your customers into distinct groups based on their purchase behavior. For the R enthusiasts out there, I demonstrated what you can do with r/stats, ggradar, ggplot2, animation, and factoextra.

A 2019 Guide for Automatic Speech Recognition

KDnuggets

In this article, we’ll look at a couple of papers aimed at solving the problem of automated speech recognition with machine and deep learning.

Why data analysts should choose stories over statistics

KDnuggets

Join the Crunch Data Conference in Budapest, Oct 16-18, with stellar speakers from companies like Facebook, Netflix and LinkedIn. Use the discount code ‘KDNuggets’ to save $100 off your conference ticket.

Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons

KDnuggets

Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

What is Hierarchical Clustering?

KDnuggets

The article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.

The thin line between data science and data engineering

KDnuggets

Today, as companies have finally come to understand the value that data science can bring, more and more emphasis is being placed on the implementation of data science in production systems. And as these implementations have required models that can perform on larger and larger datasets in real-time, an awful lot of data science problems have become engineering problems.

Webinar: Build auto-adaptive machine learning models with Kubernetes

KDnuggets

This live webinar on Oct 2, 2019 will teach data scientists and machine learning engineers how to build, manage, and deploy auto-adaptive machine learning models in production. Save your spot now.

Getting to the Future First: How Social Data is Transforming Trend Discovery

KDnuggets

Register now for this webinar, Sep 25 @ 12 PM ET, for a clear approach on how to apply machine learning language technology to massive, unstructured data sets in order to create predictive models of what may be the next “it” ingredient, color, flavor or pack size.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Top Stories, Sep 16-22: Which Data Science Skills are core and which are hot/emerging ones?

KDnuggets

Also: Explore the world of Bioinformatics with Machine Learning; My journey path from a Software Engineer to BI Specialist to a Data Scientist; 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python; 10 Great Python Resources for Aspiring Data Scientists.

Beta Distribution: What, When & How

KDnuggets

This article covers the beta distribution, and explains it using baseball batting averages.

Help Your Career Survive ‘DataGeddon’

KDnuggets

Penn State’s fully online data analytics program uniquely prepares students to advance their career in data science. Penn State offers 3 intakes every year and reviews applications on a rolling basis. GMAT or GRE waivers are available to highly qualified candidates. Learn more now.

Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of?

KDnuggets

Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Data Mapping Using Machine Learning

KDnuggets

Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.

Introducing IceCAPS: Microsoft’s Framework for Advanced Conversation Modeling

KDnuggets

The new open source framework that brings multi-task learning to conversational agents.

AI World Conference & Expo, Oct 23-25, Boston – Updated Agenda and Special KDnuggets Discount

KDnuggets

AI World Conference & Expo has become the industry’s largest independent business event focused on the state of the practice of AI in the enterprise. Join us in Boston, Oct 23-25. Use the discount code 1968-KDN and SAVE $200.

Top KDnuggets tweets, Sep 18-24: Python Libraries for Interpretable Machine Learning; Scikit-Learn: A silver bullet for basic ML

KDnuggets

Python Libraries for Interpretable Machine Learning; Scikit-Learn: A silver bullet for basic machine learning; I wasn't getting hired as a Data Scientist. So I sought data on who is; Which Data Science Skills are core and which are hot/emerging ones?

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating