2020

20+ Machine Learning Datasets & Project Ideas

KDnuggets

Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.

Datasets 160

12 Days of Apache Kafka

Confluent

Before you say it: Yes, we are right now three days past Christmas, but technically the 12 days of Christmas refer to the days between Christmas and Epiphany, which is—I […].

Kafka 145

Introducing Domain-Oriented Microservice Architecture

Uber Engineering

Introduction. Recently there has been substantial discussion around the downsides of service oriented architectures and microservice architectures in particular. While only a few years ago, many people readily adopted microservice architectures due to the numerous benefits they provide such as … The post Introducing Domain-Oriented Microservice Architecture appeared first on Uber Engineering Blog.

Life of a Netflix Partner Engineer — The case of extra 40 ms

Netflix Tech

By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks and pay TV set top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article we talk about one particularly difficult issue that blocked the launch of a device in Europe.

Bytes 140

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors.

Cloud 131

More Trending

Top 5 must-have Data Science skills for 2020

KDnuggets

The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market.

A Comprehensive Guide to Natural Language Generation

KDnuggets

Follow this overview of Natural Language Generation covering its applications in theory and practice. The evolution of NLG architecture is also described from simple gap-filling to dynamic document creation along with a summary of the most popular NLG models.

The Book to Start You on Machine Learning

KDnuggets

This book is intended for beginners in Machine Learning who are looking for a practical approach to learning by building projects and studying the different Machine Learning algorithms within a specific context.

7 Resources to Becoming a Data Engineer

KDnuggets

An estimated 8,650% growth in the volume of data, to 175 zettabytes, from 2010 to 2025 has created an enormous need for Data Engineers to build an organization's big data platform to be fast, efficient and scalable.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

I wanna be a data scientist, but… how?

KDnuggets

It’s easy to say "I wanna be a data scientist," but where do you start? How much time does it take to become desirable to companies? Do you need a Master’s degree? Do you need to know every mathematical concept ever derived? The journey might be long, but follow this plan to help you keep moving forward toward your career goal.

Data 159

Coronavirus Data and Poll Analysis – yes, there is hope, if we act now

KDnuggets

We examine the growth of coronavirus daily cases in most affected countries, and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results - the scale of change to online and how Data Science work is likely to increase or drop in different regions. Stay Healthy and practice social distancing!

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.

What is the most effective policy response to the new coronavirus pandemic?

KDnuggets

Where Test/Trace/Quarantine are working, the number of cases/day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

IT 156

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.

Data 153

Predict Electricity Consumption Using Time Series Analysis

KDnuggets

Time series forecasting is a technique for predicting events through a sequence of time. In this post, we will take a small forecasting problem and work through it end to end, learning time series forecasting along the way.

IT 152

Top 9 Mobile Apps for Learning and Practicing Data Science

KDnuggets

This article covers the top 9 mobile apps that help users learn and practice data science and thereby improve their productivity.

7 Steps to a Job-winning Data Science Resume

KDnuggets

A resume plays a key role in bagging that dream data science job. We break down the nuances of a job-winning data science resume so that you can go ahead and transform your own resume.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Top 5 Things Every Kafka Developer Should Know

Confluent

Apache Kafka® is an event streaming platform used by more than 30% of the Fortune 500 today. There are numerous features of Kafka that make it the de-facto standard for […].

Kafka 145

Five Interesting Data Engineering Projects

KDnuggets

As the role of the data engineer continues to grow in the field of data science, so does the number of tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) are reviewed here, and they deserve your attention for your data pipeline work.

Introducing the Confluent Parallel Message Processing Client

Confluent

Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder, why would we want anything else? It turns out that, in practice, there are […].

Process 144

Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the fastest?

Confluent

Apache Kafka® is one of the most popular event streaming systems. There are many ways to compare systems in this space, but one thing everyone cares about is performance. Kafka […].

Kafka 145

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

How Real-Time Stream Processing Works with ksqlDB, Animated

Confluent

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].

Process 145

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka 145

Preventing Fraud and Fighting Account Takeovers with Kafka Streams

Confluent

Many companies have recently started to take cybersecurity and data protection even more seriously, particularly driven by the recent General Data Protection Regulation (GDPR) legislation. They are increasing their investment […].

Kafka 145

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

Uber Engineering

The making of Edge Gateway, the highly-available and scalable self-serve gateway to configure, manage, and monitor APIs of every business domain at Uber. Evolution of Uber’s API gateway. In October 2014, Uber had started its journey of scale in what … The post Designing Edge Gateway, Uber’s API Lifecycle Management Platform appeared first on Uber Engineering Blog.

Designing 144

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Uber Engineering

The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture … The post Why We Leverage Multi-tenancy in Uber’s Microservice Architecture appeared first on Uber Engineering Blog.

A Beginner’s Guide to Data Integration Approaches in Business Intelligence

KDnuggets

An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.

99th Percentile Latency at Scale with Apache Kafka

Confluent

Fraud detection, payment systems, and stock trading platforms are only a few of many Apache Kafka® use cases that require both fast and predictable delivery of data. For example, detecting […].

Kafka 145

Apache Kafka as a Service with Confluent Cloud Now Available on Azure Marketplace

Confluent

Less than six months ago, we announced support for Microsoft Azure in Confluent Cloud, which allows developers using Azure as a public cloud to build event streaming applications with Apache […].

Cloud 145

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, and learn to identify and de-risk the different kinds of hypotheses built into your roadmap.