Top Data Engineering Digest Unstructured Data Cloud Content for 2020

2020

Change Data Capture Using Debezium Kafka and Pg

Start Data Engineering

MAY 9, 2020

Change data capture is a software design pattern used to capture changes to data and take corresponding action based on those changes. The change is usually a create, update, or delete. The corresponding action usually occurs in another system, in response to the change made in the source system.
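As a rough illustration of the pattern described above, here is a minimal sketch of a downstream system applying change events to its own copy of the data. The event shape (`op`, `before`, `after`) is a simplified assumption loosely modeled on Debezium's event envelope, and the function names and sample rows are hypothetical; real events carry far more metadata.

```python
from typing import Any, Dict, List

# Hypothetical, simplified change event. The "op", "before", and "after"
# keys are an assumption loosely modeled on Debezium's envelope
# ("c" = create, "u" = update, "d" = delete).
ChangeEvent = Dict[str, Any]
Replica = Dict[int, Dict[str, Any]]


def apply_change(replica: Replica, event: ChangeEvent) -> None:
    """Apply one change event from the source system to a downstream replica."""
    op = event["op"]
    if op in ("c", "u"):
        # Creates and updates carry the new row state in "after".
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the last known row state in "before".
        row = event["before"]
        replica.pop(row["id"], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


def replay(events: List[ChangeEvent]) -> Replica:
    """Build a replica by applying events in the order they were captured."""
    replica: Replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Hypothetical sample stream: two inserts, one update, one delete.
sample_events: List[ChangeEvent] = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 2, "name": "bob"}, "after": {"id": 2, "name": "robert"}},
    {"op": "d", "before": {"id": 1, "name": "alice"}, "after": None},
]

if __name__ == "__main__":
    print(replay(sample_events))
```

Applying events strictly in capture order matters here: replaying the same stream out of order could resurrect a deleted row or overwrite a newer value.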

Kafka

Kafka Data Designing Systems

20+ Machine Learning Datasets & Project Ideas

KDnuggets

MARCH 9, 2020

Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.

Datasets

Datasets Machine Learning Project Data Science

Trending Sources

12 Days of Apache Kafka

Confluent

DECEMBER 28, 2020

Before you say it: Yes, we are right now three days past Christmas, but technically the 12 days of Christmas refer to the days between Christmas and Epiphany, which is—I […].

Kafka

Kafka IT

Introducing Domain-Oriented Microservice Architecture

Uber Engineering

JULY 23, 2020

Recently there has been substantial discussion around the downsides of service-oriented architectures and microservice architectures in particular. While only a few years ago, many people readily adopted microservice architectures due to the numerous benefits they provide such as …

Architecture

Architecture Engineering

Life of a Netflix Partner Engineer: The case of the extra 40 ms

Netflix Tech

DECEMBER 14, 2020

By John Blair, Netflix Partner Engineering. The Netflix application runs on hundreds of smart TVs, streaming sticks, and pay TV set-top boxes. The role of a Partner Engineer at Netflix is to help device manufacturers launch the Netflix application on their devices. In this article, we talk about one particularly difficult issue that blocked the launch of a device in Europe.

Bytes

Bytes Engineering Manufacturing Coding

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

MARCH 15, 2020

Advanced analytics and AI can significantly accelerate the data processing required to get the insights, answers, and recommendations needed to address the COVID-19 pandemic.

Data Process

Data Process Process Data

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

SEPTEMBER 28, 2020

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors.

Cloud

Cloud Kafka Professional Services Metadata

More Trending

Top 10 Technology Trends for 2020

KDnuggets

JANUARY 16, 2020

With multiple emerging technologies integrated in just the past year, AI development continues at a fast pace. Following the blueprint of science and technology advancements in 2019, we predict 10 trends for 2020 and beyond.

Technology

Top 5 must-have Data Science skills for 2020

KDnuggets

JANUARY 8, 2020

The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market.

Data Science

Data Science Machine Learning SQL Python

A Comprehensive Guide to Natural Language Generation

KDnuggets

JANUARY 7, 2020

Follow this overview of Natural Language Generation covering its applications in theory and practice. The evolution of NLG architecture is also described from simple gap-filling to dynamic document creation along with a summary of the most popular NLG models.

Architecture

Architecture IT Process

The Book to Start You on Machine Learning

KDnuggets

JANUARY 9, 2020

This book is intended for beginners in machine learning who are looking for a practical approach to learning by building projects and studying different machine learning algorithms within a specific context.

Machine Learning

Machine Learning Algorithm Project Building

7 Resources to Becoming a Data Engineer

KDnuggets

JANUARY 7, 2020

An estimated 8,650% growth in the volume of data from 2010 to 2025, to 175 zettabytes, has created an enormous need for Data Engineers to build an organization's big data platform to be fast, efficient, and scalable.
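The growth figure quoted above can be sanity-checked with simple arithmetic. The sketch below assumes "8,650% growth" means the 2025 volume equals the 2010 volume multiplied by (1 + 8650/100); the function name is hypothetical and the result is only the baseline implied by the quoted numbers, not an independently sourced statistic.

```python
def implied_baseline(final_value: float, growth_percent: float) -> float:
    """Return the starting value implied by a final value and a percentage growth.

    Assumes growth is measured relative to the starting value:
    final = start * (1 + growth_percent / 100).
    """
    return final_value / (1 + growth_percent / 100)


# Figures taken from the blurb: 175 zettabytes in 2025 after 8,650% growth.
baseline_2010_zb = implied_baseline(175, 8650)

if __name__ == "__main__":
    print(f"Implied 2010 volume: {baseline_2010_zb:.1f} ZB")
```

Under that assumption, the quoted numbers imply a 2010 baseline of about 2 zettabytes.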

Data Engineering

Data Engineering Data Engineer Engineering Big Data

I wanna be a data scientist, but… how?

KDnuggets

JANUARY 20, 2020

It’s easy to say "I wanna be a data scientist," but... where do you start? How long does it take to become attractive to employers? Do you need a Master’s degree? Do you need to know every mathematical concept ever derived? The journey might be long, but follow this plan to help you keep moving forward toward your career goal.

Data

Coronavirus Data and Poll Analysis – yes, there is hope, if we act now

KDnuggets

MARCH 23, 2020

We examine the growth of daily coronavirus cases in the most affected countries, and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results: the scale of the shift to online work, and how Data Science work is likely to increase or drop in different regions. Stay healthy and practice social distancing!

Data Science

Data Science Data

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

MARCH 19, 2020

Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at four such environments.

Deep Learning

Deep Learning Cloud Python

What is the most effective policy response to the new coronavirus pandemic?

KDnuggets

MARCH 19, 2020

Where Test/Trace/Quarantine is working, the number of cases per day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

MARCH 16, 2020

Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.

Data

Data IT Data Science

Predict Electricity Consumption Using Time Series Analysis

KDnuggets

JANUARY 2, 2020

Time series forecasting is a technique for predicting future values from a time-ordered sequence of observations. In this post, we take a small forecasting problem and work through it end to end, learning time series forecasting along the way.

IT Python

Top 9 Mobile Apps for Learning and Practicing Data Science

KDnuggets

JANUARY 17, 2020

This article covers the top 9 mobile apps that help users learn and practice data science, and thereby improve their productivity.

Data Science

Data Science Data

Top 5 Things Every Kafka Developer Should Know

Confluent

OCTOBER 16, 2020

Apache Kafka® is an event streaming platform used by more than 30% of the Fortune 500 today. There are numerous features of Kafka that make it the de-facto standard for […].

Kafka

Kafka IT

7 Steps to a Job-winning Data Science Resume

KDnuggets

JANUARY 10, 2020

A resume plays a key role in bagging that dream data science job. We break down the nuances of a job-winning data science resume so that you can go ahead and transform your own resume.

Data Science

Data Science Data

Introducing the Confluent Parallel Message Processing Client

Confluent

DECEMBER 15, 2020

Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder, why would we want anything else? It turns out that, in practice, there are […].

Process

Process Kafka IT

Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the fastest?

Confluent

AUGUST 21, 2020

Apache Kafka® is one of the most popular event streaming systems. There are many ways to compare systems in this space, but one thing everyone cares about is performance. Kafka […].

Kafka

Kafka Systems

Five Interesting Data Engineering Projects

KDnuggets

MARCH 17, 2020

As the role of the data engineer continues to grow in the field of data science, so do the many tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) are reviewed here; they are worth your attention for your data pipeline work.

Data Engineering

Data Engineering Data Engineer Engineering Project

How Real-Time Stream Processing Works with ksqlDB, Animated

Confluent

SEPTEMBER 29, 2020

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].

Process

Process Kafka Database Project

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

MAY 15, 2020

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka

Kafka Metadata IT Project

Preventing Fraud and Fighting Account Takeovers with Kafka Streams

Confluent

APRIL 9, 2020

Many companies have recently started to take cybersecurity and data protection even more seriously, particularly driven by the recent General Data Protection Regulation (GDPR) legislation. They are increasing their investment […].

Kafka

Kafka Retail Data Process

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Uber Engineering

MARCH 11, 2020

The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture …

Architecture

Architecture Engineering IT

Designing Edge Gateway, Uber’s API Lifecycle Management Platform

Uber Engineering

AUGUST 18, 2020

The making of Edge Gateway, the highly available and scalable self-serve gateway to configure, manage, and monitor APIs of every business domain at Uber. Evolution of Uber’s API gateway. In October 2014, Uber had started its journey of scale in what …

Designing

Designing Management Engineering IT

99th Percentile Latency at Scale with Apache Kafka

Confluent

FEBRUARY 25, 2020

Fraud detection, payment systems, and stock trading platforms are only a few of many Apache Kafka® use cases that require both fast and predictable delivery of data. For example, detecting […].

Kafka

Kafka Systems Data

Apache Kafka as a Service with Confluent Cloud Now Available on Azure Marketplace

Confluent

FEBRUARY 18, 2020

Less than six months ago, we announced support for Microsoft Azure in Confluent Cloud, which allows developers using Azure as a public cloud to build event streaming applications with Apache […].

Cloud

Cloud Kafka Building Programming
