Sat. Mar 12, 2022 - Fri. Mar 18, 2022

End-to-end data engineering project - batch edition

Start Data Engineering

Contents: Objective, Setup, Pre-requisites, Components, Source systems, Schedule & Orchestrate, Extract, Load, Transform, Data visualization, Choosing tools & frameworks, Future work & improvements, Conclusion, Further reading, References. Objective: It can be difficult to know where to begin when starting a data engineering side project. If you have wondered What data to use for your data project?

Top Posts Mar 7-13: Build a Machine Learning Web App in 5 Minutes

KDnuggets

Also: Decision Tree Algorithm, Explained; The Complete Collection of Data Science Cheat Sheets – Part 2; Top Programming Languages and Their Uses; The Complete Collection of Data Science Cheat Sheets – Part 1.

Announcing ksqlDB 0.24.0

Confluent

We are excited to announce ksqlDB 0.24! It comes with a slew of improvements and new features. Access to Apache Kafka® record headers will enable a whole host of new […].

Kafka 109

Crystal Ball, Black Box or Advanced Forecasting and Demand Planning in Retail and CPG

Teradata

Neither crystal balls nor black boxes will provide the agility needed for accurate demand forecasting in today’s retail & CPG environment. Learn more about new approaches to FDP.

Retail 98

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Planet and Profit: Reshaping Business Priorities for Good

Cloudera

The global pandemic has ushered in a new wave of economics. Driven by the rapid convergence of changing circumstances, data, automation and Artificial Intelligence (AI), businesses today have to contend with a whirlwind of internal and external pressures. Companies are experiencing pressure from growing customer demands and, amidst a global talent shortage, from a workforce no longer driven purely by profit but by altruism and purpose.

Best Data Science Books for Beginners

KDnuggets

The best knowledge is still found in libraries, within books. In this article, discover some of the top recommended Data Science books for beginners.

More Trending

Semi-Supervised Learning, Explained with Examples

AltexSoft

As it sometimes happens, when one approach doesn’t work to solve a particular problem, you try a different one. When that approach doesn’t work either, it may be a good idea to combine the best parts of both. At least that’s often the case with technology tasks. Machine learning is no exception. You’ve probably heard of two main ML techniques — supervised and unsupervised learning.

My Internship Experience with Cloudera

Cloudera

Each year, various departments and teams across the organization welcome early talent to embark on internships that allow them to kickstart their careers within the technology and big data industries. One of those early talent interns is Trang Luong, who worked within the APAC Inside Sales team earlier this year for six months, supporting the team in connecting with prospects and customers to guide them in their data-driven journey.

Feature Stores for Real-time AI & Machine Learning

KDnuggets

Real-time AI/ML is on the rise, and feature stores are key to deploying it successfully. Read on to see how the choice of online store and the feature store architecture play important roles in determining performance and cost.

How Mutable Databases Make It Easy To Do Real-Time Updates

Rockset

If you’re thinking about implementing real-time analytics, you've probably realized that you're going to need real-time updates. Real-time updates give you the power to insert, delete and update data in place. To do that, you'll need something more: a mutable database. In this post we'll discuss the three main reasons why a mutable database is required for real-time updates. 1) Late Arriving Data in Time-Based Window Rollups ⌛️ Let's say you have a rollup that's counting events for each hour.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Accelerating Adoption Of The Modern Data Stack At 5X Data

Data Engineering Podcast

Summary: The modern data stack is a constantly moving target, which makes it difficult to adopt without prior experience. To help organizations of all sizes that want to take advantage of these new and evolving architectures deliver useful insights faster, Tarush Aggarwal founded 5X Data. In this episode he explains how he works with these companies to deploy the technology stack and pairs them with an experienced engineer who assists with the implementation and traini

Data 100

Cloudera Statement Regarding Ukraine

Cloudera

The world cannot ignore the horrific invasion of Ukraine and the plight of the Ukrainian people, who are facing death and devastation in the defense of their country. Our Cloudera team members and their families in Ukraine have been impacted in ways we cannot imagine, and their safety is our top priority. Cloudera – and many of our employees individually – are engaged in multiple activities to help Ukrainians, including donating supplies, running a donation matching program, and providing acco

How to Generate Synthetic Tabular Dataset

KDnuggets

Check out this article on using CTGANs to create synthetic datasets for reducing privacy risks, training and testing machine learning models, and developing data-centric AI products.

Datasets 135

Case Study: Complementing DynamoDB with Rockset for Real-Time IoT Analytics at 1NCE

Rockset

Growth of the Internet of Things (IoT) hasn’t matched the hype due to numerous pain points: limited, unreliable network coverage, high connectivity and device maintenance costs, and the uncertainty created by diverse, constantly evolving cellular standards (4G versus 5G, LTE-M versus NB-IoT, etc.). 1NCE was founded in 2017 as a pure-play IoT connectivity provider to jumpstart IoT deployments by solving every one of those pain points.

NoSQL 52

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Taking A Multidimensional Approach To Data Observability At Acceldata

Data Engineering Podcast

Summary: Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it as a holistic approach to understanding the computational and logical elements that power your analytical capabilities. In this episode Tristan Spaulding, head of product at Acceldata, explains the multi-dimensional nature of gaining visibility into your running data platform and how they have architected their platform to assist in that endeavor.

Data Lake 100

Engineering for Impact

Palantir

Engineering for Impact: Problem solving with purpose at Palantir. Editor’s note: In this blog post, we sit down with UK Health Lead Joanna Peller, recipient of Data IQ’s 2022 100 award, to discuss what she’s learned leading Palantir’s UK Health work during the pandemic. Describe your path to Palantir. After growing up in Boston, Massachusetts, I moved to the UK to study Mathematics at UCL.

From Google Colab to a Ploomber Pipeline: ML at Scale with GPUs

KDnuggets

In this short blog, we’ll review the process of taking a POC data science pipeline (ML/Deep learning/NLP) that was developed on Google Colab and transforming it into a pipeline that can run in parallel at scale and works with Git so the team can collaborate on it.

Accelerate Agency Missions with Data in Motion

Cloudera

Data is the true currency of the digital age, and it plays an indispensable role in defining and accelerating the mission of Government agencies. Every level of government is awash in data (both structured and unstructured) that is perpetually in motion. It is constantly generated – and always growing in volume – by an ever-growing range of sources, from IoT sensors and other connected devices at the edge to web and social media to video and more.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

What Good Data Product Managers Do – And Why You Probably Need One

Monte Carlo

The companies we talk to are diligently building their data product or platform. This includes migrating to Snowflake, integrating with Databricks, moving towards a data mesh, or generally investing in their data stack. Increasingly, we are seeing data departments modernize their team structure with data product managers at the helm of such projects.

Top AI and Data Science Tools and Techniques for 2022 and Beyond

KDnuggets

How will AI and data science impact the world of business in the next decade? Find out what trends to look out for in 2022 and beyond at NVIDIA GTC.

How to Engineer Date Features in Python

KDnuggets

This article discusses and demonstrates how to quickly engineer some common date features using Python.

Python 159

Machine Learning Algorithms for Classification

KDnuggets

In this article, we will be going through the algorithms that can be used for classification tasks.

Algorithm 160

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating

On-device AI with Developer-Ready Software Stacks

KDnuggets

Still running your artificial intelligence workloads in the cloud? If your applications depend on techniques like person detection and pose estimation, to name a few, then it’s time you looked into on-device AI.

Cloud 104

AI-Generated Sports Highlights: Different Approaches

KDnuggets

Competition for viewers’ attention is not over after the players leave the field. Now, anyone who can put up a highlight compilation or a game summarization first gets the edge. So, let’s talk about how media companies do just that — with the help of Artificial Intelligence.

Media 102

Why Do Most People Fail to Learn Programming?

KDnuggets

Have you spent hours taking coding bootcamps, online courses, and tutorials, only to feel like you aren’t getting anywhere?

Become a Data Science Professional in Five Steps

KDnuggets

If your new year's resolution was to start a career in data science but you have stalled, simply follow these easy steps to acquire professional certification within a year.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Building a Geospatial Application in Python with Google Earth Engine and Greppo

KDnuggets

In this blog, you will see how to build a web application with Greppo and Google Earth Engine using Python.

Python 126

How to Manage Multiple Inheritance in Python

KDnuggets

In this guide, we'll learn how to use multiple inheritance in Python and make it sustainable.

Python 106

Use third-party data to increase user engagement and deliver business outcomes

KDnuggets

Join this webinar and learn how competitive companies use third-party data to enhance mobile customer experiences through personalization and localization.

KDnuggets News March 16, 2022: Learn Data Science Fundamentals & 5 Steps to Become a Data Scientist

KDnuggets

How Long Does It Take to Learn Data Science Fundamentals?; Become a Data Science Professional in Five Steps; New Ways of Sharing Code Blocks for Data Scientists; Machine Learning Algorithms for Classification; The Significance of Data Quality in Making a Successful Machine Learning Model.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.