Sat. Oct 23, 2021 - Fri. Oct 29, 2021

Kafka Streams Fundamentals

Confluent

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write […].

Kafka 130

Streaming Data Pipelines Made SQL With Decodable

Data Engineering Podcast

Summary: Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build reliable pipelines for stream processing. In this episode Eric Sammer discusses the shortcomings of the current set of streaming engines and how they force engineers to work at an extremely low level of abstraction.

The Ultimate Map to finding Halloween candy surplus

Cloudera

As Halloween night quickly approaches, there is only one question on every kid’s mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science approaches that enable quick and intuitive analysis of data across multiple sources. Using Cloudera Machine Learning, the world’s first hybrid data cloud machine learning tooling, let’s take a deep dive into the world of candy analytics to answer the tough question on everyone’s

Is Balancing Complex Retail and CPG Supply Chains a Total Fantasy?

Teradata

Recent events have illustrated the fragility of ultra-lean supply chains. Chief Supply Chain Officers must figure out how to navigate these crises to manage costs, speed & quality of service.

Retail 98

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Stream Governance – How it Works

Confluent

At the recent Kafka Summit, Confluent announced the general availability of Stream Governance–the industry’s only governance suite for data in motion. Offered as a fully managed cloud solution, it delivers […].

Open-Sourcing a Monitoring GUI for Metaflow

Netflix Tech

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform. tl;dr: Today, we are open-sourcing a long-awaited GUI for Metaflow. The Metaflow GUI allows data scientists to monitor their workflows in real-time, track experiments, and see detailed logs and results for every executed task. The GUI can be extended with plugins, allowing the community to build integrations to other systems, create custom visualizations, and embed upcoming features of Metaflow directly into its views.

Python 89

More Trending

What are the Prerequisites to Learn Machine Learning?

ProjectPro

In this blog, we cover all the topics that are considered prerequisites for learning machine learning, along with the best resources to help you learn them thoroughly. Upskilling in the era of the Internet has become hassle-free. The Internet has given a platform to experts who can now share their knowledge with a large number of people and help them acquire new skills irrespective of their previous knowledge of the subject.

Unicorns, data mesh, category creation, and more reasons to attend IMPACT: The Data Observability Summit

Monte Carlo

Fall is here, Halloween is right around the corner (see below), and we’re one week away from my favorite event of the year: IMPACT, the world’s first Data Observability summit! Here are five reasons why I’m excited – and you should be, too: The lineup. The former CEO of Snowflake. The first Chief Data Officer of the U.S. The founder of the data mesh.

Data 52

Interpreting A/B test results: false negatives and power

Netflix Tech

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the fourth post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up? Have a look at Part 1 (Decision Making at Netflix), Part 2 (What is an A/B Test?), Part 3 (False positives and statistical significance).

#ClouderaLife Spotlight: Krishna Birla, Software Engineer

Cloudera

Krishna is a Software Engineer working on our Compute Platform and operates out of Bangalore, India. His primary responsibility is to develop, test and maintain software applications that provide compute services to various Cloudera products. His day to day revolves around cloud computing, resource scheduling, and API and systems design. Technology and design are his major interest areas.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

What Is ‘Equity As Code,’ And How Can It Eliminate AI Bias?

DataKitchen

This article was originally published in Forbes. Engineers unleashed artificial intelligence (AI) bias, and it will be engineers who design the solutions that eliminate it. Authors of an article published by McKinsey Global Institute assert that “more human vigilance is needed to critically analyze the unfair biases that can become baked in and scaled by AI systems.”

Coding 52

What is a Data Pipeline?

Grouparoo

In today’s data-driven business world, organizations are looking for more efficient ways to leverage data from a variety of sources. For example, businesses often need to evaluate their performance based on large volumes of customer and sales data that might be stored in a variety of locations and formats. Security and compliance teams need to monitor data from a wide array of devices and systems to detect threats as quickly as possible.

Data Engineers of Netflix — Interview with Pallavi Phadnis

Netflix Tech

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Pallavi Phadnis is a Senior Software Engineer on the Product Data Science and Engineering team at Netflix.

Cloudera Machine Learning Workspace Provisioning Pre-Flight Checks

Cloudera

At Cloudera, we believe that data can make what is impossible today, possible tomorrow. There are many good uses of data. With data, we can monitor our business, the overall business, or specific business units. We can segment based on the customer verticals or whether they run in the public or private cloud. We can understand customers better, see usage patterns and main consumption drivers.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Case Study: Fast and Simple — Building Rich Patient Dashboards for Speech Therapists with Rockset

Rockset

There are more than 65 million speech-impaired people worldwide of every age and in every social sphere. Historically, they are a vulnerable social group, found in special education institutions, rehab centers, hospitals and clinics, or their own homes. Every one of them needs rehabilitation, education, and help, in order to communicate their needs, emotions and ideas.

NoSQL 52

Welcome, Edmundo!

Grouparoo

There are some people that you meet and hope to work with someday. Two of our co-founders met Edmundo in school long ago and have been looking for that opportunity. It has arrived! Edmundo is joining the Grouparoo team as a Senior Full-Stack Engineer. Most recently, Edmundo was at Drift making conversational marketing and sales tools. Drift and tools like it are examples of where Grouparoo users want to sync their data.

6 Ways to Optimize Your Database for Performance

Data Science Blog: Data Engineering

Knowing how to optimize your organization’s database for maximum performance can lead to greater efficiency, productivity, and user satisfaction. While it may seem challenging at first, there are a few easy performance tuning tips that you can get started with. 1. Use Indexing: Indexing is one of the core ways to give databases a performance boost. There are different ways of approaching indexing, but they all have the same goal: decreasing query wait time by making it easier to find and access

Infographic – Data Engineers are Burned Out and Calling for DataOps

DataKitchen

A survey commissioned by data.world and DataKitchen reveals a disturbing state of affairs among data engineering professionals. The study of 600 data engineers, conducted by Wakefield Research, suggests an overwhelming majority are burned out and calling for relief. This infographic highlights the results. You can also download the infographic here.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Natural Language Processing in Healthcare: Using Text Analysis for Medical Documentation and Decision-Making

AltexSoft

“AI is technology’s most important priority, and health care is its most urgent application,” said Microsoft’s CEO Satya Nadella announcing the company’s new acquisition. Nuance, acquired for $19.7 billion (Microsoft’s biggest purchase since LinkedIn), provides niche AI products for clinical voice transcription, used in 77 percent of US hospitals. Its deep learning natural language processing algorithm is best in class for alleviating clinical documentation burnout, which is one of the main prob

Medical 52

Grouparoo v0.7 release

Grouparoo

The 0.7 release of Grouparoo is a huge step forward for data engineers using Grouparoo to reliably sync a variety of types of data to operational tools. Here are the key features of the release: Models enable Grouparoo to work with multiple data schemas at once; Grouparoo helps troubleshoot messy data and is more resistant to data problems; New Destination: Braze Users; DevOps Logging Plugins: AWS CloudWatch, Prometheus. Models: The primary addition is the concept of having multiple Models.

AWS 52

Avoiding a Digital Cardiac Arrest

Teradata

Data liquidity is the lifeblood of the digital transformation needed to deliver the Bank of the Future. Find out more.

Banking 52

Determining Sentiment Analysis With RudderStack User Transformations

RudderStack

In this tutorial project, you’ll learn how you can replicate the sentiment analysis system we use here at RudderStack within your own stack.

Project 40

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

15 Projects on Machine Learning Applications in Finance

ProjectPro

Wondering how to implement machine learning in finance effectively and gain valuable insights? This blog presents the most useful machine learning applications in finance to help you understand how financial markets thrive by adopting AI and ML solutions. It also covers some innovative use cases to highlight the significance of machine learning in finance.

Finance 52

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

Data Engineering Podcast

Summary: The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as star and snowflake schemas, data vault modeling, and wide tables. The challenge with many of those approaches is that they are optimized for answering known questions but brittle and cumbersome when exploring unknowns.

Are you Somebody Who Leads from the Ivory Tower or from the Front Lines?

Cloudera

World Mental Health Day took place earlier this month. Many came forward to share their personal struggles with mental health to raise awareness and reduce the stigma surrounding these issues. The pressures of the pandemic may have exacerbated some deep-seated problems among some of us, which has led us to place greater emphasis on mental health. The 2021 Global WellBeing report by professional services firm, AON, revealed that mental health and working environment are ranked among the top thr

Deliver Deeper Digital Product Insights With RudderStack and Amplitude

RudderStack

RudderStack is proud to partner with Amplitude to deliver deeper digital product insights, so every team can make better decisions, faster.

40

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

15 Popular Machine Learning Frameworks for Model Training

ProjectPro

There is no “one-size-fits-all” machine learning framework for model building. Data scientists and machine learning engineers use various machine learning tools and frameworks to build production-ready models. Since there are so many machine learning frameworks and tools available in the market with varied learning curves and user bases, deciding which machine learning framework to choose for a business use case can be difficult.

Studying Job Duration

Datakin

A modern data pipeline is a large, complex, and often fragmented system with cascading interactions across multiple tools and platforms. It can be difficult to evaluate longer-term pipeline health in the absence of discrete warnings and failures, and to track tasks and dependencies across multiple teams and disparate systems. At Datakin, we’ve honed in on the runtime of pipeline jobs as a key metric to watch in daily data operations.

High Availability (Multi-AZ) for CDP Operational Database

Cloudera

CDP Operational Database (COD) is an autonomous transactional database powered by Apache HBase and Apache Phoenix. It is one of the main Data Services that runs on Cloudera Data Platform (CDP) Public Cloud. You can access COD right from your CDP console. With COD, application developers can now leverage the power of HBase and Phoenix without the overheads that are often related to deployment and management.

New Features in Cloudera Streams Messaging Public Cloud 7.2.12

Cloudera

With the launch of the Cloudera Public Cloud 7.2.12, the Streams Messaging for Data Hub deployments have gotten some interesting new features! From this release, Streams Messaging templates will support scaling with automatic rebalancing allowing you to grow or shrink your Apache Kafka cluster based on demand. Another notable item is that Streams Replication Manager (SRM) will now support multi-cluster monitoring patterns and aggregate replication metrics from multiple SRM deployments into a sin

Cloud 90

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.