Sat. May 22, 2021 - Fri. May 28, 2021

The Ethics of AI Comes Down to Conscious Decisions

Cloudera

This blog post was written by Pedro Pereira as a guest author for Cloudera. Right now, someone somewhere is writing the next fake news story or editing a deepfake video. An authoritarian regime is manipulating an artificial intelligence (AI) system to spy on technology users. No matter how good the intentions behind the development of a technology, someone is bound to corrupt and manipulate it.

Algorithm 114

Announcing ksqlDB 0.18.0

Confluent

We’re pleased to announce ksqlDB 0.18.0! This release includes pull queries on table-table joins and support for variable substitution in the Java client and ksqlDB’s migration tool. We’ll step through […].

Java 83

My (Seemingly) Random Walk to Netflix

Netflix Tech

Part of our series on who works in Analytics at Netflix, and what the role entails. By Sean Barnes, Studio Production Data Science & Engineering. I am going to tell you a story about a person who works for Netflix. That person grew up dreaming of working in the entertainment industry. They attended the University of Southern California, double majored in data science and television & film production, and graduated summa cum laude.

Intelligent Document Processing: Technology Overview

AltexSoft

Whatever the industry, various documents accompany at least a quarter of business operations. Healthcare, for example, is filled with millions of patient records and medical forms. In transportation, these can be maintenance and driver logs. The documents often come in semi-structured and unstructured data formats, which makes them difficult to process quickly and accurately.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD, Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Pushing Past Pilot Paralysis to Launch and Scale IIOT Use Cases

Cloudera

With billions of industrial IoT (IIOT) devices in place, generating massive volumes of data from “the edge,” the potential for proof of concept success for use cases in the factory can be paralyzing. While the value of this digital revolution, aka Industry 4.0, is clear, realizing the full promise has been slow. Research and real-life experience from Accenture show that many manufacturers get stuck early on or can’t get beyond proof-of-concept pilots to scale.

Continuous Deployment of Confluent with Ansible Tower

Confluent

When self-managing Confluent, provisioning and configuring Apache Kafka® deployments along with the rest of the Confluent components involves many hurdles, such as managing infrastructure, installing software, and configuring security. And […].

Kafka 76

More Trending

What is Azure Data Factory? A beginner’s guide to ADF

A Cloud Guru: Data Engineering

With Microsoft Build 2021 currently underway, what better time to take a beginner-friendly deep dive into Azure Data Factory? In this post, we’ll talk about what Azure Data Factory is, how to get started using it, and what you might use it for. Keep up with all things Azure in the ACG original series Azure […]

Data 52

Session-based Recommender Systems

Cloudera

Recommendation systems have become a cornerstone of modern life, spanning sectors that include online retail, music and video streaming, and even content publishing. These systems help us navigate the sheer volume of content on the internet, allowing us to discover what’s interesting or important to us. The classic modeling approaches to recommendation systems can be broadly categorized as content-based, as collaborative filtering-based, or as hybrid approaches that combine aspects of the two.

Systems 59

Are MySQL columns names case sensitive?

Grouparoo

There is a debate among a very specific set of people about what case to use in SQL queries. This debate is made possible by the fact that, generally, it does not matter. I believed this to be true even about identifiers like column names. For example, both of these queries return the same data even though the "real" column is defined in lowercase.
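
The queries from the original post are not reproduced in this excerpt. As a minimal sketch of the kind of comparison it describes, the Python snippet below defines a column in lowercase and selects it using two different cases. It uses an in-memory SQLite database from the standard library as a stand-in, so it does not demonstrate any MySQL-specific behavior; the table name `users` and the column `first_name` are hypothetical.

```python
import sqlite3

# Stand-in database: in-memory SQLite (an assumption; the post is about MySQL).
conn = sqlite3.connect(":memory:")

# Hypothetical table whose "real" column name is defined in lowercase.
conn.execute("CREATE TABLE users (first_name TEXT)")
conn.execute("INSERT INTO users (first_name) VALUES ('Ada')")

# Same query, with the column identifier written in two different cases.
lower_rows = conn.execute("SELECT first_name FROM users").fetchall()
upper_rows = conn.execute("SELECT FIRST_NAME FROM users").fetchall()

# Both lookups resolve to the same column, so the rows are identical.
same_data = lower_rows == upper_rows
print(same_data)
```

Running the same pair of queries against a real MySQL server would be the way to check the behavior the post is actually about.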

MySQL 52

Will Open Banking Enhance the Quality of Daily Life?

Teradata

Banking organizations embracing an interest in improving the quality of their customers’ lives will be rewarded with the sustained inspiration needed to anticipate & deliver personalized services.

Banking 52

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Business Processing Dashboards

FreshBI

Business Processing Dashboards Explained. A Business Workflow is a visual representation of the steps through a business activity, used to support and validate your Business Intelligence Objectives. Best Benefits of Business Processing Dashboards. Improved Business Intelligence: Business Processing Dashboards will assist your business to reach extraordinary achievements based on your live dashboards.

Process 52

DS Building Blocks - A quick guide on experimentation for Non-Technical Users

DareData

Do you get overwhelmed when your data team rambles on about correlation, causality, A/B testing and other terms? Or are you a manager with some projects that include statistics and machine learning, and you feel that you should contribute more to guiding your team to the correct outcome? These types of situations are common for business and non-technical users.

Compare and Contrast Search Indexing With Real-Time Converged Indexing

Rockset

Let's compare and contrast search indexing with real-time converged indexing and explain what converged indexing is, how it's similar, how it's different, how the architecture is set up, and then review some of the details of how it is different in terms of operations. When you talk about serverless systems and cloud-native systems, there's a huge advantage that we have in the cloud and we really want to spend some time talking about initial setup, in terms of day two operations.

MongoDB 40

We Rise as One in our Mission to Eradicate Racism

Teradata

Teradata reinforces its pledge to diversity, equity, and inclusion. We are committed to eradicating racism and expanding diversity into all aspects of our business.

IT 52

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

8 Feature Engineering Techniques for Machine Learning

ProjectPro

“Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” - Prof. Andrew Ng. Data Scientists spend 80% of their time doing feature engineering because it's a time-consuming and difficult process. Understanding features and the various techniques involved to deconstruct this art can ease the complex process of feature engineering.

Watch: Generating Data for Story-Driven Demos

Silectis

At a recent DC Data Engineering Meetup, a community group Silectis created and sponsors, we had the pleasure of having Tim Tutt, CEO of Night Shift Development, as our guest speaker. Through Tim’s presentation about using data engineering to enable data analytics through the masses, our teams started to notice how complementary our products are.

Data 52

What Is a Serverless Database and Why Use One

Rockset

The move to serverless has been a fast one. Of AWS users, over half have adopted Lambda, but serverless isn't just Lambda functions. Serverless is a way to utilize infrastructure to build applications and services without needing to provision or scale out servers. This can be an advantage when it comes to development because developers and engineers don’t need to manage as much in terms of infrastructure.

Superset and AWS Athena Tutorial - Data Lake

Preset

Visualize your data lake using AWS Athena and Apache Superset™.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Easily Build Advanced Similarity Search With The Pinecone Vector Database

Data Engineering Podcast

Summary: Machine learning models use vectors as the natural mechanism for representing their internal state. The problem is that in order for the models to integrate with external systems their internal state has to be translated into a lower dimension. To eliminate this impedance mismatch, Edo Liberty founded Pinecone to build a database that works natively with vectors.

Database 100

What's a Typical Data Scientist Career Path like in 2023?

ProjectPro

Is "becoming a data scientist" one of your resolutions for 2021? Data science careers have seen tremendous growth over the years. On top of commanding high data scientist salaries (the average data scientist salary is $96,501), data science beginners can expect growth opportunities to level up in their data science career as they upskill and gain experience.

The Four Upgrade and Migration Paths to CDP from Legacy Distributions

Cloudera

The move into any new technology requires planning and coordinated effort to ensure a successful transition. This blog will describe the four paths to move from a legacy platform such as Cloudera CDH or HDP into CDP Public Cloud or CDP Private Cloud. The four paths are In-place Upgrade, Side-car Migration, Rolling Side-car Migration, and Migrate to Public Cloud.

Cloud 79

Data Transformations Using the Data Build Tool

Ripple Engineering

At Ripple, we are moving towards building complex business models out of raw data. To do this successfully, we need to automate our historically manual processes. Even with a digital-first approach, many of our internal processes were done by hand, making them great candidates to be automated. A prime example of this was the process of managing our data transformation workflows.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Paving The Road For Fast Analytics On Distributed Clouds With The Yellowbrick Data Warehouse

Data Engineering Podcast

Summary: The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage. Yellowbrick is a data warehouse platform that was built from the ground up for speed, and can work across clouds and all the way to the edge.

Asynchronous APIs in CRM and marketing tools

Grouparoo

When integrating with Destinations, there are generally two main approaches made available by API providers: single or batched. With the "single" approach, one API request usually affects a single profile in the destination. The "batched" approach, which you can read more about here, allows you to affect multiple profiles in a single API request.
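
To make the single versus batched distinction concrete, here is a minimal sketch in Python. It does not call any real CRM or marketing API: `FakeDestination`, `update_profile`, and `update_profiles` are hypothetical names for an in-memory stand-in that only counts how many requests each approach would issue.

```python
from typing import Dict, List


class FakeDestination:
    """Hypothetical in-memory destination that counts API requests."""

    def __init__(self) -> None:
        self.request_count = 0
        self.profiles: Dict[str, dict] = {}

    def update_profile(self, profile_id: str, fields: dict) -> None:
        # "Single" approach: one request affects one profile.
        self.request_count += 1
        self.profiles[profile_id] = fields

    def update_profiles(self, batch: List[dict]) -> None:
        # "Batched" approach: one request affects many profiles.
        self.request_count += 1
        for item in batch:
            self.profiles[item["id"]] = item["fields"]


# Five hypothetical profile updates to send to the destination.
updates = [{"id": f"user-{i}", "fields": {"plan": "pro"}} for i in range(5)]

single = FakeDestination()
for item in updates:
    single.update_profile(item["id"], item["fields"])

batched = FakeDestination()
batched.update_profiles(updates)

# Same end state, different number of requests.
print(single.request_count, batched.request_count)
```

Both destinations end up holding the same profiles; the difference is only in how many requests were needed to get there.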

Process 52

Auditing to external systems in CDP Private Cloud Base

Cloudera

Cloudera is trusted by regulated industries and Government organisations around the world to store and analyze petabytes of highly sensitive or confidential information about people, healthcare data, financial data or just proprietary information sensitive to the customer itself. Anybody who is storing customer information, healthcare, financial or sensitive proprietary information will need to ensure they are taking steps to protect that data and that includes detecting and preventing inadverte

Systems 73

Shorten time to critical insights with Streaming SQL

Cloudera

Data and analytics have become second nature to most businesses, but merely having access to the vast volumes of data from these devices will no longer suffice. Leading enterprises realize that the speed of data presents a new frontier for competitive differentiation. It is imperative for organizations to reduce time-to-insights to gain a competitive advantage by responding decisively to competitors, fine-tuning operations, and serving fickle customers.

SQL 71

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Democratizing Data Through Search and Natural Language Processing in Cloudera Data Visualization

Cloudera

Since the release of Cloudera Data Visualization (DV) back in Oct 2020, our primary mission has been to expand access to data analytics and predictive insights across enterprise businesses. Since that launch, we’ve worked tirelessly to deliver best-in-class data visualization, dashboarding, and predictive applications capabilities across our cloud and on-premises infrastructures through Cloudera’s machine learning and data warehousing products, all without additional costs, moving data or pur

Process 67