August 2021

Understand & Deliver on Your Data Engineering Task

Start Data Engineering

1. Introduction
2. Understanding your data engineering task
2.1. Data infrastructure overview
2.2. What exactly
2.3. Why exactly
2.4. Current state
2.5. Downstream impact
3. Delivering your data engineering task
3.1. How
3.2. Breakdown into sub-tasks
3.3. Delivering the finished task
4. Conclusion
5. Further reading

1. Introduction

Congratulations! You are given a quick overview of the business and data architecture and are assigned your very first data engineering task.

Build Trust In Your Data By Understanding Where It Comes From And How It Is Used With Stemma

Data Engineering Podcast

Summary: All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t trust the answers. Stemma helps you establish and maintain that trust by giving visibility into who is using what data, annotating reports with useful context, and identifying who is responsible for keeping the data up to date.

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber Engineering

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly …

How ksqlDB Works: Internal Architecture and Advanced Features

Confluent

To effectively use ksqlDB, the streaming database for Apache Kafka®, you should of course be familiar with its features and syntax. However, a deeper understanding of what goes on underneath […].

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speakers: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Natural Language Processing: A Guide to NLP Use Cases, Approaches, and Tools

AltexSoft

Humans have been trying to make machines chat for decades. Alan Turing considered a computer’s ability to generate natural speech proof of its ability to think. Today, we converse with virtual companions all the time. But despite years of research and innovation, their unnatural responses remind us that no, we’re not yet at HAL 9000-level speech sophistication.

A ‘Fresh Squeeze on Data’ to Help Children Learn about Data, AI and Machine Learning

Cloudera

Dear Parents and Educators and Friends of Cloudera, If you are reading this blog, you know us at Cloudera as a group of self-described data geeks and data analysts. We believe data drives better decisions and moves businesses forward, and for us, that’s exciting. We are innovating and helping Fortune 500 companies transform and grow because they can make better data-driven decisions at the accelerated pace we live and work in today.

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

How DataOps is Transforming Commercial Pharma Analytics

DataKitchen

DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data. They are able to produce high-quality, on-demand insight that consistently leads to successful business decisions.

Announcing Elastic Data Streams Support for Confluent’s Elasticsearch Sink Connector

Confluent

Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take […].

The Power of Path Analysis

Teradata

For both analysts and data scientists, identifying paths and patterns in data is a valuable way to gain insight into the occurrences leading to or from any event of interest.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speakers: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. That is no surprise, given the non-deterministic nature of LLMs. To create reliable LLM-based applications (often built with RAG), teams need extensive testing and evaluation processes. This often involves meticulous adjustments to prompts.

Apache Ozone Powers Data Science in CDP Private Cloud

Cloudera

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3-compatible and Hadoop Filesystem-compatible endpoints in addition to its own native object store API endpoint, and is designed to work seamlessly with enterprise-scale data warehousing, machine learning and streaming workloads.

Towards a Reliable Device Management Platform

Netflix Tech

By Benson Ma and Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In addition, Netflix continuously works with its partners (such as Roku, Samsung, LG, Amazon) to port the Netflix SDK to their new and upcoming devices (TVs, smart boxes, etc.), to ensure the quality bar is reached be…

Do Away With Data Integration Through A Dataware Architecture With Cinchy

Data Engineering Podcast

Summary: So much time and energy is spent on data integration because of how our applications are designed. Because the software owns the data that it generates, we have to go to the trouble of extracting that information before it can be used elsewhere. The team at Cinchy is working to bring about a new paradigm of software architecture that puts the data as the central element.

Implementing a Pharma Data Mesh using DataOps

DataKitchen

Below is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We’ve covered the basic ideas behind data mesh and some of the difficulties that must be managed. Below is a discussion of a data mesh implementation in the pharmaceutical space. For those embarking on the data mesh journey, it may be helpful to discuss a real-world example and the lessons learned from an actual data mesh implementation.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating…

Driving New Integrations with Confluent and ksqlDB at ACERTUS

Confluent

When companies need help with their vehicle fleets—including transport, storage, or renewing expired registrations—they don’t want to have to deal with multiple vehicle logistics providers. For these companies, ACERTUS provides […].

Back to School! Time to Ditch the Promotions Calendar?

Teradata

As Back to School promotions hit the shelves, Christmas & New Year offers are already locked in. Are these long-lead cycles still effective in today’s dynamic Retail & CPG environment?

The Foundations of a Modern Data-Driven Organisation: Change from Within (part 2 of 2)

Cloudera

In my previous blog post, I shared examples of how data provides the foundation for a modern organization to understand and exceed customers’ expectations. However, the important role data occupies extends beyond customer experience and revenue, as it becomes increasingly central in optimizing internal processes for the long-term growth of an organization.

Predictive Lead Scoring: Discovering Best-Fit Prospects with Machine Learning

AltexSoft

B2B sales strategies can be roughly divided into two activities: lead generation and lead conversion. It’s clear how each works. The former, attracting visitors to your website and then helping them take certain actions, is almost automated and works through carefully placed calls to action. The latter, supporting a lead to make the purchasing decision, is done by professional salespeople with their arsenal of personalized tactics.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

Summary: Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on the Cuelake project which allows data analysts to build a lakehouse with SQL queries.

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “What is a Data Mesh?”) and how team structure supports agility. Let’s take a look at some technical aspects of data mesh so we can work our way towards a pharmaceutical industry application example.

Announcing the Confluent Q3 ’21 Release

Confluent

The Confluent Q3 ’21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical […].

Mitsui Sumitomo Insurance Co., Ltd.

Teradata

Vantage on AWS supports Next Best Action efforts, adding new supplemental coverage on policy renewals at a rate of 250%.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Minimizing Supply Chain Disruptions with Advanced Analytics

Cloudera

January 2020 is a distant memory, but for most, the early days of the pandemic were a time that will be ingrained in memories for decades, if not generations. Over the last 18 months, supply chain issues have dominated our nightly news, social feeds and family conversations at the dinner table. Some, but not all, have stemmed from the pandemic.

Flight Price Predictor: Training Models to Pinpoint the Best Time for Booking

AltexSoft

Pricing in the airline industry is often compared to a brain game between carriers and passengers in which each party pursues the best rates. Carriers aim to sell tickets at the highest possible price while still not losing consumers to competitors. Passengers want to buy flights at the lowest cost while not missing the chance to get on board. All this makes flight prices volatile and hard to predict.

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

Data Engineering Podcast

Summary: The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do when you need to manage unstructured information, or build a computer vision model? Activeloop was created for exactly that purpose. In this episode Davit Buniatyan, founder and CEO of Activeloop, explains why he is spending his time and energy on building a platform to simplify the work of getting your unstructured data ready for machine learning.

What is a Data Mesh?

DataKitchen

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. With an architecture composed of numerous domains, enterprises need to manage order-of-operations issues, inter-domain communication, and shared services like environment creation and meta-orchestration. A DataOps superstructure provides the foundation to address the many challenges inherent in operating a group of interdependent domains.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges, like pitfalls lurking at every turn that threaten to derail the most promising projects. But fret not: this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Implement a Cross-Platform Apache Kafka Producer and Consumer with C# and .NET

Confluent

Sometimes you’d like to write your own code for producing data to an Apache Kafka® topic and connecting to a Kafka cluster programmatically. Confluent provides client libraries for several different […].

Chugai Pharmaceutical

Teradata

Accelerating drug discovery and development with Teradata Vantage on AWS.

Sharpening Cloudera’s Cloud Focus in Asia Pacific and Japan

Cloudera

Cloudera recently appointed a Cloud Director for Asia Pacific (APAC), Stevie Walsh, to help drive our hybrid and multi-cloud offerings in the region, supporting our customers in accelerating their digital transformation journey. We’ve asked her to share her cloud vision for Cloudera in APAC and the exciting plans that she has in her new position. What drew you to work in the cloud space?

7 Best Practices to Use While Annotating Images

AltexSoft

This is a guest article by tech writer Melanie Johnson. No matter how big or small your machine learning (ML) project might be, the overall output depends on the quality of data used to train the ML models. Data annotation plays a pivotal role in the process. As we know, it’s the process of labeling content in different formats, including text, images, and videos, so that machines can recognize it through computer vision or natural language processing (NLP).

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs, and learn to identify and de-risk the different kinds of hypotheses built into your roadmap.