Sat. Jul 31, 2021 - Fri. Aug 06, 2021

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.

Data Lake 130

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber Engineering

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly … The post How Uber Achieves Operational Excellence in the Data Quality Experience appeared first on Uber Engineering Blog.

Minimizing Supply Chain Disruptions with Advanced Analytics

Cloudera

Minimizing Supply Chain Disruptions. January 2020 is a distant memory, but for most, the early days of the pandemic were a time that will be ingrained in memories for decades, if not generations. Over the last 18 months, supply chain issues have dominated our nightly news, social feeds and family conversations at the dinner table. Some but not all have stemmed from the pandemic.

What is a Data Mesh?

DataKitchen

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. With an architecture comprised of numerous domains, enterprises need to manage order-of-operations issues, inter-domain communication, and shared services like environment creation and meta-orchestration. A DataOps superstructure provides the foundation to address the many challenges inherent in operating a group of interdependent domains.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Data Discovery From Dashboards To Databases With Castor

Data Engineering Podcast

Summary: Every organization needs to be able to use data to answer questions about their business. The trouble is that the data is usually spread across a wide and shifting array of systems, from databases to dashboards. The other challenge is that even if you do find the information you are seeking, there might not be enough context available to determine how to use it or what it means.

Database 100

The New One-Stop Shop for Learning Apache Kafka

Confluent

Today, I’m very excited to announce an all-new website dedicated to Apache Kafka®, event streaming, and associated cloud technologies. The site is called Confluent Developer, and it represents a significant […].

Kafka 84

More Trending

7 Best Practices to Use While Annotating Images

AltexSoft

This is a guest article by tech writer Melanie Johnson. No matter how big or small your machine learning (ML) project might be, the overall output depends on the quality of data used to train the ML models. Data annotation plays a pivotal role in the process. And as we know it, it’s the process of marking machine-recognizable content using computer vision, or through natural language processing (NLP) in different formats, including texts, images, and videos.

How a Supply Chain Digital Hub Can Drive Post-Pandemic Supply Chain Resiliency

Teradata

A Supply Chain Data Hub provides a model-driven set of data objects with maximum data reuse, minimum technical debt, lower cost to build and faster time to market. Find out more.

The Weekly ETL: How Do You “Thin Slice” a Data Pipeline?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of data engineering’s hottest topics. Reddit user /treacherous_tim asks how you “thin slice” a data pipeline and whether anyone has faced this challenge before. First, I think it’s great that data engineers are now following best practices from DevOps and software engineering, in this case, starting wit

Accelerating Insight and Uptime: Predictive Maintenance

Cloudera

Historically, maintenance has been driven by a preventative schedule. Today, preventative maintenance, where actions are performed regardless of actual condition, is giving way to Predictive, or Condition-Based, maintenance, where actions are based on actual, real-time insights into operating conditions. While both are far superior to traditional Corrective maintenance (action only after a piece of equipment fails), Predictive is by far the most effective.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Real-time: a fresh approach to data lineage

Datakin

A real-time approach to data lineage. Written by Ross Turk on August 5, 2021. A data ecosystem that spans multiple pipelines, teams, and platforms can be overwhelming. Each dataset and job exists in a unique operational context, with interdependencies that may seem simple…until they multiply. Every tiny piece has something in common, though: when it breaks, it becomes the most important thing to everyone you know.

Building Data Factories to Create Thousands of Data Products

Teradata

The pressure to integrate analytics & machine learning into the automotive business is unrelenting. Find out what the auto industry needs to deliver on its digital promise.

Data Engineering Annotated Monthly – July 2021

Big Data Tools

August is a good time to start new things – some people are on vacation and have more spare time to read than usual, while others are back and looking for a quick refresher on what’s new in data engineering. We’re launching this Annotated series to find interesting and useful content on different topics around data engineering, such as news, technical articles, tools, future conferences, and more.

Building a Modern Data Architecture for the 2020s

DataKitchen

The post Building a Modern Data Architecture for the 2020s first appeared on DataKitchen.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Real-Time Data Ingestion: Snowflake, Snowpipe and Rockset

Rockset

Organizations that depend on data for their success and survival need robust, scalable data architecture, typically employing a data warehouse for analytics needs. Snowflake is often their cloud-native data warehouse of choice. With Snowflake, organizations get the simplicity of data management with the power of scaled-out data and distributed processing.

Challenging Old Assumptions

Teradata

Cost income ratios in traditional banks remain untenably high. What’s required is a thorough analysis of the overall operating model to improve both sides of the cost income equation.

Banking 52

Writing our Golden Path

Eventbrite Engineering

In my last blog post I explained how we defined our 3-year technical vision for the company. One of the key pillars of this vision is shifting from a model where we used the same tool for every job (mostly a combination of Python + Django + MySQL), to the right tool(s) for each job. … Continue reading "Writing our Golden Path" The post Writing our Golden Path appeared first on Engineering Blog.

MySQL 40

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

The Data Janitor Letters - June 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. The Analytics Engineering Guide (dbt Labs): Collaborating as a data team to produce excellent datasets -- some parts are b t, but it's an interesting read. Welcome to Snowpark: New Data Programmability for the Data Cloud (Isaac Kunen, Senior Product Manager, Snowflake): Two words: Java functions.

Kafka 40

Churn Prediction With BigQueryML to Increase Mobile Game Revenue

RudderStack

Torpedo Labs leveraged RudderStack and BigQuery ML to increase revenue for Wynn Casino’s Wynn Slots app to the tune of $10,000 a day by reducing customer churn.

40

Pillars of Knowledge, Best Practices for Data Governance

Cloudera

Author Chris J. Preimesberger is Editor Emeritus of eWEEK. With hackers now working overtime to expose business data or implant ransomware processes, data security is largely IT managers’ top priority. And if data security tops IT concerns, data governance should be their second priority. Not only is it critical to protect data, but data governance is also the foundation for data-driven businesses and maximizing value from data analytics.

20 Artificial Intelligence Project Ideas for Beginners to Practice

ProjectPro

Artificial Intelligence has made a significant impact on our daily lives. Every time you scroll through social media, open Spotify, or do a quick Google search, you are using an application of AI. The AI industry has expanded massively in the past few years and is predicted to grow even further, reaching around 126 billion U.S. dollars by 2025. Multinational companies like IBM, Accenture, and Apple are actively hiring AI practitioners.

Project 52

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

How Airbnb Built “Wall” to prevent data bugs

Airbnb Tech

Gaining trust in data with extensive data quality, accuracy and anomaly checks. As shared in our Data Quality Initiative post, Airbnb has embarked on a project of massive scale to ensure trustworthy data across the company. To enable employees to make faster decisions with data and provide better support for business metric monitoring, we introduced Midas, an analytical data certification process that certifies all important metrics and data sets.

ETL vs ELT Explained

Grouparoo

The mission of many data teams is a very simple one. They seek to use data to help the business take smarter actions. The input is raw data from everywhere that touches the business. This includes many external sources, its own products, and various systems used for marketing, sales, and operations. The outputs often take the form of analysis, insights, models, and other usable mediums.

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

Cloudera

In our previous blog, we talked about the four paths to Cloudera Data Platform: In-place Upgrade, Sidecar Migration, Rolling Sidecar Migration, and Migrating to Cloud. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.

Finance 119

Designing and Architecting the Confluent CLI

Confluent

It is often difficult enough to build one application that talks to a single middleware or backend layer; e.g., a whole team of frontend engineers may build a web application […].

Designing 116

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

Data Marts: What They Are and Why Businesses Need Them

AltexSoft

Imagine you run a candy store. Some sweets are presented on your display cases for quick access while the rest is kept in the storeroom. Now let’s think of sweets as the data required for your company’s daily operations. Instead of combing through the vast amounts of all organizational data stored in a data warehouse, you can use a data mart — a repository that makes specific pieces of data available quickly to any given business unit.

20+ Image Processing Projects Ideas in Python with Source Code

ProjectPro

Perhaps the great French military leader Napoleon Bonaparte wasn't too far off when he said, “A picture is worth a thousand words.” Ignoring the poetic value, if just for a moment, the facts have since been established to prove this statement's literal meaning. Humans, the truly visual beings we are, respond to and process visual data better than any other data type.

Coding 40

Replace and Boost your Apache Storm Topologies with Apache NiFi Flows

Cloudera

Recently, I worked with a large Fortune 500 customer on their migration from Apache Storm to Apache NiFi. If you’re asking yourself, “Isn’t Storm for complex event processing and NiFi for simple event processing?”, you’re correct. A few customers chose a complex event engine like Apache Storm for their simple event processing, even when Apache NiFi is the more practical choice, cutting drastically down on SDLC (software development lifecycle) time.

Kafka 119

Getting Started: Automatic Detection and Alerting for Data Incidents with Monte Carlo

Monte Carlo

In this series, we highlight the critical steps your business must follow when building a data incident management workflow, including incident detection, response, root cause analysis & resolution (RCA), and a blameless post-mortem. Let’s start with incident detection and alerting, your first line of defense against data downtime and broken data pipelines.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.