Sat. Sep 18, 2021 - Fri. Sep 24, 2021

Airflow Trigger Rules: All you need to know!

Marc Lamberti

By default, your tasks get executed once all the parent tasks succeed. This behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently depending on whether a task succeeds, fails or even gets skipped?
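
The idea behind trigger rules can be sketched in plain Python. This is only an illustrative model, not Airflow's implementation: the rule names match Airflow's documented trigger rules, but the `should_run` helper, the reduced set of states, and the snapshot-style evaluation are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Simplified task states (assumption: real Airflow tracks more states).
SUCCESS, FAILED, SKIPPED = "success", "failed", "skipped"

# Each rule maps the parents' final states to "may the child task run?".
TRIGGER_RULES: Dict[str, Callable[[List[str]], bool]] = {
    # Default behaviour: every parent must have succeeded.
    "all_success": lambda states: all(s == SUCCESS for s in states),
    # Run only when every parent has failed.
    "all_failed": lambda states: all(s == FAILED for s in states),
    # Run when at least one parent succeeded.
    "one_success": lambda states: any(s == SUCCESS for s in states),
    # Run when at least one parent failed.
    "one_failed": lambda states: any(s == FAILED for s in states),
    # Run when no parent failed (succeeded or skipped parents are fine).
    "none_failed": lambda states: all(s in (SUCCESS, SKIPPED) for s in states),
}


def should_run(rule: str, parent_states: List[str]) -> bool:
    """Hypothetical helper: evaluate a trigger rule against parent states."""
    return TRIGGER_RULES[rule](parent_states)
```

In an actual DAG the rule is chosen per task through the operator's `trigger_rule` argument, for example `trigger_rule="one_failed"` on a task that should only run as an alert or cleanup step.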

What’s New in Apache Kafka 3.0.0

Confluent

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we … The post Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot appeared first on Uber Engineering Blog.

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Start DataOps Today with ‘Lean DataOps’

DataKitchen

Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development, which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.

Announcing ksqlDB 0.21.0

Confluent

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

More Trending

Telecom Network Analytics: Transformation, Innovation, Automation

Cloudera

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.

Unilever

Teradata

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Data Warehousing Basics

Data Science Blog: Data Engineering

Data Warehousing is applied Big Data Management and a key success factor in almost every company. Without a data warehouse, no company today can control its processes and make the right decisions on a strategic level as there would be a lack of data transparency for all decision makers. Bigger companies even have multiple data warehouses for different purposes.

Datakin is now open to all!

Datakin

Written by Laurent Paris on Sep 24, 2021. This is it! We’re officially out of beta and excited to announce the general availability of Datakin. Our story began with the creation of Marquez over two years ago. We believed then, and still believe now, that a new approach to data lineage was essential to support today’s pipelines.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Speed Up Your Data Flow for Business Results

Cloudera

A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.

6 Automated Data Capture Methods For Business Development

InData Labs

Today, digitization penetrates all spheres of business. The 2.5 quintillion bytes of data that people create every day are predominantly unstructured. Whether it is audio, video or text, big data – if meticulously collected, recognized, and processed – can generate business value through leveraging state-of-the-art technologies. But no matter how intelligent machines may be, they.

AWS Kinesis Firehose and Teradata Vantage

Teradata

Many Teradata customers are interested in integrating Vantage with Amazon AWS First Party Services. This Getting Started Guide will help you to connect Vantage with the AWS Kinesis service.

Bob Muglia, former Snowflake CEO, to Speak at IMPACT, the World’s First Data Observability Summit

Monte Carlo

Today, we’re thrilled to announce that Bob Muglia, entrepreneur, Fivetran board member, and former CEO of Snowflake, and DJ Patil, the first U.S. Chief Data Scientist, will speak at IMPACT: The Data Observability Summit. Muglia’s fireside chat with Monte Carlo CEO Barr Moses will cap off the event, and touch on such topics as the rise of data in the cloud, challenges and opportunities in the current tooling landscape, and his vision for the future of data engineering and analytics.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

Partnerships that Enrich Solutions: a Spotlight Interview with Dell Enterprise Germany’s General Manager, Benjamin Krebs

Cloudera

During this Partner Perspective interview, Cloudera’s Alvin Heib seizes the opportunity to speak with Benjamin Krebs, General Manager of Technology Enterprise in Germany. The pair discuss Benjamin’s role at Dell, the importance of partnerships in his region, how the pandemic has altered Dell’s working landscape and finally, some predictions Benjamin has on Dell’s future.

The Data Janitor Letters - August 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. From Data Driven to Driving Data: The dysfunctions of Data Engineering (MrTrustworthy): Many “data driven” initiatives are failing even though they had the best engineers on the task and picked the “best” stack of technologies. What's an OLAP cube? (Claire Carroll, Analytics Engineer, analyticsengineers.club): OLAP cubes were this intimidating concept, and the more they read, the less they understood, but it turns out that

Flexibility and Resiliency Across the Supply Chain

Teradata

The supply chain is not just the sum of its parts. Each function, organization, decision & action are connected & have an effect on each part of the supply chain. Find out more.

Streaming Events From Salesforce for Lead Enrichment With RudderStack’s Webhook Source

RudderStack

How to use a webhook to stream new ‘lead created’ events from Salesforce through RudderStack for lead enrichment with Clearbit data, then back to Salesforce.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

Summary: Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

Netflix Cloud Packaging in the Terabyte Era

Netflix Tech

By Xiaomei Liu, Rosanna Lee, Cyril Concolato. Introduction: Behind the scenes of the beloved Netflix streaming service and content, there are many technology innovations in media processing. Packaging has always been an important step in media processing. After content ingestion, inspection and encoding, the packaging step encapsulates encoded video and audio in codec agnostic container formats and provides features such as audio video synchronization, random access and DRM protection.

How We Improved the Concurrency and Scalability of Our Redis Rate Limiting System

Rockset

Background: Rate limiting is a technique used to protect services from overload. In addition, it can be used to prevent starvation of a multi-tenant resource by a few very large customers. At Rockset, we primarily use rate limiting to protect our metadata store from overload caused by too many API requests, our log store from filling up due to mismatched input and output rates, and our control plane from too many state transitions.
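
As a rough illustration of rate limiting in general, here is a minimal in-process token bucket. It is not Rockset's Redis-based system, which the teaser does not describe in detail; the `TokenBucket` class, its parameters, and the injectable clock are all assumptions made for this sketch.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical single-process token-bucket rate limiter."""

    def __init__(
        self,
        rate: float,       # tokens added per second
        capacity: float,   # maximum burst size
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits the budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Giving each tenant its own bucket is one simple way to keep a few very large customers from starving a shared resource; a distributed deployment would need the bucket state in a shared store rather than in process memory.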

RudderStack Product News Vol. #013 - Destinations Re-design and New Integrations

RudderStack

In this update, we share a new UI update along with several new integrations and highlight our recent blog series on migrating from Segment.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

Summary: Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

Data Observability: Five Quick Ways to Improve the Reliability of Your Data

Monte Carlo

If your data breaks, does it make a sound? Odds are, the answer is yes. But will you hear it? Probably not. Nowadays, organizations ingest large amounts of data across increasingly complex ecosystems, and very often their data breaks silently, and as a result data teams are left in the dark – until it’s too late. But, if said data is a report used by your Chief Revenue Officer to determine next quarter’s forecast, chances are this data will make a very, very large sound.

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka

Dogfooding at RudderStack: Tracking Plans Part 1

RudderStack

Read about RudderStack’s API-first Tracking Plans feature and how you can leverage it to build data trust.

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

Declarative Machine Learning Without The Operational Overhead Using Continual

Data Engineering Podcast

Summary: Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response.

ACID vs BASE Concepts

Data Science Blog: Data Engineering

Understanding databases for storing, updating and analyzing data requires an understanding of two concepts: ACID and BASE. This is the first article of the article series Data Warehousing Basics. The ACID properties are applied to databases in order to fulfill enterprise requirements of reliability and consistency. ACID is an acronym that stands for: Atomicity – Each transaction is either executed completely or does not happen at all.
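
Atomicity can be demonstrated with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `accounts` table, the `transfer` and `balances` helpers, and the sample balances are hypothetical. A transfer that would overdraw an account violates a CHECK constraint partway through, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst; all-or-nothing."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            # Credit first, so a failed debit leaves a partial update to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # CHECK (balance >= 0) failed: the credit above was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a {name: balance} mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, even though the credit to `bob` had already been applied inside the transaction.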

High Quality, Dynamic Images in Power BI

FreshBI

Power BI has an awesome feature where you can define column category types. This allows you to define all values in a column as image URLs. From there, you can use publicly hosted image URLs to populate that column to dynamically view images in Power BI. An example of what this would look like is one column for fruit names and the other for image URLs, using the fruit name column as a slicer to dynamically pick which fruit is displayed.

Your Guide to Creating a Warehouse-First Data Analytics Stack

RudderStack

Here’s a practical guide to the warehouse-first data analytics stack.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.