Sat.May 08, 2021 - Fri.May 14, 2021

How to make data pipelines idempotent

Start Data Engineering

What is an idempotent function? “Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” (Wikipedia). It is defined as f(f(x)) = f(x). In the data engineering context, this can come to mean that: running a data pipeline
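The property f(f(x)) = f(x) can be illustrated with a minimal Python sketch. It is not taken from the article: the in-memory `warehouse` dict and the `load_partition` and `append_rows` helpers are hypothetical names chosen for illustration. They contrast a load that overwrites a date partition (safe to rerun) with one that appends (duplicates on rerun).

```python
from typing import Dict, List

# Hypothetical in-memory "warehouse": partition key (run date) -> rows.
Warehouse = Dict[str, List[dict]]


def load_partition(warehouse: Warehouse, run_date: str, rows: List[dict]) -> Warehouse:
    """Idempotent load: overwrite the partition for run_date."""
    warehouse[run_date] = list(rows)
    return warehouse


def append_rows(warehouse: Warehouse, run_date: str, rows: List[dict]) -> Warehouse:
    """Non-idempotent load: every rerun appends the same rows again."""
    warehouse.setdefault(run_date, []).extend(rows)
    return warehouse


rows = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]

# Running the overwrite load once or twice leaves the same state.
once = load_partition({}, "2021-05-08", rows)
twice = load_partition(load_partition({}, "2021-05-08", rows), "2021-05-08", rows)

# Running the append load twice doubles the rows.
appended_twice = append_rows(append_rows({}, "2021-05-08", rows), "2021-05-08", rows)

print(once == twice)
print(len(appended_twice["2021-05-08"]))
```

In a real pipeline the overwrite step would typically be a delete-then-insert or partition overwrite against the target store; the dict here only stands in for that.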

Introducing Confluent for Kubernetes

Confluent

We are excited to announce that Confluent for Kubernetes is generally available! Today, we are enabling our customers to realize many of the benefits of our cloud service with the […].

Cloud 137

Making Analytical APIs Fast With Tinybird

Data Engineering Podcast

Summary: Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full-time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production-ready API or data product. In this episode CEO Jorge Sancha explains how they have architected their system to handle high data throughput and fast response times, and why they have invested heavily in ClickHouse as the core of their platform.

Automating CDP Private Cloud Installations with Ansible

Cloudera

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment. In CDP Private Cloud, the introduction of Cloudera Data Warehouse and Cloudera Machine Learning Experiences on RedHat OpenShift Kubernetes clusters means that we can deploy new

Cloud 105

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Beyond Resilience-The Next Generation of Supply Chain

Teradata

After the shock of COVID exposed the brittle nature of many global supply chains, focus has shifted to resilience, a necessary consideration but not the only one.

Using kafka-merge-purge to Deal with Failure in an Event-Driven System at FLYERALARM

Confluent

Failures are inevitable in any system, and there are various options for mitigating them automatically. This is made possible by event-driven applications leveraging Apache Kafka® and built with fault tolerance […].

Kafka 95

More Trending

5 Factors to Consider When Choosing a Stream Processing Engine

Cloudera

Are you using the right stream processing engine for the job at hand? You might think you are—and you very well might be!—but have you really examined the stream processing engines out there in a side-by-side comparison to make sure? Our Choose the Right Stream Processing Engine for Your Data Needs whitepaper makes those comparisons for you, so you can quickly and confidently determine which engine best meets your key business requirements.

Process 102

Open Banking is Transforming Financial Services and Chipping Away the Relevance of Traditional Banks

Teradata

The sharing of client data in an Open Banking marketplace challenges banks to adopt a customer-centric approach & collaborate with new players to re-define their relevance.

Banking 59

Responsive Mega Menu Using React Bootstrap

Grouparoo

Having clear and accessible navigation is huge for website conversions. Sites with poor navigation are frustrating to use. Nested navigation menus are a common way to help keep top-level navigation to a minimum, but they can have major usability issues. A better way to handle a large number of links in a dropdown is to create a mega menu. Recently, we gave our site navigation a face lift using mega menus.

Media 52

Building Data Applications Powered by Real-Time Analytics

Rockset

For long-term success with real-time analytics, it is important to use the right tool for the job. Data applications are an emerging breed of applications that demand sub-second analytics on fresh data. Examples include logistics tracking, gaming leaderboards, investment decision systems, connected devices and embedded dashboards in SaaS apps. Real-time analytics is all about using data as soon as it is produced to answer questions, make predictions, understand relationships, and automate proces

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

cdpcurl: Low-Level CDP API Access

Cloudera

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI, through a Java SDK, and through a low-level tool called cdpcurl. cdpcurl is des

Forrester – Chart Your Course To Insights-Driven Business Maturity

DataKitchen

As organizations strive to become more data-driven, Forrester recommends 5 actions to take to move from one stage of insights-driven business maturity to another. After establishing a solid strategy, the second phase involves planning key processes and practices to support the strategy, including “the emerging and increasingly important DataOps and ModelOps processes and methodologies.”

Data Pipelining Mailchimp and Google Sheets

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable, as the command will be fully deprecated in v0.8.1. Web Developer Dylan: Hey there Mama's Travel, are you enjoying your new website? Client: Absolutely! There's just one more thing: I need a way to subscribe new people to my mailing list manually.

Data.What? Why You Should Keep Doing Data Integration

Teradata

Data integration plays a key part in data management. But many enterprises have lost faith in the value it can provide. Find out why data integration still matters.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies that are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while remaining lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating

Announcing the 2021 Data Impact Awards

Cloudera

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. Each year, taking a moment to celebrate successes provides us with a wonderful opportunity to reflect on the incredible work we do together.

Food 70

DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021

DataKitchen

The post DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021 first appeared on DataKitchen.

Data 52

Change the Primary Key Type with Sequelize

Grouparoo

We recently adjusted how we handle primary keys. Previously they were UUIDs with a max length of 40 characters. With our Declarative Sync feature, we allow developers to set primary key values from their configuration files. Thus, we needed to lengthen the maximum number of characters allowed on primary keys in our database. Seems simple, right? I thought so, too.

Find and Replace Text with SQL Regular Expressions in Rockset

Rockset

In our first blog, we used a regular expression to replace the quotes in genres. Afterward, we were able to UNNEST() the JSON object. We’ll be working with the same data set in this blog. In our data, there is a JSON string that’s called spoken_languages, and it’s formatted similarly to genres: [ { "spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]" }] Assuming everything is consistent, we can just write the SQL statement similar to what we wrote for genres -
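The quote-replacement idea can be sketched outside of Rockset. The following is a Python analogue, not the post's SQL: it takes the spoken_languages string shown above, swaps single quotes for double quotes with a regular expression, and parses the result as JSON. Like the post, it assumes the data is consistent; a value that itself contains an apostrophe would break this simple substitution.

```python
import json
import re

# The spoken_languages value from the sample record above: a JSON-like
# string that uses single quotes, so it is not valid JSON as-is.
raw = "[{'iso_639_1': 'fr', 'name': 'Français'}]"

# Replace every single quote with a double quote.
# Assumption: no value contains an apostrophe of its own.
as_json = re.sub(r"'", '"', raw)

# The string now parses, and the entries can be iterated
# (roughly what UNNEST() would expose as rows).
languages = json.loads(as_json)
names = [entry["name"] for entry in languages]

print(as_json)
print(names)
```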

SQL 40

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

RudderStack + Blendo: Better Together

RudderStack

RudderStack acquires Blendo: Kostas Pardalis, the founder of Blendo, talks about why he decided to merge Blendo with RudderStack, building the team, and working on the product together.

Cloud Migration Series (Step 3 of 5): Assess Readiness

Cloud Academy

This is part 3 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!

Cloud 40

Data Observability and Monitoring with DataOps

DataKitchen

Data errors impact decision-making. When analytics and dashboards are inaccurate, business leaders may not be able to solve problems and pursue opportunities. Data errors infringe on work-life balance. They cause people to work long hours at the expense of personal and family time. Data errors also affect careers. If you have been in the data profession for any length of time, you probably know what it means to face a mob of stakeholders who are angry about inaccurate or late analytics.

Computer Vision in Healthcare: Creating an AI Diagnostic Tool for Medical Image Analysis

AltexSoft

Our lungs are the only body organs that constantly interact with the external environment, through the air we breathe. This exposure makes the respiratory system extremely susceptible to a wide range of diseases, from long-familiar asthma to novel COVID-19. Subtle at early stages, the signs of lung conditions are easy to overlook. And delays in diagnosis often lead to harsh consequences.

Medical 72

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Why Engineering and IT Need to own the CDP

RudderStack

Why Engineering and IT organizations should own CDP implementation & management. Here, we'll cover all of the benefits to Engineering and IT-led CDP implementation and management.

IT 40

Building Your Data Warehouse On Top Of PostgreSQL

Data Engineering Podcast

Summary: There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility, then you might consider building on top of the venerable PostgreSQL project.

Accelerate Moving to CDP with Workload Manager

Cloudera

Since my last blog, What you need to know to begin your journey to CDP, we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM), that meets your needs. WM saves time and reduces risks during upgrades or migrations.

Achieving observability in async workflows

Netflix Tech

Written by Colby Callahan, Megha Manohara, and Mike Azar. Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that put observability, debugging, and tracing at the forefront. Imagine getting paged outside normal work hours: users are having trouble with the application you’re responsible for, and you start diving into logs.

Java 63

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

Real-Time Personalization with Redis and RudderStack

RudderStack

We break down technological challenges around personalization & show how companies use technologies like Redis & RudderStack to drive personalization use cases.

Why are database columns 191 characters?

Grouparoo

Sometimes, when you are looking at a database’s schema, you see that there are text fields defined like this: email_address varchar(191) NOT NULL. This means that the column supports strings with a maximum length of 191 characters, and can’t be null. 191 is such an odd number - where did it come from? In this post, we’ll look at the historical reasons for the 191 character limit as a default in most relational databases.

Big Data Analytics: How It Works, Tools, and Real-Life Applications

AltexSoft

Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. The truth is, there’s more to this term than just the size of information generated. Not only does Big Data apply to the huge volumes of continuously growing data that come in different formats, but it also refers to the range of processes, tools, and approaches used to gain insights from that data.

How to Extract Snowflake Data Observability Metrics Using SQL in 5 Steps

Monte Carlo

Your team just migrated to Snowflake. Your CTO is all in on this “modern data stack,” or as she calls it: “The Enterprise Data Discovery.” But as any data engineer will tell you, not even the best tools will save you from broken pipelines. In fact, you’ve probably been on the receiving end of schema changes gone bad, duplicate tables, and one-too-many null values on more occasions than you wish to remember.

SQL 40

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.