Sat. Sep 26, 2020 - Fri. Oct 02, 2020

Data Engineering Project: Stream Edition

Start Data Engineering

Table of Contents: Introduction; Project description and requirements; Infrastructure overview; Apache Flink; Apache Kafka; Design; Detect fraudulent accounts; Log account actions; Prerequisites; Code; Defining dependencies; Inheritance; Server logs generator; Defining data flow in Apache Flink; Create a streaming environment; Creating a consumer to read events from Apache Kafka; Detecting fraud and generating alert events; Writing server logs to a PostgreSQL DB; Fraud detection logic; Open proces […]

How Real-Time Stream Processing Works with ksqlDB, Animated

Confluent

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].

Process 145

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors.

Cloud 132

Speed Up And Simplify Your Streaming Data Workloads With Red Panda

Data Engineering Podcast

Summary: Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine.

Kafka 100

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Demystifying the Business Continuity Space: A Two Part Series

Teradata

In part 1 of this 2 part topic, we will define some of the commonly used (& misused) terms in the business continuity space & help you navigate what they mean to your organization.

52

Introducing Confluent Platform 6.0

Confluent

Each month, we’ve announced a set of Confluent features organized around what we think are the key foundational traits of cloud-native data systems as part of Project Metamorphosis. Data systems […].

Project 143

More Trending

Build a Slack Dashboard (Part 2): Loading Into Postgres & Creating Basic Charts

Preset

Build a beautiful Slack dashboard using open source tools Meltano and Superset. Part 2 of 3.

Demystifying the Business Continuity Space: A Three Part Series

Teradata

In part 1 of this 3 part series, we will define some of the commonly used (& misused) terms in the business continuity space & help you navigate what they mean to your organization.

52

ksqlDB Meets Java: An IoT-Inspired Demo of the Java Client for ksqlDB

Confluent

Stream processing applications, including streaming ETL pipelines, materialized caches, and event-driven microservices, are made easy with ksqlDB. Until recently, your options for interacting with ksqlDB were limited to its command-line […].

Java 119

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

Performance is one of the key deciding criteria, if not the most important, in choosing a Cloud Data Warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

PowerBI distribution and sharing

FreshBI

Spotlight: The PowerBI Service. Lately we have been getting a lot of questions surrounding licensing and release strategy in PowerBI. This guide should serve as an internal, quick reference manual. The following is a list of topics covered in this guide, each containing a summary of how it works and what the use case is. Licensing: PowerBI Desktop / Free. Who uses this?

BI 52

Break Out of the Data Silo!

Teradata

Marketing might be the best place to start operationalizing a bank-wide data strategy. But, to be effective, the CMO needs to dissolve data silos & create a model for data orchestration.

Banking 52

ksqlDB 0.12.0 Introduces Real-Time Query Upgrades and Automatic Query Restarts

Confluent

The ksqlDB team is pleased to announce ksqlDB 0.12.0. This release continues to improve upon the usability of ksqlDB and aims to reduce administration time. Highlights include query upgrades, which […].

Process 98

How to enable Cloudera Data Visualization in CDW

Cloudera

In our previous blog post we introduced Cloudera Data Visualization in Cloudera Data Warehouse (CDW), available in tech preview in CDP Public Cloud. This blog will help you get started with Cloudera Data Visualization, so you can start building interesting and powerful applications on all types of data. Before you start, make sure that you have a CDP account set up (for instance, you may use our trial experience).

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building a Real-Time Customer 360 on Kafka, MongoDB and Rockset

Rockset

Users interact with services in real-time. They log in to websites, like and share posts, purchase goods and even converse, all in real-time. So why is it that whenever you have a problem when using a service and you reach a customer support representative, they never seem to know who you are or what you’ve been doing recently? This is likely because they haven’t built a customer 360 profile and if they have, it certainly isn’t real-time.

MongoDB 40

Three Insights Into Delivering Value at Scale From Smart Factory Investments

Teradata

Industry 4.0 has promised productivity gains, but has not yet delivered. A large part of this has to do with the challenge of deploying analytics at scale. Find out more.

52

How to Solve the “You’re Using THAT Table?!” Problem

Monte Carlo

As companies increasingly rely on data to power decision making and drive innovation, it’s important that this data is timely, accurate, and reliable. When you consider that only a small fraction of the over 7.5 septillion (7,700,000,000,000,000,000,000) GB of data generated worldwide every day is usable, keeping tabs on what data assets are important has only gotten harder.

Coffee with Cloudera: Meet Ali Bajwa, Partner Solutions – Engineer by Day, Rockstar by Night!

Cloudera

Meet Ali Bajwa, Director of Partner Solutions Engineering at Cloudera. For the past 6 years, Ali has been front and center in many partner field deployments, training, and discussions; he is a rockstar in the Cloudera Partner Ecosystem! We hope this interview helps you get to know the after-hours Ali. If you get a chance, follow Ali on Twitter! @abajwa_hdp.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.