September, 2022

article thumbnail

Data Lake / Lakehouse Guide: Powered by Data Lake Table Formats (Delta Lake, Iceberg, Hudi)

Simon Späti

Image by Rachel Claire on Pexels Ever wanted or been asked to build an open-source Data Lake offloading data for analytics? Asked yourself what components and features would that include. Didn’t know the difference between a Data Lakehouse and a Data Warehouse? Or you just wanted to govern your hundreds to thousands of files and have more database-like features but don’t know how?

Data Lake 130
article thumbnail

Airflow Taskflow API: The Guide

Marc Lamberti

Airflow Taskflow is a new way of writing DAGs at ease. As you will see, you need to write fewer lines than before to obtain the same DAG. That helps to make DAGs easier to build, read, and maintain. The Taskflow API has three main aspects: XCOM Args, Decorator, and XCOM backends. In this tutorial, you will learn what the Taskflow API is, why it is crucial for you, and how to create your DAGs.

SQL 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

Summary Any business that wants to understand their operations and customers through data requires some form of pipeline. Building reliable data pipelines is a complex and costly undertaking with many layered requirements. In order to reduce the amount of time and effort required to build pipelines that power critical insights Manish Jethani co-founded Hevo Data.

article thumbnail

How to Select Rows and Columns in Pandas Using [ ],loc, iloc,at and.iat

KDnuggets

Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.

Data 160
article thumbnail

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enahance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

article thumbnail

Real-Time Gaming Infrastructure for Millions of Users with Apache Kafka, ksqlDB, and WebSockets

Confluent

How gaming enterprises like Sony and Big Fish Games use Apache Kafka®, Confluent, and ksqlDB’s data streaming technologies for the best in-game experience, ROI, and real-time capabilities.

Kafka 122
article thumbnail

Top 10 Globally Recognized Certifications for Cyber Security

U-Next

Introduction . Cybersecurity or computer security and information security is the act of preventing theft, damage, loss, or unauthorized access to computers, networks, and data. As our interconnections grow, so do the chances for evil hackers to steal, destroy, or disrupt our lives. The increase in cybercrime has increased the demand for cybersecurity expertise.

More Trending

article thumbnail

Rejoice! The Vantage Analytics and Data Platform Provide Incredible Power for All in a “Cloudy” Environment

Teradata

With the release of VantageCloud Lake and ClearScape Analytics, Teradata brings a cloud-native architecture to extend the technical innovations and differentiators that Vantage is well known for.

article thumbnail

Build A Common Understanding Of Your Data Reliability Rules With Soda Core and Soda Checks Language

Data Engineering Podcast

Summary Regardless of how data is being used, it is critical that the information is trusted. The practice of data reliability engineering has gained momentum recently to address that question. To help support the efforts of data teams the folks at Soda Data created the Soda Checks Language and the corresponding Soda Core utility that acts on this new DSL.

Building 100
article thumbnail

More Performance Evaluation Metrics for Classification Problems You Should Know

KDnuggets

When building and optimizing your classification model, measuring how accurately it predicts your expected outcome is crucial. However, this metric alone is never the entire story, as it can still offer misleading results. That's where these additional performance evaluations come into play to help tease out more meaning from your model.

Building 160
article thumbnail

Keeping Multiple Databases in Sync Using Kafka Connect and CDC

Confluent

Microservices have numerous benefits, but data silos are incredibly challenging. Learn how Kafka Connect and CDC provide real-time database synchronization, bridging data silos between all microservice applications.

Kafka 120
article thumbnail

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

article thumbnail

Improve Underwriting Using Data and Analytics

Cloudera

Insurance carriers are always looking to improve operational efficiency. We’ve previously highlighted opportunities to improve digital claims processing with data and AI. In this post, I’ll explore opportunities to enhance risk assessment and underwriting, especially in personal lines and small and medium-sized enterprises. Underwriting is an area that can yield improvements by applying the old saying “work smarter, not harder.

article thumbnail

The Rise of the Semantic Layer

Simon Späti

A semantic layer is something we use every day. We build dashboards with yearly and monthly aggregations. We design dimensions for drilling down reports by region, product, or whatever metrics we are interested in. What has changed is that we no longer use a singular business intelligence tool; different teams use different visualizations (BI, notebooks, and embedded analytics).

BI 130
article thumbnail

What Do You Want to be Famous for?

Teradata

Financial services organizations that exhibit true data literacy avoid bottlenecks and instead choose to build best in class solutions that meet current and future needs. Find out more.

article thumbnail

Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations

Data Engineering Podcast

Summary Data integration from source systems to their downstream destinations is the foundational step for any data product. With the increasing expecation for information to be instantly accessible, it drives the need for reliable change data capture. The team at Fivetran have recently introduced that functionality to power real-time data products.

Food 100
article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

SQL vs NoSQL: 7 Key Takeaways

KDnuggets

People assume that NoSQL is a counterpart to SQL. Instead, it’s a different type of database designed for use-cases where SQL is not ideal. The differences between the two are many, although some are so crucial that they define both databases at their cores.

NoSQL 160
article thumbnail

Excited to be back at Google Cloud Next 2022!

Confluent

Highlighting sessions on the power of our Confluent-Google partnership: multi-layer data security, real-time cloud data streaming and analytics, database modernization, and more.

article thumbnail

#Clouderalife Volunteer Spotlight: Barry Laide

Cloudera

Cloudera’s September Volunteer Spotlight is Barry Laide, accounting manager for LATAM, based in Cork, Ireland. . Barry volunteers with Kerry Mountain Rescue to provide first aid and rescue in the uplands of southwestern Ireland. The organization was founded in 1966 following the deaths of two climbers on the mountains there, and since then has come to the assistance of numerous climbers and walkers in distress. .

article thumbnail

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

Netflix Tech

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads by Kostas Christidis Introduction Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos , our media encoding platform. Over the past 2.5 years, its usage has increased, and Timestone is now also the priority queueing engine backing Conductor , our general-purpose workflow orchestration engine, and BDP Sch

Systems 85
article thumbnail

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

article thumbnail

Living Out Our Purpose

Teradata

At Teradata, we are committed to operating a business that takes a responsible view of our impact on society and the planet. Find out how we are living this commitment everyday.

52
article thumbnail

Operational Analytics To Increase Efficiency For Multi-Location Businesses With OpsAnalitica

Data Engineering Podcast

Summary In order to improve efficiency in any business you must first know what is contributing to wasted effort or missed opportunities. When your business operates across multiple locations it becomes even more challenging and important to gain insights into how work is being done. In this episode Tommy Yionoulis shares his experiences working in the service and hospitality industries and how that led him to found OpsAnalitica, a platform for collecting and analyzing metrics on multi location

article thumbnail

Welcome to TensorFlow!

KDnuggets

TensorFlow in Action teaches you to construct, train, and deploy deep learning models using TensorFlow 2. In this practical tutorial, you’ll build reusable skills hands-on as you create production-ready applications.

article thumbnail

Event-Driven Microservices with Python and Apache Kafka

Confluent

A deep dive into how microservices work, why it’s the backbone of real-time applications, and how to build event-driven microservices applications with Python and Kafka.

Kafka 98
article thumbnail

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

article thumbnail

Ethics Sheet for AI-assisted Comic Book Art Generation

Cloudera

Introduction. This blog is intended to serve as an ethics sheet for the task of AI-assisted comic book art generation, inspired by “ Ethics Sheets for AI Tasks.” AI-assisted comic book art generation is a task I proposed in a blog post I authored on behalf of my employer, Cloudera. I’m a research engineer by trade and have been involved in software creation in some way or another for most of my professional life.

article thumbnail

This Business Analytics IIM Program Is Everything You Need To Become The New Age Leader!  

U-Next

The beauty of Business Analytics lies in its ability to grow businesses by an ‘x’ factor just by collecting and analyzing data. As businesses today continue their constant hunt to find newer innovative ways to enhance business processes, the role of a Business Analyst has never had greater importance. According to Techjury the global business intelligence market will grow to $33.3 billion by 2025.

article thumbnail

Three steps to maximise value of RegTech investments

Teradata

RegTech is the word on everyone’s lips as financial services businesses look for ways to manage the avalanche of regulatory reporting precipitated by the 2008 financial crisis.

article thumbnail

Build Confidence In Your Data Platform With Schema Compatibility Reports That Span Systems And Domains Using Schemata

Data Engineering Podcast

Summary Data engineering systems are complex and interconnected with myriad and often opaque chains of dependencies. As they scale, the problems of visibility and dependency management can increase at an exponential rate. In order to turn this into a tractable problem one approach is to define and enforce contracts between producers and consumers of data.

Systems 100
article thumbnail

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

article thumbnail

5 Concepts You Should Know About Gradient Descent and Cost Function

KDnuggets

Why is Gradient Descent so important in Machine Learning? Learn more about this iterative optimization algorithm and how it is used to minimize a loss function.

article thumbnail

6 Ways Data Streaming is Transforming Financial Services

Confluent

How banks and finance companies use Confluent to transform their digital systems with event-driven architecture, real-time payment processing, fraud detection, and analytics.

Banking 93
article thumbnail

Announcing GA of DataFlow Functions

Cloudera

. Today, we’re excited to announce that DataFlow Functions (DFF), a feature within Cloudera DataFlow for the Public Cloud, is now generally available for AWS, Microsoft Azure, and Google Cloud Platform. DFF provides an efficient, cost optimized, scalable way to run NiFi flows in a completely serverless fashion. This is the first complete no-code, no-ops development experience for functions, allowing users to save time and resources. .

article thumbnail

Important AI and Management Tools of 2022

U-Next

Introduction . There has never been a better time to adopt Artificial Intelligence with tools for AI. From everyday activities such as shopping and content creation to innovative developments such as space exploration and medical research, this time of technological advancement will have an enormous impact on virtually every aspect of life. . According to a Gartner study , AI software will generate $62 billion in revenue by 2022.

article thumbnail

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.