Sat. Dec 11, 2021 - Fri. Dec 17, 2021

Building Auditable Spark Pipelines At Capital One

Data Engineering Podcast

Summary: Spark is a powerful and battle-tested framework for building highly scalable data pipelines. Because of its proven ability to handle large volumes of data, Capital One has invested in it for their business needs. In this episode, Gokul Prabagaren shares how he uses it to calculate your rewards points, including the auditing requirements and how he designed his pipeline to maintain all of the necessary information through a pattern of data enrichment.

Building 130

Azure Data Factory Linked Service: Advanced Authoring

Azure Data Engineering

We discussed Linked Service parameterization through the UI in a previous post. But not all Linked Service types support parameterization through the UI. In this post, we will discuss the Linked Services that can’t be parameterized using the UI (i.e., they don’t have any option to add a parameter). If you are familiar with Azure services, you might know that Linked Services, like any other Azure artefact, have corresponding underlying JSON code.

Coding 130

How to choose the right tools for your data pipeline

Start Data Engineering

Contents: 1. Introduction; 2. Requirements; 3. Components; 4. Choosing tools (4.1 Requirement x Component framework, 4.2 Filters); 5. Conclusion; 6. Further reading. 1. Introduction: If you are building data pipelines from the ground up, the number of available data engineering tools to choose from can be overwhelming. If you are thinking, "Most of the tools seem to be doing the same/similar thing, which one should I choose?"

10 Key AI & Data Analytics Trends for 2022 and Beyond

KDnuggets

What AI and data analytics trends are taking the industry by storm this year? This comprehensive review highlights upcoming directions in AI to carefully watch and consider implementing in your personal work or organization.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228, a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload by submitting a custom-crafted request.

Java 129

The Definitive Guide to Building a Data Mesh with Event Streams

Confluent

Data mesh. This oft-talked-about architecture has no shortage of blog posts, conference talks, podcasts, and discussions. One thing that you may have found lacking is a concrete guide on precisely […].

Building 127

More Trending

Data Science & Analytics Industry Main Developments in 2021 and Key Trends for 2022

KDnuggets

We have solicited insights from experts at industry-leading companies, asking: "What were the main AI, Data Science, Machine Learning Developments in 2021 and what key trends do you expect in 2022?" Read their opinions here.

Cadence Multi-Tenant Task Processing

Uber Engineering

Introduction. Cadence is a multi-tenant orchestration framework that helps developers at Uber write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to handle millions of concurrent executions from various customers. It is currently used by hundreds of …

Process 121

AI and ML: No Longer the Stuff of Science Fiction

Cloudera

Artificial Intelligence (AI) has revolutionized how various industries operate in recent years. But with growing demands, there’s a more nuanced need for enterprise-scale machine learning solutions and better data management systems. The 2021 Data Impact Awards aim to honor organizations who have shown exemplary work in this area. The category “Data for Enterprise AI” awards companies from around the world that have built and deployed use cases for enterprise-scale machine learning and have in

Data Sharing Patterns with Confluent Schema Registry

Confluent

Sharing metadata on the data you store in your Confluent cluster is paramount to allow for effective sharing of that data across the enterprise. As the usage of real-time data […].

Metadata 103

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Data Labeling for Machine Learning: Market Overview, Approaches, and Tools

KDnuggets

So much of data science and machine learning is founded on having clean and well-understood data sources that it is unsurprising that the data labeling market is growing faster than ever. Here, we highlight many of the top players in this industry and the techniques they use to help you consider which might make a good partner for your needs.

DataKitchen’s Best of 2021 DataOps Resources

DataKitchen

Before we shut the door on 2021, we would like to share our most popular DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year. Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021.

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228, a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload by submitting a custom-crafted request.

Java 97

Quickly Deploy Confluent Platform with the New Ansible Installer

Confluent

An initial distributed deployment of Confluent Platform is often a necessary step toward supporting your first real-time data use case. We offer enterprise-grade deployment orchestration with Confluent for Kubernetes and […].

Data 98

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating

A Full End-to-End Deployment of a Machine Learning Algorithm into a Live Production Environment

KDnuggets

How to use scikit-learn, pickle, Flask, Microsoft Azure and ipywidgets to fully deploy a Python machine learning algorithm into a live, production environment.

8 analytics startups to watch over the next year

DataKitchen

#ClouderaLife Spotlight: Manoj Shanmugasundaram – Principal Solutions Engineer

Cloudera

Manoj Shanmugasundaram has been with Cloudera for five and a half years, bringing his talents to our Solutions Engineering team. As a Principal Solutions Engineer, he says his core responsibility is “to take Cloudera’s latest and greatest technology and meet a customer’s complex business requirements, across the data lifecycle, on any cloud or the datacenter.”

Data Mesh and Data Virtualization are not the Same Thing

Teradata

The Data Mesh approach to enterprise data architecture has many benefits, but there is a widespread misunderstanding that will significantly limit those benefits for anyone who holds it.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

How I 14Xed my salary in 14 years as a data analytics/science professional

KDnuggets

Learn how one data scientist increased their full-time job salary 14 times in 14 years of a career, with highlights on experiencing an IPO, RSUs, start-ups and working at FAANG companies.

How to Learn SQL Basics for Data Science in 2023?

ProjectPro

Data science and artificial intelligence might be the buzzwords of recent times, but they are of no value without the right data backing them. The process of data collection has increased exponentially over the last few years. The companies are churning out massive volumes of data every day for analysis and deriving business insights. All this data is stored in a database that requires SQL-based queries for retrieval and transformations, making it essential for every data professional to learn S

How To Overcome Hybrid Cloud Migration Roadblocks

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

Cloud 87

Powering SQL Draw with Rockset, Retool and dbt

Rockset

If you were one of the 15,000 people who attended Coalesce 2021, you will likely remember SQL Draw, the Slack-based game combining SQL with Cartesian geometry, art, creativity and teamwork. If you missed it, you can read more about SQL Draw on the Omnata website. Behind the scenes, SQL Draw is made up of two parts: The core game is built as a Slack app with a totally serverless backend architecture.

SQL 52

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

What Is AI Model Governance?

KDnuggets

How exactly does AI model governance help tackle these issues? And how can you ensure you’re using it to best fit your needs? Read on.

A Collection of Take-Home Data Science Challenges for 2023

ProjectPro

Challenges make us all uncomfortable but none of us can deny that difficult challenges only help us bring out the stronger and better version of ourselves. So, if you are a professional data scientist or an enthusiast, read this article for a collection of take-home Data Science Challenges and develop better skills by attempting them. Working on take-home data science challenges is equally important for professionals and beginners alike.

Why Company Data Strategies Are Indelibly Linked with DEI

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

Data 86

What’s a Data Catalog and How to Choose the Right One

phData: Data Engineering

Your business might be moving to the cloud, might have just completed the move, or might have been established there for a little while, and you are likely wondering, “what data catalog tool is best for me?” The short answer is…it depends. There are a lot of options available, and choosing the right data catalog for your business will depend heavily on: what drives your business; your data needs; your unique data culture; and how you can support your data. To provide you with the best possible chance of success on your d

Reimagined: Building Products with Generative AI

“Reimagined: Building Products with Generative AI” is an extensive guide for integrating generative AI into product strategy and careers featuring over 150 real-world examples, 30 case studies, and 20+ frameworks, and endorsed by over 20 leading AI and product executives, inventors, entrepreneurs, and researchers.

12 Tips: From Data Analyst to Startup Co-Founder

KDnuggets

Thinking about taking your data science expertise to a new level of creating a start-up company? These tips -- learned from experience -- can help you forge an early path toward success.

Machine Learning Engineer vs Data Scientist - The Differences

ProjectPro

Are you a newbie in the data science domain ready to embark on a rewarding journey but are confused between the roles of a Machine Learning Engineer vs Data Scientist? Many data science beginners do not clearly understand the two job roles and often find it challenging to understand the day-to-day roles and responsibilities revolving around these jobs.

It’s Time to Listen More to Your Employees!

Cloudera

Now is the time to sit up and listen. Not to me, but to your teams. Much of 2020 and 2021 were spent coping with new demands of remote work while negotiating the multitude of disruptions resulting from the pandemic. And this year, even as we inch our way back to business as we knew it, redefining norms for a hybrid future requires us to answer questions that often cannot be resolved on our own.

Cloud 79

What Team Supports Your Data Catalog Best?

phData: Data Engineering

Welcome to part two of our trilogy on data catalogs. If you missed our first blog on what a data catalog is, be sure to check it out! In this blog, we’ll explore what the ideal team to support your data catalog looks like. Who Are the Users of a Data Catalog? A tool is only as good as the team you have to support and champion it. When setting up your data catalog, it is tempting to leave it with a technical team that can keep the automation running, onboard new datasets, and support upgrades and

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.