December, 2021

Azure Data Factory Linked Service: Advanced Authoring

Azure Data Engineering

We discussed Linked Service parameterization through the UI in a previous post. But not all Linked Service types support parameterization using the UI. In this post, we will discuss the Linked Services that can’t be parameterized using the UI (i.e., they don’t have any option to add a parameter). If you are familiar with Azure services, you might know that a Linked Service, like any other Azure artefact, has corresponding underlying JSON code.

How to choose the right tools for your data pipeline

Start Data Engineering

If you are building data pipelines from the ground up, the number of available data engineering tools to choose from can be overwhelming. If you are thinking, “Most of the tools seem to be doing the same or similar things; which one should I choose?”

6 Predictive Models Every Beginner Data Scientist Should Master

KDnuggets

Data Science models come in different flavors and techniques; luckily, most advanced models are based on a couple of fundamentals. Which models should you learn when you want to begin a career as a Data Scientist? This post brings you 6 models that are widely used in the industry, either in standalone form or as a building block for other advanced techniques.

A Guide to Stream Processing and ksqlDB Fundamentals

Confluent

Event streaming applications are a powerful way to react to events as they happen and to take advantage of data while it is fresh. However, they can be a challenge […].

Beyond the Basics of A/B Tests: Innovative Experimentation Tactics You Need to Know as a Data or Product Professional

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228, a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload through submitting a custom-crafted request.

Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

Data Engineering Podcast

Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of information. The team at Metaphor are building a fully connected metadata layer to provide both technical and social intelligence about your data. In this episode Pardhu Gunnam and Mars Lan explain how they have designed the architecture and user experience to allow everyone to collaborate on the data lifecycle and provide opportunities for automatio

More Trending

DataKitchen’s Best of 2021 DataOps Resources

DataKitchen

Before we shut the door on 2021, we would like to share our most popular DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year. Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021.

Building a solid data team

KDnuggets

How do you put together a solid data science team when it comes to developing data-driven products? A variety of roles are available to consider, so which ones do you need and which are most crucial?

Best Tutorials for Getting Started with Apache Kafka

Confluent

Each one of the more than 50 tutorials for Apache Kafka® on Confluent Developer answers a question that you might ask a knowledgeable friend or colleague about Kafka and its […].

Cloudera Data Engineering 2021 Year End Review

Cloudera

Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal has been operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as automating, delivering, and optimizing secure data pipelines.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

Data Engineering Podcast

Summary The core of providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately, many sites and applications take that to the extreme and collect too much information. In order to make it easier for developers to build customer profiles in a way that respects their privacy, Serge Huber helped to create the Apache Unomi framework as an open source customer data platform.

Reference Data: Smoothing Out the Bumps in M&A

Teradata

M&A is an important part of an organization's growth strategy. Getting reference data right can be foundational to overcoming many challenges that come with it.

8 analytics startups to watch over the next year

DataKitchen

10 Key AI & Data Analytics Trends for 2022 and Beyond

KDnuggets

What AI and data analytics trends are taking the industry by storm this year? This comprehensive review highlights upcoming directions in AI to carefully watch and consider implementing in your personal work or organization.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

I Interviewed Nearly 200 Apache Kafka Experts and I Learned These 10 Things

Confluent

Many leading lights of the Apache Kafka® community have appeared as guests on Streaming Audio at one time or another in the past three years. But some of its episodes […].

AI and ML: No Longer the Stuff of Science Fiction

Cloudera

Artificial Intelligence (AI) has revolutionized how various industries operate in recent years. But with growing demands, there’s a more nuanced need for enterprise-scale machine learning solutions and better data management systems. The 2021 Data Impact Awards aim to honor organizations who have shown exemplary work in this area. The category “Data for Enterprise AI” awards companies from around the world that have built and deployed use cases for enterprise-scale machine learning and have in

Data Driven Hiring For Data Professionals With Alooba

Data Engineering Podcast

Summary Hiring data professionals is challenging for a multitude of reasons, and as with every interview process there is a potential for bias to creep in. Tim Freestone founded Alooba to provide a more stable reference point for evaluating candidates to ensure that you can make more informed comparisons based on their actual knowledge. In this episode he explains how Alooba got started, how it is being used in the interview process for data oriented roles, and how it can also provide visibility

Delivering Actionable Financial Insights to Automotive Business Leaders

Teradata

Automotive businesses need to build new frameworks for CFO Analytics that leverage existing systems to provide the granular, timely data they need to succeed. Read more.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Snaring the Bad Folks

Netflix Tech

Project by Netflix’s Cloud Infrastructure Security team (Alex Bainbridge, Mike Grima, Nick Siow). Cloud security is a hard problem, but an even harder one is cloud security at scale. In recent years we’ve seen several cloud-focused data breaches, and evidence shows that threat actors are becoming more advanced with their techniques, goals, and tooling.

Alternative Feature Selection Methods in Machine Learning

KDnuggets

Feature selection methodologies go beyond filter, wrapper and embedded methods. In this article, I describe 3 alternative algorithms to select predictive features based on a feature importance score.

The Definitive Guide to Building a Data Mesh with Event Streams

Confluent

Data mesh. This oft-talked-about architecture has no shortage of blog posts, conference talks, podcasts, and discussions. One thing that you may have found lacking is a concrete guide on precisely […].

In AI we trust? Why we Need to Talk About Ethics and Governance (part 2 of 2)

Cloudera

In part 1 of this blog post, we discussed the need to be mindful of data bias and the resulting consequences when certain parameters are skewed. Surely there are ways to comb through the data to keep the risks from spiralling out of control. We need to get to the root of the problem. In 2019, the Gradient Institute published a white paper outlining the practical challenges for Ethical AI.

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Data-Driven in 2022: Data Management Opportunities in the Year Ahead

DataKitchen

Data Mesh and Data Virtualization are not the Same Thing

Teradata

The Data Mesh approach to enterprise data architecture has many benefits, but there is a widespread misunderstanding that will significantly limit those benefits for anyone who holds it.

Install and Run Cockpit on Linux Virtual Machines

WeCloudData

Objectives: This tutorial will walk you through installing Cockpit, a user-friendly web console tool for Linux system administration. Prerequisites: an installed Linux OS (this tutorial uses the Debian-based Linux distro Ubuntu). Introduction: Linux is extremely useful and powerful, but due to its flexibility, extensibility, and versatility as an operating system with a plethora of utilities, it can be […]

How to Get Certified as a Data Scientist

KDnuggets

If you are early in your journey to becoming a Data Scientist, an interesting option is to earn certification by DataCamp, and this guide offers tips that will help beginners complete the challenges.

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Serverless Stream Processing with Apache Kafka, AWS Lambda, and ksqlDB

Confluent

It seems like now more than ever developers are surrounded by a sea of terminology—but what does it really all mean? Here, we will take some often heard terms—some considered […].

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228, a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload through submitting a custom-crafted request.

What is embedded analytics, and how does it benefit BI?

DataKitchen

Supply Chain Predictions for 2022

Teradata

What we've learned during the pandemic and through almost two years of unprecedented supply chain disruptions will have great impact on the future of supply chain as we enter 2022.

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.