June, 2022


Data Orchestration Trends: The Shift From Data Pipelines to Data Products

Simon Späti

Data consumers, such as data analysts and business users, care mostly about the production of data assets. Data engineers, on the other hand, have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds? This article reviews open-source data orchestration tools (Airflow, Prefect, Dagster) and discusses how data orchestration tools introduce data assets as first-class objects.


Azure Data Factory: New Monitoring View Features

Azure Data Engineering

It is very easy to visually monitor previous pipeline runs using the Monitor page in Azure Data Factory, which we have already covered in a previous post. There have been some recent improvements to the monitoring view; we will go through these briefly in this post. Data from the Azure Monitor view can be easily exported to CSV by clicking the newly added Export to CSV button.


5 Steps to land a high paying data engineering job

Start Data Engineering

1. Introduction
2. Steps
2.1. Choosing companies to work for
2.2. Optimizing your Linkedin & resume
2.3. Landing interviews
2.4. Preparing for interviews
2.5. Offers & Negotiation
3. Conclusion
4. Further reading
5. Reference

1. Introduction: The data industry is booming, and data engineering salaries are skyrocketing. But landing a new job is not an easy task.


Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

Summary: Data analysis is a valuable exercise that is often out of reach of non-technical users as a result of the complexity of data systems. In order to lower the barrier to entry, Ryan Buick created the Canvas application with a spreadsheet-oriented workflow that is understandable to a wide audience. In this episode Ryan explains how he and his team have designed their platform to bring everyone onto a level playing field and the benefits that it provides to the organization.


Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.


An In-Depth Data Mesh Discussion with Zhamak Dehghani

Jesse Anderson

In 2021 I had the pleasure to first get to know and speak with Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, in season one of the Data Dream Team series. Zhamak is a software engineer and architect who is (in)famously known as the founder of the data mesh concept, a paradigm shift in how we manage data-driven value at scale. I interviewed Zhamak last season as more of an introduction to Data Mesh.


24 SQL Questions You Might See on Your Next Interview

KDnuggets

Preparing for the SQL job interview can be overwhelming enough. You don’t need someone telling you that you need to know everything on top of that! Be smart and focus on preparing the SQL questions that appear most often at the job interview.


More Trending


Azure Data Factory: Script Activity

Azure Data Engineering

Although we discussed various ways of running custom SQL code in Azure Data Factory in a previous post, a new activity called Script Activity has recently been added to Azure Data Factory; it provides a more flexible way of running custom SQL scripts. As shown in the screenshot above, this activity supports execution of custom Data Query Language (DQL) as well as Data Definition Language (DDL) and Data Manipulation Language (DML).


The Future Is Hybrid Data, Embrace It

Cloudera

We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB. In fact, the total amount of data is expected to nearly triple by 2025.


Discover And De-Clutter Your Unstructured Data With Aparavi

Data Engineering Podcast

Summary: Unstructured data takes many forms in an organization. From a data engineering perspective, that often means things like JSON files, audio or video recordings, images, etc. Another category of unstructured data that every business deals with is PDFs, Word documents, workstation backups, and countless other types of information. Aparavi was created to tame the sprawl of information across machines, datacenters, and clouds so that you can reduce the amount of duplicate data and save time an


Natively Connect Teradata QueryGrid to Google BigQuery

Teradata

With the Teradata QueryGrid Google BigQuery Connector, we’re enabling our customers to natively join data between Vantage and BigQuery in real-time, at scale.


Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.


20 Basic Linux Commands for Data Science Beginners

KDnuggets

Essential Linux commands to improve the data science workflow. It will give you the power to automate tasks, build pipelines, access file systems, and enhance development operations.


Autonomous Networks — The Telco and Media Growth Engine

Confluent

How real-time integrations between modern and legacy systems benefit communication service providers with autonomous network features, enhanced customer experiences, and more.


Azure Data Factory: Monitor Self Hosted Integration Runtime Metrics

Azure Data Engineering

In the context of Azure Data Factory, a self-hosted integration runtime is a gateway that connects on-prem data sources to datastores in the cloud. To learn more about integration runtimes, please refer to the previous post. We have discussed how to check whether an integration runtime is online or offline using a PowerShell command in a previous post. In today’s post, let’s have a look at how to monitor self-hosted integration runtime metrics such as CPU utilization, available memory, number of concu


Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. Over the last few years, we have had a front-row seat in our customers’ hybrid cloud journey as they expand their data estate across the edge, on-premise, and multiple cloud providers.


Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.


Strategies And Tactics For A Successful Master Data Management Implementation

Data Engineering Podcast

Summary: The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Data Management (MDM) is the process of building consensus around what the information actually means in the context of the business and then shaping the data to match those semantics. In this episode Malcolm Hawker shares his years of experience working in this domain to explore the combination of technical and social skills that are necessary to mak


Modernizing a public health system with Teradata’s connected analytic architecture

Teradata

How do you accelerate disease prevention and response? Teradata provides a response to help accelerate public health infrastructure modernization.


Primary Supervised Learning Algorithms Used in Machine Learning

KDnuggets

In this tutorial, we are going to list some of the most common algorithms that are used in supervised learning along with a practical tutorial on such algorithms.


Introducing the Current 2022 Program Committee

Confluent

The committee will ensure Current has the best speakers from top companies in every industry, and cover all streaming data technologies.


Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.


How Netflix Content Engineering makes a federated graph searchable (Part 2)

Netflix Tech

By Alex Hutter, Falguni Jhaveri, and Senthil Sayeebaba. In a previous post, we described the indexing architecture of Studio Search and how we scaled the architecture by building a config-driven self-service platform that allowed teams in Content Engineering to spin up search indices easily. This post will discuss how Studio Search supports querying the data available in these indices.


#ClouderaLife Spotlight: Hassan Mirza

Cloudera

In this #ClouderaLife Spotlight Hassan talks about three life themes that have kept him moving and motivated: learning from his father’s work ethic despite his family’s forcible displacement from their country of origin, his early experience with organized sports, and the value of mentorship. Hassan describes how these experiences led him to give back to his family and community by becoming a Mental Health First Aider and a mentor for refugees seeking a better life.


Hire And Scale Your Data Team With Intention

Data Engineering Podcast

Summary: Building a well-rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. Trupti Natu has been the first data hire multiple times and has gone through the process of building teams across the different stages of growth. In this episode she shares her thoughts and insights on how to be intentional about establishing your own data team.


A Model Implementation

Teradata

How do you take the first steps to free the power of analytics from on-premise systems whilst protecting valuable data and de-risking transformation? Find out more.


The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.


Python: The programming language of machine learning

KDnuggets

You can't avoid learning Python if you work on machine learning problems. You need to know what other people's code means and you need to convey your ideas to them too.


How to Elastically Scale Apache Kafka Clusters on Confluent Cloud

Confluent

How to elastically scale Kafka clusters from 0 to 100 MB/s and back with automatic cluster resizing, data rebalancing, real-time consumption optimization, and monitoring in seconds.


Cloud Computing Interview Questions And Answers 2022

U-Next

It’s impossible to crack a cloud computing interview without preparing for it. Preparation beforehand is a must, and here you can achieve that! Introduction to Cloud Computing Interview Questions: Since cloud computing is useful beyond IT organisations, it has become a popular career in recent years. Businesses from a variety of sectors, including finance, computers, commerce, entertainment, and automobiles, have shifted to using cloud computing for information sto


The Future of the Data Lakehouse – Open

Cloudera

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.


From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.


Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

Data Engineering Podcast

Summary: The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that the second-best way is to build a storage system dedicated to securely managing your sensitive information and making it easy to integrate with your applications and data systems. In this episode Sean Falconer explains the idea of a data privacy vault and how this new architectural element can drastically reduce the potential for making a mistake wit


Operational excellence—data ensures airlines maintain the right trajectory

Teradata

Learn how data and analytics can enable airlines to navigate towards more streamlined operations. Read more.


Learn MLOps with This Free Course

KDnuggets

Learn to train and track your experiments, create ML pipelines, deploy models, monitor performance in production, and adopt best practices from DevOps.


Confluent wins the 2022 Microsoft Commercial Marketplace Partner of the Year Award

Confluent

Our Marketplace Partner of the Year Award highlights Confluent's data streaming solution, cloud Apache Kafka, and fully integrated Azure security, management, billing, and data analytics.


The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.