Top Data Engineering Digest Kafka Scala Content for March, 2021

March, 2021

Building a Data Engineering Project in 20 Minutes

Simon Späti

MARCH 9, 2021

This post focuses on practical data pipelines, with examples that range from web-scraping real-estate listings and uploading them to S3 with MinIO, Spark and Delta Lake, to adding some Data Science magic with Jupyter Notebooks, ingesting into the data warehouse Apache Druid, visualising dashboards with Superset and managing everything with Dagster. The goal is to touch on common data engineering challenges while using promising new technologies, tools or frameworks, most of which I wrote about in Business Intelligence […]

Data Engineering

Data Engineering Data Engineer Engineering Project

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

Just an illustration: not the truth, and you certainly can do it with other technologies. TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality: the self-serve platform based on a serverless philosophy (life is too short to do provisioning); the building of data products (as code), where we are building data workflows, not data pipelines; the promotion of data domains, where the metadata on the data life cycle is as important as your data; The old dat […]

Technology

Technology Architecture Google Cloud Metadata

Trending Sources

Apache Kafka Made Simple: A First Glimpse of a Kafka Without ZooKeeper

Confluent

MARCH 30, 2021

At the heart of Apache Kafka® sits the log—a simple data structure that uses sequential operations that work symbiotically with the underlying hardware. Efficient disk buffering and CPU cache usage, […].

Kafka

Kafka Data
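
To make the idea of "the log" in the teaser above concrete, here is a minimal in-memory sketch of an append-only log with offset-based sequential reads. It only illustrates the data structure; the class and method names are hypothetical, and this is not how Kafka itself is implemented (Kafka persists log segments to disk and replicates them across brokers).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyLog:
    """Toy append-only log: records are only ever added at the end."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Read up to max_records sequentially, starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]


# Hypothetical usage: writers append, readers track their own offset.
log = AppendOnlyLog()
first = log.append(b"order-created")
second = log.append(b"order-paid")
```

Because readers only need to remember an offset, several consumers can read the same log independently and at their own pace.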

Data Quality Management For The Whole Team With Soda Data

Data Engineering Podcast

MARCH 29, 2021

Summary: Data quality has been top of mind for everyone recently, but getting it right is as challenging as ever. One of the contributing factors is the number of people who are involved in the process and the potential impact on the business if something goes wrong. In this episode Maarten Masschelein and Tom Baeyens share the work they are doing at Soda to bring everyone on board to make your data clean and reliable.

Management

Management Data Warehouse BI Data

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

How to trigger a spark job from AWS Lambda

Start Data Engineering

MARCH 27, 2021

Event driven pipelines: Event-driven systems represent a software design pattern where logic is executed in response to an event. This event can be a file creation on S3, a new database row, an API call, etc.

AWS

AWS Cloud Storage Database Cloud
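
The event-driven pattern in the teaser above can be sketched as a handler that reacts to an S3 "object created" notification and hands a Spark job request to a submitter. This is a sketch under stated assumptions, not the article's implementation: the function names, the script location `s3://my-code-bucket/jobs/process.py`, and the injected `submit` callable are all hypothetical. In a real Lambda function, `submit` would wrap whatever service runs Spark for you.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus

# Hypothetical location of the Spark script; replace with your own.
SPARK_SCRIPT = "s3://my-code-bucket/jobs/process.py"


def build_spark_submit_args(bucket: str, key: str) -> List[str]:
    """Build a spark-submit argument list for one newly created S3 object."""
    return [
        "spark-submit",
        "--deploy-mode", "cluster",
        SPARK_SCRIPT,
        "--input", f"s3://{bucket}/{key}",
    ]


def handle_s3_event(event: Dict[str, Any],
                    submit: Callable[[List[str]], None]) -> int:
    """Submit one Spark job per S3 record in the event; return the count."""
    submitted = 0
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        submit(build_spark_submit_args(bucket, key))
        submitted += 1
    return submitted


# Hypothetical usage with a fake submitter that just records the requests.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"},
                "object": {"key": "2021/03/new+file.csv"}}}
    ]
}
calls: List[List[str]] = []
count = handle_s3_event(sample_event, calls.append)
```

Passing the submitter in as a parameter keeps the event-parsing logic testable without any cloud credentials.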

CFO Analytics: What Is It and Why Should You Care?

Teradata

MARCH 3, 2021

Finance-driven analytics might be the largest untapped opportunity for organizations & a catalyst for driving business value & strategic vision. But, what exactly is CFO analytics?

IT Finance

More Trending

International Women’s Day 2021: Challenging what’s possible

Cloudera

MARCH 1, 2021

This year’s International Women’s Day (IWD) on March 8th comes at a time when global communities, businesses, and governments find themselves continuing to pirouette, pivot, and adapt in the face of a relentless, global pandemic. COVID-19 has touched every aspect of our lives. As women, overnight we suddenly found that we had a portfolio career – comprising our day jobs, caregiver, school teacher and house cleaner – that we had neither asked for, nor were consulted on.

Portfolio

Portfolio Banking Consulting Government

Under the Hood of Real-Time Analytics with Apache Kafka and Pinot

Confluent

MARCH 9, 2021

Real-time analytics has become the need of the hour for modern internet companies. The ability to derive internal insights around business metrics, user growth and adoption as well as security […].

Kafka

Kafka Architecture

Real World Change Data Capture At Datacoral

Data Engineering Podcast

MARCH 22, 2021

Summary: The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only way to provide this reliably is by implementing change data capture (CDC). Unfortunately, this is a non-trivial undertaking, particularly for teams that don’t have extensive experience working with streaming data and complex distributed systems.

Data Warehouse

Data Warehouse Metadata Data Lake Hadoop

ConsoleMe: A Central Control Plane for AWS Permissions and Access

Netflix Tech

MARCH 10, 2021

By Curtis Castrapel, Patrick Sanders, and Hee Won Kim. At AWS re:Invent 2020, we open sourced two new tools for managing multi-account AWS permissions and access. We’re very excited to bring you ConsoleMe (pronounced: kuhn-soul-mee), and its CLI utility, Weep (pun intended)! If you missed the talk, check it out here.

AWS

AWS Accessible Accessibility Cloud

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Engineering

How to Host a Virtual Global Data Science Hackathon

Teradata

MARCH 25, 2021

Learn how best to host a virtual hackathon, or any virtual event, with these tips and tricks from our Teradata team. Read more.

Data Science

Data Science Data

Reverse ETL with dbt and Grouparoo

Grouparoo

MARCH 30, 2021

Teams are centralizing their data in their data warehouse by loading data in and transforming it as necessary. Increasingly, we are seeing teams turn to dbt to do this transforming. The idea is to write *.sql files that, when run in the right order, create useful rollup tables or materialized views of the data. We've been asked by teams using dbt how Grouparoo can then sync their data to their cloud-based apps.

Data Warehouse

Data Warehouse SQL Project Database

Congratulations to our 2021 Partner Award Winners

Cloudera

MARCH 23, 2021

At our Partner Sales Kickoff, we announced the winners of the 2021 Cloudera Partner Awards. These six awards recognize Cloudera partners who are dedicated to enabling customers to do more with their data by leveraging the power of an enterprise data cloud. Thank you to this year’s winners for their partnership in helping our joint customers drive value from their data in the hybrid cloud.

Healthcare

Healthcare Cloud Data Science Government

Monitoring Your Event Streams: Integrating Confluent with Prometheus and Grafana

Confluent

MARCH 29, 2021

Self-managing a highly scalable distributed system with Apache Kafka® at its core is not an easy feat. That’s why operators prefer tooling such as Confluent Control Center for administering and […].

Kafka

Kafka Systems Management IT

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building

Leave Your Data Where It Is And Automate Feature Extraction With Molecula

Data Engineering Podcast

MARCH 8, 2021

Summary: A majority of the time spent in data engineering is copying data between systems to make the information available for different purposes. This introduces challenges such as keeping information synchronized, managing schema evolution, and building transformations to match the expectations of the destination systems. H.O. Maycotte was faced with these same challenges but at a massive scale, leading him to question if there is a better way.

IT Data Warehouse MongoDB Kafka

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Netflix Tech

MARCH 2, 2021

By Stephanie Lane, Wenjing Zheng, and Mihir Tendulkar. Within the rapid expansion of data-related roles in the last decade, the title Data Scientist has emerged as an umbrella term for myriad skills and areas of business focus. What does this title mean within a given company, or even within a given industry? It can be hard to know from the outside.

Data Science

Data Science Machine Learning Entertainment Algorithm

Enterprise Data Operating Systems in the Cloud: Necessary, But Not Sufficient

Teradata

MARCH 11, 2021

Getting your Cloud data architecture right starts with understanding which data products you need, the roles they perform, & the functional & non-functional characteristics that those roles demand.

Cloud

Cloud Systems Data Architecture Architecture

Community, Metadata Management, and More: Top 10 Links From Across the Web

Data Council

MARCH 25, 2021

Here's our March 2021 roundup of links from across the web that we selected for you: 1. How to Build a Community (Fishtown Analytics): Claire Carroll's first personal blog post on community-building is a must-read. As Fishtown Analytics' community manager for the last 2.5 years, she's arguably behind the success of the dbt community and its best-in-class practices, so we expected good advice… but she really hit the ball out of the park with this one!

Metadata

Metadata Management Building IT

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Project

CDP Endpoint Gateway provides Secure Access to CDP Public Cloud Services running in private networks

Cloudera

MARCH 22, 2021

Cloudera Data Platform (CDP) Public Cloud allows users to deploy analytic workloads into their cloud accounts. These workloads cover the entire data lifecycle and are managed from a central multi-cloud Cloudera Control Plane. CDP provides the flexibility to deploy these resources into public or private subnets. Nearly unanimously, we’ve seen customers deploy their workloads to private subnets.

Accessible

Accessible Accessibility Cloud Kafka

How to Tune RocksDB for Your Kafka Streams Application

Confluent

MARCH 10, 2021

Apache Kafka ships with Kafka Streams, a powerful yet lightweight client library for Java and Scala to implement highly scalable and elastic applications and microservices that process and analyze data […].

Kafka

Kafka Scala Java Process

Promisifying Your Node Callback Functions

Grouparoo

MARCH 24, 2021

The Grouparoo application is written in JavaScript (Node). It uses the modern promise-based pattern (async/await) for reading and writing data asynchronously. And we do this a lot: we are a data sync tool! Every once in a while we'll come across a JavaScript library that is written around the old callback-based pattern, where the error object is the first parameter in the callback function, followed by the result.

Utilities

Utilities Coding IT Data

Scaling Revenue & Growth Tooling

Netflix Tech

MARCH 22, 2021

Written by Nick Tomlin, Michael Possumato, and Rahul Pilani. This post shares how the Revenue & Growth Tools (RGT) team approaches creating full-stack tools for the teams that are the financial backbone of Netflix. Our primary partners are the teams of Revenue and Growth Engineering (RGE): Growth, Membership, Billing, Payments, and Partner Subscription.

Metadata

Metadata Portfolio Java Engineering

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Building

Enhancing Customer Experience with Every Journey

Teradata

MARCH 4, 2021

Big Tech giants dominate by using data to improve product & experience. The auto industry can emulate this by analyzing data to improve customer experience & guide individual choices.

Data

Building the Future of Payments With RippleNet’s VP of Engineering

Ripple Engineering

MARCH 25, 2021

Amidst the work-from-home environment, Vidya Mani joined Ripple in early 2020 as the Vice President of Engineering for RippleNet. A year into her role, she focuses on improving Ripple’s infrastructure and strengthening her team to further the company’s vision for a more inclusive financial system. RippleNet is an enterprise solution which helps banks and other financial institutions streamline global payments and reach new customers.

Building

Building Engineering Banking Finance

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

MARCH 9, 2021

Governance and the sustainable handling of data is a critical success factor in virtually all organizations. While Cloudera Data Platform (CDP) already supports the entire data lifecycle from ‘Edge to AI’, we at Cloudera are fully aware that enterprises have more systems outside of CDP. It is crucial that CDP does not become the next silo in your IT landscape.

Data Governance

Data Governance Government Metadata Datasets

To Pull or to Push Your Data with Kafka Connect? That Is the Question.

Confluent

MARCH 2, 2021

Today, every company is a data company. There are many different data pipeline, integration, and ingestion tools in the market, but before you can feed your data analytics needs, data […].

Kafka

Kafka Data Pipeline Data Analytics Data

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Certification

Data-driven performance improvements: Distance running and data

Retail Insight

MARCH 18, 2021

As a former distance runner, I have seen first-hand how investment in elite sport is accelerating athletic performance. Just as the world has developed, our physical capabilities have too. New lifestyles, technologies, science, and world-class facilities all help to enable athletes to go ‘Faster – Higher – Stronger’, as the Olympics motto states.

Data

Data Technology

Dogfooding your product

Grouparoo

MARCH 17, 2021

“Eating your own dogfood” or “dogfooding” is a term that always felt a bit odd to me, but the principles underlying it are incredibly important to product teams small and large. In short, Dogfooding means using your own product in order to better empathize with your users. When you build more empathy for your users, you build a better product. I’ll be sharing some thoughts on why dogfooding is important and some pointers on how to dogfood well.

Building

Building IT

Don’t Just Collect Vehicle Data – Monetize It!

Teradata

MARCH 23, 2021

As the auto sector transforms, vehicle data is becoming one of the most important sources of insight. But if it is left in fragmented silos, it quickly becomes a cost & delivers little value.

IT Data

Deep Learning vs Machine Learning -What's the Difference?

ProjectPro

MARCH 17, 2021

“Machine Learning” and “Deep Learning” are two of the most often confused terms in the AI world, and they are frequently used interchangeably. However, one fact is undeniable: both machine learning and deep learning are undergoing skyrocketing growth. According to Forbes, the global machine learning market will be worth $30.6 billion by 2024 and the deep learning market size is expected to reach $10.2 billion by 2025, expanding at a CAGR of 42.8% and 52.1

Deep Learning

Deep Learning Machine Learning Algorithm Datasets

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.

Certification
