Data Engineering Digest: Top Data Engineering Content for February, 2023

February, 2023

AWS Lambdas – Python vs Rust. Performance and Cost Savings.

Confessions of a Data Guy

FEBRUARY 26, 2023

Save money, save money!! Hear Hear! Someone on LinkedIn recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.

AWS

AWS Python Data Engineering Data Engineer

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

FEBRUARY 28, 2023

Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily. In this blog post, we will take a closer look at Azure Databricks, its key features, […] The post Azure Databricks: A Comprehensive Guide appeared first on Analytics Vidhya.

Big Data

Big Data Machine Learning Cloud Data Process

Webinars

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Trending Sources

Finding My Pathless Path

Simon Späti

FEBRUARY 25, 2023

As I sit down to write this article, I’m filled with a sense of vulnerability and excitement. You see, this is a story that only I can tell. It’s a tale of finding my Pathless Path and discovering who I am in the process. I have learned that some of my best decision-making comes from following my gut, heart, and intuition, a place of inner knowing.

Process

Process IT

The job market for new grads: worse than in 2008, but better than 2002

The Pragmatic Engineer

FEBRUARY 23, 2023

Originally published on 23 Feb 2023 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you're not yet a full subscriber, you missed the in-depth analysis this week: Are tech companies aggressively cutting back on vendor spend?

Software Engineer

Software Engineer Software Engineering Recruitment Portfolio

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Database

Docker for Data Science Cheat Sheet

KDnuggets

FEBRUARY 14, 2023

Docker is dependency management on steroids, helping to ensure both reproducibility and collaboration, making it an important tool for data science. Our latest cheat sheet serves as a handy Docker reference. Check it out now!

Data Science

Data Science Data Management IT

The Ultimate Guide to Java Virtual Threads

Rock the JVM

FEBRUARY 22, 2023

Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads (JEP 425) and structured concurrency (JEP 428).

Java

Java Programming Coding Scala

Data Types in Delta Lake + Spark. Join and Storage Performance.

Confessions of a Data Guy

FEBRUARY 10, 2023

Hmm … data types. We all know they are important, but we don’t take them very seriously. I mean we know the difference between boolean, string, and integers, those are easy to get right. But we all get sloppy, sometimes we go the string and varchar route because we don’t spend enough time on the […] The post Data Types in Delta Lake + Spark.

Data

Data Big Data Data Engineering Data Engineer

More Trending

30 Best Data Science Books to Read in 2023

Analytics Vidhya

FEBRUARY 28, 2023

Introduction Data science has taken over all economic sectors in recent times. To achieve maximum efficiency, every company strives to use various data at every stage of its operations. Each aspect of data science, like data preparation, the importance of big data, and the process of automation, contributes to how data science is the future […] The post 30 Best Data Science Books to Read in 2023 appeared first on Analytics Vidhya.

Data Science

Data Science Data Preparation Big Data Data

The evolution of Facebook’s iOS app architecture

Engineering at Meta

FEBRUARY 6, 2023

Facebook for iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012, it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time. After years of iteration, the Facebook codebase does not resemble a typical iOS codebase: It’s full of C++, Objective-C(++), and Swift.

Architecture

Architecture Coding Engineering Systems

ChatGPT for Coding: Unleash the Power of ChatGPT

Edureka

FEBRUARY 8, 2023

We are introduced to new discoveries and technologies every day, and one of the best and most popular inventions today is artificial intelligence (AI) and its tools. One of them is ChatGPT, a conversational AI model and powerful chatbot that answers follow-up questions and writes code for its users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.

Coding

Coding Deep Learning Programming Java

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 2

KDnuggets

FEBRUARY 1, 2023

Can ChatGPT provide answers to data science questions to the same standard as humans? Check out this attempt to do so, and compare the answers to those from experts.

Data Science

Data Science Data

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Certification

Apache Kafka Beyond the Basics: Windowing

Confluent

FEBRUARY 8, 2023

Learn what windowing is, the difference between the four types of windows (hopping, tumbling, session, and sliding), and how to create them.
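To make the difference between two of these window types concrete, here is a minimal plain-Python sketch. It does not use any Kafka API; the function names `tumbling_window` and `hopping_windows` are hypothetical, and the sketch assumes windows are aligned to time zero with an inclusive start and an exclusive end. A tumbling window assigns each event to exactly one fixed-size bucket, while a hopping window advances by less than its size, so one event can fall into several overlapping buckets.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_ms inclusive, end_ms exclusive)


def tumbling_window(ts_ms: int, size_ms: int) -> Window:
    """Return the single fixed-size window that contains the timestamp."""
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)


def hopping_windows(ts_ms: int, size_ms: int, advance_ms: int) -> List[Window]:
    """Return every overlapping window that contains the timestamp.

    Windows start at multiples of advance_ms (an assumption of this sketch)
    and never start before time zero.
    """
    windows: List[Window] = []
    # Latest window start that is at or before the timestamp.
    start = ts_ms - (ts_ms % advance_ms)
    # Walk backwards while the window still covers the timestamp.
    while start >= 0 and start > ts_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


if __name__ == "__main__":
    # An event at 12.5 s with 5 s tumbling windows lands in one bucket.
    print(tumbling_window(12_500, 5_000))
    # The same event with 10 s windows advancing every 5 s lands in two.
    print(hopping_windows(12_500, 10_000, 5_000))
```

Session and sliding windows are not covered here because they depend on gaps between events rather than on a fixed grid.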

Kafka

Ownership and Borrowing in Rust – Data Engineering Gold Mine.

Confessions of a Data Guy

FEBRUARY 7, 2023

As I started to use Rust on and off, more out of curiosity than anything, I discovered some specks of gold buried down in the depths. Some of the things I’m going to talk about, well … all of it, is probably fairly obvious to most Rust folk, but it’s enjoyable to learn what new […] The post Ownership and Borrowing in Rust – Data Engineering Gold Mine. appeared first on Confessions of a Data Guy.

Data Engineering

Data Engineering Data Engineer Engineering Data

A Deep Dive into Data Replication: Most Effective Way to Protect Your Data

Analytics Vidhya

FEBRUARY 22, 2023

Introduction Data replication, also known as database replication, is the copying of data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information from disappearing or falling through the cracks. Data rarely stays still; it is constantly changing.

Database

Database Data NoSQL Datasets

Improving Meta’s global maps

Engineering at Meta

FEBRUARY 7, 2023

A lot has changed since the initial launch of our basemap in late 2020. We’re Meta now, but our mission remains the same: Giving people the power to build community and bring the world closer together. Across Meta, our family of applications (Facebook, Instagram, WhatsApp, among others) are using our basemap to connect people through functions like status updates, location sharing, and location-based searching.

Entertainment

Entertainment Transportation Data Schemas AWS

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

Regulation: Hurdle or Driver for Data Analytics in Financial Services

Teradata

FEBRUARY 9, 2023

In the aftermath of the 2008 financial crash, service providers have been subject to increasing rules & requirements. To what extent has this climate held back advances in data analytics?

Data Analytics

Data Analytics Data

PySpark for Data Science

KDnuggets

FEBRUARY 27, 2023

In this tutorial, we will learn to initiate a Spark session, load and process data, perform data analysis, and train a machine learning model.

Data Science

Data Science Machine Learning Data Analysis Data

Dynamic vs. Static Consumer Membership in Apache Kafka

Confluent

FEBRUARY 15, 2023

There are two main consumer group memberships in Apache Kafka®. Here’s how static and dynamic consumer groups work, how they affect rebalancing, and which to choose for your application.

Kafka

Pinterest is now on HTTP/3

Pinterest Engineering

FEBRUARY 23, 2023

Liang Ma | Software Engineer, Core Eng; Scott Beardsley | Engineering Manager, Traffic; Haowei Yuan | Software Engineer, Traffic. [Figure 1: HTTP/3 at Pinterest] Pinterest now operates on HTTP/3. We have enabled HTTP/3 for major Pinterest production domains on our multi-CDN edge network, and we’ve upgraded client apps’ network stack to support the new protocol.

Bytes

Bytes Media Software Engineer Software Engineering

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Building

Getting Started with The Basics of Docker

Analytics Vidhya

FEBRUARY 3, 2023

Introduction “Let’s containerize your code to ship worldwide!” If you read the above quote, you must be thinking, what does this all mean? Well, my friend, this is what Docker is. Let me explain it with an example. Say Harish and Lisa are two people working on the same project but on two different systems (say Windows and […] The post Getting Started with The Basics of Docker appeared first on Analytics Vidhya.

Coding

Coding Project Systems IT

Announcing Ray support on Databricks and Apache Spark Clusters

databricks

FEBRUARY 27, 2023

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter.

Machine Learning

Machine Learning Python Engineering

SQL Streambuilder Data Transformations

Cloudera

FEBRUARY 21, 2023

SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL as a part of Cloudera Streaming Analytics, built on top of Apache Flink. It enables users to easily write, run, and manage real-time continuous SQL queries on stream data with a smooth user experience. Though SQL is a mature and well-understood language for querying data, it is inherently a typed language.

SQL

SQL Kafka Raw Data Data

10 Free Machine Learning Courses from Top Universities

KDnuggets

FEBRUARY 2, 2023

Learn the basics of machine learning, including classification, SVM, decision tree learning, neural networks, convolutional neural networks, boosting, and K nearest neighbors.

Machine Learning

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

Building

Combining CDC Transactional Messages Using Kafka Streams

Confluent

FEBRUARY 23, 2023

How to use Kafka Streams to aggregate change data capture (CDC) messages from a relational database into transactional messages, powering a scalable microservices architecture.

Kafka

Kafka Relational Database Architecture Database

Deploying Data Pipelines using the Saga pattern

Picnic Engineering

FEBRUARY 8, 2023

Delivering the right events at low latency and with a high volume is critical to Picnic’s system architecture. In our previous blog, Dima Kalashnikov explained how we configure our Internal services pipeline in the Analytics Platform. In this post, we will explain how our team automates the creation of new data pipeline deployments. The step towards automation was an important improvement for us, as the previous setup was manual, slow, and error-prone.

Data Pipeline

Data Pipeline Kafka Data Architecture

How to Normalize Relational Databases With SQL Code?

Analytics Vidhya

FEBRUARY 27, 2023

Introduction Data is the new oil in this century. The database is the major element of a data science project. To generate actionable insights, the database must be centralized and organized efficiently. If a corrupted, unorganized, or redundant database is used, the results of the analysis may become inconsistent and highly misleading. So, we are […] The post How to Normalize Relational Databases With SQL Code?

Relational Database

Relational Database Database SQL Coding

Hodor: Overload scenarios and the evolution of their detection and handling

LinkedIn Engineering

FEBRUARY 23, 2023

Co-Authors - Abhishek Gilra, Nizar Mankulangara, Salil Kanitkar, and Vivek Deshpande Introduction To connect professionals and make them more productive, it is crucial that LinkedIn is available at all times. For us, downtime means that our members and customers don’t have access to the conversations, connections, and knowledge that are essential to them achieving their objectives.

Algorithm

Algorithm Java Designing Systems

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Engineering

Best Data Science Companies for Data Scientists!

U-Next

FEBRUARY 26, 2023

Introduction Data Science is revolutionizing the business world, and it has opened up unique opportunities for businesses to grow. Businesses are now looking for Data Scientists to help them make a difference in their company’s performance and reach even further. Data Science companies started to emerge due to this need for new people who can help businesses solve problems through data analytics.

Data Science

Data Science Machine Learning Food Consulting

skops: a new library to improve scikit-learn in production

KDnuggets

FEBRUARY 1, 2023

There are various challenges in MLOps and model sharing, including security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

Apache Kafka with Control and Data Planes

Confluent

FEBRUARY 21, 2023

With the advent of service mesh and microservices, control and data planes have become popular. This post shows you how to ensure security and governance controls in your Kafka system.

Kafka

Kafka Government Data Systems

Running a NixOS VM on macOS

Tweag

FEBRUARY 8, 2023

In this post I want to explore the current issues with developing parts of NixOS on macOS and how we can make this task easier. Why would I want to run a NixOS virtual machine on macOS? My colleague at Tweag, Dominic Steinitz, asked me this question after I shared my first minor achievement in this area, and it struck me that I have never described why exactly I run virtual machines (VMs) on my laptop and why I want to make it easier for myself (and others).

Systems

Systems Building Cloud IT

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

Certification
