Data Engineering Digest: Top Data Engineer and Data Engineering Content for February, 2023

February, 2023

AWS Lambdas – Python vs Rust. Performance and Cost Savings.

Confessions of a Data Guy

FEBRUARY 26, 2023

Save money, save money!! Hear, hear! Someone on LinkedIn recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […]

AWS

AWS Python Data Engineering Data Engineer

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

FEBRUARY 28, 2023

Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily. In this blog post, we will take a closer look at Azure Databricks, its key features, […]

Big Data

Big Data Machine Learning Cloud Data Process

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

Trending Sources

Finding My Pathless Path

Simon Späti

FEBRUARY 25, 2023

As I sit down to write this article, I’m filled with a sense of vulnerability and excitement. You see, this is a story that only I can tell. It’s a tale of finding my Pathless Path and discovering who I am in the process. I have learned that some of my best decision-making comes from following my gut, heart, and intuition, a place of inner knowing.

Process

Process IT

The job market for new grads: worse than in 2008, but better than 2002

The Pragmatic Engineer

FEBRUARY 23, 2023

Originally published on 23 Feb 2023 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you're not yet a full subscriber, you missed the in-depth analysis this week: Are tech companies aggressively cutting back on vendor spend?

Software Engineer

Software Engineer Software Engineering Recruitment Portfolio

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance when it’s a new field that is evolving every day. 💡 This new webinar featuring Maher Hanafi, CTO of Betterworks, will explore a practical framework to transform Generative AI prototypes into

Data Collection

Docker for Data Science Cheat Sheet

KDnuggets

FEBRUARY 14, 2023

Docker is dependency management on steroids, helping to ensure both reproducibility and collaboration, making it an important tool for data science. Our latest cheat sheet serves as a handy Docker reference. Check it out now!

Data Science

Data Science Data Management IT

The Ultimate Guide to Java Virtual Threads

Rock the JVM

FEBRUARY 22, 2023

Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads (JEP 425) and structured concurrency (JEP 428).

Java

Java Programming Coding Scala

Data Types in Delta Lake + Spark. Join and Storage Performance.

Confessions of a Data Guy

FEBRUARY 10, 2023

Hmm … data types. We all know they are important, but we don’t take them very seriously. I mean, we know the difference between boolean, string, and integers; those are easy to get right. But we all get sloppy. Sometimes we go the string and varchar route because we don’t spend enough time on the […]

Data

Data Big Data Data Engineering Data Engineer

More Trending

30 Best Data Science Books to Read in 2023

Analytics Vidhya

FEBRUARY 28, 2023

Data science has taken over all economic sectors in recent times. To achieve maximum efficiency, every company strives to use various data at every stage of its operations. Each aspect of data science, like data preparation, the importance of big data, and the process of automation, contributes to how data science is the future […]

Data Science

Data Science Data Preparation Big Data Data

The evolution of Facebook’s iOS app architecture

Engineering at Meta

FEBRUARY 6, 2023

Facebook for iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012, it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time. After years of iteration, the Facebook codebase does not resemble a typical iOS codebase: It’s full of C++, Objective-C(++), and Swift.

Architecture

Architecture Coding Engineering Systems

ChatGPT for Coding: Unleash the Power of ChatGPT

Edureka

FEBRUARY 8, 2023

We are introduced to new discoveries and technologies every day, and among the most popular today are artificial intelligence (AI) and its tools. One of them is ChatGPT, a conversational AI model and powerful chatbot that answers follow-up questions and writes code for its users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.

Coding

Coding Deep Learning Programming Java

PySpark for Data Science

KDnuggets

FEBRUARY 27, 2023

In this tutorial, we will learn to initiate the Spark session, load and process the data, perform data analysis, and train a machine learning model.

Data Science

Data Science Machine Learning Data Analysis Data

Leading the Development of Profitable and Sustainable Products

Speaker: Jason Tanner

While growth of software-enabled solutions generates momentum, growth alone is not enough to ensure sustainability. The probability of success dramatically improves with early planning for profitability. A sustainable business model contains a system of interrelated choices made not once but over time. Join this webinar for an iterative approach to ensuring solution, economic and relationship sustainability.

Certification

Apache Kafka Beyond the Basics: Windowing

Confluent

FEBRUARY 8, 2023

Learn what windowing is, the difference between the four types of windows (hopping and tumbling, or session and sliding), and how to create them.

Kafka
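To make the tumbling versus hopping distinction in the teaser above concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API: the function names `tumbling_window` and `hopping_windows` are hypothetical, and the sketch assumes epoch-aligned windows with an inclusive start and exclusive end, measured in milliseconds. Session and sliding windows depend on the data itself and are not covered here.

```python
from typing import List, Tuple

# A window is a half-open interval [start, end) in milliseconds (assumption).
Window = Tuple[int, int]


def tumbling_window(ts_ms: int, size_ms: int) -> Window:
    """Return the single fixed-size, non-overlapping window containing ts_ms."""
    start = ts_ms - (ts_ms % size_ms)
    return (start, start + size_ms)


def hopping_windows(ts_ms: int, size_ms: int, advance_ms: int) -> List[Window]:
    """Return every fixed-size window containing ts_ms when a new window
    starts each advance_ms. Windows overlap when advance_ms < size_ms."""
    windows: List[Window] = []
    # Latest window start that is aligned to the advance and not after ts_ms.
    start = ts_ms - (ts_ms % advance_ms)
    # Walk backwards while the window [start, start + size_ms) still covers ts_ms.
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


if __name__ == "__main__":
    # One event at t = 7000 ms, 5-second windows.
    print(tumbling_window(7000, 5000))
    print(hopping_windows(7000, 5000, 2000))
```

An event lands in exactly one tumbling window but can land in several hopping windows; when the advance equals the window size, the hopping case reduces to the tumbling one.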

Ownership and Borrowing in Rust – Data Engineering Gold Mine.

Confessions of a Data Guy

FEBRUARY 7, 2023

As I started to use Rust on and off, more out of curiosity than anything, I discovered some specks of gold buried down in the depths. Some of the things I’m going to talk about, well … all of it, is probably fairly obvious to most Rust folk, but it’s enjoyable to learn what new […]

Data Engineering

Data Engineering Data Engineer Engineering Data

A Deep Dive into Data Replication: Most Effective Way to Protect Your Data

Analytics Vidhya

FEBRUARY 22, 2023

Data replication, also known as database replication, is the copying of data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information safe from disappearing or falling through the cracks. In most cases, data alters. It is constantly changing.

Database

Database Data NoSQL Datasets

Improving Meta’s global maps

Engineering at Meta

FEBRUARY 7, 2023

A lot has changed since the initial launch of our basemap in late 2020. We’re Meta now, but our mission remains the same: Giving people the power to build community and bring the world closer together. Across Meta, our family of applications (Facebook, Instagram, WhatsApp, among others) are using our basemap to connect people through functions like status updates, location sharing, and location-based searching.

Entertainment

Entertainment Transportation Data Schemas AWS

Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers & end-users alike are interacting with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?

Data

Pinterest is now on HTTP/3

Pinterest Engineering

FEBRUARY 23, 2023

Liang Ma | Software Engineer, Core Eng; Scott Beardsley | Engineering Manager, Traffic; Haowei Yuan | Software Engineer, Traffic

Figure 1: HTTP/3 at Pinterest

Now Pinterest operates on HTTP/3. We have enabled HTTP/3 for major Pinterest production domains on our multi-CDN edge network, and we’ve upgraded client apps’ network stack to support the new protocol.

Bytes

Bytes Media Software Engineer Software Engineering

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 2

KDnuggets

FEBRUARY 1, 2023

Can ChatGPT provide answers to data science questions to the same standard as humans? Check out this attempt to do so, and compare the answers to those from experts.

Data Science

Data Science Data

Dynamic vs. Static Consumer Membership in Apache Kafka

Confluent

FEBRUARY 15, 2023

There are two main consumer group memberships in Apache Kafka®. Here’s how static and dynamic consumer groups work, how they affect rebalancing, and which to choose for your application.

Kafka

Regulation: Hurdle or Driver for Data Analytics in Financial Services

Teradata

FEBRUARY 9, 2023

In the aftermath of the 2008 financial crash, service providers have been subject to increasing rules & requirements. To what extent has this climate held back advances in data analytics?

Data Analytics

Data Analytics Data

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Database

Getting Started with The Basics of Docker

Analytics Vidhya

FEBRUARY 3, 2023

“Let’s containerize your code to ship worldwide!” If you read the above quote, you must think, what does this all mean? Well, my friend, this is what Docker is. Let me explain it with an example. Say Harish and Lisa are two people working on the same project but on two different systems (say Windows and […]

Coding

Coding Project Systems IT

Announcing Ray support on Databricks and Apache Spark Clusters

databricks

FEBRUARY 27, 2023

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter.

Machine Learning

Machine Learning Python Engineering

SQL Streambuilder Data Transformations

Cloudera

FEBRUARY 21, 2023

SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL as a part of Cloudera Streaming Analytics, built on top of Apache Flink. It enables users to easily write, run, and manage real-time continuous SQL queries on stream data with a smooth user experience. Though SQL is a mature and well understood language for querying data, it is inherently a typed language.

SQL

SQL Kafka Raw Data Data

10 Free Machine Learning Courses from Top Universities

KDnuggets

FEBRUARY 2, 2023

Learn the basics of machine learning, including classification, SVM, decision tree learning, neural networks, convolutional neural networks, boosting, and K nearest neighbors.

Machine Learning

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Data Science

Combining CDC Transactional Messages Using Kafka Streams

Confluent

FEBRUARY 23, 2023

How to use Kafka Streams to aggregate change data capture (CDC) messages from a relational database into transactional messages, powering a scalable microservices architecture.

Kafka

Kafka Relational Database Architecture Database

Deploying Data Pipelines using the Saga pattern

Picnic Engineering

FEBRUARY 8, 2023

Delivering the right events at low latency and with a high volume is critical to Picnic’s system architecture. In our previous blog, Dima Kalashnikov explained how we configure our Internal services pipeline in the Analytics Platform. In this post, we will explain how our team automates the creation of new data pipeline deployments. The step towards automation was an important improvement for us, as the previous setup was manual, slow, and error-prone.

Data Pipeline

Data Pipeline Kafka Data Architecture

How to Normalize Relational Databases With SQL Code?

Analytics Vidhya

FEBRUARY 27, 2023

Data is the new oil in this century. The database is the major element of a data science project. To generate actionable insights, the database must be centralized and organized efficiently. If a corrupted, unorganized, or redundant database is used, the results of the analysis may become inconsistent and highly misleading. So, we are […]

Relational Database

Relational Database Database SQL Coding

Hodor: Overload scenarios and the evolution of their detection and handling

LinkedIn Engineering

FEBRUARY 23, 2023

Co-authors: Abhishek Gilra, Nizar Mankulangara, Salil Kanitkar, and Vivek Deshpande

To connect professionals and make them more productive, it is crucial that LinkedIn is available at all times. For us, downtime means that our members and customers don’t have access to the conversations, connections, and knowledge that are essential to them achieving their objectives.

Algorithm

Algorithm Java Designing Systems

How To Get Promoted In Product Management

Speaker: John Mansour

If you're looking to advance your career in product management, there are more options than just climbing the management ladder. Join our upcoming webinar to learn about highly rewarding career paths that don't involve management responsibilities. We'll cover both career tracks and provide tips on how to position yourself for success in the one that's right for you.

Management

Running a NixOS VM on macOS

Tweag

FEBRUARY 8, 2023

In this post I want to explore the current issues with developing parts of NixOS on macOS and how we can make this task easier. Why would I want to run a NixOS virtual machine on macOS? My colleague at Tweag, Dominic Steinitz, asked me this question after I shared my first minor achievement in this area, and it struck me that I have never described why exactly I run virtual machines (VMs) on my laptop and why I want to make it easier for myself (and others).

Systems

Systems Building Cloud IT

skops: a new library to improve scikit-learn in production

KDnuggets

FEBRUARY 1, 2023

There are various challenges in MLOps and model sharing, including security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

Apache Kafka with Control and Data Planes

Confluent

FEBRUARY 21, 2023

With the advent of service mesh and microservices, control and data planes have become popular. This post shows you how to ensure security and governance controls in your Kafka system.

Kafka

Kafka Government Data Systems

Best Data Science Companies for Data Scientists!

U-Next

FEBRUARY 26, 2023

Data Science is revolutionizing the business world, and it has opened up unique opportunities for businesses to grow. Businesses are now looking for Data Scientists to help them make a difference in their company’s performance and reach even further. Data Science companies started to emerge due to this need for new people who can help businesses solve problems through data analytics.

Data Science

Data Science Machine Learning Consulting Food

How Embedded Analytics Gets You to Market Faster with a SAAS Offering

Start-ups & SMBs launching products quickly must bundle dashboards, reports, & self-service analytics into apps. Customers expect rapid value from your product (time-to-value), data security, and access to advanced capabilities. Traditional Business Intelligence (BI) tools can provide valuable data analysis capabilities, but they have a barrier to entry that can stop small and midsize businesses from capitalizing on them.

Data Analysis
