February, 2023

article thumbnail

AWS Lambdas – Python vs Rust. Performance and Cost Savings.

Confessions of a Data Guy

Save money, save money!! Hear Hear! Someone on Linkedin recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.

AWS 356
article thumbnail

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily. In this blog post, we will take a closer look at Azure Databricks, its key features, […] The post Azure Databricks: A Comprehensive Guide appeared first on Analytics Vidhya.

Big Data 310
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Finding My Pathless Path

Simon Späti

As I sit down to write this article, I’m filled with a sense of vulnerability and excitement. You see, this is a story that only I can tell. It’s a tale of finding my Pathless Path and discovering who I am in the process. I have learned that some of my best decision-making comes from following my gut, heart, and intuition, a place of inner knowing.

Process 289
article thumbnail

The job market for new grads: worse than in 2008, but better than 2002

The Pragmatic Engineer

Originally published on 23 Feb 2023 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you're not yet a full subscriber, you missed the in-depth analysis this week: Are tech companies aggressively cutting back on vendor spend?

article thumbnail

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enahance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

article thumbnail

Docker for Data Science Cheat Sheet

KDnuggets

Docker is dependency management on steroids, helping to ensure both reproducibility and collaboration, making it an important tool for data science. Our latest cheat sheet serves as a handy Docker reference. Check it out now!

article thumbnail

The Ultimate Guide to Java Virtual Threads

Rock the JVM

Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads ( JEP 425 ) and structured concurrency ( JEP 428 ).

Java 145

More Trending

article thumbnail

30 Best Data Science Books to Read in 2023

Analytics Vidhya

Introduction Data science has taken over all economic sectors in recent times. To achieve maximum efficiency, every company strives to use various data at every stage of its operations. Each aspect of data science, like data preparation, the importance of big data, and the process of automation, contributes to how data science is the future […] The post 30 Best Data Science Books to Read in 2023 appeared first on Analytics Vidhya.

article thumbnail

The evolution of Facebook’s iOS app architecture

Engineering at Meta

Facebook for iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012 , it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time. After years of iteration , the Facebook codebase does not resemble a typical iOS codebase: It’s full of C++, Objective-C(++), and Swift.

article thumbnail

ChatGPT for Coding: Unleash the Power of ChatGPT

Edureka

We are introduced to new discoveries and technologies every day, and one of the best and most popular inventions today is artificial intelligence (AI) and its tools. One of them is Chat GPT, a conversational model of AI that is a powerful chatbot that answers follow-up questions and writes code for the users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.

Coding 130
article thumbnail

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 2

KDnuggets

Can ChatGPT provide answers to data science questions to the same standard of humans? Check out this attempt to do so, and compare the answers to those from experts.

article thumbnail

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

article thumbnail

Apache Kafka Beyond the Basics: Windowing

Confluent

Learn what windowing is, the difference between the four types of windows (hopping and tumbling, or session and sliding), and how to create them.

Kafka 140
article thumbnail

Ownership and Borrowing in Rust – Data Engineering Gold Mine.

Confessions of a Data Guy

As I started to use Rust on and off, more out of curiosity than anything, I discovered some specs of gold buried down in the depths. Some of the things I’m going to talk about, well … all of it, is probably fairly obvious to most Rust folk, but it’s enjoyable to learn what new […] The post Ownership and Borrowing in Rust – Data Engineering Gold Mine. appeared first on Confessions of a Data Guy.

article thumbnail

A Deep Dive into Data Replication: Most Effective Way to Protect Your Data 

Analytics Vidhya

Introduction Data replication is also known as database replication, which is copying data to ensure that all information remains consistent across all data resources in real-time. data replication is like a safety net that keeps your information safe from disappearing or falling through the cracks. In most cases, data alters. It is constantly changing.

Database 269
article thumbnail

Improving Meta’s global maps

Engineering at Meta

A lot has changed since the initial launch of our basemap in late 2020. We’re Meta now, but our mission remains the same: Giving people the power to build community and bring the world closer together. Across Meta, our family of applications (Facebook, Instagram, WhatsApp, among others) are using our basemap to connect people through functions like status updates, location sharing, and location-based searching.

article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

Regulation: Hurdle or Driver for Data Analytics in Financial Services

Teradata

In the aftermath of the 2008 financial crash, service providers have been subject to increasing rules & requirements. To what extent has this climate held back advances in data analytics?

article thumbnail

PySpark for Data Science

KDnuggets

In this tutorial, we will learn to Initiates the Spark session, load, and process the data, perform data analysis, and train a machine learning model.

article thumbnail

Dynamic vs. Static Consumer Membership in Apache Kafka

Confluent

There are two main consumer group memberships in Apache Kafka®. Here’s how static and dynamic consumer groups work, how they affect rebalancing, and which to choose for your application.

Kafka 120
article thumbnail

Pinterest is now on HTTP/3

Pinterest Engineering

Liang Ma | Software Engineer, Core Eng; Scott Beardsley | Engineering Manager, Traffic; Haowei Yuan | Software Engineer, Traffic Figure 1 — HTTP/3 at Pinterest Now Pinterest operates on HTTP/3. We have enabled HTTP/3 for major Pinterest production domains on our multi-CDN edge network, and we’ve upgraded client apps’ network stack to support the new protocol.

Bytes 115
article thumbnail

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

article thumbnail

Getting Started with The Basics of Docker

Analytics Vidhya

Introduction “Let’s containerize your code to ship worldwide!” If you read the above quote, you must think, what does this all mean? Well, my friend, this is what Docker is. Let me explain it with an example. Say Harish and Lisa are two people working on the same project but on two different systems(say windows and […] The post Getting Started with The Basics of Docker appeared first on Analytics Vidhya.

Coding 257
article thumbnail

Announcing Ray support on Databricks and Apache Spark Clusters

databricks

Ray is a prominent compute framework for running scalable AI and Python workloads, offering a variety of distributed machine learning tools, large-scale hyperparameter.

article thumbnail

SQL Streambuilder Data Transformations

Cloudera

SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL as a part of Cloudera Streaming Analytics, built on top of Apache Flink. It enables users to easily write, run, and manage real-time continuous SQL queries on stream data and a smooth user experience. Though SQL is a mature and well understood language for querying data, it is inherently a typed language.

SQL 108
article thumbnail

10 Free Machine Learning Courses from Top Universities

KDnuggets

Learn the basics of machine learning, including classification, SVM, decision tree learning, neural networks, convolutional, neural networks, boosting, and K nearest neighbors.

article thumbnail

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

article thumbnail

Combining CDC Transactional Messages Using Kafka Streams

Confluent

How to use Kafka Streams to aggregate change data capture (CDC) messages from a relational database into transactional messages, powering a scalable microservices architecture.

Kafka 108
article thumbnail

Deploying Data Pipelines using the Saga pattern

Picnic Engineering

Delivering the right events at low latency and with a high volume is critical to Picnic’s system architecture. In our previous blog, Dima Kalashnikov explained how we configure our Internal services pipeline in the Analytics Platform. In this post, we will explain how our team automates the creation of new data pipeline deployments. The step towards automation was an important improvement for us, as the previous setup was manual, slow, and error-prone.

article thumbnail

How to Normalize Relational Databases With SQL Code?

Analytics Vidhya

Introduction Data is the new oil in this century. The database is the major element of a data science project. To generate actionable insights, the database must be centralized and organized efficiently. If a corrupted, unorganized, or redundant database is used, the results of the analysis may become inconsistent and highly misleading. So, we are […] The post How to Normalize Relational Databases With SQL Code?

article thumbnail

Hodor: Overload scenarios and the evolution of their detection and handling

LinkedIn Engineering

Co-Authors - Abhishek Gilra , Nizar Mankulangara , Salil Kanitkar , and Vivek Deshpande Introduction To connect professionals and make them more productive, it is crucial that LinkedIn is available at all times. For us, downtime means that our members and customers don’t have access to the conversations, connections, and knowledge that are essential to them achieving their objectives.

Algorithm 101
article thumbnail

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

article thumbnail

Best Data Science Companies for Data Scientists !

U-Next

Introduction Data Science is revolutionizing the business world, and it has opened up unique opportunities for businesses to grow. Businesses are now looking for Data Scientists to help them make a difference in their company’s performance and reach even further. Data Science companies started to emerge due to this need for new people who can help businesses solve problems through data analytics.

article thumbnail

skops: a new library to improve scikit-learn in production

KDnuggets

There are various challenges in MLOps and model sharing, including, security and reproducibility. To tackle these for scikit-learn models, we've developed a new open-source library: skops. In this article, I will walk you through how it works and how to use it with an end-to-end example.

IT 138
article thumbnail

Apache Kafka with Control and Data Planes

Confluent

With the advent of service mesh and microservices, control and data planes have become popular. This post shows you how to ensure security and governance controls in your Kafka system.

Kafka 104
article thumbnail

Running a NixOS VM on macOS

Tweag

In this post I want to explore the current issues with developing parts of NixOS on macOS and how we can make this task easier. Why would I want to run a NixOS virtual machine on macOS? My colleague at Tweag, Dominic Steinitz, asked me this question after I shared my first minor achievement in this area, and it struck me that I have never described why exactly I run virtual machines (VMs) on my laptop and why I want to make it easier for myself (and others).

Systems 101
article thumbnail

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.