2023

article thumbnail

Replacing Pandas with Polars. A Practical Guide.

Confessions of a Data Guy

I remember those days, oh so long ago, it seems like another lifetime. I haven’t used Pandas in many a year, decades, or whatever. We’ve all been there, done that. Pandas I mean. I would dare say it’s a rite of passage for most data folk. For those using Python, it’s probably one of the […] The post Replacing Pandas with Polars.

Python 361
article thumbnail

AI is Eating Data Science

KDnuggets

When it's all said and done, and AI has been universally recognized as our rightful overlords, the idea of data science as a standalone field will have been but a blip on our collective radar.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

What is an Open Table Format? & Why to use one?

Start Data Engineering

1. Introduction 2. What is an Open Table Format (OTF) 3. Why use an Open Table Format (OTF) 3.0. Setup 3.1. Evolve data and partition schema without reprocessing 3.2. See previous point-in-time table state, aka time travel 3.3. Git like branches & tags for your tables 3.4. Handle multiple reads & writes concurrently 4. Conclusion 5. Further reading 6.

Data 323
article thumbnail

Scala as a Junior Developer

Rock the JVM

By Lucas Nouguier Hey everyone, Daniel here. Lucas’ story is shared by lots of beginner Scala developers, which is why I wanted to post it here on the blog. I’ve watched thousands of developers learn Scala from scratch, and, like Lucas, they love it! If you want to learn Scala well and fast, take a look at my Scala Essentials course at Rock the JVM.

Scala 142
article thumbnail

The AI Superhero Approach to Product Management

Speaker: Conrado Morlan

In this engaging and witty talk, we’ll explore how artificial intelligence can transform the daily tasks of product managers into streamlined, efficient processes. Using the lens of a superhero narrative, we’ll uncover how AI can be the ultimate sidekick, aiding in decision-making, enhancing productivity, and boosting innovation. Attendees will leave with practical tools and actionable insights, motivated to embrace AI and leverage its potential in their work. 🦸 🏢 Key objectives:

article thumbnail

Uniting the Machine Learning and Data Streaming Ecosystems - Part 1

Confluent

The future of data is real time and enriched by machine learning. How can we overcome socio-technical blockers and unite the ML and data streaming markets?

article thumbnail

AWS Lambdas – Python vs Rust. Performance and Cost Savings.

Confessions of a Data Guy

Save money, save money!! Hear Hear! Someone on Linkedin recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.

AWS 356

More Trending

article thumbnail

Drag, Drop, Analyze: The Rise of No-Code Data Science

KDnuggets

No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people.

article thumbnail

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

By Jose Fernandez , Ed Barker , Hank Jacobs Introduction In November 2022, we introduced a brand new tier —  Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Algorithm 139
article thumbnail

AMM Performance Testing Report

Ripple Engineering

Overview In the rippled 1.12.0 release, the AMM amendment stands out as a significant feature in both size and scope. Since September 2022, the RippleX performance team has collaborated closely with the engineering team responsible for the AMM feature implementation. This report presents a thorough overview of our testing approach, findings, and key takeaways.

AWS 144
article thumbnail

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Education 363
article thumbnail

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

article thumbnail

How Meta built the infrastructure for Threads

Engineering at Meta

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production launch had been under consideration for some time, the business finally made the decision and informed the infrastructure teams to prepare for its launch with only two days’ advance notice.

article thumbnail

The State of WebAssembly 2023 by Colin Eberhardt

Scott Logic

The State of WebAssembly 2023 survey has closed, the results are in … and they are fascinating! If you want the TL;DR; here are the highlights: Rust and JavaScript usage is continuing to increase, but some more notable changes are happening a little further down - with both Swift and Zig seeing a significant increase in adoption. When it comes to which languages developers ‘desire’, with Zig, Kotlin and C# we see that desirability exceeds current usage WebAssembly is still most often used for we

article thumbnail

Snowflake To Acquire Ponder, Boosting Python Capabilities In the Data Cloud

Snowflake

Python’s popularity has more than doubled in the past decade¹ and it is quickly becoming the preferred language for development across machine learning, application development, pipelines, and more. One of our goals at Snowflake is to ensure we continue to deliver a best-in-class platform for Python developers. Snowflake customers are already harnessing the power of Python through Snowpark , a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake.

Python 141
article thumbnail

Building a Kimball dimensional model with dbt

dbt Developer Hub

Dimensional modeling is one of many data modeling techniques that are used by data practitioners to organize and present data for analytics. Other data modeling techniques include Data Vault (DV), Third Normal Form (3NF), and One Big Table (OBT) to name a few. Data modeling techniques on a normalization vs denormalization scale While the relevancy of dimensional modeling has been debated by data practitioners , it is still one of the most widely adopted data modeling technique for analytics.

Building 145
article thumbnail

Entity Resolution: Your Guide to Deciding Whether to Build It or Buy It

Adding high-quality entity resolution capabilities to enterprise applications, services, data fabrics or data pipelines can be daunting and expensive. Organizations often invest millions of dollars and years of effort to achieve subpar results. This guide will walk you through the requirements and challenges of implementing entity resolution. By the end, you'll understand what to look for, the most common mistakes and pitfalls to avoid, and your options.

article thumbnail

The Ultimate Guide to Java Virtual Threads

Rock the JVM

Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads ( JEP 425 ) and structured concurrency ( JEP 428 ).

Java 145
article thumbnail

Announcing FawltyDeps - a dependency checker for your Python code

Tweag

It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker. In the least, it’s our hope to convince you that Tweag’s new dependency checker, FawltyDeps, can help you maintain an environment that is minimal and reproducible for your Python project, by ensuring that required dependencies are explicitly declared and detecting unused dependencies.

Python 144
article thumbnail

Patching the PostgreSQL JDBC Driver

Zalando Engineering

Introduction This blog post describes a recent contribution from Zalando to the Postgres JDBC driver to address a long-standing issue with the driver’s integration with Postgres’ logical replication that resulted in runaway Write-Ahead Log (WAL) growth. We will describe the issue, how it affected us at Zalando, and detail the fix made upstream in the JDBC driver that fixes the issue for Debezium and all other clients of the Postgres JDBC driver.

article thumbnail

A Comprehensive Guide to Convolutional Neural Networks

KDnuggets

Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike, work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.

article thumbnail

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Speaker: Maher Hanafi, VP of Engineering at Betterworks & Tony Karrer, CTO at Aggregage

Executive leaders and board members are pushing their teams to adopt Generative AI to gain a competitive edge, save money, and otherwise take advantage of the promise of this new era of artificial intelligence. There's no question that it is challenging to figure out where to focus and how to advance when it’s a new field that is evolving everyday. 💡 This new webinar featuring Maher Hanafi, VP of Engineering at Betterworks, will explore a practical framework to transform Generative AI pr

article thumbnail

Introducing Compute-Compute Separation for Real-Time Analytics

Rockset

Every database built for real-time analytics has a fundamental limitation. When you deconstruct the core database architecture, deep in the heart of it you will find a single component that is performing two distinct competing functions: real-time data ingestion and query serving. These two parts running on the same compute unit is what makes the database real-time: queries can reflect the effect of the new data that was just ingested.

article thumbnail

Confluent + Immerok: Cloud Native Kafka Meets Cloud Native Flink

Confluent

Introducing fully managed Apache Kafka® + Flink for the most robust, cloud-native data streaming platform with stream processing, integration, and streaming analytics in one.

Kafka 145
article thumbnail

A Tech Conference Listed Fake Speakers for Years: I Accidentally Noticed

The Pragmatic Engineer

For 3 years straight, the DevTernity conference listed non-existent Coinbase employees as featured speakers. When were they added and what could have the motivation been? Three featured speakers listed at DevTernity 2021, 2022 and 2023, and JDKon 2024. These people do not exist. A year ago, I spent months doing an investigative report on how UK events tech company Pollen had its staff work for free, as it had run out of money but still kept operating.

article thumbnail

Threads: The inside story of Meta’s newest social app

Engineering at Meta

Earlier this year, a small team of engineers at Meta started working on an idea for a new app. It would have all the features people expect from a text-based conversations app, but with one very key, distinctive goal – being an app that would allow people to share their content across multiple platforms. We wanted to build a decentralized (or federated) app that would enable people to post content that is viewable by anyone on other social apps, and vice versa.

Media 143
article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM

databricks

Two weeks ago, we released Dolly, a large language model (LLM) trained for less than $30 to exhibit ChatGPT-like human interactivity (aka instruction-following).

145
145
article thumbnail

Migrating Critical Traffic At Scale with No Downtime?—?Part 1

Netflix Tech

Migrating Critical Traffic At Scale with No Downtime — Part 1 Shyam Gala , Javier Fernandez-Ivern , Anup Rokkam Pratap , Devang Shah Hundreds of millions of customers tune into Netflix every day, expecting an uninterrupted and immersive streaming experience. Behind the scenes, a myriad of systems and services are involved in orchestrating the product experience.

Utilities 138
article thumbnail

How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers

LinkedIn Engineering

Think of how many times a day you use some type of search functionality across your devices and applications to discover information, find a contact, or a new job opportunity. The truth is we all depend on the ability to search for things online, and finding the right match to the information, organization, or to a job that maps to your skills and interests makes all the difference in our experiences and the knowledge we can gain.

IT 133
article thumbnail

Failure Mitigation for Microservices: An Intro to Aperture

DoorDash Engineering

When dealing with failures in a microservice system, localized mitigation mechanisms like load shedding and circuit breakers have always been used, but they may not be as effective as a more globalized approach. These localized mechanisms ( as demonstrated in a systematic study on the subject published at SoCC 2022 ) are useful in preventing individual services from being overloaded, but they are not very effective in dealing with complex failures that involve interactions between services, whic

Metadata 136
article thumbnail

Demystifying DAPs: A Practical Guide to Digital Adoption Success

Speaker: Pulkit Agrawal

Digital Adoption Platforms (DAPs) are revolutionizing the way organizations interact with and optimize their software applications. As digital transformation continues to accelerate, DAPs have become essential tools for enhancing user engagement and software efficiency. This session is your guide into the robust world of DAPs, exploring their origins, evolution, and the current trends shaping their development.

article thumbnail

opam-nix: Nixify Your OCaml Projects

Tweag

opam is a source-based package manager for OCaml. It is the de-facto standard for package management in the OCaml ecosystem. opam’s main package repository contains over 4000 individual packages, on average spanning 7 versions each. Like many other language-specific package managers (e.g. cargo, cabal, etc.), opam performs four main tasks: Download the sources.

Project 144
article thumbnail

SimulatedRides: How Lyft uses load testing to ensure reliable service during peak events

Lyft Engineering

Authors: Remco van Bree , Ben Radler Contributors : Alex Ilyenko , Ben Radler , Francisco Souza , Garrett Heel , Nathan Hsieh , Remco van Bree , Shu Zheng , Alex Hartwell , Brian Witt “Load testing in production is great.” We know what you’re thinking — testing in production is one of the cardinal sins of software development. However, at Lyft we have come to realize that load testing in production is a powerful tool to prepare systems for unexpected bursty traffic and peak events.

Coding 132
article thumbnail

AI: Large Language & Visual Models

KDnuggets

This article discusses the significance of large language and visual models in AI, their capabilities, potential synergies, challenges such as data bias, ethical considerations, and their impact on the market, highlighting their potential for advancing the field of artificial intelligence.

Data 159
article thumbnail

ChatGPT for Coding: Unleash the Power of ChatGPT

Edureka

We are introduced to new discoveries and technologies every day, and one of the best and most popular inventions today is artificial intelligence (AI) and its tools. One of them is Chat GPT, a conversational model of AI that is a powerful chatbot that answers follow-up questions and writes code for the users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.

Coding 130
article thumbnail

Deliver Mission Critical Insights in Real Time with Data & Analytics

In the fast-moving manufacturing sector, delivering mission-critical data insights to empower your end users or customers can be a challenge. Traditional BI tools can be cumbersome and difficult to integrate - but it doesn't have to be this way. Logi Symphony offers a powerful and user-friendly solution, allowing you to seamlessly embed self-service analytics, generative AI, data visualization, and pixel-perfect reporting directly into your applications.