2023

article thumbnail

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Project 363
article thumbnail

What is an Open Table Format? & Why to use one?

Start Data Engineering

1. Introduction 2. What is an Open Table Format (OTF) 3. Why use an Open Table Format (OTF) 3.0. Setup 3.1. Evolve data and partition schema without reprocessing 3.2. See previous point-in-time table state, aka time travel 3.3. Git like branches & tags for your tables 3.4. Handle multiple reads & writes concurrently 4. Conclusion 5. Further reading 6.

Data 322
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Replacing Pandas with Polars. A Practical Guide.

Confessions of a Data Guy

I remember those days, oh so long ago, it seems like another lifetime. I haven’t used Pandas in many a year, decades, or whatever. We’ve all been there, done that. Pandas I mean. I would dare say it’s a rite of passage for most data folk. For those using Python, it’s probably one of the […] The post Replacing Pandas with Polars.

Python 361
article thumbnail

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily. In this blog post, we will take a closer look at Azure Databricks, its key features, […] The post Azure Databricks: A Comprehensive Guide appeared first on Analytics Vidhya.

Big Data 311
article thumbnail

LLMs in Production: Tooling, Process, and Team Structure

Speaker: Dr. Greg Loughnane and Chris Alexiuk

Technology professionals developing generative AI applications are finding that there are big leaps from POCs and MVPs to production-ready applications. They're often developing using prompting, Retrieval Augmented Generation (RAG), and fine-tuning (up to and including Reinforcement Learning with Human Feedback (RLHF)), typically in that order. However, during development – and even more so once deployed to production – best practices for operating and improving generative AI applications are le

article thumbnail

The State of WebAssembly 2023 by Colin Eberhardt

Scott Logic

The State of WebAssembly 2023 survey has closed, the results are in … and they are fascinating! If you want the TL;DR; here are the highlights: Rust and JavaScript usage is continuing to increase, but some more notable changes are happening a little further down - with both Swift and Zig seeing a significant increase in adoption. When it comes to which languages developers ‘desire’, with Zig, Kotlin and C# we see that desirability exceeds current usage WebAssembly is still most often used for we

Coding 145

More Trending

article thumbnail

AMM Performance Testing Report

Ripple Engineering

Overview In the rippled 1.12.0 release, the AMM amendment stands out as a significant feature in both size and scope. Since September 2022, the RippleX performance team has collaborated closely with the engineering team responsible for the AMM feature implementation. This report presents a thorough overview of our testing approach, findings, and key takeaways.

AWS 144
article thumbnail

Snowflake To Acquire Ponder, Boosting Python Capabilities In the Data Cloud

Snowflake

Python’s popularity has more than doubled in the past decade¹ and it is quickly becoming the preferred language for development across machine learning, application development, pipelines, and more. One of our goals at Snowflake is to ensure we continue to deliver a best-in-class platform for Python developers. Snowflake customers are already harnessing the power of Python through Snowpark , a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake.

Python 141
article thumbnail

Scala as a Junior Developer

Rock the JVM

By Lucas Nouguier Hey everyone, Daniel here. Lucas’ story is shared by lots of beginner Scala developers, which is why I wanted to post it here on the blog. I’ve watched thousands of developers learn Scala from scratch, and, like Lucas, they love it! If you want to learn Scala well and fast, take a look at my Scala Essentials course at Rock the JVM.

Scala 142
article thumbnail

Creating a bespoke LLM for AI-generated documentation

databricks

We recently announced our AI-generated documentation feature, which uses large language models (LLMs) to automatically generate documentation for tables and columns in Unity.

article thumbnail

The Definitive Entity Resolution Buyer’s Guide

Are you thinking of adding enhanced data matching and relationship detection to your product or service? Do you need to know more about what to look for when assessing your options? The Senzing Entity Resolution Buyer’s Guide gives you step-by-step details about everything you should consider when evaluating entity resolution technologies. You’ll learn about use cases, technology and deployment options, top ten evaluation criteria and more.

article thumbnail

Learn Probability in Computer Science with Stanford University for FREE

KDnuggets

Probability is one of the foundational elements of computer science. Some bootcamps will skim over the topic, however, it is integral to your computer science knowledge.

article thumbnail

The Roots of Today's Modern Backend Engineering Practices

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover thee out of nine topics from today’s subscriber-only issue: The Past and Future of Modern Backend Practices.

article thumbnail

Threads: The inside story of Meta’s newest social app

Engineering at Meta

Earlier this year, a small team of engineers at Meta started working on an idea for a new app. It would have all the features people expect from a text-based conversations app, but with one very key, distinctive goal – being an app that would allow people to share their content across multiple platforms. We wanted to build a decentralized (or federated) app that would enable people to post content that is viewable by anyone on other social apps, and vice versa.

article thumbnail

AWS Lambdas – Python vs Rust. Performance and Cost Savings.

Confessions of a Data Guy

Save money, save money!! Hear Hear! Someone on Linkedin recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.

Python 356
article thumbnail

Migrating Netflix to GraphQL Safely

Netflix Tech

By Jennifer Shin , Tejas Shikhare , Will Emmanuel In 2022, a major change was made to Netflix’s iOS and Android applications. We migrated Netflix’s mobile apps to GraphQL with zero downtime, which involved a total overhaul from the client to the API layer. Until recently, an internal API framework, Falcor , powered our mobile apps. They are now backed by Federated GraphQL , a distributed approach to APIs where domain teams can independently manage and own specific sections of the API.

article thumbnail

Building a Kimball dimensional model with dbt

dbt Developer Hub

Dimensional modeling is one of many data modeling techniques that are used by data practitioners to organize and present data for analytics. Other data modeling techniques include Data Vault (DV), Third Normal Form (3NF), and One Big Table (OBT) to name a few. Data modeling techniques on a normalization vs denormalization scale While the relevancy of dimensional modeling has been debated by data practitioners , it is still one of the most widely adopted data modeling technique for analytics.

Building 145
article thumbnail

How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers

LinkedIn Engineering

Think of how many times a day you use some type of search functionality across your devices and applications to discover information, find a contact, or a new job opportunity. The truth is we all depend on the ability to search for things online, and finding the right match to the information, organization, or to a job that maps to your skills and interests makes all the difference in our experiences and the knowledge we can gain.

IT 134
article thumbnail

Introducing Compute-Compute Separation for Real-Time Analytics

Rockset

Every database built for real-time analytics has a fundamental limitation. When you deconstruct the core database architecture, deep in the heart of it you will find a single component that is performing two distinct competing functions: real-time data ingestion and query serving. These two parts running on the same compute unit is what makes the database real-time: queries can reflect the effect of the new data that was just ingested.

article thumbnail

Confluent + Immerok: Cloud Native Kafka Meets Cloud Native Flink

Confluent

Introducing fully managed Apache Kafka® + Flink for the most robust, cloud-native data streaming platform with stream processing, integration, and streaming analytics in one.

Kafka 145
article thumbnail

The Ultimate Guide to Java Virtual Threads

Rock the JVM

Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads ( JEP 425 ) and structured concurrency ( JEP 428 ).

Java 145
article thumbnail

Databricks + Arcion: Real-time enterprise data replication to the Lakehouse

databricks

We are excited to announce that we have completed our acquisition of Arcion, a leading provider for real-time data replication technologies. Arcion’s capabilities w.

Data 135
article thumbnail

A Comprehensive Guide to Convolutional Neural Networks

KDnuggets

Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike, work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.

article thumbnail

How Microsoft does Quality Assurance (QA)

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of seven topics from today’s subscriber-only issue on How Big Tech does QA. To get full issues twice a week, subscribe here.

article thumbnail

Use Data Enrichment to Supercharge AI

Precisely

AI transforms how we interact with technology, make decisions, and solve complex problems. It has been at the heart of many innovations over the past two years, powering everything from the chatbots that enhance our customer experiences to the predictive analytics engines that help us make financial decisions. What defines a successful AI initiative, and how can your organization ensure that your investments and hard work deliver maximum value for your organization?

Data 118
article thumbnail

A Tour Around Buck2, Meta's New Build System

Tweag

Meta recently announced they have made Buck2 open-source. Buck2 is a from-scratch rewrite of Buck , a polyglot, monorepo build system that was developed and used at Meta (Facebook), and shares a few similarities with Bazel. As you may know, the Scalable Builds Group at Tweag has a strong interest in such scalable build systems. We were thrilled to have the opportunity to work with Meta on Buck2 to help make the tool useful and successful in the open-source use case.

Systems 134
article thumbnail

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

By Jose Fernandez , Ed Barker , Hank Jacobs Introduction In November 2022, we introduced a brand new tier —  Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Kafka 139
article thumbnail

Build faster with Buck2: Our open source build system

Engineering at Meta

Buck2, our new open source, large-scale build system , is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast as Buck1. Buck2, Meta’s open source large-scale build system, is now publicly available via the Buck2 website and the Buck2 GitHub repository.

Building 135
article thumbnail

New Approaches For Detecting AI-Generated Profile Photos

LinkedIn Engineering

Co-authors: Shivansh Mundra , Gonzalo Aniano Porcile , Smit Marvaniya , Hany Farid A core part of what we do on the Trust Data Team at LinkedIn is create, deploy, and maintain models that detect and prevent many types of abuse. This spans the detection and prevention of fake accounts, account takeovers, and policy-violating content. We are constantly working to improve and increase the effectiveness of our anti-abuse defenses to protect the experiences of our members and customers.

article thumbnail

Snowpark ML: The ‘Easy Button’ for Open Source LLM Deployment in Snowflake

Snowflake

Companies want to train and use large language models (LLMs) with their own proprietary data. Open source generative models such as Meta’s Llama 2 are pivotal in making that possible. The next hurdle is finding a platform to harness the power of LLMs. Snowflake lets you apply near-magical generative AI transformations to your data all in Python, with the protection of its out-of-the-box governance and security features.

Python 127
article thumbnail

A Deep Dive Into Sending With librdkafka

Confluent

Learn how to write code that produces messages via librdkafka, how it will behave during error situations, and how your application should detect and respond to them.

Coding 119
article thumbnail

Failure Mitigation for Microservices: An Intro to Aperture

DoorDash Engineering

When dealing with failures in a microservice system, localized mitigation mechanisms like load shedding and circuit breakers have always been used, but they may not be as effective as a more globalized approach. These localized mechanisms ( as demonstrated in a systematic study on the subject published at SoCC 2022 ) are useful in preventing individual services from being overloaded, but they are not very effective in dealing with complex failures that involve interactions between services, whic

Java 134
article thumbnail

Data Intelligence Platforms

databricks

The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives, from the.

Data 143
article thumbnail

Drag, Drop, Analyze: The Rise of No-Code Data Science

KDnuggets

No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people.