Sat. Jan 14, 2023 - Fri. Jan 20, 2023

Replacing Pandas with Polars. A Practical Guide.

Confessions of a Data Guy

I remember those days, oh so long ago; it seems like another lifetime. I haven’t used Pandas in many a year, decades, or whatever. We’ve all been there, done that. Pandas, I mean. I dare say it’s a rite of passage for most data folk. For those using Python, it’s probably one of the […]

How To Hire Junior Data Engineers

Seattle Data Guy

With all the recent data events I have put together, I inevitably run into new data engineers who are either finishing up college or looking to transition into a data engineer or data scientist position. In fact, I have talked to several newly graduated engineers who are struggling to find work. A few told me…

What Big Tech layoffs suggest for the industry

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get the full issues twice a week, subscribe here. Update on 20 January: less than a day after this article was published, Google announced historic layoffs that will affect ~12,000 positions.

Data News — Week 23.03

Christophe Blefari

Summer is coming (credits). Hey, new Friday, new Data News edition. I'm so happy to see new people arriving every week. Thank you for every recommendation you make about the blog or the Data News. This kindness toward my content gives me wings. This week I don't want to be late, so let's start the weekly wrap-up. I was less inspired this week, which means a shorter edition.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.
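
To illustrate what "condensing" a graph means, here is a rough sketch, not Senzing's method. It assumes a naive, hypothetical match rule (nodes sharing a normalized email address are the same entity) and uses a union-find structure to merge those nodes before rewriting the edges.

```python
def find(parent: dict[str, str], x: str) -> str:
    """Return the representative of x's set, halving the path as we walk."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: dict[str, str], a: str, b: str) -> None:
    """Merge the sets of a and b; the (lexicographically) smaller id becomes the representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def resolve_graph(
    nodes: dict[str, dict[str, str]], edges: list[tuple[str, str]]
) -> set[tuple[str, str]]:
    """Collapse duplicate nodes and return the condensed, undirected edge set."""
    parent = {node_id: node_id for node_id in nodes}
    first_seen: dict[str, str] = {}
    for node_id, attrs in nodes.items():
        # Hypothetical match rule: a normalized email identifies one entity.
        key = attrs["email"].strip().lower()
        if key in first_seen:
            union(parent, node_id, first_seen[key])
        else:
            first_seen[key] = node_id
    condensed: set[tuple[str, str]] = set()
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:  # edges between merged duplicates become self-loops; drop them
            condensed.add((min(ra, rb), max(ra, rb)))
    return condensed


nodes = {
    "n1": {"email": "ann@example.com"},
    "n2": {"email": " ANN@example.com"},
    "n3": {"email": "bob@example.com"},
}
edges = [("n1", "n3"), ("n2", "n3"), ("n1", "n2")]
print(resolve_graph(nodes, edges))  # {('n1', 'n3')}
```

Three input edges condense to one because n1 and n2 resolve to the same entity. Real entity resolution weighs many features and fuzzy matches; the exact-key rule here is only a stand-in.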

ChatGPT as a Python Programming Assistant

KDnuggets

Is ChatGPT useful for Python programmers, specifically those of us who use Python for data processing, data cleaning, and building machine learning models? Let's give it a try and find out.

What Is The State Of Data Engineering And Infrastructure In 2023

Seattle Data Guy

2022 is coming to an end. What is the state of data infra? Are Snowflake and Databricks still fighting over total cost of ownership? Is everyone switching to DuckDB? Are data engineers all learning Rust? Let’s try to answer these questions. Our team is putting together an all-day event focused on helping answer some…

Data News — Week 23.02

Christophe Blefari

Abandoned Pandas (credits). Hey. I've had busy weeks, and I'm sorry the Data News is coming out on Saturday again. It's a bit hard to travel by train, work and write at the same time. Plus I'm a fast context switcher, so it piles up. Also, a few of you have sent me messages recently that I have not answered yet: I see you, and I have not forgotten you.

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 1

KDnuggets

Can ChatGPT answer data science questions to the same standard as humans? Check out this attempt to do so, and compare the answers to those from experts.

Why You Should Simplify Your Data Infrastructure

Seattle Data Guy

“Good Design Is Easier to Change Than Bad Design” (The Pragmatic Programmer). Programming is just one aspect of the difficulties of tech work for data engineers. Creating simple yet robust systems that help manage your data infrastructure is equally important. This challenge of building a simple yet robust data infrastructure remains even with no-code/low-code solutions.

Reducing Logging Cost by Two Orders of Magnitude using CLP

Uber Engineering

Uber’s Data team discusses how they used CLP to scale log ingestion, retention, and analytics for petabytes of Spark logs, reducing log storage and management costs by 169x.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Driving Data, Delivering Value: Data Leaders to Watch in 2023

Snowflake

The Chief Data Officer is arguably one of the most important roles at a company, particularly those that aspire to be data-driven. CDO appointments and the elevation of data leaders have accelerated in recent years, and the role has morphed as perceptions of data have evolved. Responsibilities span strategy and execution, people and processes, and the technology needed to deliver on the promise of data.

SQL and Data Integration: ETL and ELT

KDnuggets

In this article, we will discuss use cases and methods for using ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes along with SQL to integrate data from various sources.
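
To make the ETL/ELT distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in warehouse (an assumption; the article is not tied to SQLite). The table names and sample rows are hypothetical. ETL cleans rows in Python before loading them; ELT loads the raw rows first and cleans them with SQL inside the database.

```python
import sqlite3

# Hypothetical source rows: (order_id, customer, amount stored as text).
RAW_ORDERS = [
    (1, " Alice ", "19.99"),
    (2, "bob", "5.00"),
    (3, "ALICE", "10.01"),
]


def etl(conn: sqlite3.Connection) -> None:
    """Extract, Transform (in Python), Load: only cleaned rows reach the database."""
    cleaned = [(oid, name.strip().lower(), float(amt)) for oid, name, amt in RAW_ORDERS]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """Extract, Load the raw data as-is, then Transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


def totals(conn: sqlite3.Connection, table: str) -> list[tuple[str, float]]:
    """Per-customer totals; table must be a trusted name (identifiers cannot be bound as parameters)."""
    return conn.execute(
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(totals(conn, "orders_etl"))  # [('alice', 30.0), ('bob', 5.0)]
print(totals(conn, "orders_elt"))  # same result, transformed in SQL instead of Python
```

Both paths end with the same clean table; the difference is where the transformation runs. ELT keeps the raw copy in `raw_orders`, so the SQL transform can be rerun or revised later without re-extracting.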

Data Integrity Trends for 2023

Precisely

For most enterprises, 2022 was a year of transition, as companies struggled to figure out how to accomplish more with fewer resources. Technology helped to bridge the gap, as AI, machine learning, and data analytics drove smarter decisions, and automation paved the way for greater efficiency. Among the data integrity trends for 2023, agility tops the list of success factors for most firms, as business leaders focus on rapid time to value and on responding quickly to emerging opportunities…

Deduping and Storing Images at Uber Eats

Uber Engineering

Our engineers discuss how we dedupe and store millions of product images at Uber Eats using a content-addressable caching layer, which saves millions of image downloads every hour and ensures that every image is only stored once.
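
The core idea of content addressing is that an image's key is derived from its bytes, so identical images collapse to one stored copy. The in-memory sketch below is not Uber's implementation: the SHA-256 key, the URL-to-digest cache, and the injected `fetch` callable are all assumptions chosen for illustration.

```python
import hashlib
from typing import Callable


class ContentAddressableImageStore:
    """Toy in-memory store: images are keyed by the SHA-256 of their bytes."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                        # hypothetical downloader: URL -> image bytes
        self._blobs: dict[str, bytes] = {}         # digest -> bytes; each distinct image stored once
        self._url_to_digest: dict[str, str] = {}   # lets repeat URLs skip the download
        self.downloads = 0

    def put_url(self, url: str) -> str:
        """Return the content digest for url, downloading and storing only when needed."""
        if url in self._url_to_digest:
            return self._url_to_digest[url]        # cache hit: no download
        data = self._fetch(url)
        self.downloads += 1
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)       # identical bytes from another URL are not stored twice
        self._url_to_digest[url] = digest
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    @property
    def stored_count(self) -> int:
        return len(self._blobs)


fake_cdn = {"a.jpg": b"burger", "b.jpg": b"burger", "c.jpg": b"salad"}
store = ContentAddressableImageStore(fetch=fake_cdn.__getitem__)
for url in ["a.jpg", "b.jpg", "a.jpg", "c.jpg"]:
    store.put_url(url)
print(store.downloads, store.stored_count)  # 3 2
```

A URL seen before skips the download entirely, and two URLs serving identical bytes share one stored blob. A production cache would also need an invalidation policy for URLs whose content changes, which this sketch ignores.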

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

DevTernity conference 2022 by Robat Williams

Scott Logic

Late last year I had the chance to attend DevTernity, an all-remote generalist software development conference. The first day was the main conference day, with the second (optional) day offering a choice of workshops by some of the speakers. It was a great conference. In this post I’ll cover off some points of interest from some of the talks I chose to attend, and reflect on the remote conference experience.

Fast-track your next move with in-demand data skills

KDnuggets

DataCamp offers over 400 interactive courses, projects, and career tracks in the most popular data technologies such as Python, SQL, R, Power BI, and Tableau. Start today and save up to 67% on career-advancing learning.

What’s New With SQL User-Defined Functions

databricks

Since their initial release, SQL user-defined functions have become hugely popular among both Databricks Runtime and Databricks SQL customers. This simple yet powerful…

Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows

Uber Engineering

Our Data Workflow Platform team introduces WorkflowGuard: a new service to govern executions, prioritize resources, and manage the life cycle of repetitive data jobs. Check out how it improved workflow reliability and cost efficiency while bringing more observability to users.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

The Insurance Industry is Ready for a lot More Change

Teradata

The dwindling personal auto insurance market is a harbinger of a lot more change to come. Find out more.

How to Use Python and Machine Learning to Predict Football Match Winners

KDnuggets

We will learn web scraping and train supervised machine learning algorithms to predict winning teams.

Easy Ingestion to Lakehouse With COPY INTO

databricks

A new data management architecture known as the data lakehouse emerged independently across many organizations and use cases to support AI and BI.

Uber’s Next Gen Push Platform on gRPC

Uber Engineering

Uber’s API platform team talks about how they built their Next Generation Push Platform on gRPC which helped improve the reliability and latency of messages significantly.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

Why GxP is Vital for Cloud Control

Teradata

GxPs are a set of guidelines used to reduce risk when dealing with tech suppliers. But guidelines are not certification tests. Learn what to consider when assessing reliability in the cloud.

Top Posts January 9-15: Python Matplotlib Cheat Sheets

KDnuggets

Python Matplotlib Cheat Sheets • How to Select Rows and Columns in Pandas • 7 Best Platforms to Practice SQL • How to Perform Unit Testing in Python? • Google Data Analytics Certification Review.

New Built-in Functions for Databricks SQL

databricks

Built-in functions extend the power of SQL with specific transformations of values for common needs and use cases. For example, the LOG10 function.

MySQL to MyRocks Migration in Uber’s Distributed Datastores

Uber Engineering

Uber’s Storage Platform team talks about the massive strategic undertaking to migrate their distributed databases from MySQL to MyRocks, which significantly reduced storage usage. The blog details the migration process and the challenges faced.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Language Models, Explained: How GPT and Other Models Work

AltexSoft

In 2020, a remarkable AI took Silicon Valley by storm. Dubbed GPT-3 and developed by OpenAI in San Francisco, it was the latest and strongest of its kind: a “large language model” capable of producing fluent text after having ingested billions of words from books, articles, and websites. According to the paper “Language Models are Few-Shot Learners” by OpenAI, GPT-3 was so advanced that many individuals had difficulty distinguishing between news stories generated by the model and those written…

KDnuggets Top Posts for December 2022: 5 Python Projects for Data Science Portfolio

KDnuggets

3 Free Machine Learning Courses for Beginners • The Complete Machine Learning Study Roadmap • Markdown Cheat Sheet • Learn Data Science From These GitHub Repositories • 7 Essential Cheat Sheets for Data Engineering • Scikit-Learn Cheat Sheet for Machine Learning • 7 Super Cheat Sheets You Need To Ace Machine Learning Interview.

New! Diversity, equity, and inclusion analysis SpotApp helps businesses improve employee diversity

ThoughtSpot

Tech has a diversity problem. As a veteran People leader, I see and hear about it all the time: in the media, in the boardroom, and in my daily work. And yet, as much as our industry is known for solving large-scale problems and disrupting the status quo, improvement in this area doesn’t seem to be happening fast enough. Why not? When I look at companies leading our industry in DEI, there’s one thing that stands out: data.

Crane: Uber’s Next-Gen Infrastructure Stack

Uber Engineering

Uber’s infrastructure engineers take a deep dive into how they leverage infrastructure as code to manage hundreds of thousands of servers across multiple cloud and on-prem providers.

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground up, graduating from just getting started with data-driven development to operating…