The Path To Senior Engineer
Confessions of a Data Guy
MARCH 18, 2024
Want to know how to grow to the Senior Engineering position? Take a look. The post The Path To Senior Engineer appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 14, 2024
Most Software Engineers think of themselves as too smart. They think they are the best and brightest coder alive or that has ever lived. Doing so, they stunt themselves from becoming Senior Engineers and become hard to work with, the nightmare of the PR process. You don’t need to be the smartest person in the […] The post Don’t Be So Smart appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 14, 2024
When you’ve been data modeling as long as I have, it gets to be the same old … same old. People make data modeling harder than it has to be. There is a lot of jargon that gets thrown around … third-normal-form, OLAP, OLTP … I give you the 3-4 basics that are at the […] The post Data Modeling Is Easy appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 13, 2024
Unless you’ve been hiding under a rock, you’ve probably heard the hubbub over Devin, the new AI Software Engineer that is going to take your job. While this is a genius piece of marketing … it’s a bunch of crud. Never fear, you are in no more danger of losing your job in Software than when […] The post Is Devin Going To Take My Software Engineering Job?
Confessions of a Data Guy
MARCH 8, 2024
Recently an Architect at Databricks recommended people use Notebooks for Production workloads. Very bad and horrible idea. Very expensive compute for most people (All Purpose Clusters) and it leads to horrible development practices. It set off a firestorm on LinkedIn when I commented that people SHOULD NOT follow this advice. Read here and here. The post Never Put Databricks Notebooks in Production appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 8, 2024
I recently did a challenge. The results were clear. DuckDB CANNOT handle larger-than-memory datasets. OOM Errors. See link below for more details. … DuckDB vs Polars – Thunderdome. 16GB on 4GB machine Challenge. The post DuckDB has MAJOR Problems! OOM Errors. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 4, 2024
You probably think this is another internet clickbait title, huh? Just trying to get you to clickty clickty and sell you some Google Ads. Two problems. I don’t have Google Ads, and I know a small percentage of people will actually listen to this advice. Whatever. There is a reason some developers struggle to move […] The post The Best Piece of Software Engineering Advice appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
FEBRUARY 25, 2024
I’m not sure if others have this same problem, maybe they are lucky, they get to build in their favorite language 24/7, it’s their tool of choice. I feel like I have a great burden to bear, a heavy one. I love to write Rust … but I deploy Python. Even when I know I […] The post Why I Love Rust, but Deploy Python appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
FEBRUARY 21, 2024
I’m trying something new. I get a lot of questions from folks about getting into the Data Engineering space, how to get better, grow, learn, etc. So I came up with a solution. SQL Practice Problems. Some moons ago I wrote a Data Engineering Practice repo on GitHub for free, and some 1.2K stars later […] The post New SQL Practice Problems appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
FEBRUARY 17, 2024
There is a great evil Spirit that is haunting the streets of code in the land of programmers. It’s a Spirit of obfuscation and twisting things into what they are not. The Spirit wanders around on the loose looking for someone, and it finds ready victims among the ranks of new programmers and the innocent […] The post The Abstraction Problem – A Great Evil appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JANUARY 26, 2024
Well, I hate to break the news to you. I was the same when I first started, writing code that is. I was a zealot. I was zealous for every new thing I learned, every new language, every new approach, I would find the preacher who was preaching the message I wanted to hear … […] The post The Difficulties of Senior Engineer … are not Engineering appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JANUARY 16, 2024
Well, I finally got around to it. What, you say? Fine-tuning an LLM, that’s what. I mean all the cool kids are talking about it and carrying on like it’s the next thing. What can I say … I’m jaded. I’ve been working on ML systems for a good few years now, and I’ve seen the […] The post Engineering Lessons Learned from LLM Fine Tuning appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JANUARY 2, 2024
The post Polars vs Spark appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
DECEMBER 29, 2023
The post SQL Bad, Reddit Mad appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
DECEMBER 24, 2023
It’s true, even if you don’t want it to be. SparkSQL is destroying your data pipelines and possibly wreaking havoc on your entire data team, infrastructure, and life. In your heart of hearts, you’ve probably known it for years. With great power comes great responsibility. We all know that even us Data Engineers are human […] The post SparkSQL is Destroying your Pipelines appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
DECEMBER 21, 2023
Sometimes I just need something new and interesting to work on, to keep me engaged. A few days ago I was lying by the river next to a fire, with the cold air blowing on my face and the eagles soaring above. Thinking about and contemplating life and data engineering … something flitted across my […] The post Datafusion SQL CLI – Look Ma, I made a new ETL tool. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
NOVEMBER 25, 2023
Ok. Get off your high horse. You are human just like the rest of us. Just like your ancient ancestors who were throwing rocks and sticks at each other a thousand years ago … you are looking for a leg up on the competition. Isn’t that the world we live in? At the end of […] The post How to be Better Than Everyone Else appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
NOVEMBER 14, 2023
Show me the money. That’s what it’s all about. I have a question for you, to tickle your ears and mind. Get you out of that humdrum funk you are in. Here is my question, riddle me this all you hobbits. “Of what use is, and what good does the best and most advanced architecture […] The post Fleetclusters for Databricks + AWS to reduce Costs. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
OCTOBER 25, 2023
One thing all Data Engineers are doomed to do in purgatory will be to solve different date and datetime problems in an endless loop. I’m sure of it. I can’t imagine anything worse, so that must be it. Either way, the constant need to manipulate dates and datetimes is just a way of life, something […] The post Date and DateTime Manipulation in Polars appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
OCTOBER 6, 2023
Is there anything more Chad than Apache Airflow … and Rust? I think not, you wimp. What two things do I love most? At the moment Rust and Airflow are at least somewhere at the top of that list. I wring my hands sometimes, wishing that things and technologies somehow come together into some bubbling […] The post The Ultimate Data Engineering Chadstack.
Confessions of a Data Guy
OCTOBER 1, 2023
So perhaps you’re thinking it’s time to use Rust on your next project. You’ll find plenty of primers on how to get your feet wet in the language (and if you somehow made it this far without that much, The Book is that starting point), but maybe you’re feeling a bit lost amidst the seas […] The post Introduction to using Rust Libraries (cargo and crates) appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
SEPTEMBER 29, 2023
I always leave it to my dear readers and followers to give me pokes in the right direction. Nothing like the teeming masses to set you straight. Recently I was working on my Substack Newsletter, on the topic of Polars + Delta Lake, reading remote files from s3 … I left a question open on […] The post DuckDB + Delta Lake (the new lake house?
Confessions of a Data Guy
SEPTEMBER 9, 2023
In the vast world of data, it’s not just about gathering and analyzing information anymore; it’s also about ensuring that data pipelines, processes, and platforms run seamlessly and efficiently. Nothing screams “we are flying by night” like coming into a Data Team only to find no tests, no docs, no deployments, no Docker, no nothing. […] The post The Role of DevOps and CI/CD in Data Engineering appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
AUGUST 24, 2023
I still remember that day. A day that shall live on in infamy in my mind. Well over a decade ago, in the days when SQL Server roamed the land devouring souls on the Altar of Stored Procedures. There was only one tool available at the time. SQL. That’s it. There was one problem that […] The post The Case of the Mysterious Recursive CTE appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
AUGUST 11, 2023
Do you think I’m just trying to get you to click? Maybe. Maybe not. After working in and around Data Teams for well over a decade, with both the smartest people to touch the keyboard, and the others, it’s become quite clear to me what the number one skill that identifies a Senior level Engineering […] The post Senior Engineer – The Number One Skill appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
AUGUST 4, 2023
The post Introduction to Delta Lake appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
AUGUST 4, 2023
The post Introduction to AWS Lambda (deployment) appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JULY 22, 2023
Nothing gives me greater joy than rocking the boat. I take pleasure in finding what people love most in tech and trying to poke holes in it. Everything is sacred. Nothing is sacred. I also enjoy doing simple things, things that have a “real-life” feel to them. I suppose I could be like the others […] The post Polars vs Pandas. Inside an AWS Lambda. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JULY 7, 2023
Sometimes it seems like the Data Engineering landscape is starting to shoot off into infinity. With the rise of Rust, new tools like DuckDB, Polars, and whatever else, things do seem to be shifting at a fundamental level. It seems like there is someone at the base of a teetering rock with a crowbar, picking and […] The post Ballista (Rust) vs Apache Spark.
Confessions of a Data Guy
JUNE 28, 2023
I’ve been a dog licking my wounds for some time now. Over on my Substack newsletter, I’ve been doing a small series on DSA (Data Structures and Algorithms). I tackled some of the easier stuff first, like Linked Lists, Binary Search, and the like. What’s more, I actually did most of it in Rust, since […] The post Exploring Graphs in Rust.
Confessions of a Data Guy
JUNE 22, 2023
The post Conceptual Introduction to Delta Lake. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
JUNE 20, 2023
Sometimes I think Data Engineering is the same as it was 10+ years ago when I started doing it, and sometimes I think everything has changed. It’s probably both. In some ways, the underlying concepts have not moved an inch, some certain truths and axioms still rule over us all like some distant landlord, requiring […] The post Old Dog Learn New Tricks?
Confessions of a Data Guy
JUNE 8, 2023
One of my greatest pleasures in life is watching the r/dataengineering Reddit board, I find it very entertaining and enlightening on many levels. It gives a fairly unique view into the wide range of Data Engineering companies, jobs, projects people are working on, tech stacks, and problems that are being faced. One thing I’ve come […] The post 4 Ways To Setup Your Data Engineering Game. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MAY 7, 2023
Polars is one of those tools that you just want … no … NEED a reason to use it. It’s gotten so bad, I’ve started to use it in my Rust code on the side, Polars that is. I mean you have a problem if you could use Polars Python, and you find yourself using […] The post Polars – Laziness and SQL Context. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
APRIL 25, 2023
Anyone who’s been working in Data Land for any time at all, knows that the reality of life very rarely matches the glut of shiny snake oil we get sold on a daily basis. That’s just part of life. Every new tool, every single thingy-ma-bob we think is going to solve all our problems and […] The post Real Talk about Running Databricks + Delta Lake at Scale. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
APRIL 16, 2023
I was wondering the other day … since Polars now has a SQL context and is getting more popular by the day, do I need DuckDB anymore? These two tools are hot. Very hot. I haven’t seen this since Databricks and Snowflake first came out and started throwing mud at each other. You might think […] The post DuckDB vs Polars for Data Engineering. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
APRIL 15, 2023
PySpark. One of those things to hate and love, well … kinda hard not to love. PySpark is the abstraction that lets a bazillion Data Engineers forget about that blight Scala and cuddle their wonderfully soft and ever-kind Python code, while choking down gobs of data like some Harkonnen glutton. But, that comes with […] The post The Dog Days of PySpark appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
APRIL 6, 2023
The post QuickSort in Rust! appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 28, 2023
Real talk. Polars is all the rage. People love Spark. People use Spark for small data, but data is too big for Pandas. Spark runs on a local machine. Polars runs on a local machine. What do I choose, Spark or Polars? Does it matter? I’ve written about Polars at different points, here, and here […] The post Polars vs Spark. Real Talk. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 26, 2023
The post Introduction to Linked Lists. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 22, 2023
The post Future Proof Yourself Against AI. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 20, 2023
Are lambdas one of those tools that everyone uses and no one talks about? I guess I’ve taken them for granted over the years, even though they are incredibly useful. For a lot of my Data Engineering career I didn’t really think about or use AWS lambdas, I just saw them as little annoying flies […] The post AWS Lambdas. Useful for Data Engineering?
Confessions of a Data Guy
MARCH 11, 2023
The post 5 git Commands your Grandma uses. appeared first on Confessions of a Data Guy.
Confessions of a Data Guy
MARCH 7, 2023
The post Contributing to Open-Source. appeared first on Confessions of a Data Guy.