April, 2024

article thumbnail

Weekend maintenance kicks an Italian bank offline for days

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

Banking 199
article thumbnail

Data Analytics Suck! Worst Job Ever!

Confessions of a Data Guy

Being Data Analytics is a meat grinder, it’s the worst job ever. Horrible it is. It will crush you. The post Data Analytics Suck! Worst Job Ever! appeared first on Confessions of a Data Guy.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

How to test PySpark code with pytest

Start Data Engineering

1. Introduction 2. Ensure the code’s logic is working as expected with tests 2.1. Test types for data pipelines 2.2. pytest: A powerful Python library for testing 2.2.1. Set context, run code, check results & clean up 2.2.2. Tests are identified by their name 2.2.3. Use fixture to create fake data for testing 2.2.4. Define items to be shared among tests with conftest.

Coding 130
article thumbnail

10 Great Videos To Help You Learn Data Engineering

Seattle Data Guy

How data is structured, managed and processed will continue to grow in importance as the demand for AI and machine learning increase. It’s unavoidable that as businesses demand that their data teams implement AI, they will also realize that data engineers are a crucial piece of the data pipeline. That means, if you’re looking for… Read more The post 10 Great Videos To Help You Learn Data Engineering appeared first on Seattle Data Guy.

article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

Data News — Week 24.16

Christophe Blefari

easy ( credits ) Hey, new Friday, new Data News. This week, I feel like the selection is smaller than usual, so enjoy the links. I'm a bit late with the Recommendations emails, I'm sorry about that I got a few new leads as a freelancer I had to take in priority changing a bit my schedule. But don't worry it gonna be out soon. AI News 🤖 When do models get the same hype as 2007 iPhone release?

MySQL 130
article thumbnail

Stopping a Structured Streaming query

Waitingforcode

Streaming jobs are supposed to run continuously but it applies to the data processing logic. After all, sometimes you may need to release a new job package with upgraded dependencies or improved business logic. What happens then?

More Trending

article thumbnail

DuckDB Out Of Memory – Has it been fixed?

Confessions of a Data Guy

Back in March, I did a writeup and experiment called DuckDB vs Polars, Thunderdom, 16GB on 4GB machine challenge. The idea was to see if the two tools could process “larger than memory” datasets with lazy execution. Polars worked fine, DuckDB failed in spectacular fashion. I also noted how many people had opened issues in […] The post DuckDB Out Of Memory – Has it been fixed?

IT 140
article thumbnail

Docker Fundamentals for Data Engineers

Start Data Engineering

1. Introduction 2. Docker concepts 2.1. Define the OS and its configurations with an image 2.2. Use the image to run containers 2.2.1. Communicate between containers and local OS 2.2.2. Start containers with docker CLI or compose 3. Conclusion 1. Introduction Docker can be overwhelming to start with. Most data projects use Docker to set up the data infra locally (and often in production).

article thumbnail

Building Enterprise GenAI Apps with Meta Llama 3 on Databricks

databricks

We are excited to partner with Meta to release the latest state-of-the-art large language model, Meta Llama 3 , on Databricks. With Llama.

Building 133
article thumbnail

Data News — Week 24.15

Christophe Blefari

The fest we deserve ( credits ) I hope this Data News finds you well. In today's edition we have a large selection of links, I think you will enjoy it. But first I want to welcome all the new members joining this week after my new episode on DataGen with Robin Conquet. This is an episode in French and we talked mainly about the eventual end of the modern data stack.

BI 130
article thumbnail

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

article thumbnail

Data enrichment strategies in Apache Flink

Waitingforcode

Data enrichment is a crucial step in making data more usable by the business users. Doing that with a batch is relatively easy due to the static nature of the dataset. When it comes to streaming, the task is more challenging.

Datasets 130
article thumbnail

10 GitHub Repositories to Master Computer Science

KDnuggets

These GitHub repositories provide valuable resources for mastering computer science, including comprehensive roadmaps, free books and courses, tutorials, and hands-on coding exercises to help you gain the skills and knowledge necessary to thrive in the ever-evolving field of technology.

article thumbnail

Databricks Doubles Cost. Reddit Explodes. I’m in Trouble!

Confessions of a Data Guy

I recently did a post on Linkedin and Reddit about Databricks removing Standard Tier and forcing folks into Unity Catalog. The post got big traction and blew up, more than I thought. Enough for the Databricks folk to hunt me down at work and tell me I’m naughty. I will be writing a more in-depth […] The post Databricks Doubles Cost. Reddit Explodes.

Data 130
article thumbnail

Snowflake Startup Challenge 2024: Announcing the 10 Semi-Finalists

Snowflake

In 2020, Snowflake announced a new global competition to recognize the work of early-stage startups building their apps — and their businesses — on Snowflake, offering up to $250,000 in investment as the top prize. Four years later, the Snowflake Startup Challenge has grown into a premiere showcase for emerging startups, garnering interest from companies in over 100 countries and offering a prize package featuring a portion of up to $1 million in potential investment opportunities and exclusive

article thumbnail

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

article thumbnail

Career Opportunities in Software Engineering

Knowledge Hut

Software engineering is a rapidly growing field with vast career opportunities. Software career path offers diverse options, from developing mobile applications and games to creating sophisticated software systems that power businesses and industries. With emerging technologies like AI, machine learning, and blockchain, the demand for software engineers has skyrocketed.

article thumbnail

Data News — Week 24.14

Christophe Blefari

Lost between ideas ( credits ) Hey, new Data News edition. I hope you will enjoy this week selection after skipping last week one. I was a bit overwhelmed with the amount of tasks I had on the desk—and I'm still. But here we are. Before jumping to the news, I want to let you know that I have improved the Recommendations page and the weekly emails with the recommendation should arrive soon.

SQL 130
article thumbnail

Rolling history logs in Spark History UI

Waitingforcode

Stream processing is great but it brings some gotchas that are not obvious. Logs are one of them.

Process 130
article thumbnail

10 GitHub Repositories to Master Python

KDnuggets

Learn Python through tutorials, blogs, books, project work, and exercises. Access all of it on GitHub for free and join a supportive open-source community.

Python 147
article thumbnail

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

article thumbnail

Bringing MegaBlocks to Databricks

databricks

At Databricks, we’re committed to building the most efficient and performant training tools for large-scale AI models. With the recent release of DBRX.

Building 123
article thumbnail

Monte Carlo Releases Mastering Data Quality And Your ABCs, World’s First-Ever Children’s Book on Data Quality

Monte Carlo

Good Night Moon. Where The Wild Things Are. The Cat in the Hat. And now, from the mind of Barr Moses, comes the historic next children’s literary classic: Mastering Data Quality And Your ABCs. A follow up to 2022’s Data Quality Fundamentals: A Practical Guide to Building Reliable Data Pipelines published by O’Reilly Media , Mastering Data Quality And Your ABCs educates the next generation of data and AI engineers about the importance of highly reliable data.

article thumbnail

How To Run Your Python Scripts

Knowledge Hut

If you are planning to enter the world of Python programming, the first and the most essential skill you should learn is knowing how to run Python script and code. Once you grab a seat in the show, it will be easier for you to understand whether the code will actually work or not. To learn more about sys.argv command line argument, click here. Python, being one of the leading programming languages , has a relatively easy syntax which makes it even easier for the ones who are in their initial sta

Python 98
article thumbnail

Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease

Snowflake

In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. To deliver on these goals, developers must have the ability to manipulate and analyze information efficiently. Yet while SQL applications have long served as the gateway to access and manage data, Python has become the language of choice for most data teams, creating a disconnect.

article thumbnail

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

article thumbnail

High resolution data updates to Living Atlas World Elevation Layers (April 2024)

ArcGIS

In April 2024, elevation layers have been updated with high-res datasets of Wales, New Zealand & German states of Bavaria, Saxony and Brandenburg

Datasets 105
article thumbnail

The Psychology of Data Visualization: How to Present Data that Persuades

KDnuggets

This article discusses the psychology of data visualization, including the principles and techniques that underpin the creation of persuasive and effective visuals.

Data 145
article thumbnail

Announcing General Availability of Ray on Databricks

databricks

We released Ray support public preview last year and since then, hundreds of Databricks customers have been using it for variety of use.

IT 111
article thumbnail

A Look Back at the Gartner Data and Analytics Summit

Cloudera

Artificial intelligence (AI) is something that, by its very nature, can be surrounded by a sea of skepticism but also excitement and optimism when it comes to harnessing its power. With the arrival of the latest AI-powered technologies like large language models (LLMs) and generative AI (GenAI), there’s a vast amount of opportunities for innovation, growth, and improved business outcomes right around the corner.

article thumbnail

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.

article thumbnail

SAFe® Agilist Certification Vs PMI-ACP: Which One to Choose?

Knowledge Hut

The competition for jobs is getting tough in today’s world. Whether you are a job seeker, corporate employee, or a consultant, you should keep your skills up to date in a fast-paced, online world. Agile has become the standard of project management very fast in today’s world, specifically in the IT and service field. Most of the project management professionals have adopted Agile techniques, tools, and concepts to deliver the projects successfully that has never been seen before.

article thumbnail

Snowflake Ventures Invests in Coalesce to Enable Simplified Data Transformation Development and Management Natively on the Data Cloud

Snowflake

Data transformation is the process of converting data from one format to another, the “T” in ELT, or extract, load, transform, which enables organizations to get their data analytics-ready and derive insights and value from it. As companies collect more data, from disparate sources and in disparate formats, building and managing transformations has become exponentially more complex and time-consuming.

Cloud 103
article thumbnail

May I Borrow That Idea? – Pasting Feature Layer Properties

ArcGIS

Starting with ArcGIS Pro 3.2, you can copy layer properties from one feature layer and paste them to another.

126
126
article thumbnail

Ultimate Collection of 50 Free Courses for Mastering Data Science

KDnuggets

The collection includes free courses on Python, SQL, Data Analytics, Business Intelligence, Data Engineering, Machine Learning, Deep Learning, Generative AI, and MLOps.

article thumbnail

Driving Business Impact for PMs

Speaker: Jon Harmer, Product Manager for Google Cloud

Move from feature factory to customer outcomes and drive impact in your business! This session will provide you with a comprehensive set of tools to help you develop impactful products by shifting from output-based thinking to outcome-based thinking. You will deepen your understanding of your customers and their needs as well as identifying and de-risking the different kinds of hypotheses built into your roadmap.