Mon, Apr 22, 2024

How to test PySpark code with pytest

Start Data Engineering

1. Introduction
2. Ensure the code’s logic is working as expected with tests
  2.1. Test types for data pipelines
  2.2. pytest: A powerful Python library for testing
    2.2.1. Set context, run code, check results & clean up
    2.2.2. Tests are identified by their name
    2.2.3. Use fixture to create fake data for testing
    2.2.4. Define items to be shared among tests with conftest.

5 Free Stanford University Courses to Learn Data Science

KDnuggets

Are you an aspiring data scientist? If so, these free data science courses from Stanford will help you move forward in your data science journey!

Docker Fundamentals for Data Engineers

Start Data Engineering

1. Introduction
2. Docker concepts
  2.1. Define the OS and its configurations with an image
  2.2. Use the image to run containers
    2.2.1. Communicate between containers and local OS
    2.2.2. Start containers with docker CLI or compose
3. Conclusion

Docker can be overwhelming to start with. Most data projects use Docker to set up the data infra locally (and often in production).

Are we ready to put AI in the hands of business users? by Caitlin Salt

Scott Logic

Generative AI has been grabbing headlines, but many businesses are starting to feel left behind. Large-model AI is becoming increasingly influential in the market, and as the well-known tech giants introduce easy-access AI stacks, many businesses feel that although AI may have a use in their operations, they cannot see which use cases it could help them with.

Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers and end users alike interact with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?

Drawing a Blank? Understanding Drawing Alerts in ArcGIS Pro

ArcGIS

A drawing alert notification system was added in ArcGIS Pro 3.2 as a method for resolving drawing issues in your ArcGIS Pro projects.

Semantic Search with Vector Databases

KDnuggets

Leverage the latest technology to improve our search engine capabilities.

How to Stand Out and Safeguard Your Job in the Generative AI Era

KDnuggets

The secret recipe to excel in your career in AI.

How Striim Enhances Healthcare at Discovery Health with Real-Time Data

Striim

Discovery Health, originating in South Africa, has transcended borders to extend its services to over 40 million customers across more than 40 global markets, encompassing regions in Asia, EMEA, and the Americas. Since its inception in 1992, the company has remained steadfast in its core purpose: “to make people healthier and to enhance and protect their lives.” As a multifaceted financial services organization, Discovery Health operates in various sectors including healthcare, life

How to Install Python 3 on Ubuntu [Step-by-Step Guide]

Knowledge Hut

Anyone aspiring to be a data scientist, machine learning engineer, or software developer must have thought about learning Python. The popularity of this programming language has grown exponentially in the past ten years. Even those unfamiliar with coding have probably heard about it. According to Stack Overflow's 2021 developer survey, approximately 68% of software developers or data scientists who have worked on developing software using Python have expressed that they will continue doing so

Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm

Zalando Engineering

Our team is responsible for sending out communications to all our customers at Zalando: for example, confirming a placed order, informing about new content from a favourite brand, or announcing sales campaigns. While preparing those messages, and while sending them out via different service providers, we have to deal with limited resources.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

Magnite’s Seamless Petabyte Scale Cross-Region Migration with Snowgrid

Snowflake

Magnite stands as the largest independent sell-side advertising platform, providing an essential bridge between publishers and advertisers. At its core, Magnite streamlines the advertising process, facilitating the buying and selling of advertising space across various channels, including connected TV (CTV), mobile, and desktop environments. By leveraging advanced technology and data analytics, Magnite offers a comprehensive suite of tools and services designed to maximize ad revenue for publish

Web Performance Regression Detection (Part 1 of 3)

Pinterest Engineering

Michelle Vu | Web Performance Engineer

Detecting, preventing, and resolving performance regressions has been a standard at Pinterest for many years. Over the years, we have seen many examples showing significant business metric movements resulting from performance optimizations and regressions. These concrete examples motivate us to optimize and maintain performance.

Async APIs - don't confuse your events, commands and state by David Hope

Scott Logic

In my previous blog post, I looked at various technologies for sending data asynchronously between services, including RabbitMQ, Kafka, and AWS EventBridge. This time round, I’ll look at the messages themselves, which over the last few years I’ve found to be a more complex and nuanced topic than expected. To set the scene, see the diagram below of an imaginary financial trading application: there’s lots of data flying around, varying from real-time pricing data to instructions to execute trades.
