Sat.Sep 23, 2023 - Fri.Sep 29, 2023

article thumbnail

Working at a Startup vs in Big Tech

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Willem Spruijt is a software engineer whom I worked on the same team with at Uber in Amsterdam, building payments systems.

article thumbnail

DuckDB + Delta Lake (the new lake house?)

Confessions of a Data Guy

I always leave it to my dear readers and followers to give me pokes in the right direction. Nothing like the teaming masses to set you straight. Recently I was working on my Substack Newsletter, on the topic of Polars + Delta Lake, reading remove files from s3 … I left a question open on […] The post DuckDB + Delta Lake (the new lake house?

Data 147
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Upgrade your Modern Data Stack

Christophe Blefari

Make your data stack take-off ( credits ) Hello, another edition of Data News. This week, we're going to take a step back and look at the current state of data platforms. What are the current trends and why are people fighting around the concept of the modern data stack. Early September is usually conference season. All over the world, people gather in huge venues to attend conferences.

article thumbnail

Arbitrary stateful processing in PySpark with applyInPandasWithState

Waitingforcode

It's always a huge pleasure to see the PySpark API covering more and more Scala API features. Starting from Apache Spark 3.4.0 you can even write arbitrary stateful processing jobs! But since the API is a little bit different than the one available on the Scala side, I wanted to take a deeper look.

Process 147
article thumbnail

Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers & end-users alike are interacting with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?

article thumbnail

Getting started with Airflow in 10 mins

Marc Lamberti

At the end of this introduction to Airflow, you will be all set for getting started with Airflow. You will start with the basics, such as what Airflow is and the essential concepts. Then you will set up and run your local development environment using the Astro CLI to create your first data pipeline. I hope you’re getting excited. Fasten your seatbelt, take a deep breath, and let’s go For a complete hands-on introduction to Apache Airflow, here is a 6-hour course at a discount.

article thumbnail

Top 7 Free Cloud Notebooks for Data Science

KDnuggets

Cloud notebooks are game-changers for data science, providing free access to computing, pre-built environments, collaboration features, and third-party integrations - everything you need to enhance your workflow.

More Trending

article thumbnail

Deploy Private LLMs using Databricks Model Serving

databricks

We are excited to announce public preview of GPU and LLM optimization support for Databricks Model Serving! With this launch, you can deploy.

article thumbnail

Airflow TaskGroup: All you need to know!

Marc Lamberti

An Airflow TaskGroup helps make a complex DAG easier to organize and read. Airflow taskgroups are meant to replace SubDAGs, the historical way of grouping your tasks. Indeed, SubDAGs are too complicated only for grouping tasks. They bring a lot of complexity as you must create a DAG in a DAG, import the SubDagOperator (which is a sensor), define the parameters correctly, and so on.

Coding 130
article thumbnail

Getting Started with PyTorch in 5 Steps

KDnuggets

This tutorial provides an in-depth introduction to machine learning using PyTorch and its high-level wrapper, PyTorch Lightning. The article covers essential steps from installation to advanced topics, offering a hands-on approach to building and training neural networks, and emphasizing the benefits of using Lightning.

article thumbnail

Data News — Week 23.38 (late)

Christophe Blefari

Early like my run ( credits ) Hey. This is a super late Data News, I wanted to send it earlier but I was travelling then enjoying time with friends and family. I'm still struggling a bit to write as fast as I would like, but 🤷‍♂️ So, sorry for the late edition and enjoy. Gen AI 🤖 Announcing Microsoft Copilot — Having everything under a common brand is great and Copilot is a great name.

Data 130
article thumbnail

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enahance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

article thumbnail

Old School: Adapting Esri Basemaps for Printed Products

ArcGIS

Esri basemaps are designed to be used at multiple scales, but a static map needs everything in one view. How doe we get around that?

Designing 121
article thumbnail

Fueling Data-Driven Decision-Making with Data Validation and Enrichment Processes

Precisely

77% of data and analytics professionals say data-driven decision-making is the top goal for their data programs. Data-driven decision-making and initiatives are certainly in demand, but their success hinges on … well, the data that supports them. More specifically, the quality and integrity of that data. It seems obvious enough, but checking that your data is up to the task and taking any necessary steps to improve and maintain its quality can be easier said than done.

article thumbnail

A Comparative Overview of the Top 10 Open Source Data Science Tools in 2023

KDnuggets

Are you looking for the open source tools to help you in your data science journey? Look no further. Discover these game-changers that will elevate your data-driven decisions.

article thumbnail

CISSP-ISSAP Certification Salary in 2023: Complete Earnings

Knowledge Hut

In the realm of cybersecurity, the pursuit of expertise is not only a passion but a necessity. As I embarked on the journey to strengthen my skills and broaden my horizons, the Certified Information Systems Security Professional (CISSP) credential emerged as a key qualification for professional recognition. However, achieving the CISSP-Information Systems Security Architecture Professional (ISSAP) specialization demanded more than just dedication—it necessitated guidance through the best CISSP t

article thumbnail

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

article thumbnail

Lessons from debugging a tricky direct memory leak

Pinterest Engineering

Sanchay Javeria | Software Engineer, Ads Data Infrastructure To support metrics reporting for ads from external advertisers and real-time ad budget calculations at Pinterest, we run streaming pipelines using Apache Flink. These jobs have guaranteed an overall 99th percentile availability to our users; however, every once in a while some tasks get hit with nasty direct out-of-memory (OOM) errors on multiple operators that look something like this: As is the case with most failures in a distribute

article thumbnail

Announcing the Public Preview of Lakeview Dashboards!

databricks

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new.

SQL 100
article thumbnail

5 Free Books to Help You Master Python

KDnuggets

From the basics of Python to clean architecture and more, here are five free books to level up your Python skills.

Python 149
article thumbnail

IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka

Cloudera

Organizations increasingly rely on streaming data sources not only to bring data into the enterprise but also to perform streaming analytics that accelerate the process of being able to get value from the data early in its lifecycle. As lakehouse architectures (including offerings from Cloudera and IBM) become the norm for data processing and building AI applications, a robust streaming service becomes a critical building block for modern data architectures.

Kafka 89
article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

Working with Esri Vector Basemaps in ArcGIS Pro

ArcGIS

Esri Vector Basemaps are available for use in ArcGIS Pro, and that opens up some new possibilities for you.

Designing 116
article thumbnail

easyJet bets on Databricks Lakehouse and Generative AI to be an Innovation Leader in Aviation

databricks

This blog is authored by Ben Dias, Director of Data Science and Analytics and Ioannis Mesionis, Lead Data Scientist at easyJet Introduction to.

article thumbnail

The Data Maturity Pyramid: From Reporting to a Proactive Intelligent Data Platform

KDnuggets

This article describes the data maturity pyramid and its various levels, from simple reporting to AI-ready data platforms. It emphasizes the importance of data for business and illustrates how data platforms serve as the driving force behind AI.

Data 112
article thumbnail

How Snowflake Native Apps Deliver Security for App Builders and Consumers

Snowflake

The Snowflake Native App Framework , which leverages Snowflake’s advanced architecture, allows for a new level of security for applications. This security spans not just the application consumer, but also the application providers. Controlling all software and infrastructure in the Snowflake Data Cloud, Snowflake can secure the application code to protect the intellectual property (IP) of builders.

Python 81
article thumbnail

How Embedded Analytics Gets You to Market Faster with a SAAS Offering

Start-ups & SMBs launching products quickly must bundle dashboards, reports, & self-service analytics into apps. Customers expect rapid value from your product (time-to-value), data security, and access to advanced capabilities. Traditional Business Intelligence (BI) tools can provide valuable data analysis capabilities, but they have a barrier to entry that can stop small and midsize businesses from capitalizing on them.

article thumbnail

Robinhood Partners with Operation HOPE in Support of the 1865 Project 

Robinhood

The initiative supports Operation HOPE’s mission to expand economic opportunity Robinhood Markets, Inc. is deepening our partnership with Operation HOPE through the 1865 Project , a new initiative that supports the organization’s mission to expand economic opportunity in underserved communities through financial education and empowerment. With our support, the 1865 project will allow Operation HOPE to further grow and scale its work across America and fuel innovative programs and technologies.

Project 76
article thumbnail

Governing cybersecurity data across multiple clouds and regions using Unity Catalog & Delta Sharing

databricks

According to a 2023 report from Enterprise Search Group, 85% of organizations indicated they deploy applications on two or more IaaS providers, attesting.

article thumbnail

Building a Convolutional Neural Network with PyTorch

KDnuggets

This blog post provides a tutorial on constructing a convolutional neural network for image classification in PyTorch, leveraging convolutional and pooling layers for feature extraction as well as fully-connected layers for prediction.

Building 110
article thumbnail

Molex Improves Data Sharing, Visibility, and Performance with the Snowflake Manufacturing Data Cloud

Snowflake

The unprecedented amount of volatility in supply and demand has caused economic uncertainty throughout the world. To navigate today’s challenging economy, manufacturers must digitize their supply chain and manufacturing processes. Digital advancements such as smart manufacturing and automation through AI, machine learning (ML), robotics, and IoT require a connected value chain ecosystem with a secure, scalable, and flexible data platform.

article thumbnail

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

article thumbnail

Combatting Spammers in Real Time: Unleashing the Power of AI

Confluent

Use Confluent connectors, stream governance and stream processing and artificial intelligence (AI) to block spammers, reduce network load, and improve your customers' experiences in real time.

article thumbnail

Ballard Power Systems RDU (Remote Diagnostics Unit) Visualization Platform for Interactive At-Scale Industrial IoT Streaming Analytics

databricks

This article represents a collaborative effort between Plotly, Ballard Power Systems, and Databricks. Fleets of buses worldwide run on hydrogen fuel cells made.

Systems 83
article thumbnail

Deploying Your First Machine Learning Model

KDnuggets

With just 3 simple steps, you can build & deploy a glass classification model faster than you can say.glass classification model!

article thumbnail

Marketing Success in the Age of AI: Celebrating EMEA’s Modern Marketing Data Stack Pioneers

Snowflake

Data is an invaluable asset in today’s marketing ecosystem. With its unique blend of cultures, economies, and regulatory environments, the EMEA market offers a nuanced picture of how marketers harness data technologies to understand their audiences, calibrate campaigns in real time, and adhere to complex government and industry regulations. In our second annual Snowflake Modern Marketing Data Stack 2023 report , we delve into actual usage and adoption of marketing technologies within the Snowfla

article thumbnail

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.