Sat. May 27, 2023 - Fri. Jun 02, 2023

What's new in Apache Spark 3.4.0 - Structured Streaming

Waitingforcode

The asynchronous progress tracking and correctness issue fixes presented in the previous blog posts are not the only new features in Apache Spark Structured Streaming 3.4.0. There are many others, but to keep the blog post readable, I'll focus here on only 3 of them.

Data News — Week 23.21

Christophe Blefari

Hey, I've been sick for the last 3 days and it was impossible to write anything. As I still want to send something, here's a raw edition with no comments. See you on Friday.

Gen AI 🤖

QLoRA: Efficient Finetuning of Quantized LLMs (a 65B-parameter model on a single 48GB GPU, reaching 99.3% of ChatGPT's performance level on the Vicuna benchmark).
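The "65B parameters on a single 48GB GPU" claim is easier to see with back-of-the-envelope arithmetic. QLoRA keeps the frozen base model in 4-bit precision (the paper's 4-bit NormalFloat type), so the weights take roughly half a byte per parameter. The sketch below counts weight storage only, in decimal gigabytes; it ignores activations, the LoRA adapters and their optimizer state, and quantization constants, so treat it as a lower bound rather than a measured figure. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in decimal GB (1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


PARAMS_65B = 65e9  # parameter count from the QLoRA headline

fp16 = weight_memory_gb(PARAMS_65B, 16)  # unquantized 16-bit weights
nf4 = weight_memory_gb(PARAMS_65B, 4)    # 4-bit quantized weights

print(f"16-bit weights: {fp16:.1f} GB")  # 130.0 GB
print(f"4-bit weights:  {nf4:.1f} GB")   # 32.5 GB
print("4-bit weights fit in 48 GB:", nf4 < 48)  # True
```

At 16 bits the weights alone would need several GPUs; at 4 bits they leave roughly 15 GB of headroom on a 48GB card for everything the sketch leaves out.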

A Roadmap To Bootstrapping The Data Team At Your Startup

Data Engineering Podcast

Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Bard for Data Science Cheat Sheet

KDnuggets

Check out our latest cheat sheet to get you up to speed and provide a handy reference for using Google's LLM chat tool Bard for data science.

Startup Spotlight: Making Snowflake Queries Smarter and Cheaper with Sundeck 

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we highlight the people and companies building businesses on Snowflake. In this Q&A series, Jacques Nadeau, Co-Founder and CEO of Sundeck and co-creator of Apache Arrow, talks about what inspires him to make powerful data tools available to all, how Sundeck’s query engineering platform can help Snowflake users, and why they “eat, sleep, and drink” Snowflake every day at Sundeck.

More Trending

Easy Ingestion to Lakehouse with File Upload and Add Data UI

databricks

Data ingestion into the Lakehouse can be a bottleneck for many organizations, but with Databricks, you can quickly and easily ingest data of.

Top 10 Tools for Detecting ChatGPT, GPT-4, Bard, and Claude

KDnuggets

Top free tools for detecting theses, research papers, assignments, documentation, and blogs generated by AI models.

How DoorDash uses XcodeGen to eliminate project merge conflicts

DoorDash Engineering

At DoorDash, we work to implement efficient processes that can mitigate common conflicts within a large iOS development team. Part of those efforts involves using XcodeGen, a command-line interface (CLI) tool, to reduce merge conflicts within our various iOS teams. Here we will discuss how we implemented it to manage the intricate business scenarios and demanding requirements of the Dasher app, which lets our drivers receive, pick up, and securely deliver orders to customers.

My thoughts on Chat - GPT (controversial but true)

Medium Data Engineering

By all mercy, I don't want to be a victim of Roko's basilisk, the thought experiment that is to be credited for the infamous wedding of…

Welcoming bit.io to Databricks: Investing in the Developer Experience

databricks

We are excited to announce that bit.io is joining Databricks. At Databricks, we’ve always been focused on empowering organizations to solve their toughest p.

KDnuggets Top Posts for March 2023: AutoGPT: Everything You Need To Know

KDnuggets

AutoGPT: Everything You Need To Know • Top 19 Skills You Need to Know in 2023 to Be a Data Scientist • 8 Open-Source Alternative to ChatGPT and Bard • LangChain 101: Build Your Own GPT-Powered Applications • 10 Websites to Get Amazing Data for Data Science Projects • Baby AGI: The Birth of a Fully Autonomous AI • Mastering Generative AI and Prompt Engineering: A Free eBook • Data Analytics: The Four Approaches to Analyzing Data and How To Use Them Effectively

How to Handle Authentication in Angular SPAs?

Workfall

Angular is a good framework for creating Single Page Applications (SPAs) using JavaScript/TypeScript. In Single Page Applications, routing is handled on the client side, so routes must be protected on the client side as well. Angular ships with the Angular Routing module, which handles routing. Sometimes you will have protected resources whose UI you want users to see only if they are authenticated.

Unlocking Scalable Workflow Orchestration: An Introduction to Argo Workflow

Medium Data Engineering

In the world of data-driven applications and complex workflows, efficient orchestration systems are crucial for managing and automating…

The Great Unlock: Large Language Models in Manufacturing

databricks

The manufacturing industry is constantly finding new ways to increase automation, gain operational visibility and accelerate product and technology development. This requires companies.

Deep Learning with R

KDnuggets

In this tutorial, learn how to perform a deep learning task in R.

Generative AI for the Enterprise

Cloudera

Riding the wave of the generative AI revolution, third-party large language model (LLM) services like ChatGPT and Bard have swiftly emerged as the talk of the town, converting AI skeptics to evangelists and transforming the way we interact with technology. For proof of this megatrend, look no further than the instant success of ChatGPT, which set the record for the fastest-growing user base, reaching 100 million users just 2 months after its launch.

What is a MMM and why does it matter for marketers?

databricks

MMM (Marketing or Media Mix Modeling) is a data-driven methodology that enables companies to identify and measure the impact of their marketing campaigns.

The Top AutoML Frameworks You Should Consider in 2023

KDnuggets

AutoML frameworks are powerful tools for data analysts and machine learning specialists: they can automate data preprocessing, model selection, and hyperparameter tuning, and even perform complex tasks like feature engineering.

ThoughtSpot Sage: data security with large language models

ThoughtSpot

With the recent announcement of ThoughtSpot Sage, we launched a number of enhancements to our search capabilities including AI-generated answers, AI-powered search suggestions, and AI-assisted data modeling. In this article we will walk you through the steps we take to secure your data during the LLM interaction. Looking more broadly, we’ll also describe the security process we follow during any application iteration or enhancement, so you can see the great lengths we take to keep your data se

Data Engineering: Fast Spatial Joins Across ~2 Billion Rows on a Single Old GPU

Towards Data Science

Comparing the performance of ORC and Parquet on spatial joins across 2 billion rows on an old Nvidia GeForce GTX 1060 GPU on a local machine. Over the past few weeks I have been digging a bit deeper into the advances that GPU data processing libraries have made since I last focused on them in 2019. In those 4 years, I have found that many of the libraries that were in early alpha in 2019 have matured into solid projects that are being used in real-world situations.

Share Pop-up Charts from the Spatial Statistics and Space Time Pattern Mining Toolboxes to ArcGIS Online

ArcGIS

Use the Convert Spatial Statistics Popup Charts for Web Display tool to view the pop-up charts from your analysis in ArcGIS Online.

4 Career Lessons That Helped Me Navigate the Difficult Job Market

KDnuggets

In this blog, I share 4 valuable lessons I learned while searching for data science roles amidst challenging circumstances, including 60-day immigration policies, layoffs, and health issues. My hope is to offer insights and guidance to those who are facing similar obstacles, whether due to recent layoffs or immigration challenges.

Fivetran: Simplifying Data Pipeline Management and Accelerating Insights

Medium Data Engineering

In the realm of data integration and pipeline management, Fivetran has emerged as a game-changer, enabling organizations to…

Deep Dive into RabbitMQ with Spring Boot - Part 1 by Sonali Mendis

Scott Logic

This is the first of a series of 2 posts on RabbitMQ with Spring Boot. In this post, I intend to explain the Spring version of RabbitMQ Acknowledgement Modes. Part 2 will elaborate on how to tweak your RabbitMQ configuration to alter the retry behaviour and how to add parallel consumers in your RabbitMQ Spring Boot application. This article is not for you if you are only interested in getting a basic RabbitMQ publisher/consumer pattern to work in your Spring Boot application.

May the Speed be with You: 20K QPS on Rockset

Rockset

Scalability, performance and efficiency are the key considerations behind Rockset’s design and architecture. Today, we are thrilled to share a remarkable milestone in one of these dimensions. A customer workload achieved 20K queries per second (QPS) with a query latency (p95) of under 100ms, marking a significant demonstration of the scalability of our systems.
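For readers less familiar with the metric: a p95 latency under 100ms means at least 95% of queries completed within 100ms. The sketch below computes a p95 with the nearest-rank method over a list of latency samples. The function name and the sample values are hypothetical and are not Rockset's data, and monitoring systems may use other percentile definitions (for example, interpolated ones).

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method.

    The result is the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to avoid float rounding surprises
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for 20 queries (illustrative only).
latencies_ms = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35,
                38, 40, 45, 50, 55, 60, 70, 80, 95, 240]

p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"p95 = {p95} ms")  # p95 = 95 ms
```

The single 240 ms outlier does not affect the result: with 20 samples, the nearest-rank p95 is the 19th-smallest value, so the slowest 5% (here, one query) is excluded. That is why tail percentiles are reported separately from averages.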

How Hard is it to Get into FAANG Companies

KDnuggets

This article explores the history and current state of FAANG companies, and how low acceptance rates for these companies may be due to the rapid growth of the tech industry.

How to Build a Data Quality Integrity Framework

Monte Carlo

In a data-driven world, data integrity is the law of the land. And if data integrity is the law, then a data quality integrity framework is the FBI, the FDA, and the IRS all rolled into one. Data integrity is the ability to trust that a company’s data is reliable, compliant, and secure based on internal, industry, and regulatory standards. Because if we can’t trust our data, we also can’t trust the products they’re creating.

Deep Dive into RabbitMQ with Spring Boot - Part 2 by Sonali Mendis

Scott Logic

This is the final post of a series of 2 posts on RabbitMQ with Spring Boot. In the previous post, I explained the Spring version of RabbitMQ acknowledgement modes. In this post, I’m hoping to explain how to tweak your RabbitMQ configuration to alter the retry behaviour, and how to add multiple consumers to allow parallel processing, in your Spring Boot application.

Data Pipeline vs. ETL: Which Delivers More Value?

Ascend.io

In the modern world of data engineering, two concepts often find themselves in a semantic tug-of-war: data pipeline and ETL. In the early stages of data management evolution, ETL processes offered a substantial leap forward in how we handled data – they provided a structured, systematic way to move data from one place to another, transforming it along the way to fit specific needs.

The Role of Open Source Tools in Accelerating Data Science Progress

KDnuggets

Open source tools have had a pivotal role in the evolution of data science, from providing the foundation for analysis, to fueling the innovation that shapes today's landscape. The open source impact on data science is demonstrated best by looking at the relationship's past, present, and future.

How Backcountry Increases Data Team Efficiency by 30% with Monte Carlo

Monte Carlo

Online retailer Backcountry knows a thing or two about big adventures. Across multiple specialty brands and websites, the Park City, Utah-based company sells clothing and gear for outdoor sports enthusiasts. From hiking and camping to mountain biking and ice climbing, they cater to all kinds of experiences. But within the organization, one recent journey required some extra-special gear: the migration from a legacy platform to a modern, cloud-based data stack.

Snowflake Connector for Microsoft Power Platform Now Available 

Snowflake

Today, we’re excited to announce the Snowflake Connector for Microsoft Power Platform is now available. This connector provides instant access to up-to-date data within your Snowflake instance without manually integrating against API endpoints. Now anyone can easily build low-code applications or workflows on Power Platform that leverage Snowflake data without any previous technical or app development experience.
