Wed, Mar 29, 2023

Table file formats - Z-Order compaction: Delta Lake

Waitingforcode

In my recent exploration of compaction, aka the OPTIMIZE command, in Delta Lake, I came across the famous Z-Ordering mode. It was one of the most outstanding features when I first heard about Delta Lake. You can't even imagine how impatient I was to see what it does under the hood. Fortunately, this time has come!

A Complete Collection of Data Science Free Courses – Part 2

KDnuggets

The second part covers the list of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps.

Confluent Achieves Google Cloud Ready - AlloyDB Designation

Confluent

Confluent announced that it has successfully achieved Google Cloud Ready - AlloyDB Designation for AlloyDB for PostgreSQL, Google Cloud’s newest fully managed PostgreSQL-compatible database service for the most demanding enterprise database workloads.

Run SQL Queries on Databricks From Visual Studio Code

databricks

Today, we are excited to announce that users can now run SQL queries on Databricks from within Visual Studio Code via a preview.

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

KDnuggets News, March 29: Automate the Boring Stuff with GPT-4 • Python Libraries for Data Cleaning

KDnuggets

Automate the Boring Stuff with GPT-4 and Python • Introduction to Python Libraries for Data Cleaning • Google Answer to ChatGPT by Adding Generative AI into Docs and Gmail • Top 15 YouTube Channels to Level Up Your Machine Learning Skills • 3 Mistakes That Could Be Affecting the Accuracy of Your Data Analytics

#ClouderaLife Volunteer Spotlight Pet Day Special: Julia Ostrowski

Cloudera

In this special edition of Cloudera Cares #VolunteerSpotlight for International Pet Day, Julia Ostrowski shares her experience of fostering cats and dogs for the past two decades. About you: Where are you based? I live in beautiful downtown San Jose, CA, in a house built in 1904. What’s your job role? I have been with the Support org since starting with Hortonworks almost 8 years ago (I cannot believe it’s been that long) as Director of Support Enablement, until this January when I moved

More Trending

Sliding Windows in Pandas

Towards Data Science

Identify Patterns in Time-Series Data with Overlapping Window Techniques

Security best practices for the Databricks Lakehouse Platform

databricks

Your data security is our priority. At Databricks, we know that data is one of your most valuable assets and always has to be protected.

Configuring Http4s Security: CORS and CSRF

Rock the JVM

This article is brought to you by Herbert Kateu, a new contributor. He got started after applying for the fresh new Technical Writer position at our new Job Board! 1. Introduction: With the number of cyber-attacks ever increasing, there’s a growing need for security in the applications we build. Http4s comes with several easily configurable security features, and in this article we will cover the two most common: CORS and CSRF.

Meeting Data Wellness Needs in the Healthcare Industry

Confluent

Trusted healthcare organizations put Confluent at the heart of their operations to access data in real time, enabling the discovery of new insights and meeting the needs of patients.

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Monte Carlo Achieves Google Cloud Ready – BigQuery Designation

Monte Carlo

Today, I’m excited to share that Monte Carlo’s data observability platform has achieved a Google Cloud Ready – BigQuery designation. The Google Cloud Ready – BigQuery designation recognizes partner solutions that have met a core set of requirements to ensure the best possible integration between the partner product and BigQuery.

Superset Community Newsletter!

Preset

Welcome to the Superset Community Monthly Newsletter

Monte Carlo Launches Roadshow to Educate Children About Data Observability

Monte Carlo

SAN FRANCISCO, April 1, 2023 (PR NEWSWIRE) – Monte Carlo Data, creator of the data observability category, announced its first-ever brand mascot, “Billi the Observabili-Bear,” with an elementary school roadshow to educate future data engineers about the dangers of data downtime. “Billi’s Observability Bonanza,” developed in partnership with California Public Schools, aims to teach children between the ages of 5 and 12 about the importance of data reliability and how a proactive approach to data

Iceberg Tables: Catalog Support Now Available

Snowflake

As announced at Snowflake Summit 2022, Iceberg Tables combine unique Snowflake capabilities with the Apache Iceberg and Apache Parquet open source projects to support your architecture of choice. As part of the latest Iceberg release, we’ve added catalog support to the Iceberg project to ensure that engines outside of Snowflake can interoperate with Iceberg Tables.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Frostbyte Team Drives Insights into Sports Fans with a Custom Fan 360 Solution 

Snowflake

The global sports industry is big business. In 2023, the North American market alone is expected to be worth over $83bn. By 2026, the global sports market is expected to reach over 700 billion, according to Statista. Yet many sporting organizations worldwide lack an understanding of who exactly their fans are and what they are interested in, which makes it difficult for them to tailor customer experiences to what customers want.

Using GPT-3.5-Turbo and GPT-4 to Apply Text-defined Data Quality Checks on Humanitarian Datasets

Towards Data Science

Using GPT-3.5-Turbo and GPT-4 for Predicting Humanitarian Data Categories. TL;DR In this article, I explore using GPT-3.5-Turbo and GPT-4 to categorize datasets without the need for labeled data or model training, by prompting the model with data excerpts and category definitions. Using a small sample of categorized ‘Data Grid’ datasets found on the amazing Humanitarian Data Exchange (HDX), zero-shot prompting of GPT-4 resulted in 9

Six Books that Have Shaped My Data Career

Towards Data Science

Great reads on modeling, processes, and leadership. At the very start of my journey in data, I thought I was going to be a data scientist, and my first foray into data was centered on studying statistics and linear algebra, not software engineering or database management. Fairly early in my career, however, I realized that I enjoyed building data assets more than reports or ML models.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speakers: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.