Sat. Jul 15, 2023 - Fri. Jul 21, 2023


H1 2023 Analytics & Data Science Spend & Trends Report

KDnuggets

The All Things Insights and marketing analytics and data science community completed an extensive survey covering what executives are thinking, how they’re spending and the issues and opportunities they face. Grab your free copy now.


How to initialize state in Apache Spark Structured Streaming stateful jobs?

Waitingforcode

Starting from Apache Spark 3.2.0, it is now possible to load an initial state for arbitrary stateful pipelines. Even though the feature is easy to use, it hides some interesting implementation details!



Data Engineering Best Practices - #1. Data flow & Code

Start Data Engineering

1. Introduction
2. Sample project
3. Best practices
3.1. Use standard patterns that progressively transform your data
3.2. Ensure data is valid before exposing it to its consumers (aka data quality checks)
3.3. Avoid data duplicates with idempotent pipelines
3.4. Write DRY code & keep I/O separate from data transformation
3.5. Know the when, how, & what (aka metadata) of pipeline runs for easier debugging
3.
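
To illustrate the idempotent-pipeline practice (3.3) named in the outline above, here is a minimal sketch, not the article's own code. It assumes a hypothetical `sales` table partitioned by a `run_date` column and uses Python's built-in `sqlite3` module; the function names are made up for illustration. The load deletes the partition for the run date before inserting, so a rerun replaces rows instead of duplicating them.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Replace all rows for run_date so that re-running the load is safe."""
    # "with conn" wraps the delete and the inserts in a single transaction.
    with conn:
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (run_date, item, amount) VALUES (?, ?, ?)",
            [(run_date, item, amount) for item, amount in rows],
        )


def row_count(conn):
    """Return the total number of rows in the hypothetical sales table."""
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (run_date TEXT, item TEXT, amount REAL)")

rows = [("apple", 3.0), ("pear", 1.5)]
load_partition(conn, "2023-07-15", rows)
load_partition(conn, "2023-07-15", rows)  # rerun of the same day adds no duplicates
```

Calling `row_count(conn)` after both loads still reports two rows, which is the property an idempotent pipeline is meant to guarantee.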


Datapreneurs - How Todays Business Leaders Are Using Data To Define The Future

Data Engineering Podcast

Summary: Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book, "Datapreneurs", he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.


Navigating the Future: Generative AI, Application Analytics, and Data

Generative AI is upending the way product developers & end-users alike are interacting with data. Despite the potential of AI, many are left with questions about the future of product development: How will AI impact my business and contribute to its success? What can product managers and developers expect in the future with the widespread adoption of AI?


4 Alternatives to Fivetran: The Evolving Dynamics of the ETL & ELT Tool Market

Seattle Data Guy

The ETL & ELT tool market is experiencing continuous transformation, propelled by fluctuating pricing structures and the advent of inventive alternatives. This industry remains fiercely competitive due to these changing elements and a swiftly growing user base. In the following sections, we will explore four emerging alternatives to Fivetran. Of course, that is if you…


Data News — Week 23.28

Christophe Blefari

Hey, it's Saturday. I hope you're enjoying July, taking a deserved break, reading data engineering articles while at the beach or traveling to unknown places. Sometimes there are Fridays when I don't find any glue between articles for the newsletter and I have an idea of something to compensate, but it takes me the whole Friday of exploration.


More Trending


The Drag-and-Drop UI for Building LLM Flows: Flowise AI

KDnuggets

Don’t have any coding experience? Don’t worry. Check out this drag-and-drop tool that helps you to build your own customized LLM flows. And guess what, you don’t have to be a tech professional!


Building your Generative AI apps with Meta's Llama 2 and Databricks

databricks

Today, Meta released their latest state-of-the-art large language model (LLM) Llama 2 to open source for commercial use. This is a significant development.


How ThoughtSpot Partnered with Google Cloud to put AI at the center of BI

ThoughtSpot

At ThoughtSpot, we believe making data accessible to every knowledge worker requires human-centered technology: an analytics experience that bridges the “language” barrier between technology and people. AI is the perfect complement to search because it empowers organizations to analyze, understand, and act on data. In order to achieve this vision, we knew we’d need to work with some of the best, most innovative technology companies across the modern data stack, companies that put their users fir


How to Master Data Transformations with DBT Materializations?

Workfall

Picture yourself in the bustling world of a leading streaming platform, where countless users rely on personalized recommendations for their next binge-watching adventure. Behind the scenes, a team of data wizards tirelessly crunches mountains of data to make those recommendations sparkle. As one of those wizards, we’ve seen the challenges we face: the struggle to transform massive datasets into meaningful insights, all while keeping queries fast and our system scal


Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.


Unveiling the Power of Meta’s Llama 2: A Leap Forward in Generative AI?

KDnuggets

This article explores the technical details and implications of Meta's newly released Llama 2, a large language model that promises to revolutionize the field of generative AI. We delve into its capabilities, performance, and potential applications, while also discussing its open-source nature and the company's commitment to safety and transparency.


Never Miss a Beat: Announcing New Monitoring and Alerting capabilities in Databricks Workflows

databricks

We are excited to announce enhanced monitoring and observability features in Databricks Workflows. This includes a new real-time insights dashboard to see all.


Storing a network diagram or not… This is a real question to consider!

ArcGIS

The purpose is to learn what network diagram storage means and provide guidance to avoid unnecessarily increasing database sizes.


Being first to market with rideshare on CarPlay and Android Auto

Lyft Engineering

Our cross-functional development process. By: Aastha Bhargava, Jake Hercules, Erik Kamp, Michael Ramdatt, Nathan Van Fleet, Rex Lam, Kieran Gupta. Product: For years, drivers have been clear about what they wanted: native Lyft support for CarPlay and Android Auto. They’ve made the request across social media platforms, through the app, and in feedback sessions with Lyft researchers.


Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.


Data storytelling – the art of telling stories through data

KDnuggets

Data Storytelling with Python Altair and Generative AI teaches you how to turn raw data into effective, insightful data stories. You’ll learn exactly what goes into an effective data story, then combine your Python data skills with the Altair library and AI tools to rapidly create amazing visualizations.


Analyzing Time Series for Pinterest Observability

Pinterest Engineering

By: Brian Overstreet | Software Engineer, Observability; Humsheen Geo | Software Engineer, Observability. Time series is a critical part of Observability at Pinterest, powering 60,000 alerts and 5,000 dashboards. A time series is an identifier with values where the values are associated with a timestamp. Given the widespread use and critical nature of time series, it’s important to give engineers the ability to adequately express what operations to perform on the time series in a readable, understand
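
The definition above (an identifier plus timestamped values) can be sketched in a few lines of Python. This is an illustrative assumption only, not Pinterest's data model or query language: the `TimeSeries` class, the `sum_series` helper, and the identifier strings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """An identifier plus values keyed by timestamp (seconds, hypothetical unit)."""
    identifier: str
    points: dict = field(default_factory=dict)  # timestamp -> value


def sum_series(name, series_list):
    """Combine several series by adding the values that share a timestamp."""
    totals = defaultdict(float)
    for series in series_list:
        for timestamp, value in series.points.items():
            totals[timestamp] += value
    # Sort by timestamp so the combined series reads in time order.
    return TimeSeries(name, dict(sorted(totals.items())))


a = TimeSeries("requests{host=a}", {0: 1.0, 60: 2.0})
b = TimeSeries("requests{host=b}", {0: 4.0, 60: 0.5, 120: 3.0})
total = sum_series("requests{host=*}", [a, b])
```

Here `total.points` holds one value per timestamp seen in either input, with missing points treated as absent rather than zero-filled, which is one of several reasonable design choices for such an operation.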


Unlock The Full Potential Of Hive

Cloudera

In the realm of big data analytics, Hive has been a trusted companion for summarizing, querying, and analyzing huge and disparate datasets. But let’s face it, navigating the world of any SQL engine is a daunting task, and Hive is no exception. As a Hive user, you will find yourself wanting to go beyond surface-level analysis, and deep dive into the intricacies of how a Hive query is executed.


Databricks + MosaicML

databricks

Today, we’re excited to share that we’ve completed our acquisition of MosaicML, a leading platform for creating and customizing generative AI models for you.


Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.


GPT-Engineer: Your New AI Coding Assistant

KDnuggets

GPT-Engineer is an AI-powered application builder that generates codebases from project descriptions. It simplifies building applications, including our key-value database example, and works well with GPT-4.


Unlocking the Power of Data: Key Aspects of Effective Data Products

The Modern Data Company

Data Products: Data products encompass several key aspects that contribute to their effectiveness and value in addressing data challenges and delivering actionable insights. These aspects ensure that data products are well-designed, user-centric, and aligned with business goals. Let’s explore the key aspects of a data product. Clear Purpose and Goals: A data product must have a clear purpose and well-defined goals that align with the organization’s objectives.


Career & Motherhood: How Cloudera Helped Me Transition Into Motherhood With Twins

Cloudera

Congratulations on your pregnancy! Finding out you are pregnant is an exciting and life-changing experience, but it can also bring some unexpected challenges – especially if you find out you’re pregnant with twins after accepting a new job offer. That’s exactly what happened to me. I was thrilled to secure a job at Cloudera, a company I greatly admired.


Go from Months to Hours with Databricks Marketplace for Retailers

databricks

Let's say a distributor reached out wanting to understand what factors are driving the sale of carbonated beverages from customers in their convenience.


How Embedded Analytics Gets You to Market Faster with a SAAS Offering

Start-ups & SMBs launching products quickly must bundle dashboards, reports, & self-service analytics into apps. Customers expect rapid value from your product (time-to-value), data security, and access to advanced capabilities. Traditional Business Intelligence (BI) tools can provide valuable data analysis capabilities, but they have a barrier to entry that can stop small and midsize businesses from capitalizing on them.


How SAS can help catapult practitioners’ careers

KDnuggets

Let's explore the journeys of SAS users who harnessed the power of SAS to unlock new opportunities and achieve their career goals.


Getting started with SAR satellite imagery

ArcGIS

This blog shares the resource to the ArcGIS Pro Learn Series about SAR satellite imagery.


Environmental Impact – The supplier problem by Graham Odds

Scott Logic

We recently published our annual Environmental Impact Report, which documents Scott Logic’s carbon footprint in 2022, describes what we are currently doing to reduce our ongoing environmental impact, and sets out our roadmap to net zero. I’m extremely proud that we are managing to reduce our total emissions even as our business grows. Go read the report for all the details.


5 Inspiring Learning Resources That Help Me Stay on Top of Data Analytics

Towards Data Science

5 Inspiring Learning Resources to Propel Your Skills and Expertise


Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.


Automating the Chain of Thought: How AI Can Prompt Itself to Reason

KDnuggets

The Auto-CoT prompting method has LLMs automatically generate their own demonstrations to prompt complex reasoning, using diversity-based sampling and zero-shot generation, which reduces the human effort of creating prompts. Experiments show it matches the performance of manual prompting across reasoning tasks.


Powering the Latest LLM Innovation, Llama v2 in Snowflake, Part 1

Snowflake

This blog series covers how to run, train, fine-tune, and deploy large language models securely inside your Snowflake Account with Snowpark Container Services. This year there has been a surge of progress in the world of open source large language models (LLMs). This world of free and open source LLMs took yet another major step forward just this week with Meta’s release of Llama v2.


The Executive’s Guide to Data, Analytics and AI Transformation, Part 7: Move to production and scale adoption

databricks

This is part seven of a multi-part series to share key insights and tactics with Senior Executives leading data and AI transformation initiatives.


Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types

Towards Data Science

Deep Dive Guide for When and How to Use 8 Types of SCD


From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.