Sat. Feb 17, 2024 - Fri. Feb 23, 2024

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and sc

Data Lake 262

Data News — Week 24.08

Christophe Blefari

Hey, fresh Data News edition. This week I participated in a round table about data and gave a cool presentation about engines. The idea was to depict the history of engines over the last 40 years and what led to Polars and DuckDB. Obviously I forgot a few things, and I'll do a more complete v2 soon. This is my third presentation about DuckDB in the last 3 months, and I think I'll slow down a bit until I get new crazy things to share.

Data Lake 130

Data Engineering Best Practices - #2. Metadata & Logging

Start Data Engineering

1. Introduction
2. Setup & Logging architecture
3. Data Pipeline Logging Best Practices
3.1. Metadata: Information about pipeline runs, & data flowing through your pipeline
3.2. Obtain visibility into the code’s execution sequence using text logs
3.3. Understand resource usage by tracking Metrics
3.4. Monitoring UI & Traceability
3.5.

Metadata 130

Min rate limits for Apache Kafka

Waitingforcode

I bet you know it already: you can limit the max throughput for Apache Spark Structured Streaming jobs for popular data sources such as Apache Kafka, Delta Lake, or raw files. Did you know that you can also control the lower limit, at least for Apache Kafka?
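
As a rough illustration of the upper and lower limits mentioned above, here is a minimal Python sketch that builds the Kafka source options for a Structured Streaming reader. The option names (`maxOffsetsPerTrigger`, `minOffsetsPerTrigger`, `maxTriggerDelay`) are taken from the Spark Kafka integration guide as I understand it, so check them against your Spark version. The helper name, its parameters, and the values in the commented usage are hypothetical and do not come from the linked article.

```python
def kafka_rate_limit_options(min_offsets, max_offsets, max_trigger_delay="15m"):
    """Build Kafka source options that bound the micro-batch size from both sides.

    Hypothetical helper for illustration only. The dictionary keys are assumed
    to match the option names of the Spark Kafka source.
    """
    if min_offsets > max_offsets:
        raise ValueError("min_offsets must not exceed max_offsets")
    return {
        # Upper bound: at most this many offsets are read per trigger.
        "maxOffsetsPerTrigger": str(max_offsets),
        # Lower bound: a trigger waits until this many offsets are available...
        "minOffsetsPerTrigger": str(min_offsets),
        # ...or until this much time has passed since the previous trigger.
        "maxTriggerDelay": max_trigger_delay,
    }


# Possible usage with PySpark (not executed here; broker and topic are placeholders):
#
#   opts = kafka_rate_limit_options(min_offsets=1_000, max_offsets=50_000)
#   stream = (spark.readStream.format("kafka")
#             .option("kafka.bootstrap.servers", "localhost:9092")
#             .option("subscribe", "events")
#             .options(**opts)
#             .load())
```

Keeping the options in a plain dictionary makes the limits easy to validate and unit test without a running Spark session.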

Kafka 130

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

The Abstraction Problem – A Great Evil

Confessions of a Data Guy

There is a great evil Spirit that is haunting the streets of code in the land of programmers. It’s a Spirit of obfuscation and twisting things into what they are not. The Spirit wanders around on the loose looking for someone, and it finds ready victims among the ranks of new programmers and the innocent […]

Coding 113

ArcGIS Pro 3.3 Moves to .NET 8

ArcGIS

ArcGIS Pro 3.3 is planned to be available in May 2024. Install .NET 8 before attempting to install ArcGIS Pro 3.3 for the best user experience!

143

More Trending

A Roadmap For Your Data Career

KDnuggets

As you design your career in data, you’ve got to avoid getting stuck in your comfort zone or allowing your manager or current situation to determine your path.

Data 128

New SQL Practice Problems

Confessions of a Data Guy

I’m trying something new. I get a lot of questions from folks about getting into the Data Engineering space, how to get better, grow, learn, etc. So I came up with a solution: SQL Practice Problems. Some moons ago I wrote a Data Engineering Practice repo on GitHub for free, and some 1.2K stars later […]

SQL 100

Is the modern data stack disappearing?

Christophe Blefari

No. This question generated a lot of content last week, and a lot of words were written. I wanted to keep my answer short so as not to burden you with a few thousand more words to read. The term "modern data stack" was coined by US companies and VCs (mainly Fivetran / dbt Labs) as a quick way to describe building a data stack in the cloud around ELT.

Data 100

Top digital trends for 2024: Predictions and insights

InData Labs

Top digital trends for 2024 will be unprecedented technological advancements that will reshape the way businesses operate. Introducing them into corporate structures is a strategic move for all companies that want to stay ahead of the curve. The tech and digital marketing industry trends we discuss below will change the way organizations handle customer service, […]

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

Python in Finance: Real Time Data Streaming within Jupyter Notebook

KDnuggets

Learn a modern approach to stream real-time data in Jupyter Notebook. This guide covers dynamic visualizations, a Python for quant finance use case, and Bollinger Bands analysis with live data.
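
For readers unfamiliar with Bollinger Bands, the calculation can be sketched in a few lines of plain Python. This is a minimal illustration of the usual definition (a simple moving average plus and minus a multiple of the standard deviation over the same window), not the code from the linked guide. The function name, the default window of 20, the multiplier of 2, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return a list of (lower, middle, upper) tuples, one per full window.

    middle is the simple moving average of the window;
    lower/upper are middle -/+ num_std * population standard deviation.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        middle = mean(chunk)
        spread = num_std * pstdev(chunk)
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Example: a 5-point window over five prices yields a single set of bands.
lower, middle, upper = bollinger_bands([1, 2, 3, 4, 5], window=5)[0]
```

With live data, the same function can be re-run on the most recent `window` prices each time a new tick arrives.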

Finance 120

Location Referencing Guide to Esri Partner Conference and Esri Developer Summit

ArcGIS

Join us for an exciting Partner Conference and Developer Summit! Discover the latest in ArcGIS Location Referencing and connect with experts.

Announcing the General Availability of Azure Private Link and Azure Storage firewall support for Databricks SQL Serverless

databricks

We are excited to announce the upcoming general availability of Azure Private Link support for Databricks SQL (DBSQL) Serverless, planned in April 2024.

SQL 107

Aligning Velox and Apache Arrow: Towards composable data management

Engineering at Meta

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

Navigating the Data Revolution: Exploring the Booming Trends in Data Science and Machine Learning

KDnuggets

Dive into transformative trends in data science, encompassing AI-powered automation, NLP, ethical considerations, decentralized computing, and interdisciplinary collaboration.

WebSockets in Http4s

Rock the JVM

By Herbert Kateu. 1. Introduction: The WebSocket protocol enables persistent two-way communication between a client and a server, where packets can be passed in both directions without the need for additional HTTP requests. The specification for this protocol is outlined in RFC 6455. WebSockets are used in applications such as instant messaging, gaming, simultaneous editing, and stock tickers, to mention but a few.
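
A WebSocket connection starts as an ordinary HTTP request that is upgraded once, after which no further HTTP requests are needed. As a small illustration of that opening handshake, the sketch below computes the `Sec-WebSocket-Accept` value a server returns for a client's `Sec-WebSocket-Key`, as described in RFC 6455. It is written in Python rather than the Scala/Http4s code of the linked article, and the function name is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header value from Sec-WebSocket-Key.

    The server appends the fixed GUID to the client's key, takes the SHA-1
    digest, and base64-encodes the result.
    """
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 handshake example.
accept = websocket_accept_key("dGhlIHNhbXBsZSBub25jZQ==")
```

Libraries such as Http4s perform this step internally; the sketch only shows what happens on the wire before the two-way channel opens.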

Scala 94

Strengthening Cyber Resilience through Efficient Data Management: A Response to M-21-31

databricks

In today's environment, proactive cybersecurity is crucial to any public sector agency. For many organizations, log data that security professionals need for effective.

Simplify Application Development With Hybrid Tables

Snowflake

We previously announced Snowflake’s Unistore workload, which continues Snowflake’s legacy of breaking down data silos by uniting transactional and analytical data in a consistent and governed platform. Today, we are pleased to announce that Hybrid Tables, the core feature powering Unistore, is in public preview in select AWS regions. Hybrid Tables is a new table type that enables transactional use cases within Snowflake with fast, high-concurrency point operations.

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

Prompt Engineering: An Integrated Dream

KDnuggets

Clickbait headlines like "AI's Hottest Job" have promised a career in which anyone who knows how to chat with AI can earn a six-figure salary with no computer background. But is this reality, or just another internet pipe dream? Let's ditch the sensationalism and delve into the actual job market data to find out.

8 Tips for Managing Stakeholder Expectations

Knowledge Hut

Why Stakeholder Management? One of the most critical aspects of project management is doing what’s necessary to develop and control relationships with all individuals that the project impacts. In this article, you will learn techniques for identifying stakeholders, analyzing their influence on the project, and developing strategies to communicate, set boundaries, and manage competing expectations.

Understanding DynamoDB Secondary Indexes

Rockset

Introduction: Indexes are a crucial part of proper data modeling for all databases, and DynamoDB is no exception. DynamoDB's secondary indexes are a powerful tool for enabling new access patterns for your data. In this post, we'll look at DynamoDB secondary indexes. First, we'll start with some conceptual points about how to think about DynamoDB and the problems that secondary indexes solve.

Beyond the Buzz: Braze Equips Modern Marketers with Powerful AI Tools

Snowflake

A lot of the buzz around AI focuses on its future potential. And we get it — we’re talking about a transformative technology that presents seemingly limitless possibilities. But an important aspect of this world-changing tech story that gets lost in the hype is understanding exactly what AI solutions are available for you and your team to employ right now, today.

Entity Resolution Checklist: What to Consider When Evaluating Options

Are you trying to decide which entity resolution capabilities you need? It can be confusing to determine which features are most important for your project. And sometimes key features are overlooked. Get the Entity Resolution Evaluation Checklist to make sure you’ve thought of everything to make your project a success! The list was created by Senzing’s team of leading entity resolution experts, based on their real-world experience.

7 Free Kaggle Micro-Courses for Data Science Beginners

KDnuggets

Interested in learning data science? Check out these free micro-courses from Kaggle to learn essential data science skills.

Announcing the General Availability of Unity Catalog Volumes

databricks

Today, we are excited to announce that Unity Catalog Volumes is now generally available on AWS, Azure, and GCP. Unity Catalog provides a.

AWS 91

Stream Processing with Python, Kafka & Faust

Towards Data Science

How to stream and apply real-time prediction models on high-throughput time-series data. Most stream processing libraries are not Python-friendly, while the majority of machine learning and data mining libraries are Python-based. Although the Faust library aims to bring Kafka Streaming ideas into the Python ecosystem, it may pose challenges in terms of ease of use.

Kafka 79

Delivering Telecom Sustainability Targets Using Autonomous Networks

Snowflake

As the world grapples with the escalating climate crisis, many industries are re-examining their operations to identify and implement sustainable practices. The telecommunications industry is no exception. Telecom companies face growing pressure from consumers, investors and regulators to reduce their carbon footprint and achieve net-zero emissions.

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

Free Mastery Course: Become a Large Language Model Expert

KDnuggets

It is a self-paced course that covers fundamental and advanced concepts of LLMs and teaches how to deploy them in production.

IT 122

Improve workflows with ArcGIS Aviation Airports and ArcGIS Aviation Charting

ArcGIS

ArcGIS Aviation Airports and ArcGIS Aviation Charting are extensions to ArcGIS Pro that allow users to do their best aviation work with the power of the next generation of desktop software. The tools in these two extensions are enhanced and incorporated in ArcGIS Pro to support your airport, charting, data management, migration, and design needs.

Unlocking AI Assisted Development Safely: From Idea to GA

Pinterest Engineering

Sam Wang | Sr. Technical Program Manager; Joe Gordon | Sr. Staff Software Engineer

At Pinterest we are continuously looking for ways to improve our developer experience, and we have recently shipped AI-assisted development for everyone while balancing safety, security, and cost. In this blog post, we share our journey of unlocking AI-assisted development, from the initial idea to the General Availability (GA) stage.

Scala 78

12 Golden Signals To Discover Anomalies And Performance Issues on Your AWS RDS Fleet

Zalando Engineering

TL;DR: The database-per-service pattern in the microservices world brings an overhead in operating database instances and observing their health status and anomalies. Standardisation of methodology and tooling is a key factor for success at scale. We have incorporated learnings from past incidents, anomalies and empirical observations into a methodology for observing health status using 12 golden signals.

AWS 77

How to Build an Experimentation Culture for Data-Driven Product Development

Speaker: Margaret-Ann Seger, Head of Product, Statsig

Experimentation is often seen as an aspirational practice, especially at smaller, fast-moving companies who are strapped for time and resources. So, how can you get your team making decisions in a more data-driven way while continuing to remain lean and maintaining ship velocity? In this webinar, Margaret-Ann Seger, Head of Product at Statsig, will teach you how to build an experimentation culture from the ground-up, graduating from just getting started with data-driven development to operating