February, 2024

article thumbnail

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offer the ease of use and execution speed of data warehouses with the infinite storage and sc

Data Lake 262
article thumbnail

Kafka to MongoDB: Building a Streamlined Data Pipeline

Analytics Vidhya

Introduction Data is fuel for the IT industry and the Data Science Project in today’s online world. IT industries rely heavily on real-time insights derived from streaming data sources. Handling and processing the streaming data is the hardest work for Data Analysis. We know that streaming data is data that is emitted at high volume […] The post Kafka to MongoDB: Building a Streamlined Data Pipeline appeared first on Analytics Vidhya.

MongoDB 217
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Happy Leap Day!

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of three topics from today’s subscriber-only The Pulse issue. Subscribe to get issues like this in your inbox, every week.

article thumbnail

Data Warehousing Essentials: A Guide To Data Warehousing

Seattle Data Guy

Photo by Tiger Lily Data warehouses and data lakes play a crucial role for many businesses. It gives businesses access to the data from all of their various systems. As well as often integrating data so that end-users can answer business critical questions. But if we take a step back and only focus on the… Read more The post Data Warehousing Essentials: A Guide To Data Warehousing appeared first on Seattle Data Guy.

Data Lake 162
article thumbnail

Get Better Network Graphs & Save Analysts Time

Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enahance visualizations, and more. Yet, when different graph nodes represent the same entity, graphs get messy. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.

article thumbnail

Anatomy of a Structured Streaming job

Waitingforcode

Apache Spark Structured Streaming relies on the micro-batch pattern which evaluates the same query in each execution. That's only a high level vision, though. Under-the-hood, there are many other interesting things that happen.

130
130
article thumbnail

Data News — Week 24.08

Christophe Blefari

My ideas these days ( credits ) Hey, fresh Data News edition. This week I've participated to a round table about data and did a cool presentation about Engines. The idea was to depict the history of engines over the last 40 years and what leads to polars and DuckDB. Obviously the I forgot a few things and I'll do a more complete v2 soon. This is my third presentation about DuckDB in the last 3 months and I think I'll slow down a bit until I get new crazy things to share.

Data Lake 130

More Trending

article thumbnail

Data Engineering Best Practices - #2. Metadata & Logging

Start Data Engineering

1. Introduction 2. Setup & Logging architecture 3. Data Pipeline Logging Best Practices 3.1. Metadata: Information about pipeline runs, & data flowing through your pipeline 3.2. Obtain visibility into the code’s execution sequence using text logs 3.3. Understand resource usage by tracking Metrics 3.4. Monitoring UI & Traceability 3.5.

Metadata 130
article thumbnail

Introducing Apache Kafka 3.7

Confluent

Apache Kafka 3.7 introduces updates to the Consumer rebalance protocol, an official Apache Kafka Docker image, JBOD support in Kraft-based clusters, and more!

Kafka 140
article thumbnail

Alternatives to SSIS(SQL Server Integration Services) – How To Migrate Away From SSIS

Seattle Data Guy

SQL Server Integration Services (SSIS) comes with a lot of functionality useful for extracting, transforming, and loading data. It can also play important roles in application development and other projects. But SSIS is far from the only platform that can provide these services. You might seek alternatives to SSIS because you want a more agile… Read more The post Alternatives to SSIS(SQL Server Integration Services) – How To Migrate Away From SSIS appeared first on Seattle Data Guy.

SQL 130
article thumbnail

Introducing DoorDash’s In-House Search Engine

DoorDash Engineering

We reviewed the architecture of our global search at DoorDash in early 2022 and concluded that our rapid growth meant within three years we wouldn’t be able to scale the system efficiently, particularly as global search shifted from store-only to a hybrid item-and-store search experience. Our analysis identified Elasticsearch as our architecture’s primary bottleneck.

article thumbnail

Understanding User Needs and Satisfying Them

Speaker: Scott Sehlhorst

We know we want to create products which our customers find to be valuable. Whether we label it as customer-centric or product-led depends on how long we've been doing product management. There are three challenges we face when doing this. The obvious challenge is figuring out what our users need; the non-obvious challenges are in creating a shared understanding of those needs and in sensing if what we're doing is meeting those needs.

article thumbnail

Top 5 AI Coding Assistants You Must Try

KDnuggets

Discover the top AI coding assistants that can 10X your productivity overnight - #5 has the best autocomplete feature, and #1 is the most advanced code assistant tool ever seen!

Coding 128
article thumbnail

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process.

Database 162
article thumbnail

ArcGIS Pro 3.3 Moves to.NET 8

ArcGIS

ArcGIS Pro 3.3 is planned to be available in May 2024. Install.NET 8 before attempting to install ArcGIS Pro 3.3 for the best user experience!

143
143
article thumbnail

Announcing Public Preview of Delta Sharing with Cloudflare R2 Integration

databricks

Special thanks to Phillip Jones, Senior Product Manager, and Harshal Brahmbhatt, Systems Engineer from Cloudflare for their contributions to this blog. Organizations across.

article thumbnail

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Speaker: Timothy Chan, PhD., Head of Data Science

Are you ready to move beyond the basics and take a deep dive into the cutting-edge techniques that are reshaping the landscape of experimentation? 🌐 From Sequential Testing to Multi-Armed Bandits, Switchback Experiments to Stratified Sampling, Timothy Chan, Data Science Lead, is here to unravel the mysteries of these powerful methodologies that are revolutionizing how we approach testing.

article thumbnail

New with Confluent Platform: Seamless Migration Off ZooKeeper, Arm64 Support, and More

Confluent

Confluent Platform 7.6 brings upgrading for existing clusters from ZooKeeper to KRaft, compaction support for Tiered Storage, OAuth (early access), improvements to the Oracle CDC premium connector, and more.

article thumbnail

Robinhood Money Drills Kicks Off 2024 With Three New Universities

Robinhood

Florida State University, Coastal Carolina University, and the University of California, Berkeley will introduce financial education coursework with support from Robinhood Money Drills Robinhood Markets, Inc. is launching Robinhood Money Drills with three new universities, including Florida State University, Coastal Carolina University, and the University of California, Berkeley.

Education 115
article thumbnail

Collection of Free Courses to Learn Data Science, Data Engineering, Machine Learning, MLOps, and LLMOps

KDnuggets

Begin your data professional journey from the basics of statistics to building a production-grade AI application.

article thumbnail

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building

Data Lake 147
article thumbnail

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Speaker: Aarushi Kansal, AI Leader & Author and Tony Karrer, Founder & CTO at Aggregage

Software leaders who are building applications based on Large Language Models (LLMs) often find it a challenge to achieve reliability. It’s no surprise given the non-deterministic nature of LLMs. To effectively create reliable LLM-based (often with RAG) applications, extensive testing and evaluation processes are crucial. This often ends up involving meticulous adjustments to prompts.

article thumbnail

Access Over 181,000 USGS Historical Topographic Maps

ArcGIS

We recently updated our online USGS historical topographic map collection with over 1,745 new maps for a new total of over 181,000 maps.

article thumbnail

OLMo is Here, Powered by Mosaic AI + Databricks

databricks

As Chief Scientist (Neural Networks) at Databricks, I lead our research team toward the goal of giving everyone the ability to build and.

Building 133
article thumbnail

IoT Data Streaming for Building Private Wireless Networks

Confluent

Confluent enables real-time, reliable, scalable, and secure communication between IoT devices, applications, and backend systems. Streamline data processing and unlock analytics to boost productivity and time to market while lowering infrastructure costs.

Building 116
article thumbnail

Robinhood Wallet and Arbitrum Team Up to Expand Access to Layer 2s

Robinhood

As part of the collaboration, Robinhood Wallet announces access to swaps on the Arbitrum network Today at ETHDenver, Robinhood and Arbitrum announced a collaboration that simplifies the path to Layer 2s (L2s) by giving Robinhood Wallet users access to Arbitrum swaps through decentralized exchanges. By opening access to Arbitrum’s advanced scaling solutions, Robinhood Wallet users can now take advantage of low transaction costs and fast transaction speeds on one of the most popular networks in t

article thumbnail

The Big Payoff of Application Analytics

Outdated or absent analytics won’t cut it in today’s data-driven applications – not for your end users, your development team, or your business. That’s what drove the five companies in this e-book to change their approach to analytics. Download this e-book to learn about the unique problems each company faced and how they achieved huge returns beyond expectation by embedding analytics into applications.

article thumbnail

Semantic Layers are the Missing Piece for AI-Enabled Analytics

KDnuggets

Integrating a semantic layer with Language Learning Models (LLMs) presents a clean solution to this, particularly in the realm of AI chatbots. This combination empowers businesses to generate fast responses and reports based on their data. Leveraging AI and semantic layers is advancing business intelligence, making it easier than ever for people to interact with data.

article thumbnail

DotSlash: Simplified executable deployment

Engineering at Meta

We’ve open sourced DotSlash , a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. DotSlash handles transparently fetching, decompressing, and verifying the appropriate remote artifact for the current operating system and CPU.

Metadata 115
article thumbnail

Top digital trends for 2024: Predictions and insights

InData Labs

Top digital trends for 2024 will be unprecedented technological advancements that will reshape the way businesses operate. Introducing them into corporate structures is a strategic move for all companies that want to stay ahead of the curve. The tech and digital marketing industry trends we discuss below will change the way organizations handle customer service, Запись Top digital trends for 2024: Predictions and insights впервые появилась InData Labs.

article thumbnail

Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming

databricks

Introduction Apache Spark™ Structured Streaming is a popular open-source stream processing platform that provides scalability and fault tolerance, built on top of the S.

Process 106
article thumbnail

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Speaker: Anne Steiner and David Laribee

As a concept, Developer Experience (DX) has gained significant attention in the tech industry. It emphasizes engineers’ efficiency and satisfaction during the product development process. As product managers, we need to understand how a good DX can contribute not only to the well-being of our development teams but also to the broader objectives of product success and customer satisfaction.

article thumbnail

Welcome Noteable: Making Data Streaming Easier and More Approachable

Confluent

Confluent has hired many Noteable employees to help make application development easier for both Kafka and Flink developers.

Kafka 125
article thumbnail

Top 10 Data Science Websites to learn More

Knowledge Hut

Being a data scientist means constantly growing, enabling businesses to become more data-propelled, and learning newer trends and tools. There are various excellent resources in data science that can help you to develop your skillset. According to International Data Corporation (IDC), organizations are turning towards digitalization completely. This will help to create more investments, technology development and open various new jobs.

article thumbnail

26 Data Science Interview Questions You Should Know

KDnuggets

Learn about the most common questions asked during data science interviews. This blog covers non-technical, Python, SQL, statistics, data analysis, and machine learning questions.

article thumbnail

Location Referencing Guide to Esri Partner Conference and Esri Developer Summit

ArcGIS

Join us for an exciting Partner Conference and Developer Summit! Discover the latest in ArcGIS Location Referencing and connect with experts.

article thumbnail

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Speaker: David Bard, Principal at VP Product Coaching

In the fast-paced world of digital innovation, success is often accompanied by a multitude of challenges - like the pitfalls lurking at every turn, threatening to derail the most promising projects. But fret not, this webinar is your key to effective product development! Join us for an enlightening session to empower you to lead your team to greater heights.