Migrating From Elasticsearch 7.17 to Elasticsearch 8.x: Pitfalls and Learnings

Zalando Engineering

What this article is about: what kind of changes we had to make to the codebase, how we did the actual upgrade, what challenges we faced, how we did the data transfer, and how the data was kept in sync. What this article is not: a step-by-step guide on how to upgrade Elasticsearch (read on to find out why).

Rockset Converged Index Adds Clustered Search Index for 70% Query Latency Reduction

Rockset

In this blog post, we describe a new storage format that we adopted for our search index, one of the indexes in Rockset’s Converged Index. As described in our Converged Index blog, we store every column of every document in a row-based store, a column-based store, and a search index.

Top Data Cleaning Techniques & Best Practices for 2024

Knowledge Hut

Welcome to our guide to the latest data cleaning techniques. Whether you're a data expert or just starting out, knowing how to clean your data is a must-have skill. Here's a simplified, step-by-step guide to cleaning your data.

Gotchas of Streaming Pipelines: Profiling & Performance Improvements

Lyft Engineering

Discover how Lyft identified and fixed performance issues in our streaming pipelines. When reviewing a pipeline’s performance, we ask the following questions: “Is there a bottleneck?”, “Is the pipeline performing optimally?”, and “Will it continue to scale with increased load?” Background: every streaming pipeline is unique.

How We Optimized Rockset's Hot Storage Tier to Improve Efficiency By More Than 200%

Rockset

This blog describes how we optimized Rockset’s hot storage tier to improve efficiency by more than 200%. We delve into how we architect for efficiency by leveraging new hardware, maximizing the use of available storage, implementing better orchestration techniques and using snapshots for data durability.

Mutable Data in Rockset

Rockset

Rockset supports frequent document-level updates and deletes, and it is also very efficient at performing partial updates when only a few attributes (even deeply nested ones) in your documents have changed. You can read more about mutability in real-time analytics and how Rockset solves it here.

Five Strategies to Accelerate Data Product Development

Cloudera

In this first article of a two-part series on data product strategies, I present some of the emerging themes in data product development and how they inform the prerequisites and foundational capabilities of an enterprise data platform that can serve as the backbone for successful data product strategies.