HBase Deprecation at Pinterest

May 13, 2024

Alberto Ordonez Pereira | Senior Staff Software Engineer; Lianghong Xu | Senior Manager, Engineering

This blog marks the first of a three-part series describing our journey at Pinterest of transitioning from managing multiple online storage services backed by HBase to a brand new serving architecture with a new datastore and a unified storage service.

In this introductory post, we will provide an overview of how HBase is used at Pinterest, why we decided to migrate away from it, and the high-level path we took to deprecate it. The second post will delve into how we assessed our specific needs, evaluated multiple candidates, and decided on the adoption of a new database technology. Finally, the last entry in this series will describe how we modernized our serving layer by consolidating multiple independent storage services into a unified multi-model, multi-backend storage framework.

Overview of HBase at Pinterest

Introduced in 2013, HBase was Pinterest’s first NoSQL datastore. Along with the rising popularity of NoSQL, HBase quickly became one of the most widely used storage backends at Pinterest. Since then, it has served as a foundational infrastructure building block in our tech stack, powering a number of in-house and open-source systems including our graph service (Zen), wide column store (UMS), monitoring storage (OpenTSDB), metrics reporting (Pinalytics), transactional DB (Omid/Sparrow), indexed datastore (Ixia), etc. These systems together enabled numerous use cases that allowed Pinterest to significantly scale its business as we continued to grow our user base and evolve the products over the past 10 years. Examples include smartfeed, URL crawler, user messages, pinner notifications, ads indexing, shopping catalogs, Statsboard (monitoring), experiment metrics, and many more. Figure 1 shows the massive ecosystem at Pinterest built around HBase.

Figure 1. The HBase ecosystem at Pinterest. HBase serves as the storage backend of many services and powers a broad range of applications across the entire company.

Pinterest hosted one of the largest production deployments of HBase in the world. At its peak usage, we had around 50 clusters, 9,000 AWS EC2 instances, and over 6 PB of data. A typical production deployment consists of a primary cluster and a standby cluster, replicated to each other using write-ahead logs (WALs) for extra availability. Online requests are routed to the primary cluster, while offline workflows and resource-intensive cluster operations (e.g., daily backups) are executed on the standby cluster. Upon failure of the primary cluster, a cluster-level failover is performed to switch the primary and standby clusters.

Figure 2. A typical HBase deployment in production. Both the primary and standby clusters are three-way replicated, and they are kept in sync via WAL replication.
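For illustration, here is a minimal sketch of how a WAL-based replication peer is configured with the stock HBase 2.x Java client. The peer ID, ZooKeeper cluster key, and table name are placeholders, and Pinterest's production clusters ran older HBase versions with their own tooling, so this only shows the shape of the mechanism, not our actual configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class ReplicationPeerSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Register the standby cluster as a replication peer of the primary.
      // The cluster key points at the standby's ZooKeeper quorum (placeholder values).
      ReplicationPeerConfig peer = ReplicationPeerConfig.newBuilder()
          .setClusterKey("standby-zk1,standby-zk2,standby-zk3:2181:/hbase")
          .build();
      admin.addReplicationPeer("standby", peer);

      // Ship WAL edits for a (hypothetical) table to the peer. Column families
      // must be created with REPLICATION_SCOPE => 1 for their edits to replicate.
      admin.enableTableReplication(TableName.valueOf("example_table"));
    }
  }
}
```

With a peer like this in place, edits written to the primary's WAL are asynchronously shipped to the standby, which is what makes the cluster-level failover described above possible without restoring from backups.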

Why deprecate HBase

HBase had proven to be durable, scalable, and generally performant since its introduction at Pinterest. Nevertheless, after a thorough evaluation and extensive feedback gathering from relevant stakeholders, at the end of 2021 we decided to deprecate this technology for the following reasons.

High maintenance cost

At the time of the evaluation, the maintenance cost of HBase had become prohibitively high, mainly because of years of tech debt and the reliability risks it carried. For historical reasons, our HBase version was five years behind the upstream, missing critical bug fixes and improvements. Yet an HBase version upgrade was a slow and painful process due to a legacy build/deploy/provisioning pipeline and compatibility issues (the last upgrade, from 0.94 to 1.2, took almost two years). Additionally, it was increasingly difficult to find HBase domain experts, and the barriers to entry were very high for new engineers.

Missing functionalities

HBase was designed to provide a relatively simple NoSQL interface. While it satisfied many of our use cases, its limited functionality made it challenging to meet evolving customer requirements for stronger consistency, distributed transactions, global secondary indexes, rich query capabilities, etc. As a concrete example, the lack of distributed transactions in HBase led to a number of bugs and incidents in Zen, our in-house graph service, because partially failed updates could leave a graph in an inconsistent state. Debugging such problems was usually difficult and time-consuming, causing frustration for service owners and their customers.
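To make this concrete, consider a bidirectional graph edge that must be written as two rows. The sketch below uses the plain HBase Java client with made-up table, column family, and row-key names (not Zen's actual schema); because HBase has no multi-row transactions, a failure between the two puts leaves the graph half-written.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NonAtomicEdgeWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table edges = conn.getTable(TableName.valueOf("graph_edges"))) {
      byte[] cf = Bytes.toBytes("e");

      // Forward edge: user A follows board B.
      Put forward = new Put(Bytes.toBytes("user:A"));
      forward.addColumn(cf, Bytes.toBytes("follows:board:B"), Bytes.toBytes(1L));
      edges.put(forward);

      // Reverse edge: board B is followed by user A. If the client crashes or
      // this put fails, only the forward edge exists, and HBase provides no way
      // to atomically roll back the first write across rows.
      Put reverse = new Put(Bytes.toBytes("board:B"));
      reverse.addColumn(cf, Bytes.toBytes("followed_by:user:A"), Bytes.toBytes(1L));
      edges.put(reverse);
    }
  }
}
```

A distributed transaction wrapping both puts is exactly what systems like Sparrow (described below) had to add on top of HBase.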

High system complexity

To provide these advanced features for customers, we built several new services on top of HBase over the past few years. For example, we built Ixia on top of HBase and Manas realtime to support global secondary indexing. We also built Sparrow on top of Apache Phoenix Omid to support distributed transactions on HBase. While we had no better alternatives to satisfy the business requirements at the time, these systems incurred significant development costs and increased the maintenance load.

High infra cost

Production HBase clusters typically used a primary-standby setup with six data replicas for fast disaster recovery, which came at an extremely high infra cost at our scale. Migrating HBase to other datastores with a lower cost per unique data replica therefore presented a huge opportunity for infra savings. For example, with careful replication and placement mechanisms, TiDB, Rockstore, or MySQL may use three replicas without sacrificing much on availability SLA, roughly halving the storage footprint for the same logical dataset.

Waning industry usage and community support

For the past few years, we have seen a seemingly steady decline in HBase usage and community activity in the industry, as many peer companies looked for better alternatives to replace HBase in their production environments. This in turn has led to a shrinking talent pool, a higher barrier to entry, and lower incentive for new engineers to become subject matter experts in HBase.

The Path to a Complete Deprecation

A complete deprecation of HBase at Pinterest had once been deemed an impossible mission given its deep roots in our existing tech stack. However, we were not the only team at Pinterest to realize the various disadvantages of HBase in dealing with different types of workloads. For example, we found that HBase performed worse than state-of-the-art solutions for OLAP workloads. It was not able to keep up with the ever-increasing time-series data volume, which led to significant challenges in scalability, performance, and maintenance load. It was also not as performant or infra-efficient as KVStore, an in-house key-value store built on top of RocksDB and Rocksplicator. As a result, in the past few years, several initiatives were started to replace HBase with more suitable technologies for these use cases. Specifically, online analytics workloads would be migrated to Druid/StarRocks, time-series data to Goku, an in-house time-series datastore, and key-value use cases to KVStore. Thanks to these recent efforts, we identified a viable path to a complete deprecation of HBase at Pinterest.

To accommodate the remaining HBase use cases, we needed a new technology that offers great scalability like a NoSQL database while supporting powerful query capabilities and ACID semantics like a traditional RDBMS. We ended up choosing TiDB, a distributed NewSQL database that satisfied most of our requirements.

Up Next

The next part of this blog series will cover how we conducted a comprehensive evaluation to finalize our decision on storage selection.

Acknowledgements

HBase deprecation, TiDB adoption, and SDS productionization would not have been possible without the diligent and innovative work from the Storage and Caching team engineers including Alberto Ordonez Pereira, Ankita Girish Wagh, Gabriel Raphael Garcia Montoya, Ke Chen, Liqi Yi, Mark Liu, Sangeetha Pradeep, and Vivian Huang. We would like to thank cross-team partners James Fraser, Aneesh Nelavelly, Pankaj Choudhary, Zhanyong Wan, and Wenjie Zhang for their close collaboration, and all our customer teams for their support on the migration. Special thanks to our leadership Bo Liu, Chunyan Wang, and David Chaiken for their guidance and sponsorship on this initiative. Last but not least, thanks to PingCAP for helping along the way to introduce TiDB into the Pinterest tech stack, from initial prototyping to productionization at scale.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.
