PinCompute: A Kubernetes Backed General Purpose Compute Platform for Pinterest

Pinterest Engineering

Harry Zhang, Jiajun Wang, Yi Li, Shunyao Li, Ming Zong, Haniel Martino, Cathy Lu, Quentin Miao, Hao Jiang, James Wen, David Westbrook | Cloud Runtime Team

Overview

Modern compute platforms are foundational to accelerating innovation and running applications more efficiently.

Reducing Apache Spark Application Dependencies Upload by 99%

LinkedIn Engineering

Co-authors: Shu Wang, Biao He, and Minchu Yang

At LinkedIn, Apache Spark is our primary compute engine for offline data analytics such as data warehousing, data science, machine learning, A/B testing, and metrics reporting. These applications rely heavily on dependencies (JAR files) for their computation needs.

Unified Streaming And Batch Pipelines At LinkedIn: Reducing Processing time by 94% with Apache Beam

LinkedIn Engineering

Co-authors: Yuhong Cheng, Shangjin Zhang, Xinyu Liu, and Yi Pan

Efficient data processing is crucial in reducing learning curves, simplifying maintenance efforts, and decreasing operational complexity. In this blog post, we will share our progress, challenges, and lessons learned from implementing Apache Beam.

The Airflow Smart Sensor Service

Airbnb Tech

Consolidating long-running, lightweight tasks for improved resource utilization

By: Yingbo Wang, Kevin Yang

Introduction

Airflow is a platform to programmatically author, schedule, and monitor data pipelines.