OSDI '23 - BWoS: Formally Verified Block-based Work Stealing for Parallel Processing

USENIX
USENIX
317 بار بازدید - 10 ماه پیش - OSDI '23 - BWoS: Formally
OSDI '23 - BWoS: Formally Verified Block-based Work Stealing for Parallel Processing

Jiawei Wang, Huawei Dresden Research Center, Huawei Central Software Institute, Technische Universität Dresden, Bohdan Trach, Huawei Dresden Research Center, Huawei Central Software Institute, Ming Fu, Huawei Dresden Research Center, Huawei Central Software Institute, Diogo Behrens, Huawei Dresden Research Center, Huawei Central Software Institute, Jonathan Schwender, Huawei Dresden Research Center, Huawei Central Software Institute, Yutao Liu, Huawei Dresden Research Center, Huawei Central Software Institute, Jitang Lei, Huawei Dresden Research Center, Huawei Central Software Institute, Viktor Vafeiadis, MPI-SWS, Hermann Härtig, Technische Universität Dresden, Haibo Chen, Huawei Central Software Institute, Shanghai Jiao Tong University

Work stealing is a widely-used scheduling technique for parallel processing on multicore. Each core owns a queue of tasks and avoids idling by stealing tasks from other queues. Prior work mostly focuses on balancing workload among cores, disregarding whether stealing may adversely impact the owner's performance or hinder synchronization optimizations. Real-world industrial runtimes for parallel processing heavily rely on work-stealing queues for scalability, and such queues can become bottlenecks to their performance.

We present Block-based Work Stealing (BWoS), a novel and pragmatic design that splits per-core queues into multiple blocks. Thieves and owners rarely operate on the same blocks, greatly removing interferences and enabling aggressive optimizations on the owner's synchronization with thieves. Furthermore, BWoS enables a novel probabilistic stealing policy that guarantees thieves steal from longer queues with higher probability. In our evaluation, using BWoS improves performance by up to 1.25x in the Renaissance macrobenchmark when applied to Java GC, provides average 1.26x speedup in JSON processing when applied to Go runtime, and improves maximum throughput of Hyper HTTP server by 1.12x when applied to Rust Tokio runtime. In microbenchmarks, it provides 8-11x better performance than state-of-the-art designs. We have formally verified and optimized BWoS on weak memory models with a model-checking-based framework.

View the full OSDI '23 program at https://www.usenix.org/conference/osd...
10 ماه پیش در تاریخ 1402/07/13 منتشر شده است.
317 بـار بازدید شده
... بیشتر