NSDI '24 - Horus: Granular In-Network Task Scheduler for Cloud Datacenters

USENIX
Parham Yassini, Simon Fraser University; Khaled Diab, Hewlett Packard Labs; Saeed Zangeneh and Mohamed Hefeeda, Simon Fraser University

Short-lived tasks are prevalent in modern interactive datacenter applications. However, designing schedulers to assign these tasks to workers distributed across the whole datacenter is challenging, because such schedulers need to make decisions at a microsecond scale, achieve high throughput, and minimize the tail response time. Current task schedulers in the literature are limited to individual racks. We present Horus, a new in-network task scheduler for short tasks that operates at the datacenter scale. Horus efficiently tracks and distributes the worker state among switches, which enables it to schedule tasks in parallel at line rate while optimizing the scheduling quality. We propose a new distributed task scheduling policy that minimizes the state and communication overheads, handles dynamic loads, and does not buffer tasks in switches. We compare Horus against the state-of-the-art in-network scheduler in a testbed with programmable switches as well as using simulations of datacenters with more than 27K hosts and thousands of switches handling diverse and dynamic workloads. Our results show that Horus efficiently scales to large datacenters, and it substantially outperforms the state-of-the-art across all performance metrics, including tail response time and throughput.

View the full NSDI '24 program at https://www.usenix.org/conference/nsdi24/technical-sessions
Published on 1403/03/15 (Solar Hijri calendar; 4 June 2024).