NSDI '24 - Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered...

USENIX
NSDI '24 - Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining

Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin, School of Computer Science, Peking University

Vector query processing powers a wide range of AI applications. While GPUs are optimized for massive vector operations, today's practice relies on CPUs to process queries for large vector datasets, due to limited GPU memory. We present RUMMY, the first GPU-accelerated vector query processing system that achieves high performance and supports large vector datasets beyond GPU memory. The core of RUMMY is a novel reordered pipelining technique that exploits the characteristics of vector query processing to efficiently pipeline data transmission from host memory to GPU memory and query processing in GPU. Specifically, it leverages three ideas: (i) cluster-based retrofitting to eliminate redundant data transmission across queries in a batch, (ii) dynamic kernel padding with cluster balancing to maximize spatial and temporal GPU utilization for GPU computation, and (iii) query-aware reordering and grouping to optimally overlap transmission and computation. We also tailor GPU memory management for vector queries to reduce GPU memory fragmentation and cache misses.

We evaluate RUMMY with a variety of billion-scale benchmarking datasets. The experimental results show that RUMMY outperforms IVF-GPU with CUDA unified memory by up to 135×. Compared to the CPU-based solution (with 64 vCPUs), RUMMY (with one NVIDIA A100 GPU) achieves up to 23.1× better performance and is up to 37.7× more cost-effective.

View the full NSDI '24 program at https://www.usenix.org/conference/nsdi24/technical-sessions
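As a rough illustration of idea (i), the sketch below shows why grouping a batch by cluster can cut host-to-GPU traffic: if several queries probe the same IVF cluster, that cluster only needs to be sent once. This is not RUMMY's implementation. The function names, the `probe_lists` input, and the toy batch are all hypothetical, and the sketch assumes each query lists a given cluster at most once.

```python
from collections import defaultdict


def group_queries_by_cluster(probe_lists):
    """Invert a query -> clusters mapping into cluster -> queries.

    probe_lists[q] is the (hypothetical) list of cluster ids that query q
    needs to scan. Each key of the result is a cluster that has to be
    transferred once and is then shared by every query listed under it.
    """
    cluster_to_queries = defaultdict(list)
    for query_id, clusters in enumerate(probe_lists):
        for cluster_id in clusters:
            cluster_to_queries[cluster_id].append(query_id)
    return dict(cluster_to_queries)


def transfer_counts(probe_lists):
    """Return (per-query transfers, per-cluster transfers) for a batch."""
    per_query = sum(len(clusters) for clusters in probe_lists)
    per_cluster = len(group_queries_by_cluster(probe_lists))
    return per_query, per_cluster


# Toy batch of three queries, each probing three clusters (made-up ids).
probe_lists = [[0, 3, 5], [3, 5, 7], [0, 5, 9]]
plan = group_queries_by_cluster(probe_lists)
naive, shared = transfer_counts(probe_lists)
print(plan)           # cluster id -> ids of the queries that scan it
print(naive, shared)  # transfers without and with sharing
```

In this toy batch, sending clusters per query would take 9 transfers, while sending each distinct cluster once takes 5. The ordering of those transfers and their overlap with GPU computation, which ideas (ii) and (iii) address, is outside the scope of this sketch.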
Published 4 months ago, on 1403/03/15.
Viewed 517 times.