Fast Inference of Mixture-of-Experts Language Models with Offloading
The paper explores strategies for running large sparse Mixture-of-Experts (MoE) language models on consumer hardware with limited accelerator memory. It proposes a novel offloading strategy that enables efficient execution on desktop hardware and free-tier Google Colab instances.
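To make the offloading idea concrete, here is a minimal, hypothetical PyTorch sketch of an LRU expert cache: master copies of all experts stay in host RAM, and only a few recently used experts are kept on the accelerator. This is not the authors' implementation; the ExpertLRUCache class, its names, and the toy top-2 router are illustrative assumptions.

import collections
import copy
import torch
import torch.nn as nn

# Minimal sketch of expert offloading with an LRU cache (illustrative only).
# Master copies of all experts stay in host RAM; at most `capacity` copies
# live on the accelerator at any time.

class ExpertLRUCache:
    """Keeps at most `capacity` expert modules on `device`; the rest stay on CPU."""

    def __init__(self, experts, capacity, device="cpu"):
        self.experts = list(experts)               # master copies on CPU
        self.capacity = capacity
        self.device = device
        self.resident = collections.OrderedDict()  # expert_id -> on-device copy

    def get(self, expert_id):
        if expert_id in self.resident:             # cache hit: no transfer needed
            self.resident.move_to_end(expert_id)   # mark as most recently used
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)      # evict least recently used copy
        # Cache miss: copy the expert's weights host -> device.
        on_device = copy.deepcopy(self.experts[expert_id]).to(self.device)
        self.resident[expert_id] = on_device
        return on_device

# Toy MoE step: a router picks the top-2 experts per token, so only those two
# experts need device memory right now; that sparsity is what makes offloading pay off.
hidden, n_experts = 64, 8
experts = [nn.Linear(hidden, hidden) for _ in range(n_experts)]
cache = ExpertLRUCache(experts, capacity=2, device="cpu")  # use "cuda" on a GPU box

x = torch.randn(1, hidden)
router_logits = torch.randn(n_experts)
top2 = torch.topk(router_logits, k=2).indices.tolist()
out = sum(cache.get(i)(x) for i in top2) / 2    # average the chosen experts
print(out.shape)                                # torch.Size([1, 64])

A full system would presumably overlap host-to-device transfers with compute and combine caching with quantization to shrink transfer sizes; the sketch keeps only the eviction logic.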
https://arxiv.org/abs/2312.17238
YouTube: @arxivpapers
TikTok: arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast...
Spotify: https://podcasters.spotify.com/pod/sh...