Fast Inference of Mixture-of-Experts Language Models with Offloading

Arxiv Papers
The paper explores strategies for running large sparse Mixture-of-Experts (MoE) language models on consumer hardware with limited accelerator memory, proposing a novel offloading strategy that allows for efficient execution on desktop hardware and free-tier Google Colab instances.
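Because a Mixture-of-Experts layer activates only a few experts per token, most expert weights can remain in host RAM while a small, recently used subset is cached in accelerator memory. The sketch below illustrates one way such an LRU expert cache could look in PyTorch; the `ExpertCache` class, its parameters, and the eviction details are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch (not the authors' code) of expert offloading with an LRU
# cache: all expert weights live in host RAM, and only a small working set of
# recently used experts is kept in accelerator memory. Names are hypothetical.
import copy
from collections import OrderedDict

import torch.nn as nn


class ExpertCache:
    def __init__(self, cpu_experts: dict, capacity: int, device: str = "cuda"):
        self.cpu_experts = cpu_experts      # expert_id -> nn.Module kept on CPU
        self.capacity = capacity            # max experts resident on the accelerator
        self.device = device
        self.gpu_experts = OrderedDict()    # expert_id -> nn.Module on device, in LRU order

    def get(self, expert_id) -> nn.Module:
        # Cache hit: mark the expert as most recently used and return it.
        if expert_id in self.gpu_experts:
            self.gpu_experts.move_to_end(expert_id)
            return self.gpu_experts[expert_id]
        # Cache miss: evict the least recently used expert if the cache is full.
        if len(self.gpu_experts) >= self.capacity:
            self.gpu_experts.popitem(last=False)
        # Copy the expert's weights from host RAM to the accelerator.
        expert = copy.deepcopy(self.cpu_experts[expert_id]).to(self.device)
        self.gpu_experts[expert_id] = expert
        return expert
```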

https://arxiv.org/abs/2312.17238

YouTube: @arxivpapers

TikTok: arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast...

Spotify: https://podcasters.spotify.com/pod/sh...
Published on 1402/10/09 (7 months ago).