How to run Mixtral LLM on your Laptop - January 26, 2024 - Exciting AI Updates

Lev Selector
Presented by Denis Mazur & Artyom Eliseev
Slides - https://github.com/lselector/seminar/...

Run Mixtral on an Nvidia 3060 with 12 GB!
- https://arxiv.org/abs/2312.17238 - paper
- https://github.com/dvmazur/mixtral-of...
- Twitter: 1741103866047869222

Very elegant work.
The original Mixtral requires more than 90 GB of memory.
Almost 97% of this size is taken up by the feed-forward expert networks in the transformer layers. The authors used several techniques to decrease the memory requirements while preserving the model's accuracy.
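The ~97% figure can be checked with back-of-the-envelope arithmetic from the publicly documented Mixtral-8x7B architecture dimensions (a sketch of the estimate, not a calculation from the talk):

```python
# Rough parameter count for Mixtral-8x7B, using its published config values.
hidden_size = 4096         # model (embedding) dimension
intermediate_size = 14336  # inner dimension of each FFN expert
num_layers = 32            # transformer layers
num_experts = 8            # experts per MoE layer

# Each expert has three weight matrices (gate/up/down projections).
params_per_expert = 3 * hidden_size * intermediate_size
expert_params = params_per_expert * num_experts * num_layers
total_params = 46.7e9      # reported total parameter count

print(f"expert params: {expert_params / 1e9:.1f}B")            # ~45.1B
print(f"share of total: {expert_params / total_params:.1%}")   # ~96.6%
print(f"fp16 size: {total_params * 2 / 1e9:.0f} GB")           # ~93 GB
```

So the experts account for roughly 45B of the ~46.7B parameters, and at 2 bytes per parameter (fp16) the full model exceeds 90 GB, matching the numbers above.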

The authors tested multiple quantization methods and selected a flexible scheme in which different parts of the network are quantized differently.
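To illustrate the idea of mixed-precision quantization (this is a toy uniform quantizer, not the authors' actual method, and the module names and bit widths are hypothetical):

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Symmetric uniform quantization round-trip: map weights to a
    `bits`-bit integer grid, then back to floats for use in matmuls."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

# Hypothetical per-module scheme: keep attention at higher precision,
# compress the (much larger) expert FFN weights more aggressively.
scheme = {"attention": 4, "experts": 3}

rng = np.random.default_rng(0)
weights = {"attention": rng.standard_normal((64, 64)),
           "experts": rng.standard_normal((64, 64))}

for name, w in weights.items():
    w_hat = quantize_dequantize(w, scheme[name])
    err = np.abs(w - w_hat).mean()
    print(f"{name}: {scheme[name]}-bit, mean abs error {err:.3f}")
```

The payoff of such a split: since the experts dominate the parameter count, pushing only them to lower precision captures most of the memory savings while the more sensitive parts stay closer to full precision.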

To further reduce GPU memory requirements, the authors implemented dynamic loading/offloading of the expert networks in the transformer layers. They use "speculative" loading: predicting which feed-forward experts the next layer will need and loading only those.
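The offloading idea can be sketched as a small LRU cache of experts with a speculative prefetch hook. Everything here is hypothetical illustration (the class, the `load_fn` parameter, the capacity); a real system would copy weight tensors between CPU RAM and GPU memory instead of strings:

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache for MoE expert weights: keeps at most `capacity`
    experts "on GPU"; the rest stay "in CPU RAM" until requested."""
    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # fetches an expert from host memory
        self.cache = OrderedDict()  # expert_id -> weights, in LRU order
        self.hits = self.misses = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[expert_id] = self.load_fn(expert_id)
        return self.cache[expert_id]

    def prefetch(self, predicted_ids):
        """Speculatively load experts predicted for an upcoming layer."""
        for eid in predicted_ids:
            self.get(eid)

# Usage: 8 experts, room for only 4 on the "GPU".
cache = ExpertCache(4, load_fn=lambda eid: f"weights[{eid}]")
for eid in [0, 1, 2, 0, 3, 4, 0, 1]:
    cache.get(eid)
print(cache.hits, cache.misses)  # → 2 6
```

Because consecutive tokens tend to reuse some of the same experts, caching plus prediction hides much of the CPU-to-GPU transfer cost.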

As a result, they demonstrated that you can run Mixtral on a modest laptop with an Nvidia 3060 with only 12 GB of VRAM, at decent (practical) speed.

Denis Mazur
- https://github.com/dvmazur
- https://huggingface.co/dvmazur

Artyom Eliseev
- https://github.com/lavawolfiee
- https://huggingface.co/lavawolfiee

My websites:
- Enterprise AI Solutions - https://EAIS.ai
- Linkedin - LinkedIn: levselector
- GitHub - https://github.com/lselector