QLoRA: Efficient Finetuning of Quantized Large Language Models (Tim Dettmers)

MLOps Learners
Recent open-source large language models (LLMs) like LLaMA and Falcon are both high-quality and provide strong performance for their memory footprint. However, finetuning these LLMs is still challenging on consumer and mobile devices, with a 33B LLaMA model requiring 384 GB of GPU memory for full finetuning. In this talk, I introduce QLoRA, a technique that reduces the memory requirement of LLM finetuning by roughly 17 times, making a 33B LLM finetunable on 24 GB consumer GPUs and 7B language models finetunable on mobile devices. The talk provides a self-contained introduction to quantization and discusses the critical factors that allow QLoRA to use 4-bit precision for LLM finetuning while still replicating full 16-bit finetuning performance. I also discuss the evaluation of LLMs and how we used insights from our LLM evaluation study to build one of the most powerful open-source chatbots, Guanaco.
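For concreteness, below is a minimal, hypothetical sketch of this recipe using the Hugging Face transformers, peft, and bitsandbytes stack: the base model is loaded in 4-bit NormalFloat (NF4) with double quantization and frozen, and only small 16-bit LoRA adapters are trained. The checkpoint name and LoRA hyperparameters are illustrative, not taken from the talk.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NormalFloat (NF4) quantization with double quantization;
# the forward/backward compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",  # illustrative checkpoint, not from the talk
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters are trained in 16-bit on top of the frozen 4-bit base.
lora_config = LoraConfig(
    r=64,                    # illustrative rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights require gradients
```

The resulting model can then be passed to a standard training loop or trainer; gradients only flow into the adapters, which is what keeps the memory footprint small.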

Speaker Bio (Tim Dettmers):
Tim is a PhD student at the University of Washington, advised by Luke Zettlemoyer, working on efficient deep learning to make training, finetuning, and inference of deep learning models more accessible, in particular to those with the least resources. Tim is the maintainer of bitsandbytes, a widely used machine learning library for 4-bit and 8-bit quantization with 200k pip installations per month. He has a background in applied math and industry automation.
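As an aside, here is a minimal sketch of what block-wise 4-bit quantization with bitsandbytes looks like in practice; the tensor shape is illustrative, and a CUDA GPU is assumed.

```python
import torch
import bitsandbytes.functional as F

# Illustrative weight tensor; real use would quantize pretrained weights.
w = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)

# Quantize to 4-bit NormalFloat; quant_state holds the per-block absmax
# statistics needed to dequantize later.
w4, quant_state = F.quantize_4bit(w, quant_type="nf4")
w_restored = F.dequantize_4bit(w4, quant_state)

print((w - w_restored).abs().mean())  # average quantization error
```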

Twitter: Tim_Dettmers

***
Hosted by Denys Linkov and the MLOps Discord community.