QLoRA: Efficient Finetuning of Large Language Models on a Single GPU? LoRA & QLoRA paper review

Venelin Valkov
In this video, we'll look at QLoRA, an efficient finetuning approach that significantly reduces the GPU memory usage of large language models. With QLoRA, you can now finetune a 65B parameter model on just a single 48GB GPU, while maintaining full 16-bit finetuning task performance. We'll dive into the technical details of QLoRA, which involves backpropagating gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA).
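To make the LoRA part concrete, here is a minimal numpy sketch of the low-rank adapter update. The shapes, names, and hyperparameter values are illustrative assumptions, not the paper's code: the pretrained weight `W` is frozen (in QLoRA it would additionally be 4-bit quantized), and only the small factors `A` and `B` are trained.

```python
import numpy as np

# Illustrative sketch of a LoRA forward pass (hypothetical shapes/values).
d_out, d_in, r = 8, 16, 2        # rank r << min(d_out, d_in)
alpha = 4                        # LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T
    # Base path is frozen; only the low-rank adapter path receives gradients.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(3, d_in))
y = lora_forward(x)

# Because B starts at zero, the adapter contributes nothing at step 0,
# so training begins exactly from the pretrained model's behavior.
assert np.allclose(y, x @ W.T)
```

Note how cheap the adapter is: `A` and `B` together hold `r * (d_in + d_out)` parameters versus `d_in * d_out` for `W`, which is why gradients and optimizer state for the adapters fit alongside a frozen, quantized base model.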

Prompt Engineering Tutorial: https://www.mlexpert.io/prompt-engine...
Prompt Engineering GitHub Repository: https://github.com/curiousily/Get-Thi...

Discord: discord
Prepare for the Machine Learning interview: https://mlexpert.io
Subscribe: http://bit.ly/venelin-subscribe

QLoRA paper: https://arxiv.org/abs/2305.14314
QLoRA GitHub: https://github.com/artidoro/qlora
LoRA paper: https://arxiv.org/abs/2106.09685
HuggingFace blog post: https://huggingface.co/blog/4bit-tran...
Guanaco Playground: https://huggingface.co/spaces/uwnlp/g...

00:00 - Introduction
00:14 - LoRA
02:20 - QLoRA
10:58 - Guanaco Google Colab Notebook
11:56 - Conclusion

#chatgpt #gpt4 #llms #artificialintelligence #promptengineering #transformers #python #pytorch
Published on 1402/03/10 (Solar Hijri calendar).
9,668 views