LoRA and QLoRA Explained | Parameter-Efficient Fine-Tuning of Large Language Models | PEFT

NexGenAI
In this video, we will explain LoRA and QLoRA, two cutting-edge techniques for parameter-efficient fine-tuning of large language models (LLMs). LoRA and QLoRA allow you to fine-tune LLMs on smaller GPUs and with less memory, while still maintaining or improving their performance on downstream tasks.

LoRA (Low-Rank Adaptation) works by injecting trainable low-rank matrices into each layer of the LLM. These matrices are much smaller than the full weight matrices of the LLM, which leads to significant memory and computational savings.
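The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the video's code: the dimensions, rank, and scaling hyperparameter below are illustrative choices, and a real implementation would train `A` and `B` with gradient descent while keeping `W` frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 512, 512   # layer dimensions (illustrative)
r = 8             # LoRA rank, with r << min(d, k)
alpha = 16        # LoRA scaling hyperparameter

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable, initialized to zero

def lora_forward(x):
    # Frozen base path plus the low-rank update (B @ A), scaled by alpha/r.
    # Because B starts at zero, the layer initially behaves exactly like W.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k)
y = lora_forward(x)

# Parameter savings: the update trains r*(d+k) values instead of d*k.
full_params = d * k        # 262,144
lora_params = r * (d + k)  # 8,192 -- about 3% of the full matrix
```

Note that only `A` and `B` receive gradients, which is where the memory savings come from: optimizer states are kept only for the small factors.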

QLoRA (Quantized LoRA) combines LoRA with quantization, a technique that reduces the number of bits used to store the LLM's weights. QLoRA allows you to fine-tune LLMs on even smaller GPUs and with even less memory than LoRA.
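To make the quantization step concrete, here is a simple block-wise absmax quantizer to 4-bit integer levels. This is a hedged sketch of the general k-bit idea, not QLoRA's actual NormalFloat (NF4) data type or double quantization; block size and the [-7, 7] integer range are illustrative assumptions.

```python
import numpy as np

def quantize_4bit(w, block_size=64):
    # Block-wise absmax quantization: each block of weights is scaled by its
    # largest absolute value, then rounded to an integer in [-7, 7]
    # (15 of the 16 levels a signed 4-bit code can represent).
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    q = np.round(blocks / scales).astype(np.int8)  # conceptually stored in 4 bits
    return q, scales

def dequantize(q, scales):
    # Recover approximate weights: one float scale per block, 4 bits per weight.
    return q * scales

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scales = quantize_4bit(w)
w_hat = dequantize(q, scales).reshape(-1)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

In QLoRA the frozen base weights are stored in this compressed form and dequantized on the fly during the forward pass, while the LoRA factors stay in higher precision and are the only trained parameters.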

00:00 - 00:28 Intro
00:29 - 02:50 Issues with Full Fine-Tuning
02:51 - 05:05 Low-Rank (Rank-Deficient) Matrices
05:06 - 07:12 Neural Networks
07:13 - 10:59 LoRA (Low-Rank Adaptation of Large Language Models)
11:00 - 11:59 Rank Factorization Methods
12:00 - 14:43 Code: Singular Value Decomposition
14:44 - 15:21 Summary of LoRA
15:22 - 22:15 Intro to QLoRA
22:16 - 32:00 k-bit Quantization
32:01 - 36:54 NormalFloat 4-bit Conversion
36:55 - 43:14 Double Quantization and Paged Optimizers
Conclusion
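The rank-factorization idea behind LoRA (and the SVD code segment in the chapters above) can be demonstrated with NumPy's built-in SVD. This is an illustrative sketch, not the video's code: it builds a deliberately rank-deficient matrix and shows that a rank-4 factorization reconstructs it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 100x100 matrix that is exactly rank 4: the product of two thin factors.
M = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 100))

# Singular value decomposition: M = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Keep only the top r singular values/vectors -- the best rank-r approximation.
r = 4
M_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

# Since M truly has rank 4, the rank-4 approximation recovers it (up to
# floating-point error), even though M_r is stored as two 100x4-ish factors.
```

This is exactly the bet LoRA makes: the *update* to a pretrained weight matrix is well approximated by a low-rank product, so training the thin factors suffices.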



#LoRA #QLoRA #PEFT #finetuning #largelanguagemodels
Published 10 months ago, on 1402/07/13 (Iranian calendar).
1,774 views