Q: How to create an Instruction Dataset for Fine-tuning my LLM?

code_your_own_AI
Welcome Beginners! Today we answer multiple questions from viewers about when, how, and why to fine-tune an LLM, and what to do if you have only pure text files and no instruction dataset at all. Here is the solution to all problems ... at least concerning artificial intelligence ... including this Q: "I do not know how to code. Which AutoML do you recommend?"

Your Questions:
----------------------
When do I fine-tune my LLM?
Can I fine-tune my LLM when I only have pure text files?
How can I create an instruction dataset for fine-tuning my LLM? (see the sketch after this list)
Do you recommend synthetic "explanation fine-tuning" datasets, like Open ORCA? How do I create an ORCA-style dataset?
Can I fine-tune for a general LLM domain knowledge first and then fine-tune with an instruction based dataset?
Is multi-task fine-tuning recommended?
Can I ask Bard or GPT-4 for fine-tuning advice? Which one is better?
I am a NO-coder! Which AutoML should I choose for my next LLM fine-tuning?
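
For the pure-text-files question, one common approach is to let a strong LLM generate (INSTRUCTION, OUTPUT) pairs from your raw text. Below is a minimal Python sketch assuming an OpenAI-compatible chat API; the folder name "my_corpus", the model name, the prompt, and the chunk size are illustrative assumptions, not the exact recipe from the video:

import json
import pathlib
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunks(text, size=2000):
    # naive fixed-size character chunks; use smarter splitting if you like
    return [text[i:i + size] for i in range(0, len(text), size)]

pairs = []
for path in pathlib.Path("my_corpus").glob("*.txt"):  # hypothetical folder
    for piece in chunks(path.read_text(encoding="utf-8")):
        resp = client.chat.completions.create(
            model="gpt-4",  # any capable instruction-following model works
            messages=[{
                "role": "user",
                "content": 'Read the text below and return JSON with the keys '
                           '"question" and "answer": one question a reader '
                           'might ask, answered using only the text.\n\n' + piece,
            }],
        )
        try:
            qa = json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue  # skip chunks where the model did not return valid JSON
        pairs.append({"instruction": qa["question"],
                      "input": piece,
                      "output": qa["answer"]})

with open("instructions.jsonl", "w", encoding="utf-8") as f:
    for p in pairs:
        f.write(json.dumps(p, ensure_ascii=False) + "\n")

The same idea scales from a handful of files to a full corpus; the quality of the resulting dataset depends mostly on the generator model and the prompt.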

BTW (for your further deep-dive into instruction tuning of LLMs):
--------
Just discovered: there is a recent survey of instruction-based fine-tuning approaches:
"Instruction Tuning for Large Language Models: A Survey" (Aug 21, 2023)
by Zhejiang University, Shannon.AI, Nanyang Technological University and Amazon
https://arxiv.org/pdf/2308.10792.pdf
---------------------------------------------------

Instruction tuning involves training LLMs on a dataset of (INSTRUCTION, OUTPUT) pairs, bridging the gap between
A. the next-word prediction objective of LLMs and
B. the users' objective of having LLMs adhere to human instructions (e.g., in combination with RLHF).
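
To make the pair format concrete, here is a minimal example in the widely used Alpaca-style JSON layout (the field names follow the Alpaca convention and are an illustrative choice, not something the survey prescribes):

import json

# one (INSTRUCTION, OUTPUT) training example; "input" holds optional context
example = {
    "instruction": "Summarize the key idea of instruction tuning in one sentence.",
    "input": "",
    "output": "Instruction tuning trains an LLM on (instruction, output) pairs "
              "so that it learns to follow human instructions rather than "
              "merely predicting the next word.",
}
print(json.dumps(example, indent=2))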

Learn about the general methodology of instruction tuning, the creation (manual or synthetic) of instruction-tuning datasets, the training of instruction-tuned models, and applications across different modalities and domains.
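
Once a file like instructions.jsonl exists, supervised fine-tuning on it can be sketched with Hugging Face transformers and datasets; the base model ("gpt2" as a small stand-in), the prompt template, and the hyperparameters below are assumptions for illustration:

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # small stand-in; swap in the causal LM you actually tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

ds = load_dataset("json", data_files="instructions.jsonl")["train"]

def to_text(ex):
    # glue instruction, optional input, and output into one training string
    prompt = f"### Instruction:\n{ex['instruction']}\n"
    if ex.get("input"):
        prompt += f"### Input:\n{ex['input']}\n"
    return {"text": prompt + f"### Response:\n{ex['output']}"}

ds = ds.map(to_text)
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()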




#ai
#datasets
#bard
#gpt4
#automl