LLMs: A Journey Through Time and Architecture
REFERENCES:
- Step-by-step guide converting GPT to Llama: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
- Build a Large Language Model (From Scratch): http://mng.bz/M96o
- LitGPT: https://github.com/Lightning-AI/litgpt
- The Llama 3 Herd of Models (31 July 2024): https://arxiv.org/abs/2407.21783
- Qwen2 Technical Report (15 July 2024): https://arxiv.org/abs/2407.10671
- Apple Intelligence Foundation Language Models (29 July 2024): https://arxiv.org/abs/2407.21075
- Gemma 2: Improving Open Language Models at a Practical Size (31 July 2024): https://arxiv.org/abs/2408.0011
DESCRIPTION:
In this video, you'll learn about the architectural differences between the original GPT model and the various Llama models. You'll also learn about the new pre-training recipes used for Qwen 2, Gemma 2, Apple's Foundation Models, and Llama 3, as well as some of the efficiency tweaks introduced by Mixtral, Llama 3, and Gemma 2.
---
To support this channel, please consider purchasing a copy of my books: https://sebastianraschka.com/books/
---
https://twitter.com/rasbt
https://linkedin.com/in/sebastianraschka/
https://magazine.sebastianraschka.com/
---
OUTLINE:
- Introduction
- Pre-training in 2024
- GPT-architecture vs Llama
- GPT and other architectures
- Takeaways
Published on 1403/07/03.