LLMs: A Journey Through Time and Architecture

Sebastian Raschka
REFERENCES:
- Step-by-step guide converting GPT to Llama: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
- Build a Large Language Model (From Scratch): http://mng.bz/M96o
- LitGPT: https://github.com/Lightning-AI/litgpt
- The Llama 3 Herd of Models (31 July 2024), https://arxiv.org/abs/2407.21783
- Qwen2 Technical Report (15 July 2024), https://arxiv.org/abs/2407.10671
- Apple Intelligence Foundation Language Models (29 July 2024), https://arxiv.org/abs/2407.21075
- Gemma 2: Improving Open Language Models at a Practical Size (31 July 2024), https://arxiv.org/abs/2408.0011

DESCRIPTION:
In this video, you'll learn about the architectural differences between the original GPT model and the various Llama models. You'll also learn about the new pre-training recipes used for Qwen 2, Gemma 2, Apple's Foundation Models, and Llama 3, as well as some of the efficiency tweaks introduced by Mixtral, Llama 3, and Gemma 2.

---
To support this channel, please consider purchasing a copy of my books: https://sebastianraschka.com/books/
---
https://twitter.com/rasbt
https://linkedin.com/in/sebastianraschka/
https://magazine.sebastianraschka.com/
---
OUTLINE (https://www.seevid.ir/fa/w/itIab9ZTAqk):
- Introduction
- Pre-training in 2024
- GPT-architecture vs Llama
- GPT and other architectures
- Takeaways
Published 2 days ago, on 1403/07/03.
Viewed 1,709 times