Understanding STaR and how it powers Claude and Gemini/Gemma 2 (and maybe OpenAI Q* or Strawberry)

Chris Hay
STaR is short for Self-Taught Reasoner. It is rumored to power OpenAI's Q* (now Strawberry), but definitely powers Claude 3.5 Sonnet and the Gemma / Gemini models. In this video, Chris breaks down how self-taught reasoning works and how it is used in the fine-tuning phase of a model to improve training. Chris also shows how you can use NVIDIA's Nemotron reward model to judge the outputs for STaR. If you want to understand how to use the same techniques that frontier AI models such as Anthropic Claude and Google Gemini / Gemma use to improve their fine-tuning, check out this video.
Published a month ago, on 1403/04/25.
Viewed 7,874 times.