Mistral-7B

Data Science Gems
Mistral 7B is a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms the best open 13B model (Llama 2) across all evaluated benchmarks, and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation. Mistral 7B leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length at a reduced inference cost. Mistral 7B-Instruct is a model fine-tuned to follow instructions. It surpasses the Llama 2 13B-Chat model on both human and automated benchmarks. These models are released under the Apache 2.0 license.
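To make the two attention mechanisms mentioned above concrete, here is a minimal PyTorch sketch of grouped-query attention combined with a sliding-window causal mask. It is not Mistral's actual implementation (see the GitHub repo linked below for that); the toy tensor sizes are illustrative assumptions, while the comment notes Mistral 7B's reported configuration (32 query heads, 8 KV heads, head dim 128, window 4096) from the paper.

```python
# Sketch of grouped-query attention (GQA) + sliding-window attention (SWA).
# Not the official Mistral code; shapes in the toy example are made up.

import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where each position attends only to the previous `window` tokens."""
    idx = torch.arange(seq_len)
    diff = idx[:, None] - idx[None, :]        # query position minus key position
    return (diff >= 0) & (diff < window)      # allowed iff 0 <= i - j < window

def grouped_query_attention(q, k, v, window: int):
    """
    q:    (batch, n_q_heads,  seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), with n_q_heads a multiple of n_kv_heads.
    Each group of query heads shares one key/value head (GQA), and attention
    is restricted to a sliding window of recent tokens (SWA).
    """
    b, n_q, s, d = q.shape
    n_kv = k.shape[1]
    group = n_q // n_kv
    # Repeat K/V heads so every query head has a matching key/value head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (b, n_q, s, s)
    mask = sliding_window_mask(s, window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                   # (b, n_q, s, d)

# Toy usage (Mistral 7B itself uses 32 query heads, 8 KV heads,
# head_dim 128, and a 4096-token sliding window).
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 2, 16, 32)
v = torch.randn(1, 2, 16, 32)
out = grouped_query_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([1, 8, 16, 32])
```

Because keys and values are stored for fewer heads, GQA shrinks the KV cache and speeds up decoding, while the window mask keeps per-token attention cost bounded regardless of sequence length.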

In this video, I will talk about the following: What is the architecture of Mistral-7B? How does Mistral-7B perform?

For more details, please look at https://arxiv.org/abs/2310.06825 and https://mistral.ai/news/announcing-mi... and https://github.com/mistralai/mistral-src

Jiang, Albert Q., Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand et al. "Mistral 7B." arXiv preprint arXiv:2310.06825 (2023).