GGUF quantization of LLMs with llama.cpp

AI Bites
2,984 views · published 6 months ago (1403/01/03)
Would you like to run LLMs on your laptop, or on tiny devices like mobile phones and watches? If so, you will need to quantize them. llama.cpp is an open-source library written in C and C++ that lets us quantize a given model and run LLM inference without a GPU. In this video, I demonstrate how to quantize a fine-tuned LLM to the GGUF format on a MacBook and then run it on the same MacBook for inference (see the code sketch at the end of this description). I quantize the Gemma 2B-parameter model that we fine-tuned in my previous tutorial, but you can follow the same steps to quantize any other fine-tuned LLM of your choice.

MY KEY LINKS
YouTube: youtube.com/@AIBites
Twitter: twitter.com/ai_bites
Patreon: www.patreon.com/ai_bites
GitHub: github.com/ai-bites

WHO AM I?
I am a Machine Learning researcher/practitioner who has seen the grind of academia and start-ups. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was getting started. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
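HOW THE QUANTIZATION STEP LOOKS IN CODE
This is not code from the video (the video uses llama.cpp's bundled converter script and CLI tools); it is a minimal C sketch of the same quantization step through llama.cpp's C API, assuming you have built llama.cpp and have llama.h on your include path. The GGUF file names are placeholders for whatever your fine-tuned export is called.

    #include <stdio.h>
    #include "llama.h"

    int main(void) {
        /* Start from the library's default quantization settings. */
        llama_model_quantize_params params = llama_model_quantize_default_params();
        params.ftype   = LLAMA_FTYPE_MOSTLY_Q4_K_M; /* 4-bit K-quant: a common size/quality trade-off */
        params.nthread = 4;                         /* number of CPU threads to quantize with */

        /* Input: an F16 GGUF exported from the fine-tuned checkpoint.
           Output: the 4-bit GGUF we run locally. Both names are placeholders. */
        uint32_t rc = llama_model_quantize("gemma-2b-f16.gguf",
                                           "gemma-2b-Q4_K_M.gguf",
                                           &params);
        if (rc != 0) {
            fprintf(stderr, "quantization failed (code %u)\n", rc);
            return 1;
        }
        printf("wrote gemma-2b-Q4_K_M.gguf\n");
        return 0;
    }

Compile it against your llama.cpp build (exact include and library paths depend on how you built it). llama.cpp's own quantize tool does the same thing from the command line, and the quantized GGUF can then be loaded for CPU inference on the MacBook as shown in the video.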