GPU optimization workshop with OpenAI, NVIDIA, PyTorch, and Voltron Data

MLOps Learners
00:30 Workshop overview by @ChipHuyen
03:51 Crash course to GPU optimization (Mark Saroufim, Meta)
39:18 High-performance LLM serving on NVIDIA GPUs (Sharan Chetlur, NVIDIA)
1:19:18 Block-based GPU Programming with Triton (Philippe Tillet, OpenAI)
1:59:00 Scaling data processing from CPU to distributed GPUs (William Malpica, Voltron Data)

Join the discussion on Discord: discord.gg/k3feVx8cK9
Shared note (during the event): docs.google.com/document/d/1TR_5Ax0rPqTj8I2sA7MH-a…
GitHub repo with schedule: github.com/mlops-discord/gpu-optimization-workshop
For more events hosted by Chip in the future: lu.ma/chiphuyen

Speakers:
Philippe Tillet leads the Triton team at OpenAI. He previously worked at pretty much every major chip maker, including NVIDIA, AMD, Intel, and Nervana.
Sharan Chetlur is a principal engineer working on TensorRT-LLM at NVIDIA. He has worked on CUDA since 2012, optimizing the performance of deep learning models from single GPU to full data center scale. Previously, he was Director of Engineering on the Kernels team at Cerebras.
William Malpica is a co-founder of Voltron Data and the creator of BlazingSQL. He helped scale Voltron Data's GPU-native query engine to handle 100 TB queries!
Mark Saroufim is a PyTorch core developer and co-founder of CUDA MODE. He also ran last year's really fun NeurIPS LLM Efficiency Challenge. Previously, he was at Graphcore and Microsoft.
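To give a concrete sense of what the Triton session covers, here is a minimal, illustrative sketch of a block-based kernel (a masked vector add). It is not taken from the workshop; the names add_kernel and add and the BLOCK_SIZE of 1024 are arbitrary choices, and it assumes PyTorch, Triton, and a CUDA-capable GPU are available.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the last, partially filled block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # launch one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

a = torch.rand(4096, device="cuda")
b = torch.rand(4096, device="cuda")
assert torch.allclose(add(a, b), a + b)

The idea behind "block-based" programming: you write the per-block computation in Python, and Triton maps it onto GPU threads and memory for you.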
Published on 1403/03/03 (May 23, 2024)
14,680 views