Studying Large Language Model Generalization with Influence Functions

Ai2
Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) because of the cost of computing an inverse-Hessian-vector product (IHVP). We introduce efficient approximations that let us scale influence functions to LLMs with up to 52 billion parameters. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
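To make the counterfactual concrete: the classical influence-function estimate scores a training example z by -∇L_query(θ)ᵀ H⁻¹ ∇L_z(θ), where H is the (damped) Hessian of the training loss at the fitted parameters. The talk's contribution is making the IHVP tractable at LLM scale; the sketch below instead shows the exact computation on a toy logistic-regression model, where the Hessian is small enough to invert directly. All names and the synthetic data here are illustrative, not from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)  # synthetic binary labels

# Fit logistic regression by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y) / n)

# Damped Hessian of the mean training loss:
# H = (1/n) Xᵀ diag(p(1-p)) X + λI  (damping keeps H invertible).
p = sigmoid(X @ w)
H = (X * (p * (1 - p))[:, None]).T @ X / n + 1e-3 * np.eye(d)

# Treat one example as the "query" whose loss we care about.
x_q, y_q = X[0], y[0]
g_query = (sigmoid(x_q @ w) - y_q) * x_q

# Inverse-Hessian-vector product: the step that is cheap here
# but is the scaling bottleneck for LLMs.
ihvp = np.linalg.solve(H, g_query)

# Influence of every training example on the query loss:
# I(z_i) = -∇L_i(θ)ᵀ H⁻¹ ∇L_query(θ).
per_example_grads = (p - y)[:, None] * X
influences = -per_example_grads @ ihvp
```

A positive score means up-weighting that example is predicted to *lower* the query loss; the query's own score is non-positive here, since H is positive definite. The paper replaces the exact solve with structured approximations (e.g. Kronecker-factored curvature), which this sketch does not attempt.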

Bio: Roger Grosse is an Associate Professor of Computer Science at the University of Toronto and a founding member of the Vector Institute, focusing on machine learning and AI alignment. He is also a Member of Technical Staff on the Alignment Team at Anthropic. Previously, he was a postdoc at the University of Toronto, after receiving a PhD at MIT under Bill Freeman and Josh Tenenbaum. He has received a Sloan Research Fellowship and holds a Canada Research Chair in Deep Learning and AI Alignment and a Canada CIFAR AI Chair.
Published 9 months ago, on 1402/08/10 (Iranian calendar).