Heat Flows through Transformer Architectures
Mathematics of Data & Decisions
Speaker: Zaid Harchaoui, University of Washington
Location: 1025 PDSB
Start time: Tue, Dec 3 2024, 3:10PM
Large language models, large vision models, and frontier AI models have changed the landscape of machine learning and AI research. These models share the same kind of deep neural network architecture, built primarily from the iterated composition of a type of mapping called attention. The transformation of input data under consecutive attention maps can be interpreted as the evolution of a set of particles over time. We present a theoretical analysis of this evolution process, the dynamics driving it, and its convergence to a heat flow, within a framework of gradient flows in measure spaces. This talk is based on joint work with Medha Agarwal, Garrett Mulcahy, and Soumik Pal <https://arxiv.org/abs/2406.10823>.