Mathematics Colloquia and Seminars

Heat Flows through Transformer Architectures

Mathematics of Data & Decisions

Speaker: Zaid Harchaoui, University of Washington
Location: 1025 PDSB
Start time: Tue, Dec 3 2024, 3:10PM

Large language models, large vision models, and frontier AI models have changed the landscape of machine learning and AI research. These models share the same kind of deep neural network architecture, the transformer, which is built primarily by iterating a type of mapping called attention. The transformation of input data under consecutive attention maps can be interpreted as the evolution of a set of particles over time. We present a theoretical analysis of this evolution process, the dynamics driving it, and its convergence to a heat flow, within a framework of gradient flows in measure spaces. This is based on joint work with Medha Agarwal, Garrett Mulcahy, and Soumik Pal <https://arxiv.org/abs/2406.10823>.
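
As a rough illustration of the particle picture in the abstract, the sketch below treats each input token as a particle and applies a heavily simplified attention map several times in a row. It is not the construction analyzed in the talk or the linked paper. The identity query, key, and value matrices, the absence of residual connections and normalization, the inverse-temperature parameter beta, and the function names attention_step and evolve are all assumptions made for this example.

```python
import math


def attention_step(particles, beta=1.0):
    """Apply one simplified self-attention map to a list of particles.

    Assumption: query, key, and value matrices are the identity, so each
    particle moves to a softmax-weighted average of all particles.
    """
    updated = []
    for x in particles:
        # Scaled dot-product score of x against every particle.
        scores = [beta * sum(a * b for a, b in zip(x, y)) for y in particles]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # New position: weighted average of all particle positions.
        updated.append(tuple(
            sum(w * y[k] for w, y in zip(weights, particles)) / total
            for k in range(len(x))
        ))
    return updated


def evolve(particles, num_layers, beta=1.0):
    """Compose num_layers attention maps and return the full trajectory."""
    trajectory = [list(particles)]
    for _ in range(num_layers):
        trajectory.append(attention_step(trajectory[-1], beta))
    return trajectory


if __name__ == "__main__":
    start = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    for layer, state in enumerate(evolve(start, num_layers=3)):
        print(layer, state)
```

In this sketch the layer index plays the role of time: every update is a convex combination of the current positions, so the particles stay inside the convex hull of the starting configuration.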