Research Center for AI in Science & Society

News

Gastvortrag "Understanding Transformers"

As part of the "MODUS Seminar", Prof. Dr. Giovanni Fantuzzi of Friedrich-Alexander-Universität Erlangen-Nürnberg will give a talk on "Understanding Transformers: Hardmax Attention, Clustering, and Perfect Sequence Classification" on 28 May 2025, 12:15–13:45, in S102 (FAN-B).

Abstract: Transformers are an extremely successful machine learning model, famously known for powering platforms such as ChatGPT. What distinguishes them from classical deep neural networks is the presence of "attention" layers between standard "feed-forward" layers. In this talk, I will discuss how simple geometrical rules can explain the role of the attention layers and, consequently, the outstanding practical performance of transformers. Specifically, by focussing on a simplified class of transformers with "hardmax" attention, I will first show that attention layers induce clustering of the transformer's input data. I will then use this clustering effect to construct transformers that can perfectly classify a given set of input sequences with arbitrary but finite length, modelling, for example, books to be classified by a library. Crucially, the complexity of this construction is independent of the sequence length. This is in stark contrast to classical deep neural networks, explaining (at least in part) the superior performance of transformers for sequence classification tasks.
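The "hardmax" attention mentioned in the abstract can be illustrated with a minimal sketch: instead of a softmax-weighted mixture, each token attends only to the token(s) achieving the maximal attention score. The function name, the score matrix A, and the toy data below are illustrative assumptions, not the talk's exact formulation:

```python
import numpy as np

def hardmax_attention(X, A):
    """One "hardmax" self-attention layer (illustrative sketch).

    Token i attends only to the token(s) j maximising the score
    <A x_i, x_j>; ties are averaged uniformly.
    X: (n, d) array of token embeddings, A: (d, d) score matrix.
    """
    scores = X @ A.T @ X.T  # scores[i, j] = <A x_i, x_j>
    out = np.empty_like(X)
    for i in range(len(X)):
        # indices attaining the row maximum (the "hardmax" set)
        winners = np.flatnonzero(scores[i] == scores[i].max())
        out[i] = X[winners].mean(axis=0)
    return out

# Toy demo: with A = I, every token's best match is the dominant
# token (2, 2), so a single layer collapses all tokens onto it --
# a simple instance of the clustering effect the abstract describes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(hardmax_attention(X, np.eye(2)))  # every row becomes [2. 2.]
```

In this toy setting the clustering is immediate; the talk's construction analyses how such collapse behaviour, applied layer by layer, can separate input sequences for classification.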
