Research Center for AI in Science & Society




Guest lecture "Understanding Transformers"

Prof. Dr. Giovanni Fantuzzi from Friedrich-Alexander University Erlangen–Nuremberg will give a lecture titled "Understanding Transformers: Hardmax Attention, Clustering, and Perfect Sequence Classification" as part of the MODUS Seminar on May 28, 2025, from 12:15 to 13:45 in room S102 (FAN-B).

Abstract: Transformers are an extremely successful machine learning model, famous for powering platforms such as ChatGPT. What distinguishes them from classical deep neural networks is the presence of "attention" layers between standard "feed-forward" layers. In this talk, I will discuss how simple geometrical rules can explain the role of the attention layers and, consequently, the outstanding practical performance of transformers. Specifically, by focussing on a simplified class of transformers with "hardmax" attention, I will first show that attention layers induce clustering of the transformer's input data. I will then use this clustering effect to construct transformers that can perfectly classify a given set of input sequences of arbitrary but finite length, modelling, for example, books to be classified by a library. Crucially, the complexity of this construction is independent of the sequence length. This is in stark contrast to classical deep neural networks and explains, at least in part, the superior performance of transformers on sequence classification tasks.
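To make the clustering mechanism concrete, the following is a minimal Python sketch of a hardmax self-attention update, in which each token moves toward the token(s) maximizing its query-key inner product. The identity query/key/value weights, the convex-combination step size alpha, and the random toy data are illustrative assumptions, not the construction from the talk; iterating the update collapses random points onto a few fixed points, mirroring the clustering effect described in the abstract.

import numpy as np

def hardmax_step(X, alpha=0.5):
    # One simplified hardmax self-attention update with identity
    # query/key/value weights (an illustrative assumption).
    scores = X @ X.T                      # pairwise query-key inner products
    C = np.empty_like(X)
    for i in range(len(X)):
        # Hardmax: attend only to the maximizing token(s); ties are averaged.
        winners = np.flatnonzero(np.isclose(scores[i], scores[i].max()))
        C[i] = X[winners].mean(axis=0)
    # A convex combination of each token with its hardmax context keeps
    # the points bounded while pulling them together.
    return (X + alpha * C) / (1 + alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))              # ten random 2-D tokens
for _ in range(50):
    X = hardmax_step(X)
print(np.round(X, 3))                     # rows collapse onto a few cluster points

In this toy dynamics, the token with the largest norm attends to itself and stays fixed, while the remaining tokens are drawn toward such "leader" points; this is a simplified stand-in for the geometrical analysis presented in the lecture.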
