A distributional simplicity bias in the learning dynamics of transformers

Rende, R.
•
Gerace, F.
•
Laio, A.
•
Goldt, S.
2024
  • conference object

Abstract
The remarkable capability of over-parameterised neural networks to generalise effectively has been explained by invoking a "simplicity bias": neural networks prevent overfitting by initially learning simple classifiers before progressing to more complex, non-linear functions. While simplicity biases have been described theoretically and experimentally in feed-forward networks for supervised learning, the extent to which they also explain the success of transformers trained with self-supervised techniques remains unclear. In our study, we demonstrate that transformers trained on natural language data also display a simplicity bias. Specifically, they learn many-body interactions among input tokens sequentially: the prediction error saturates for low-degree interactions while the network continues to learn high-degree interactions. To conduct this analysis, we develop a procedure to generate clones of a given natural language data set, which rigorously capture the interactions between tokens up to a specified order. This approach opens up the possibility of studying how interactions of different orders in the data affect learning, in natural language processing and beyond.
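As a rough illustration of what a "clone" that keeps only low-order structure might look like, the toy sketch below resamples a token sequence from n-gram counts, so that statistics beyond the chosen order are discarded. This is an assumed stand-in for intuition only: it is not the authors' procedure, which this record does not describe, and the names `fit_ngram` and `sample_clone` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_ngram(tokens, order):
    """Count next-token frequencies given the previous (order - 1) tokens."""
    context_len = order - 1
    counts = defaultdict(Counter)
    for i in range(context_len, len(tokens)):
        context = tuple(tokens[i - context_len:i])
        counts[context][tokens[i]] += 1
    return counts


def sample_clone(tokens, order, length, seed=0):
    """Sample a toy clone that preserves n-gram statistics up to `order`.

    order=1 keeps only token frequencies; order=2 also keeps pairwise
    (adjacent-token) statistics, and so on. Illustrative assumption only.
    """
    rng = random.Random(seed)
    context_len = order - 1
    counts = fit_ngram(tokens, order)
    unigram = Counter(tokens)  # fallback for contexts with no observed continuation
    clone = list(tokens[:context_len])  # seed the clone with the corpus prefix
    while len(clone) < length:
        context = tuple(clone[len(clone) - context_len:])
        options = counts.get(context) or unigram
        words, weights = zip(*options.items())
        clone.append(rng.choices(words, weights=weights)[0])
    return clone[:length]


corpus = "the cat sat on the mat the cat ate".split()
clone_order1 = sample_clone(corpus, order=1, length=20)  # frequencies only
clone_order2 = sample_clone(corpus, order=2, length=20)  # plus adjacent pairs
```

Comparing how well a model trained on the original data predicts `clone_order1` versus `clone_order2` gives a flavour of the kind of order-by-order comparison the abstract describes, under the simplifying assumptions above.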
Archive
https://hdl.handle.net/20.500.11767/143213
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-105000531254
https://papers.nips.cc/paper_files/paper/2024/hash/ae6c81a39079ddeb88b034b6ef18c7fe-Abstract-Conference.html
https://openreview.net/forum?id=GgV6UczIWM
Rights
open access
license: not specified
license URI: n/a
Subjects
  • Sector PHYS-04/A - F...

  • Sector PHYS-06/A - F...
