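Analisi di dati testuali cronologici in corpora diacronici: effetti della normalizzazione sul curve clustering [Analysis of chronological textual data in diachronic corpora: effects of normalization on curve clustering]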

Trevisani, Matilde; Tuzzi, Arjuna
2016
  • conference object

Journal
LEXICOMETRICA
Abstract
In bag-of-words approaches, textual data are organized in words×texts contingency tables. Diachronic corpora include texts that have a chronological order and produce words×time-points contingency tables, i.e. tables of the frequencies of each word in the text (or set of texts) that refers to each time-point. The temporal evolution of word frequencies is crucial both to highlight the distinctive features of time spans and to cluster words that portray a similar temporal pattern. However, a transformation of the data is necessary to take into account the fluctuating size of the texts available for each time-point, the strong asymmetry of word frequencies, and the general problem of data sparsity. This study examines how different data transformations affect curve clustering in terms of the number and composition of word groups. A functional data approach that combines a smoothing procedure (B-splines) with distance-based curve clustering has been adopted. Examples are taken from the corpus of titles of scientific papers published by the Journal of the American Statistical Association (and its predecessors) in the time span 1888-2012 and consist of an analysis of the life-cycle of 900 keywords through the timeline of 107 volumes.
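To make the role of the transformation concrete, here is a minimal sketch of how the choice of normalization can change which word curves end up close to each other. Everything in it is an assumption made for illustration: the toy counts, the two transformations (relative frequency per time-point, then rescaling each word's curve to unit sum), and the helper names are hypothetical and are not taken from the paper, whose abstract does not list the transformations it compares. The B-spline smoothing step is omitted, and a nearest-neighbour lookup on Euclidean distance stands in for the distance-based clustering.

```python
import math

# Hypothetical toy words x time-points table of raw counts (not data from the paper).
counts = {
    "bayesian":   [2, 4, 8, 16],
    "regression": [20, 40, 80, 160],
    "census":     [30, 20, 10, 5],
}

def column_relative(table):
    """Divide each count by the total size of its time-point (column)."""
    n_points = len(next(iter(table.values())))
    totals = [sum(row[t] for row in table.values()) for t in range(n_points)]
    return {w: [row[t] / totals[t] for t in range(n_points)]
            for w, row in table.items()}

def row_unit_sum(table):
    """Rescale each word's curve so its values sum to 1, keeping only its shape."""
    return {w: [v / sum(row) for v in row] for w, row in table.items()}

def euclidean(a, b):
    """Euclidean distance between two curves sampled at the same time-points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(table, word):
    """Return the other word whose curve is closest to that of `word`."""
    return min((w for w in table if w != word),
               key=lambda w: euclidean(table[w], table[word]))

by_column = column_relative(counts)   # corrects for the size of each time-point
by_shape = row_unit_sum(by_column)    # additionally removes overall word frequency

print(nearest(by_column, "bayesian"))
print(nearest(by_shape, "bayesian"))
```

In this toy table, "bayesian" and "regression" rise at the same rate but differ tenfold in frequency. With only the per-time-point correction, the rare rising word sits closer to the declining word "census" simply because both have small values at the end of the timeline; once each curve is also rescaled to unit sum, the two rising words become neighbours. This is the kind of sensitivity to the transformation that the study investigates.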
Archive
http://hdl.handle.net/11368/2888892
http://lexicometrica.univ-paris3.fr/jadt/
http://lexicometrica.univ-paris3.fr/jadt/jadt2016/01-ACTES/82630/82630.pdf
Rights
closed access
license: digital rights management not defined
FVG url
https://arts.units.it/request-item?handle=11368/2888892
Subjects
  • diachronic corpora

  • chronological textual...

  • data transformation

  • normalization

  • curve clustering

  • spline

  • functional data analy...

Views
1
Acquisition date
Apr 19, 2024