
Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories

Trevisani, Matilde; Tuzzi, Arjuna
2018
  • journal article

Journal
KNOWLEDGE-BASED SYSTEMS
Abstract
The growing availability of large diachronic corpora of scientific literature makes it possible to read the temporal evolution of concepts, methods and applications, i.e., the history of the disciplines involved in the strand under investigation. After the most relevant keywords have been retrieved, bag-of-words approaches produce words × time-points contingency tables, i.e., the frequency of each word in the set of texts grouped by time-point. Through the analysis of word counts over the observed period, the main purpose of the study is to reconstruct the “life-cycle” of words, to cluster words that have similar life-cycles and, thus, to detect prototypical or exemplary temporal patterns. Unveiling such relevant and (according to expert opinion) meaningful inner dynamics enables us to trace a historical narrative of the discipline of interest. However, different readings of that history are possible depending on the type of data normalization, which is needed to account for the fluctuating size of texts across time and for the general problems of data sparsity and strong asymmetry. This study proposes a methodology consisting of (1) a stepwise information retrieval procedure for keyword selection and (2) a two-stage functional clustering approach for statistical learning. Moreover, a sample of possible normalizations of word frequencies is considered, showing that the concept of curve similarity induced in clustering by each type of transformation heavily affects the composition and size of the groups. The corpus of titles of scientific papers published in the American Statistical Association journals over the period 1888–2012 is examined for illustration.
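The words × time-points table and the effect of normalization described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy titles, and the two normalizations (relative frequency per time-point, and each word's trajectory scaled to sum to one) are hypothetical and are not claimed to be the ones used in the article, whose keyword retrieval and functional clustering steps are not reproduced.

```python
from collections import Counter


def contingency_table(docs, keywords):
    """Build a words x time-points count table.

    docs: list of (time_point, title) pairs (hypothetical input format).
    Returns the sorted time points, {word: [count per time point]},
    and the total number of tokens observed at each time point.
    """
    time_points = sorted({t for t, _ in docs})
    index = {t: i for i, t in enumerate(time_points)}
    table = {w: [0] * len(time_points) for w in keywords}
    sizes = [0] * len(time_points)
    for t, title in docs:
        tokens = title.lower().split()
        sizes[index[t]] += len(tokens)
        counts = Counter(tokens)
        for w in keywords:
            table[w][index[t]] += counts[w]
    return time_points, table, sizes


def relative_frequency(table, sizes):
    """Divide each count by the corpus size at that time point."""
    return {
        w: [c / n if n else 0.0 for c, n in zip(row, sizes)]
        for w, row in table.items()
    }


def unit_sum_profile(table):
    """Scale each word's trajectory so that it sums to one."""
    out = {}
    for w, row in table.items():
        total = sum(row)
        out[w] = [c / total if total else 0.0 for c in row]
    return out


# Toy corpus of titles, invented for this example.
docs = [
    (1900, "a note on sampling"),
    (1900, "sampling error in census data"),
    (1950, "bayesian inference and sampling"),
    (2000, "bayesian models for census data"),
]
time_points, table, sizes = contingency_table(docs, ["sampling", "bayesian"])
by_size = relative_frequency(table, sizes)
by_word = unit_sum_profile(table)
```

The two transformations answer different questions: `by_size` corrects for how much text exists at each time-point, while `by_word` compares only the shape of each word's trajectory regardless of its overall frequency. Curves that look similar under one may look different under the other, which is the sensitivity to normalization that the abstract points to.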
DOI
10.1016/j.knosys.2018.01.035
WOS
WOS:000428494600009
Archive
http://hdl.handle.net/11368/2921084
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85041915418
https://www.sciencedirect.com/science/article/pii/S0950705118300510?via%3Dihub
Rights
open access
license: publisher copyright
license: digital rights management not defined
FVG url
https://arts.units.it/request-item?handle=11368/2921084
Subjects
  • Chronological textual...

  • Curve clustering

  • Diachronic corpora

  • Functional data analy...

  • Keyword retrieval

  • Normalization

Scopus© citations
6
Date retrieved
Jun 7, 2022
Web of Science© citations
7
Date retrieved
Mar 28, 2024
Views
2
Date retrieved
Apr 19, 2024