How Many Crowd Workers Do I Need? On Statistical Power When Crowdsourcing Relevance Judgments

Kevin Roitero, David La Barbera, Michael Soprano, [...], Tetsuya Sakai
2023
  • journal article

Journal
ACM Transactions on Information Systems
Abstract
To scale Information Retrieval collections, crowdsourcing has become a common way to collect relevance judgments. Crowdsourcing experiments usually employ 100-10,000 workers, but this number is often chosen heuristically. The downside is that the resulting dataset has no guarantee of meeting predefined statistical requirements, such as having enough statistical power to distinguish, in a statistically significant way, between the relevance of two documents. We propose a methodology, adapted from the literature on sound topic set size design and based on the t-test and ANOVA, that aims to guarantee that the resulting dataset meets a predefined set of statistical requirements. We validate our approach on several public datasets. Our results show that we can reliably estimate the recommended number of workers needed to achieve statistical power, and that this estimate depends on the topic, while the effect of the relevance scale is limited. Furthermore, we find that the estimate depends on worker features such as agreement. Finally, we describe a set of practical strategies for estimating the worker set size, and we also provide results on the estimation of document set sizes.
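As a rough illustration of the kind of power calculation the abstract refers to, the sketch below estimates how many workers would be needed for a paired comparison of two documents. It is not the authors' procedure: it uses the textbook normal approximation to the paired t-test sample size rather than the topic set size design method the article adapts, and the function name `approx_worker_set_size` and its parameters (`min_diff`, `sd_diff`, `alpha`, `power`) are assumptions introduced here for illustration only.

```python
import math
from statistics import NormalDist


def approx_worker_set_size(min_diff, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of workers for a two-sided paired comparison.

    Assumptions (not taken from the article): each worker judges both
    documents, the per-worker score differences are roughly normal with
    standard deviation `sd_diff`, and `min_diff` is the smallest mean
    difference in relevance that should be detectable.
    """
    if min_diff <= 0 or sd_diff <= 0:
        raise ValueError("min_diff and sd_diff must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    effect_size = min_diff / sd_diff             # standardized difference

    # Normal approximation: n = ((z_{1-alpha/2} + z_{power}) / effect_size)^2
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a 0.5-point difference when the
    # per-worker differences have a standard deviation of 1.0.
    print(approx_worker_set_size(min_diff=0.5, sd_diff=1.0))
```

Because the normal approximation ignores the heavier tails of the t distribution, it tends to slightly underestimate the required size for small samples; an exact calculation would iterate over the noncentral t distribution instead.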
DOI
10.1145/3597201
WOS
WOS:001091665500021
Archive
https://hdl.handle.net/11390/1257584
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85164481047
https://dl.acm.org/doi/10.1145/3597201
https://ricerca.unityfvg.it/handle/11390/1257584
Rights
open access
Subjects
  • relevance judgments, ...
