Logo del repository
  1. Home
 
Opzioni

Pre-trained Data Augmentation for Text Classification

Queiroz Abonizio H.
•
Barbon Junior S.
2020
  • conference object

Abstract
Data augmentation is a widely adopted method for improving model performance in image classification tasks. Although it still not as ubiquitous in Natural Language Processing (NLP) community, some methods have already been proposed to increase the amount of training data using simple text transformations or text generation through language models. However, recent text classification tasks need to deal with domains characterized by a small amount of text and informal writing, e.g., Online Social Networks content, reducing the capabilities of current methods. Facing these challenges by taking advantage of the pre-trained language models, low computational resource consumption, and model compression, we proposed the PRE-trained Data AugmenTOR (PREDATOR) method. Our data augmentation method is composed of two modules: the Generator, which synthesizes new samples grounded on a lightweight model, and the Filter, that selects only the high-quality ones. The experiments comparing Bidirectional Encoder Representations from Transformer (BERT), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) and Multinomial Naive Bayes (NB) in three datasets exposed the effective improvement of accuracy. It was obtained 28.5% of accuracy improvement with LSTM on the best scenario and an average improvement of 8% across all scenarios. PREDATOR was able to augment real-world social media datasets and other domains, overcoming the recent text augmentation techniques.
DOI
10.1007/978-3-030-61377-8_38
Archivio
https://hdl.handle.net/11368/3037308
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85094095986
https://link.springer.com/chapter/10.1007/978-3-030-61377-8_38
Diritti
open access
license:copyright editore
license:digital rights management non definito
license uri:iris.pri02
license uri:iris.pri00
FVG url
https://arts.units.it/request-item?handle=11368/3037308
Soggetti
  • Data augmentation

  • Online social network...

  • Text classification

google-scholar
Get Involved!
  • Source Code
  • Documentation
  • Slack Channel
Make it your own

DSpace-CRIS can be extensively configured to meet your needs. Decide which information need to be collected and available with fine-grained security. Start updating the theme to match your nstitution's web identity.

Need professional help?

The original creators of DSpace-CRIS at 4Science can take your project to the next level, get in touch!

Realizzato con Software DSpace-CRIS - Estensione mantenuta e ottimizzata da 4Science

  • Impostazioni dei cookie
  • Informativa sulla privacy
  • Accordo con l'utente finale
  • Invia il tuo Feedback