Toward Text Data Augmentation for Sentiment Analysis

Abonizio H. Q. • Paraiso E. C. • Barbon S.
2022
  • journal article

Journal
IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE
Abstract
A significant part of natural language processing (NLP) techniques for sentiment analysis is based on supervised methods, which are affected by the quality of data. Therefore, sentiment analysis needs to be prepared for data quality issues, such as imbalance and lack of labeled data. Data augmentation methods, widely adopted in image classification tasks, include data-space solutions to tackle the problem of limited data and enhance the size and quality of training datasets to provide better models. In this work, we study the advantages and drawbacks of text augmentation methods (such as easy data augmentation, back-translation, BART, and pretrained data augmentor) with recent classification algorithms (long short-term memory, convolutional neural network, bidirectional encoder representations from transformers, support vector machine, gated recurrent units, random forests, and enhanced language representation with informative entities) that have attracted sentiment-analysis researchers and industry applications. We explored seven sentiment-analysis datasets to provide scenarios of imbalanced datasets and limited data, to discuss the influence of a given classifier in overcoming these problems, and to provide insights into promising combinations of transformation, paraphrasing, and generation methods of sentence augmentation. The results revealed improvements from the augmented dataset, mainly for reduced datasets. Furthermore, when balanced by augmenting the minority class, the datasets were found to have improved quality, leading to more robust classifiers. The contributions of this article include the taxonomy of NLP augmentation methods and their efficiency over several classifiers from recent research trends in sentiment analysis and related fields.
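As an illustration of balancing a dataset by augmenting the minority class, the following is a minimal sketch and not the authors' implementation. It assumes a transformation-style augmenter limited to two of the four easy data augmentation operations (random swap and random deletion), chosen here only because they need no external lexical resource; synonym replacement and random insertion are omitted. All function names, parameters, and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def random_swap(words, rng, n_swaps=1):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, rng):
    """Apply one randomly chosen operation to a whitespace-tokenized sentence."""
    words = sentence.split()
    op = rng.choice([random_swap, random_deletion])
    return " ".join(op(words, rng))


def balance_by_augmentation(texts, labels, seed=0):
    """Add augmented copies of under-represented classes until every class
    has as many samples as the largest one. Original samples are kept."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(augment(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    # Toy, made-up data: three positive samples and one negative sample.
    texts = ["great movie", "loved it a lot", "really fun film", "terrible plot"]
    labels = ["pos", "pos", "pos", "neg"]
    new_texts, new_labels = balance_by_augmentation(texts, labels)
    print(Counter(new_labels))
    print(new_texts[len(texts):])
```

In a setup like the one the abstract describes, augmentation of this kind would be applied to the training split only, so that the evaluation data stays untouched.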
DOI
10.1109/TAI.2021.3114390
Archive
https://hdl.handle.net/11368/3055528
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85141424935
https://ieeexplore.ieee.org/document/9543519
Rights
open access
license: publisher copyright
license: digital rights management not defined
license uri:iris.pri02
license uri:iris.pri00
FVG url
https://arts.units.it/request-item?handle=11368/3055528
Subjects
  • Machine learning
  • natural language proc...
  • sentiment analysis
  • text analysis
  • text mining
