Active Learning of Regular Expressions for Entity Extraction

BARTOLI, Alberto • DE LORENZO, ANDREA • MEDVET, Eric • TARLAO, FABIANO
2017
Journal article

Journal
IEEE TRANSACTIONS ON CYBERNETICS
Abstract
We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been proposed, all of which require the preliminary construction of a large dataset fully annotated by the user. In this work we propose an active learning approach aimed at minimizing the user annotation effort: the user annotates only one desired extraction and then merely answers extraction queries generated by the system. During the learning process, the system searches the input text to select the most appropriate extraction query to submit to the user in order to improve the current extractor. We construct candidate solutions with Genetic Programming and select queries with a form of querying-by-committee, i.e., based on a measure of disagreement within the best candidate solutions. All the components of our system are carefully tailored to the peculiarities of active learning with Genetic Programming and of entity extraction from unstructured text. We evaluate our proposal in depth, on a number of challenging datasets and based on a realistic estimate of the user effort involved in answering each single query. The results demonstrate high accuracy with significant savings in terms of computational effort, annotated characters and execution time over a state-of-the-art baseline.
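The query-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the disagreement measure, so the vote-split score below is an assumption, as are the function names (`extractions`, `disagreement`, `select_query`) and the choice to consider only spans extracted by at least one committee member. The Genetic Programming search that would produce the committee is not shown; the committee is simply a list of regular expressions.

```python
import re


def extractions(pattern, text):
    """Return the set of non-empty (start, end) spans that `pattern` extracts."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def disagreement(span, committee_spans):
    """Assumed measure: size of the minority vote on whether `span` is extracted.

    0 means the committee is unanimous; larger values mean a more even split.
    """
    votes = sum(span in spans for spans in committee_spans)
    return min(votes, len(committee_spans) - votes)


def select_query(committee, text, answered=frozenset()):
    """Pick the unanswered candidate span on which the committee disagrees most.

    Candidates are the spans extracted by at least one committee member
    (an assumption of this sketch). Returns None when nothing is left to ask.
    """
    committee_spans = [extractions(p, text) for p in committee]
    candidates = set().union(*committee_spans) - set(answered)
    if not candidates:
        return None
    # Sorting first makes ties resolve to the earliest span, deterministically.
    return max(sorted(candidates), key=lambda s: disagreement(s, committee_spans))


if __name__ == "__main__":
    text = "Call 555-1234 or 555-9876, ref 42."
    # Hypothetical committee of candidate extractors for phone numbers.
    committee = [r"\d{3}-\d{4}", r"\d+-\d+", r"[\d-]+", r"\d+"]
    start, end = select_query(committee, text)
    print("Should this be extracted?", repr(text[start:end]))
```

In this toy run the committee is evenly split on the substring "42", so that span becomes the next query; the user's yes/no answer would then be added to the examples that drive the next round of learning.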
DOI
10.1109/TCYB.2017.2680466
WOS
WOS:000424826800020
Archive
http://hdl.handle.net/11368/2898072
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85016396290
http://ieeexplore.ieee.org/document/7886274/
Rights
open access
license: publisher copyright
license: digital rights management not defined
FVG url
https://arts.units.it/request-item?handle=11368/2898072
Subjects
  • Inference mechanism
  • Semisupervised learni...
  • Evolutionary computat...
  • Genetic programming
  • Text processing
  • Man machine system
  • Automatic programming...

Scopus© citations
22
Retrieved
Jun 15, 2022
Web of Science© citations
22
Retrieved
Mar 26, 2024
Views
1
Retrieved
Apr 19, 2024