
Inference of Regular Expressions for Text Extraction from Examples

BARTOLI, Alberto
•
DE LORENZO, ANDREA
•
MEDVET, Eric
•
TARLAO, FABIANO
2016
  • journal article

Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at http://regex.inginf.units.it.
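The multiobjective idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three objectives (precision, recall, and pattern length), the helper names, and the toy example text are hypothetical and are not taken from the authors' system. The evolutionary part (generating new candidates by mutation and crossover) is omitted; the sketch only shows how candidate expressions could be scored against examples and filtered by Pareto dominance.

```python
import re

def extractions(pattern, text):
    """Return the set of non-empty (start, end) spans matched by pattern in text."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}

def objectives(pattern, examples):
    """Score a candidate as (precision, recall, -length); higher is better on every axis.

    These objectives are illustrative assumptions, not the paper's definitions.
    """
    true_pos = found = wanted = 0
    for text, desired in examples:
        got = extractions(pattern, text)
        true_pos += len(got & desired)
        found += len(got)
        wanted += len(desired)
    precision = true_pos / found if found else 0.0
    recall = true_pos / wanted if wanted else 0.0
    return (precision, recall, -len(pattern))

def dominates(a, b):
    """True if score a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates, examples):
    """Keep the candidates whose score is not dominated by any other candidate."""
    scored = [(c, objectives(c, examples)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]

# Hypothetical training example: a text plus the spans that should be extracted.
EXAMPLES = [("call 555-1234 or 555-9876 by 2016", {(5, 13), (17, 25)})]
CANDIDATES = [r"\d+", r"\d{3}-\d{4}", r"\d+-\d+", r"\d{3}-\d{4}|\d{4}"]

if __name__ == "__main__":
    for candidate in pareto_front(CANDIDATES, EXAMPLES):
        print(candidate, objectives(candidate, EXAMPLES))
```

In this toy setup no single candidate wins on every objective, so the front keeps both a very short but inaccurate pattern and a longer, accurate one. A search procedure would then draw on such a front when producing the next generation of candidates.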
DOI
10.1109/TKDE.2016.2515587
WOS
WOS:000374523000010
Archive
http://hdl.handle.net/11368/2864925
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-84963858038
https://ieeexplore.ieee.org/document/7374717
Rights
open access
license: publisher's copyright
FVG url
https://arts.units.it/request-item?handle=11368/2864925
Subjects
  • Genetic Programming

  • Information extractio...

  • Programming by exampl...

  • Multiobjective optimi...

  • Heuristic search

Scopus© citations
54
Acquisition date
Jun 15, 2022
Web of Science© citations
59
Acquisition date
Mar 16, 2024