Semisupervised Wrapper Choice and Generation for Print-Oriented Documents

BARTOLI, Alberto
•
DAVANZO, Giorgio
•
MEDVET, Eric
•
SORIO, Enrico
2014
  • journal article

Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
Information extraction from printed documents is still a crucial problem in many interorganizational workflows. Solutions for other application domains, e.g., the web, do not fit this peculiar scenario well, as printed documents do not carry any explicit structural or syntactical description. Moreover, printed documents usually lack any explicit indication about their source. We present a system, which we call PATO, for extracting predefined items from printed documents in a dynamic multi-source scenario. PATO selects the source-specific wrapper required by each document, determines whether no suitable wrapper exists and generates one when necessary. PATO assumes that the need for new source-specific wrappers is part of normal system operation: new wrappers are generated on-line based on a few point-and-click operations performed by a human operator on a GUI. The role of operators is an integral part of the design and PATO may be configured to accommodate a broad range of automation levels. We show that PATO exhibits very good performance on a challenging dataset composed of more than 600 printed documents drawn from three different application domains: invoices, datasheets of electronic components, patents. We also perform an extensive analysis of the crucial trade-off between accuracy and automation level.
DOI
10.1109/TKDE.2012.254
WOS
WOS:000327656800018
Archive
http://hdl.handle.net/11368/2640660
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-84890373751
Rights
closed access
license: digital rights management not defined
FVG url
https://arts.units.it/request-item?handle=11368/2640660
Subjects
  • document management
  • administrative data p...
  • business process auto...
  • retrieval model
  • human-computer intera...
  • data entry

Scopus© citations
5
Retrieval date
Jun 14, 2022
Web of Science© citations
2
Retrieval date
Mar 28, 2024