Sjclust: Towards a framework for integrating similarity join algorithms and clustering

Ribeiro, Leonardo Andrade • CUZZOCREA, Alfredo Massimiliano • Bezerra, Karen Aline Alves • Do Nascimento, Ben Hur Bahia
2016
  • conference object

Abstract
A critical task in data cleaning and integration is the identification of duplicate records representing the same real-world entity. A popular approach to duplicate identification employs a similarity join to find pairs of similar records, followed by a clustering algorithm to group together records that refer to the same entity. However, the clustering algorithm is used strictly as a post-processing step, which slows down overall performance and produces results only at the end of the whole process. In this paper, we propose SjClust, a framework that integrates similarity join and clustering into a single operation. Our approach smoothly accommodates a variety of cluster representation and merging strategies within set similarity join algorithms, while fully leveraging state-of-the-art optimization techniques.
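As an illustration only, the conventional two-step pipeline that the abstract contrasts with SjClust can be sketched as follows. This is not the SjClust framework itself, whose details are not given in this record. The sketch assumes whitespace tokenization, Jaccard similarity, a naive quadratic self-join, and connected-components clustering via union-find; the function names `jaccard`, `similarity_join`, and `cluster_pairs`, the sample records, and the threshold are all hypothetical choices made for the example.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(records: list[str], threshold: float) -> list[tuple[int, int]]:
    """Step 1: naive self-join returning index pairs with similarity >= threshold.

    Real set similarity joins avoid comparing all pairs; this quadratic loop
    is only meant to show the input and output of the step.
    """
    token_sets = [frozenset(r.lower().split()) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    ]


def cluster_pairs(n: int, pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Step 2: post-processing that groups records into connected components."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    clusters: dict[int, list[int]] = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


# Hypothetical sample records: two entities, each with one duplicate.
records = [
    "john smith 12 main street springfield",
    "john smith 12 main st springfield",
    "maria garcia 48 oak avenue portland",
    "garcia maria 48 oak avenue portland",
]
pairs = similarity_join(records, threshold=0.6)
clusters = cluster_pairs(len(records), pairs)
print(pairs)
print(clusters)
```

In this sketch, clustering can start only after the join has emitted every pair, which is the post-processing limitation the abstract describes.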
WOS
WOS:000393155500005
Archive
http://hdl.handle.net/11368/2898316
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-84979530450
http://www.scitepress.org/DigitalLibrary/HomePage.aspx
Rights
closed access
license: digital rights management not defined
FVG url
https://arts.units.it/request-item?handle=11368/2898316
Subjects
  • Clustering
  • Data cleaning
  • Data integration
  • Duplicate identificat...
  • Set similarity join
  • Information Systems a...
  • Computer Science Appl...

Views
1
Acquisition date
Apr 19, 2024