SjClust: A Framework for Incorporating Clustering into Set Similarity Join Algorithms

Ribeiro, Leonardo Andrade
•
Cuzzocrea, Alfredo
•
Bezerra, Karen Aline Alves
•
do Nascimento, Ben Hur Bahia
2018
  • journal article

Journal
TRANSACTIONS ON LARGE-SCALE DATA- AND KNOWLEDGE-CENTERED SYSTEMS
Abstract
A critical task in data cleaning and integration is identifying duplicate records that represent the same real-world entity. Similarity joins are widely used to detect pairs of similar records, in combination with a subsequent clustering algorithm that groups records referring to the same entity. Unfortunately, the clustering algorithm is applied strictly as a post-processing step, which slows down overall performance, and final results are produced only at the end of the whole process. Motivated by this observation, in this article we propose and experimentally evaluate SjClust, a framework that integrates similarity join and clustering into a single operation. The basic idea of our proposal is to introduce a variety of cluster representations that are smoothly merged during the set similarity join carried out by the join algorithm. An optimization is further applied on top of this framework. Results from an extensive experimental evaluation show that SjClust outperforms previous approaches by an order of magnitude in most settings.
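The general idea of clustering during the join, instead of afterwards, can be illustrated with a minimal sketch. This is not the SjClust algorithm: the abstract does not specify its cluster representations or its optimization, so the sketch assumes Jaccard similarity, a plain inverted index for candidate generation, and a union-find structure as a stand-in cluster representation. All names (`jaccard`, `UnionFind`, `join_and_cluster`) are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class UnionFind:
    """Stand-in cluster representation; an assumption, not from the paper."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            # Use the smaller id as the cluster representative.
            self.parent[max(rx, ry)] = min(rx, ry)


def join_and_cluster(records, threshold):
    """Cluster records while joining them, with no separate clustering pass."""
    uf = UnionFind(len(records))
    index = defaultdict(list)  # token -> ids of records processed so far
    for rid, tokens in enumerate(records):
        # Candidates are earlier records sharing at least one token.
        candidates = set()
        for tok in tokens:
            candidates.update(index[tok])
        for cid in sorted(candidates):
            # Skip the similarity check if both are already in one cluster.
            if uf.find(cid) == uf.find(rid):
                continue
            if jaccard(tokens, records[cid]) >= threshold:
                uf.union(cid, rid)  # merge clusters as soon as a pair matches
        for tok in tokens:
            index[tok].append(rid)
    clusters = defaultdict(list)
    for rid in range(len(records)):
        clusters[uf.find(rid)].append(rid)
    return sorted(clusters.values())


records = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"x", "y"}, {"x", "y", "z"}, {"q"}]
print(join_and_cluster(records, 0.6))  # [[0, 1], [2, 3], [4]]
```

In this sketch, clusters exist at every point during the join, and pairs already known to share a cluster are never compared again. Treating clusters as the transitive closure of similar pairs is itself an assumption made for brevity.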
DOI
10.1007/978-3-662-58384-5_4
Archive
http://hdl.handle.net/11368/2939025
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85057198604
https://www.springer.com/series/558
Rights
closed access
license: publisher copyright
FVG url
https://arts.units.it/request-item?handle=11368/2939025
Subjects
  • Theoretical Computer ...

  • Computer Science (all...

Scopus© citations
2
Acquisition date
Jun 7, 2022
Views
1
Acquisition date
Apr 19, 2024