Hide and Mine in Strings: Hardness, Algorithms, and Experiments

Bernardini G.; Conte A.; Gourdel G.; [others]; Sweering M.
2022
  • journal article

Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
Data sanitization and frequent pattern mining are two well-studied topics in data mining. Our work initiates a study on the fundamental relation between data sanitization and frequent pattern mining in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns. This, however, may lead to spurious patterns that harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution here is as follows. First, we present several hardness results, for different variants of this problem, essentially showing that these variants cannot be solved or even be approximated in polynomial time. Second, we propose integer linear programming formulations for these variants and algorithms to solve them, which work in polynomial time under realistic assumptions on the input parameters. We complement the integer linear programming algorithms with a greedy heuristic. Third, we present an extensive experimental study, using both synthetic and real-world datasets, that demonstrates the effectiveness and efficiency of our methods. Beyond sanitization, the process of missing value replacement may also lead to spurious patterns. Interestingly, our results apply in this context as well.
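The spurious-pattern effect described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, and it rests on assumptions not stated in the abstract: hidden positions are marked with `#`, a pattern is "frequent" if it occurs at least `tau` times as a length-`k` substring, and a pattern is "spurious" if it is frequent after the `#` symbols are replaced but was not frequent in the original string. The function names and the brute-force search are hypothetical. The search is exponential in the number of `#` symbols and does not check whether a replacement reintroduces a confidential pattern, so the caller is assumed to pass only permitted letters.

```python
from collections import Counter
from itertools import product


def kmer_counts(s, k):
    """Count every length-k substring of s, including overlapping occurrences."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def frequent(s, k, tau):
    """Length-k substrings occurring at least tau times; substrings with '#' are ignored."""
    return {p for p, c in kmer_counts(s, k).items() if c >= tau and "#" not in p}


def spurious(original, replaced, k, tau):
    """Patterns frequent in the replaced string but not in the original one."""
    return frequent(replaced, k, tau) - frequent(original, k, tau)


def best_replacement(original, masked, alphabet, k, tau):
    """Try every way of filling the '#' positions and keep the filling
    that yields the fewest spurious patterns (first one found on ties)."""
    holes = [i for i, ch in enumerate(masked) if ch == "#"]
    best = None
    for letters in product(alphabet, repeat=len(holes)):
        chars = list(masked)
        for i, ch in zip(holes, letters):
            chars[i] = ch
        candidate = "".join(chars)
        cost = len(spurious(original, candidate, k, tau))
        if best is None or cost < best[0]:
            best = (cost, candidate)
    return best


# Toy input (made up for illustration): the 'c' of "abcbab" has been hidden.
original = "abcbab"
masked = "ab#bab"
print(spurious(original, "ababab", k=2, tau=2))   # filling with 'a' makes "ba" frequent
print(best_replacement(original, masked, "abd", k=2, tau=2))
```

In this toy input, filling the gap with `a` or `b` each creates one new frequent pattern, while `d` creates none, which is the kind of choice the optimization problem in the abstract is about.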
DOI
10.1109/TKDE.2022.3158063
WOS
WOS:000981944600035
Archive
https://hdl.handle.net/11368/3020001
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85126326575
https://ieeexplore.ieee.org/document/9732522
Rights
open access
license: publisher copyright
license: digital rights management not defined
license uri:iris.pri02
license uri:iris.pri00
FVG url
https://arts.units.it/request-item?handle=11368/3020001
Subjects
  • Bioinformatic
  • Data integrity
  • Data mining
  • Data privacy
  • Data sanitization
  • DNA
  • Frequent pattern mini...
  • Genomic
  • Knowledge hiding
  • Privacy
  • Resist
  • String algorithms
