Logo del repository
  1. Home
 
Opzioni

Utility-Oriented String Mining

Bernardini, Giulia
•
Chen, Huiping
•
Conte, Alessio
altro
Pissis, Solon P.
2024
  • conference object

Abstract
A string is often provided with numerical scores (utilities) which quantify the importance, interest, profit, or risk of the letters occurring at every position of the string. For example, every DNA fragment produced by modern sequencing machines comes with a confidence score per position. Motivated by the abundance of strings with utilities, we introduce Utility-oriented String Mining (USM), a natural generalization of the classic frequent substring mining problem. Given a string S of length n and a threshold V, USM asks for every string R whose utility U(R) is at least V, where U is a function that maps R to a utility score based on the utilities of all letters of every occurrence of R in S. In addition, our work makes the following contributions: (1) We identify a class U of utility functions for which USM admits an O(n^2)-time algorithm. (2) We prove that no listing algorithm solves the USM problem in subquadratic time for every utility function, or even for every function in U. (3) We propose an O(n log n)-time algorithm that solves USM for a class of monotone functions from U. (4) We design another O(n log n)-time algorithm for the same problem that is comparable in runtime but offers drastic space savings in practice when, in addition, a lower bound on the length of the output strings is provided as input. (5) We demonstrate experimentally using publicly available, billion-letter datasets that our algorithms are many times more efficient, in terms of runtime and/or space, compared to an Apriori-like baseline which employs advanced string processing tools
DOI
10.1137/1.9781611978032.22
WOS
WOS:001281344400018
Archivio
https://hdl.handle.net/11368/3072798
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85193517886
https://epubs.siam.org/doi/pdf/10.1137/1.9781611978032.22
Diritti
closed access
license:copyright editore
license uri:iris.pri02
FVG url
https://arts.units.it/request-item?handle=11368/3072798
Soggetti
  • String

  • Pattern

  • Utility-Oriented Mini...

  • String Mining

google-scholar
Get Involved!
  • Source Code
  • Documentation
  • Slack Channel
Make it your own

DSpace-CRIS can be extensively configured to meet your needs. Decide which information need to be collected and available with fine-grained security. Start updating the theme to match your nstitution's web identity.

Need professional help?

The original creators of DSpace-CRIS at 4Science can take your project to the next level, get in touch!

Realizzato con Software DSpace-CRIS - Estensione mantenuta e ottimizzata da 4Science

  • Impostazioni dei cookie
  • Informativa sulla privacy
  • Accordo con l'utente finale
  • Invia il tuo Feedback