
Elastic-Degenerate String Matching via Fast Matrix Multiplication

G. Bernardini • P. Gawrychowski • N. Pisanti • ... • G. Rosone
2022
  • journal article

Journal
SIAM JOURNAL ON COMPUTING
Abstract
An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of total length $N$ which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length $m$ in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an $O(nm^{1.5}\sqrt{\log m} + N)$-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on this question is that $N$ is substantially larger than both $n$ and $m$, and thus we would like to have a linear dependency on the former. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, similarly as in the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016]. Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction, we show that a combinatorial algorithm solving the EDSM problem in $O(nm^{1.5-\varepsilon} + N)$ time, for any $\varepsilon > 0$, refutes this conjecture. Our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication. String periodicity and fast Fourier transform are two standard tools in string algorithms. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a noncombinatorial $\tilde{O}(nm^{\omega-1} + N)$-time algorithm for EDSM, where $\omega$ denotes the matrix multiplication exponent and the $\tilde{O}(\cdot)$ notation suppresses polylog factors. To the best of our knowledge, we are the first to combine these tools.
In particular, using the fact that $\omega < 2.373$ [Alman and Williams, SODA 2021; Le Gall, ISSAC 2014; Williams, STOC 2012], we obtain an $O(nm^{1.373} + N)$-time algorithm for EDSM. An important building block in our solution that might find applications in other problems is a method of selecting a small set of length-$\ell$ substrings of the pattern, called anchors, so that any occurrence of a string from an ED text set contains at least one but not too many (on average) such anchors inside.
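To make the problem statement concrete, the following is a minimal brute-force sketch of EDSM in Python. It is not the algorithm of the article (it uses neither fast matrix multiplication, FFT, nor anchors) and makes no attempt to meet the stated time bounds. The function name `edsm_naive`, the representation of an ED text as a list of collections of strings, the assumption that the pattern is non-empty, and the choice to report the 0-based indices of the sets in which an occurrence ends are all assumptions made for illustration only.

```python
def edsm_naive(text, pattern):
    """Brute-force EDSM sketch (illustrative only, not the article's method).

    text:    list of collections of strings, one collection per ED set
             (a set may contain the empty string)
    pattern: non-empty string
    Returns the sorted 0-based indices of sets in which an occurrence ends.
    """
    m = len(pattern)
    ends = []
    # active holds lengths k (0 < k < m) such that pattern[:k] matches a
    # suffix of some string spelled by the sets processed so far.
    active = set()
    for i, segment in enumerate(text):
        found = False
        next_active = set()
        for s in segment:
            # Case 1: the whole pattern lies inside a single string.
            if pattern in s:
                found = True
            # Case 2: a suffix of s starts a new partial match.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    next_active.add(k)
            # Case 3: s extends a partial match carried over from before.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True  # the occurrence ends inside s
                elif rest.startswith(s):
                    next_active.add(k + len(s))  # s is consumed entirely
        if found:
            ends.append(i)
        active = next_active
    return ends
```

For example, with the hypothetical ED text `[["xab"], ["c", ""], ["d", "cd"]]` and pattern `"abcd"`, the pattern can be spelled as `ab` + `c` + `d` or as `ab` + (empty string) + `cd`, so an occurrence ends in the last set.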
DOI
10.1137/20M1368033
WOS
WOS:001130401900010
Archive
https://hdl.handle.net/11368/3020913
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85130612240
https://epubs.siam.org/doi/10.1137/20M1368033
Rights
open access
license:creative commons
license:copyright publisher
license uri:http://creativecommons.org/licenses/by/4.0/
license uri:iris.pri02
FVG url
https://arts.units.it/request-item?handle=11368/3020913
Subjects
  • string algorithm
  • pattern matching
  • elastic-degenerate st...
  • matrix multiplication...
  • fast Fourier transfor...
