Logo del repository
  1. Home
 
Opzioni

LZ77 Computation Based on the Run-Length Encoded BWT

POLICRITI, Alberto
•
Prezza, Nicola
2017
  • journal article

Periodico
ALGORITHMICA
Abstract
Abstract ComputingtheLZ77factorizationisafundamentaltaskintextcompression and indexing, being the size z of this compressed representation closely related to the self-repetitiveness of the text. A long-standing problem is to compute LZ77 using small working space. Considering that O(z) words of space can be significantly (up to exponentially) smaller than the size n of the input text, even succinct and entropy- compressed solutions are often unduly memory demanding. In this work we focus on an important measure of text repetitiveness: the number r of equal-letter runs in the Burrows–Wheeler transform of the reversed input text. As z, the measure r is closely related to the number of repetitions in the text and can be exponentially smaller than n. We describe two algorithms computing LZ77 in O(r log n) bits of working space and O(n log r ) time. Roughly speaking, our algorithms store a constant number of memory words per BWT run to keep track of first-last run-positions and a suitable indexing mechanism to sample the runs of the BWT (instead of its positions). Important consequences of our results include (i) the possibility to convert from RLBWT- to LZ77-based compressed formats without first decompressing the text, and (ii) the existence of asymptotically-optimal construction algorithms for repetition-aware self- indexes based on these compression techniques. We finally describe an implementation of our solutions and present extensive experiments on highly repetitive datasets. Our algorithms use a working space as small as 1% of the dataset size and are two to three orders of magnitude more space-efficient (albeit slower) than existing solutions based, respectively, on entropy compression and suffix arrays.
DOI
10.1007/s00453-017-0327-z
WOS
WOS:000430979100002
Archivio
http://hdl.handle.net/11390/1119228
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85024483176
Diritti
open access
Soggetti
  • LempelaĚ‚ Ziv factori...

  • Repetition-aware data...

  • Repetitive text colle...

  • Run-length encoded BW...

  • Computer Science (all...

  • Computer Science Appl...

  • Applied Mathematics

Scopus© citazioni
26
Data di acquisizione
Jun 7, 2022
Vedi dettagli
Web of Science© citazioni
18
Data di acquisizione
Mar 21, 2024
Visualizzazioni
2
Data di acquisizione
Apr 19, 2024
Vedi dettagli
google-scholar
Get Involved!
  • Source Code
  • Documentation
  • Slack Channel
Make it your own

DSpace-CRIS can be extensively configured to meet your needs. Decide which information need to be collected and available with fine-grained security. Start updating the theme to match your nstitution's web identity.

Need professional help?

The original creators of DSpace-CRIS at 4Science can take your project to the next level, get in touch!

Realizzato con Software DSpace-CRIS - Estensione mantenuta e ottimizzata da 4Science

  • Impostazioni dei cookie
  • Informativa sulla privacy
  • Accordo con l'utente finale
  • Invia il tuo Feedback