Compressed full-text indexes (2007)

by Gonzalo Navarro, Veli Mäkinen
Venue: ACM Computing Surveys
Citations: 263 (94 self)

BibTeX

@ARTICLE{Navarro07compressedfull-text,
    author = {Gonzalo Navarro and Veli Mäkinen},
    title = {Compressed full-text indexes},
    journal = {ACM Computing Surveys},
    year = {2007}
}

Abstract

Full-text indexes provide fast substring search over large text collections. A serious problem with these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it has triggered a wealth of activity, produced surprising results in a very short time, and radically changed the status of this area in less than five years. The most successful indexes nowadays obtain almost optimal space and search time simultaneously. In this paper we present the main concepts underlying self-indexes. We explain the relationship between text entropy and the regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes to date, focusing on the essential aspects of how they exploit the text compressibility and how they efficiently solve various search problems. We aim to give the theoretical background needed to understand and follow the developments in this area.
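To make the notion of a self-index concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the abstract: it assumes a Burrows-Wheeler-transform-based design with backward search, and the names `build_index`, `count`, and `extract_text` are hypothetical. It stores nothing compressed and builds the suffix array naively, so it shows only the two functional properties the abstract describes: counting substring occurrences, and reproducing the text from the index alone.

```python
from collections import Counter

SENTINEL = "\0"  # assumed terminator: smaller than any text character


def build_index(text):
    """Build a toy BWT-based index (illustrative, uncompressed)."""
    assert SENTINEL not in text
    s = text + SENTINEL
    # Naive suffix sorting; acceptable only for a small demonstration.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps around at i == 0).
    bwt = "".join(s[i - 1] for i in sa)
    # C[c] = number of characters in s strictly smaller than c.
    counts = Counter(s)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return bwt, C, occ


def count(index, pattern):
    """Count occurrences of pattern by backward search over the BWT."""
    bwt, C, occ = index
    lo, hi = 0, len(bwt)  # half-open range of sorted-suffix rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


def extract_text(index):
    """Recover the whole text from the index alone, via LF-mapping."""
    bwt, C, occ = index
    out, row = [], 0  # row 0 is the suffix made of the sentinel only
    for _ in range(len(bwt) - 1):
        c = bwt[row]
        out.append(c)
        row = C[c] + occ[c][row]  # jump to the row of the preceding suffix
    return "".join(reversed(out))


index = build_index("abracadabra")
print(count(index, "abra"))   # 2
print(count(index, "a"))      # 5
print(extract_text(index))    # abracadabra
```

Neither `count` nor `extract_text` reads the original string; both use only the transformed text and its counting tables. That is the sense in which such an index can replace the text. A practical self-index would additionally represent these tables in compressed form, which this sketch does not attempt.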
