Algorithmics on SLP-compressed strings: a survey
Groups Complex. Cryptol., 2012
Cited by 10 (3 self)
Results on algorithmic problems on strings that are given in a compressed form via straight-line programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal languages, and the problem of querying compressed strings. Applications in combinatorial group theory and computational topology and to the solution of word equations are discussed as well. Finally, extensions to compressed trees and pictures are considered.
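To make the definition concrete, here is a minimal Python sketch of a straight-line program, assuming a hypothetical encoding in which each nonterminal maps either to a single terminal character or to a pair of nonterminals. The names make_power_slp, symbol_lengths, and char_at are illustrative and do not come from the survey. The sketch shows how k + 1 rules can derive a string of length 2^k, and how one character can be read without expanding the string.

```python
def make_power_slp(k):
    """Build an SLP with k + 1 rules that derives 'a' repeated 2**k times."""
    rules = {"X0": "a"}  # terminal rule
    for i in range(1, k + 1):
        # Each rule doubles the string derived by the previous one.
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules, f"X{k}"


def symbol_lengths(rules):
    """Compute the length of the string derived by every nonterminal."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1
            else:
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def char_at(rules, lengths, start, i):
    """Return the character at 0-based position i of the derived string.

    Walks down the derivation tree, so the cost is bounded by the
    grammar's depth rather than by the length of the derived string.
    """
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while True:
        rhs = rules[sym]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < lengths[left]:
            sym = left
        else:
            i -= lengths[left]
            sym = right
```

With k = 40, the grammar has 41 rules while the derived string has 2^40 characters, which is the kind of exponential gap the abstract refers to.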
Practical Compressed String Dictionaries
Cited by 1 (0 self)
The need to store and query a set of strings – a string dictionary – arises in many kinds of applications. While classically these string dictionaries have accounted for a small share of the total space budget (e.g., in Natural Language Processing or when indexing text collections), recent applications in Web engines, Semantic Web (RDF) graphs, Bioinformatics, and many others, handle very large string dictionaries, whose size is a significant fraction of the whole data. In these cases, string dictionary management is a scalability issue by itself. This paper focuses on the problem of managing large static string dictionaries in compressed main memory space. We revisit classical solutions for string dictionaries like hashing, tries, and front-coding, and improve them by using compression techniques. We also introduce some novel string dictionary representations built on top of recent advances in succinct data structures and full-text indexes. All these structures are empirically compared on a heterogeneous testbed formed by real-world string dictionaries. We show that the compressed representations may use as little as 5% of the original dictionary size, while supporting lookup operations within a few microseconds. These numbers outperform the state-of-the-art space/time tradeoffs in many cases. Furthermore, we enhance some representations to provide prefix and substring-based searches, which also perform competitively. The results show that compressed string dictionaries are
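As an illustration of one of the classical techniques named in the abstract, the following Python sketch shows plain bucketed front-coding. It is not the paper's compressed variant: the class name FrontCodedDictionary, the bucket_size parameter, and the (lcp, suffix) tuple layout are assumptions made for this example. Each bucket stores its first string in full, and every later string as the length of the prefix it shares with its predecessor plus the remaining suffix.

```python
from bisect import bisect_right
from os.path import commonprefix


class FrontCodedDictionary:
    """Plain bucketed front-coding over a sorted list of distinct strings."""

    def __init__(self, sorted_words, bucket_size=4):
        self.bucket_size = bucket_size
        self.heads = []  # first string of each bucket, stored in full
        self.tails = []  # per bucket: list of (shared prefix length, suffix)
        for start in range(0, len(sorted_words), bucket_size):
            chunk = sorted_words[start:start + bucket_size]
            tail, prev = [], chunk[0]
            for word in chunk[1:]:
                lcp = len(commonprefix([prev, word]))
                tail.append((lcp, word[lcp:]))
                prev = word
            self.heads.append(chunk[0])
            self.tails.append(tail)

    def _decode(self, bucket):
        """Yield the strings of one bucket in order, rebuilding each from its predecessor."""
        current = self.heads[bucket]
        yield current
        for lcp, suffix in self.tails[bucket]:
            current = current[:lcp] + suffix
            yield current

    def locate(self, word):
        """Return the rank of word in the sorted list, or -1 if absent."""
        # Binary search over bucket heads, then a sequential scan of one bucket.
        bucket = bisect_right(self.heads, word) - 1
        if bucket < 0:
            return -1
        for offset, candidate in enumerate(self._decode(bucket)):
            if candidate == word:
                return bucket * self.bucket_size + offset
        return -1

    def extract(self, index):
        """Return the string with the given rank."""
        if index < 0:
            raise IndexError(index)
        bucket, offset = divmod(index, self.bucket_size)
        for position, candidate in enumerate(self._decode(bucket)):
            if position == offset:
                return candidate
        raise IndexError(index)
```

The bucket size trades space for time: larger buckets store fewer full strings but require longer sequential decoding per lookup.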
Grammar Compression: Grammatical Inference by Compression and Its Application to Real Data
Cited by 1 (0 self)
A grammatical inference algorithm tries to find as small a grammar as possible representing a potentially infinite sequence of strings. Here, let us consider a simple restriction: the input is a finite sequence, or it might be a singleton set. The restricted problem, finding the smallest CFG that generates exactly the input, is called grammar compression. In the last decade many researchers have tackled this problem because of its scalable applications, e.g., expansion of data storage capacity, speeding up information retrieval, DNA sequencing, frequent pattern mining, and similarity search. We review the history of grammar compression and its wide applications, together with important future work. The study of grammar compression began with bad news: the smallest CFG problem is NP-hard. Hence, the first question is: Can we get a near-optimal solution in polynomial time? (Is there a reasonable approximation algorithm?) And the next question is: Can we minimize the costs of time and space? (Does a linear-time algorithm exist within an optimal working space?) The recent results produced by the research community answer these questions affirmatively. We introduce several important results and typical applications to a huge text collection. On the other hand, the data explosion erodes the advantage of grammar compression, since there is no working space for storing all the data supplied by a data stream. The last question is: How can we handle stream data? For this question, we propose the framework of stream grammar compression for the next generation and its attractive application to fast data transmission.
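The abstract does not name a specific algorithm, so the sketch below uses a Re-Pair-style greedy heuristic as a stand-in: it repeatedly replaces the most frequent adjacent pair of symbols with a fresh nonterminal. This is a naive quadratic-time illustration, not the linear-time or streaming methods the paper discusses. The function names repair_compress and expand are hypothetical, and nonterminals are encoded as integers so they cannot collide with terminal characters.

```python
from collections import Counter


def repair_compress(text):
    """Greedy pair replacement. Returns (sequence, rules).

    Terminals are single-character strings; nonterminals are integers.
    Each rule maps a nonterminal to the pair of symbols it replaces.
    """
    seq = list(text)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        # Counts include overlapping occurrences (e.g. in "aaa"), so this
        # is only an upper bound on the number of replacements made.
        if count < 2:
            break
        nonterminal = len(rules)
        rules[nonterminal] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Rebuild the original text from the sequence and the rules."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)
```

The result is a grammar that generates only the input; the heuristic gives no guarantee of finding the smallest such grammar, which the abstract notes is NP-hard.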
D.: General document retrieval in compact space
ACM Journal of Experimental Algorithmics
Estructuras de datos sucintas para Recuperación De Documentos (Succinct Data Structures for Document Retrieval), 2012
Document retrieval consists of, given a collection of documents and a query pattern, obtaining the documents most relevant to the query. When the documents are available before the queries, it is possible to build an index that allows relevant documents to be obtained in reasonable time when the queries are made. Having indexes that solve a problem like this is fundamental in areas such as information retrieval, data mining, and bioinformatics, among others. When the indexed text is natural language, the paradigmatic solution is the inverted index. However, document retrieval problems also arise in scenarios where the text and the query patterns can be general character sequences, such as Oriental languages, multimedia databases, genomic sequences, etc. In these scenarios classical inverted indexes do not apply with the same success. Although there are solutions that require linear space in this general-text scenario, the space they use is a significant problem: these solutions can use more than 20 times the space of the collection. This thesis presents new algorithms and data structures to solve some problems
A Faster Grammar-Based Self-Index
To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with r rules for a string S[1..n] whose LZ77 parse consists of z phrases, we can store a self-index for S in O(r + z log log n) space such that, given a pattern P[1..m], we can list the occ occurrences of P in S in O(m^2 + occ log log n) time. If the straight-line program is balanced and we accept a small probability of building a faulty index, then we can reduce the O(m^2) term to O(m log m). All previous self-indexes are larger or slower in the worst case.