Results 1 -
4 of
4
A really simple approximation of smallest grammar
- IN PROC. 25TH ANNUAL SYMPOSIUM ON COMBINATORIAL PATTERN MATCHING (CPM), LNCS 8486
, 2014
"... In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1,..., Nc} for some constant c. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses were involved. The here presented algorithm computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most ` factors as well as additional O(`) letters, where ` was the size of the original LZ77 factorisation. In one phase in a greedy way (by a left-to-right sweep and a help of the factorisation) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are O(`) many different pairs among them. Hence there are O(logN) phases, each of them introduces O(`) nonterminals to a grammar. A more precise analysis yields a bound O( ` log(N/`)). As ` ≤ g, this yields the desired bound O(g log(N/g)).
Constructing Small Tree Grammars and Small Circuits for Formulas
, 2014
"... Abstract It is shown that every tree of size n over a fixed set of σ different ranked symbols can be decomposed into O( n log σ n ) = O( n log σ log n ) many hierarchically defined pieces. Formally, such a hierarchical decomposition has the form of a straight-line linear context-free tree grammar o ..."
Abstract
- Add to MetaCart
Abstract It is shown that every tree of size n over a fixed set of σ different ranked symbols can be decomposed into O( n log σ n ) = O( n log σ log n ) many hierarchically defined pieces. Formally, such a hierarchical decomposition has the form of a straight-line linear context-free tree grammar of size O( n log σ n ), which can be used as a compressed representation of the input tree. This generalizes an analogous result for strings. Previous grammar-based tree compressors were not analyzed for the worst-case size of the computed grammar, except for the top dag of Bille et al., for which only the weaker upper bound of O( n log 0.19 n ) for unranked and unlabelled trees has been derived. The main result is used to show that every arithmetical formula of size n, in which only m ≤ n different variables occur, can be transformed (in time O(n log n)) into an arithmetical circuit of size O( n·log m log n ) and depth O(log n). This refines a classical result of Brent, according to which an arithmetical formula of size n can be transformed into a logarithmic depth circuit of size O(n). Missing proofs can be found in the long version ACM Subject Classification E.4 Data compaction and compression Keywords and phrases grammar-based compression, tree compression, arithmetical circuits Introduction Grammar-based compression has emerged to an active field in string compression during the past 20 years. The idea is to represent a given string s by a small context-free grammar that generates only s; such a grammar is also called a straight-line program, briefly SLP. For instance, the word (ab) 1024 can be represented by the SLP with the productions A 0 → ab and A i → A i−1 A i−1 for 1 ≤ i ≤ 10 (A 10 is the start symbol). The size of this grammar is much smaller than the size (length) of the string (ab) 1024 . In general, an SLP of size n (the size of an SLP is usually defined as the total length of all right-hand sides of the productions) can produce a string of length 2 Ω(n) . Hence, an SLP can be seen indeed as a succinct representation of the generated string. The goal of grammar-based string compression is to construct from a given input string s a small SLP that produces s. Several algorithms for this have been proposed and analyzed. Prominent grammar-based string compressors are for instance LZ78, RePair, and BISECTION, see To evaluate the compression performance of a grammar-based compressor C, two different approaches can be found in the literature: A first approach is to analyze the size of the SLP produced by C for an input string x compared to the size of a smallest SLP for x. This leads to the approximation ratio for C, see