Results 1 - 3 of 3
The Dyck language edit distance problem in near-linear time. FOCS, 2014
Abstract

Cited by 1 (1 self)
Given a string σ over alphabet Σ and a grammar G defined over the same alphabet, what is the minimum number of repairs (insertions, deletions and substitutions) required to map σ into a valid member of G? The seminal work of Aho and Peterson in 1972 initiated the study of this language edit distance problem, providing a dynamic programming algorithm for context free languages that runs in $O(|G|^2 n^3)$ time, where $n$ is the string length and $|G|$ is the grammar size. While later improvements reduced the running time to $O(|G| n^3)$, the cubic dependence on the input length remained a major bottleneck for applying these algorithms to their multitude of applications. In this paper, we study the language edit distance problem for a fundamental context free language, DYCK(s), representing the language of well-balanced parentheses of s different types, that has been pivotal in the development of formal language theory. We provide the very first near-linear time algorithm to tightly approximate the DYCK(s) language edit distance problem for arbitrary s. DYCK(s) language edit distance significantly generalizes the well-studied string edit distance problem, and appears in most applications of language edit distance, ranging from data quality in databases and generating automated error-correcting parsers in compiler optimization to structure prediction problems in biological sequences. Its nondeterministic counterpart is known as the hardest context free language. Our main result is an algorithm for edit distance computation to DYCK(s) for any positive integer s that runs in $O(n^{1+\epsilon}\,\mathrm{polylog}(n))$ time and achieves an approximation factor of $O(\frac{1}{\epsilon}\,\beta(n) \log \mathrm{OPT})$, for any $\epsilon > 0$. Here OPT is the optimal edit distance to DYCK(s) and $\beta(n)$ is the best approximation factor known for the simpler problem of string edit distance running in analogous time. If we allow $O(n^{1+\epsilon} + \mathrm{OPT}^2 n^{\epsilon})$ time, then the approximation factor can be reduced to $O(\frac{1}{\epsilon} \log \mathrm{OPT})$.
Since the best known near-linear time algorithm for the string edit distance problem has $\beta(n) = \mathrm{polylog}(n)$, under the near-linear time computation model both the DYCK(s) language and string edit distance problems have polylog(n) approximation factors. This comes as a surprise, since the former is a significant generalization of the latter and their exact computations via dynamic programming show a stark difference in time complexity. Rather less surprisingly, we show that the framework for efficiently approximating edit distance to DYCK(s) can be utilized for many other languages. We illustrate this by considering various memory checking languages (studied extensively under distributed verification) such as STACK, QUEUE, PQ and DEQUE, which consist of valid transcripts of stacks, queues, priority queues and double-ended queues respectively. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.
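To make the problem statement in the abstract above concrete, here is a minimal sketch of an exact cubic-time interval dynamic program for edit distance to a Dyck language. It is a baseline of the kind the abstract contrasts against, not the paper's near-linear approximation algorithm. The names `PAIRS`, `pair_cost` and `dyck_edit_distance` are hypothetical, and the three bracket types are an assumed alphabet. Insertions are not modelled separately because inserting a partner for an unmatched symbol costs the same as deleting that symbol.

```python
from typing import Dict, List

# Assumed alphabet: three parenthesis types (s = 3). Hypothetical choice.
PAIRS: Dict[str, str] = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a: str, b: str) -> int:
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in PAIRS and PAIRS[a] == b:
        return 0  # already a matching opener/closer pair
    if a in CLOSERS and b in PAIRS:
        return 2  # closer followed by opener: both must change
    return 1      # one substitution fixes the pair


def dyck_edit_distance(s: str) -> int:
    """Exact edit distance from s to the Dyck language, in O(n^3) time."""
    for ch in s:
        if ch not in PAIRS and ch not in CLOSERS:
            raise ValueError(f"unexpected symbol: {ch!r}")
    n = len(s)
    # dp[i][j] = minimum edits turning s[i:j] into a well-balanced string.
    dp: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                dp[i][j] = 1  # a lone symbol must be deleted
                continue
            # Option 1: match s[i] with s[j-1] around a balanced interior.
            best = dp[i + 1][j - 1] + pair_cost(s[i], s[j - 1])
            # Option 2: split into two independently repaired pieces.
            for k in range(i + 1, j):
                best = min(best, dp[i][k] + dp[k][j])
            dp[i][j] = best
    return dp[0][n]
```

For example, `dyck_edit_distance("([)]")` evaluates to 2, since no single edit balances that string. The triple loop is the source of the cubic dependence on the string length that the abstract describes as the bottleneck.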
Faster Language Edit Distance, Connection to All-pairs Shortest Paths and Related Problems
Abstract
Given a context free language $L(G)$ over alphabet Σ and a string $s \in \Sigma^*$, the language edit distance problem seeks the minimum number of edits (insertions, deletions and substitutions) required to convert s into a valid member of $L(G)$. The well-known dynamic programming algorithm solves this problem in $O(n^3)$ time (ignoring grammar size), where $n$ is the string length [Aho, Peterson 1972, Myers 1985]. Despite its numerous applications in data management, machine learning, compiler optimization, computational biology, computer vision and linguistics, no algorithm is known to date that computes or approximates language edit distance in truly subcubic time. In this paper we give the first such algorithm that computes language edit distance almost optimally. For any $\epsilon > 0$, our algorithm runs in $\tilde{O}(n^{\omega} / \mathrm{poly}(\epsilon))$ time and returns an estimate within a multiplicative approximation factor of $(1 + \epsilon)$ with high probability, where $\omega$ is the exponent of ordinary matrix multiplication of n-dimensional square matrices. It also computes the edit script. We further solve the local alignment problem; for all substrings of s, we can estimate their language edit distance