One-Unambiguous Regular Languages
 Information and Computation
, 1997
Cited by 131 (9 self)
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic metalanguage for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regu...
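To make the one-symbol-lookahead requirement concrete, here is a minimal Python sketch. It assumes a hypothetical tuple-based syntax tree (("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r)) and uses the hypothetical helper names analyse and witness. It numbers the symbol occurrences (positions), computes the usual first, last and follow sets, and then builds the witness of a word from left to right, failing as soon as one lookahead symbol matches two candidate positions. It illustrates the definition only and is not the construction used in the paper.

```python
# Hypothetical tuple-based syntax tree:
#   ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r)


def analyse(r, syms, follow):
    """Return (nullable, first, last) for r.

    Every symbol occurrence gets a fresh position (its index in syms), and
    follow[p] collects the positions that may immediately follow position p.
    """
    kind = r[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "sym":
        p = len(syms)
        syms.append(r[1])
        follow[p] = set()
        return False, {p}, {p}
    if kind == "alt":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        for p in l1:  # the end of r1 can be followed by the start of r2
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyse(r[1], syms, follow)
        for p in l:  # the end of one iteration can be followed by the next
            follow[p] |= f
        return True, f, l
    raise ValueError(f"unknown node {kind!r}")


def witness(r, word):
    """Compute the witness of word incrementally with one symbol of lookahead.

    Returns the list of matched positions, or None if r does not denote word.
    Raises ValueError when the lookahead symbol does not determine the position.
    """
    syms, follow = [], {}
    nullable, first, last = analyse(r, syms, follow)
    candidates, path = first, []
    for c in word:
        matches = [p for p in candidates if syms[p] == c]
        if len(matches) > 1:
            raise ValueError(f"symbol {c!r} matches positions {sorted(matches)}")
        if not matches:
            return None
        path.append(matches[0])
        candidates = follow[matches[0]]
    accepted = path[-1] in last if path else nullable
    return path if accepted else None


a, b = ("sym", "a"), ("sym", "b")
# b*a(b*a)*: positions 0..3 are b, a, b, a
E1 = ("cat", ("cat", ("star", b), a), ("star", ("cat", ("star", b), a)))
# (a+b)*a: denotes the same language
E2 = ("cat", ("star", ("alt", a, b)), a)
print(witness(E1, "baba"))  # [0, 1, 2, 3]
```

On (a+b)*a the first symbol a already matches two positions, so witness raises; that expression is not 1-unambiguous, although b*a(b*a)*, which denotes the same language, is.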
Canonical derivatives, partial derivatives and finite automaton constructions
 Theor. Comput. Sci.
Cited by 31 (4 self)
Let E be a regular expression. Our aim is to establish a theoretical relation between two well-known automata recognizing the language of E, namely the position automaton $P_E$ constructed by Glushkov or McNaughton and Yamada, and the equation automaton $E_E$ constructed by Mirkin or Antimirov. We define the notion of c-derivative (for canonical derivative) of a regular expression E and show that if E is linear then two Brzozowski derivatives of E are aci-similar if and only if the corresponding c-derivatives are identical. This allows us to represent the Berry-Sethi set of continuations of a position by a unique c-derivative, called the c-continuation of the position. Hence the definition of $C_E$, the c-continuation automaton of E, whose states are pairs made of a position of E and of the associated c-continuation. If states are viewed as positions, $C_E$ is isomorphic to $P_E$. On the other hand, a partial derivative, as defined by Antimirov, is a class of c-derivatives for some equivalence relation, thus $C_E$ reduces to $E_E$. Finally, $C_E$ makes it possible to go from $P_E$ to $E_E$, while this cannot be achieved directly (from the state graphs). These theoretical results lead to an $O(|E|^2)$ space and time algorithm to compute the equation automaton, where $|E|$ is the size of the expression. This is the complexity of the most efficient constructions yielding the position automaton, while the size of the equation automaton is not greater and generally much smaller than the size of the position automaton.
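The position automaton can be read off the first, last and follow sets of the positions: one initial state plus one state per position, with every transition into a position labelled by that position's symbol. The sketch below is the textbook Glushkov construction, not the c-continuation route taken in the paper. It assumes a hypothetical tuple-based syntax tree, and position_automaton and accepts are illustrative names.

```python
# Hypothetical tuple-based syntax tree:
#   ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r)


def analyse(r, syms, follow):
    """Return (nullable, first, last); record each position's symbol and follow set."""
    kind = r[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "sym":
        p = len(syms)
        syms.append(r[1])
        follow[p] = set()
        return False, {p}, {p}
    if kind == "alt":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        for p in l1:
            follow[p] |= f2
        return n1 and n2, f1 | f2 if n1 else f1, l1 | l2 if n2 else l2
    if kind == "star":
        _, f, l = analyse(r[1], syms, follow)
        for p in l:
            follow[p] |= f
        return True, f, l
    raise ValueError(f"unknown node {kind!r}")


def position_automaton(r):
    """State 0 is initial; position p becomes state p + 1."""
    syms, follow = [], {}
    nullable, first, last = analyse(r, syms, follow)
    delta = {0: {p + 1 for p in first}}
    for p, fs in follow.items():
        delta[p + 1] = {q + 1 for q in fs}
    labels = {p + 1: sym for p, sym in enumerate(syms)}  # symbol read on entry
    finals = {p + 1 for p in last} | ({0} if nullable else set())
    return delta, labels, finals


def accepts(automaton, word):
    delta, labels, finals = automaton
    current = {0}
    for c in word:
        current = {q for s in current for q in delta[s] if labels[q] == c}
    return bool(current & finals)


a, b = ("sym", "a"), ("sym", "b")
E = ("cat", ("star", ("alt", a, b)), a)  # (a+b)*a
aut = position_automaton(E)
print(len(aut[0]), accepts(aut, "ba"), accepts(aut, "ab"))  # 4 True False
```

For (a+b)*a the automaton has 4 states, the alphabetic width plus one.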
Standard Generalized Markup Language: Mathematical and Philosophical Issues
 Computer Science Today. Recent Trends and Developments
, 1995
Cited by 21 (2 self)
The Standard Generalized Markup Language (SGML), an ISO standard, has become the accepted method of defining markup conventions for text files. SGML is a metalanguage for defining grammars for textual markup in much the same way that Backus-Naur Form is a metalanguage for defining programming-language grammars. Indeed, HTML, the method of marking up hypertext documents for the World Wide Web, is an SGML grammar. The underlying assumptions of the SGML initiative are that a logical structure of a document can be identified and that it can be indicated by the insertion of labeled matching brackets (start and end tags). Moreover, it is assumed that the nesting relationships of these tags can be described with an extended context-free grammar (the right-hand sides of productions are regular expressions). In this survey of some of the issues raised by the SGML initiative, I re-examine the underlying assumptions and address some of the theoretical questions that SGML raises...
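The idea of an extended context-free grammar whose right-hand sides are regular expressions can be tried out with Python's standard re module. In this minimal sketch the element names, the MODELS table and the valid function are hypothetical, an element is a (name, children) pair rather than a stream of parsed start and end tags, and each content model is hand-translated into a Python regular expression over the child names, each followed by one space. SGML's & connector is not covered, and Python's backtracking re does not enforce SGML's unambiguity rule, so this shows only the grammatical structure.

```python
import re

# Hypothetical DTD fragment: element name -> content model, written as a
# Python regular expression over child names (each name followed by a space).
# SGML's "," becomes juxtaposition and "|" stays alternation.
MODELS = {
    "memo": r"(to )+from (subject )?(para )*",
    "to": r"",
    "from": r"",
    "subject": r"",
    "para": r"",
}


def valid(element, models):
    """Check an element tree (name, children) against the content models."""
    name, children = element
    model = models.get(name)
    if model is None:  # undeclared element
        return False
    child_names = "".join(child[0] + " " for child in children)
    if re.fullmatch(model, child_names) is None:
        return False
    return all(valid(child, models) for child in children)


doc = ("memo", [("to", []), ("from", []), ("para", []), ("para", [])])
print(valid(doc, MODELS))  # True
```

Swapping the first two children of doc, so that from precedes to, makes valid return False, because the memo model requires at least one to before from.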
The Validation of SGML Content Models
 Mathematical and Computer Modelling
, 1997
Cited by 14 (8 self)
The Standard Generalized Markup Language (SGML) is an ISO standard that provides a syntactic metalanguage for the definition of textual markup systems, which are used to indicate the structure of documents so that they can be electronically typeset, searched, and communicated. We address only one problem raised by the standard, namely: in SGML, the right-hand sides of context-free productions are regular expressions, called content models, that are restricted to be what the standard calls "unambiguous," but what is more appropriately called deterministic. We solve the problem of how to define determinism precisely, how to recognize deterministic regular expressions efficiently, and how to recognize deterministic regular languages. Any SGML parser must check that a given document grammar conforms to the standard; that is, it must validate it. Hence, our results are an important step in the clarification of the standard and in the efficient implementation of an SGML parser fo...
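A minimal sketch of the determinism test follows. It assumes the known characterization that a regular expression is deterministic (1-unambiguous) exactly when its Glushkov automaton is deterministic, i.e. no two positions in the first set, or in any single follow set, carry the same symbol. The tuple syntax tree and the names analyse and is_deterministic are hypothetical, and this straightforward version makes no attempt at the efficient bounds the paper is concerned with.

```python
# Hypothetical tuple-based syntax tree:
#   ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r)


def analyse(r, syms, follow):
    """Return (nullable, first, last); record each position's symbol and follow set."""
    kind = r[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "sym":
        p = len(syms)
        syms.append(r[1])
        follow[p] = set()
        return False, {p}, {p}
    if kind == "alt":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyse(r[1], syms, follow)
        n2, f2, l2 = analyse(r[2], syms, follow)
        for p in l1:
            follow[p] |= f2
        return n1 and n2, f1 | f2 if n1 else f1, l1 | l2 if n2 else l2
    if kind == "star":
        _, f, l = analyse(r[1], syms, follow)
        for p in l:
            follow[p] |= f
        return True, f, l
    raise ValueError(f"unknown node {kind!r}")


def is_deterministic(r):
    """True if no symbol labels two competing positions in the Glushkov automaton."""
    syms, follow = [], {}
    _, first, _ = analyse(r, syms, follow)

    def no_clash(positions):
        labels = [syms[p] for p in positions]
        return len(labels) == len(set(labels))

    return no_clash(first) and all(no_clash(fs) for fs in follow.values())


a, b = ("sym", "a"), ("sym", "b")
E1 = ("cat", ("cat", ("star", b), a), ("star", ("cat", ("star", b), a)))  # b*a(b*a)*
E2 = ("cat", ("star", ("alt", a, b)), a)  # (a+b)*a
print(is_deterministic(E1), is_deterministic(E2))  # True False
```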
On the average state complexity of partial derivative automata
 International Journal of Foundations of Computer Science
, 2011
Cited by 6 (6 self)
The partial derivative automaton ($A_{pd}$) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton ($A_{pos}$). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound of the average number of mergings of states in $A_{pos}$ and describe its asymptotic behaviour. This bound depends on the alphabet size, k, and as k grows its limit approaches half the number of states in $A_{pos}$. The lower bound corresponds to considering the $A_{pd}$ automaton for the marked version of the regular expression, i.e. where all its letters are made different. Experimental results suggest that the average number of states of this automaton, and of the $A_{pd}$ automaton for the unmarked regular expression, are very close to each other.
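The effect of marking can be checked on a single example. The sketch below assumes a hypothetical tuple syntax tree and Antimirov's partial-derivative rules with the simplification ε·F = F; it counts the states of the partial derivative automaton for an expression and for its marked version and compares them with the Glushkov automaton's alphabetic width plus one. The names mark, pd_state_count and alphabetic_width are illustrative, and one example says nothing about averages.

```python
from collections import deque

EPS = ("eps",)


def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "sym":
        return False
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def cat(t, s):
    """Concatenation with the simplification eps . s = s."""
    return s if t == EPS else ("cat", t, s)


def pd(r, a):
    """Antimirov's partial derivatives of r with respect to the letter a."""
    kind = r[0]
    if kind == "eps":
        return set()
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if kind == "cat":
        result = {cat(t, r[2]) for t in pd(r[1], a)}
        if nullable(r[1]):
            result |= pd(r[2], a)
        return result
    if kind == "star":
        return {cat(t, r) for t in pd(r[1], a)}
    raise ValueError(f"unknown node {kind!r}")


def symbols(r):
    if r[0] == "sym":
        return {r[1]}
    return set().union(*(symbols(child) for child in r[1:]))


def mark(r, counter=None):
    """Make every letter distinct by pairing it with its position (1, 2, ...)."""
    if counter is None:
        counter = [0]
    if r[0] == "sym":
        counter[0] += 1
        return ("sym", (r[1], counter[0]))
    return (r[0],) + tuple(mark(child, counter) for child in r[1:])


def pd_state_count(r):
    """Number of states of the partial derivative automaton (BFS from r)."""
    alphabet = symbols(r)
    seen, queue = {r}, deque([r])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            for t in pd(q, a):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return len(seen)


def alphabetic_width(r):
    return 1 if r[0] == "sym" else sum(alphabetic_width(c) for c in r[1:])


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
E = ("alt", ("cat", a, b), ("cat", c, b))  # ab + cb
print(pd_state_count(mark(E)), pd_state_count(E), alphabetic_width(E) + 1)  # 4 3 5
```

Marking already merges the two final positions into one ε state (4 instead of 5); the unmarked expression additionally merges the two b continuations (3 states).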
From C-Continuations to New Quadratic Algorithms for Automaton Synthesis
Cited by 6 (1 self)
Two classical nondeterministic automata recognize the language denoted by a regular expression: the position automaton, which is derived from the position sets defined by Glushkov and McNaughton-Yamada, and the equation automaton, which can be computed via Mirkin's pre-bases or Antimirov's partial derivatives. Let $|E|$ be the size of the expression and $\|E\|$ be its alphabetic width, i.e. the number of symbol occurrences. The number of states in the equation automaton is less than or equal to the number of states in the position automaton, which is equal to $\|E\|+1$. On the other hand, the worst-case time complexity of Antimirov's algorithm is $O(\|E\|^3 \cdot |E|^2)$, while it is only $O(\|E\| \cdot |E|)$ for the most efficient implementations yielding the position automaton (Brüggemann-Klein, Chang and Paige, Champarnaud et al.). We present an $O(|E|^2)$ space and time algorithm to compute the equation automaton. It is based on the notion of canonical derivative, which makes it possible to efficiently handle sets of word derivatives. Moreover, canonical derivatives also lead to a new $O(|E|^2)$ space and time algorithm to construct the position automaton.
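For orientation, the equation automaton can also be built naively by closing the expression under Antimirov's partial derivatives. This minimal Python sketch (hypothetical tuple syntax tree, hypothetical names pd and equation_automaton, only the simplification ε·F = F and no other normalization) has none of the complexity guarantees discussed above; it only shows what the states and transitions are.

```python
from collections import deque

EPS = ("eps",)


def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "sym":
        return False
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def cat(t, s):
    """Concatenation with the simplification eps . s = s."""
    return s if t == EPS else ("cat", t, s)


def pd(r, a):
    """Antimirov's partial derivatives of r with respect to the letter a."""
    kind = r[0]
    if kind == "eps":
        return set()
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if kind == "cat":
        result = {cat(t, r[2]) for t in pd(r[1], a)}
        if nullable(r[1]):
            result |= pd(r[2], a)
        return result
    if kind == "star":
        return {cat(t, r) for t in pd(r[1], a)}
    raise ValueError(f"unknown node {kind!r}")


def symbols(r):
    if r[0] == "sym":
        return {r[1]}
    return set().union(*(symbols(child) for child in r[1:]))


def equation_automaton(r):
    """States are r and its iterated partial derivatives; nullable ones are final."""
    alphabet = symbols(r)
    states, delta, queue = {r}, {}, deque([r])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            delta[q, a] = pd(q, a)
            for t in delta[q, a]:
                if t not in states:
                    states.add(t)
                    queue.append(t)
    finals = {q for q in states if nullable(q)}
    return r, states, delta, finals


def accepts(automaton, word):
    start, _, delta, finals = automaton
    current = {start}
    for c in word:
        current = {t for q in current for t in delta.get((q, c), set())}
    return bool(current & finals)


a, b = ("sym", "a"), ("sym", "b")
E = ("cat", ("star", ("alt", a, b)), a)  # (a+b)*a
aut = equation_automaton(E)
print(len(aut[1]), accepts(aut, "ba"), accepts(aut, "ab"))  # 2 True False
```

For (a+b)*a this gives 2 states, against the 4 states of the position automaton, which is consistent with the bound $\|E\|+1$ stated above.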
A Characterization of Thompson Digraphs
 Discrete Applied Mathematics
, 1999
Cited by 5 (2 self)
A finite-state machine is called a Thompson machine if it can be constructed from a regular expression using Thompson's construction. We call the underlying digraph of a Thompson machine a Thompson digraph. We characterize Thompson digraphs and, as one application of the characterization, we give an algorithm that generates an equivalent regular expression from a Thompson machine in time linear in the number of states. Although the construction is simple, it is novel in that the usual constructions of equivalent regular expressions from finite-state machines produce regular expressions that have size exponential in the size of the given machine, in the worst case. The construction provides a first step in the construction of small expressions from finite-state machines.
1 Introduction
In 1968, Thompson [8] introduced his inductive construction of a finite-state machine from a regular expression. Thompson's construction is elegant and efficient. Although Kleene [5] gave an inductive con...
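For reference, here is the forward direction the abstract starts from: a minimal Python sketch of a common textbook variant of Thompson's construction, in which concatenation is joined by an ε-edge rather than by merging the two boundary states. The tuple syntax tree and the names thompson, eps_closure and accepts are hypothetical. Every state built this way has at most two outgoing edges, one of the structural properties Thompson machines have.

```python
# Hypothetical tuple-based syntax tree:
#   ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r)


def thompson(r):
    """Return (start, final, edges, number_of_states); label None is an eps-move."""
    edges = []
    count = [0]

    def new_state():
        count[0] += 1
        return count[0] - 1

    def build(node):
        kind = node[0]
        if kind in ("sym", "eps"):
            s, f = new_state(), new_state()
            edges.append((s, node[1] if kind == "sym" else None, f))
            return s, f
        if kind == "cat":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            edges.append((f1, None, s2))  # variant: eps-edge instead of merging
            return s1, f2
        if kind == "alt":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            s, f = new_state(), new_state()
            edges.extend([(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)])
            return s, f
        if kind == "star":
            s1, f1 = build(node[1])
            s, f = new_state(), new_state()
            edges.extend([(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)])
            return s, f
        raise ValueError(f"unknown node {kind!r}")

    start, final = build(r)
    return start, final, edges, count[0]


def eps_closure(states, edges):
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(machine, word):
    start, final, edges, _ = machine
    current = eps_closure({start}, edges)
    for c in word:
        moved = {dst for src, label, dst in edges if src in current and label == c}
        current = eps_closure(moved, edges)
    return final in current


a, b = ("sym", "a"), ("sym", "b")
m = thompson(("cat", ("star", ("alt", a, b)), a))  # (a+b)*a
print(m[3], accepts(m, "ba"), accepts(m, "ab"))  # 10 True False
```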
KAT and Hoare Logic with Derivatives
, 2013
Cited by 3 (2 self)
Kleene algebra with tests (KAT) is an equational system for program verification that combines Boolean algebra (BA) and Kleene algebra (KA), the algebra of regular expressions. In particular, KAT subsumes the propositional fragment of Hoare logic (PHL), a formal system for the specification and verification of programs that is currently the basis of most tools for checking program correctness. Both the equational theory of KAT and the encoding of PHL in KAT are known to be decidable. In this paper we present a new decision procedure for the equivalence of two KAT expressions based on the notion of partial derivatives. We also introduce the notion of derivative modulo particular sets of equations. With this we extend the previous procedure for deciding PHL. Some experimental results are also presented.
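KAT adds Boolean tests, which this sketch leaves out. What follows is a minimal Python sketch of the underlying idea restricted to plain regular expressions (the Kleene-algebra fragment): two expressions are equivalent iff no pair of partial-derivative sets reachable by the same word disagrees on accepting the empty word. It is in the spirit of the procedure the abstract describes, not the authors' KAT algorithm. The tuple syntax tree and the names pd and equivalent are hypothetical.

```python
from collections import deque

EPS = ("eps",)


def nullable(r):
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "sym":
        return False
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def cat(t, s):
    """Concatenation with the simplification eps . s = s."""
    return s if t == EPS else ("cat", t, s)


def pd(r, a):
    """Antimirov's partial derivatives of r with respect to the letter a."""
    kind = r[0]
    if kind == "eps":
        return set()
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if kind == "cat":
        result = {cat(t, r[2]) for t in pd(r[1], a)}
        if nullable(r[1]):
            result |= pd(r[2], a)
        return result
    if kind == "star":
        return {cat(t, r) for t in pd(r[1], a)}
    raise ValueError(f"unknown node {kind!r}")


def symbols(r):
    if r[0] == "sym":
        return {r[1]}
    return set().union(*(symbols(child) for child in r[1:]))


def equivalent(r, s):
    """Breadth-first search over pairs of partial-derivative sets."""
    alphabet = symbols(r) | symbols(s)
    start = (frozenset([r]), frozenset([s]))
    seen, queue = {start}, deque([start])
    while queue:
        xs, ys = queue.popleft()
        if any(map(nullable, xs)) != any(map(nullable, ys)):
            return False  # the word leading here is in exactly one language
        for a in alphabet:
            pair = (frozenset(t for x in xs for t in pd(x, a)),
                    frozenset(t for y in ys for t in pd(y, a)))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


a, b = ("sym", "a"), ("sym", "b")
E1 = ("cat", ("cat", ("star", b), a), ("star", ("cat", ("star", b), a)))  # b*a(b*a)*
E2 = ("cat", ("star", ("alt", a, b)), a)  # (a+b)*a
a_star = ("star", a)
print(equivalent(E1, E2), equivalent(a_star, ("cat", a_star, a)))  # True False
```

The search terminates because each expression has only finitely many partial derivatives, so only finitely many pairs of sets can be reached.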
On the average number of states of partial derivative automata
Cited by 2 (0 self)
The partial derivative automaton ($A_{pd}$) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton ($A_{pos}$). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound of the average number of mergings of states in $A_{pos}$ and describe its asymptotic behaviour. This depends on the alphabet size, k, and its limit, as k goes to infinity, is $1/2$. The lower bound corresponds exactly to considering the $A_{pd}$ automaton for the marked version of the regular expression, i.e. where all its letters are made different. Experimental results suggest that the average number of states of this automaton, and of the $A_{pd}$ automaton for the unmarked regular expression, are very close to each other.