Results 1 - 10
of
2,526
Head-Driven Statistical Models for Natural Language Parsing
, 2003
"... This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down ..."
Abstract
-
Cited by 780 (13 self)
- Add to MetaCart
This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
, 1997
"... In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are ..."
Abstract
-
Cited by 459 (14 self)
- Add to MetaCart
In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
Parsing English with a Link Grammar
, 1991
"... We define a new formal grammatical system called a link grammar . A sequence of words is in the language of a link grammar if there is a way to draw links between words in such a way that (1) the local requirements of each word are satisfied, (2) the links do not cross, and (3) the words form a conn ..."
Abstract
-
Cited by 313 (2 self)
- Add to MetaCart
We define a new formal grammatical system called a link grammar . A sequence of words is in the language of a link grammar if there is a way to draw links between words in such a way that (1) the local requirements of each word are satisfied, (2) the links do not cross, and (3) the words form a connected graph. We have encoded English grammar into such a system, and written a program (based on new algorithms) for efficiently parsing with a link grammar. The formalism is lexical and makes no explicit use of constituents and categories. The breadth of English phenomena that our system handles is quite large. A number of sophisticated and new techniques were used to allow efficient parsing of this very complex grammar. Our program is written in C, and the entire system may be obtained via anonymous ftp. Several other researchers have begun to use link grammars in their own research. 1 Introduction Most sentences of most natural languages have the property that if arcs are drawn connecti...
A Guided Tour to Approximate String Matching
- ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract
-
Cited by 306 (38 self)
- Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
Regular models of phonological rule systems." Paper presented to
- Oxford University
, 1988
"... This paper presents a set of mathematical and computational tools for manipulating and reasoning about regular languages and regular relations and argues that they provide a solid basis for computational phonology. It shows in detail how this framework applies to ordered sets of context-sensitive re ..."
Abstract
-
Cited by 290 (4 self)
- Add to MetaCart
This paper presents a set of mathematical and computational tools for manipulating and reasoning about regular languages and regular relations and argues that they provide a solid basis for computational phonology. It shows in detail how this framework applies to ordered sets of context-sensitive rewriting rules and also to grammars in Koskenniemi's two-level formalism. This analysis provides a common representation of phonological constraints that supports efficient generation and recognition by a single simple interpreter. 1.
Convolution Kernels on Discrete Structures
, 1999
"... We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the fa ..."
Abstract
-
Cited by 281 (0 self)
- Add to MetaCart
We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random elds, generalized regular expressions, pair-HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
Model Checking and Modular Verification
- ACM Transactions on Programming Languages and Systems
, 1991
"... We describe a framework for compositional verification of finite state processes. The framework is based on two ideas: a subset of the logic CTL for which satisfaction is preserved under composition; and a preorder on structures which captures the relation between a component and a system containing ..."
Abstract
-
Cited by 242 (11 self)
- Add to MetaCart
We describe a framework for compositional verification of finite state processes. The framework is based on two ideas: a subset of the logic CTL for which satisfaction is preserved under composition; and a preorder on structures which captures the relation between a component and a system containing the component. Satisfaction of a formula in the logic corresponds to being below a particular structure (a tableau for the formula) in the preorder. We show how to do assume-guarantee style reasoning within this framework. In addition, we demonstrate efficient methods for model checking in the logic and for checking the preorder in several special cases. We have implemented a system based on these methods, and we use it to give a compositional verification of a CPU controller. 1 Introduction Temporal logic model checking procedures are useful tools for the verification of finite state systems [3, 12, 20]. However, these procedures have traditionally suffered from the state explosion proble...
Combating web spam with trustrank
- In VLDB
, 2004
"... Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages fr ..."
Abstract
-
Cited by 220 (2 self)
- Add to MetaCart
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites. 1
A Really Temporal Logic
- Journal of the ACM
, 1989
"... . We introduce a temporal logic for the specification of real-time systems. Our logic, TPTL, employs a novel quantifier construct for referencing time: the freeze quantifier binds a variable to the time of the local temporal context. TPTL is both a natural language for specification and a suitable f ..."
Abstract
-
Cited by 213 (26 self)
- Add to MetaCart
. We introduce a temporal logic for the specification of real-time systems. Our logic, TPTL, employs a novel quantifier construct for referencing time: the freeze quantifier binds a variable to the time of the local temporal context. TPTL is both a natural language for specification and a suitable formalism for verification. We present a tableau-based decision procedure and a model checking algorithm for TPTL. Several generalizations of TPTL are shown to be highly undecidable. 1 Introduction Linear temporal logic is a widely accepted language for specifying properties of reactive systems and their behavior over time [Pnu77, OL82, MP92]. The tableau-based satisfiability algorithm for its propositional version, PTL, forms the basis for the automatic verification and synthesis of finite-state systems [LP84, MW84]. PTL is interpreted over models that abstract away from the actual times at which events occur, retaining only temporal ordering information about the states of a system. The a...
HTN planning: Complexity and expressivity
- In AAAI-94
, 1994
"... Most practical work on AI planning systems during the last fteen years has been based on hierarchical task network (HTN) decomposition, but until now, there has been very little analytical work on the properties of HTN planners. This paper describes how the complexity of HTN planning varies with var ..."
Abstract
-
Cited by 209 (13 self)
- Add to MetaCart
Most practical work on AI planning systems during the last fteen years has been based on hierarchical task network (HTN) decomposition, but until now, there has been very little analytical work on the properties of HTN planners. This paper describes how the complexity of HTN planning varies with various conditions on the task networks.

