Results 11 - 20
of
427
Efficient Distribution-free Learning of Probabilistic Concepts
- Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract
-
Cited by 214 (8 self)
- Add to MetaCart
(Show Context)
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
Word sense disambiguation: a survey
- ACM COMPUTING SURVEYS
, 2009
"... Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the ..."
Abstract
-
Cited by 191 (16 self)
- Add to MetaCart
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Decision Lists For Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
, 1994
"... This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. By identifying and u ..."
Abstract
-
Cited by 191 (3 self)
- Add to MetaCart
This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. By identifying and utilizing only the single best disambiguating evidence in a target context, the algorithm avoids the problematic complex modeling of statistical dependencies. Although directly applicable to a wide class of ambiguities, the algorithm is described and evaluated in a realistic case study, the problem of restoring missing accents in Spanish and French text. Current accuracy exceeds 99% on the full task, and typically is over 90% for even the most difficult ambiguities.
Stochastic Dynamic Programming with Factored Representations
, 1997
"... Markov decision processes(MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we pro-p ..."
Abstract
-
Cited by 189 (10 self)
- Add to MetaCart
(Show Context)
Markov decision processes(MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we pro-pose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,
One sense per collocation
- In Proceedings of the ARPA Human Language Technology Workshop
, 1993
"... Previous work [Gale, Church and Yarowsky, 1992] showed that with high probability a polysemous word has one sense per discourse. In this paper we show that for certain definitions of collocation, a polysemous word exhibits essentially only one sense per collocation. We test this empirical hypothesis ..."
Abstract
-
Cited by 186 (6 self)
- Add to MetaCart
Previous work [Gale, Church and Yarowsky, 1992] showed that with high probability a polysemous word has one sense per discourse. In this paper we show that for certain definitions of collocation, a polysemous word exhibits essentially only one sense per collocation. We test this empirical hypothesis for several definitions of sense and collocation, and discover that it holds with 90-99 % accuracy for binary ambiguities. We utilize this property in a disambiguation algorithm that achieves precision of 92 % using combined models of very local context. 1.
Hierarchical Wrapper Induction for Semistructured Information Sources
- Journal of Autonomous Agents and Multi-Agent Systems
, 2001
"... With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Webbased information agent is a set of wrappers that can extract the relevant data from semistructur ..."
Abstract
-
Cited by 185 (34 self)
- Add to MetaCart
With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Webbased information agent is a set of wrappers that can extract the relevant data from semistructured information sources. Our novel approach to wrapper induction is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of simpler extraction tasks. We introduce an inductive algorithm, stalker, that generates high accuracy extraction rules based on userlabeled training examples. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and our experimental results show that stalker requires up to two orders of magnitude fewer examples than other algorithms. Furthermore, stalker can wrap information sources that could not be wrapped by existing inductive techniques
Learning in the Presence of Malicious Errors
- SIAM Journal on Computing
, 1993
"... In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an advers ..."
Abstract
-
Cited by 184 (12 self)
- Add to MetaCart
(Show Context)
In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems. 1 Introduction In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
The Independent Choice Logic for modelling multiple agents under uncertainty
- Artificial Intelligence
, 1997
"... Inspired by game theory representations, Bayesian networks, influence diagrams, structured Markov decision process models, logic programming, and work in dynamical systems, the independent choice logic (ICL) is a semantic framework that allows for independent choices (made by various agents, includi ..."
Abstract
-
Cited by 173 (10 self)
- Add to MetaCart
(Show Context)
Inspired by game theory representations, Bayesian networks, influence diagrams, structured Markov decision process models, logic programming, and work in dynamical systems, the independent choice logic (ICL) is a semantic framework that allows for independent choices (made by various agents, including nature) and a logic program that gives the consequence of choices. This representation can be used as a specification for agents that act in a world, make observations of that world and have memory, as well as a modelling tool for dynamic environments with uncertainty. The rules specify the consequences of an action, what can be sensed and the utility of outcomes. This paper presents a possible-worlds semantics for ICL, and shows how to embed influence diagrams, structured Markov decision processes, and both the strategic (normal) form and extensive (game-tree) form of games within the Thanks to Craig Boutilier and Holger Hoos for detailed comments on this paper. This work was supporte...
Learning to resolve natural language ambiguities: A unified approach
- In Proceedings of the National Conference on Artificial Intelligence. 806-813. Segond F., Schiller A., Grefenstette & Chanod F.P
, 1998
"... distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context- word associations and syn-and machine learning algorithms for natural language tactic patterns in this case- are suffl ..."
Abstract
-
Cited by 172 (78 self)
- Add to MetaCart
(Show Context)
distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context- word associations and syn-and machine learning algorithms for natural language tactic patterns in this case- are sufflcicnt to identify disambiguation tasks and observe tha they can bc recast as learning linear separators in the feature space. the correct form. Each of the methods makes a priori assumptions, which Many of these arc important stand-alone problems it employs, given the data, when searching for its hy- but even more important is thei role in many applicapothesis. Nevertheless, as we show, it searches a space tions including speech recognition, machine translation, that is as rich as the space of all linear separators. information extraction and intelligent human-machine We use this to build an argument for a data driven interaction. Most of the ambiguity resolution problems approach which merely searches for a good linear sepa- are at the lower level of the natural language inferences rator in the feature space, without further assumptions chain; a wide range and a large number of ambigui-