Results 1 - 10
of
147
Reinforcement learning: a survey
- Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract
-
Cited by 1134 (21 self)
- Add to MetaCart
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Locally weighted learning
- Artificial Intelligence Review
, 1997
"... This paper surveys locally weighted learning, a form of lazy learning and memorybased learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, ass ..."
Abstract
-
Cited by 370 (43 self)
- Add to MetaCart
This paper surveys locally weighted learning, a form of lazy learning and memorybased learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning t parameters, interference between old and new data, implementing locally weighted learning e ciently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
Generalization in Reinforcement Learning: Safely Approximating the Value Function
- Advances in Neural Information Processing Systems 7
, 1995
"... To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a genera ..."
Abstract
-
Cited by 224 (3 self)
- Add to MetaCart
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization. 1 INTRODUCTION Reinforcement learning---the problem of getting an agent to learn to act from sparse, delayed rewards---has been advanced by techniques based on dynamic programming (DP). These algorithms compute a value function ...
Constructive Incremental Learning from Only Local Information
, 1998
"... ... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields. ..."
Abstract
-
Cited by 126 (35 self)
- Add to MetaCart
... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.
Efficient algorithms for minimizing cross validation error
- In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
"... Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is che ..."
Abstract
-
Cited by 117 (6 self)
- Add to MetaCart
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is cheap and human expertise costly. Cross validation can then be a highly effective method for automatic model selection. Large scale cross validation search can, however, be computationally expensive. This paper introduces new algorithms to reduce the computational burden of such searches. We show how experimental design methods can achieve this, using a technique similar to a Bayesian version of Kaelbling’s Interval Estimation. Several improvements are then given, including (1) the use of blocking to quickly spot near-identical models, and (2) schemata search: a new method for quickly finding families of relevant features. Experiments are presented for robot data and noisy synthetic datasets. The new algorithms speed up computation without sacrificing reliability, and in some cases are more reliable than conventional techniques. 1
Similarity Metric Learning for a Variable-Kernel Classifier
- Neural Computation
, 1995
"... Nearest-neighbour interpolation algorithms have many useful properties for applications to learning, but they often exhibit poor generalization. In this paper, it is shown that much better generalization can be obtained by using a variable interpolation kernel in combination with conjugate gradient ..."
Abstract
-
Cited by 100 (1 self)
- Add to MetaCart
Nearest-neighbour interpolation algorithms have many useful properties for applications to learning, but they often exhibit poor generalization. In this paper, it is shown that much better generalization can be obtained by using a variable interpolation kernel in combination with conjugate gradient optimization of the similarity metric and kernel size. The resulting method is called variable-kernel similarity metric (VSM) learning. It has been tested on several standard classification data sets, and on these problems it shows better generalization than back propagation and most other learning methods. An important advantage is that the system can operate as a black box in which no model minimization parameters need to be experimentally set by the user. The number of parameters that must be determined through optimization are orders of magnitude less than for back-propagation or RBF networks, which may indicate that the method better captures the essential degrees of variation in learni...
Efficient Memory-based Learning for Robot Control
, 1990
"... This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors--an approach which is formalized here as the $AB (State-Action-Behaviour) ..."
Abstract
-
Cited by 94 (1 self)
- Add to MetaCart
This dissertation is about the application of machine learning to robot control. A system which has no initial model of the robot/world dynamics should be able to construct such a model using data received through its sensors--an approach which is formalized here as the $AB (State-Action-Behaviour) control cycle. A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which permits fast recall of the closest previous experience to any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional non-linear control spaces with real-valued ranges of variables. Furthermore, the method avoids a number of shortcomings of earlier learning methods in which the controller can become trapped in inadequate performance which does not improve. Also considered is how the system is made resistant to noisy inputs and how it adapts to environmental changes. A well founded mechanism for choosing actions is introduced which solves the experiment/perform dilemma for this domain with adequate computational efficiency, and with fast convergence to the goal behaviour. The dissertation explefins in detail how the $AB control cycle can be integrated into both low and high complexity tasks. The methods and algorithms are evaluated with numerous experiments using both real and simulated robot domefins. The final experiment also illustrates how a compound learning task can be structured into a hierarchy of simple learning tasks.
Local Regression: Automatic Kernel Carpentry
- Statistical Science
, 1993
"... . A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point x 0 the estimate of E(Y j x 0 ) is a weighted mean of the sample Y i , with observations close to x 0 receiving the largest weights. Unfortunately this simplicity has flaws. At the boundar ..."
Abstract
-
Cited by 93 (2 self)
- Add to MetaCart
. A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point x 0 the estimate of E(Y j x 0 ) is a weighted mean of the sample Y i , with observations close to x 0 receiving the largest weights. Unfortunately this simplicity has flaws. At the boundary of the predictor space, the kernel neighborhood is asymmetric and the estimate may have substantial bias. Bias can be a problem in the interior as well if the predictors are nonuniform or if the regression function has substantial curvature. These problems are particularly severe when the predictors are multidimensional. A variety of kernel modifications have been proposed to provide approximate and asymptotic adjustment for these biases. Such methods generally place substantial restrictions on the regression problems that can be considered; in unfavorable situations, they can perform very poorly. Moreover, the necessary modifications are very difficult to implement in the multidimensional...
Chaos and Nonlinear Dynamics: Application to Financial Markets
- Journal of Finance
, 1990
"... After the stock market crash of October 19, 1987, interest in nonlinear dynamics, especially deterministic chaotic dynamics, has increased in both the financial press and the academic literature. This has come about because the frequency of large moves in stock markets is greater than would be expec ..."
Abstract
-
Cited by 84 (3 self)
- Add to MetaCart
After the stock market crash of October 19, 1987, interest in nonlinear dynamics, especially deterministic chaotic dynamics, has increased in both the financial press and the academic literature. This has come about because the frequency of large moves in stock markets is greater than would be expected under a normal distribution. There are a number of possible explanations. A popular one is that the stock market is governed by chaotic dynamics. What exactly is chaos and how is it related to nonlinear dynamics? How does one detect chaos? Is there chaos in financial markets? Are there other explanations of the movements of financial prices other than chaos? The purpose of this paper is to explore these issues. -1Chaos has captured the fancy of many macroeconomists and financial economists. The attractiveness of chaotic dynamics is its ability to generate large movements which appear to be random, with greater frequency than linear models. As a result, there has been an explosion of pa...
Map-reduce for machine learning on multicore
- In Proceedings of NIPS
, 2007
"... We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we dev ..."
Abstract
-
Cited by 83 (5 self)
- Add to MetaCart
We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain “summation form, ” which allows them to be easily parallelized on multicore computers. We adapt Google’s map-reduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression

