Results 1 - 10 of 336
Practical Issues in Temporal Difference Learning
Machine Learning, 1992
"... This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(lambda) algorithm can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. ..."
Abstract
-
Cited by 334 (2 self)
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(lambda) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(lambda) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
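The update at the heart of this study can be sketched in tabular form. This is a minimal sketch, not the paper's implementation: it assumes a lookup table of state values in place of the connectionist network, accumulating eligibility traces, and a single outcome reward at the end of each episode, as when learning from the outcome of self-play. The function `td_lambda_episode` and its parameters are hypothetical.

```python
from collections import defaultdict


def td_lambda_episode(values, episode, final_reward, alpha=0.1, gamma=1.0, lam=0.7):
    """Apply online TD(lambda) updates for one episode, modifying `values` in place.

    values       -- dict mapping each non-terminal state to its value estimate
    episode      -- list of non-terminal states in the order visited
    final_reward -- outcome observed after the last state (e.g. 1 = win, 0 = loss)
    """
    traces = defaultdict(float)  # eligibility trace per state
    for t, state in enumerate(episode):
        is_last = t == len(episode) - 1
        # Intermediate rewards are zero and the terminal state's value is taken as 0,
        # so the target on the final step is just the observed outcome.
        target = final_reward if is_last else gamma * values[episode[t + 1]]
        delta = target - values[state]  # TD error
        traces[state] += 1.0  # accumulating trace for the current state
        for s, e in list(traces.items()):
            values[s] += alpha * delta * e  # credit every recently visited state
            traces[s] = gamma * lam * e  # decay the trace
    return values


if __name__ == "__main__":
    v = td_lambda_episode({"A": 0.0, "B": 0.0}, ["A", "B"], final_reward=1.0,
                          alpha=0.5, lam=0.5)
    print(v)  # {'A': 0.25, 'B': 0.5}
```

With `lam=0.0` the update reduces to TD(0) and, in this two-state example, only `B` is credited on the final step; larger values of `lam` pass more of the outcome back to earlier states.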
Face Recognition Based on Fitting a 3D Morphable Model
IEEE Trans. Pattern Anal. Mach. Intell., 2003
"... Abstract—This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections. To account for these variations, the algorithm simulates the process of image format ..."
Abstract
-
Cited by 251 (11 self)
This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections. To account for these variations, the algorithm simulates the process of image formation in 3D space, using computer graphics, and it estimates 3D shape and texture of faces from single images. The estimate is achieved by fitting a statistical, morphable model of 3D faces to images. The model is learned from a set of textured 3D scans of heads. We describe the construction of the morphable model, an algorithm to fit the model to images, and a framework for face identification. In this framework, faces are represented by model parameters for 3D shape and texture. We present results obtained with 4,488 images from the publicly available CMU-PIE database and 1,940 images from the FERET database. Index Terms: Face recognition, shape estimation, deformable model, 3D faces, pose invariance, illumination invariance.
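The idea that a face is represented by its model coefficients can be illustrated with a toy linear model. This is a minimal sketch under stated assumptions: a shape is formed as a mean vector plus a linear combination of principal components, and identification picks the gallery entry whose coefficient vector has the highest cosine similarity to the probe's. The similarity choice, the helper names and the toy data are hypothetical, and the model-fitting step itself is not shown.

```python
import math


def synthesize(mean, components, coeffs):
    """Return mean + sum_i coeffs[i] * components[i] (a linear shape model)."""
    return [m + sum(c * comp[j] for c, comp in zip(coeffs, components))
            for j, m in enumerate(mean)]


def cosine_similarity(a, b):
    """Cosine of the angle between two coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(probe_coeffs, gallery):
    """Return the gallery identity whose coefficients are most similar to the probe's."""
    return max(gallery, key=lambda name: cosine_similarity(probe_coeffs, gallery[name]))


if __name__ == "__main__":
    mean_shape = [0.0, 0.0, 0.0]
    pcs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(synthesize(mean_shape, pcs, [2.0, 3.0]))  # [2.0, 3.0, 0.0]
    gallery = {"alice": [1.0, 0.2], "bob": [-0.3, 1.0]}
    print(identify([0.9, 0.1], gallery))  # alice
```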
The dynamics of reinforcement learning in cooperative multiagent systems
In Proceedings of the National Conference on Artificial Intelligence (AAAI-98), 1998
"... Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that a ..."
Abstract
-
Cited by 249 (1 self)
Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
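The first perspective, learners that ignore the other agent, can be sketched as two stateless Q-learners in a repeated cooperative matrix game, each choosing actions by Boltzmann exploration and updating only the value of its own action. This is a minimal sketch under stated assumptions: the fixed temperature, learning rate and example payoff matrix are illustrative choices rather than the paper's settings, and the joint-action learners and optimistic exploration strategies are not shown. All names are hypothetical.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Boltzmann (softmax) exploration probabilities over actions."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def q_update(q_values, action, reward, alpha):
    """Stateless Q-learning update for a repeated single-stage game."""
    q_values[action] += alpha * (reward - q_values[action])


def play_independent_learners(payoff, rounds=2000, alpha=0.1, temperature=1.0, seed=0):
    """Two independent learners; payoff[a1][a2] is the reward both agents receive."""
    rng = random.Random(seed)
    n1, n2 = len(payoff), len(payoff[0])
    q1, q2 = [0.0] * n1, [0.0] * n2
    for _ in range(rounds):
        a1 = rng.choices(range(n1), weights=boltzmann_probs(q1, temperature))[0]
        a2 = rng.choices(range(n2), weights=boltzmann_probs(q2, temperature))[0]
        reward = payoff[a1][a2]
        # Each agent updates only its own action's value and never observes the other's.
        q_update(q1, a1, reward, alpha)
        q_update(q2, a2, reward, alpha)
    return q1, q2


if __name__ == "__main__":
    coordination_game = [[10.0, 0.0], [0.0, 10.0]]
    print(play_independent_learners(coordination_game))
```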
GTM: The generative topographic mapping
Neural Computation, 1998
"... Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis which is based on a linear transformations between the latent space and the data space. In this paper ..."
Abstract
-
Cited by 234 (5 self)
Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of non-linear latent variable model called the Generative Topographic Mapping, for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used Self-Organizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multi-phase oil pipeline. Copyright © MIT Press (1998).
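The EM fitting mentioned above alternates between computing responsibilities and re-estimating parameters. The sketch below covers only the E-step, the noise-precision (β) part of the M-step, and the posterior-mean projection into latent space, assuming the data-space images y_k of the latent grid points are already given and that latent points are scalars. The weight-matrix update, which needs a linear solve against the basis-function matrix, is omitted, and the function names are hypothetical.

```python
import math


def responsibilities(data, centres, beta):
    """E-step: R[n][k] = posterior probability that latent point k generated data[n].

    centres[k] is y_k, the image of latent grid point k in data space. With isotropic
    Gaussian noise of inverse variance beta and equal priors, R[n][k] is proportional
    to exp(-beta / 2 * ||t_n - y_k||^2).
    """
    resp = []
    for t in data:
        logs = [-0.5 * beta * sum((ti - yi) ** 2 for ti, yi in zip(t, y)) for y in centres]
        top = max(logs)  # log-sum-exp trick for numerical stability
        w = [math.exp(lg - top) for lg in logs]
        total = sum(w)
        resp.append([wk / total for wk in w])
    return resp


def update_beta(data, centres, resp):
    """M-step for beta: 1/beta = (1 / (N * D)) * sum_n sum_k R[n][k] * ||y_k - t_n||^2."""
    n, d = len(data), len(data[0])
    sq = sum(r * sum((ti - yi) ** 2 for ti, yi in zip(t, y))
             for t, row in zip(data, resp) for y, r in zip(centres, row))
    return n * d / sq


def latent_means(resp, latent_points):
    """Posterior-mean latent coordinate of each data point, as used for visualization."""
    return [sum(r * x for r, x in zip(row, latent_points)) for row in resp]
```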
Algorithms for Sequential Decision Making
1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract
-
Cited by 158 (7 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" means maximizing a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
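For the setting described here (finite states, finite actions, a long-run reward criterion), one standard way to answer "What should I do now?" is value iteration. The sketch assumes a known model and an expected discounted reward criterion; the excerpt does not say which algorithms the thesis itself develops, so this is background rather than the thesis's method. Names are hypothetical.

```python
def value_iteration(mdp, gamma=0.9, tol=1e-10):
    """Optimal state values and a greedy policy for a finite, known MDP.

    mdp[s][a] is a list of (probability, next_state, reward) triples.
    """
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            # Bellman optimality backup: best expected reward plus discounted value.
            best = max(sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # The answer to "what should I do now?": the action with the best backed-up value.
    policy = {s: max(actions, key=lambda a: sum(p * (r + gamma * values[s2])
                                                for p, s2, r in actions[a]))
              for s, actions in mdp.items()}
    return values, policy


if __name__ == "__main__":
    example = {
        0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
        1: {"stay": [(1.0, 1, 2.0)]},
    }
    print(value_iteration(example, gamma=0.5))
```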
Linear least-squares algorithms for temporal difference learning
Machine Learning, 1996
"... Abstract. We introduce two new temporal difference (TD) algorithms based on the theory of linear leastsquares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adju ..."
Abstract
-
Cited by 139 (0 self)
Abstract. We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σ_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σ_TD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters.
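The batch LS TD estimator can be sketched in a few lines: accumulate A = Σ φ_t (φ_t − γ φ_{t+1})ᵀ and b = Σ φ_t r_t over observed transitions, then solve A θ = b. This is a minimal sketch, assuming linear features, a discount factor γ (set to 1 in the example), and a small hand-written solver to stay dependency-free; the recursive RLS TD variant is not shown, and all names are hypothetical. Note that nothing in it plays the role of a learning rate.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lstd(transitions, num_features, gamma=1.0):
    """Least-Squares TD estimate of the weight vector theta.

    transitions is a list of (phi, reward, phi_next); pass an all-zero phi_next
    when the successor state is terminal.
    """
    a = [[0.0] * num_features for _ in range(num_features)]
    b = [0.0] * num_features
    for phi, reward, phi_next in transitions:
        for i in range(num_features):
            b[i] += phi[i] * reward
            for j in range(num_features):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(a, b)
```

If too few distinct transitions have been seen, A can be singular and `solve` raises `ValueError`.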
Enterprise modeling
1998
"... ... This article motivates the need for enterprise models and introduces the concepts of generic and deductive enterprise models. It reviews research to date on enterprise modeling and considers in detail the Toronto virtual enterprise effort at the University of Toronto. ..."
Abstract
-
Cited by 109 (5 self)
... This article motivates the need for enterprise models and introduces the concepts of generic and deductive enterprise models. It reviews research to date on enterprise modeling and considers in detail the Toronto virtual enterprise effort at the University of Toronto.
Self-Organizing Maps: Ordering, Convergence Properties and Energy Functions
Biological Cybernetics, 1992
"... We investigate the convergence properties of the self-organizing feature map algorithm for a simple, but very instructive case: the formation of a topographic representation of the unit interval [0; 1] by a linear chain of neurons. We extend the proofs of convergence of Kohonen and of Cottrell and F ..."
Abstract
-
Cited by 92 (2 self)
We investigate the convergence properties of the self-organizing feature map algorithm for a simple, but very instructive case: the formation of a topographic representation of the unit interval [0, 1] by a linear chain of neurons. We extend the proofs of convergence of Kohonen and of Cottrell and Fort to hold in any case where the neighborhood function, which is used to scale the change in the weight values at each neuron, is a monotonically decreasing function of distance from the winner neuron. We prove that the learning dynamics cannot be described by a gradient descent on a single energy function, but may be described using a set of potential functions, one for each neuron, which are independently minimized following a stochastic gradient descent. We derive the correct potential functions for the one- and multi-dimensional case, and show that the energy functions given by Tolat (1990) are an approximation which is no longer valid in the case of highly disordered maps or steep neig...
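The learning rule analyzed here can be written for a linear chain of neurons with scalar weights and inputs drawn from [0, 1]. This is a minimal sketch, assuming a Gaussian neighborhood function (one example of a function that decreases monotonically with distance from the winner) and a constant learning rate; the names are hypothetical.

```python
import math
import random


def gaussian_neighborhood(distance, width=1.0):
    """A neighborhood function that decreases monotonically with chain distance."""
    return math.exp(-distance ** 2 / (2.0 * width ** 2))


def som_step(weights, x, epsilon, h=gaussian_neighborhood):
    """One Kohonen update of a linear chain of neurons with scalar weights."""
    winner = min(range(len(weights)), key=lambda i: abs(x - weights[i]))
    # Every neuron moves toward x, scaled by its chain distance from the winner.
    return [w + epsilon * h(abs(i - winner)) * (x - w) for i, w in enumerate(weights)]


def train_chain(n_neurons=10, steps=5000, epsilon=0.1, seed=0):
    """Map the unit interval [0, 1] onto a chain using uniformly drawn inputs."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_neurons)]
    for _ in range(steps):
        weights = som_step(weights, rng.random(), epsilon)
    return weights
```

Because each update moves a weight part of the way toward an input in [0, 1], weights that start in [0, 1] stay there.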
A Context-Sensitive Generalization of ICA
1996
"... Source separation arises in a surprising number of signal processing applications, from speech recognition to EEG analysis. In the square linear blind source separation problem without time delays, one must find an unmixing matrix which can detangle the result of mixing n unknown independent sources ..."
Abstract
-
Cited by 86 (7 self)
Source separation arises in a surprising number of signal processing applications, from speech recognition to EEG analysis. In the square linear blind source separation problem without time delays, one must find an unmixing matrix which can detangle the result of mixing n unknown independent sources through an unknown n × n mixing matrix. The recently introduced ICA blind source separation algorithm (Baram and Roth 1994; Bell and Sejnowski 1995) is a powerful and surprisingly simple technique for solving this problem. ICA is all the more remarkable for performing so well despite making absolutely no use of the temporal structure of its input! This paper presents a new algorithm, contextual ICA, which derives from a maximum likelihood density estimation formulation of the problem. cICA can incorporate arbitrarily complex adaptive history-sensitive source models, and thereby make use of the temporal structure of its input. This allows it to separate in a number of situations where s...
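The baseline algorithm this paper starts from can be sketched as a per-sample infomax update. This is a minimal sketch of standard ICA, not of contextual ICA: it assumes the logistic nonlinearity and the natural-gradient form of the infomax rule, W ← W + η (I + (1 − 2y) uᵀ) W with u = Wx, which avoids the matrix inverse in the original Bell and Sejnowski update. Names are hypothetical.

```python
import math


def logistic(u):
    """Numerically stable logistic sigmoid."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def infomax_step(w, x, lr):
    """One natural-gradient infomax ICA update for a square unmixing matrix w."""
    n = len(w)
    u = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]  # u = W x
    g = [1.0 - 2.0 * logistic(ui) for ui in u]  # 1 - 2y, elementwise
    # M = I + g u^T
    m = [[(1.0 if i == j else 0.0) + g[i] * u[j] for j in range(n)] for i in range(n)]
    # W <- W + lr * M W
    return [[w[i][j] + lr * sum(m[i][k] * w[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Each sample x is processed on its own, with no reference to its neighbors in time; that is the limitation contextual ICA is designed to remove.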
Markov Chain Monte Carlo Estimation of Exponential Random Graph Models
Journal of Social Structure, 2002
"... This paper is about estimating the parameters of the exponential random graph model, also known as the p # model, using frequentist Markov chain Monte Carlo (MCMC) methods. The exponential random graph model is simulated using Gibbs or Metropolis-Hastings sampling. The estimation procedures consider ..."
Abstract
-
Cited by 84 (13 self)
This paper is about estimating the parameters of the exponential random graph model, also known as the p* model, using frequentist Markov chain Monte Carlo (MCMC) methods. The exponential random graph model is simulated using Gibbs or Metropolis-Hastings sampling. The estimation procedures considered are based on the Robbins-Monro algorithm for approximating a solution to the likelihood equation.
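The two ingredients named here, Metropolis-Hastings simulation of the model and a Robbins-Monro parameter update, can be sketched for a small undirected graph. This is a minimal sketch under stated assumptions: the model uses only edge and triangle counts as statistics, proposals toggle one uniformly chosen dyad, edges are stored as sorted `(i, j)` tuples, and the Robbins-Monro scaling matrix is taken to be diagonal. The full estimation schedule (phases, gain sequence) is not shown, and all names are hypothetical.

```python
import math
import random


def change_stats(edges, i, j):
    """Change in (edge count, triangle count) if dyad (i, j) were added."""
    nbrs_i = {b if a == i else a for a, b in edges if i in (a, b)}
    nbrs_j = {b if a == j else a for a, b in edges if j in (a, b)}
    return (1, len(nbrs_i & nbrs_j))  # new triangles = common neighbors


def graph_stats(edges):
    """(number of edges, number of triangles); each triangle is seen from 3 edges."""
    return (len(edges), sum(change_stats(edges, i, j)[1] for i, j in edges) // 3)


def mh_sample(n_nodes, theta, steps, seed=0, edges=None):
    """Metropolis-Hastings for P(G) proportional to exp(theta . (edges, triangles))."""
    rng = random.Random(seed)
    edges = set() if edges is None else set(edges)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n_nodes), 2))  # symmetric dyad-toggle proposal
        d_edges, d_tri = change_stats(edges, i, j)
        sign = -1 if (i, j) in edges else 1  # removing an edge reverses the change
        log_ratio = sign * (theta[0] * d_edges + theta[1] * d_tri)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            edges ^= {(i, j)}  # accept: toggle the dyad
    return edges


def robbins_monro_step(theta, simulated_stats, observed_stats, gain, scale):
    """theta <- theta - gain * D^{-1} (s_sim - s_obs), with D diagonal (entries in scale)."""
    return [t - gain * (s - o) / d
            for t, s, o, d in zip(theta, simulated_stats, observed_stats, scale)]
```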

