Results 1–10 of 121
Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images.
 IEEE Trans. Pattern Anal. Mach. Intell.
, 1984
"... AbstractWe make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a latticelike physical system. The assignment of an energy function in the physical system determines its Gibbs di ..."
Abstract

Cited by 5126 (1 self)
 Add to MetaCart
We make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a lattice-like physical system. The assignment of an energy function in the physical system determines its Gibbs distribution. Because of the Gibbs distribution-Markov random field (MRF) equivalence, this assignment also determines an MRF image model. The energy function is a more convenient and natural mechanism for embodying picture attributes than are the local characteristics of the MRF. For a range of degradation mechanisms, including blurring, nonlinear deformations, and multiplicative or additive noise, the posterior distribution is an MRF with a structure akin to the image model. By the analogy, the posterior distribution defines another (imaginary) physical system. Gradual temperature reduction in the physical system isolates low-energy states ("annealing"), or what is the same thing, the most probable states under the Gibbs distribution. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations. The result is a highly parallel "relaxation" algorithm for MAP estimation. We establish convergence properties of the algorithm and we experiment with some simple pictures, for which good restorations are obtained at low signal-to-noise ratios.
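As a concrete (and much simplified) illustration of the annealing idea, the sketch below runs a Gibbs sampler with a decreasing temperature on a binary image under an Ising-style smoothness prior plus a data term. The function name, parameters, and energy are illustrative choices, not the paper's exact model (which includes line processes and more general degradations).

```python
import math
import random

def denoise_map(noisy, beta=1.0, t0=4.0, sweeps=60, seed=0):
    """Simulated annealing with a Gibbs sampler for MAP estimation of a
    binary (+1/-1) image under an Ising-style MRF prior plus a data term.
    Illustrative sketch only; the paper's energy functions are richer."""
    rng = random.Random(seed)
    h, w = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]                    # start at the observed data
    for sweep in range(sweeps):
        t = t0 / (1 + sweep)                         # cooling schedule
        for i in range(h):
            for j in range(w):
                nb = sum(x[a][b]
                         for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < h and 0 <= b < w)
                # energy difference between x[i][j] = +1 and x[i][j] = -1
                d = 2.0 * (beta * nb + noisy[i][j])
                p_plus = 1.0 / (1.0 + math.exp(-d / t))
                x[i][j] = 1 if rng.random() < p_plus else -1
    return x
```

As the temperature falls, the conditional distributions sharpen and the sweep approaches a deterministic local update, which is how annealing isolates low-energy (high-posterior) states.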
A learning algorithm for Boltzmann machines
 Cognitive Science
, 1985
"... The computotionol power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections con allow a significant fraction of the knowledge of the system to be applied to an instance of a probl ..."
Abstract

Cited by 584 (13 self)
 Add to MetaCart
The computational power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections can allow a significant fraction of the knowledge of the system to be applied to an instance of a problem in a very short time. One kind of computation for which massively parallel networks appear to be well suited is large constraint satisfaction searches, but to use the connections efficiently two conditions must be met: First, a search technique that is suitable for parallel networks must be found. Second, there must be some way of choosing internal representations which allow the preexisting hardware connections to be used efficiently for encoding the constraints in the domain being searched. We describe a general parallel search method, based on statistical mechanics, and we show how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way. We describe some simple examples in which the learning algorithm creates internal representations that are demonstrably the most efficient way of using the preexisting connectivity structure.
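The learning rule this paper derives compares pairwise co-occurrence statistics between a phase in which the visible units are clamped to data and a free-running phase. A minimal sketch of that weight update, assuming equilibrium samples have already been collected by some external sampling procedure (all names here are illustrative):

```python
def boltzmann_update(w, clamped_states, free_states, lr=0.1):
    """One step of the Boltzmann machine learning rule:
    dw_ij = lr * (<s_i s_j>_clamped - <s_i s_j>_free),
    where the averages are co-occurrence statistics over binary (0/1)
    unit-state vectors collected at thermal equilibrium."""
    n = len(w)

    def mean_cooccur(states):
        m = [[0.0] * n for _ in range(n)]
        for s in states:
            for i in range(n):
                for j in range(n):
                    m[i][j] += s[i] * s[j]
        return [[v / len(states) for v in row] for row in m]

    pc = mean_cooccur(clamped_states)   # clamped-phase statistics
    pf = mean_cooccur(free_states)      # free-running statistics
    return [[w[i][j] + lr * (pc[i][j] - pf[i][j]) for j in range(n)]
            for i in range(n)]
```

The appeal of the rule, as the abstract notes, is locality: each connection needs only statistics about the two units it joins.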
Parallel Networks that Learn to Pronounce English Text
 COMPLEX SYSTEMS
, 1987
"... This paper describes NETtalk, a class of massivelyparallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed h ..."
Abstract

Cited by 549 (5 self)
 Add to MetaCart
(Show Context)
This paper describes NETtalk, a class of massively parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
Connectionist Learning Procedures
 ARTIFICIAL INTELLIGENCE
, 1989
"... A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way ..."
Abstract

Cited by 410 (9 self)
 Add to MetaCart
A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
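The core loop these procedures share can be sketched generically: move each connection strength against the derivative of a global error measure. The toy below uses a numerical derivative purely for illustration; the actual connectionist procedures the survey covers (e.g. backpropagation) compute the derivatives analytically and in parallel. All names are illustrative.

```python
def gradient_descent_step(weights, error_fn, lr=0.01, eps=1e-6):
    """One generic gradient-descent update: each 'connection strength'
    moves against the derivative of a global error.  The derivative is
    taken by forward finite differences here for illustration only."""
    grads = []
    for k in range(len(weights)):
        bumped = weights[:]
        bumped[k] += eps
        grads.append((error_fn(bumped) - error_fn(weights)) / eps)
    return [w - lr * g for w, g in zip(weights, grads)]
```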
Bayesian Modeling of Uncertainty in Low-Level Vision
, 1990
"... The need for error modeling, multisensor fusion, and robust algorithms i becoming increasingly recognized in computer vision. Bayesian modeling is a powerful, practical, and general framework for meeting these requirements. This article develops a Bayesian model for describing and manipulating the d ..."
Abstract

Cited by 204 (17 self)
 Add to MetaCart
The need for error modeling, multisensor fusion, and robust algorithms is becoming increasingly recognized in computer vision. Bayesian modeling is a powerful, practical, and general framework for meeting these requirements. This article develops a Bayesian model for describing and manipulating the dense fields, such as depth maps, associated with low-level computer vision. Our model consists of three components: a prior model, a sensor model, and a posterior model. The prior model captures a priori information about the structure of the field. We construct this model using the smoothness constraints from regularization to define a Markov Random Field. The sensor model describes the behavior and noise characteristics of our measurement system. We develop a number of sensor models for both sparse and dense measurements. The posterior model combines the information from the prior and sensor models using Bayes' rule. We show how to compute optimal estimates from the posterior model and also how to compute the uncertainty (variance) in these estimates. To demonstrate the utility of our Bayesian framework, we present three examples of its application to real vision problems. The first application is the online extraction of depth from motion. Using a two-dimensional generalization of the Kalman filter, we develop an incremental algorithm that provides a dense online estimate of depth whose accuracy improves over time. In the second application, we use a Bayesian model to determine observer motion from sparse depth (range) measurements. In the third application, we use the Bayesian interpretation of regularization to choose the optimal smoothing parameter for interpolation. The uncertainty modeling techniques that we develop, and the utility of these techniques in various applications, support our claim that Bayesian modeling is a powerful and practical framework for low-level vision.
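For a single scalar state with Gaussian prior and sensor models, the prior/sensor/posterior combination described above reduces to a precision-weighted average whose posterior variance quantifies the remaining uncertainty. A minimal sketch of that scalar case (function name illustrative; the article applies the same combination to dense fields):

```python
def fuse(prior_mean, prior_var, meas, meas_var):
    """Bayes' rule for a scalar Gaussian prior and Gaussian sensor model:
    precisions (inverse variances) add, and the posterior mean is a
    precision-weighted average of prior mean and measurement."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    post_mean = post_var * (prior_mean / prior_var + meas / meas_var)
    return post_mean, post_var
```

Applying `fuse` sequentially, feeding each posterior back in as the next prior, shows why an incremental depth estimate improves over time: the posterior variance shrinks with every measurement.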
Gradient calculation for dynamic recurrent neural networks: a survey
 IEEE Transactions on Neural Networks
, 1995
"... Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backp ..."
Abstract

Cited by 181 (3 self)
 Add to MetaCart
(Show Context)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
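Backpropagation through time, one of the non-fixed-point algorithms surveyed, can be sketched on a one-unit network: unroll the recurrence forward, then push the error derivative back through the copies. The scalar network, loss, and names below are illustrative choices, not the survey's notation.

```python
import math

def bptt_scalar(w, u, xs, target):
    """Backpropagation through time for a one-unit recurrent net
    h_t = tanh(w * h_{t-1} + u * x_t), with squared error at the final
    step.  Returns the loss and the gradients dL/dw and dL/du."""
    hs = [0.0]
    for x in xs:                         # forward pass through time
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = 0.5 * (hs[-1] - target) ** 2
    dh = hs[-1] - target                 # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):      # backward pass through time
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        dw += da * hs[t - 1]             # gradient contributions
        du += da * xs[t - 1]             # accumulate across time steps
        dh = da * w                      # propagate into previous step
    return loss, dw, du
```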
Learning Machines
, 1965
"... This book is about machines that learn to discover hidden relationships in data. A constant sfream of data bombards our senses and millions of sensory channels carry information into our brains. Brains are also learning machines that condition, ..."
Abstract

Cited by 176 (0 self)
 Add to MetaCart
(Show Context)
This book is about machines that learn to discover hidden relationships in data. A constant stream of data bombards our senses and millions of sensory channels carry information into our brains. Brains are also learning machines that condition, ...
Generative models for discovering sparse distributed representations
 Philosophical Transactions of the Royal Society B
, 1997
"... We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottomup, topdown and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has b ..."
Abstract

Cited by 139 (7 self)
 Add to MetaCart
(Show Context)
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
Bayesian computation in recurrent neural circuits
 Neural Computation
, 2004
"... A large number of human psychophysical results have been successfully explained in recent years using Bayesian models. However, the neural implementation of such models remains largely unclear. In this paper, we show that a network architecture commonly used to model the cerebral cortex can implem ..."
Abstract

Cited by 94 (4 self)
 Add to MetaCart
(Show Context)
A large number of human psychophysical results have been successfully explained in recent years using Bayesian models. However, the neural implementation of such models remains largely unclear. In this paper, we show that a network architecture commonly used to model the cerebral cortex can implement Bayesian inference for an arbitrary hidden Markov model. We illustrate the approach using an orientation discrimination task and a visual motion detection task. In the case of orientation discrimination, we show that the model network can infer the posterior distribution over orientations and correctly estimate stimulus orientation in the presence of significant noise. In the case of motion detection, we show that the resulting model network exhibits direction selectivity and correctly computes the posterior probabilities over motion direction and position. When used to solve the well-known random dots motion discrimination task, the model generates responses that mimic the activities of evidence-accumulating neurons in cortical areas LIP and FEF. The framework introduced in the paper posits a new interpretation of cortical activities in terms of log posterior probabilities of stimuli occurring in the natural world.
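The recursive Bayesian update at the heart of this proposal, posterior proportional to emission likelihood times the transition-predicted prior, can be sketched for a generic hidden Markov model as follows. The toy model and variable names are illustrative, not the paper's cortical network.

```python
def hmm_filter(prior, trans, emit_lik):
    """Recursive Bayesian filtering over a hidden Markov model.
    prior    : initial probability vector over hidden states
    trans    : trans[i][j] = P(state_j at t | state_i at t-1)
    emit_lik : list of per-time-step likelihood vectors P(obs_t | state_j)
    Returns the posterior over states after the final observation."""
    post = prior[:]
    for lik in emit_lik:
        n = len(post)
        pred = [sum(post[i] * trans[i][j] for i in range(n))  # predict step
                for j in range(n)]
        unnorm = [lik[j] * pred[j] for j in range(n)]          # weight by evidence
        z = sum(unnorm)
        post = [p / z for p in unnorm]                         # normalize
    return post
```

In log space this update becomes an additive accumulation of evidence, which is the interpretation the paper gives to the ramping activity of LIP and FEF neurons.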
THE COPYCAT PROJECT: An Experiment in Nondeterminism and Creative Analogies
 Massachusetts Institute of Technology
, 1984
"... A microworld is described, in which many analogies involving strikingly different concepts and levels of subtlety can be made. The question "What differentiates the good ones from the bad ones?" is discussed, and then the problem of how to implement a computational model of the human abil ..."
Abstract

Cited by 73 (3 self)
 Add to MetaCart
A microworld is described, in which many analogies involving strikingly different concepts and levels of subtlety can be made. The question "What differentiates the good ones from the bad ones?" is discussed, and then the problem of how to implement a computational model of the human ability to come up with such analogies (and to have a sense for their quality) is considered. A key part of the proposed system, now under development, is its dependence on statistically emergent properties of stochastically interacting "codelets" (small pieces of ready-to-run code created by the system, and selected at random to run with probability proportional to heuristically assigned "urgencies"). Another key element is a network of linked concepts of varying levels of "semanticity", in which activation spreads and indirectly controls the urgencies of new codelets. There is pressure in the system toward maximizing the degree of "semanticity" or "intensionality" of descriptions of structures, but many such pressures, often conflicting, must interact with one another, and compromises must be made. The shifting of (1) perceived boundaries inside structures, (2) descriptive concepts chosen to apply to structures, and (3) features perceived as salient or not, is called "slippage". What can slip, and how, are emergent consequences of the interaction of (1) the temporary ("cytoplasmic") structures involved in the analogy with (2) the permanent ("Platonic") concepts and links in the conceptual proximity network, or "slippability network". The architecture of this system is postulated as a general architecture suitable for dealing not only with fluid analogies, but also with other types of abstract perception and categorization tasks, such as musical perception, scientific theorizing, ...
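The codelet-selection mechanism described above, run with probability proportional to urgency, amounts to roulette-wheel sampling. A minimal sketch with illustrative names; the actual Copycat coderack is considerably richer.

```python
import random

def pick_codelet(codelets, rng=None):
    """Choose a codelet to run with probability proportional to its
    heuristically assigned urgency.  `codelets` is a list of
    (name, urgency) pairs; names and structure are illustrative."""
    rng = rng or random.Random(0)
    total = sum(u for _, u in codelets)
    r = rng.random() * total          # point on the roulette wheel
    acc = 0.0
    for name, urgency in codelets:
        acc += urgency                # walk sectors until r falls inside one
        if r < acc:
            return name
    return codelets[-1][0]            # guard against floating-point edge cases
```

Because urgencies only bias rather than dictate the order of execution, the system's global behavior is a statistically emergent property of many such stochastic choices.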