A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge
Psychological Review, 1997
Cited by 764 (9 self)
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena. By inducing global knowledge indirectly from local co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren. LSA uses no prior linguistic or perceptual similarity knowledge; it is based solely on a general mathematical learning method that achieves powerful inductive effects by extracting the right number of dimensions (e.g., 300) to represent objects and contexts. Relations to other theories, phenomena, and problems are sketched. Prologue "How much do we know at any time? Much more, or so I believe, than we know we know!" —Agatha Christie, The Moving Finger A typical American seventh grader knows the meaning of
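The dimension-reduction step this abstract describes can be illustrated with a minimal sketch. It assumes a plain truncated singular value decomposition of a raw count matrix; the words, counts, and the choice of k=2 (instead of the roughly 300 dimensions the abstract mentions) are invented for illustration, and any preprocessing of the counts is skipped. It is not the authors' implementation.

```python
import numpy as np

# Toy word-by-passage count matrix (rows: words, columns: text passages).
# Words and counts are hypothetical, chosen only to make the effect visible.
words = ["doctor", "nurse", "hospital", "guitar", "drum"]
counts = np.array([
    [2, 0, 1, 0, 0],
    [0, 2, 1, 0, 0],
    [1, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)


def reduced_word_vectors(matrix, k):
    """Return word vectors in the k dimensions kept by a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = reduced_word_vectors(counts, k=2)

# Similarity from raw co-occurrence counts versus the reduced space.
raw_doctor_nurse = cosine(counts[0], counts[1])
lsa_doctor_nurse = cosine(vectors[0], vectors[1])
lsa_doctor_guitar = cosine(vectors[0], vectors[3])
```

In this toy matrix "doctor" and "nurse" share only one passage, so their raw similarity is low, while in the reduced space they become near neighbors and stay unrelated to "guitar". That is the kind of indirect induction from local co-occurrence that the abstract attributes to choosing the number of dimensions.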
Parallel Networks that Learn to Pronounce English Text
Complex Systems, 1987
Cited by 413 (5 self)
This paper describes NETtalk, a class of massively-parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
An integrated theory of the mind
Psychological Review, 2004
Cited by 367 (39 self)
There has been a proliferation of proposed mental modules in an attempt to account for different cognitive functions but so far there has been no successful account of their integration. ACT-R (Anderson & Lebiere, 1998) has evolved into a theory that consists of multiple modules but also explains how they are integrated to produce coherent cognition. The perceptual-motor modules, the goal module, and the declarative memory module are presented as examples of specialized systems in ACT-R. These modules are associated with distinct cortical regions. These modules place chunks in buffers where they can be detected by a production system that responds to patterns of information in the buffers. At any point in time a single production rule is selected to respond to the current pattern. Subsymbolic processes serve to guide the selection of rules to fire as well as the internal operations of some modules. Much of learning involves tuning of these subsymbolic processes. Empirical examples are presented that illustrate the predictions of ACT-R’s modules. In addition, two models of complex tasks are described to illustrate how these modules result in strong predictions when they are brought together. One of these models is concerned with complex patterns of behavioral data in a dynamic task and the other is concerned with fMRI data obtained in a study of symbol manipulation.
Learning and Sequential Decision Making
Learning and Computational Neuroscience, 1989
Cited by 185 (10 self)
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods is related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
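As a rough illustration of the temporal-difference idea this abstract refers to, the sketch below runs a tabular TD(0) prediction update on a five-state random walk. The task, the function name `td0_random_walk`, the step size, and the lack of discounting are all assumptions made for this example and are not taken from the report. Only prediction is shown, not action selection.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.05, n_states=5, seed=0):
    """Estimate state values for a symmetric random walk with tabular TD(0).

    Hypothetical task: the walk starts in the middle state and steps left or
    right with equal probability. Leaving on the right pays 1, leaving on the
    left pays 0, and there is no discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states
    for _ in range(episodes):
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD error: the difference between successive predictions,
            # with the observed reward added in.
            td_error = reward + next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


estimates = td0_random_walk()
# For this walk the true values are 1/6, 2/6, ..., 5/6.
true_values = [(i + 1) / 6 for i in range(5)]
```

Each update uses the next prediction as its target instead of waiting for the final outcome. This is the link to dynamic programming that the abstract points to, while the small step size `alpha` plays the role of a parameter-estimation learning rate.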
A theory of causal learning in children: Causal maps and Bayes nets
Psychological Review, 2004
Cited by 95 (16 self)
The authors outline a cognitive and computational account of causal learning in children. They propose that children use specialized cognitive systems that allow them to recover an accurate “causal map” of the world: an abstract, coherent, learned representation of the causal relations among events. This kind of knowledge can be perspicuously understood in terms of the formalism of directed graphical causal models, or Bayes nets. Children’s causal learning and inference may involve computations similar to those for learning causal Bayes nets and for predicting with them. Experimental results suggest that 2- to 4-year-old children construct new causal maps and that their learning is consistent with the Bayes net formalism.
Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia
2005
Cited by 63 (4 self)
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and executive functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This article presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia, and amygdala, which together form an actor-critic architecture. The critic system learns which prefrontal representations are task relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task and other benchmark working memory tasks.
Structure and Strength in Causal Induction
Cited by 56 (26 self)
We present a framework for the rational analysis of elemental causal induction – learning about the existence of a relationship between a single cause and effect – based upon causal graphical models. This framework makes precise the distinction between causal structure and causal strength: the difference between asking whether a causal relationship exists and asking how strong that causal relationship might be. We show that two leading rational models of elemental causal induction, ∆P and causal power, both estimate causal strength, and introduce a new rational model, causal support, that assesses causal structure. Causal support predicts several key phenomena of causal induction that cannot be accounted for by other rational models, which we explore through a series of experiments. These phenomena include the complex interaction between ∆P and the base-rate probability of the effect in the absence of the cause, sample size effects, inferences from incomplete contingency tables, and causal learning from rates. Causal support also provides a better account of a number of existing datasets than either ∆P or causal power.
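The two strength measures named in this abstract can be computed from a 2x2 contingency table. The sketch below assumes the standard definitions, ∆P = P(e|c) - P(e|not c) and, for generative causes, causal power = ∆P / (1 - P(e|not c)). The function names and example counts are hypothetical, and the paper's causal support model is not implemented here.

```python
def delta_p(e_with_c, n_with_c, e_without_c, n_without_c):
    """Delta-P: P(effect | cause) minus P(effect | no cause)."""
    return e_with_c / n_with_c - e_without_c / n_without_c


def causal_power(e_with_c, n_with_c, e_without_c, n_without_c):
    """Generative causal power: Delta-P scaled by the room left for the
    cause to act, 1 - P(effect | no cause)."""
    base_rate = e_without_c / n_without_c
    if base_rate == 1.0:
        raise ValueError("causal power is undefined when the base rate is 1")
    return delta_p(e_with_c, n_with_c, e_without_c, n_without_c) / (1.0 - base_rate)


# Invented counts: effect occurs in 6 of 8 trials with the cause, 2 of 8 without.
low_base = (6, 8, 2, 8)
# Invented counts: effect occurs in 8 of 8 trials with the cause, 6 of 8 without.
high_base = (8, 8, 6, 8)

dp_low, power_low = delta_p(*low_base), causal_power(*low_base)
dp_high, power_high = delta_p(*high_base), causal_power(*high_base)
```

With these counts ∆P falls from 0.5 to 0.25 as the base rate rises, while causal power increases from about 0.67 to 1.0. The two measures therefore respond differently to the base-rate probability of the effect, which is the interaction the abstract highlights.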
Comparing models of rule-based classification learning: A replication and extension of Shepard, . . .
1994
Cited by 37 (6 self)
... difficulty for learning six fundamental types of rule-based categorization problems. Our main results mirrored those of Shepard et al., with the ordering of task difficulty being the same as in the original study. A much richer data set was collected, however, which enabled the generation of block-by-block learning curves suitable for quantitative fitting. Four current computational models of classification learning were fitted to the learning data: ALCOVE (Kruschke, 1992), the rational model (Anderson, 1991), the configural-cue model (Gluck & Bower, 1988b), and an extended version of the configural-cue model with dimensionalized, adaptive learning rate mechanisms. Although all of the models captured important qualitative aspects of the learning data, ALCOVE provided the best overall quantitative fit. The results suggest the need to incorporate some form of selective attention to dimensions in category-learning models based on stimulus generalization and cue conditioning.
Toward a unified model of attention in associative learning
Journal of Mathematical Psychology, 2001
Cited by 37 (1 self)
Two connectionist models of attention in associative learning, previously used to model human category learning, are shown to have special cases that are essentially equivalent to N. J. Mackintosh's (1975, Psychological Review, 82, 276-298) classic model of attention in animal learning. The models unify formulas for associative weight change with formulas for attentional change, under a common goal of error reduction. Error-driven attentional shifting accelerates learning of new associations but also protects previously learned associations from retroactive interference. The models are fit to data from a recent experiment in human associative learning (J. K. Kruschke & N. J. Blair, 2000, Psychonomic Bulletin & Review, 7, 636-645), which shows that blocking of learning involves learned inattention. The approach also provides a novel and unifying theory of latent inhibition (the preexposure effect) in terms of blocking. The discussion summarizes how the approach accounts for a variety of other "irrational" phenomena in associative learning, including base rate effects, perseveration of attention through relevance
Shaping Robot Behavior Using Principles from Instrumental Conditioning
1997
Cited by 36 (1 self)
Shaping by successive approximations is an important animal training technique in which behavior is gradually adjusted in response to strategically timed reinforcements. We describe a computational model of this shaping process and its implementation on a mobile robot. Innate behaviors in our model are sequences of actions and enabling conditions, and shaping is a behavior editing process realized by multiple editing mechanisms. The model replicates some fundamental phenomena associated with instrumental learning in animals, and allows an RWI B21 robot to learn several distinct tasks derived from the same innate behavior. 1. Introduction Service dogs trained to assist a disabled person will respond to over 60 verbal commands to, for example, turn on lights, open a refrigerator door, or retrieve a dropped object [9]. Chicks can be taught to play a toy piano (peck out a key sequence until a reinforcement is received at the end of the tune) [6], and rats have been conditioned to perform c...

