Results 1  10
of
136
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
"... Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of compo ..."
Abstract

Cited by 191 (7 self)
 Add to MetaCart
(Show Context)
Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80 % up to 85.4%. The accuracy of predicting finegrained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7 % over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases. 1
Semantic Compositionality through Recursive MatrixVector Spaces
"... Singleword vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional ..."
Abstract

Cited by 183 (11 self)
 Add to MetaCart
(Show Context)
Singleword vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrixvector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting finegrained sentiment distributions of adverbadjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as causeeffect or topicmessage between nouns using the syntactic path between them. 1
Processing Capacity Defined by Relational Complexity: Implications for Comparative, Developmental, and Cognitive Psychology
, 1989
"... It is argued that working memory limitations are best defined in terms of the complexity of relations that can be processed in parallel. Relational complexity is related to processing loads in problem solving, and discriminates between higher animal species, as well as between children of differen ..."
Abstract

Cited by 182 (14 self)
 Add to MetaCart
It is argued that working memory limitations are best defined in terms of the complexity of relations that can be processed in parallel. Relational complexity is related to processing loads in problem solving, and discriminates between higher animal species, as well as between children of different ages. Complexity is defined by the number of dimensions, or sources of variation, that are related. A unary relation has one argument and one source of variation, because its argument can be instantiated in only one way at a time. A binary relation has two arguments, and two sources of variation, because two argument instantiations are possible at once. Similarly, a ternary relation is three dimensional, a quaternary relation is four dimensional, and so on. Dimensionality is related to number of chunks, because both attributes on dimensions and chunks are independent units of information of arbitrary size. Empirical studies of working memory limitations indicate a soft limit which corresponds to processing one quaternary relation in parallel. More complex concepts are processed by segmentation or conceptual chunking. Segmentation entails breaking tasks into components which do not exceed processing capacity, and which are processed serially. Conceptual chunking entails "collapsing" representations to reduce their dimensionality and consequently their processing load, but at the cost of making some relational information inaccessible. Parallel distributed processing implementations of relational representations show that relations with more arguments entail a higher computational cost, which corresponds to empirical observations of higher processing loads in humans. Empirical evidence is presented that relational complexity discriminates between higher species...
A General Framework for Adaptive Processing of Data Structures
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1998
"... A structured organization of information is typically required by symbolic processing. On the other hand, most connectionist models assume that data are organized according to relatively poor structures, like arrays or sequences. The framework described in this paper is an attempt to unify adaptive ..."
Abstract

Cited by 150 (61 self)
 Add to MetaCart
A structured organization of information is typically required by symbolic processing. On the other hand, most connectionist models assume that data are organized according to relatively poor structures, like arrays or sequences. The framework described in this paper is an attempt to unify adaptive models like artificial neural nets and belief nets for the problem of processing structured information. In particular, relations between data variables are expressed by directed acyclic graphs, where both numerical and categorical values coexist. The general framework proposed in this paper can be regarded as an extension of both recurrent neural networks and hidden Markov models to the case of acyclic graphs. In particular we study the supervised learning problem as the problem of learning transductions from an input structured space to an output structured space, where transductions are assumed to admit a recursive hidden statespace representation. We introduce a graphical formalism for r...
Composition in distributional models of semantics
, 2010
"... Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and also in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhire and Lemaire, 2004; Griffit ..."
Abstract

Cited by 148 (3 self)
 Add to MetaCart
(Show Context)
Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and also in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhire and Lemaire, 2004; Griffiths et al., 2007) and have been shown to achieve human level performance on synonymy tests (Landuaer and Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus extraction (Grefenstette, 1994). However, while there has been a considerable amount of research directed at the most effective ways of constructing representations for individual words, the representation of larger constructions, e.g., phrases and sentences, has received relatively little attention. In this thesis we examine this issue of how to compose meanings within distributional models of semantics to form representations of multiword structures. Natural language data typically consists of such complex structures, rather than
Representing word meaning and order information in a composite holographic lexicon
 Psychological Review
, 2007
"... The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic repr ..."
Abstract

Cited by 123 (14 self)
 Add to MetaCart
(Show Context)
The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic representations for words. The structure of the resulting lexicon can account for empirical data from classic experiments studying semantic typicality, categorization, priming, and semantic constraint in sentence completions. Furthermore, order information can be retrieved from the holographic representations, allowing the model to account for limited word transitions without the need for builtin transition rules. The model demonstrates that a broad range of psychological data can be accounted for directly from the structure of lexical representations learned in this way, without the need for complexity to be built into either the processing mechanisms or the representations. The holographic representations are an appropriate knowledge representation to be used by higher order models of language comprehension, relieving the complexity required at the higher level.
The computational modeling of analogymaking
 Trends in Cognitive Sciences
, 2002
"... Our ability to see a particular object or situation in one context as being “the same as” another object or situation in another context is the essence of analogymaking. It encompasses our ability to explain new concepts in terms of alreadyfamiliar ones, to emphasize particular aspects of situatio ..."
Abstract

Cited by 65 (2 self)
 Add to MetaCart
(Show Context)
Our ability to see a particular object or situation in one context as being “the same as” another object or situation in another context is the essence of analogymaking. It encompasses our ability to explain new concepts in terms of alreadyfamiliar ones, to emphasize particular aspects of situations, to generalize, to characterize situations, to explain
Logic Programs and Connectionist Networks
 Journal of Applied Logic
, 2004
"... One facet of the question of integration of Logic and Connectionist Systems, and how these can complement each other, concerns the points of contact, in terms of semantics, between neural networks and logic programs. In this paper, we show that certain semantic operators for propositional logic p ..."
Abstract

Cited by 62 (22 self)
 Add to MetaCart
(Show Context)
One facet of the question of integration of Logic and Connectionist Systems, and how these can complement each other, concerns the points of contact, in terms of semantics, between neural networks and logic programs. In this paper, we show that certain semantic operators for propositional logic programs can be computed by feedforward connectionist networks, and that the same semantic operators for firstorder normal logic programs can be approximated by feedforward connectionist networks. Turning the networks into recurrent ones allows one also to approximate the models associated with the semantic operators. Our methods depend on a wellknown theorem of Funahashi, and necessitate the study of when Funahasi's theorem can be applied, and also the study of what means of approximation are appropriate and significant.
Approximating the Semantics of Logic Programs by Recurrent Neural Networks
"... In [18] we have shown how to construct a 3layered recurrent neural network that computes the fixed point of the meaning function TP of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a no ..."
Abstract

Cited by 61 (10 self)
 Add to MetaCart
In [18] we have shown how to construct a 3layered recurrent neural network that computes the fixed point of the meaning function TP of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a notion of approximation for interpretations and prove that there exists a 3layered feed forward neural network that approximates the calculation of TP for a given first order acyclic logic program P with an injective level mapping arbitrarily well. Extending the feed forward network by recurrent connections we obtain a recurrent neural network whose iteration approximates the fixed point of TP. This result is proven by taking advantage of the fact that for acyclic logic programs the function TP is a contraction mapping on a complete metric space defined by the interpretations of the program. Mapping this space to the metric space IR with Euclidean distance, a real valued function fP can be defined which corresponds to TP and is continuous as well as a contraction. Consequently it can be approximated by an appropriately chosen class of feed forward neural networks.
A Comparison of Vectorbased Representations for Semantic Composition
"... In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over ..."
Abstract

Cited by 58 (1 self)
 Add to MetaCart
(Show Context)
In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method. 1