Results 1 - 10 of 16
Recursive neural networks can learn logical semantics
- In Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015
"... Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logi-cal deduction. We pursue this question by evaluating w ..."
Abstract
-
Cited by 10 (4 self)
Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logical deduction. We pursue this question by evaluating whether two such models, plain TreeRNNs and tree-structured neural tensor networks (TreeRNTNs), can correctly learn to identify logical relationships such as entailment and contradiction using these representations. In our first set of experiments, we generate artificial data from a logical grammar and use it to evaluate the models' ability to learn to handle basic relational reasoning, recursive structures, and quantification. We then evaluate the models on the more natural SICK challenge data. Both models perform competitively on the SICK data and generalize well in all three experiments on simulated data, suggesting that they can learn suitable representations for logical inference in natural language.
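For readers unfamiliar with the composition step these models share, here is a minimal sketch of how a plain TreeRNN combines two child vectors at each parse-tree node, with a single softmax layer on top for the relation label. The class and parameter names are illustrative, not taken from the paper's released code, and the TreeRNTN variant (which adds a bilinear tensor term to the composition) is omitted.

```python
import torch
import torch.nn as nn

class TreeRNNComposer(nn.Module):
    """Minimal TreeRNN composition: parent = tanh(W [left; right] + b)."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)          # W, b
        self.classify = nn.Linear(2 * dim, num_relations)

    def node(self, left, right):
        # Combine two child vectors into a fixed-length parent vector.
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

    def relation_logits(self, premise_vec, hypothesis_vec):
        # Predict entailment/contradiction/etc. from two sentence vectors.
        pair = torch.cat([premise_vec, hypothesis_vec], dim=-1)
        return self.classify(pair)
```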
Existence of V (m, t) vectors
- J. Statist. Plann. Inference
"... Local resampling for patch-based texture synthesis in ..."
Cited by 8 (3 self)
Document Modeling with Gated Recurrent Neural Network for Sentiment Classification
"... Document level sentiment classification remains a challenge: encoding the intrin-sic relations between sentences in the se-mantic meaning of a document. To ad-dress this, we introduce a neural network model to learn vector-based document rep-resentation in a unified, bottom-up fash-ion. The model fi ..."
Abstract
-
Cited by 3 (1 self)
Document level sentiment classification remains a challenge: encoding the intrinsic relations between sentences in the semantic meaning of a document. To address this, we introduce a neural network model to learn vector-based document representation in a unified, bottom-up fashion. The model first learns sentence representations with a convolutional neural network or long short-term memory. Afterwards, the semantics of sentences and their relations are adaptively encoded in the document representation with a gated recurrent neural network. We conduct document level sentiment classification on four large-scale review datasets from IMDB and the Yelp Dataset Challenge. Experimental results show that: (1) our neural model shows superior performance over several state-of-the-art algorithms; (2) the gated recurrent neural network dramatically outperforms a standard recurrent neural network in document modeling for sentiment classification.
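As a rough illustration of the two-level architecture described here (a sentence encoder followed by a gated recurrent network over sentence vectors), the sketch below uses an LSTM as the sentence encoder; the module names, layer sizes, and that encoder choice are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DocumentEncoder(nn.Module):
    """Bottom-up document model: encode each sentence, then run a GRU
    over the sequence of sentence vectors to get a document vector."""
    def __init__(self, vocab_size, emb_dim=100, sent_dim=128, doc_dim=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.sent_rnn = nn.LSTM(emb_dim, sent_dim, batch_first=True)  # sentence encoder
        self.doc_rnn = nn.GRU(sent_dim, doc_dim, batch_first=True)    # gated RNN over sentences
        self.out = nn.Linear(doc_dim, num_classes)

    def forward(self, doc):
        # doc: list of 1-D LongTensors, one tensor of word ids per sentence
        sent_vecs = []
        for sent in doc:
            _, (h, _) = self.sent_rnn(self.embed(sent).unsqueeze(0))
            sent_vecs.append(h[-1])                  # last hidden state as sentence vector
        sents = torch.stack(sent_vecs, dim=1)        # (1, num_sentences, sent_dim)
        _, h_doc = self.doc_rnn(sents)
        return self.out(h_doc[-1])                   # sentiment logits for the document
```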
Long short-term memory over recursive structures.
- In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015
"... Abstract The chain-structured long short-term memory (LSTM) has showed to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child ..."
Abstract
-
Cited by 2 (0 self)
The chain-structured long short-term memory (LSTM) has been shown to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process. We call the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies, e.g., language or image parse structures. We leverage the models for semantic composition to understand the meaning of text, a fundamental problem in natural language understanding, and show that it outperforms a state-of-the-art recursive model by replacing its composition layers with the S-LSTM memory blocks. We also show that utilizing the given structures is helpful in achieving better performance than that obtained without considering the structures.
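To make the idea of memory cells over tree structures concrete, here is a minimal sketch of a binary tree-LSTM node in the spirit of S-LSTM, with a separate forget gate per child so the parent memory mixes both children's cells. The gating layout is a common simplification rather than the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class BinaryTreeLSTMCell(nn.Module):
    """One tree node: combine (h, c) of a left and a right child into the
    parent's (h, c), with a separate forget gate for each child."""
    def __init__(self, dim):
        super().__init__()
        # input gate, output gate, two forget gates, and candidate, all from [h_l; h_r]
        self.gates = nn.Linear(2 * dim, 5 * dim)

    def forward(self, h_l, c_l, h_r, c_r):
        i, o, f_l, f_r, g = self.gates(torch.cat([h_l, h_r], dim=-1)).chunk(5, dim=-1)
        i, o = torch.sigmoid(i), torch.sigmoid(o)
        f_l, f_r = torch.sigmoid(f_l), torch.sigmoid(f_r)
        g = torch.tanh(g)
        c = f_l * c_l + f_r * c_r + i * g     # parent memory mixes both children's memories
        h = o * torch.tanh(c)
        return h, c
```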
Gated Graph Sequence Neural Networks
2016
"... Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2 ..."
Abstract
-
Cited by 1 (1 self)
Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.
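A minimal sketch of the propagation step this abstract alludes to (GRU-style node updates driven by messages summed from neighbors) is shown below. The adjacency handling is simplified to a single dense matrix with one edge type, and the names are illustrative, not the paper's released API.

```python
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    """Several rounds of message passing: each node aggregates transformed
    neighbor states, then updates its own state with a GRU cell."""
    def __init__(self, dim, steps=5):
        super().__init__()
        self.msg = nn.Linear(dim, dim)     # message transform (single edge type)
        self.gru = nn.GRUCell(dim, dim)    # gated node update
        self.steps = steps

    def forward(self, h, adj):
        # h: (num_nodes, dim) initial node states; adj: (num_nodes, num_nodes) float matrix
        for _ in range(self.steps):
            m = adj @ self.msg(h)          # sum messages arriving from neighbors
            h = self.gru(m, h)             # gated update of every node state
        return h
```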
Molding CNNs for text: non-linear, non-consecutive convolutions
"... The success of deep learning often de-rives from well-chosen operational build-ing blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. In-stead of concatenating word representa-tions, we appeal to tensor algebra and use low-rank n-gram te ..."
Abstract
-
Cited by 1 (0 self)
The success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combination of low-rank tensors and pattern weighting, we can efficiently evaluate the resulting convolution operation via dynamic programming. We test the resulting architecture on standard sentiment classification and news categorization tasks. Our model achieves state-of-the-art performance both in terms of accuracy and training speed. For instance, we obtain 51.2% accuracy on the fine-grained sentiment classification task.
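The dynamic-programming trick mentioned here can be illustrated with a stripped-down example that aggregates non-consecutive bigram features with an exponential decay over the gap between the two words. This is a simplified sketch of the idea only, not the paper's full low-rank n-gram tensor formulation; all parameter names are hypothetical.

```python
import numpy as np

def nonconsecutive_bigram_features(X, P, Q, lam=0.5):
    """For each position t, aggregate bigram features over all word pairs
    (i, t) with i < t, weighted by lam**(t - i - 1), using one
    left-to-right pass instead of enumerating every pair.
    X: (T, d) word vectors; P, Q: (h, d) projection matrices."""
    T, _ = X.shape
    h = P.shape[0]
    s1 = np.zeros(h)                    # decayed sum of P @ x_i over i < t
    feats = np.zeros((T, h))
    for t in range(T):
        feats[t] = s1 * (Q @ X[t])      # bigram features for pairs ending at t
        s1 = lam * s1 + P @ X[t]        # decay, then include position t
    return np.tanh(feats)
```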
Statistical Script Learning with Recurrent Neural Nets
"... Abstract Statistical Scripts are probabilistic models of sequences of events. For example, a script model might encode the information that the event "Smith met with the President" should strongly predict the event "Smith spoke to the President." We present a number of results i ..."
Abstract
- Add to MetaCart
(Show Context)
Statistical scripts are probabilistic models of sequences of events. For example, a script model might encode the information that the event "Smith met with the President" should strongly predict the event "Smith spoke to the President." We present a number of results improving the state of the art of learning statistical scripts for inferring implicit events. First, we demonstrate that incorporating multiple arguments into events, yielding a more complex event representation than is used in previous work, helps to improve a co-occurrence-based script system's predictive power. Second, we improve on these results with a Recurrent Neural Network script sequence model which uses a Long Short-Term Memory component. We evaluate in two ways: first, we evaluate systems' ability to infer held-out events from documents (the "Narrative Cloze" evaluation); second, we evaluate novel event inferences by collecting human judgments. We propose a number of further extensions to this work. First, we propose a number of new probabilistic script models leveraging recent advances in Neural Network training. These include recurrent sequence models with different hidden unit structure and Convolutional Neural Network models. Second, we propose integrating more lexical and linguistic information into events. Third, we propose incorporating discourse relations between spans of text into event co-occurrence models, either as output by an off-the-shelf discourse parser or learned automatically. Finally, we propose investigating the interface between models of event co-occurrence and coreference resolution, in particular by integrating script information into general coreference systems.
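A bare-bones version of the LSTM script model described here would treat a document as a sequence of event tokens and train the network to predict the next event at every position. The sketch below reflects that reduced view; the event vocabulary and module names are placeholders, and the multi-argument event representation discussed in the abstract is omitted.

```python
import torch
import torch.nn as nn

class ScriptLSTM(nn.Module):
    """Next-event prediction over sequences of event ids."""
    def __init__(self, num_events, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(num_events, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, num_events)

    def forward(self, event_ids):
        # event_ids: (batch, seq_len) ids of the events observed so far
        h, _ = self.lstm(self.embed(event_ids))
        return self.out(h)   # logits over the next event at every position

# Training would minimize cross-entropy between logits[:, :-1] and event_ids[:, 1:];
# a "Narrative Cloze" style evaluation ranks a held-out event against all candidates.
```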
Long Short-term Memory Network over Rhetorical Structure Theory for Sentence-level Sentiment Analysis
2016
"... Abstract Using deep learning models to solve sentiment analysis of sentences is still a challenging task. Long short-term memory (LSTM) network solves the gradient disappeared problem existed in recurrent neural network (RNN), but LSTM structure is linear chain-structure that can't capture tex ..."
Abstract
- Add to MetaCart
(Show Context)
Using deep learning models for sentence-level sentiment analysis is still a challenging task. The long short-term memory (LSTM) network solves the vanishing gradient problem of recurrent neural networks (RNNs), but the LSTM is a linear chain structure that cannot capture text structure information. Tree-LSTM was later proposed, which uses the LSTM forget gate to skip sub-trees that have little effect on the results in order to achieve good performance. This illustrates that the chain-structured LSTM depends strongly on text structure. However, Tree-LSTM cannot clearly determine which sub-trees are important and which sub-trees have little effect. We propose a simple model which uses Rhetorical Structure Theory (RST) for text parsing. By building an LSTM network on the RST parse structure, we make full use of the LSTM's structural characteristics to automatically enhance the nucleus information and filter the satellite information of the text. Furthermore, this approach makes the representations reflect the relations between segments of text, which can improve text semantic representations. Experimental results show that this method not only has higher classification accuracy, but also trains quickly.
Deep Unordered Composition Rivals Syntactic Methods for Text Classification
"... Many existing deep learning models for natural language processing tasks focus on learning the compositionality of their in-puts, which requires many expensive com-putations. We present a simple deep neural network that competes with and, in some cases, outperforms such models on sen-timent analysis ..."
Abstract
- Add to MetaCart
(Show Context)
Many existing deep learning models for natural language processing tasks focus on learning the compositionality of their inputs, which requires many expensive computations. We present a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time. While our model is syntactically-ignorant, we show significant improvements over previous bag-of-words models by deepening our network and applying a novel variant of dropout. Moreover, our model performs better than syntactic models on datasets with high syntactic variance. We show that our model makes similar errors to syntactically-aware models, indicating that for the tasks we consider, nonlinearly transforming the input is more important than tailoring a network to incorporate word order and syntax.
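The unordered composition described here amounts to averaging word embeddings, randomly dropping whole words during training (the dropout variant mentioned), and stacking a few nonlinear layers before the classifier. Below is a small sketch under those assumptions; the layer sizes and dropout rate are illustrative.

```python
import torch
import torch.nn as nn

class DeepAveragingNetwork(nn.Module):
    """Average word embeddings (with word-level dropout), then pass the
    average through a few fully connected layers before classifying."""
    def __init__(self, vocab_size, emb_dim=300, hid_dim=300, num_classes=5, word_drop=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_drop = word_drop
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
        )
        self.out = nn.Linear(hid_dim, num_classes)

    def forward(self, word_ids):
        # word_ids: (seq_len,) ids for one sentence or document
        vecs = self.embed(word_ids)                          # (seq_len, emb_dim)
        if self.training:
            keep = (torch.rand(len(word_ids)) > self.word_drop).float().unsqueeze(1)
            vecs = vecs * keep                               # drop entire words at once
            avg = vecs.sum(0) / keep.sum().clamp(min=1.0)
        else:
            avg = vecs.mean(0)
        return self.out(self.mlp(avg))
```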
Towards Universal Paraphrastic Sentence Embeddings
- Under review as a conference paper at ICLR 2016
"... In this paper, we show how to create paraphrastic sentence embeddings using the Paraphrase Database (Ganitkevitch et al., 2013), an extensive semantic resource with millions of phrase pairs. We consider several compositional architectures and evaluate them on 24 textual similarity datasets encompass ..."
Abstract
- Add to MetaCart
(Show Context)
In this paper, we show how to create paraphrastic sentence embeddings using the Paraphrase Database (Ganitkevitch et al., 2013), an extensive semantic resource with millions of phrase pairs. We consider several compositional architectures and evaluate them on 24 textual similarity datasets encompassing domains such as news, tweets, web forums, news headlines, machine translation output, glosses, and image and video captions. We present the interesting result that simple compositional architectures based on updated vector averaging vastly outperform long short-term memory (LSTM) recurrent neural networks and that these simpler architectures allow us to learn models with superior generalization. Our models are efficient, very easy to use, and competitive with task-tuned systems. We make them available to the research community with the hope that they can serve as the new baseline for further work on universal paraphrastic sentence embeddings.
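The simplest of the compositional architectures compared here is plain word averaging trained with a margin-based objective on paraphrase pairs. The sketch below shows that objective in reduced form; the negative-sampling strategy and the regularization toward initial embeddings used in the paper are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AveragingEncoder(nn.Module):
    """Sentence embedding = mean of trainable word embeddings."""
    def __init__(self, vocab_size, dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, word_ids):        # word_ids: (seq_len,)
        return self.embed(word_ids).mean(0)

def margin_loss(enc, s1, s2, neg, margin=0.4):
    """Push a paraphrase pair (s1, s2) to be more similar than (s1, neg),
    where neg is a non-paraphrase sentence used as a negative example."""
    pos_sim = F.cosine_similarity(enc(s1), enc(s2), dim=0)
    neg_sim = F.cosine_similarity(enc(s1), enc(neg), dim=0)
    return torch.clamp(margin - pos_sim + neg_sim, min=0.0)
```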