Improved semantic representations from tree-structured long short-term memory networks
- In Proc. ACL, 2015
"... Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural net-work with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure t ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
(Show Context)
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).
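To make the composition concrete, here is a minimal numpy sketch of a single Child-Sum Tree-LSTM node update in the spirit of the gating scheme described above: children's hidden states are summed, and each child's memory cell gets its own forget gate. The dimensions, initialization, and toy inputs are illustrative assumptions, not the authors' released implementation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class ChildSumTreeLSTMCell:
        """One node of a Child-Sum Tree-LSTM: the node reads its input vector x
        and the (h, c) pairs of an arbitrary number of children; child hidden
        states are summed, while each child's memory cell is gated separately."""

        def __init__(self, input_dim, hidden_dim, seed=0):
            rng = np.random.default_rng(seed)
            d, h = input_dim, hidden_dim
            # One (W, U, b) triple per gate: input, forget, output, update.
            self.params = {
                g: (rng.normal(0, 0.1, (h, d)),   # W_g acts on the node input x
                    rng.normal(0, 0.1, (h, h)),   # U_g acts on a child / summed child state
                    np.zeros(h))
                for g in ("i", "f", "o", "u")
            }

        def forward(self, x, children):
            """children: list of (h_k, c_k) tuples; empty for a leaf node."""
            h_dim = self.params["i"][2].shape[0]
            h_tilde = sum((h for h, _ in children), np.zeros(h_dim))  # summed child states

            Wi, Ui, bi = self.params["i"]
            Wo, Uo, bo = self.params["o"]
            Wu, Uu, bu = self.params["u"]
            Wf, Uf, bf = self.params["f"]

            i = sigmoid(Wi @ x + Ui @ h_tilde + bi)      # input gate
            o = sigmoid(Wo @ x + Uo @ h_tilde + bo)      # output gate
            u = np.tanh(Wu @ x + Uu @ h_tilde + bu)      # candidate update
            # One forget gate per child, applied to that child's memory cell.
            f_ks = [sigmoid(Wf @ x + Uf @ h_k + bf) for h_k, _ in children]

            c = i * u + sum((f * c_k for f, (_, c_k) in zip(f_ks, children)),
                            np.zeros(h_dim))
            h = o * np.tanh(c)
            return h, c

    # Toy usage: a parent node combining two leaf children.
    cell = ChildSumTreeLSTMCell(input_dim=4, hidden_dim=3)
    leaf1 = cell.forward(np.ones(4), [])
    leaf2 = cell.forward(-np.ones(4), [])
    root_h, root_c = cell.forward(np.zeros(4), [leaf1, leaf2])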
Recursive neural networks can learn logical semantics
- In Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015
"... Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logi-cal deduction. We pursue this question by evaluating w ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
(Show Context)
Tree-structured recursive neural networks (TreeRNNs) for sentence meaning have been successful for many applications, but it remains an open question whether the fixed-length representations that they learn can support tasks as demanding as logical deduction. We pursue this question by evaluating whether two such models, plain TreeRNNs and tree-structured neural tensor networks (TreeRNTNs), can correctly learn to identify logical relationships such as entailment and contradiction using these representations. In our first set of experiments, we generate artificial data from a logical grammar and use it to evaluate the models' ability to learn to handle basic relational reasoning, recursive structures, and quantification. We then evaluate the models on the more natural SICK challenge data. Both models perform competitively on the SICK data and generalize well in all three experiments on simulated data, suggesting that they can learn suitable representations for logical inference in natural language.
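For reference, a minimal sketch of the two composition functions the abstract contrasts: a plain TreeRNN layer, and a tree-structured neural tensor layer that adds one bilinear term per output dimension. Vector sizes and values here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5  # representation size (illustrative)

    # Plain TreeRNN composition: concatenate the two children and apply a
    # single affine layer followed by a tanh nonlinearity.
    W = rng.normal(0, 0.1, (d, 2 * d))
    b = np.zeros(d)

    def tree_rnn_compose(left, right):
        return np.tanh(W @ np.concatenate([left, right]) + b)

    # TreeRNTN composition adds a bilinear (tensor) term: one d x d slice of
    # the tensor T per output dimension, so left @ T[k] @ right is a scalar.
    T = rng.normal(0, 0.1, (d, d, d))

    def tree_rntn_compose(left, right):
        bilinear = np.array([left @ T[k] @ right for k in range(d)])
        return np.tanh(bilinear + W @ np.concatenate([left, right]) + b)

    # Composing a tiny two-word "phrase" from two word vectors.
    cat, sat = rng.normal(size=d), rng.normal(size=d)
    print(tree_rnn_compose(cat, sat))
    print(tree_rntn_compose(cat, sat))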
Long short-term memory over recursive structures.
- In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015
"... Abstract The chain-structured long short-term memory (LSTM) has showed to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
The chain-structured long short-term memory (LSTM) has been shown to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process. We call the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies, e.g., language or image parse structures. We leverage the model for semantic composition to understand the meaning of text, a fundamental problem in natural language understanding, and show that it outperforms a state-of-the-art recursive model when its composition layers are replaced with the S-LSTM memory blocks. We also show that utilizing the given structures yields better performance than ignoring them.
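A minimal sketch of the kind of binary tree-structured memory block the abstract describes: an internal node combines its two children's states, with a separate forget gate per child, so the cell can carry the history of multiple descendants up the tree. Parameter shapes and the absence of leaf inputs are simplifying assumptions, not the paper's exact formulation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class BinaryTreeLSTMNode:
        """An internal node of a binary tree-structured LSTM: it combines the
        hidden and cell states of its left and right children, with one
        forget gate per child."""

        def __init__(self, hidden_dim, seed=0):
            rng = np.random.default_rng(seed)
            h = hidden_dim
            def gate():
                # Each gate reads both children's hidden states.
                return (rng.normal(0, 0.1, (h, h)),
                        rng.normal(0, 0.1, (h, h)),
                        np.zeros(h))
            self.i, self.fl, self.fr, self.o, self.u = (gate() for _ in range(5))

        def forward(self, left, right):
            (hl, cl), (hr, cr) = left, right
            def act(p, fn):
                Wl, Wr, b = p
                return fn(Wl @ hl + Wr @ hr + b)
            i = act(self.i, sigmoid)    # input gate
            fl = act(self.fl, sigmoid)  # forget gate for the left child
            fr = act(self.fr, sigmoid)  # forget gate for the right child
            o = act(self.o, sigmoid)    # output gate
            u = act(self.u, np.tanh)    # candidate cell update
            c = fl * cl + fr * cr + i * u
            return o * np.tanh(c), c

    # Toy usage: combine two leaf states bottom-up.
    node = BinaryTreeLSTMNode(hidden_dim=3)
    leaf = lambda v: (np.full(3, v), np.zeros(3))
    h_root, c_root = node.forward(leaf(0.5), leaf(-0.5))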
A Comparative Study on Regularization Strategies for Embedding-based Neural Networks
"... This paper aims to compare different reg-ularization strategies to address a com-mon phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neu-ral models and tasks as our testbed. We tried several frequently applied or newly proposed regularization s ..."
Abstract
- Add to MetaCart
(Show Context)
This paper aims to compare different regularization strategies to address a common phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neural models and tasks as our testbed. We tried several frequently applied or newly proposed regularization strategies, including penalizing weights (embeddings excluded), penalizing embeddings, re-embedding words, and dropout. We also emphasized incremental hyperparameter tuning and combining different regularizations. The results provide a picture of how to tune hyperparameters for neural NLP models.
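To illustrate three of the strategies named above (penalizing weights with embeddings excluded, penalizing embeddings, and dropout; re-embedding words is omitted for brevity), the following sketch uses made-up parameter names, shapes, and coefficients, not the paper's experimental setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy parameter set for an embedding-based model (shapes are illustrative).
    params = {
        "embeddings": rng.normal(0, 0.1, (1000, 50)),  # word embedding matrix
        "W_hidden":   rng.normal(0, 0.1, (50, 100)),
        "W_output":   rng.normal(0, 0.1, (100, 2)),
    }

    def l2_penalty(params, coeff, include_embeddings):
        """Sum-of-squares penalty; excluding or including the embedding matrix
        corresponds to 'penalizing weights (embeddings excluded)' versus
        'penalizing embeddings'."""
        total = 0.0
        for name, value in params.items():
            if name == "embeddings" and not include_embeddings:
                continue
            total += np.sum(value ** 2)
        return coeff * total

    def dropout(activations, rate, training=True):
        """Inverted dropout on a layer's activations."""
        if not training or rate == 0.0:
            return activations
        mask = rng.random(activations.shape) >= rate
        return activations * mask / (1.0 - rate)

    # Example: combine a task loss with the two penalty variants.
    task_loss = 0.37  # placeholder value standing in for some forward pass
    loss_weights_only = task_loss + l2_penalty(params, 1e-4, include_embeddings=False)
    loss_with_embed   = task_loss + l2_penalty(params, 1e-4, include_embeddings=True)
    hidden = dropout(rng.normal(size=(4, 100)), rate=0.5)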
When Are Tree Structures Necessary for Deep Learning of Representations?
"... Abstract Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. However there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper, we benchmark recu ..."
Abstract
- Add to MetaCart
(Show Context)
Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. However, there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper, we benchmark recursive neural models against sequential recurrent neural models, enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer phrases; (3) discourse parsing; (4) semantic relation extraction. Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require long-distance connection modeling, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help us understand the limitations of both classes of models, and suggest directions for improving recurrent models.
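A minimal sketch of the clause-splitting idea: break a sentence at punctuation into clause-like units, encode each with a simple recurrent unit, then combine the clause vectors. Averaging is used here as an illustrative combination step and the encoder is a plain Elman-style recurrence; both are assumptions for the example rather than the paper's exact method.

    import re
    import numpy as np

    def split_into_clauses(sentence):
        """Break a sentence into clause-like units at punctuation."""
        parts = re.split(r"[,;:]", sentence)
        return [p.strip() for p in parts if p.strip()]

    def encode_tokens(tokens, embed, W, U, b):
        """A plain recurrent encoder over one clause; the final hidden state
        stands in for the clause representation."""
        h = np.zeros(b.shape)
        zero = np.zeros(next(iter(embed.values())).shape)
        for tok in tokens:
            x = embed.get(tok, zero)
            h = np.tanh(W @ x + U @ h + b)
        return h

    def encode_sentence(sentence, embed, W, U, b):
        clause_vecs = [encode_tokens(clause.split(), embed, W, U, b)
                       for clause in split_into_clauses(sentence)]
        # Combine clause representations; averaging is one simple choice.
        return np.mean(clause_vecs, axis=0)

    # Toy usage with a random embedding table.
    rng = np.random.default_rng(0)
    vocab = "the movie was long but the ending to be fair was worth it".split()
    embed = {w: rng.normal(size=8) for w in vocab}
    W, U, b = rng.normal(0, 0.1, (6, 8)), rng.normal(0, 0.1, (6, 6)), np.zeros(6)
    vec = encode_sentence("the movie was long, but the ending, to be fair, was worth it",
                          embed, W, U, b)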