Distributed representations of sentences and documents. arXiv:1405.4053 (2014)

by Q V Le and T Mikolov
Results 1 - 10 of 93
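The entries below all cite the paragraph-vector model of Le and Mikolov. For orientation, here is a minimal sketch of training and querying such document vectors with gensim's independent Doc2Vec reimplementation; the toy corpus, tags, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of learning paragraph vectors with gensim's Doc2Vec (gensim >= 4).
# Corpus, tags, and hyperparameters are illustrative placeholders.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "distributed representations of sentences and documents",
    "paragraph vectors learn fixed length features from variable length text",
    "bag of words models lose word order information",
]
documents = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(corpus)]

# dm=1 selects the distributed-memory variant (PV-DM); dm=0 would give PV-DBOW.
model = Doc2Vec(documents, vector_size=50, window=5, min_count=1, dm=1, epochs=40)

# Infer a vector for an unseen document and find the most similar training document.
new_vec = model.infer_vector("learning document embeddings".split())
print(model.dv.most_similar([new_vec], topn=2))
```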

Deep fragment embeddings for bidirectional image sentence mapping

by Andrej Karpathy, Armand Joulin, Li Fei-Fei - arXiv:1406.5679, 2014
"... We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike pre-vious models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of im ..."
Abstract - Cited by 29 (2 self) - Add to MetaCart
We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.

Citation Context

...n language domain, several neural network models have been proposed to learn word/n-gram representations [29, 30, 31, 32, 33, 34], sentence representations [35] and paragraph/document representations [36]. 3 Proposed Model Learning and Inference. Our task is to retrieve relevant images given a sentence query, and conversely, relevant sentences given an image query. We train our model on a set of N ima...

Improved semantic representations from tree-structured long short-term memory networks

by Kai Sheng Tai, Richard Socher, Christopher D. Manning - In Proc. ACL, 2015
"... Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural net-work with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure t ..."
Abstract - Cited by 16 (1 self) - Add to MetaCart
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two

Citation Context

... 2014).

Method                                  Fine-grained   Binary
RAE (Socher et al., 2013)               43.2           82.4
MV-RNN (Socher et al., 2013)            44.4           82.9
RNTN (Socher et al., 2013)              45.7           85.4
DCNN (Blunsom et al., 2014)             48.5           86.8
Paragraph-Vec (Le and Mikolov, 2014)    48.7           87.8
CNN-non-static (Kim, 2014)              48.0           87.2
CNN-multichannel (Kim, 2014)            47.4           88.1
DRNN (Irsoy and Cardie, 2014)           49.8           86.6
LSTM                                    46.4 (1.1)     84.9 (0.6)
Bidirectional LSTM                      49.1 (1.0)     87.5 (0.5)
2-la...
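For readers unfamiliar with the model in this entry, here is a small numpy sketch of one Child-Sum Tree-LSTM node following the update rules of Tai et al.; weights are random placeholders and the helper names are illustrative, so treat it as an approximation of the published formulation rather than the authors' implementation.

```python
# Forward pass of a single Child-Sum Tree-LSTM node: gate the input with the sum of
# child hidden states, with one forget gate per child.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tree_lstm_node(x, child_h, child_c, W, U, b):
    """x: input vector; child_h/child_c: lists of child hidden/cell states.
    W, U, b: dicts of parameters for gates 'i', 'f', 'o', 'u'."""
    h_sum = sum(child_h) if child_h else np.zeros_like(b['i'])
    i = sigmoid(W['i'] @ x + U['i'] @ h_sum + b['i'])   # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_sum + b['o'])   # output gate
    u = np.tanh(W['u'] @ x + U['u'] @ h_sum + b['u'])   # candidate update
    # one forget gate per child, so the node can down-weight individual subtrees
    f = [sigmoid(W['f'] @ x + U['f'] @ hk + b['f']) for hk in child_h]
    c = i * u + sum(fk * ck for fk, ck in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

# Toy usage: a node with two children, 4-dim inputs and 8-dim states.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = {g: rng.normal(scale=0.1, size=(d_h, d_in)) for g in 'ifou'}
U = {g: rng.normal(scale=0.1, size=(d_h, d_h)) for g in 'ifou'}
b = {g: np.zeros(d_h) for g in 'ifou'}
child_states = [(np.zeros(d_h), np.zeros(d_h)) for _ in range(2)]
h, c = tree_lstm_node(rng.normal(size=d_in), *map(list, zip(*child_states)), W, U, b)
print(h.shape, c.shape)
```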

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

by Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, Andrew McCallum
"... There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a sin-gle vector per word type—ignoring poly-semy and thus jeopardizing their useful-ness for downstrea ..."
Abstract - Cited by 14 (1 self) - Add to MetaCart
There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a single vector per word type - ignoring polysemy and thus jeopardizing their usefulness for downstream tasks. We present an extension to the Skip-gram model that efficiently learns multiple embeddings per word type. It differs from recent related work by jointly performing word sense discrimination and embedding learning, by non-parametrically estimating the number of senses per word type, and by its efficiency and scalability. We present new state-of-the-art results in the word similarity in context task and demonstrate its scalability by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.
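The non-parametric sense discovery described above can be illustrated with a small sketch: each occurrence of a word is assigned to the nearest sense cluster of its context, and a new sense is created when no existing cluster is similar enough. The threshold, vectors, and function names below are illustrative assumptions and the Skip-gram training itself is omitted.

```python
# Sketch of the sense-assignment step behind non-parametric multi-sense embeddings.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def assign_sense(context_vec, sense_centroids, new_sense_threshold=0.3):
    """Return the index of the chosen sense, adding a new sense if needed."""
    if sense_centroids:
        sims = [cosine(context_vec, c) for c in sense_centroids]
        best = int(np.argmax(sims))
        if sims[best] >= new_sense_threshold:
            return best
    sense_centroids.append(context_vec.copy())  # non-parametric growth
    return len(sense_centroids) - 1

# Toy usage: two clearly different contexts for the same word get separate senses.
rng = np.random.default_rng(0)
senses = []
ctx_finance = rng.normal(size=50)
ctx_river = -ctx_finance + rng.normal(scale=0.1, size=50)
print(assign_sense(ctx_finance, senses))  # 0 (first sense created)
print(assign_sense(ctx_river, senses))    # 1 (dissimilar context -> new sense)
```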

A Neural Network for Factoid Question Answering over Paragraphs

by Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, Hal Daumé III
"... Text classification methods for tasks like factoid question answering typi-cally use manually defined string match-ing rules or bag of words representa-tions. These methods are ineffective when question text contains very few individual words (e.g., named entities) that are indicative of the answer. ..."
Abstract - Cited by 10 (1 self) - Add to MetaCart
Text classification methods for tasks like factoid question answering typically use manually defined string matching rules or bag of words representations. These methods are ineffective when question text contains very few individual words (e.g., named entities) that are indicative of the answer. We introduce a recursive neural network (rnn) model that can reason over such input by modeling textual compositionality. We apply our model, qanta, to a dataset of questions from a trivia competition called quiz bowl. Unlike previous rnn models, qanta learns word and phrase-level representations that combine across sentences to reason about entities. The model outperforms multiple baselines and, when combined with information retrieval methods, rivals the best human players.

Existence of V (m, t) vectors

by Renjie Chen, Ligang Liu, Guangchang Dong - J. Statist. Plann. Inference
"... Local resampling for patch-based texture synthesis in ..."
Abstract - Cited by 8 (3 self) - Add to MetaCart
Local resampling for patch-based texture synthesis in

Citation Context

...in order to backpropagate through the composition weights. Consequently, these methods learn high-quality sentence representations but are tuned only for their respective task. The paragraph vector of [7] is an alternative to the above models in that it can learn unsupervised sentence representations by introducing a distributed sentence indicator as part of a neural language model. The downside is at...

A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

by Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov
"... In this paper we propose a general framework for learning distributed represen-tations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vec ..."
Abstract - Cited by 8 (1 self) - Add to MetaCart
In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and industry of a blogger) or representations of authors. We describe a third-order model where word context and attribute vectors interact multiplicatively to predict the next word in a sequence. This leads to the notion of conditional word similarity: how meanings of words change when conditioned on different attributes. We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation.

Citation Context

...njoyed success in several NLP tasks [1, 2]. More recently, the use of distributed representations have been extended to model concepts beyond the word level, such as sentences, phrases and paragraphs [3, 4, 5, 6], entities and relationships [7, 8] and embeddings of semantic categories [9, 10]. In this paper we propose a general framework for learning distributed representations of attributes: characteristics ...
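As a rough illustration of the multiplicative word-attribute interaction described in this entry, the sketch below uses a common factored three-way parameterization; the factorization, dimensions, and variable names are assumptions rather than the paper's exact model.

```python
# Illustrative factored three-way (multiplicative) interaction between a word
# vector and an attribute vector for attribute-conditioned next-word scoring.
import numpy as np

rng = np.random.default_rng(0)
d_word, d_attr, d_factor, vocab = 50, 20, 64, 1000

Wf_word = rng.normal(scale=0.1, size=(d_factor, d_word))   # word -> factors
Wf_attr = rng.normal(scale=0.1, size=(d_factor, d_attr))   # attribute -> factors
Wf_out = rng.normal(scale=0.1, size=(vocab, d_factor))     # factors -> vocabulary scores

def next_word_scores(word_vec, attr_vec):
    # The attribute gates the word representation through an element-wise product
    # of factor projections, giving attribute-conditioned predictions.
    factors = (Wf_word @ word_vec) * (Wf_attr @ attr_vec)
    return Wf_out @ factors

scores = next_word_scores(rng.normal(size=d_word), rng.normal(size=d_attr))
print(scores.shape)  # (1000,) unnormalized scores over the vocabulary
```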

Factor-based compositional embedding models

by Mo Yu, Matthew R. Gormley, Mark Dredze - In NIPS Workshop on Learning Semantics, 2014
"... Introduction Word embeddings, which are distributed word representations learned by neural language models [1, 2, 3], have been shown to be powerful word representations. They have been successfully applied to a range of NLP tasks, including syntax [2, 4, 5] and semantics [6, 7, 8]. Information abou ..."
Abstract - Cited by 7 (2 self) - Add to MetaCart
Introduction Word embeddings, which are distributed word representations learned by neural language models [1, 2, 3], have been shown to be powerful word representations. They have been successfully applied to a range of NLP tasks, including syntax [2, 4, 5] and semantics [6, 7, 8]. Information about language structure is critical in many NLP tasks, where substructures of a sentence and its annotations inform downstream NLP task. Yet word representations alone do not

Citation Context

...beddings. A traditional approach for composition is to form a linear combination (e.g. sum) of single word representations with compositional operators either pre-defined [9, 10] or learned from data [11]. However, this approach ignores the useful structural information associated with the input (e.g. the order of words in a sentence and its syntactic tree). To address this problem, recent work has de...

Topical word embeddings

by Yang Liu, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun - In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
"... Most word embedding models typically represent each word using a single vector, which makes these mod-els indiscriminative for ubiquitous homonymy and pol-ysemy. In order to enhance discriminativeness, we em-ploy latent topic models to assign topics for each word in the text corpus, and learn topica ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from

Citation Context

...(Fan et al. 2008). We set the dimensions of both word and topic embeddings as K = 400. We consider the following baselines, bag-of-words (BOW) model, LDA, Skip-Gram, and Paragraph Vector (PV) models (Le and Mikolov 2014). The BOW model represents each document as a bag of words and the weighting scheme is TFIDF. For the TFIDF method, we select top 50,000 words according to TFIDF scores as features. LDA represents e...
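The excerpt above compares against a bag-of-words TF-IDF baseline with a linear classifier. A minimal sketch of such a baseline with scikit-learn follows; the toy corpus and labels are placeholders, and only the 50,000-feature cap mirrors the setting mentioned in the excerpt.

```python
# Bag-of-words TF-IDF baseline for text classification with a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["the plot was gripping", "a dull and predictable film", "wonderful acting"]
train_labels = [1, 0, 1]

clf = make_pipeline(
    TfidfVectorizer(max_features=50_000),  # keep at most 50k TF-IDF features
    LogisticRegression(),
)
clf.fit(train_texts, train_labels)
print(clf.predict(["predictable plot but wonderful acting"]))
```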

Deep recursive neural networks for compositionality in language

by Ozan Irsoy, Claire Cardie - In Proceedings of NIPS, 2014
"... Recursive neural networks comprise a class of architecture that can operate on structured input. They have been previously successfully applied to model com-positionality in natural language using parse-tree-based structural representations. Even though these architectures are deep in structure, the ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
Recursive neural networks comprise a class of architecture that can operate on structured input. They have been previously successfully applied to model compositionality in natural language using parse-tree-based structural representations. Even though these architectures are deep in structure, they lack the capacity for hierarchical representation that exists in conventional deep feed-forward networks as well as in recently investigated deep recurrent neural networks. In this work we introduce a new architecture, a deep recursive neural network (deep RNN), constructed by stacking multiple recursive layers. We evaluate the proposed model on the task of fine-grained sentiment classification. Our results show that deep RNNs outperform associated shallow counterparts that employ the same number of parameters. Furthermore, our approach outperforms previous baselines on the sentiment analysis task, including a multiplicative RNN variant as well as the recently introduced paragraph vectors, achieving new state-of-the-art results. We provide exploratory analyses of the effect of multiple layers and show that they capture different aspects of compositionality in language.

Citation Context

... in which the composition is defined as a bilinear tensor product (RNTN) [8]. Additionally, we use a method that is capable of generating representations for larger pieces of text (PARAGRAPH VECTORS) [17], and the dynamic convolutional neural network (DCNN) [18]. We use the previously published results for comparison using the same training-development-test partitioning of the data. Activation Units. F...

Modelling, visualising and summarising documents with a single convolutional neural network

by Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas, 2014
"... Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Process-ing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional ve ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced semantics. Our model is based on an extended Dynamic Convolution Neural Network, which learns convolution filters at both the sentence and document level, hierarchically learning to capture and compose low level lexical features into high level semantic concepts. We demonstrate the effectiveness of this model on a range of document modelling tasks, achieving strong results with no feature engineering and with a more compact model. Inspired by recent advances in visualising deep convolution networks for computer vision, we present a novel visualisation technique for our document networks which not only provides insight into their learning process, but also can be interpreted to produce a compelling automatic summarisation system for texts.

Citation Context

...ight: Error rates on the IMDB movie review data set. The first block is from Maas et al. [16], the second from Dahl et al. [3], the third from Wang and Manning [24] and the fourth from Le and Mikolov [15]. 3 Application to multiple tasks Learning a single model that is capable of solving multiple tasks has been one of the holy grails of the field of machine learning. Our ConvNet approach is strongly m...
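To make the hierarchical composition described in this entry concrete, here is a toy numpy sketch that convolves word vectors into sentence vectors and then sentence vectors into a document vector; filter widths, pooling, and dimensions are illustrative assumptions, and this is not the paper's extended Dynamic Convolutional Neural Network.

```python
# Two-level convolution: word vectors -> sentence vectors -> document vector.
import numpy as np

rng = np.random.default_rng(0)
d_word, d_sent, width = 50, 30, 3

W_sent = rng.normal(scale=0.1, size=(d_sent, width * d_word))   # sentence-level filter bank
W_doc = rng.normal(scale=0.1, size=(d_sent, width * d_sent))    # document-level filter bank

def conv_and_pool(vectors, W, width):
    """Slide a window of `width` vectors, apply one linear filter bank, max-pool over time."""
    vectors = np.asarray(vectors)
    n = len(vectors)
    if n < width:  # pad short inputs so at least one window exists
        vectors = np.vstack([vectors, np.zeros((width - n, vectors.shape[1]))])
        n = width
    windows = [vectors[i:i + width].reshape(-1) for i in range(n - width + 1)]
    feature_maps = np.tanh(np.stack(windows) @ W.T)   # (positions, d_out)
    return feature_maps.max(axis=0)                   # max over positions

# Toy document: 2 sentences of word vectors -> sentence vectors -> document vector.
doc = [rng.normal(size=(7, d_word)), rng.normal(size=(4, d_word))]
sentence_vecs = [conv_and_pool(s, W_sent, width) for s in doc]
doc_vec = conv_and_pool(sentence_vecs, W_doc, width)
print(doc_vec.shape)  # (30,)
```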
