Results 1 - 10 of 40
Efficient estimation of word representations in vector space, 2013
"... We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previ-ously best performing techniques based on different types ..."
Cited by 311 (6 self)
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
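As a rough sketch of the skip-gram idea behind such representations (the toy corpus, dimensions, and hyperparameters below are invented for illustration and are not the paper's setup), a minimal negative-sampling trainer in NumPy looks roughly like this:

    import numpy as np

    # Toy skip-gram with negative sampling; purely illustrative.
    corpus = "the quick brown fox jumps over the lazy dog".split()
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    V, D, window, neg, lr = len(vocab), 16, 2, 5, 0.05

    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(V, D))    # word ("input") vectors
    W_out = rng.normal(scale=0.1, size=(V, D))   # context ("output") vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(200):
        for pos, word in enumerate(corpus):
            c = idx[word]
            for ctx in range(max(0, pos - window), min(len(corpus), pos + window + 1)):
                if ctx == pos:
                    continue
                # one observed context word plus `neg` random negative samples
                targets = [idx[corpus[ctx]]] + list(rng.integers(0, V, size=neg))
                labels = [1.0] + [0.0] * neg
                v = W_in[c].copy()
                grad_v = np.zeros(D)
                for t, label in zip(targets, labels):
                    g = sigmoid(v @ W_out[t]) - label
                    grad_v += g * W_out[t]
                    W_out[t] -= lr * g * v
                W_in[c] -= lr * grad_v

    # nearest neighbours of "quick" by cosine similarity in the learned space
    q = W_in[idx["quick"]]
    sims = W_in @ q / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(q) + 1e-9)
    print([vocab[i] for i in np.argsort(-sims)[:3]])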
Distributed representations of words and phrases and their compositionality
- In Advances in Neural Information Processing Systems, 2013
"... ar ..."
Multimodal Neural Language Models
"... We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, as well as gener ..."
Cited by 28 (4 self)
We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, as well as generate text conditioned on images. We show that in the case of image-text modelling we can jointly learn word representations and image features by training our models together with a convolutional network. Unlike many of the existing methods, our approach can generate sentence descriptions for images without the use of templates, structured prediction, and/or syntactic trees. While we focus on image-text modelling, our algorithms can be easily applied to other modalities such as audio.
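The models in this paper are log-bilinear rather than recurrent; the sketch below (PyTorch, with made-up dimensions) only illustrates the general idea of conditioning a neural language model on CNN image features, not the authors' architecture:

    import torch
    import torch.nn as nn

    class ImageConditionedLM(nn.Module):
        """Sketch of a language model conditioned on an image feature vector;
        dimensions and layout are illustrative assumptions, not the paper's."""
        def __init__(self, vocab_size=10000, embed_dim=256, img_dim=4096, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.img_proj = nn.Linear(img_dim, hidden_dim)   # project CNN features
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, img_feats):
            # tokens: (batch, seq) word ids; img_feats: (batch, img_dim) CNN features
            h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)  # image initializes the hidden state
            out, _ = self.rnn(self.embed(tokens), h0)
            return self.out(out)   # next-word logits at every position

    model = ImageConditionedLM()
    logits = model(torch.randint(0, 10000, (2, 7)), torch.randn(2, 4096))
    print(logits.shape)  # torch.Size([2, 7, 10000])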
Joint Language and Translation Modeling with Recurrent Neural Networks
"... We present a joint language and transla-tion model based on a recurrent neural net-work which predicts target words based on an unbounded history of both source and tar-get words. The weaker independence as-sumptions of this model result in a vastly larger search space compared to related feed-forwa ..."
Cited by 26 (5 self)
We present a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. The weaker independence assumptions of this model result in a vastly larger search space compared to related feed-forward-based language or translation models. We tackle this issue with a new lattice rescoring algorithm and demonstrate its effectiveness empirically. Our joint model builds on a well known recurrent neural network language model (Mikolov, 2012) augmented by a layer of additional inputs from the source language. We show competitive accuracy compared to the traditional channel model features. Our best results improve the output of a system trained on WMT 2012 French-English data by up to 1.5 BLEU, and by 1.1 BLEU on average across several test sets.
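A minimal sketch of the conditioning idea, assuming the source context for each target position is supplied as a handful of aligned source words; the pooling choice and all sizes are assumptions, not the authors' exact architecture:

    import torch
    import torch.nn as nn

    class JointSourceTargetRNN(nn.Module):
        """Illustrative sketch: an RNN language model over target words whose input
        at each step is augmented with a source-side context vector (here, pooled
        embeddings of aligned source words)."""
        def __init__(self, tgt_vocab=20000, src_vocab=20000, dim=256, hidden=512):
            super().__init__()
            self.tgt_embed = nn.Embedding(tgt_vocab, dim)
            self.src_embed = nn.Embedding(src_vocab, dim)
            self.rnn = nn.RNN(2 * dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, tgt_hist, src_context):
            # tgt_hist:    (batch, seq) previous target words
            # src_context: (batch, seq, k) source words aligned to each position
            src_vec = self.src_embed(src_context).mean(dim=2)       # pool k source words
            x = torch.cat([self.tgt_embed(tgt_hist), src_vec], dim=-1)
            h, _ = self.rnn(x)
            return self.out(h)   # logits for the next target word at each position

    model = JointSourceTargetRNN()
    scores = model(torch.randint(0, 20000, (2, 6)), torch.randint(0, 20000, (2, 6, 4)))
    print(scores.shape)  # torch.Size([2, 6, 20000])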
Deep neural network language models
- In Proceedings of NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 2012
"... In recent years, neural network language models (NNLMs) have shown success in both peplexity and word error rate (WER) compared to conventional n-gram language models. Most NNLMs are trained with one hidden layer. Deep neural networks (DNNs) with more hidden layers have been shown to capture higher- ..."
Cited by 16 (1 self)
In recent years, neural network language models (NNLMs) have shown success in both perplexity and word error rate (WER) compared to conventional n-gram language models. Most NNLMs are trained with one hidden layer. Deep neural networks (DNNs) with more hidden layers have been shown to capture higher-level discriminative information about input features, and thus produce better networks. Motivated by the success of DNNs in acoustic modeling, we explore deep neural network language models (DNN LMs) in this paper. Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single hidden layer NNLM. Furthermore, our preliminary results are competitive with a Model M language model, considered to be one of the current state-of-the-art techniques for language modeling.
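A minimal sketch of a feed-forward DNN language model of this kind: the embeddings of the previous few words are concatenated and passed through several hidden layers before a softmax over the vocabulary (all sizes below are placeholders, not the paper's configuration):

    import torch
    import torch.nn as nn

    def make_dnn_lm(vocab_size=10000, context=4, embed_dim=120, hidden=500, layers=3):
        """Build a feed-forward DNN LM: concatenated history embeddings,
        several tanh hidden layers, and a final projection to vocabulary logits."""
        embed = nn.Embedding(vocab_size, embed_dim)
        blocks = [nn.Flatten()]                 # (batch, context, dim) -> (batch, context*dim)
        in_dim = context * embed_dim
        for _ in range(layers):
            blocks += [nn.Linear(in_dim, hidden), nn.Tanh()]
            in_dim = hidden
        blocks.append(nn.Linear(in_dim, vocab_size))
        return nn.Sequential(embed, *blocks)

    lm = make_dnn_lm()
    history = torch.randint(0, 10000, (8, 4))   # batch of 4-word histories
    print(lm(history).shape)                    # torch.Size([8, 10000]) next-word logits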
Recurrent Neural Networks for Language Understanding
"... Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform Language Understanding, and advance the state-of-the-art for the widely used ATIS dataset. The core of our approach is ..."
Cited by 13 (4 self)
Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform Language Understanding, and advance the state-of-the-art for the widely used ATIS dataset. The core of our approach is to take words as input as in a standard RNN-LM, and then to predict slot labels rather than words on the output side. We present several variations that differ in the amount of word context that is used on the input side, and in the use of non-lexical features. Remarkably, our simplest model produces state-of-the-art results, and we advance the state-of-the-art through the use of bag-of-words, word embedding, named-entity, syntactic, and word-class features. Analysis indicates that the superior performance is attributable to the task-specific word representations learned by the RNN.
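A minimal sketch of the core modification, keeping an RNN-LM-style network but emitting slot-label scores at each position instead of next-word probabilities; the plain Elman RNN and the vocabulary and label sizes are placeholders, not the ATIS setup or the authors' feature set:

    import torch
    import torch.nn as nn

    class RNNSlotTagger(nn.Module):
        """RNN-LM-shaped network whose output layer predicts a slot label
        for each input word rather than the next word."""
        def __init__(self, vocab_size=1000, num_slots=128, embed_dim=100, hidden=200):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden, batch_first=True)   # Elman-style RNN
            self.slot_out = nn.Linear(hidden, num_slots)             # labels, not words

        def forward(self, words):
            h, _ = self.rnn(self.embed(words))
            return self.slot_out(h)   # (batch, seq, num_slots) slot logits per word

    tagger = RNNSlotTagger()
    logits = tagger(torch.randint(0, 1000, (3, 9)))
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 128), torch.randint(0, 128, (27,)))
    print(loss.item())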
Code completion with statistical language models
- In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014
"... We address the problem of synthesizing code completions for pro-grams using APIs. Given a program with holes, we synthesize com-pletions for holes with the most likely sequences of method calls. Our main idea is to reduce the problem of code completion to a natural-language processing problem of pre ..."
Cited by 12 (4 self)
We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls. Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and indexes these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments. Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.
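A toy sketch of the underlying recipe: mine method-call sequences, fit an n-gram model over them, and rank candidate completions by sequence probability. The tiny call-sequence corpus, the add-alpha smoothing, and the candidates are invented for illustration and are far simpler than the paper's system:

    from collections import Counter, defaultdict

    # Sketch: score candidate API-call sequences with a bigram model
    # trained on call sequences mined from a (here, invented) corpus.
    call_sequences = [
        ["File.open", "File.read", "File.close"],
        ["File.open", "File.write", "File.close"],
        ["Socket.connect", "Socket.send", "Socket.close"],
    ]

    bigrams = defaultdict(Counter)
    for seq in call_sequences:
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            bigrams[prev][nxt] += 1

    def sequence_prob(seq, alpha=0.1, vocab=20):
        """Bigram probability of a call sequence with add-alpha smoothing."""
        p = 1.0
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            counts = bigrams[prev]
            p *= (counts[nxt] + alpha) / (sum(counts.values()) + alpha * vocab)
        return p

    # rank candidate completions for a hole following File.open
    candidates = [["File.open", "File.read", "File.close"],
                  ["File.open", "Socket.send", "File.close"]]
    print(max(candidates, key=sequence_prob))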
Minimum Translation Modeling with Recurrent Neural Networks
- In Proc. of EACL. Association for Computational Linguistics, 2014
"... We introduce recurrent neural network-based Minimum Translation Unit (MTU) models which make predictions based on an unbounded history of previous bilin-gual contexts. Traditional back-off n-gram models suffer under the sparse nature of MTUs which makes estimation of high-order sequence models chall ..."
Cited by 9 (1 self)
We introduce recurrent neural network-based Minimum Translation Unit (MTU) models which make predictions based on an unbounded history of previous bilingual contexts. Traditional back-off n-gram models suffer under the sparse nature of MTUs which makes estimation of high-order sequence models challenging. We tackle the sparsity problem by modeling MTUs both as bags-of-words and as a sequence of individual source and target words. Our best results improve the output of a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 1.5 BLEU, and we outperform the traditional n-gram based MTU approach by up to 0.8 BLEU.
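A speculative sketch of the bag-of-words MTU representation mentioned above: each MTU is embedded as the mean of its (source and target) word embeddings, and an RNN predicts the next MTU from the history. The pooling and all sizes are assumptions, not the authors' model:

    import torch
    import torch.nn as nn

    class BagOfWordsMTURNN(nn.Module):
        """Each minimum translation unit (MTU) is represented as a bag of its
        words (mean of their embeddings); an RNN predicts the next MTU from
        the unbounded history of previous MTUs."""
        def __init__(self, word_vocab=30000, mtu_vocab=50000, dim=200, hidden=400):
            super().__init__()
            self.word_embed = nn.Embedding(word_vocab, dim, padding_idx=0)
            self.rnn = nn.RNN(dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, mtu_vocab)

        def forward(self, mtu_words):
            # mtu_words: (batch, num_mtus, words_per_mtu) padded word ids per MTU
            mtu_vec = self.word_embed(mtu_words).mean(dim=2)   # bag-of-words per MTU
            h, _ = self.rnn(mtu_vec)
            return self.out(h)    # logits over the next MTU at each history length

    model = BagOfWordsMTURNN()
    print(model(torch.randint(1, 30000, (2, 5, 6))).shape)  # torch.Size([2, 5, 50000])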
Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
"... Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains hav ..."
Cited by 9 (1 self)
Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
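A sketch of the data-selection recipe this line of work builds on, cross-entropy difference scoring in the style of Moore & Lewis (2010), with the in-domain and general-domain language models left as pluggable scoring functions; the toy scorers in the usage line are purely illustrative:

    import math

    def cross_entropy_per_word(sentence, lm_logprob):
        """Per-word cross-entropy of a sentence under a language model, where
        lm_logprob(words) returns total log-probability (n-gram or neural)."""
        words = sentence.split()
        return -lm_logprob(words) / max(len(words), 1)

    def select_for_domain(general_corpus, in_domain_lm, general_lm, keep_fraction=0.2):
        """Rank general-domain sentences by H_in(s) - H_gen(s) and keep the
        lowest-scoring fraction (the sentences most like the in-domain data)."""
        scored = [
            (cross_entropy_per_word(s, in_domain_lm) -
             cross_entropy_per_word(s, general_lm), s)
            for s in general_corpus
        ]
        scored.sort(key=lambda pair: pair[0])
        cutoff = int(len(scored) * keep_fraction)
        return [s for _, s in scored[:cutoff]]

    # toy usage with fake unigram-style scorers (purely illustrative)
    fake_in = lambda ws: sum(math.log(0.3) for _ in ws)
    fake_gen = lambda ws: sum(math.log(0.1) for _ in ws)
    print(select_for_domain(["flight to boston please", "the cat sat"],
                            fake_in, fake_gen, keep_fraction=0.5))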
Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654, 2014
"... In this paper we explore the bi-directional mapping be-tween images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural net-work. Unlike previous approaches that map both sentences and images to a common embedding, we enable the gener-ation of novel sente ..."
Cited by 8 (0 self)
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural network. Unlike previous approaches that map both sentences and images to a common embedding, we enable the generation of novel sentences given an image. Using the same model, we can also reconstruct the visual features associated with an image given its visual description. We use a novel recurrent visual memory that automatically learns to remember long-term visual concepts to aid in both sentence generation and visual feature reconstruction. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are preferred by humans over 19.8% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
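A sketch of the generation direction only: a recurrent decoder conditioned on CNN image features at every step. The model described above additionally maintains a recurrent visual memory and supports the reverse, sentence-to-visual-feature direction, which this sketch omits; all sizes are placeholders:

    import torch
    import torch.nn as nn

    class CaptionRNN(nn.Module):
        """Recurrent caption decoder: at each step the previous word embedding is
        concatenated with projected image features before the recurrent layer."""
        def __init__(self, vocab=10000, img_dim=4096, dim=256, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.img_proj = nn.Linear(img_dim, dim)
            self.rnn = nn.GRU(2 * dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, prev_words, img_feats):
            # prev_words: (batch, seq); img_feats: (batch, img_dim)
            seq = prev_words.size(1)
            img = self.img_proj(img_feats).unsqueeze(1).expand(-1, seq, -1)
            x = torch.cat([self.embed(prev_words), img], dim=-1)
            h, _ = self.rnn(x)
            return self.out(h)   # next-word logits at each step

    model = CaptionRNN()
    print(model(torch.randint(0, 10000, (2, 8)), torch.randn(2, 4096)).shape)
    # torch.Size([2, 8, 10000])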