Results 1 - 10 of 59
Convolutional Neural Networks for Sentence Classification
"... We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vec-tors for sentence-level classification tasks. We show that a simple CNN with lit-tle hyperparameter tuning and static vec-tors achieves excellent results on multi-ple benchmarks. Lear ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
(Show Context)
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
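To make the architecture concrete, here is a minimal PyTorch sketch of a two-channel sentence CNN in the spirit of this abstract. It is an illustrative reading, not the paper's implementation: the paper treats the static and fine-tuned embedding copies as separate channels of a single convolution, whereas this sketch simply concatenates them, and all sizes here are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoChannelCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, n_filters=100,
                 widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.static = nn.Embedding(vocab_size, emb_dim)   # pre-trained, frozen
        self.static.weight.requires_grad = False
        self.tuned = nn.Embedding(vocab_size, emb_dim)    # fine-tuned copy
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * emb_dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                            # (batch, seq_len)
        x = torch.cat([self.static(tokens), self.tuned(tokens)], dim=-1)
        x = x.transpose(1, 2)                             # (batch, 2*emb, seq)
        # one max-over-time pooled feature per filter, per width
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))         # class logits

logits = TwoChannelCNN()(torch.randint(0, 10000, (4, 20)))
print(logits.shape)                                       # torch.Size([4, 2])

Max-over-time pooling makes the output independent of sentence length, which is what lets one set of filters serve variable-length inputs.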
From Captions to Visual Concepts and Back
, 2014
"... This paper presents a novel approach for automatically generating image descriptions: visual detectors and language models learn directly from a dataset of image captions. We use Multiple Instance Learning to train visual detectors for words that commonly occur in captions, including many different ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
This paper presents a novel approach for automatically generating image descriptions: visual detectors and language models learn directly from a dataset of image captions. We use Multiple Instance Learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. When human judges compare the system captions to ones written by other people, the system captions have equal or better quality over 23% of the time.
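The Multiple Instance Learning step can be made concrete with the noisy-OR pooling commonly used for this kind of weak supervision: an image counts as positive for a word if any of its regions fires. The sketch below illustrates the general technique, not necessarily the paper's exact formulation, and the region probabilities would come from a visual model not shown here.

import numpy as np

def noisy_or(region_probs):
    # P(word | image) = 1 - prod_j (1 - P(word | region_j)):
    # the word is present unless every region independently misses it.
    return 1.0 - np.prod(1.0 - np.asarray(region_probs))

print(noisy_or([0.10, 0.05, 0.70]))  # ~0.74: one confident region dominates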
Evaluating Neural Word Representations in Tensor-Based Compositional Settings
"... We provide a comparative study be-tween neural word representations and traditional vector spaces based on co-occurrence counts, in a number of com-positional tasks. We use three differ-ent semantic spaces and implement seven tensor-based compositional models, which we then test (together with simpl ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
(Show Context)
We provide a comparative study between neural word representations and traditional vector spaces based on co-occurrence counts, in a number of compositional tasks. We use three different semantic spaces and implement seven tensor-based compositional models, which we then test (together with simpler additive and multiplicative approaches) in tasks involving verb disambiguation and sentence similarity. To check their scalability, we additionally evaluate the spaces using simple compositional methods on larger-scale tasks with less constrained language: paraphrase detection and dialogue act tagging. In the more constrained tasks, co-occurrence vectors are competitive, although choice of compositional method is important; on the larger-scale tasks, they are outperformed by neural word embeddings, which show robust, stable performance across the tasks.
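The "simpler additive and multiplicative approaches" the abstract mentions are easy to state exactly; the NumPy sketch below shows both, with random stand-ins for the word vectors that would come from one of the semantic spaces under comparison.

import numpy as np

def additive(u, v):
    return u + v        # composed phrase vector as the sum of the parts

def multiplicative(u, v):
    return u * v        # element-wise product emphasises shared features

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

u, v = np.random.rand(50), np.random.rand(50)   # toy word vectors
phrase = additive(u, v)
print(cosine(phrase, u), cosine(phrase, v))     # similarity to each part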
Existence of V (m, t) vectors
- J. Statist. Plann. Inference
"... Local resampling for patch-based texture synthesis in ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
(Show Context)
Local resampling for patch-based texture synthesis in
Deep recursive neural networks for compositionality in language
- In Proceedings of NIPS
, 2014
"... Recursive neural networks comprise a class of architecture that can operate on structured input. They have been previously successfully applied to model com-positionality in natural language using parse-tree-based structural representations. Even though these architectures are deep in structure, the ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
Recursive neural networks comprise a class of architecture that can operate on structured input. They have been previously successfully applied to model compositionality in natural language using parse-tree-based structural representations. Even though these architectures are deep in structure, they lack the capacity for hierarchical representation that exists in conventional deep feed-forward networks as well as in recently investigated deep recurrent neural networks. In this work we introduce a new architecture, a deep recursive neural network (deep RNN), constructed by stacking multiple recursive layers. We evaluate the proposed model on the task of fine-grained sentiment classification. Our results show that deep RNNs outperform associated shallow counterparts that employ the same number of parameters. Furthermore, our approach outperforms previous baselines on the sentiment analysis task, including a multiplicative RNN variant as well as the recently introduced paragraph vectors, achieving new state-of-the-art results. We provide exploratory analyses of the effect of multiple layers and show that they capture different aspects of compositionality in language.
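One way to see what "stacking multiple recursive layers" means: each layer composes children bottom-up over the same parse tree while also reading the layer below at the same node. The NumPy sketch below is a hedged reconstruction under assumed weight shapes and a merge-list tree encoding; it is not the paper's exact formulation.

import numpy as np

def recursive_layer(prev, merges, W, U):
    """prev: vectors for all nodes from the layer below (for the first
    layer, word embeddings at leaves and zeros at inner nodes).
    merges: bottom-up (left_id, right_id) pairs creating inner nodes."""
    n_leaves = len(prev) - len(merges)
    nodes = [np.tanh(U @ prev[i]) for i in range(n_leaves)]
    for (l, r), k in zip(merges, range(n_leaves, len(prev))):
        # children from this layer plus the same node one layer down
        h = np.tanh(W @ np.concatenate([nodes[l], nodes[r]]) + U @ prev[k])
        nodes.append(h)
    return nodes

d = 8
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((d, 2 * d))
U = 0.1 * rng.standard_normal((d, d))
leaves = [rng.standard_normal(d) for _ in range(3)]
merges = [(0, 1), (3, 2)]                      # tree ((w0 w1) w2)
prev = leaves + [np.zeros(d)] * len(merges)    # layer 0: word embeddings
h1 = recursive_layer(prev, merges, W, U)       # first recursive layer
h2 = recursive_layer(h1, merges, W, U)         # stacked layer: a "deep" RNN

Because every layer sees the full tree, deeper layers can re-interpret the compositions made below them, which is the extra capacity the abstract contrasts with shallow recursive nets.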
Modelling, visualising and summarising documents with a single convolutional neural network
, 2014
"... Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Process-ing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional ve ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced semantics. Our model is based on an extended Dynamic Convolution Neural Network, which learns convolution filters at both the sentence and document level, hierarchically learning to capture and compose low level lexical features into high level semantic concepts. We demonstrate the effectiveness of this model on a range of document modelling tasks, achieving strong results with no feature engineering and with a more compact model. Inspired by recent advances in visualising deep convolution networks for computer vision, we present a novel visualisation technique for our document networks which not only provides insight into their learning process, but also can be interpreted to produce a compelling automatic summarisation system for texts.
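The hierarchical convolution idea reads naturally as two stacked stages: one convolution composes words into sentence vectors, a second composes sentence vectors into a document vector. The PyTorch sketch below shows that shape flow only; the filter sizes and plain max pooling are assumptions, and the paper's dynamic k-max pooling is not reproduced.

import torch
import torch.nn as nn

class HierarchicalConvDoc(nn.Module):
    def __init__(self, emb_dim=64, hid=64):
        super().__init__()
        self.sent_conv = nn.Conv1d(emb_dim, hid, kernel_size=3, padding=1)
        self.doc_conv = nn.Conv1d(hid, hid, kernel_size=3, padding=1)

    def forward(self, doc):                    # (n_sents, n_words, emb_dim)
        s = self.sent_conv(doc.transpose(1, 2))        # convolve over words
        s = s.max(dim=2).values                        # one vector per sentence
        d = self.doc_conv(s.t().unsqueeze(0))          # convolve over sentences
        return d.max(dim=2).values.squeeze(0)          # single document vector

doc = torch.randn(5, 12, 64)                   # 5 sentences of 12 words each
print(HierarchicalConvDoc()(doc).shape)        # torch.Size([64])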
Character-Aware Neural Language Models
Yoon Kim, School of Engineering and Applied Sciences, Harvard University
"... We describe a simple neural language model that re-lies only on character-level inputs. Predictions are still made at the word-level. Our model employs a con-volutional neural network (CNN) and a highway net-work over characters, whose output is given to a long short-term memory (LSTM) recurrent neu ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state of the art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
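The highway network mentioned in the abstract is a small, well-defined component: it gates between a transformed copy of its input and the input itself, letting character-level features pass through unchanged where that helps. A minimal PyTorch sketch follows; the surrounding character CNN and LSTM are omitted, and the sizes are illustrative.

import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))         # transform gate in [0, 1]
        h = torch.relu(self.transform(x))       # transformed copy
        return t * h + (1.0 - t) * x            # carry the rest through

x = torch.randn(2, 128)                         # e.g. pooled char-CNN features
print(Highway(128)(x).shape)                    # torch.Size([2, 128])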
RECURSIVE DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING AND COMPUTER VISION
, 2014
"... ii ..."
(Show Context)
Learning from real users: Rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems
- In Proceedings of Interspeech
, 2015
"... To train a statistical spoken dialogue system (SDS) it is essen-tial that an accurate method for measuring task success is avail-able. To date training has relied on presenting a task to either simulated or paid users and inferring the dialogue’s success by observing whether this presented task was ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
To train a statistical spoken dialogue system (SDS) it is essential that an accurate method for measuring task success is available. To date training has relied on presenting a task to either simulated or paid users and inferring the dialogue's success by observing whether this presented task was achieved or not. Our aim however is to be able to learn from real users acting under their own volition, in which case it is non-trivial to rate the success as any prior knowledge of the task is simply unavailable. User feedback may be utilised but has been found to be inconsistent. Hence, here we present two neural network models that evaluate a sequence of turn-level features to rate the success of a dialogue. Importantly these models make no use of any prior knowledge of the user's task. The models are trained on dialogues generated by a simulated user and the best model is then used to train a policy on-line which is shown to perform at least as well as a baseline system using prior knowledge of the user's task. We note that the models should also be of interest for evaluating SDS and for monitoring a dialogue in rule-based SDS.
Index Terms: spoken dialogue systems, real users, reward prediction, dialogue success classification, neural network
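A rough picture of the setup: a recurrent model reads one feature vector per turn and emits a success probability at the end of the dialogue. The PyTorch sketch below is an assumption for illustration only; the paper's two models, its turn-level feature set, and all sizes differ from these stand-ins.

import torch
import torch.nn as nn

class SuccessRater(nn.Module):
    def __init__(self, feat_dim=20, hid=32):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, 1)

    def forward(self, turns):                  # (batch, n_turns, feat_dim)
        _, (h, _) = self.rnn(turns)            # final state summarises turns
        return torch.sigmoid(self.out(h[-1]))  # predicted P(success)

dialogue = torch.randn(1, 7, 20)               # one dialogue, 7 turns
print(SuccessRater()(dialogue))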
Twitter Sentiment Analysis with Deep Convolutional Neural Networks
"... This paper describes our deep learning system for sentiment anal-ysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoid-ing the need to inject any additional ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model. We train the latter on the supervised training data recently made available by the official system evaluation campaign on Twitter Sentiment Analysis organized by SemEval-2015. A comparison between the results of our approach and the systems participating in the challenge on the official test sets suggests that our model could be ranked in the first two positions in both the phrase-level subtask A (among 11 teams) and the message-level subtask B (among 40 teams). This is important evidence of the practical value of our solution.
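The staged initialisation the abstract walks through can be sketched end to end: pre-train the embedding table, train the whole network on the distant-supervised (emoticon-labelled) corpus, then fine-tune the same weights on the small supervised set. Everything below is an illustrative stand-in (a toy CNN, random data, a bare training loop), not the paper's system.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TweetCNN(nn.Module):
    def __init__(self, vocab=1000, emb=50, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, 64, kernel_size=5, padding=2)
        self.out = nn.Linear(64, n_classes)

    def forward(self, toks):                       # (batch, seq_len)
        x = self.emb(toks).transpose(1, 2)
        return self.out(F.relu(self.conv(x)).max(dim=2).values)

def fit(model, batches, epochs=1):
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        for toks, labels in batches:
            opt.zero_grad()
            F.cross_entropy(model(toks), labels).backward()
            opt.step()

model = TweetCNN()
# Stage 1: copy pre-trained word vectors into the embedding table
# (random here; the paper uses an unsupervised neural language model).
model.emb.weight.data.normal_(0, 0.1)
distant = [(torch.randint(0, 1000, (8, 30)), torch.randint(0, 3, (8,)))]
fit(model, distant)       # Stage 2: distant-supervised (emoticon) corpus
supervised = [(torch.randint(0, 1000, (8, 30)), torch.randint(0, 3, (8,)))]
fit(model, supervised)    # Stage 3: fine-tune on the SemEval training data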