Results 1 - 10 of 42
GloVe: Global Vectors for Word Representation
"... Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arith-metic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regular ..."
Abstract
-
Cited by 123 (9 self)
- Add to MetaCart
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
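The key ingredients named in this abstract (a log-bilinear model fit only to the nonzero co-occurrence counts, with a weighting function) can be illustrated in a few lines. The following is a minimal NumPy sketch of such a weighted least-squares objective; it uses plain SGD rather than AdaGrad, and the hyperparameter values (x_max, alpha, learning rate, dimensionality) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def train_glove_like(cooc, vocab_size, dim=50, epochs=25, lr=0.05, x_max=100.0, alpha=0.75):
    """cooc: dict mapping (i, j) word-index pairs to nonzero co-occurrence counts X_ij."""
    rng = np.random.default_rng(0)
    W  = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim   # word vectors
    Wc = rng.uniform(-0.5, 0.5, (vocab_size, dim)) / dim   # context vectors
    b, bc = np.zeros(vocab_size), np.zeros(vocab_size)     # biases

    for _ in range(epochs):
        for (i, j), x in cooc.items():                      # train only on nonzero entries
            f = min(1.0, (x / x_max) ** alpha)              # weighting function f(X_ij)
            err = W[i] @ Wc[j] + b[i] + bc[j] - np.log(x)   # log-bilinear residual
            g = f * err
            W[i], Wc[j] = W[i] - lr * g * Wc[j], Wc[j] - lr * g * W[i]
            b[i] -= lr * g
            bc[j] -= lr * g
    return W + Wc                                            # combined representation
```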
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
"... We present SimLex-999, a gold standard re-source for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness s ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar (Freud, psychology) have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a different, and arguably wider range of applications than those which reflect conceptual association. Second, SimLex-999 contains a range of concrete and abstract adjective, noun and verb pairs, together with an independent rating of concreteness and (free) association strength for each pair. This diversity enables fine-grained analyses of the performance of models on concepts of different types, and consequently greater insight into how architectures can be improved. Further, unlike existing gold standard evaluations, for which automatic approaches have reached or surpassed the inter-annotator agreement ceiling, state-of-the-art models perform well below this ceiling on SimLex-999. There is therefore plenty of scope for SimLex-999 to quantify future improvements to distributional semantic models, guiding the development of the next generation of representation-learning architectures.
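Scoring a model against a resource of this kind usually comes down to comparing model similarities with the human ratings by rank correlation. A hedged sketch, assuming a simplified three-column tab-separated file of (word1, word2, rating) pairs and a plain word-to-vector dictionary (the released dataset has further columns such as POS, concreteness, and association strength):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs_path, vectors):
    """vectors: dict mapping word -> numpy array. Returns Spearman's rho."""
    model_scores, human_scores = [], []
    with open(pairs_path, encoding="utf-8") as f:
        for line in f:
            w1, w2, rating = line.rstrip("\n").split("\t")[:3]
            if w1 in vectors and w2 in vectors:
                v1, v2 = vectors[w1], vectors[w2]
                cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
                model_scores.append(cos)
                human_scores.append(float(rating))
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```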
Topical word embeddings
- In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
"... Most word embedding models typically represent each word using a single vector, which makes these mod-els indiscriminative for ubiquitous homonymy and pol-ysemy. In order to enhance discriminativeness, we em-ploy latent topic models to assign topics for each word in the text corpus, and learn topica ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from
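One rough way to realise the word-plus-topic idea sketched in this abstract is to let an LDA model pick a topic for each token and then learn skip-gram vectors over "word#topic" pseudo-words. The sketch below is an approximation under those assumptions (including the heuristic argmax topic assignment); it is not the authors' released implementation.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

def topical_embeddings(docs, num_topics=10, dim=100):
    """docs: list of tokenized documents (lists of strings)."""
    dictionary = Dictionary(docs)
    bows = [dictionary.doc2bow(doc) for doc in docs]
    lda = LdaModel(bows, num_topics=num_topics, id2word=dictionary)
    topic_word = lda.get_topics()                      # (num_topics, vocab_size)

    tagged_docs = []
    for doc, bow in zip(docs, bows):
        doc_topics = np.zeros(num_topics)
        for t, p in lda.get_document_topics(bow, minimum_probability=0.0):
            doc_topics[t] = p
        tagged = []
        for w in doc:
            wid = dictionary.token2id[w]
            t = int(np.argmax(doc_topics * topic_word[:, wid]))  # heuristic token topic
            tagged.append(f"{w}#{t}")
        tagged_docs.append(tagged)

    # Skip-gram over word#topic tokens yields topic-specific word vectors.
    return Word2Vec(tagged_docs, vector_size=dim, window=5, min_count=1, sg=1)
```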
Document Modeling with Gated Recurrent Neural Network for Sentiment Classification
"... Document level sentiment classification remains a challenge: encoding the intrin-sic relations between sentences in the se-mantic meaning of a document. To ad-dress this, we introduce a neural network model to learn vector-based document rep-resentation in a unified, bottom-up fash-ion. The model fi ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
Document level sentiment classification remains a challenge: encoding the intrinsic relations between sentences in the semantic meaning of a document. To address this, we introduce a neural network model to learn vector-based document representation in a unified, bottom-up fashion. The model first learns sentence representation with convolutional neural network or long short-term memory. Afterwards, semantics of sentences and their relations are adaptively encoded in document representation with gated recurrent neural network. We conduct document level sentiment classification on four large-scale review datasets from IMDB and Yelp Dataset Challenge. Experimental results show that: (1) our neural model shows superior performances over several state-of-the-art algorithms; (2) gated recurrent neural network dramatically outperforms standard recurrent neural network in document modeling for sentiment classification.
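The architecture described here is hierarchical: a sentence-level encoder followed by a gated recurrent composition over sentence vectors. Below is a minimal PyTorch sketch of that two-level structure, using an LSTM for the sentence encoder and a GRU for the document level; the layer sizes, single-layer configuration, and processing one document at a time are simplifying assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalDocModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, sent_dim=128, doc_dim=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.sent_rnn = nn.LSTM(emb_dim, sent_dim, batch_first=True)   # words -> sentence vector
        self.doc_rnn = nn.GRU(sent_dim, doc_dim, batch_first=True)     # sentences -> document vector
        self.classifier = nn.Linear(doc_dim, num_classes)

    def forward(self, doc):
        # doc: LongTensor (num_sentences, max_sentence_len) of word ids for one document
        _, (h_sent, _) = self.sent_rnn(self.embed(doc))   # (1, num_sentences, sent_dim)
        sent_vecs = h_sent[-1].unsqueeze(0)               # (1, num_sentences, sent_dim)
        _, h_doc = self.doc_rnn(sent_vecs)                # (1, 1, doc_dim)
        return self.classifier(h_doc[-1].squeeze(0))      # sentiment class scores
```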
How to make words with vectors: Phrase generation in distributional semantics
- In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014
"... Abstract We introduce the problem of generation in distributional semantics: Given a distributional vector representing some meaning, how can we generate the phrase that best expresses that meaning? We motivate this novel challenge on theoretical and practical grounds and propose a simple data-driv ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
We introduce the problem of generation in distributional semantics: Given a distributional vector representing some meaning, how can we generate the phrase that best expresses that meaning? We motivate this novel challenge on theoretical and practical grounds and propose a simple data-driven approach to the estimation of generation functions. We test this in a monolingual scenario (paraphrase generation) as well as in a cross-lingual setting (translation by synthesizing adjective-noun phrase vectors in English and generating the equivalent expressions in Italian).
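A generation function in this setting can be estimated very simply from data: learn a map from phrase vectors back to the vectors of their component words, then pick the nearest vocabulary word. The sketch below uses an ordinary least-squares linear map for one slot (say, the adjective of an adjective-noun phrase); the training pairs and vector matrices are assumed inputs, and the paper itself explores richer estimation settings.

```python
import numpy as np

def fit_generation_function(phrase_vecs, word_vecs):
    """Least-squares map G with word ~= G @ phrase, from stacked (n, d) training pairs."""
    G, *_ = np.linalg.lstsq(phrase_vecs, word_vecs, rcond=None)
    return G.T

def generate_word(G, phrase_vec, vocab, vocab_matrix):
    """Return the vocabulary word whose vector is closest (by cosine) to G @ phrase_vec."""
    query = G @ phrase_vec
    sims = vocab_matrix @ query / (
        np.linalg.norm(vocab_matrix, axis=1) * np.linalg.norm(query) + 1e-9)
    return vocab[int(np.argmax(sims))]
```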
Rehabilitation of Count-based Models for Word Vector Representations
"... Recent works on word representations mostly rely on predictive models. Distributed word representations (aka word embeddings) are trained to optimally predict the contexts in which the corresponding words tend to ap-pear. Such models have succeeded in captur-ing word similarities as well as semantic ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Recent works on word representations mostly rely on predictive models. Distributed word representations (aka word embeddings) are trained to optimally predict the contexts in which the corresponding words tend to appear. Such models have succeeded in capturing word similarities as well as semantic and syntactic regularities. Instead, we aim at reviving interest in a model based on counts. We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora. We show that this distance gives good performance on word similarity and analogy tasks, with a proper type and size of context, and a dimensionality reduction based on a stochastic low-rank approximation. Besides being both simple and intuitive, this method also provides an encoding function which can be used to infer unseen words or phrases. This becomes a clear advantage compared to predictive models which must train these new words.
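The recipe described in this abstract can be sketched directly: turn co-occurrence counts into per-word context distributions, take square roots so that Euclidean distance between rows corresponds (up to a constant) to the Hellinger distance, and apply a randomized low-rank factorization. The dimensionality and the use of scikit-learn's randomized TruncatedSVD below are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def hellinger_embeddings(cooc_counts, dim=128):
    """cooc_counts: (V, V) array of word-context co-occurrence counts."""
    counts = np.asarray(cooc_counts, dtype=float)
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)  # P(context | word)
    rooted = np.sqrt(probs)              # Euclidean on sqrt rows == scaled Hellinger distance
    svd = TruncatedSVD(n_components=dim, algorithm="randomized", random_state=0)
    return svd.fit_transform(rooted)     # (V, dim) word representations
```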
Recurrent convolutional neural networks for text classification
- In Proc. Conference of the Association for the Advancement of Artificial Intelligence (AAAI), 2015
"... Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent con-volutional neural network for ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.
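The model summarised here combines a recurrent structure for context with a max-pooling layer over the sequence. Below is a minimal PyTorch sketch of that combination: a bidirectional LSTM supplies left and right context for each word, the contexts are concatenated with the word embedding and projected, and max-pooling selects the most salient features before classification. The dimensions and the choice of LSTM cells are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RCNNClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, ctx_dim=100, hidden_dim=100, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.LSTM(emb_dim, ctx_dim, bidirectional=True, batch_first=True)
        self.project = nn.Linear(emb_dim + 2 * ctx_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        # tokens: LongTensor (batch, seq_len) of word ids
        emb = self.embed(tokens)                          # (batch, seq, emb_dim)
        ctx, _ = self.birnn(emb)                          # (batch, seq, 2 * ctx_dim)
        latent = torch.tanh(self.project(torch.cat([emb, ctx], dim=-1)))
        pooled, _ = latent.max(dim=1)                     # max-pooling over the sequence
        return self.classifier(pooled)                    # (batch, num_classes)
```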
The Inside-Outside Recursive Neural Network model for Dependency Parsing
"... We propose the first implementation of an infinite-order generative dependency model. The model is based on a new recursive neural network architecture, the Inside-Outside Recursive Neural Network. This architecture allows information to flow not only bottom-up, as in traditional recursive neural ne ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
We propose the first implementation of an infinite-order generative dependency model. The model is based on a new recursive neural network architecture, the Inside-Outside Recursive Neural Network. This architecture allows information to flow not only bottom-up, as in traditional recursive neural networks, but also top-down. This is achieved by computing content as well as context representations for any constituent, and letting these representations interact. Experimental results on the English section of the Universal Dependency Treebank show that the infinite-order model achieves a perplexity seven times lower than the traditional third-order model using counting, and tends to choose more accurate parses in k-best lists. In addition, reranking with this model achieves state-of-the-art unlabelled attachment scores and unlabelled exact match scores.
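The content/context split described here can be illustrated on a binary tree: each node gets an "inside" vector composed bottom-up from its children and an "outside" vector composed top-down from its parent's outside vector and its sibling's inside vector. The tiny NumPy sketch below shows only that flow of information, with single-matrix tanh compositions; it is not the paper's exact parameterisation and omits the dependency-parsing specifics.

```python
import numpy as np

DIM = 50
rng = np.random.default_rng(0)
W_inside = rng.normal(scale=0.1, size=(DIM, 2 * DIM))   # composes two child inside vectors
W_outside = rng.normal(scale=0.1, size=(DIM, 2 * DIM))  # composes parent outside + sibling inside

def inside(node, word_vecs):
    """Bottom-up content vector for a node given as a word or a (left, right) pair."""
    if isinstance(node, str):
        return word_vecs[node]
    left, right = (inside(child, word_vecs) for child in node)
    return np.tanh(W_inside @ np.concatenate([left, right]))

def outside(parent_outside, sibling_inside):
    """Top-down context vector from the parent's outside and the sibling's inside."""
    return np.tanh(W_outside @ np.concatenate([parent_outside, sibling_inside]))
```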
Big data small data, in domain out-of domain, known word unknown word: The impact of word representation on sequence labelling tasks
- In Proc. CoNLL, 2015
"... Abstract Word embeddings -distributed word representations that can be learned from unlabelled data -have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of four popular word embedding methods in the context of four ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Word embeddings (distributed word representations that can be learned from unlabelled data) have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of four popular word embedding methods in the context of four sequence labelling tasks: part-of-speech tagging, syntactic chunking, named entity recognition, and multiword expression identification. A particular focus of the paper is analysing the effects of task-based updating of word representations. We show that when using word embeddings as features, as few as several hundred training instances are sufficient to achieve competitive results, and that word embeddings lead to improvements over out-of-vocabulary words and also out of domain. Perhaps more surprisingly, our results indicate there is little difference between the different word embedding methods, and that simple Brown clusters are often competitive with word embeddings across all tasks we consider.
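In a feature-based setup like the one evaluated here, each token can simply be represented by the concatenated embeddings of a small window around it and fed to a standard classifier. A hedged sketch, where the window size, the logistic-regression classifier, and the `vectors` lookup are illustrative assumptions (the paper itself compares several embedding methods and structured labellers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def window_features(sentence, vectors, dim, window=1):
    """Concatenate embeddings of tokens in [-window, +window] around each position."""
    feats = []
    for i in range(len(sentence)):
        parts = []
        for off in range(-window, window + 1):
            j = i + off
            if 0 <= j < len(sentence) and sentence[j] in vectors:
                parts.append(vectors[sentence[j]])
            else:
                parts.append(np.zeros(dim))      # padding / out-of-vocabulary token
        feats.append(np.concatenate(parts))
    return np.vstack(feats)

def train_tagger(sentences, labels, vectors, dim):
    X = np.vstack([window_features(s, vectors, dim) for s in sentences])
    y = [tag for sent_tags in labels for tag in sent_tags]
    return LogisticRegression(max_iter=1000).fit(X, y)
```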
Medical semantic similarity with a neural language model
- In CIKM ’14, 2014
"... Advances in neural network language models have demon-strated that these models can effectively learn representa-tions of words meaning. In this paper, we explore a varia-tion of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather t ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free text, rather than directly from terms in free text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
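The concept-level variation described here can be approximated by first mapping term spans in the text to concept identifiers with some concept recogniser (not shown), then training an ordinary skip-gram model over the resulting concept sequences and taking cosine similarity between concept vectors. The gensim-based sketch below makes those assumptions; the identifiers and dimensions are illustrative.

```python
import numpy as np
from gensim.models import Word2Vec

def train_concept_model(concept_docs, dim=200):
    """concept_docs: list of documents, each a list of concept identifier strings."""
    return Word2Vec(concept_docs, vector_size=dim, window=5, min_count=1, sg=1)

def concept_similarity(model, c1, c2):
    """Cosine similarity between two learned concept vectors."""
    v1, v2 = model.wv[c1], model.wv[c2]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```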