Results 1 - 10 of 11
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
"... Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to repre ..."
Abstract
-
Cited by 21 (8 self)
- Add to MetaCart
(Show Context)
Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the issue of language ambiguity, i.e., polysemy. However, Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets. As a result, text snippets with no words in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the induced word senses. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
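As an illustration of the core idea (a minimal sketch, not the paper's actual system), the code below assigns each search-result snippet to the induced sense it overlaps most. The sense word sets here are hypothetical hand-picked signatures; the paper induces them automatically from a co-occurrence graph built for the query.

```python
# Sketch: cluster snippets by overlap with induced word senses.
def jaccard(a, b):
    """Word-overlap similarity between two bags of words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical induced senses for the ambiguous query "jaguar".
senses = {
    "animal": {"cat", "wildlife", "predator", "habitat", "species"},
    "car":    {"engine", "luxury", "vehicle", "dealer", "model"},
}

snippets = [
    "Jaguar dealer offering luxury vehicle models and engine service",
    "the jaguar is a large cat whose habitat spans rainforest regions",
]

for snippet in snippets:
    words = snippet.lower().split()
    best = max(senses, key=lambda s: jaccard(words, senses[s]))
    print(f"{best}: {snippet}")
```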
Grounding Action Descriptions in Videos
"... Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visu ..."
Abstract
-
Cited by 12 (2 self)
- Add to MetaCart
(Show Context)
Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general-purpose corpus that aligns high-quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action descriptions are to each other. Experimental results demonstrate that a text-based model of similarity between actions improves substantially when combined with visual information from videos depicting the described actions.
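A minimal sketch of the fusion idea: similarity between two action descriptions is interpolated between a text-based score and a score computed on visual features of the videos they describe. The vectors and the weight alpha are illustrative placeholders, not the paper's actual features or model.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_similarity(text_u, text_v, vid_u, vid_v, alpha=0.5):
    """Interpolate text-based and video-based similarity."""
    return alpha * cosine(text_u, text_v) + (1 - alpha) * cosine(vid_u, vid_v)

rng = np.random.default_rng(0)
t1, t2 = rng.random(50), rng.random(50)    # stand-in textual vectors
v1, v2 = rng.random(100), rng.random(100)  # stand-in video feature vectors
print(combined_similarity(t1, t2, v1, v2))
```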
Models of Semantic Representation with Visual Attributes
"... We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality, which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data than amodal models and word representations based on handcrafted norming data.
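One plausible way to realize the bimodal combination described above is normalized concatenation of the two modalities; this sketch uses random placeholders for the textual vector and the attribute-classifier scores, and the dimensionalities are illustrative.

```python
import numpy as np

def bimodal_vector(text_vec, attribute_scores):
    """L2-normalize each modality, then concatenate."""
    t = text_vec / np.linalg.norm(text_vec)
    a = attribute_scores / np.linalg.norm(attribute_scores)
    return np.concatenate([t, a])

rng = np.random.default_rng(1)
text_vec = rng.random(300)     # stand-in distributional vector for a concept
attr_scores = rng.random(100)  # stand-in visual attribute classifier scores
print(bimodal_vector(text_vec, attr_scores).shape)  # (400,)
```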
Learning Grounded Meaning Representations with Autoencoders
In Proceedings of ACL, 2014
"... In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from tex-tual and visual input. The two modali-ties are encoded as vectors of attributes and are obtained aut ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input. The two modalities are encoded as vectors of attributes and are obtained automatically from text and images, respectively. We evaluate our model on its ability to simulate similarity judgments and concept categorization. On both tasks, our approach outperforms baselines and related models.
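The sketch below illustrates the underlying mechanism with a single toy autoencoder trained on concatenated textual-plus-visual input; the paper stacks several layers and adds a joint layer, and all data here is random stand-in input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 40))  # 32 items: 20 textual + 20 visual attribute dims
n_in, n_hid, lr = X.shape[1], 10, 0.5

W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    H = sigmoid(X @ W1 + b1)      # shared bimodal embedding
    Y = sigmoid(H @ W2 + b2)      # reconstruction of both modalities
    dY = (Y - X) * Y * (1 - Y)    # squared-error + sigmoid gradient
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY / len(X); b2 -= lr * dY.mean(axis=0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)

print("reconstruction MSE:", float(((Y - X) ** 2).mean()))
```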
See No Evil, Say No Evil: Description Generation from Densely Labeled Images
"... This paper studies generation of descrip-tive sentences from densely annotated im-ages. Previous work studied generation from automatically detected visual infor-mation but produced a limited class of sen-tences, hindered by currently unreliable recognition of activities and attributes. In-stead, we ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
This paper studies generation of descriptive sentences from densely annotated images. Previous work studied generation from automatically detected visual information but produced a limited class of sentences, hindered by currently unreliable recognition of activities and attributes. Instead, we collect human annotations of objects, parts, attributes and activities in images. These annotations allow us to build a significantly more comprehensive model of language generation and to study what visual information is required to generate human-like descriptions. Experiments demonstrate high-quality output and show that activity annotations and the relative spatial location of objects contribute most to producing high-quality sentences.
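For intuition only, here is a heavily simplified, hypothetical template-based generator over dense annotations; the annotation fields and the template are invented for illustration and are far simpler than the paper's model.

```python
def describe(annotation):
    """Fill a fixed template from object/attribute/activity annotations."""
    obj = annotation["object"]
    attrs = " ".join(annotation.get("attributes", []))
    subject = f"a {attrs} {obj}".replace("  ", " ").strip()
    parts = [subject]
    if annotation.get("activity"):
        parts.append(f"is {annotation['activity']}")
    if annotation.get("relative_location"):
        parts.append(annotation["relative_location"])
    return " ".join(parts).capitalize() + "."

print(describe({
    "object": "dog", "attributes": ["brown", "furry"],
    "activity": "running", "relative_location": "in front of a fence",
}))
# -> "A brown furry dog is running in front of a fence."
```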
Interpretable semantic vectors from a joint model of brain- and text-based meaning
In Proceedings of ACL, 2014
"... Abstract Vector space models (VSMs) represent word meanings as points in a high dimensional space. VSMs are typically created using a large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previousl ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created from large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words, and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
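A minimal sketch of the joint-factorization intuition: corpus statistics and brain activations for the same words share one latent word matrix, so each modality informs the other. This is plain gradient descent on random stand-in data; the actual JNNSE algorithm additionally imposes non-negativity and sparsity constraints.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, k = 50, 5
C = rng.random((n_words, 40))  # word x corpus-feature matrix (stand-in)
B = rng.random((n_words, 20))  # word x voxel matrix (stand-in)

A  = rng.random((n_words, k))  # shared latent word representations
Dc = rng.random((k, 40))
Db = rng.random((k, 20))

lr = 0.001
for _ in range(1000):
    Ec, Eb = A @ Dc - C, A @ Db - B     # reconstruction errors
    A  -= lr * (Ec @ Dc.T + Eb @ Db.T)  # A is updated by both modalities
    Dc -= lr * (A.T @ Ec)
    Db -= lr * (A.T @ Eb)

print("corpus loss:", float((Ec ** 2).mean()),
      "brain loss:", float((Eb ** 2).mean()))
```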
Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world
"... Abstract Following up on recent work on establishing a mapping between vector-based semantic embeddings of words and the visual representations of the corresponding objects from natural images, we first present a simple approach to cross-modal vector-based semantics for the task of zero-shot learni ..."
Abstract
- Add to MetaCart
(Show Context)
Following up on recent work on establishing a mapping between vector-based semantic embeddings of words and the visual representations of the corresponding objects from natural images, we first present a simple approach to cross-modal vector-based semantics for the task of zero-shot learning, in which an image of a previously unseen object is mapped to a linguistic representation denoting its word. We then introduce fast mapping, a challenging and more cognitively plausible variant of the zero-shot task, in which the learner is exposed to new objects and the corresponding words in very limited linguistic contexts. By combining prior linguistic and visual knowledge acquired about words and their objects, as well as exploiting the limited new evidence available, the learner must learn to associate new objects with words. Our results on this task pave the way to realistic simulations of how children or robots could use existing knowledge to bootstrap grounded semantic knowledge about new concepts.
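A minimal sketch of the zero-shot pipeline under simple assumptions: a linear least-squares map is fit from image space to word space on seen words, and an unseen object's image is then labeled with the nearest word vector. All vectors are random stand-ins for real CNN features and word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
img_dim, word_dim = 64, 32
words = ["dog", "cat", "car", "tree", "wampimuk"]
word_vecs = {w: rng.random(word_dim) for w in words}

# Training pairs for *seen* words only; "wampimuk" is held out.
seen = words[:-1]
X = rng.random((len(seen), img_dim))        # image vectors for seen objects
Y = np.stack([word_vecs[w] for w in seen])  # their word vectors
M, *_ = np.linalg.lstsq(X, Y, rcond=None)   # image -> word linear map

new_image = rng.random(img_dim)             # image of an unseen object
projected = new_image @ M
best = max(words, key=lambda w: np.dot(projected, word_vecs[w]) /
           (np.linalg.norm(projected) * np.linalg.norm(word_vecs[w])))
print("predicted label:", best)
```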
Visual Bilingual Lexicon Induction with Transferred ConvNet Features
"... This paper is concerned with the task of bilingual lexicon induction using image-based features. By applying features from a convolutional neural network (CNN), we obtain state-of-the-art performance on a standard dataset, obtaining a 79 % relative improvement over previous work which uses bags of v ..."
Abstract
- Add to MetaCart
(Show Context)
This paper is concerned with the task of bilingual lexicon induction using image-based features. By applying features from a convolutional neural network (CNN), we obtain state-of-the-art performance on a standard dataset, a 79% relative improvement over previous work which uses bags of visual words based on SIFT features. The CNN image-based approach is also compared with state-of-the-art linguistic approaches to bilingual lexicon induction, even outperforming these for one of three language pairs on another standard dataset. Furthermore, we shed new light on the type of visual similarity metric to use for genuine similarity versus relatedness tasks, and experiment with using multiple layers from the same network in an attempt to improve performance.
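A minimal sketch of the image-based induction idea: each word in each language is represented by the mean feature vector of images retrieved for it, and translations are paired by cosine similarity. The random vectors stand in for real transferred ConvNet features.

```python
import numpy as np

rng = np.random.default_rng(0)

def word_rep(image_feats):
    """Average the CNN features of a word's images."""
    return np.mean(image_feats, axis=0)

english = {w: word_rep(rng.random((10, 128))) for w in ["dog", "house", "water"]}
german  = {w: word_rep(rng.random((10, 128))) for w in ["Hund", "Haus", "Wasser"]}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

for en, u in english.items():
    match = max(german, key=lambda de: cosine(u, german[de]))
    print(en, "->", match)
```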
Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception
"... Multi-modal semantics has relied on fea-ture norms or raw image data for per-ceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including mea-suring conceptual similarity and related-ness. We also eva ..."
Abstract
- Add to MetaCart
(Show Context)
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including measuring conceptual similarity and relatedness. We also evaluate cross-modal mappings, through a zero-shot learning task mapping between linguistic and auditory modalities. In addition, we evaluate multi-modal representations on an unsupervised musical instrument clustering task. To our knowledge, this is the first work to combine linguistic and auditory information into multi-modal representations.
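A minimal sketch of the instrument-clustering evaluation: concatenate a linguistic and an auditory vector per instrument and run plain k-means. The vectors and the number of clusters are illustrative stand-ins for the real text- and audio-derived representations.

```python
import numpy as np

rng = np.random.default_rng(0)
instruments = ["violin", "cello", "trumpet", "tuba", "flute", "oboe"]
text_vecs  = {w: rng.random(30) for w in instruments}  # stand-in linguistic
audio_vecs = {w: rng.random(20) for w in instruments}  # stand-in auditory
X = np.stack([np.concatenate([text_vecs[w], audio_vecs[w]]) for w in instruments])

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

for w, c in zip(instruments, kmeans(X, k=3)):
    print(w, "-> cluster", c)
```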
Migrating Psycholinguistic Semantic Feature Norms into Linked Data in Linguistics
"... Semantic feature norms, originally uti-lized in the field of psycholinguistics as a tool for studying human semantic repre-sentation and computation, have recently attracted the attention of some NLP/IR re-searchers who wish to improve their task performances. However, currently avail-able semantic ..."
Abstract
- Add to MetaCart
Semantic feature norms, originally utilized in the field of psycholinguistics as a tool for studying human semantic representation and computation, have recently attracted the attention of NLP/IR researchers who wish to improve performance on their tasks. However, currently available semantic feature norms are, by nature, not well structured, making them difficult to integrate into existing resources of various kinds. In this paper, by examining an actual set of semantic feature norms, we investigate which types of semantic features should be migrated into Linked Data in Linguistics (LDL) and how the migration could be done.
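A minimal sketch of what such a migration could look like: a (concept, feature, production frequency) norm entry rendered as RDF N-Triples. The namespace and property scheme below are invented placeholders, not an actual LDL vocabulary.

```python
# Hypothetical norm entries: (concept, feature, production frequency).
norms = [
    ("banana", "is_yellow", 18),
    ("banana", "is_a_fruit", 25),
]

BASE = "http://example.org/norms/"  # hypothetical namespace

def to_ntriples(concept, feature, freq):
    """Serialize one norm entry as an N-Triples line."""
    s = f"<{BASE}concept/{concept}>"
    p = f"<{BASE}feature/{feature}>"
    return f'{s} {p} "{freq}"^^<http://www.w3.org/2001/XMLSchema#integer> .'

for entry in norms:
    print(to_ntriples(*entry))
```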