Flickr tag recommendation based on collective knowledge
- In WWW ’08: Proc. of the 17th International Conference on World Wide Web
, 2008
Cited by 224 (1 self)
Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so-called tags, which describe the contents of the photo or provide additional contextual and semantic information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. We analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tag photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging.
The nested Chinese restaurant process and Bayesian inference of topic hierarchies
, 2007
Cited by 128 (15 self)
We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP lead to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
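The "paths down a random tree" idea can be illustrated with a small sketch. This is not the paper's inference algorithm; it only draws paths from a nested Chinese restaurant process prior, using the standard CRP rule (an existing child is chosen in proportion to how many earlier documents passed through it, a new child in proportion to a concentration parameter). The names `sample_ncrp_path`, `child_counts`, and `gamma`, and the truncation to a fixed `depth`, are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_ncrp_path(child_counts, depth, gamma, rng):
    """Draw one root-to-leaf path of length `depth` and update visit counts.

    child_counts maps a node (a tuple of child indices from the root) to a
    list holding how many earlier documents passed through each child.
    The fixed depth is a simplification; the nCRP itself is infinitely deep.
    """
    node = ()
    for _ in range(depth):
        counts = child_counts[node]
        total = sum(counts)
        # Existing child k has weight counts[k]; a new child has weight gamma.
        r = rng.uniform(0, total + gamma)
        choice = len(counts)  # default: open a new child
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                choice = k
                break
        if choice == len(counts):
            counts.append(0)
        counts[choice] += 1
        node = node + (choice,)
    return node


rng = random.Random(0)
child_counts = defaultdict(list)
paths = [sample_ncrp_path(child_counts, depth=3, gamma=1.0, rng=rng)
         for _ in range(20)]
```

Because popular branches attract more documents, the sampled paths tend to share prefixes, which is the clustering effect the abstract attributes to preferential attachment.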
Learning Taxonomic Relations from Heterogeneous Evidence
Cited by 113 (9 self)
We present a novel approach to the automatic acquisition of taxonomic relations. The main difference from earlier approaches is that we do not consider only a single source of evidence, i.e. a specific algorithm or approach, but examine the possibility of learning taxonomic relations by considering various and heterogeneous forms of evidence. In particular, we derive these different forms of evidence by using well-known NLP techniques and resources and combine them via two simple strategies. Our approach shows very promising results compared to other results from the literature. The main aim of the work presented in this paper is (i) to gain insight into the behaviour of different approaches to learning taxonomic relations, (ii) to provide a first step towards combining these different approaches, and (iii) to establish a baseline for further research.
Improvements in automatic thesaurus extraction
- In Proceedings of the Workshop on Unsupervised Lexical Acquisition
, 2002
Cited by 74 (3 self)
The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the tradeoff between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
FacetMap: A scalable search and browse visualization
- IEEE Transactions on Visualization and Computer Graphics
Cited by 48 (3 self)
The dominant paradigm for searching and browsing large data stores is text-based: presenting a scrollable list of search results in response to textual search term input. While this works well for the Web, there is opportunity for improvement in the domain of personal information stores, which tend to have more heterogeneous data and richer metadata. In this paper, we introduce FacetMap, an interactive, query-driven visualization, generalizable to a wide range of metadata-rich data stores. FacetMap uses a visual metaphor for both input (selection of metadata facets as filters) and output. Results of a user study provide insight into tradeoffs between FacetMap’s graphical approach and the traditional text-oriented approach.
Index Terms: Graphical visualization, interactive information retrieval, faceted metadata
Automating Creation of Hierarchical Faceted Metadata Structures
- In Procs. of the Human Language Technology Conference (NAACL HLT
, 2007
Cited by 44 (1 self)
We describe Castanet, an algorithm for automatically generating hierarchical faceted metadata from textual descriptions of items, to be incorporated into browsing and navigation interfaces for large information collections. From an existing lexical database (such as WordNet), Castanet carves out a structure that reflects the contents of the target information collection; moderate manual modifications improve the outcome. The algorithm is simple yet effective: a study conducted with 34 information architects finds that Castanet achieves higher-quality results than other automated category creation algorithms, and 85% of the study participants said they would like to use the system for their work.
Automatic construction of multifaceted browsing interfaces
- In CIKM
, 2005
Cited by 42 (2 self)
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Interfaces that use multifaceted hierarchies represent a powerful new browsing paradigm which has been proven to be a successful complement to keyword searching. Thus far, multifaceted hierarchies have been created manually or semi-automatically, making it difficult to deploy multifaceted interfaces over a large number of databases. We present automatic and scalable methods for the creation of multifaceted interfaces. Our methods are integrated with traditional relational databases and scale well to large databases. Furthermore, we present methods for selecting the best portions of the generated hierarchies when the screen space is not sufficient for displaying the entire hierarchy at once. We apply our technique to a range of large data sets, including annotated images, television programming schedules, and web pages. The results are promising and suggest directions for future research.
Ontology Learning from Text: An Overview
- In Buitelaar, P., Cimiano, P., Magnini, B. (Eds.), Ontology Learning from Text: Methods, Applications and Evaluation
, 2005
Generating Hierarchical Summaries for Web Searches
, 2003
Cited by 41 (1 self)
Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and then apply this technique to summarizing the documents retrieved by a search engine. We show that these hierarchies provide better access to the documents than a simple ranked list and that the terms in the hierarchy are better summaries of the documents than the top TF.IDF weighted terms. In addition, we discuss the formal framework of the technique and how the technique has been used with news databases and TREC collections.
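For readers unfamiliar with the baseline mentioned above, "top TF.IDF weighted terms" can be sketched as follows. The abstract does not say which TF.IDF variant was used, so this assumes the common form: raw term frequency times log(N / document frequency), with whitespace tokenization. The function name `top_tfidf_terms` is hypothetical.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, k):
    """Return the k highest-scoring TF.IDF terms for each document.

    Assumed weighting: tf(t, d) * log(N / df(t)); ties break alphabetically.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


docs = ["apple apple banana", "banana cherry", "banana date date date"]
summary_terms = top_tfidf_terms(docs, k=1)
```

A term that appears in every document (here "banana") scores zero under this weighting, so it never ranks above a term specific to one document.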