Clustering to find exemplar terms for keyphrase extraction
- In Proceedings of EMNLP, 2009
Cited by 40 (5 self)
Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document but also provide good coverage of it. Based on this observation, we propose an unsupervised method for keyphrase extraction. First, the method finds exemplar terms by leveraging clustering techniques, which ensures that the document is semantically covered by these exemplar terms. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms state-of-the-art graph-based ranking methods (TextRank) by 9.5% in F1-measure.
Toward Selectivity Based Keyword Extraction for Croatian News
- Submitted to the Workshop on Surfacing the Deep and the Social Web, co-organised by ICT-COST Action KEYSTONE (IC1302), Riva del Garda, 2014
Cited by 4 (2 self)
This preliminary report on network-based keyword extraction for Croatian presents an unsupervised method for keyword extraction from a complex network. We build our approach on a new network measure, the node selectivity, motivated by research on graph-based centrality approaches. The node selectivity is defined as the average weight distribution on the links of a single node. We extract nodes (keyword candidates) based on the selectivity value. Furthermore, we expand the extracted nodes to word-tuples ranked by the highest in/out selectivity values. Selectivity-based extraction does not require linguistic knowledge, as it is derived purely from the statistical and structural information in the source text, which is reflected in the structure of the network. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates, the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates, the average F1 score is 25.9% and the average F2 score is 24.47%.
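The selectivity measure in the abstract above can be illustrated with a minimal sketch. It assumes that selectivity means node strength (sum of link weights) divided by node degree (number of links), computed separately for incoming and outgoing links, and that the network is a directed co-occurrence graph in which an edge weight counts how often one word directly follows another. The function names `build_network` and `selectivity` and the toy token list are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_network(tokens):
    """Directed co-occurrence network: the weight of edge (u, v) counts
    how often word v directly follows word u (an assumed construction)."""
    weights = defaultdict(int)
    for u, v in zip(tokens, tokens[1:]):
        weights[(u, v)] += 1
    return dict(weights)


def selectivity(weights):
    """Return (in_selectivity, out_selectivity), each mapping a node to
    strength / degree, i.e. the average weight on its in- or out-links."""
    in_strength, in_degree = defaultdict(int), defaultdict(int)
    out_strength, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1
    in_sel = {n: in_strength[n] / in_degree[n] for n in in_degree}
    out_sel = {n: out_strength[n] / out_degree[n] for n in out_degree}
    return in_sel, out_sel


# Toy input; a real run would use the tokenized source text.
tokens = "a b a b a c".split()
in_sel, out_sel = selectivity(build_network(tokens))

# Keyword candidates ranked by out-selectivity, highest first.
ranked = sorted(out_sel, key=out_sel.get, reverse=True)
```

In this toy input, "b" is always followed by "a", so its single outgoing link carries all of its weight and it ranks above "a", whose weight is spread over two outgoing links.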
Automatic keyphrase extraction: A survey of the state of the art
- In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014
Cited by 4 (0 self)
While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
Towards multi-granularity multi-facet e-book retrieval
- In WWW ’07: Proceedings of the 16th International Conference on World Wide Web, 2007
Cited by 3 (1 self)
There are more than one million digitized books (i.e. e-books) so far in the China-US Million Book Digital Library Project (MBP for short). It is thus important to design effective and powerful tools that enable users to easily search for the required information and appropriately access knowledge in the digital library. Towards this end, most digital libraries currently apply traditional metadata-based or full-text-based retrieval technologies to the e-book collection. However, such e-book retrieval systems have at least two limitations. (1) The granularity of retrieval results is either too coarse or too fine, and consequently the middle granularities such as chapters and paragraphs are ignored in traditional e-book retrieval systems. (2) The mass of retrieval results is usually ill-organized, so users often need to expend more effort to obtain the required items. Therefore, with the
PKU at ImageCLEF 2008: Experiments with query extension techniques for text-base and content-based image retrieval
- In Online Working Notes for the CLEF 2008 Workshop, 2008
Cited by 1 (0 self)
In this paper, we present our solutions for the WikipediaMM task at ImageCLEF 2008. The aim of this task is to investigate effective retrieval approaches in the context of a large-scale and heterogeneous collection of Wikipedia images that are searched by textual queries (and/or sample images and/or concepts) describing a user’s information need. We first experimented with a text-based image retrieval approach with query extension, where the expansion terms are automatically selected from a knowledge base that is (semi-)automatically constructed from Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to effectively enhance the semantics of queries. Encouragingly, the experimental results rank in the first place among all submitted runs. The second approach we experimented with is content-based image retrieval (CBIR), in which we first train 1-vs-all classifiers for all query concepts by using the training images obtained by Yahoo! search, and then treat the retrieval task as visual concept detection in the given Wikipedia image set. By comparison, this approach performs better than other submitted CBIR runs. Finally, we experimented with a cross-media image retrieval approach by combining and re-ranking text-based and content-based retrieval results. Although the final experimental results were not formally submitted before the deadline, this approach performs remarkably better than the text-based retrieval or CBIR approaches.
Large-scale cross-media retrieval of WikipediaMM images with textual and visual query expansion
- In Evaluating Systems for Multilingual and Multimodal Information Access, LNCS 5706, 763–770, 2009
Cited by 1 (1 self)
In this paper, we present our approaches for the WikipediaMM task at ImageCLEF 2008. We first experimented with a text-based image retrieval approach with query expansion, where the extension terms were automatically selected from a knowledge base that was semi-automatically constructed from Wikipedia. Encouragingly, the experimental results rank in the first place among all submitted runs. We also implemented a content-based image retrieval approach with query-dependent visual concept detection. Then cross-media retrieval was carried out by independently applying the two metasearch tools and then combining the results through a weighted summation of scores. Though not submitted, this approach remarkably outperforms our text-based and content-based approaches.
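The weighted summation of scores mentioned in the abstract above can be sketched as follows. This is only an illustration of the general fusion idea, not the authors' implementation: the function name `fuse_scores`, the default weight of 0.7, the image identifiers, and the assumption that both score sets are already on a comparable 0 to 1 scale are all hypothetical, since the abstract gives neither the weights nor any normalization step.

```python
def fuse_scores(text_scores, visual_scores, text_weight=0.7):
    """Combine two retrieval runs by a weighted sum of per-item scores.

    An item missing from one run contributes 0.0 for that run.
    Returns (item, fused_score) pairs sorted by fused score, best first.
    """
    visual_weight = 1.0 - text_weight
    items = set(text_scores) | set(visual_scores)
    fused = {
        item: text_weight * text_scores.get(item, 0.0)
        + visual_weight * visual_scores.get(item, 0.0)
        for item in items
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores from a text-based run and a content-based run.
text_run = {"img1": 0.9, "img2": 0.4}
visual_run = {"img2": 1.0, "img3": 0.5}

ranking = fuse_scores(text_run, visual_run, text_weight=0.5)
top_items = [item for item, _ in ranking]
```

With equal weights, an image retrieved by both runs ("img2") can overtake one that scores highly in a single run, which is the re-ranking effect that fusion of this kind aims for.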
Toward Selectivity-Based Keyword Extraction for Croatian News
Cited by 1 (1 self)
Our approach proposes a novel network measure, the node selectivity, for the task of keyword extraction. The node selectivity is defined as the average strength of the node. First, we show that selectivity-based keyword extraction slightly outperforms extraction based on the standard centrality measures: in-degree, out-degree, betweenness, and closeness. Furthermore, from a data set of Croatian news we extract keyword candidates and expand the extracted nodes to word-tuples ranked by the highest in/out selectivity values. The obtained sets are
a review of methods and approaches
This paper presents a survey of methods and approaches for the keyword extraction task. In addition to systematizing the methods, the paper gathers a comprehensive review of existing research. Related work on keyword extraction is elaborated for supervised and unsupervised methods, with special emphasis on graph-