Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data (2002)
| Venue: | ERB-1096 NRC #44947, National Research Council, Institute for Information Technology |
| Citations: | 7 - 0 self |
BibTeX
@TECHREPORT{Turney02miningthe,
author = {Peter D. Turney},
title = {Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data},
institution = {National Research Council, Institute for Information Technology},
number = {ERB-1096, NRC \#44947},
year = {2002}
}
Abstract
A journal article is often accompanied by a list of keyphrases, composed of about five to fifteen important words and phrases that capture the article’s main topics. Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. To classify a candidate phrase as a keyphrase, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know the frequency of the candidate phrase as a manually assigned keyphrase for other documents in the same domain as the given document (e.g., the domain of computer science). Unfortunately, this keyphrase-frequency
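
The two features the abstract singles out, the frequency of a candidate phrase and its location in the document, can be illustrated with a minimal Python sketch. This is not the report's implementation: the function name `candidate_features`, the stopword list, the n-gram candidate generation, and the `max_len` parameter are all assumptions made for illustration, and the supervised classifier that would consume these features is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for",
             "that", "as", "are", "on", "it", "by", "be", "with"}

def candidate_features(text, max_len=3):
    """Map each candidate phrase to (frequency, relative first position).

    Candidates are word n-grams of up to max_len words that neither start
    nor end with a stopword. Punctuation is discarded, so n-grams may span
    sentence boundaries; a real system would likely handle that.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    first_pos = {}
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            # Location feature: position of the first occurrence,
            # normalized by document length (0.0 = very first token).
            first_pos.setdefault(phrase, i / len(tokens))
    return {phrase: (counts[phrase], first_pos[phrase]) for phrase in counts}
```

Calling `candidate_features(document_text)` yields one feature pair per candidate. In a supervised setup of the kind the abstract describes, each pair would be labeled keyphrase or non-keyphrase using the manually assigned keyphrases of the training documents.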







