Domain-Specific Keyphrase Extraction (1999) [2 citations — 1 self]
Abstract:
Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive Bayes learning scheme performs comparably to the state of the art. It goes on to explain how this procedure's performance can be boosted by automatically tailoring the extraction process to the particular document collection at hand. Results on a large collection of technical reports in computer science show that the quality of the extracted keyphrases improves significantly when domain-specific information is exploited.
Citations
| 3307 | C4.5: Programs for machine learning – Quinlan - 1993 |
| 1504 | Bagging Predictors – Breiman - 1996 |
| 430 | Multi-interval discretization of continuous-valued attributes for classification learning – Fayyad, Irani - 1993 |
| 336 | Inductive Learning Algorithms and Representations for Text Categorization – Dumais, Platt, et al. - 1998 |
| 29 | Extraction of Keyphrases from Text – Turney - 1999 |
| 13 | On the optimality of the simple Bayesian classifier under zero-one loss – Domingos, Pazzani - 1997 |
| 10 | Development of a stemming algorithm. Mechanical Translation and Computational Linguistics – Lovins - 1968 |

