An Evaluation of Text Classification Methods for Literary Study
2006
Abstract
Literary text classification differs from current text classification in other domains in the following aspects: 1) data: literary texts exhibit more varied language use because of their long history and creative characteristics; 2) category labels: literary scholars assign more kinds of text category labels, by topic, style, genre, author, era, and many other literary concepts that mix these factors; 3) purposes: literary scholars use classification as an example-based search strategy as well as for feature-category correlation analysis. Current text classification algorithms are evaluated on topic-spotting tasks using “young” benchmark corpora in the domains of news articles, scientific literature, and web pages. It is worth questioning whether these evaluation results will be consistent in the literary domain. Two major factors affect an algorithm’s performance in a classification task: 1) the algorithm’s inference model (including parameter tuning); 2) the data preprocessing choices. In other words, a successful application relies on the right model and the right features. This thesis will evaluate the performance of a few popular text classification algorithms on literary text classification tasks under different data preprocessing choices. This study focuses on two data preprocessing choices, stop word removal and stemming. Two classification algorithms, multinomial
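
The two preprocessing choices named in the abstract, stop word removal and stemming, can be pictured as independent switches applied before feature extraction. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the stop word list, the suffix list, and the function names `naive_stem` and `preprocess` are all hypothetical, and the stemmer is a crude suffix stripper standing in for a real algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop word list; real studies use longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}

# Crude suffix stripping for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=True, stem=True):
    """Lowercase and tokenize text, then apply the two optional steps."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    sentence = "The sailors walked to the ships"
    # Enumerate the four combinations of the two preprocessing switches.
    for remove in (False, True):
        for do_stem in (False, True):
            print(remove, do_stem, preprocess(sentence, remove, do_stem))
```

Toggling the two flags yields four feature sets from the same text, which is one way a comparison across preprocessing choices could be set up; the classifier itself is left out of this sketch.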

