@MISC{Li_classificationin, author = {Yunlei Li}, title = {Classification in the Presence of Class Noise}, year = {} }
Abstract
In machine learning, class noise occurs frequently and degrades classifiers trained on the noisy dataset. This paper presents several possible solutions to this problem based on LSA, a probabilistic noise model proposed by Lawrence and Schölkopf (2001). These solutions include the Clustering-based Probabilistic Algorithm (CPA), the Probabilistic Fisher (PF), and the Probabilistic Kernel Fisher (PKF). The proposed algorithms enable standard classifiers to tolerate class noise, and extend the earlier work of Lawrence and Schölkopf in several ways. First, CPA applies LSA to non-Gaussian datasets. Second, PKF is a novel incorporation of LSA into the Kernel Fisher Discriminant (KFD), and it relaxes the distribution assumption made in the earlier work. The methods were evaluated on simulated noisy datasets and on a real comparative genomic hybridization (CGH) dataset. The results show that the proposed approaches improve on standard classifiers when the data are noisy. PKF achieves the largest performance gain on small non-Gaussian datasets. On datasets with a large sample size, CPA tolerates the highest noise level when the data are Gaussian, and is an attractive alternative when the data are non-Gaussian because it is computationally much simpler.
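To make the notion of class noise concrete, the following is a minimal Python sketch of one way a noisy dataset could be simulated: each binary label is flipped independently with a fixed probability. The abstract does not describe the paper's actual simulation procedure, so the symmetric flipping scheme, the function name `flip_labels`, and the parameters `noise_rate` and `seed` are all assumptions made for illustration.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels (0/1) with simulated class noise.

    Assumption: each label is flipped independently with probability
    `noise_rate` (symmetric noise). This is an illustrative scheme, not
    necessarily the one used in the paper.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # local generator, so results are repeatable
    return [1 - y if rng.random() < noise_rate else y for y in labels]


# Hypothetical two-class dataset: 500 labels per class.
clean = [0] * 500 + [1] * 500
noisy = flip_labels(clean, noise_rate=0.2)

# Fraction of labels that were actually flipped; close to noise_rate.
observed_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
```

A classifier trained on `noisy` instead of `clean` sees a fraction of wrongly labelled examples, which is the setting the noise-tolerant methods described above are meant to handle.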