SMOTE: Synthetic Minority Over-sampling Technique (2002)

by Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W. Philip Kegelmeyer
Venue: Journal of Artificial Intelligence Research
Citations: 634 (27 self)

BibTeX

@ARTICLE{Chawla02smote:synthetic,
    author = {Nitesh V. Chawla and Kevin W. Bowyer and Lawrence O. Hall and W. Philip Kegelmeyer},
    title = {SMOTE: Synthetic Minority Over-sampling Technique},
    journal = {Journal of Artificial Intelligence Research},
    year = {2002},
    volume = {16},
    pages = {321--357}
}

Abstract

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples with only a small percentage of "abnormal" or "interesting" examples, and the cost of misclassifying an abnormal (interesting) example as normal is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than under-sampling the majority class alone, and also better performance than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
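
The over-sampling idea is simple enough to sketch. Below is a minimal, self-contained Python (NumPy only) illustration of the interpolation step the abstract describes: each synthetic example lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function and parameter names here are illustrative, not the authors' reference implementation; the paper's algorithm works through each minority sample in turn and generates a fixed number of synthetic points per sample, whereas this sketch draws base samples at random.

import numpy as np

def smote(X_min, n_synthetic, k=5, seed=None):
    # Generate n_synthetic points by interpolating between minority
    # samples and their k nearest minority-class neighbours.
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise Euclidean distances among minority samples only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                    # pick a minority sample at random
        nn = neighbours[j, rng.integers(k)]    # and one of its k neighbours
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (X_min[nn] - X_min[j])
    return synthetic

# Example: generate 8 synthetic points (200% over-sampling) for a toy
# 2-D minority class of four samples.
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
print(smote(X_min, n_synthetic=8, k=3, seed=0))

Because the interpolation factor is drawn uniformly from [0, 1), the synthetic points fill in the regions between minority-class neighbours rather than duplicating existing examples, which is what distinguishes this approach from over-sampling with replacement.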

Keyphrases

minority class, synthetic minority over-sampling technique, roc space, majority class, reverse error, small percentage, real-world data set, class prior, synthetic minority class example, naive bayes, good means, naive bayes classifier, normal example, imbalanced datasets, roc convex hull strategy, classification category, receiver operating characteristic curve, loss ratio
