Results 1 - 10 of 1,868

An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.

by Eric Bauer, Ron Kohavi - Machine Learning, 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Cited by 707 (2 self)

Uncertainty estimates in regional and global observed temperature changes: A new dataset from 1850

by P. Brohan, J. J. Kennedy, I. Harris, S. F. B. Tett, P. D. Jones - J. Geophys, 2006
"... The historical surface temperature dataset HadCRUT provides a record of surface temperature trends and variability since 1850. A new version of this dataset, HadCRUT3, has been produced; benefiting from recent improvements to the sea-surface temperature dataset which forms its marine component, and ..."
Cited by 382 (12 self)

Unbiased look at dataset bias

by Antonio Torralba, Alexei A. Efros - in CVPR, 2011
"... Datasets are an integral part of contemporary object recognition research. They have been the chief reason for the considerable progress in the field, not just as source of large amounts of training data, but also as means of measuring and comparing performance of competing algorithms. At the same t ..."
Cited by 154 (10 self)
using a set of popular datasets, evaluated based on a number of criteria including: relative data bias, cross-dataset generalization, effects of closed-world assumption, and sample value. The experimental results, some rather surprising, suggest directions that can improve dataset collection as well

Undoing the damage of dataset bias

by Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei Efros, Antonio Torralba, 2012
"... The presence of bias in existing object recognition datasets is now well-known in the computer vision community. While it remains in question whether creating an unbiased dataset is possible given limited resources, in this work we propose a discriminative framework that directly exploits dataset b ..."
Cited by 35 (3 self)

Fair and Balanced? Bias in bug-fix Datasets

by Christian Bird, Adrian Bachmann, Eirik Aune, John Duffy, Abraham Bernstein, Vladimir Filkov, Premkumar Devanbu - in European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2009
"... Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems, and code version histories, record when, how and by whom bugs were fixed ..."
Cited by 67 (10 self)
? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of “unfair, imbalanced” datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens

Selection bias in the LETOR datasets

by Tom Minka, Stephen Robertson
"... The LETOR datasets consist of data extracted from traditional IR test corpora. For each of a number of test topics, a set of documents has been extracted, in the form of features of each document-query pair, for use by a ranker. An examination of the ways in which documents were selected for each to ..."
Cited by 12 (0 self)
topic shows that the selection has (for each of the three corpora) a particular bias or skewness. This has some unexpected effects which may considerably influence any learning-to-rank exercise conducted on these datasets. The problems may be resolvable by modifying the datasets.

Bias plus variance decomposition for zero-one loss functions

by Ron Kohavi - In Machine Learning: Proceedings of the Thirteenth International Conference, 1996
"... We present a bias-variance decomposition of expected misclassification rate, the most commonly used loss function in supervised classification learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no ..."
Cited by 212 (5 self)
that, in practice, the naive frequency-based estimation of the decomposition terms is by itself biased and show how to correct for this bias. We illustrate the decomposition on various algorithms and datasets from the UCI repository.

The TRMM Multi-satellite Precipitation Analysis: Quasi-Global, Multi-Year, Combined-Sensor Precipitation Estimates at Fine Scale

by George J. Huffman, Robert F. Adler, David T. Bolvin, Guojun Gu, Eric J. Nelkin, Kenneth P. Bowman, Yang Hong, Erich F. Stocker, David B. Wolff - J. Hydrometeor, 2007
"... The Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) provides a calibration-based sequential scheme for combining precipitation estimates from multiple satellites, as well as gauge analyses where feasible, at fine scales (0.25° × 0.25° and 3 hourly). TMPA is ..."
Cited by 226 (13 self)
is available both after and in real time, based on calibration by the TRMM Combined Instrument and TRMM Microwave Imager precipitation products, respectively. Only the after-real-time product incorporates gauge data at the present. The dataset covers the latitude band 50°N–S for the period from 1998

Boltzmann machines

by Geoffrey E. Hinton, 2007
"... A Boltzmann Machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algor ..."
Cited by 228 (21 self)

A case study of bias in bug-fix datasets

by Thanh H. D. Nguyen, Bram Adams, Ahmed E. Hassan - In Reverse Engineering (WCRE), 2010 17th Working Conference on, 2010
"... Abstract—Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. However, all too often the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Res ..."
Cited by 16 (5 self)
to the code and incorrect tagging of issues, exhibit biases that compromise the validity and generality of the quality models built on top of the datasets. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality
