Results 1 - 10 of 1,868
An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.
- Machine Learning,
, 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Cited by 707 (2 self)
Uncertainty estimates in regional and global observed temperature changes: A new dataset from 1850
- J. Geophys
, 2006
"... The historical surface temperature dataset HadCRUT provides a record of surface temperature trends and variability since 1850. A new version of this dataset, HadCRUT3, has been produced; benefiting from recent improvements to the sea-surface temperature dataset which forms its marine component, and ..."
Cited by 382 (12 self)
Unbiased look at dataset bias
- in CVPR
, 2011
"... Datasets are an integral part of contemporary object recognition research. They have been the chief reason for the considerable progress in the field, not just as source of large amounts of training data, but also as means of measuring and comparing performance of competing algorithms. At the same t ..."
Cited by 154 (10 self)
using a set of popular datasets, evaluated based on a number of criteria including: relative data bias, cross-dataset generalization, effects of closed-world assumption, and sample value. The experimental results, some rather surprising, suggest directions that can improve dataset collection as well
Undoing the damage of dataset bias
, 2012
"... The presence of bias in existing object recognition datasets is now well-known in the computer vision community. While it remains in question whether creating an unbiased dataset is possible given limited resources, in this work we propose a discriminative framework that directly exploits dataset b ..."
Cited by 35 (3 self)
Fair and Balanced? Bias in bug-fix Datasets
- in European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
, 2009
"... Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems, and code version histories, record when, how and by whom bugs were fixed ..."
Cited by 67 (10 self)
? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of “unfair, imbalanced” datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens
Selection bias in the LETOR datasets
"... The LETOR datasets consist of data extracted from traditional IR test corpora. For each of a number of test topics, a set of documents has been extracted, in the form of features of each document-query pair, for use by a ranker. An examination of the ways in which documents were selected for each to ..."
Cited by 12 (0 self)
topic shows that the selection has (for each of the three corpora) a particular bias or skewness. This has some unexpected effects which may considerably influence any learning-to-rank exercise conducted on these datasets. The problems may be resolvable by modifying the datasets.
Bias plus variance decomposition for zero-one loss functions
- In Machine Learning: Proceedings of the Thirteenth International Conference
, 1996
"... We present a bias-variance decomposition of expected misclassification rate, the most commonly used loss function in supervised classification learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no ..."
Cited by 212 (5 self)
that, in practice, the naive frequency-based estimation of the decomposition terms is by itself biased and show how to correct for this bias. We illustrate the decomposition on various algorithms and datasets from the UCI repository.
The TRMM Multi-satellite Precipitation Analysis: Quasi-Global, Multi-Year, Combined-Sensor Precipitation Estimates at Fine Scale
- J. Hydrometeor
, 2007
"... The Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) provides a calibration-based sequential scheme for combining precipitation estimates from multiple satellites, as well as gauge analyses where feasible, at fine scales (0.25° × 0.25° and 3 hourly). TMPA is ..."
Cited by 226 (13 self)
is available both after and in real time, based on calibration by the TRMM Combined Instrument and TRMM Microwave Imager precipitation products, respectively. Only the after-real-time product incorporates gauge data at the present. The dataset covers the latitude band 50°N–S for the period from 1998
Boltzmann machines
, 2007
"... A Boltzmann Machine is a network of symmetrically connected, neuronlike units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm that allows them to discover interesting features in datasets composed of binary vectors. The learning algor ..."
Cited by 228 (21 self)
A case study of bias in bug-fix datasets
- In Reverse Engineering (WCRE), 2010 17th Working Conference on
, 2010
"... Abstract—Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. However, all too often the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Res ..."
Cited by 16 (5 self)
to the code and incorrect tagging of issues, exhibit biases that compromise the validity and generality of the quality models built on top of the datasets. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality