Results 1 - 10 of 4,053
Large Landscape Conservation — Synthetic and Real-World Datasets
"... Biodiversity underpins ecosystem goods and services and hence protecting it is key to achieving sustainability. How-ever, the persistence of many species is threatened by habitat loss and fragmentation due to human land use and climate change. Conservation efforts are implemented under very limited ..."
Abstract
- Add to MetaCart
Biodiversity underpins ecosystem goods and services and hence protecting it is key to achieving sustainability. How-ever, the persistence of many species is threatened by habitat loss and fragmentation due to human land use and climate change. Conservation efforts are implemented under very limited economic resources, and therefore designing scal-able, cost-efficient and systematic approaches for conserva-tion planning is an important and challenging computational task. In particular, preserving landscape connectivity be-tween good habitat has become a key conservation priority in recent years. We give an overview of landscape connectiv-ity conservation and some of the underlying graph-theoretic optimization problems. We present a synthetic generator ca-pable of creating families of randomized structured problems,
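The abstract only names the underlying graph-theoretic optimization problems; as a rough, hedged illustration of that framing, the sketch below (made-up patch names and corridor costs, using networkx) models habitat patches as nodes and candidate corridors as cost-weighted edges, so a cheapest corridor between two reserves becomes a shortest-path query.

```python
# Hypothetical sketch: habitat patches as graph nodes, candidate corridors as
# cost-weighted edges. All node names and costs below are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_edge("reserve_A", "patch_1", weight=4.0)
G.add_edge("patch_1", "patch_2", weight=2.5)
G.add_edge("patch_2", "reserve_B", weight=3.0)
G.add_edge("reserve_A", "patch_3", weight=9.0)
G.add_edge("patch_3", "reserve_B", weight=1.0)

if nx.has_path(G, "reserve_A", "reserve_B"):
    # A cheapest corridor between the two reserves is a minimum-weight path.
    corridor = nx.shortest_path(G, "reserve_A", "reserve_B", weight="weight")
    cost = sum(G[u][v]["weight"] for u, v in zip(corridor, corridor[1:]))
    print(f"cheapest corridor: {corridor} at cost {cost}")
```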
Working with Real-World Datasets: Preprocessing and prediction with large incomplete and heterogeneous datasets, 2004
Learning to Detect Traffic Signs: Comparative Evaluation of Synthetic and Real-world Datasets
"... This study compares the performance of sign detection based on synthetic training data to the performance of detection based on real-world training images. Viola-Jones detectors are created for 4 different traffic signs with both synthetic and real data, and varying numbers of training samples. The ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
This study compares the performance of sign detection based on synthetic training data to the performance of detection based on real-world training images. Viola-Jones detectors are created for 4 different traffic signs with both synthetic and real data, and varying numbers of training samples
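The abstract names Viola-Jones cascade detectors but gives no code; the following is a minimal sketch, assuming OpenCV and an already-trained cascade file (the cascade file name and image path are placeholders, not artifacts of the paper).

```python
# Minimal sketch of running a (separately trained) Viola-Jones cascade with OpenCV.
import cv2

cascade = cv2.CascadeClassifier("stop_sign_cascade.xml")  # hypothetical trained cascade
image = cv2.imread("street_scene.jpg")                    # placeholder test image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans the image at several scales and returns bounding boxes.
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24))
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```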
Reducing noise in labels and features for a real world dataset: application of NLP corpus annotation methods
- In Proceedings of the 10th international ..., 2009
Cited by 4 (4 self)
This paper illustrates how a combination of information extraction, machine learning, and NLP corpus annotation practice was applied to a problem of ranking vulnerability of structures (service boxes, manholes) in the Manhattan electrical grid. By adapting NLP corpus annotation methods to the task of knowledge transfer from domain experts, we compensated for the lack of operational definitions of components of the model, such as "serious event". The machine learning depended on the ticket classes, but it was not the end goal. Rather, our rule-based document classification determines both the labels of examples and their feature representations. Changes in our classification of events led to improvements in our model, as reflected in the AUC scores for the full ranked list of over 51K structures. The improvements for the very top of the ranked list, which is of most importance for prioritizing work on the electrical grid, affected one in every four or five structures.
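As a hedged illustration of the evaluation described (AUC over a full ranked list plus attention to its top), the sketch below uses synthetic labels and scores with scikit-learn's roc_auc_score; none of the numbers correspond to the paper's data.

```python
# Synthetic stand-in for ranking evaluation: AUC over the full list, plus the
# hit rate at the head of the ranking, which matters most for prioritization.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)          # 1 = structure later had a serious event (toy)
scores = labels * 0.5 + rng.normal(size=1000)   # toy model scores, correlated with labels

auc = roc_auc_score(labels, scores)

top_k = 100
top_idx = np.argsort(-scores)[:top_k]           # highest-scoring structures first
hit_rate = labels[top_idx].mean()               # fraction of true positives at the top

print(f"AUC over full ranked list: {auc:.3f}")
print(f"hit rate in top {top_k}: {hit_rate:.2f}")
```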
Very simple classification rules perform well on most commonly used datasets
- Machine Learning, 1993
Cited by 547 (5 self)
The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one)" and so are ...
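The "very simple rules" in question are one-attribute rules; the sketch below is a simplified, hypothetical 1R-style learner (categorical attributes only, toy data, no train/test split), not the paper's exact procedure.

```python
# Simplified 1R-style learner: each candidate attribute predicts the majority
# class for each of its values; keep the attribute whose rule errs least.
from collections import Counter, defaultdict

def one_r(rows, target):
    best_attr, best_rule, best_errors = None, None, None
    for attr in (a for a in rows[0] if a != target):
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Rule: each attribute value predicts its majority class.
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in by_value.values())
        if best_errors is None or errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors
    return best_attr, best_rule, best_errors

# Toy, made-up data for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
attr, rule, errors = one_r(rows, target="play")
print(attr, rule, errors)
```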
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
- International Joint Conference on Artificial Intelligence, 1995
Cited by 1283 (11 self)
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation ...
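A minimal sketch of the kind of comparison described, assuming scikit-learn, with DecisionTreeClassifier and GaussianNB standing in for C4.5 and the Naive-Bayes learner, and a bundled dataset standing in for the paper's real-world datasets:

```python
# Ten-fold stratified cross-validation as an accuracy estimator for two learners.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("naive Bayes", GaussianNB())]:
    scores = cross_val_score(clf, X, y, cv=cv)   # one accuracy estimate per fold
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```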
An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.
- Machine Learning, 1999
Cited by 707 (2 self)
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants ...
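As a hedged illustration of the voting methods named above (not the paper's exact setup), the sketch below runs scikit-learn's BaggingClassifier and AdaBoostClassifier around decision-tree base learners and scores them with ten-fold cross-validation:

```python
# Bagging and AdaBoost ensembles over decision trees, scored by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

ensembles = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                 n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                   n_estimators=50, random_state=0),
}
for name, model in ensembles.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```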
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
- SIGKDD '02, 2002
Cited by 325 (59 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim: much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed ...
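To make the variance argument concrete, a small hypothetical sketch: the same 1-NN Euclidean classifier is scored on several synthetic time-series "datasets", and the spread of accuracy across datasets is reported; the generated data is only a stand-in for a real multi-dataset benchmark.

```python
# Accuracy of one fixed classifier across several synthetic time-series datasets,
# to show how large cross-dataset variance can be relative to small "improvements".
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_dataset(noise, n=100, length=64):
    """Two classes of noisy sine waves that differ in frequency."""
    t = np.linspace(0, 2 * np.pi, length)
    y = rng.integers(0, 2, size=n)
    X = np.sin((y[:, None] + 1) * t[None, :]) + noise * rng.normal(size=(n, length))
    return X, y

accuracies = []
for noise in [0.2, 0.5, 1.0, 1.5, 2.0]:   # each noise level acts as a different "dataset"
    X, y = make_dataset(noise)
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5).mean()
    accuracies.append(acc)

print("per-dataset accuracy:", np.round(accuracies, 3))
print("spread across datasets (std):", np.round(np.std(accuracies), 3))
```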
Learning probabilistic relational models
- In IJCAI, 1999
Cited by 613 (30 self)
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much ...
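As a brief, assumed illustration of the "flat form" conversion the abstract refers to (not the paper's probabilistic relational models themselves), the sketch below joins and aggregates two made-up relational tables into a single per-entity table with pandas:

```python
# Flattening a one-to-many relational schema into a single propositional table;
# the individual related rows (and any structure between them) are lost.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "segment": ["retail", "retail", "business"]})
orders = pd.DataFrame({"order_id": [10, 11, 12, 13],
                       "customer_id": [1, 1, 2, 3],
                       "amount": [20.0, 35.0, 15.0, 200.0]})

# Aggregate the orders relation down to per-customer summary columns.
order_summary = (orders.groupby("customer_id")["amount"]
                 .agg(n_orders="count", total="sum")
                 .reset_index())
flat = customers.merge(order_summary, on="customer_id", how="left")
print(flat)
```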