Results 1 - 10
of
165
Instance-based learning algorithms
- Machine Learning
, 1991
"... Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to ..."
Abstract
-
Cited by 897 (18 self)
- Add to MetaCart
Abstract. Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several realworld databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
Comparison of discrimination methods for the classification of tumors using gene expression data
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2002
"... A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousand ..."
Abstract
-
Cited by 348 (2 self)
- Add to MetaCart
A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered. The discrimination methods are applied to datasets from three recently published cancer gene expression studies.
Flexible Metric Nearest Neighbor Classification
, 1994
"... The K-nearest-neighbor decision rule assigns an object of unknown class to the plurality class among the K labeled "training" objects that are closest to it. Closeness is usually defined in terms of a metric distance on the Euclidean space with the input measurement variables as axes. The metric cho ..."
Abstract
-
Cited by 115 (2 self)
- Add to MetaCart
The K-nearest-neighbor decision rule assigns an object of unknown class to the plurality class among the K labeled "training" objects that are closest to it. Closeness is usually defined in terms of a metric distance on the Euclidean space with the input measurement variables as axes. The metric chosen to define this distance can strongly effect performance. An optimal choice depends on the problem at hand as characterized by the respective class distributions on the input measurement space, and within a given problem, on the location of the unknown object in that space. In this paper new types of K-nearest-neighbor procedures are described that estimate the local relevance of each input variable, or their linear combinations, for each individual point to be classified. This information is then used to separately customize the metric used to define distance from that object in finding its nearest neighbors. These procedures are a hybrid between regular K-nearest-neighbor methods and tree-structured recursive partitioning techniques popular in statistics and machine learning.
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
, 2003
"... Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic ..."
Abstract
-
Cited by 84 (4 self)
- Add to MetaCart
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the e#ciency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
Relational Instance-Based Learning
- Proceedings of the Thirteenth International Conference on Machine Learning
, 1996
"... A relational instance-based learning algorithm, called Ribl, is motivated and developed in this paper. We argue that instancebased methods o#er solutions to the often unsatisfactory behavior of current inductive logic programming #ILP# approaches in domains with continuous attribute values a ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
A relational instance-based learning algorithm, called Ribl, is motivated and developed in this paper. We argue that instancebased methods o#er solutions to the often unsatisfactory behavior of current inductive logic programming #ILP# approaches in domains with continuous attribute values and in domains with noisy attributes and#or examples. Three research issues that emerge when a propositional instance-based learner is adapted to a #rst-order representation are identi#ed: #1# construction of cases from the knowledge base, #2# computation of similaritybetween arbitrarily complex cases, and #3# estimation of the relevance of predicates and attributes. Solutions to these issues are developed. Empirical results indicate that Ribl is able to achieve high classi#cation accuracy in a variety of domains. to appear in: Proc. 13th International Conference on Machine Learning, L. Saitta #ed.#, Morgan Kaufmann, 1996 1 Introduction The #eld of Inductive Logic Programming ...
An Empirical Investigation of Brute Force to choose Features, Smoothers and Function Approximators
- Computational Learning Theory and Natural Learning Systems
, 1992
"... The generalization error of a function approximator, feature set or smoother can be estimated directly by the leave-one-out cross-validation error. For memory-based methods, this is computationally feasible. We describe an initial version of a general memory-based learning system (GMBL): a large col ..."
Abstract
-
Cited by 42 (9 self)
- Add to MetaCart
The generalization error of a function approximator, feature set or smoother can be estimated directly by the leave-one-out cross-validation error. For memory-based methods, this is computationally feasible. We describe an initial version of a general memory-based learning system (GMBL): a large collection of learners brought into a widely applicable machine-learning family. We present ongoing investigations into search algorithms which, given a dataset, find the family members and features that generalize best. We also describe GMBL's application to two noisy, difficult problems---predicting car engine emissions from pressure waves, and controlling a robot billiards player with redundant state variables. 1 Introduction The main engineering benefit of machine learning is its application to autonomous systems in which human decision making is minimized. Function approximation plays a large and successful role in this process. However, many other human decisions are needed even for si...
Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets
"... Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, many combinin ..."
Abstract
-
Cited by 39 (0 self)
- Add to MetaCart
Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, many combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers each using only a random subset of features. The experimental results are encouraging: On 25 datasets from the UCI Repository, MFS significantly improved upon the NN, k nearest neighbor (kNN), and NN classifiers with forward and backward selection of features. MFS was also robust to corruption by irrelevant features compared to the kNN classifier. Finally, we show that MFS is able to reduce both bias and variance components of error.
Distribution-free consistency results in nonparametric discrimination and regression function estimation
- Ann. Statist
, 1980
"... (X,,, Y„) be a random sample drawn from its distribution. We study the consistency properties of the kernel estimate m(x) of the regression function m(x) = E { Y X = x} that is defined by m(x) = ~ i-1 Y,k((X,- x)/h)/7. n~1k((Xi- x)/h,?) where k is a bounded nonnegative function on Rd with compact ..."
Abstract
-
Cited by 37 (7 self)
- Add to MetaCart
(X,,, Y„) be a random sample drawn from its distribution. We study the consistency properties of the kernel estimate m(x) of the regression function m(x) = E { Y X = x} that is defined by m(x) = ~ i-1 Y,k((X,- x)/h)/7. n~1k((Xi- x)/h,?) where k is a bounded nonnegative function on Rd with compact support and (h,? ) is a sequence of positive numbers satisfying h „--~,,0, nh,'-n oo. It is shown that E { f I m„ (x)- m(x)rµ(dx))--~,,0 whenever E(I YAP) < x (p> 1). No other restrictions are placed on the distribution of (X, Y). The result is applied to verify the Bayes risk consistency of the corresponding discrimination rules. 1. Introduction and summary. In this paper we present consistency results for the nonparametric regression function estimation problem. Assume that (X, Y), (X1, Y1),. • • , (Xn, Yn) are independent identically distributed Rd x R-valued random vectors with E { I Y I} C oo. The purpose is to estimate the regression function m(x) = E{YIX = x}
The Racing Algorithm: Model Selection for Lazy Learners
- Artificial Intelligence Review
, 1997
"... Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimizat ..."
Abstract
-
Cited by 35 (2 self)
- Add to MetaCart
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful but can end up with a model that is arbitrarily worse than the best one or cannot be used because there is no distance metric on the space of discrete models. In this paper we develop a technique called "racing" that tests the set of models in parallel, quickly discards those models that are clearly inferior and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners since training requires negligible expense, and incremental testing using leave-one-out cross validation is efficient. We use racing to select among various lazy learnin...
Feature Weighting for Lazy Learning Algorithms
, 1998
"... : Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide ..."
Abstract
-
Cited by 32 (1 self)
- Add to MetaCart
: Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide decision making. The performance of many lazy learners significantly degrades when samples are defined by features containing little or misleading information. Distinguishing feature relevance is a critical issue for these algorithms, and many solutions have been developed that assign weights to features. This chapter introduces a categorization framework for feature weighting approaches used in lazy similarity learners and briefly surveys some examples in each category. 1.1 INTRODUCTION Lazy learning algorithms are machine learning algorithms (Mitchell, 1997) that are welcome members of procrastinators anonymous. Purely lazy learners typically display the following characteristics (Aha, 19...

