Flexible metric nearest-neighbor classification (1994)

by J Friedman
Results 1 - 10 of 133

Locally weighted learning

by Christopher G. Atkeson, Andrew W. Moore , Stefan Schaal - ARTIFICIAL INTELLIGENCE REVIEW , 1997
"... This paper surveys locally weighted learning, a form of lazy learning and memorybased learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, ass ..."
Abstract - Cited by 599 (51 self) - Add to MetaCart
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
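
Since several of the citing papers below build on this machinery, a minimal sketch of locally weighted linear regression may help fix the idea. The Gaussian weighting function, the bandwidth value, and the function names are illustrative choices, not taken from the survey.

```python
import numpy as np

def locally_weighted_predict(X, y, query, bandwidth=1.0):
    """Fit a linear model whose training points are weighted by their
    distance to the query, then evaluate it at the query."""
    # Distance function: plain Euclidean distance to the query.
    d = np.linalg.norm(X - query, axis=1)
    # Weighting function: Gaussian kernel; `bandwidth` is the smoothing parameter.
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    # Weighted least squares with an intercept term.
    Xb = np.hstack([np.ones((len(X), 1)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return np.concatenate(([1.0], query)) @ beta

# Usage: noisy samples of a nonlinear function; the local fit tracks it.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(locally_weighted_predict(X, y, np.array([0.5]), bandwidth=0.5))
```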

SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition

by Bartlett W. Mel , 1997
"... this article. ..."
Abstract - Cited by 192 (1 self) - Add to MetaCart
this article.
(Show Context)

Citation Context

...ixel pairs were considered, and used to generate counts in the 6 histogram bins. 2.3 The Learning Rule While nearest-neighbor classification techniques are remarkably powerful given their simplicity [Friedman, 1994], the problem of scaling feature dimensions in order to minimize classification error in high-dimension remains an experimental art. The need for such optimization is particularly acute in cases, inc...

A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms

by Dietrich Wettschereck , David W. Aha, Takao Mohri - ARTIFICIAL INTELLIGENCE REVIEW , 1997
"... Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k ..."
Abstract - Cited by 147 (0 self) - Add to MetaCart
Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
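
The parameterized distance these variants share can be written down in a few lines. The sketch below assumes a weighted Euclidean distance and majority voting; the weight values in the usage example are placeholders that a weight-setting method would learn.

```python
import numpy as np
from collections import Counter

def weighted_knn_predict(X, y, query, weights, k=3):
    """k-NN with a feature-weighted Euclidean distance. A weight of zero
    drops a feature entirely (feature selection); intermediate values
    give continuous feature weighting."""
    d = np.sqrt((weights * (X - query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

# Usage: the second feature is noise, so it is given a small weight.
X = np.array([[0.1, 5.0], [0.2, -3.0], [0.9, 4.0], [1.1, -2.0]])
y = np.array(["a", "a", "b", "b"])
print(weighted_knn_predict(X, y, np.array([0.15, 4.0]),
                           weights=np.array([1.0, 0.01])))
```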

Citation Context

..., and fix these other design decisions in our experiments (Section 4). Further information related to lazy classification algorithms can be found elsewhere (e.g., Vapnik 1992; Bottou and Vapnik 1992; Friedman 1994; Atkeson et al. 1996a). ...

Distance Metric Learning: A Comprehensive Survey

by Liu Yang , 2006
"... ..."
Abstract - Cited by 127 (13 self) - Add to MetaCart
Abstract not found

Semi-supervised Clustering with User Feedback

by David Cohn, Rich Caruana, Andrew Mccallum , 2003
"... We present a new approach to clustering based on the observation that \it is easier to criticize than to construct." Our approach of semi-supervised clustering allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints w ..."
Abstract - Cited by 125 (2 self) - Add to MetaCart
We present a new approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach of semi-supervised clustering allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints which the clustering algorithm attempts to satisfy on future iterations. These constraints allow the user to guide the clusterer towards clusterings of the data that the user finds more useful. We demonstrate semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set. Introduction Consider the following problem: you are given 100,000 text documents (e.g., papers, newsgroup articles, or web pages) and asked to group them into classes or into a hierarchy such that related documents are grouped together. You are not told what classes or hierarchy to use or what documents are related; you have some criteria in mind, but may not be able to say exactly w...
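
As a toy illustration of how pairwise cannot-link feedback can constrain cluster assignments (this shows the general idea of constraint-satisfying clustering, not the authors' specific method), an assignment step might look like the following; all names here are hypothetical.

```python
import numpy as np

def constrained_assign(X, centroids, cannot_link):
    """One assignment pass of a constraint-respecting clustering sketch:
    each point joins its nearest centroid unless doing so would place it
    with a point the user said it must not share a cluster with.
    `cannot_link` maps a point index to the indices it must avoid."""
    labels = -np.ones(len(X), dtype=int)
    for i, x in enumerate(X):
        order = np.argsort(np.linalg.norm(centroids - x, axis=1))
        for c in order:
            if not any(labels[j] == c for j in cannot_link.get(i, [])):
                labels[i] = c
                break
    return labels

# Usage: points 0 and 1 are nearly identical, but the user separated them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(constrained_assign(X, centroids, {1: [0]}))   # -> [0 1 1]
```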

Lazy Decision Trees

by Jerome H. Friedman, Ron Kohavi, Yeogirl Yun , 1996
"... Lazy learning algorithms, exemplified by nearestneighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" ..."
Abstract - Cited by 117 (6 self) - Add to MetaCart
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm---LazyDT---that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented. Introduction Delay is preferable to error. -...
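
A much-simplified sketch of the lazy idea follows, assuming categorical features and plain information gain in place of the paper's actual split criterion, and omitting its caching and missing-value handling.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def lazy_tree_predict(X, y, query):
    """Grow only the decision-tree path the query would follow:
    repeatedly pick the most informative feature, keep the training
    rows that match the query's value on it, and stop when pure."""
    remaining = list(range(X.shape[1]))
    while remaining and entropy(y) > 0:
        gains = []
        for f in remaining:
            vals, counts = np.unique(X[:, f], return_counts=True)
            cond = sum(c / len(y) * entropy(y[X[:, f] == v])
                       for v, c in zip(vals, counts))
            gains.append(entropy(y) - cond)
        best = remaining[int(np.argmax(gains))]
        mask = X[:, best] == query[best]
        if not mask.any():              # query's value never seen: stop
            break
        X, y = X[mask], y[mask]
        remaining.remove(best)
    vals, counts = np.unique(y, return_counts=True)
    return vals[np.argmax(counts)]      # majority class along the path

# Usage: a tiny categorical data set.
X = np.array([["sunny", "hot"], ["sunny", "cool"],
              ["rain", "hot"], ["rain", "cool"]])
y = np.array(["no", "yes", "yes", "yes"])
print(lazy_tree_predict(X, y, np.array(["sunny", "hot"])))   # -> no
```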

Discovering Structure in Multiple Learning Tasks: The TC Algorithm

by Sebastian Thrun, Joseph O'Sullivan - In International Conference on Machine Learning , 1996
"... Recently, there has been an increased interest in "lifelong " machine learning methods, that transfer knowledge across multiple learning tasks. Such methods have repeatedly been found to outperform conventional, single-task learning algorithms when the learning tasks are appropriately rela ..."
Abstract - Cited by 111 (3 self) - Add to MetaCart
Recently, there has been an increased interest in "lifelong" machine learning methods that transfer knowledge across multiple learning tasks. Such methods have repeatedly been found to outperform conventional, single-task learning algorithms when the learning tasks are appropriately related. To increase robustness of such approaches, methods are desirable that can reason about the relatedness of individual learning tasks, in order to avoid the danger arising from tasks that are unrelated and thus potentially misleading. This paper describes the task-clustering (TC) algorithm. TC clusters learning tasks into classes of mutually related tasks. When facing a new learning task, TC first determines the most related task cluster, then exploits information selectively from this task cluster only. An empirical study carried out in a mobile robot domain shows that TC outperforms its non-selective counterpart in situations where only a small number of tasks is relevant. 1 INTRODUCTION One of t...

Citation Context

...erties of nearest neighbor. 2.2 ADJUSTING THE DISTANCE METRIC TC transfers knowledge across learning tasks by adjusting the distance metric for some tasks, then re-using it in others. Following ideas presented elsewhere [3, 4, 10, 11, 15, 16, 28], this is done by minimizing the distance between training examples that belong to the same class, while maximizing the distance between training examples with opposite class labels. ...

Adaptive Metric Nearest Neighbor Classification

by Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos - IEEE Transactions on Pattern Analysis and Machine Intelligence , 2000
"... Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a ..."
Abstract - Cited by 104 (4 self) - Add to MetaCart
Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations. Neighborhoods are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of simulated and real world data. 1 Introduction In a classification problem, we are given J classes and N tra...
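
A two-pass sketch conveys the flavor of such locally adaptive metrics; the local relevance score below is a simple Fisher-style ratio standing in for the paper's Chi-squared analysis, and the neighborhood sizes are arbitrary.

```python
import numpy as np
from collections import Counter

def adaptive_metric_knn(X, y, query, k=5, k_local=50):
    """Estimate per-feature relevance from a plain Euclidean neighborhood
    around the query, then classify with a distance weighted by that
    relevance, so the neighborhood is constricted along influential
    features and elongated along irrelevant ones."""
    # Pass 1: local relevance from the k_local nearest points.
    local = np.argsort(np.linalg.norm(X - query, axis=1))[:k_local]
    Xl, yl = X[local], y[local]
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for c in np.unique(yl):
        Xc = Xl[yl == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        between += len(Xc) * (Xl.mean(axis=0) - Xc.mean(axis=0)) ** 2
    w = between / (within + 1e-12)
    # Pass 2: feature-weighted distance with the estimated relevance.
    d = np.sqrt((w * (X - query) ** 2).sum(axis=1))
    return Counter(y[np.argsort(d)[:k]]).most_common(1)[0][0]
```

The first pass uses the ordinary Euclidean metric only to estimate relevance; the second pass does the actual classification with the adapted distance.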

Citation Context

...st neighbor rule in a high dimensional input feature space with finite samples. As such, the choice of a distance measure becomes crucial in determining the outcome of nearest neighbor classification [6, 7, 9]. The commonly used Euclidean distance measure, while simple computationally, implies that the input space is isotropic. However, the assumption for isotropy is often invalid and generally undesirable...

Probabilistic Feature Relevance Learning for Content-Based Image Retrieval

by Jing Peng, Bir Bhanu, Shan Qing - Computer Vision and Image Understanding , 1999
"... Most of the current image retrieval systems use "one-shot" queries to a database to retrieve similar images. Typically a K-nearest neighbor kind of algorithm is used, where weights measuring feature importance along each input dimension remain fixed (or manually tweaked by the user), in th ..."
Abstract - Cited by 68 (18 self) - Add to MetaCart
Most of the current image retrieval systems use "one-shot" queries to a database to retrieve similar images. Typically a K-nearest neighbor kind of algorithm is used, where weights measuring feature importance along each input dimension remain fixed (or manually tweaked by the user), in the computation of a given similarity metric. However, the similarity does not vary with equal strength or in the same proportion in all directions in the feature space emanating from the query image. The manual adjustment of these weights is time consuming and exhausting. Moreover, it requires a very sophisticated user. In this paper, we present a novel probabilistic method that enables image retrieval procedures to automatically capture feature relevance based on user's feedback and that is highly adaptive to query locations. Experimental results are presented that demonstrate the efficacy of our technique using both simulated and real-world data.
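
As an illustration of the retrieval loop the abstract describes (not the paper's probabilistic estimator), per-query feature weights might be refreshed from the user's relevance feedback with a simple inverse-spread heuristic and then used to re-rank; the function names are hypothetical.

```python
import numpy as np

def feedback_weights(query, relevant, eps=1e-6):
    """Features along which the images the user marked relevant stay
    close to the query get large weights; a crude inverse-spread
    heuristic standing in for the paper's probabilistic relevance."""
    spread = np.abs(relevant - query).mean(axis=0)
    w = 1.0 / (spread + eps)
    return w / w.sum()                  # normalize for comparability

def rerank(query, database, w):
    """Re-rank the whole database by the feedback-weighted distance."""
    d = np.sqrt((w * (database - query) ** 2).sum(axis=1))
    return np.argsort(d)
```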

Citation Context

... new round of retrieval begins. The above process repeats until the user is satisfied with the results or the system cannot improve the results from one iteration to the next. 2 Related Work Friedman [6] describes an approach for learning local feature relevance that combines some of the best features of K-NN learning and recursive partitioning. This approach recursively homes in on a query along the...

Improving minority class prediction using case-specific feature weights

by Claire Cardie - Proceedings of the Fourteenth International Conference on Machine Learning , 1997
"... This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We rst present as a baseline an informationgain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Altho ..."
Abstract - Cited by 62 (4 self) - Add to MetaCart
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
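
A minimal sketch of the retrieval side of this idea, assuming the (feature index, information gain) pairs along the test case's tree path are supplied by an existing decision-tree learner (not shown here); names and values are illustrative.

```python
import numpy as np

def path_feature_weights(n_features, path_gains, default=0.0):
    """Build a weight vector for one test case from the
    (feature_index, information_gain) pairs observed along the path
    the case takes through a decision tree; features off the path get
    only the small default weight."""
    w = np.full(n_features, default, dtype=float)
    for f, gain in path_gains:
        w[f] = gain
    return w

def retrieve(case, casebase, w, k=5):
    """Case retrieval with the case-specific weights."""
    d = np.sqrt((w * (casebase - case) ** 2).sum(axis=1))
    return np.argsort(d)[:k]

# Usage: hypothetical path through features 2 and 0 with their gains.
w = path_feature_weights(4, [(2, 0.9), (0, 0.4)])
casebase = np.random.default_rng(1).normal(size=(100, 4))
print(retrieve(casebase[0], casebase, w, k=3))
```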