11 citations found.
A.Y. Ng. On feature selection: learning with exponentially many irrelevant features as training examples. In Proceedings of the 15th International Conference on Machine Learning, pages 404--412, San Francisco, CA, 1998. Morgan Kaufmann.

This paper is cited in the following contexts:
Feature Selection - Portinale, Saitta (2002)

....machine learning is to reduce the number of features used to characterize a dataset so as to improve a learning algorithm's performance on a given task. Feature selection in machine learning has shown its impressive performance gains in attacking large dimensionality with many irrelevant features [19, 41, 17], as well as in enhancing comprehensibility of the learned result [52]. As already noticed, the problem can be exposed as a search problem, so that heuristic search techniques can be devised in order to face it. Each state in the search space is a subset of the original feature set and a partial ....

A.Y. Ng. On feature selection: learning with exponentially many irrelevant features as training examples. In Proc. 15th Intl. Conf. on Machine Learning, pages 404--412, 1998.


Feature Selection Applied to Image Beauty Estimation - Klautau

....As in [TV00], [Das01] used the AdaBoost algorithm [SS99] for selecting features. Both works use boosting as a filter method. In [SN99] boosting was integrated into wrapper methods. Wrappers are of interest because they can lead to improved accuracy when compared to filter methods. In [Ng98] an analysis of feature selection wrappers was presented and the ordered fs wrapper algorithm was presented, which had improved asymptotic behavior when the number of irrelevant features is large compared to the relevant ones. Our motivation to investigate filter and wrapper in this work can be ....

.... However, there were no comparisons with other filter methods (e.g. [Hal99]). Therefore, we investigate how the boosting based method of [TV00] compares with filter methods based on support vector machines (SVM) and information gain [Hal99]. In terms of wrapper methods, the simulations in [Ng98] used only synthetic data. The ordered fs algorithm was used with real life (microarray) data in [XJK01], but no comparisons of ordered fs with other methods were presented. Our experimental framework seems appropriate to evaluate some conclusions presented in [Ng98]. These comparisons will also ....

[Article contains additional citation context not shown here]

A. Ng. On feature selection: Learning with exponentially many irrelevant features as training examples. In International Conference on Machine Learning, 1998.


Feature Selection for High-Dimensional Genomic Microarray Data - Xing, Jordan, Karp (2001)   (10 citations)

.... are a rapidly maturing technology that provide the opportunity to assay the expression levels of thousands or tens of thousands of genes in a single experiment (Shalon et al. 1996). These assays provide the input to a wide variety of statistical modeling efforts, including classification, clustering, and density estimation. For example, by measuring expression levels associated with two kinds of tissue, tumor or non tumor, one obtains labeled data sets that can be used to build diagnostic classifiers. The number of replicates in these experiments are often severely limited, however; indeed, in ....

.... setting generally lead to the pessimistic conclusion that exponentially many data points are needed to provide guarantees of choosing good feature subsets, Ng has recently described a generic feature selection methodology, referred to as FS ORDERED, that leads to more optimistic conclusions (Ng, 1998). In Ng's approach, cross validation is used only to compare between feature subsets of different cardinality. Ng proves that this approach yields a generalization error that is upper bounded by the logarithm of the number of irrelevant features. In a problem with over 7000 features, filtering ....

[Article contains additional citation context not shown here]

Ng, A. (1998). On feature selection: Learning with exponentially many irrelevant features as training examples. Proceedings of the Fifteenth International Conference on Machine Learning.


IGLUE: A Lattice-based Constructive Induction System. - Nguifo, Njiwoua (2001)   (1 citation)

....the optimal set of attributes by iteratively including (resp. deleting) a new feature. A random selection of attributes combined to a genetic algorithm was proposed by Skalak [Ska94]. These methods do not consider the dependence between attributes as it is the case with the Galois lattice. Ng [Ng98] gives a theoretical bound on error rate for ML systems using a feature selection process (wrapper method). This is done by assuming that the selection process finds the best subset of features. Sebag [SS94] proposed a redescription process similar to the approach described in this paper. The ....

A. Ng. On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples. In Proceedings of ICML'98. University of Wisconsin, Madison, 1998.


Automatic Selection of Visual Features and Classifiers - Jaimes, Chang (2000)   (1 citation)

....their performance.
Columns: IB1 | IB3 | IB5 | ID3 | Naive Bayes | MC4
Pitcher: [0, 1, 2, 4, 11, 12] CV: 0.43 T: 22.75 R: 116 | [1, 2, 4, 7, 12, 15, 17, 37] CV: 0.43 T: 23.63 R: 118 | [1, 2, 7, 12, 20, 37] CV: 0.58 T: 23.14 R: 115 | [1, 2, 7, 9, 11, 12, 19, 20, 37] CV: 1.30 T: 22.36 R: 115 | [1, 2, 3, 6, 12, 18, 21, 30, 37] CV: 1.29 T: 17.60 R: 112 | [1, 2, 10, 11, 15, 16, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 35, 38] CV: 0.45 T: 27.03 R: 95
Top Grass: [0, 2, 7, 8, 9, 21] CV: 0.14 T: 14.86 R: 142 | [2, 5, 8, 9, 10, 11, 19, 35, 38] CV: 0.43 T: 16.25 R: 145 | [0, 2, 7, 8, 9, 21, 22] CV: ....

....7, 12, 20, 37] CV: 0.58 T: 23.14 R: 115 | [1, 2, 7, 9, 11, 12, 19, 20, 37] CV: 1.30 T: 22.36 R: 115 | [1, 2, 3, 6, 12, 18, 21, 30, 37] CV: 1.29 T: 17.60 R: 112 | [1, 2, 10, 11, 15, 16, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 35, 38] CV: 0.45 T: 27.03 R: 95
Top Grass: [0, 2, 7, 8, 9, 21] CV: 0.14 T: 14.86 R: 142 | [2, 5, 8, 9, 10, 11, 19, 35, 38] CV: 0.43 T: 16.25 R: 145 | [0, 2, 7, 8, 9, 21, 22] CV: 0.14 T: 15.06 R: 145 | [1, 2, 6, 8, 11, 12, 38] CV: 1.01 T: 16.23 R: 128 | [0, 2, 3, 4, 7, 9, 10, 11, 12, 13, 25, 32, 34] CV: 1.00 T: 13.36 R: 124 | [1, 2, 6, ....

[Article contains additional citation context not shown here]

A.Y. Ng, "On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples", International Conference on Machine Learning, 1998.


The ANNIGMA-Wrapper Approach to Neural Nets Feature.. - Hsu, Schuschel, Yang (1999)

....results with the performance data reported in their papers, we found surprisingly that our simple approach outperforms their sophisticated approaches in almost all test datasets. This suggests that feature selection for neural nets might not be as difficult as previously considered. Meanwhile, Ng [14] presents a theoretical analysis for the wrapper model. The analysis suggests a wrapper-based algorithm called ordered fs, which is proved to be able to tolerate having exponentially many irrelevant features in the number of training examples. This result may be applied to a heuristic guided ....

....the previous feature being eliminated is restored and the next worst ranked feature is eliminated. The process is iterated until a performance improving elimination is found for each size of feature subsets. This search strategy can be considered as a heuristic version of ordered fs in Ng [14]. ordered fs works by exhaustively searching for the feature subset with the lowest cross validation error rate for each size of subsets, then among the best feature subsets for each size, output the one with the lowest hold out error. ordered fs has been shown to be able to tolerate having ....

[Article contains additional citation context not shown here]

A. Y. Ng, "On feature selection: learning with exponentially many irrelevant features as training examples," in Machine Learning: proceedings of the fifteenth international conference (ICML '98), (Madison, Wisconsin USA), Morgan Kaufmann, July 1998.
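
The ordered fs procedure, as paraphrased in the Hsu, Schuschel, Yang excerpt above, can be sketched roughly as follows. This is only an illustration of that two-stage description (best subset per size by cross-validation error, then the final pick by hold-out error), not Ng's own implementation. The nearest-centroid learner, the three-fold split, the toy data, and every function name here are assumptions introduced for the example.

```python
from itertools import combinations


def centroid_error(train, test, subset):
    """Error rate on `test` of a nearest-centroid classifier fit on `train`,
    using only the feature indices in `subset` (a stand-in learner)."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(subset))
        for j, f in enumerate(subset):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(subset, centroids[y])),
        )

    return sum(predict(x) != y for x, y in test) / len(test)


def cv_error(data, subset, folds=3):
    """Mean k-fold cross-validation error for one feature subset."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [ex for j, ex in enumerate(data) if j % folds != i]
        errors.append(centroid_error(train, test, subset))
    return sum(errors) / folds


def ordered_fs_sketch(train, holdout, n_features):
    """Stage 1: for each subset size, exhaustively find the subset with the
    lowest cross-validation error. Stage 2: among those per-size winners,
    return the one with the lowest hold-out error."""
    winners = []
    for size in range(1, n_features + 1):
        best = min(combinations(range(n_features), size),
                   key=lambda s: cv_error(train, s))
        winners.append(best)
    return min(winners, key=lambda s: centroid_error(train, holdout, s))


# Hypothetical toy data: feature 0 determines the label, features 1 and 2 are noise.
train = [
    ((0.0, 0.9, 0.1), 0), ((1.0, 0.8, 0.3), 1),
    ((0.1, 0.2, 0.8), 0), ((0.9, 0.1, 0.9), 1),
    ((0.2, 0.7, 0.5), 0), ((0.8, 0.6, 0.4), 1),
]
holdout = [((0.15, 0.5, 0.5), 0), ((0.85, 0.5, 0.5), 1)]

selected = ordered_fs_sketch(train, holdout, 3)
print(selected)
```

The stage-1 search enumerates every subset of every size, so its cost grows exponentially with the number of features; the sketch is only practical for very small feature sets.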


Generalisation Error Bounds for Sparse Linear Classifiers - Graepel, Herbrich.. (2000)

....due to Novikoff we demonstrate that our new results put classifiers found by this algorithm on a firm theoretical basis. 1 Introduction. Sparseness in the representation of knowledge has long been considered advantageous. While sparsity in the original features is addressed in feature selection [11], we deal with a different kind of sparsity. Many learning algorithms are based on a dual representation of linear classifiers: The weight vector is represented as a linear combination of input vectors in a kernel space whose existence can be ensured by the application of Mercer kernels. Examples ....

A. Y. Ng. On feature selection: Learning with exponentially many irrelevant features as training examples. In Proceedings of the 15th International Conference on Machine Learning, pages 404--412. Morgan Kaufmann, 1998.


Convergence rates of the Voting Gibbs classifier, with.. - Ng, Jordan   Self-citation (Ng)

No context found.

Ng, A. Y. (1998). On Feature Selection: Learning with exponentially many irrelevant features as training examples. Proceedings of the Fifteenth International Conference on Machine Learning (pp. 404--412). Morgan Kaufmann.


Filter Methods - Duch (2004)

No context found.

A.Y. Ng. On feature selection: learning with exponentially many irrelevant features as training examples. In Proceedings of the 15th International Conference on Machine Learning, pages 404--412, San Francisco, CA, 1998. Morgan Kaufmann.


Learning Structured Visual Detectors From User Input At Multiple.. - Jaimes (2001)   (1 citation)

No context found.

A.Y. Ng, "On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples," in proceedings of International Conference on Machine Learning (ICML), 1998.
