J. Kittler. Feature selection and extraction. In T. Y. Young and K.-S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, 1986.

This paper is cited in the following contexts:

Evolutionary Model Selection in Unsupervised Learning - Kim, Street, Menczer (2002)

....dimensionality. Most feature selection algorithms have focused on heuristic search approaches, such as sequential search [36], nonlinear optimization [9] and genetic algorithms [54]. Recent reviews of these methods can be found in [13,40]. Regardless of the search algorithm employed, these methods evaluate potential solutions in terms of predictive accuracy. Specifically, the data set could be divided into training ....

....from uniform distributions in the unit interval, so that the clusters may overlap. We present some 2 dimensional projections of the synthetic data set in Fig. 5. For further comparisons we have implemented a greedy heuristic algorithm known as the plus 2 take away 1 sequential selection algorithm [36]. This is a reasonable choice for a comparative algorithm because we want our algorithm to outperform most commercial statistical programs (e.g. SAS and SPSS) that implement simpler search algorithms, such as sequential forward and backward selection, for feature selection. Since the greedy ....

J. Kittler, Feature selection and extraction, in: Handbook of Pattern Recognition and Image Processing, Y. Fu, ed., New York, 1986, Academic Press.


Multi-Class Linear Dimension Reduction by Weighted.. - Loog, Duin, Haeb-Umbach (2001)   (3 citations)

....related to the classification rate. This also holds for the eigenvalue decomposition based approach by Young and Odell [14]. Procedures that deal with the class overlap problem are usually iterative and thereby much more computationally demanding, e.g. the Patrick Fisher approach described in [8], the nonlinear principal component analysis by neural networks [12], and the general nonparametric approach suggested by Buturovic [1]. 2 The Fisher Criterion and its Non Optimality. Multi class LDR is concerned with the search for a linear transformation that reduces the dimension of a given ....

J. Kittler. Feature selection and extraction. In Handbook of pattern recognition and image processing. Academic Press, 1986.


Hierarchical Feature Selection: A Decision Tree based Approach. - Ferri Albert Gracia

....function. This eventually leads to a particular subset or family of subsets with different sizes that are near to the optimal solution. In particular, there are optimal algorithms that give the best solution for a given subset size if the criterion function satisfies the monotonicity condition [4]. Apart from the fact that the concept of best solution is not clearly defined, these optimal solutions are not applicable in practice due to the high computational burden. From the fact that specific classification problems may require more accurate feature selection, there is a recent interest ....

Kittler, J. "Feature Selection and Extraction", Handbook of Pattern Recognition and Image Processing, 1986


Cancer Diagnosis And Prognosis Via Linear-Programming-Based.. - Street (1994)   (5 citations)

....missing values to the mean is a common practice which is particularly appropriate here. Since we are building linear models, a missing value has no effect on the prediction for that case. The feature selection procedure is a variation of the heuristic sequential backward elimination method [45], a top down, greedy search through the space of feature sets. The procedure begins by setting aside a tuning or validation set, that is a surrogate testing set, in our case a randomly selected 10% of the training cases. The regular RSA procedure is then applied to the training set, finding the ....

J. Kittler. Feature selection and extraction. In Young & Fu, editor, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, 1986.


Assessing the Importance of Features for Multi-Layer.. - Egmont-Petersen.. (1998)

....of features in the general case where the type of distribution of the features is unknown. Alternative assessment criteria that are easier to compute have been suggested. Among these, probabilistic distance measures, dependence measures and entropy measures have been proposed (for overviews see Kittler, 1986; Siedlecki et al. 1988). With some distance measures, bounds of the error rate for the assessed feature subset can be determined. Most distance measures are inferior to the marginal contribution because their relationship with the error rate is often very loose (Kittler, 1986). Another drawback ....

....(for overviews see Kittler, 1986; Siedlecki et al. 1988). With some distance measures, bounds of the error rate for the assessed feature subset can be determined. Most distance measures are inferior to the marginal contribution because their relationship with the error rate is often very loose (Kittler, 1986). Another drawback of using the probabilistic distance and dependency measures as assessment criteria is that they do not take into account the properties of a particular classifier, i.e. the contribution of each feature to classifier performance (Foroutan et al. 1987; Siedlecki et al. 1988). A ....

Kittler, J. (1986). Feature selection and extraction. In T. Y. Young & K.-S. Fu (Eds.), Handbook of Pattern Recognition and Image Processing. Orlando: Academic Press.


Representation Quality in Text Classification: An Introduction and .. - Lewis

....the statistical properties of syntactic phrases could be corrected, without degrading their desirable semantic properties, then the quality of this form of representation will be improved. A number of dimensionality reduction techniques from pattern recognition potentially would have this effect [13]. One approach is to use cluster analysis [1] to recognize groups of redundant attributes and replace them with a single attribute. We recently conducted a preliminary experiment testing this approach. 1 The titles and abstracts of the 3204 documents in the CACM 3204 test collection [9] were ....

J. Kittler. Feature selection and extraction. In Tzay Y. Young and King-Sun Fu, editors, Handbook of Pattern Recognition and Image Processing, pages 59--83. Academic Press, Orlando, 1986.


Wrappers for Feature Subset Selection - Kohavi, John (1996)   (329 citations)

....an induction algorithm that is run on data containing only these features generates a classifier with the highest possible accuracy. Note that feature subset selection chooses a set of features from existing features, and does not construct new ones; there is no feature extraction or construction (Kittler 1986, Rendell & Seshu 1990). From a purely classification theoretical standpoint, the question of which features to use is not of much interest. A Bayes rule, or a Bayes classifier, is a rule that predicts the most probable class for a given instance, based on the full distribution D (assumed to be ....

....estimation strategy (Kaelbling 1993). However, in all cases the worst case bound remains the same and the optimal tradeoff between exploration and exploitation was empirically determined to be domain dependent. 8 Related Work. The pattern recognition literature (Devijver & Kittler 1982, Kittler 1986, Ben-Bassat 1982), statistics literature (Draper & Smith 1981, Miller 1984, Miller 1990, Neter et al. 1990), and recent machine learning papers (Almuallim & Dietterich 1991, Almuallim & Dietterich 1994, Kira & Rendell 1992a, Kira & Rendell 1992b, Kononenko 1994) consist of many such measures for ....

Kittler, J. (1986), Feature Selection and Extraction, Academic Press, Inc, chapter 3, pp. 59--83.


Lazy Learning of Bayesian Rules - Zheng, Webb   (5 citations)

....Sequential Selection (FSS) method is used for selecting a subset of the available attributes with which to build a naive Bayesian classifier. Pazzani (1996) also investigates attribute deletion for naive Bayesian classifiers using the Backward Sequential Elimination (BSE) and FSS approaches (Kittler, 1986). In addition, Kubat, Flotzinger, and Pfurtscheller (1993) show that using decision tree learning as a pre process to select attributes for naive Bayesian classification performs better than either decision tree learning or naive Bayesian classification alone in a domain for discovering patterns ....

....including new attributes is returned as the final classifier. The Bsej algorithm used here is our implementation. The selective naive Bayesian classifier Bse is our implementation of the backward sequential attribute elimination algorithm for naive Bayesian classification (Pazzani, 1996; Kittler, 1986). It is exactly the same as Bsej except that at each step of the greedy search, Bse only considers deleting one existing attribute. Another algorithm with which Lbr should be compared is LazyDT (Friedman et al. 1996). LazyDT also generates rules (Friedman et al. 1996, refer to them as decision ....

Kittler, J. (1986). Feature selection and extraction. In T.Y. Young & K. Fu (Eds.), Handbook of Pattern Recognition and Image Processing (pp. 59-81). San Diego, CA: Academic Press.


Use of Domain Knowledge in Constructive Induction - Callan (1990)   (1 citation)

....system. One solution to this problem is to enable the machine learning program to choose its own vocabulary. This approach is usually called constructive induction [Michalski, 1983], although it is also called the new term problem [Dietterich, London, Clarkson & Dromey, 1982], feature extraction [Kittler, 1986], and feature generation [Rendell, 1985]. Most implementations are incremental; whenever the program is not making reasonable progress in achieving some goal, domain independent heuristics are used to change the vocabulary. Constructive induction may be faster than a manual search of the space of ....

Kittler, J. (1986). Feature selection and extraction. In Young & Fu (Eds.), Handbook of pattern recognition and image processing. New York: Academic Press.


Feature Selection in Unsupervised Learning via Evolutionary.. - Kim, Street, Menczer (2000)   (11 citations)

....solutions both by examining the selected features and by judging the semantics of the resulting clusters. Another way to evaluate our approach is by comparison with an alternative algorithm. For this purpose we have implemented a greedy heuristic algorithm known as two way sequential selection [18]. Our implementation of this algorithm for clustering requires a set value of K and uses Fwithin as the only optimization criterion. The algorithm begins by finding the single dimension along which the objective is optimized. This dimension constitutes the initial feature set. At each successive ....

J. Kittler. Feature selection and extraction. In Y. Fu, editor, Handbook of Pattern Recognition and Image Processing, New York, 1978. Academic Press.


Feature Selection in Unsupervised Learning via Evolutionary .. - Yongseog Kim Management (2000)   (11 citations)

....a search algorithm that explores the combinatorial space of feature subsets, and one or more criterion functions that evaluate the quality of each subset based directly on the predictive model. Most feature selection research has focused on heuristic search approaches, such as sequential search [13], nonlinear optimization [5] and genetic algorithms (GAs) [17]. A recent review of these methods can be found in [6]. These methods considered feature selection in a supervised learning context, evaluating potential solutions in terms of predictive accuracy. We instead wish to find natural ....

....about the clusters and the relevant features. In this case, we can evaluate the solutions both by examining the selected features and by judging the semantics of the resulting clusters. For further comparisons we have implemented a greedy heuristic algorithm known as two way sequential selection [13]. This algorithm requires a set value of K and uses Fwithin as the optimization criterion. It begins by finding the single dimension along which the objective is optimized. At each successive step, the algorithm adds an additional feature that, when combined with the current set, forms the best ....

J. Kittler. Feature selection and extraction. In Y. Fu, editor, Handbook of Pattern Recognition and Image Processing, New York, 1978. Academic Press.


Simultaneous Evolution of Feature Subset and Neural.. - Hallinan, Jackway (1999)

....we proceed on the basis that some smaller subset of these features is jointly sufficient, or even optimal, for the classification of new, unseen data. The goal of feature selection is simply to find this optimal subset of features, or in practical terms, to find a subset which performs well [11, 5]. In machine learning systems where the classifier must be trained on a representative set of training data it is well known that reducing the number of features used may actually reduce the classification error [9]. A point often made but not always appreciated is that the optimal feature set ....

J. Kittler. Feature selection and extraction. In T. Y. Young and K. S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, 1986.


Learning and Revising User Profiles: The Identification of.. - Pazzani, Billsus (1997)   (82 citations)

....cause problems if many words that aren't very relevant are used as features. On average for the values we tested, 96 performed best. One might consider using information gain to select a large group of informative features, and then using existing approaches for feature subset selection (e.g. Kittler, 1986; John et al. 1994) to select some of these features using a criteria other than informativeness. However, such algorithms increase the complexity of the Bayesian classifier, making it impractical to learn a profile interactively. Furthermore, such approaches are likely to overfit the example ....

Kittler, J. (1986). Feature selection and extraction. In Young, & Fu, (Eds.), Handbook of Pattern Recognition and Image Processing. Academic Press, New York.


Modeling Languages and Condor: Metacomputing for Optimization - Ferris, Munson (1998)

....selection problem chooses a small number of the data characteristics with the best predictive capability. This problem is applicable in numerous situations and is becoming increasingly important, especially in data mining. Many approaches to solving the problem have been postulated and used [5, 6, 7, 20, 21, 22, 23, 29]. The method presented in this paper generates a large number of independent mixed integer programs (MIP). To make this technique practical, we need to perform the individual optimizations in parallel. Rather than require a large parallel computer, we utilize a metacomputer, a confederation of ....

J. Kittler. Feature selection and extraction. In T. Y. Young and K.-S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, 1986.


Adaptive Fraud Detection - Fawcett, Provost (1997)   (40 citations)

....A feature selection process is used to reduce the number of monitors in the final detector. Some of the rules do not perform well when used in monitors, and some monitors overlap in their fraud detection coverage. We therefore employ a sequential forward selection process (Kittler 1986) which chooses a small set of useful monitors. Empirically, this simplifies the final detector and increases its accuracy. The final output of DC-1 is a detector that profiles each user's behavior based on several indicators, and produces an alarm if there is sufficient evidence of fraudulent ....

Kittler, J. (1986). Feature selection and extraction. In K. S. Fu (Ed.), Handbook of pattern recognition and image processing, pp. 59--83. New York: Academic Press.


Exploiting Context in Feature Selection - Domingos

....by more than this amount. RC was also found to be faster than BSS and FSS in all domains, sometimes by a large factor [Domingos, in press]. 4 RELATED WORK. Variations of FSS and BSS are described and evaluated in [Aha and Bankert, 1994]. Beyond the pattern recognition approaches surveyed in [Kittler, 1986] and [Devijver and Kittler, 1982], many methods for feature selection have been proposed in the artificial intelligence literature in recent years [Kibler and Aha, 1987, Almuallim and Dietterich, 1991, Kira and Rendell, 1992, Cardie, 1993, Schlimmer, 1993, Vafaie and DeJong, 1993, Caruana and ....

J. Kittler. Feature selection and extraction. In T. Y. Young and K. S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, NY, 1986.


Multivariate versus Univariate Decision Trees - Brodley, Utgoff (1992)   (24 citations)

....this end, LMDT's dispersion measure computes for each variable the average squared distance between the weights of each pair of classes and then eliminates the variable that has the smallest dispersion. This measure is analogous to the Euclidean interclass distance measure for estimating error (Kittler, 1986). A thermal linear machine has converged when the magnitude of each correction to the linear machine is larger than the amount permitted by the thermal training rule for each instance in the training set. However, one does not need to wait until convergence to begin discarding variables. The ....

....is effective in reducing total training time without reducing the quality of the learned classifier. 2.4 Relationship to Other Methods for Finding Multivariate Tests. The problem of finding multivariate splits for decision trees has been studied in both pattern recognition and machine learning. Kittler (1986) describes several approaches for linear feature combination. In this framework, LMDT performs a sequential backward selection (SBS) search for a good combination of features. An SBS search is a top down search method that starts with all of the initial features and tries to remove the feature ....

[Article contains additional citation context not shown here]

Kittler, J. (1986). Feature selection and extraction. In Young & Fu (Eds.), Handbook of pattern recognition and image processing. New York: Academic Press.


Journal of Machine Learning Research 1 (2002) 1-48.. - Support Vector Machines

No context found.

J. Kittler. Feature selection and extraction. In T. Y. Young and K.-S. Fu, editors, Handbook of Pattern Recognition and Image Processing. Academic Press, New York, 1986.


Case Study: - Steel Surface Classification

No context found.

Kittler J., "Feature Selection and Extraction" Handbook of Pattern Recognition and Image Processing T. Y. Young and K. S. Fu (editors) Academic Press Inc., London,


Unknown - (1993)

No context found.

Kittler, J. (1986). Feature selection and extraction. In Young & Fu (Eds.), Handbook of pattern recognition and image processing. New York: Academic Press.


A Rough Sets Based Approach to Feature Selection - Zhang, Yao

No context found.

J. Kittler, "Feature selection and extraction," In Young and Fu (Eds.), Handbook of pattern recognition and image processing, pp203-217, New York: Academic Press, 1986.


Feature Selection and Classifier Ensembles: A Study on.. - Yu (2003)

No context found.

J. Kittler. Feature selection and extraction. In Tzay Y. Young and King-Sun Fu, editors, Handbook of Pattern Recognition and Image Processing, pages 59--83. Academic Press, 1986.


ROC Method for the Evaluation of Multi-class Segmentation .. - Rees, Wright, Greenway (2002)

No context found.

J. Kittler. Feature selection and extraction. In T. Young and K Fu, editors, Handbook of Pattern Recognition and Image Processing, pages 59--83. NY Academic Press, 1986.


Dynamic Reducts and Statistical Inference - Bazan (1996)

No context found.

J. Kittler (1986). Feature selection and extraction, In Young and Fu (eds.), Handbook of pattern recognition and image processing. New York: Academic Press.

CiteSeer.IST - Copyright Penn State and NEC