W. Siedlecki, J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


This paper is cited in the following contexts:
Conventional and Evolutionary Feature Selection of SAR Data.. - Mayer, Somol (2000)   (Correct)

....distance measures, e.g. Mahalanobis or Bhattacharyya distance, may be appropriate to evaluate the quality of a feature subset. As pointed out by Siedlecki and Sklansky (1988), the error rate with respect to the chosen measurement criterion J(·) is even better (computational feasibility provided) [6]. Bhattacharyya Distance: In the following formulation the B distance measures the separability of normal distributions for two classes indexed by i and k [4]: $B_{ik} = \frac{1}{8} (\mu_i - \mu_k)^t \Sigma^{-1} (\mu_i - \mu_k) + \frac{1}{2} \ln \frac{|\Sigma|}{\sqrt{|\Sigma_i| \, |\Sigma_k|}}$ (1), $\Sigma = \frac{\Sigma_i + \Sigma_k}{2}$ (2), where $\mu_i$, $\mu_k$ are the ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, June 1988.


Genetic Feature Subset Selection for Gender.. - Sun, Bebis, Yuan, Louis (2002)   (Correct)

....might not allow the classifier to generalize nicely, especially when the training set is small. Exhaustive evaluation of possible feature subsets is usually computationally prohibitive in practice. A number of feature selection approaches have been proposed in the literature (see Siedlecki et al. [12], Jain et al. [13] for comprehensive surveys). C. Overview of proposed method: Automatic feature subset selection distinguishes the proposed gender classification method from all other reported approaches. In particular, GAs [14] are employed to select features that encode important gender ....

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


Less is More: Genetic Optimisation of Nearest Neighbour.. - Ramos, Muge (1998)   (1 citation)  (Correct)

....[Figure 1: Portuguese grey granites, commercial designation (each image represents an area of approximately 4 x 4 cm); panels AZU, CAR, COR, EUL, EVO, FAV, JAN, SAL, SPI, VIM.] [3] used a K-NN rule classifier to evaluate feature sets for counterpropagation networks training. Other authors used the same approach ([10], [11]) for other kinds of classifiers. In the present case, a Nearest Neighbour rule classifier was applied to Portuguese granite classification. From 237 granite images, 50 were selected randomly for future performance evaluation (testing set). Each sample on the training set (187 samples) ....

W. Siedlecki; J. Sklansky; On automatic feature selection, Int. J. Pattern Recognition Art. Intell., vol. 2, no. 2, pp. 197-220, 1988.


From Feature Extraction to Classification: A.. - Ramos, Pina, Muge (1999)   (Correct)

....methods for searching the feature space to apply in nearest neighbour rule prototypes, which is the case presented in this paper. For instance, Brill et al. [5] used a k-NNR classifier to evaluate feature sets for counterpropagation networks training. Some other authors used the same approach ([6], [7]) for other kinds of classifiers. 2. GATHERING DATA: A collection of 14 Portuguese grey granites was previously defined to be studied [9]. Although this commercial label includes the real grey types, it also includes other similar colourless types (bluish, whitish and yellowish, for instance) ....

Siedlecki W., Sklansky J., On automatic feature selection, Int. J. Pattern Recognition Art. Intell., 2, 2, 197-220, 1988.


Feature Selection from Huge Feature Sets - Bins, Draper (2001)   (4 citations)  (Correct)

.... reduction [24], space partitioning [17], feature extraction and decision trees [21]. Many algorithms have been proposed for feature selection, from simple algorithms like Sequential Forward Selection (SFS) [18] to more complex algorithms such as neural net pruning [6] and genetic selection [22]. Surveys of feature selection algorithms are given by Kittler [12], Siedlecki and Sklansky [22], and Bins [1]. For this work, the most relevant algorithms are: Relief, proposed by Kira and Rendell [11] in 1992 and extended by Kononenko [15] to handle noisy, incomplete and multi-class data sets; ....

.... algorithms have been proposed for feature selection, from simple algorithms like Sequential Forward Selection (SFS) [18] to more complex algorithms such as neural net pruning [6] and genetic selection [22]. Surveys of feature selection algorithms are given by Kittler [12], Siedlecki and Sklansky [22], and Bins [1]. For this work, the most relevant algorithms are: Relief, proposed by Kira and Rendell [11] in 1992 and extended by Kononenko [15] to handle noisy, incomplete and multi-class data sets; and Sequential Floating Forward Selection (SFFS) and Sequential Floating Backward Selection ....

W. Siedlecki and J. Sklansky, "On Automatic Feature Selection", International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197-220, 1988.


Assessing the Importance of Features for Multi-Layer.. - Egmont-Petersen.. (1998)   (Correct)

....the MLP. In these situations, feature selection is often a desired task. Ideally, when the acquisition costs of the features are equal, one wants to rank the available features according to the change in correctness that results from removing or adding the respective feature from the feature set (Siedlecki et al. 1988). We define the marginal contribution of a feature k among a set of n features as the difference in error rate of a classifier based on all n features and a classifier based on all but the kth feature. Our goal is to estimate the marginal contribution of each feature used in a trained MLP and to ....

....1987; Holz et al. 1994; Karthaus et al. 1995; Kittler, 1980; Kudo et al. 1993; Siedlecki et al. 1988, 1989; Stahlberger et al. 1997). The best subset of features is obtained by a feature selection procedure. Such a procedure investigates different subsets of features according to a search scheme. At each step, the feature subsets are compared according to an assessment criterion. The procedure ....

[Article contains additional citation context not shown here]

Siedlecki W., & Sklansky J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2 (2), 197--220.


Parcel: Feature Subset Selection in Variable Cost Domains - Scott, Niranjan, Prager (1998)   (18 citations)  (Correct)

.... the algorithms make assumptions that are not valid in this domain, as noted by John et al. [52]. Recent comparative studies of these and more recent algorithms, applied to machine learning, include those carried out by Dash and Liu [26], Gordon and desJardins [42], Langley [63], Siedlecki and Sklansky [100], Jain and Zongker [50], and Kohavi and John [59], the last of which is referenced in detail in subsequent chapters of this report. In general, feature subset selection algorithms have two components: an evaluation function J(·) which scores candidate feature sets, and a search engine for ....

....do not, over arbitrary feature subsets drawn from the same original features, induce the same order of preference as that obtained by comparing the errors of the Bayes classifier. This evidence, coupled with the reasons detailed above, has led many researchers, such as Siedlecki and Sklansky [100], to conclude: it seems that the only promising and legitimate way of evaluating features must be through the error rate of the classifier being designed. Consequently, increasing interest is falling upon feature selection algorithms that are classifier inclusive, i.e. that use the test ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Wrappers for Feature Subset Selection - Kohavi, John (1996)   (329 citations)  (Correct)

....a monotonic measure and that selects one feature at a time can find the best feature subset of a desired size; even a 2 1 algorithm that adds the best pair and removes the worst single feature can fail. More recent papers attempt to use AI techniques, such as beam search and bidirectional search (Siedlecki & Sklansky 1988), best-first search (Xu, Yan & Chang 1989) and genetic algorithms (Vafai & De Jong 1992, Vafai & De Jong 1993). All the algorithms described above use a deterministic evaluation function, although in some cases they can easily be extended to probabilistic estimates, such as cross-validation that ....

Siedlecki, W. & Sklansky, J. (1988), "On automatic feature selection", International Journal of Pattern Recognition and Artificial Intelligence 2(2), pp. 197--220.


Improving Statistical Measures of Feature Subsets by.. - Mayer, Somol, Huber..   (Correct)

....distance measures, e.g. Mahalanobis or Bhattacharyya distance, may be appropriate to evaluate the quality of a feature subset. As pointed out by Siedlecki and Sklansky (1988), the error rate with respect to the chosen measurement criterion J(·) is even better (computational feasibility provided) (Siedlecki and Sklansky, 1988). 1.2 Bhattacharyya Distance for Feature Selection: In the following formulation the B distance measures the separability of normal distributions for two classes indexed by i and k (Fukunaga, 1990): $B_{ik} = \frac{1}{8} (\mu_i - \mu_k)^t \Sigma^{-1} (\mu_i - \mu_k) + \frac{1}{2} \ln \frac{|\Sigma|}{|\Sigma_i|^{1/2} \, |\Sigma_k|^{1/2}}$; ....

Siedlecki, W. and Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197-220.


A Monotonic Measure for Optimal Feature Selection - Liu, Motoda, Dash (1998)   (1 citation)  (Correct)

....while still aiming at the optimal subset. Examples are Branch & Bound [15, 17], Relief [7, 10], Wrapper methods [8], Approximate Markov Blanket [9], and LVF [12]. We will review some of these methods briefly in the next section. The feature selection problem can be viewed as a search problem [18, 17, 11]. The search process starts with either an empty set or a full set. For the former, it expands the search space by adding one feature at a time (Sequential Forward Selection) [17]; for the latter, it expands the search space by deleting one feature at a time (Sequential Backward Selection) [15] ....

....$S_0 \supset S_1 \supset \cdots \supset S_n \Rightarrow U(S_0) \le U(S_1) \le \cdots \le U(S_n)$. In this case, the search can be complete but not exhaustive. That means it need not exhaustively search the whole space but the optimal subset is guaranteed. Many distance and information based measures have been shown to be nonmonotonic [18]. Many researchers pointed out that the only remaining alternative is to use the error rate of a classifier as the measure. Among many classifiers, however, only the Bayes Classifier satisfies this monotonicity condition because other classifiers adopt some assumptions and employ certain ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Consistency Based Feature Selection - Dash, Liu, Motoda   (Correct)

....no two examples with the same values on S have different class labels [1]. We study the pros and cons of this measure in comparison with other measures. Another aspect of feature selection is related to the study of search strategies. Extensive research efforts have been devoted to this study [19, 7, 3]. Examples are Branch & Bound [16], Relief [11], Wrapper methods [12], and Las Vegas algorithms [14]. The search process starts with either an empty set or a full set. For the former, it expands the search space by adding one feature at a time (Forward Selection); an example is Focus [1]; for the ....

....However, if each node is evaluated by a measure U and an upper limit is set for the acceptable values of U, then B&B backtracks whenever an infeasible node is discovered. If U is monotonic, no feasible node is omitted and savings of search time do not sacrifice optimality. As pointed out in [19], the measures used in [16], such as accuracy, have disadvantages (e.g. non-monotonicity); the authors of [19] proposed the concept of approximate monotonicity. ABB [13] is an automated B&B algorithm having its bound as the inconsistency rate of the data when the full set of features is used. It ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


A Monotonic Measure for Optimal Feature Selection - Huan Liu And (1998)   (1 citation)  (Correct)

....question. Therefore, various feature selection methods have been designed to avoid exhaustive search while still aiming at the optimal subset. Examples are Branch & Bound [7], Focus [1], Relief [4], Wrapper methods [3], and LVF [5]. The feature selection problem can be viewed as a search problem [9]. The search process starts with either an empty set or a full set. For the former, it expands the search space by adding one feature at a time (Sequential Forward Selection) [1]; for the latter, it expands the search space by deleting one feature at a time (Sequential Backward Selection) [7]. As ....

....The monotonicity condition requires that: $S_0 \supset S_1 \supset \cdots \supset S_n \Rightarrow U(S_0) \le U(S_1) \le \cdots \le U(S_n)$. In this case, the search can be complete but not exhaustive. In other words, the optimal subset is guaranteed. Many distance and information based measures have been shown to be non-monotonic [9]. Many researchers pointed out that the only remaining alternative is to use the error rate of a classifier as the measure. Among many classifiers, however, only the Bayes Classifier satisfies this monotonicity condition because other classifiers adopt some assumptions and employ certain ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Incremental Feature Selection - Liu, Setiono (1998)   (5 citations)  (Correct)

....approach to finding the optimal d features would require examining $\sum_{i=0}^{d} \binom{N}{i}$ subsets. The number of possible subsets grows exponentially. Researchers have designed different strategies in search of optimal subsets of d features (Branch and Bound [20] and its variations [26], many heuristic and stochastic methods [5, 7]). If we view these feature selection algorithms from the perspective of using an induction algorithm, as pointed out in [8], the work on feature selection can be divided into filter and wrapper models. In a filter model, a feature selector is ....

....In cases indicated by , the comparison between before and after is obvious. Table 3 shows that results are consistent with the known fact that there are no bad features from the standpoint of Bayesian decision rules [26]. In all the datasets tested using NBC, table sizes are all reduced (except Monk2) due to feature selection; error rates are not significantly changed in seven out of nine datasets. For the two datasets (SoybeanL and Mushroom) the latter's error rate increases a little in absolute ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Feature Selection in Unsupervised Learning via Evolutionary.. - Kim, Street, Menczer (2000)   (11 citations)  (Correct)

....data sets with even moderate dimensionality. Most research on search algorithms has used heuristic search approaches in favor of efficiency rather than optimality. For instance, algorithms such as sequential search [30, 19], branch and bound [26], nonlinear optimization [5], and simulated annealing [27] have been applied. The formulation of feature selection as a combinatorial optimization problem has also led to the use of genetic algorithms [28, 31]. A recent review of these methods can be found in [8]. Regardless of the search algorithm employed, most previous methods evaluated potential ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197-220, 1988.


Hybrid Search of Feature Subsets - Dash, Liu (1998)   (2 citations)  (Correct)

....S have different class labels [1]. Applying this criterion does make feature selection simpler and the class separability of the original data can be retained. This aspect of feature selection is related to the study of search strategies. Extensive research effort has been devoted to this study [19, 18, 10]. Examples are Branch & Bound [15, 18], Relief [6, 9], Wrapper methods [7], Approximate Markov Blanket [8], and LVF [13]. The search process starts with either an empty set or a full set. For the former, it expands the search space by adding one feature at a time (Step-wise Forward Selection); an ....

....values of U, then Branch & Bound backtracks whenever an infeasible node is discovered. If U is monotonic, no feasible node is omitted as a result of early backtracking and, therefore, gained savings of search time do not violate the optimality of the selected subset. As was pointed out in [19], the measures used in [15] have disadvantages (nonmonotonicity is one); the authors of [19] proposed approximate monotonicity. ABB is a Branch & Bound algorithm where the bound is the inconsistency rate of the dataset with the full set of features. It starts with the full set of features S_0, ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Feature Selection for Classification - Dash, Liu (1997)   (36 citations)  (Correct)

....a generation procedure to generate the next candidate subset; 2. an evaluation function to evaluate the subset under examination; 3. a stopping criterion to decide when to stop; and 4. a validation procedure to check whether the subset is valid. The generation procedure is a search procedure [46, 26]. Basically, it generates subsets of features for evaluation. The generation procedure can start: (i) with no features, (ii) with all features, or (iii) with a random subset of features. In the first two cases, features are iteratively added or removed, whereas in the last case, features are ....

....or with the results of competing feature selection methods using artificial datasets, real world datasets, or both. There have been quite a few attempts to study feature selection methods based on some framework or structure. Prominent among these are Doak's [13] and Siedlecki and Sklansky's [46] surveys. Siedlecki and Sklansky discussed the evolution of feature selection methods and grouped the methods into past, present, and future categories. Their main focus was the branch and bound method [34] and its variants [16]. No experimental study was conducted in this paper. Their survey ....

Siedlecki, W. and Sklansky, J., On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Modeling Languages and Condor: Metacomputing for Optimization - Ferris, Munson (1998)   (Correct)

....selection problem chooses a small number of the data characteristics with the best predictive capability. This problem is applicable in numerous situations and is becoming increasingly important, especially in data mining. Many approaches to solving the problem have been postulated and used [5, 6, 7, 20, 21, 22, 23, 29]. The method presented in this paper generates a large number of independent mixed integer programs (MIP). To make this technique practical, we need to perform the individual optimizations in parallel. Rather than require a large parallel computer, we utilize a metacomputer, a confederation of ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Towards an Evolutionary Algorithm: A Comparison of Two Feature.. - Chen, Liu   (Correct)

....one algorithm can be overcome by the other and what new problems are. Based on the experimental results, we will evaluate, in the context of data mining, the pros and cons of an evolutionary algorithm for feature selection yet without using classification accuracy as a part of a fitness function [23, 12, 17]. We investigate the factors that should be considered for such an evolutionary algorithm and propose directions of evolutionary feature selection for data mining. 2 Basic Concepts A data instance is typically described to a classification learning algorithm as an assignment of values (a 1 ; a 2 ....

....otherwise the bit has value 0, the feature is not selected. Given selected features, we can evaluate the number of inconsistencies IC (to be explained later) for a given data set. Classically, feature selection is treated as a search problem to find the minimal number of bits that have value 1 [22, 23, 1] with which some criterion is satisfied. The most intuitive approaches are those of sequential forward and backward search (SFS and SBS). SFS starts with an empty set and adds the best unselected feature into the set one at a time. SBS begins with the full set and removes the least relevant ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Comparison Of Neural Networks And Statistical.. - Ganster, Röhrer..   (Correct)

....Nevertheless there exist methods to estimate their quality and techniques that reduce the complexity by removing redundant information in the feature set. Various suboptimal algorithms including sequential search procedures [2, 10] neural network techniques [1, 8] and genetic algorithms [14] have been suggested in order to increase the generalization ability and therefore the performance of the classification process. In this paper we are investigating three classification approaches: First we perform feature subset selection by sequential forward floating selection (SFFS [2] and ....

W. Siedlecki and J. Sklansky. On automatic feature selection. Int. Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


On Growing Better Decision Trees from Data - Murthy (1997)   (17 citations)  (Correct)

....Bayesian. [Footnote 15: An exception is the optimal feature subset selection method using zero-one integer programming, suggested by Ichino and Sklansky [217].] remove, at each step, the worst feature. When more than one feature is greedily added or removed, beam search is said to have been performed [445, 69]. A combination of forward selection and backward elimination, a bidirectional search, was attempted in [445]. Comparisons of heuristic feature subset selection methods resound the conclusions of studies comparing feature evaluation criteria and studies comparing pruning methods: no feature ....

....suggested by Ichino and Sklansky [217].] remove, at each step, the worst feature. When more than one feature is greedily added or removed, beam search is said to have been performed [445, 69]. A combination of forward selection and backward elimination, a bidirectional search, was attempted in [445]. Comparisons of heuristic feature subset selection methods resound the conclusions of studies comparing feature evaluation criteria and studies comparing pruning methods: no feature subset selection heuristic is far superior to the others. Cover et al. [94, 484] showed that heuristic ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Automated Image Analysis Techniques For Digital Mammography - Woods (1994)   (Correct)

....is to find an optimal subset of d features for a particular detection task given a full set of D features, where d D. Thus, a method of evaluating the goodness of a set of features is required. The misclassification error rate of the classifier being utilized is a good evaluation criterion [78, 87]. The only way to guarantee the selection of an optimal feature vector is an exhaustive search of all 49 possible subsets of features. The problem can be formulated as a search of a directed graph [87]. The size of the power set (the set of all subsets) of D features is $2^D$. Since we consider ....

....error rate of the classifier being utilized is a good evaluation criterion [78, 87]. The only way to guarantee the selection of an optimal feature vector is an exhaustive search of all 49 possible subsets of features. The problem can be formulated as a search of a directed graph [87]. The size of the power set (the set of all subsets) of D features is $2^D$. Since we consider 40-plus features at some point, the complexity of an exhaustive search would be unreasonable. As a result, a number of suboptimal search techniques are often utilized for feature selection. A complete ....

[Article contains additional citation context not shown here]

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


Feature Subset Selection as Search with Probabilistic Estimates - Kohavi (1994)   (17 citations)  (Correct)

....Figure 1: The wrapper model. The induction algorithm is used as a black box by the subset selection algorithm. Branch and bound algorithms were introduced by Narendra & Fukunaga (1977). Finally, more recent papers attempt to use AI techniques, such as beam search and bidirectional search (Siedlecki & Sklansky 1988), best-first search (Xu, Yan & Chang 1989) and genetic algorithms (Vafai & De Jong 1992). All the algorithms described above assume that the evaluation function is deterministic. When the evaluation function is a random variable, the search becomes more complicated. Greiner (1992) describes how ....

Siedlecki, W., and Sklansky, J. 1988. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence 2(2):197--220.


Machine Learning via Polyhedral Concave Minimization - Mangasarian (1996)   (5 citations)  (Correct)

....Concave Minimization. O. L. Mangasarian. Mathematical Programming Technical Report 95-20, November 1995. Dedicated to Klaus Ritter on the Occasion of his Sixtieth Birthday. Abstract: Two fundamental problems of machine learning, misclassification minimization [10, 24, 18] and feature selection [25, 29, 14], are formulated as the minimization of a concave function on a polyhedral set. Other formulations of these problems utilize linear programs with equilibrium constraints [18, 1, 4, 3] which are generally intractable. In contrast, for the proposed concave minimization formulation, a successive ....

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Wrappers For Performance Enhancement And Oblivious Decision Graphs - Kohavi (1995)   (43 citations)  (Correct)

....a monotonic measure and that selects one feature at a time can find the best feature subset of a desired size; even a 2 1 algorithm that adds the best pair and removes the worst single feature can fail. More recent papers attempt to use AI techniques, such as beam search and bidirectional search (Siedlecki & Sklansky 1988), best-first search (Xu, Yan & Chang 1989) and genetic algorithms (Vafai & De Jong 1992, Vafai & De Jong 1993). All the algorithms described above assume that the evaluation function is deterministic. Langley (1994) reviewed feature subset selection methods in machine learning and contrasted the ....

Siedlecki, W. & Sklansky, J. (1988), "On automatic feature selection", International Journal of Pattern Recognition and Artificial Intelligence 2(2), 197--220.


Feature Selection and Non-linear Feature Extraction - Kumar   (Correct)

....out of which only a few are useful. Using a large number of features leads to the curse of dimensionality, and it is hard to find a good classifier which can generalize well if all these features are used at the same time. Hence there is a problem of feature selection (Devijver and Schnabel 1982; Siedlecki and Sklansky 1988; Fukunaga 1990; Hand 1981), where only a subset of relevant features is to be selected to do the classification. For example, the height of a person is not a good feature for disease diagnosis, and so on. Hence, it is important to find a measure of usefulness of a feature for a given problem. No ....

Siedlecki, W., and Sklansky, J. (1988). On automatic feature selection.


Evolutionary Training Data Sets with n-dimensional Encoding.. - Mayer, Huber (1998)   (Correct)

....of Eigenfilters [3] The primary features SAR backscatter S and coherence C together with the derived features from local statistics make up a potentially high dimensional feature vector. The Curse of Dimensionality [4] and empirical results exploring feature subset selection techniques [5] suggest the construction of a feature vector of lower dimension for a given TDS size. To identify the most relevant features among the large number of texture features derived from Eigenimages the average Jeffreys Matusita distance (JMD) between classes is evaluated for all possible subsets of ....

Wojciech Siedlecki and Jack Sklansky. On Automatic Feature Selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


ERC - Evolutionary Resample and Combine for Adaptive Parallel.. - Huber, Mayer (1998)   (Correct)

....The primary features SAR backscatter and coherence together with the derived features from local statistics and principal components make up a potentially high dimensional feature vector. The Curse of Dimensionality [2] and empirical results exploring feature subset selection techniques [13] suggest the construction of a feature vector of lower dimension. To identify the most relevant features among the large number of texture energy features derived from principal components images the average Jeffreys Matusita distance (JMD) is evaluated for all possible subsets of an initial ....

W. Siedlecki and J. Sklansky. On Automatic Feature Selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Using Upper Bounds On Attainable Discrimination To .. - Lovell, Dance.. (1996)   (Correct)

....Features are removed from the working set in order of increasing importance to attainable accuracy. The problem of selecting an optimal subset of features is NP hard and the backwards selection algorithm described here is not the only practical method for searching the space of possible models [3]. While forwards selection of features would require significantly less computation, commencing the search with the saturated model allows us to readily identify and retain variables whose interaction provides good discrimination between classes. In the following subsections we report how this ....

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


Artificial Intelligence and Intrusion Detection: Current and.. - Frank (1994)   (25 citations)  (Correct)

....features may be redundant since the information they add is contained in other features. Extra features can increase computation time, and can impact the accuracy of an IDS. Feature selection improves classification by searching for the subset of features which best classifies the training data [SiSk]. In the ID domain, features are derived from information sources used to detect intrusions, and training instances are derived from detected intrusion attempts as well as normal behavior. Thus, feature selection can be used to find features most indicative of misuse, or can be used to ....

....from information sources used to detect intrusions, and training instances are derived from detected intrusion attempts as well as normal behavior. Thus, feature selection can be used to find features most indicative of misuse, or can be used to distinguish between types of misuse. [Do] and [SiSk] have performed comparisons of a variety of feature selection techniques, and [Do] tested several techniques on simulated computer attack data to explore the possibility of using feature selection to improve intrusion detection techniques. In section 5 we give an example of feature selection ....

W. Siedlecki, J. Sklansky. "On Automatic Feature Selection." International Journal of Artificial Intelligence, vol.2, no.2, 1988.


Learning Bayesian Networks Using Feature Selection - Provan, Singh (1995)   (17 citations)  (Correct)

....quite widespread within the last few years. In statistics, research on feature selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection [Marill63], branch and bound [Narendra77, Xu89], and search algorithms [Siedlecki88]. A 1993 meeting of the Society of AI and Statistics was dedicated to papers on Selecting Models from Data [Cheeseman94] and contains a large number of papers on feature selection. This statistical approach to subset selection shares many principles with other statistical notions of information ....

Siedlecki, W. and J. Sklansky (1988). On automatic feature selection. Intl. J. of Pattern Recognition and Artificial Intelligence, 2(2):197--220.


A Comparison of Induction Algorithms for Selective and.. - Singh, al. (1995)   (16 citations)  (Correct)

....the last few years. In statistics, research on feature selection has focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection (Marill and Green, 1963), branch and bound (Narendra and Fukunaga, 1977), and search algorithms (Siedlecki and Sklansky, 1988). Feature selection has received considerable attention in the last few years within the computational learning community, using both filter based and wrapper based approaches (John et al. 1994). A filter model filters out less relevant features using an algorithm different from the induction ....

Siedlecki, W. and Sklansky, J. (1988). On automatic feature selection. Int. Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220.


Feature Selection via Mathematical Programming - Bradley, Mangasarian, Street (1997)   (12 citations)  (Correct)

....(OBD) method for reducing neural network complexity. One feature selection algorithm via concave minimization (FSV) reduced cross validation error on a cancer prognosis database by 35.4 while reducing problem features from 32 to 4. Feature selection is an important problem in machine learning [18, 15, 16, 17, 33]. In its basic form the problem consists of eliminating as many of the features in a given problem as possible, while still carrying out a preassigned task with acceptable accuracy. Having a minimal number of features often leads to better generalization and simpler models that can be more easily ....

W. SIEDLECKI and J. SKLANSKY, 1988. On Automatic Feature Selection, International Journal of Pattern Recognition and Artificial Intelligence 2, 197--220.


What Patient Information Allows Us to Make Accurate.. - Lovell Dance   (Correct)

.... outcome (2) Given specific items of information about a number of patients, how accurately does that information allow us to discriminate between patients who have adverse or benign outcomes? From the standpoint of pattern recognition, the first question describes the problem of feature selection [1, 2]. Many feature selection techniques rely upon statistical measures of the between class separation provided by different feature sets [1, 3]. One drawback with this is that, while certain features may provide a high degree of class separation, there is no guarantee they form a useful ....

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


Feature Subset Selection Using A Genetic Algorithm - Yang, Honavar (1997)   (65 citations)  (Correct)

....approach on some benchmark classification problems as well as a document classification task. Section 1.6 concludes with summary and discussion of some directions for future research. 1.2 RELATED WORK A number of approaches to feature subset selection have been proposed in the literature. (See [Siedlecki and Sklansky, 1988; Doak, 1992; Langley, 1994; Dash and Liu, 1997] for surveys.) These approaches involve searching for an optimal subset of features based on some criteria of interest. Feature subset selection problem can be viewed as a special case of the feature weighting problem. It involves assigning a ....

Siedlecki, W. and Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition, 2:197--220.


Induction of Selective Bayesian Network Classifiers - Singh, al. (1996)   (1 citation)  (Correct)

....area being focused primarily on selecting a subset of features within linear regression. Techniques developed include sequential backward selection (Marill & Green, 1963), branch and bound (Narendra & Fukunaga, 1977), best first (Xu et al. 1989), and beam search as well as bidirectional search (Siedlecki & Sklansky, 1988). A recent meeting of the Society of AI and Statistics was dedicated to papers on Selecting Models from Data (Cheeseman & Oldford, 1994) and contains numerous papers on feature selection. This statistical approach to subset selection shares many principles with other statistical notions of ....

Siedlecki, W. & Sklansky, J. (1988). On automatic feature selection. Intl. J. of Pattern Recognition and Artificial Intelligence, 2(2):197--220.


Wrappers for Feature Subset Selection - Kohavi, John (1997)   (329 citations)  (Correct)

....a monotonic measure and that selects one feature at a time can find the best feature subset of a desired size; even a 2 1 algorithm that adds the best pair and removes the worst single feature can fail. More recent papers attempt to use AI techniques, such as beam search and bidirectional search (Siedlecki & Sklansky 1988), best first search (Xu, Yan & Chang 1989) and genetic algorithms (Vafai & De Jong 1992, Vafai & De Jong 1993). All the algorithms described above use a deterministic evaluation function, although in some cases they can easily be extended to probabilistic estimates, such as cross validation that ....

Siedlecki, W. & Sklansky, J. (1988), "On automatic feature selection", International Journal of Pattern Recognition and Artificial Intelligence 2(2), pp. 197--220.


Evaluating Confidence Measures in a Neural Network .. - Sykacek.. (1997)   (2 citations)  (Correct)

....of features were further normalized. Following the suggestions of C. Bishop in [3] we subtracted the mean value and transformed the data to unit variance. B. Feature selection using search algorithms and quality measures In pattern recognition literature (e.g. W. Siedlecki and J. Sklansky in [21]) the problem of infeasible features is widely discussed. (Footnote 1: Depending on the number of training samples, the complexity of the classifier must be limited. The complexity is given by the number of free parameters of the neural network. This determines an upper limit for the number of inputs.) ....

W. Siedlecki and J. Sklansky. On automatic feature selection. Int. Journal Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Irrelevant Features and the Subset Selection Problem - John, Kohavi, Pfleger (1994)   (270 citations)  (Correct)

....Kittler generalized the different variants including forward methods, stepwise methods, and plus take away r. Branch and bound algorithms were introduced by Narendra & Fukunaga (1977). Finally, more recent papers attempt to use AI techniques, such as beam search and bidirectional search (Siedlecki & Sklansky 1988), best first search (Xu, Yan, & Chang 1989) and genetic algorithms (Vafai & De Jong 1992). Many measures have been suggested to evaluate the subset selection (as opposed to cross validation) such as adjusted mean squared error, adjusted multiple correlation coefficient, and the C_p statistic ....

Siedlecki, W., and Sklansky, J. 1988. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence 2(2):197--220.


Exploiting And Evolving R - Mathematical Morphology Feature   (Correct)

No context found.

W. Siedlecki, J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Project CellNet: Evolving an Autonomous Pattern Recogniser - Kharma Kowaliw Clement   (Correct)

No context found.

Siedlecki, W., & Sklansky, J. (1988) "On Automatic Feature Selection", in International Journal of Pattern Recognition, 2:197--220.


Feature Selection and Classifier Ensembles: A Study on.. - Yu (2003)   (Correct)

No context found.

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197--220, 1988.


Object Detection Using Feature Subset Selection - Sun, Bebis, Miller   (Correct)

No context found.

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


A Survey of Dimension Reduction Techniques - Fodor (2002)   (Correct)

No context found.

W. Siedlecki and J. Sklansky. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(2):197--220, 1988.


Detection of Malignancy Associated Changes in Cervical Cells.. - Hallinan (1999)   (Correct)

No context found.

Siedlecki, W. & Sklansky, J. 1988. 'On automatic feature selection', International Journal of Pattern Recognition and Artificial Intelligence vol. 2, no. 2, pp. 197--220.


On the use of Expected Attainable Discrimination.. - Lovell, Scott.. (1997)   (Correct)

No context found.

W. Siedlecki and J. Sklansky, "On automatic feature selection," International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197--220, 1988.


CiteSeer.IST - Copyright Penn State and NEC