37 citations found.
K P Bennett and O L Mangasarian. Neural network training via linear programming. Technical report, Center for Parallel Optimization, University of Wisconsin, 1990.

This paper is cited in the following contexts:

The Representation of Natural Language to Enable Neural Networks.. - Lyon (1994)   (1 citation)  (Correct)

....typically change the output. In order to illustrate the difference between these two types of problems, consider the following 2 cases. Perceptual tasks: An example of the perceptual type of neural network task is provided by an application successfully used for a medical classification task [79, 80, 81, 1968, 1990, 1992]. The problem is to distinguish sets of 9 clinical measurements associated with different medical conditions. A stylised version of the problem, reduced to 2 dimensions so that it can be illustrated in diagrammatic form, is shown in Figure 3.5 (derived from Mangasarian [81]) ....

K P Bennett and O L Mangasarian. Neural network training via linear programming. Technical report, Center for Parallel Optimization, University of Wisconsin, 1990.


A Comparative Study of Fuzzy Classifiers on Breast Cancer Data - Jain, Abraham (2004)   (Correct)

....was obtained from the repository of machine learning databases of the University of California, Irvine. This data set has 32 attributes (30 real valued input features) and 569 instances, of which 357 are of the benign and 212 of the malignant class. There are several studies based on this database. Bennett and Mangasarian [19] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. In the first approach, a single fuzzy if-then rule was ....

Bennett, K.P., and Mangasarian, O.L., Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56-57. Elsevier Science, 1992.


Coevolutionary Fuzzy Modeling - Peña-Reyes (2002)   (Correct)

....case v 1 v 2 v 3 v 9 diagnostic . 683 488 1 malignant Note that the diagnostics do not provide any information about the degree of benignity or malignancy. There are several studies based on this database. Bennett and Mangasarian [10] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--57. Elsevier Science, 1992.


Constructive Neural Network Learning Algorithms for.. - Parekh, Yang, Honavar (2000)   (14 citations)  (Correct)

....implements a kind of nearest neighbor classification scheme. Each hidden neuron is an exemplar representing a group of patterns that belong to the same class and are close to each other in terms of some suitably chosen distance metric. The minimizing resources method [43], the multisurface method [4], and the Voronoi diagram approach [5] are based on the idea of partitioning the input space by constructing linear hyperplanes. Hidden layer neurons are trained to partition the input space into homogeneous regions where each region contains patterns belonging to a single output class. The output ....

K. Bennett and O. Mangasarian, "Neural-network training via linear programming, " Dept. Comput. Sci., Univ. Wisconsin, Madison, Tech. Rep. 948, 1990.


Combining Evolutionary and Fuzzy Techniques in Medical.. - Peña-Reyes, Sipper   (Correct)

....5 1 1 1 benign 2 5 4 4 1 benign . 683 4 8 8 1 malignant Note that the diagnostics do not provide any information about the degree of benignity or malignancy. There are several studies based on this database. Bennett and Mangasarian [3] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--57. Elsevier Science, 1992.


Cancer Diagnosis And Prognosis Via Linear-Programming-Based.. - Street (1994)   (5 citations)  (Correct)

....cytological features [57, 58, 92]. The MSM procedure uses a linear programming model to place successive pairs of separating planes in the feature space of the input examples, building a piecewise linear separating surface. The procedure can also be considered a neural network training algorithm [9]. While successful [93], the diagnostic results still depended on the ability to subjectively assign values to input features, and were therefore difficult to replicate. Further, the classifier employed was relatively complex, employing four pairs of planes in 9-space. Subsequent work by ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67. North--Holland, Amsterdam, 1992.


Mathematical Programming Approaches To Machine Learning And Data.. - Bradley (1998)   (1 citation)  (Correct)

.... sensitive to outliers such as those occurring when the underlying data distributions have pronounced tails, hence (6) has a similar effect to that of robust regression [78], [75, pp. 82-87]. The formulation (6) is equivalent to the following robust linear programming formulation (RLP) proposed in [7] and effectively used to solve problems from real world domains [109]: $\min_{w,\gamma,y,z} \{ \frac{e'y}{m} + \frac{e'z}{k} \mid -Aw + e\gamma + e \le y,\ Bw - e\gamma + e \le z,\ y \ge 0,\ z \ge 0 \}$ (7). Non symmetric misclassification costs can easily be handled by the RLP formulation (7) by ....
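
As a rough illustration of what the RLP objective quoted above measures, the sketch below evaluates it for a fixed plane. It assumes the standard reading of the formulation: rows of A should satisfy w.x >= gamma + 1, rows of B should satisfy w.x <= gamma - 1, and for a fixed (w, gamma) the smallest feasible slacks y and z are the shortfalls from those inequalities. The function name `rlp_objective` and the toy data are hypothetical; this evaluates the objective only and does not solve the linear program.

```python
def rlp_objective(A, B, w, gamma):
    """Average violation of the plane w.x = gamma, one average per class."""
    def dot(row):
        return sum(wi * xi for wi, xi in zip(w, row))

    # Rows of A should satisfy w.x >= gamma + 1; y_i is the shortfall.
    y = [max(0.0, -dot(a) + gamma + 1.0) for a in A]
    # Rows of B should satisfy w.x <= gamma - 1; z_j is the excess.
    z = [max(0.0, dot(b) - gamma + 1.0) for b in B]
    # e'y / m + e'z / k, with m = len(A) and k = len(B).
    return sum(y) / len(A) + sum(z) / len(B)


# Hypothetical toy data: two points per class in the plane.
A = [[3.0, 3.0], [4.0, 2.0]]
B = [[0.0, 0.0], [1.0, 0.0]]

separated = rlp_objective(A, B, w=[1.0, 1.0], gamma=3.0)  # both margins met
violated = rlp_objective(A, B, w=[1.0, 1.0], gamma=5.5)   # A falls short
```

A linear programming solver would search over (w, gamma) for the plane that minimizes this quantity; an objective of zero means both classes clear their margins.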

.... true system as a testing set [75]. The linear systems used are based upon ideas related to signal processing [69, 155] and more specifically to an example in [2, Equation (8)]. We consider the following true signal $g(t): [0, 1] \to R$: $g(t) = \sum_{j=1}^{3} x_j e^{-a_j t}$, $t \in [0, 1]$, $a = [0\ 4\ 7]'$, $x = [0.5\ 2.0\ -1.5]'$ (87). We assume that the true signal g(t) cannot be sampled precisely, but that the following observed signal can be sampled: $\tilde{g}(t) = (g(t) + \text{error})$, sampled at times $t_i = i\,\Delta t$, $\Delta t = 0.04$, $i = 0, 1, \dots, 25$ (88). We further assume that we do not ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


A Fuzzy-Genetic Approach to Breast Cancer Diagnosis - Pena-Reyes, Sipper (1999)   (Correct)

....2 # 683 # 1 5 5 # 4 # 2 1 4 # 8 # 3 1 4 # 8 # 9 1 1 # 1 Diagnostic Benign Benign # Malignant Note that the diagnostics do not provide any information about the degree of benignity or malignancy. There are several studies based on this database. Bennett and Mangasarian [3] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. ....

Bennett KP, Mangasarian OL. Neural network training via linear programming. In: Pardalos PM, editor. Advances in Optimization and Parallel Computing. Elsevier, 1992:56 -- 7.


A Fuzzy-Genetic Approach to Breast Cancer Diagnosis - Pena-Reyes, Sipper (1998)   (Correct)

....1 benign . 683 4 8 8 Delta Delta Delta 1 malignant Note that the diagnostics do not provide any information about the degree of benignity or malignancy. There are several studies based on this database. Bennett and Mangasarian [2] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--57. Elsevier Science, 1992.


Designing Breast Cancer Diagnostic Systems via a Hybrid.. - Peña-Reyes, Sipper (1999)   (Correct)

....is an outpatient procedure that involves using a small gauge needle to extract fluid directly from a breast mass. 2 Note that the diagnostics do not provide any information about the degree of benignity or malignancy. There are several studies based on this database. Bennett and Mangasarian [5] used linear programming techniques, obtaining a 99.6% classification rate on 487 cases (the reduced database available at the time). However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. ....

K. P. Bennett and O. L. Mangasarian, "Neural network training via linear programming," in Advances in Optimization and Parallel Computing, P. M. Pardalos, Ed., pp. 56--57. Elsevier Science, 1992.


Evolving a Fuzzy System for Breast Cancer Diagnosis - Pena-Reyes (1997)   (Correct)

....degree of benignity or malignancy. There are several studies based on this database, usually with data divided into two sets: training and test. The training set is used for system synthesis (i.e. finding good parameters) and the test set is used for verification purposes. Bennett and Mangasarian [19] used linear programming techniques, obtaining 100% classification on the training set and 98.3% on the test set. However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. Kermani et al. [20] ....

K.P. Bennett and O.L. Mangasarian, "Neural network training via linear programming," in Advances in Optimization and Parallel Computing, P.M. Pardalos, Ed., pp. 56--57. Elsevier Science, 1992.


Evolving Fuzzy Rules for Breast Cancer Diagnosis - Peña-Reyes, Sipper (1998)   (2 citations)  (Correct)

....degree of benignity or malignancy. There are several studies based on this database, usually with data divided into two sets: training and test. The training set is used for system synthesis (i.e. finding good parameters) and the test set is used for verification purposes. Bennett and Mangasarian [10] used linear programming techniques, obtaining 100% classification on the training set and 98.3% on the test set. However, their solution exhibits little understandability, i.e. diagnostic decisions are essentially black boxes, with no explanation as to how they were attained. Kermani et al. [11] ....

....1 (Benign) or 2 (Malignant). Table 2 delineates the parameters encoding, which together comprise one individual's genome.

Table 2: Parameters encoding of an individual's genome. Total genome length is 64 + 18N_r, where N_r denotes the number of rules.

Parameter   Values   Bits   Qty     Total bits
P           [1-10]   4      9       36
d           [0-7]    3      9       27
A           [0-3]    2      9N_r    18N_r
C           (1,2)    1      1       1

To evolve the fuzzy inference system, we used a simple genetic algorithm, with a fixed population size of 200 individuals, no generational overlap, and fitness-proportionate selection. As for the fitness function of the genetic ....
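
The genome-length total quoted with Table 2 can be checked with a few lines of arithmetic. The sketch below multiplies bits per parameter by the number of parameters for each row and sums the results. The function name `genome_length` is hypothetical, and the row values are read from the flattened table in the snippet, so treat them as an assumption about the original layout.

```python
def genome_length(n_rules):
    """Total bits in one genome, summing (bits x quantity) over Table 2 rows."""
    # parameter: (bits per value, number of values), as read from Table 2
    fields = {
        "P": (4, 9),
        "d": (3, 9),
        "A": (2, 9 * n_rules),  # the only row that grows with the rule count
        "C": (1, 1),
    }
    return sum(bits * qty for bits, qty in fields.values())


lengths = {n: genome_length(n) for n in (1, 2, 3)}
```

The fixed rows contribute 36 + 27 + 1 = 64 bits and each rule adds 18 bits, which matches a total of 64 + 18 per rule.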

K.P. Bennett and O.L. Mangasarian, "Neural network training via linear programming," in Advances in Optimization and Parallel Computing, P.M. Pardalos, Ed., pp. 56--57. Elsevier Science, 1992.


On the Solution of the Parity Problem By a Single Hidden Layer.. - Setiono   (Correct)

.... architecture, it has been previously thought that N hidden units are required to solve the N-bit parity problem [9]. Many experiments with neural network construction and training algorithms were conducted with the assumption that indeed N hidden units are necessary for solving the problem [1, 2, 5, 8]. A neural network construction algorithm that employs the quasi-Newton method for minimizing the error function was recently proposed by Setiono and Hui [10]. This algorithm successfully built networks having fewer than N hidden units that were capable of solving the N-bit parity problem for small ....

K.P. Bennett and O.L. Mangasarian, Neural network training via linear programming. in: P.M. Pardalos, ed., Advances in Optimization and Parallel Computing (Elsevier Science Publishers B.V., Amsterdam, 1992) 56--67.


Feature Transformation and Multivariate Decision Tree Induction - Liu, Setiono   (Correct)

.... hyperplane (a set of features is needed). It is known that selecting the best set of features for test at a node is an NP-complete problem [9]. In order to find an optimal hyperplane for a multivariate test, researchers offered various suggestions such as Regression [2], Linear Programming [1, 3], Randomization [15] and Simulated Annealing [9]. In the following, we examine the conventional procedure of MDT building with reference to UDT building. 2.1 Common aspects of the MDT building. Many issues in building MDTs are the same as those for UDTs. Both types of trees are built from ....

K.P. Bennett and O.L. Mangasarian. Neural network training via linear programming. In P.M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67. Elsevier Science Publishers B.V., Amsterdam, 1992.


Extracting Rules From Pruned Neural Networks for Breast Cancer.. - Setiono (1996)   (15 citations)  (Correct)

.... problem. The database for the Wisconsin Breast Cancer Diagnosis is available publicly via anonymous ftp from the University of California Irvine repository [12]. This data set has been used as the test data for several studies on pattern classification methods using linear programming techniques [3, 9, 20] and statistical techniques [21]. Each pattern in the data set has nine attributes. The nine measurements taken from fine needle aspirates from human breast tissues correspond to cytological characteristics of benign or of malignant samples. These are A 1 . clump thickness, A 2 . uniformity of ....

K.P. Bennett and O.L. Mangasarian, Neural network training via linear programming, in: P.M. Pardalos ed., Advances in Optimization and Parallel Computing, (Elsevier Science Publishers B.V., Amsterdam, 1990) 56-67.


NeuroLinear: From neural networks to oblique decision rules - Setiono, Liu (1997)   (3 citations)  (Correct)

....We describe next in detail how the rules are extracted by NeuroLinear on two datasets. A. Detailed analysis 1: The University of Wisconsin Breast Cancer Dataset. This data set has been used as the test data for several studies on pattern classification methods using linear programming techniques [1, 13] and statistical techniques [23]. Each pattern is described by nine attributes. The nine measurements taken from fine needle aspirates from human breast tissues correspond to cytological characteristics of a benign or of a malignant pattern. They are summarized in Table 8. Table 8 The 9 ....

....an addition of 1 input unit to represent hidden unit bias values, the number of input units was 26. A network with 4 hidden units was trained and pruned with a minimum accuracy on the training set of 85 . This figure for minimum accuracy was based on reported experimental results on this dataset [1]. The pruned network had only 2 hidden units and 13 connections left. Seven inputs I 1 ; I 5 ; I 7 ; I 9 ; I 13 ; I 14 and I 16 were connected to the first hidden unit. Inputs I 3 ; I 15 ; I 18 and I 22 were connected to the second hidden unit. The accuracy rates of this network are 88.38 on the ....

K.P. Bennett and O.L. Mangasarian, Neural network training via linear programming, in: P.M. Pardalos, ed., Advances in Optimization and Parallel Computing, (Elsevier Science Publishers B.V., Amsterdam, 1992) 56--67.


Generating Concise and Accurate Classification Rules for Breast.. - Setiono   (Correct)

....as needed to achieve the minimum required accuracy. In this paper, as in our previous work, we adopt the first approach. The number of hidden units needed for classifying the samples of the WBCD data set using a single hidden layer feedforward neural networks is as few as three and as many as nine [4, 14, 19]. Network connections link units in the input layer to units in the hidden layer and units in the hidden layer to those in the output layer. There is no direct connections between units in the input layer and units in the output layer. Given this network structure, it is natural to decompose the ....

K.P. Bennett and O.L. Mangasarian, Neural network training via linear programming, in: P.M. Pardalos ed., Advances in Optimization and Parallel Computing, (Elsevier Science Publishers B.V., Amsterdam, 1990) 56--67.


Neural Network Constructive Algorithms: Trading Generalization.. - Smieja (1991)   (10 citations)  (Correct)

....the input vector space, using the Minimal Resources method of Ruján and Marchand. Both approaches basically decide on the separation hyperplanes to be used for this training set, and then transfer the results simply to a network architecture. 3.1 The Multisurface Method. Bennett and Mangasarian [3] approach the problem of neural network mapping from the following angle: a set of vectors exists (the training set) and we are required to classify them into one of two groups. It may (in general) not be possible to separate the vectors with a single hyperplane (linear separability) but by ....

....one, by MR may allow it to find solutions faster than MSM, but for small problems. However, it may not be so easy on scaling the systems, since MR relies on a search procedure through an exponentially (with N I ) increasing number of planes. The scaling dependence of MSM was not given in [3]. One notices too that the MSM representations, in the hidden layer of the neural network finally constructed, are of similar form to the interim Tiling representations. Given any layer in the Tiling construction, observe the number of representations: The master unit corresponds to a hyperplane ....

K.P. Bennett and O.L. Mangasarian. Neural network training via linear programming. Technical Report 948, Computer sciences department, University of Wisconsin-Madison, July 1990.


Multicategory Classification by Support Vector Machines - Bredensteiner, Bennett (1999)   (2 citations)  Self-citation (Bennett)   (Correct)

.... e # 0. 4) In general it is not always possible for a single linear function to completely separate two given sets of points. Thus, it is important to find the linear function that discriminates best between the two sets according to some error minimization criterion. Bennett and Mangasarian [4] minimize the average magnitude of the misclassification errors in the construction of their following robust linear programming problem (RLP) min w,#,y,z 1 #1 e T y 1 #2 e T z subject to y A 1 w #e e # 0 z A 2 w #e e # 0 y # 0, z # 0 (5) where # 1 0 ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


Feature Minimization within Decision Trees - Bredensteiner, Bennett (1996)   (12 citations)  Self-citation (Bennett)   (Correct)

....functions. In this paper, we apply the feature minimization method to two error functions. The first error function minimizes the average magnitude of misclassified points within each class. The underlying problem without feature minimization is a linear program. This robust linear program (RLP) [4] has been used for decision tree construction [1]. RLP combined with the greedy sequential backward elimination method for feature minimization, a simplified version of SBE, forms the basis of a breast cancer diagnosis system [28, 27]. The second error function is a slight modification of the ....

....be applied to algorithms that minimize the number of points misclassified such as [2, 18] or to other successful linear programming approaches [12, 25], but we leave these extensions for future work. 2.1 Feature Minimization Applied to RLP. The following robust linear programming problem, RLP [4], minimizes a weighted average of the sum of the distances from the misclassified points to the separating plane: $\min_{w,\gamma,u,v} \frac{1}{m} eu + \frac{1}{k} ev$ subject to $u + Aw - e\gamma - e \ge 0$, $v - Bw + e\gamma - e \ge 0$, $u \ge 0$, $v \ge 0$ (4). We are interested in minimizing the number of features at ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


Feature Minimization within Decision Trees - Bredensteiner, Bennett (1995)   (12 citations)  Self-citation (Bennett)   (Correct)

....minimization problem can be applied to many different error functions. In this paper, we use an error function that minimizes the average magnitude of misclassified points within each class. The underlying problem without feature minimization is a linear program. This robust linear program (RLP) [3] has been used for decision tree construction [1]. RLP combined with the greedy sequential backward elimination method for feature minimization forms the basis of a breast cancer diagnosis system [17, 16]. Our feature minimization method could also be applied to algorithms that minimize the number ....

....feature minimization method could also be applied to algorithms that minimize the number of points misclassified such as [2, 11] or to other successful linear programming approaches [10, 15], but we leave these extensions for future work. The following robust linear programming problem, RLP [3], minimizes a weighted average of the sum of the distances from the misclassified points to the separating plane: $\min_{w,\gamma,u,v} \frac{1}{m} eu + \frac{1}{k} ev$ subject to $u + Aw - e\gamma - e \ge 0$, $v - Bw + e\gamma - e \ge 0$, $u \ge 0$, $v \ge 0$ (4). We are interested in minimizing the number of features at ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


Serial and Parallel Multicategory Discrimination - Bennett, Mangasarian (1994)   (3 citations)  Self-citation (Bennett Mangasarian)   (Correct)

....solvable piecewise quadratic minimization (6) has a zero minimum, in which case any solution $(w^i, \gamma^i)$, $i = 1, \dots, k$, provides a piecewise linear separation as characterized in Definition 2.1. As was the case for linear and piecewise linear separation of two sets by linear programming [3, 4], it is important to determine when the useless null solution occurs for sets $A^i$, $i = 1, \dots, k$. [Figure 2, panels: first, second, and third piecewise linear separation.] Figure 2: Geometric depiction of decision tree ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


Mathematical Programming in Neural Networks - Mangasarian (1993)   (21 citations)  Self-citation (Mangasarian)   (Correct)

.... however, it should be noted that even before Minsky and Papert proposed their classical XOR counterexample, a linear programming based piecewise linear separator was proposed in 1968 [28] that could easily and correctly handle this problem, and which in fact can be represented as a neural network [7]. See Figure 14 and discussion following Algorithm 2.2 below. We shall now use this example to motivate a general multisurface method (MSM) for separating the sets A and B of the XOR example or any other disjoint sets A and B in R^n and will present a neural network representation of this ....

.... A i 1 : fA i j A i w i i 1 g; B i 1 : fB i j B i w i i 0 g (c) Increment tree level: i 1 i go to (b) We remark that the separation achieved by the MSM algorithm and depicted in Figure 12 and 13 can also be represented as a single hidden layer neural network [7] that is depicted in Figure 14, which we interpret now. The first pair of hidden LTU s with their incoming arcs represent the planes w 1 x = 1 1 and wx 1 = 1 0 : Because 1 1 1 0 ; only one of them can be activated by a specific x: Hence when LTU1 fires, LTU2 does not, and since ....
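
To make the claim that XOR can be handled by a single hidden layer of linear threshold units (LTUs) concrete, here is a minimal sketch. It is not the MSM algorithm and not the network of Figure 14: the weights and thresholds are chosen by hand for illustration, and the names `ltu` and `xor_net` are hypothetical. Two hidden units define the parallel planes x1 + x2 = 0.5 and x1 + x2 = 1.5, and the output unit fires only for points lying between them.

```python
def ltu(weights, threshold, inputs):
    """Linear threshold unit: returns 1 when weights.inputs exceeds threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else 0


def xor_net(x1, x2):
    """Single hidden layer of two LTUs with hand-picked (assumed) weights."""
    h1 = ltu([1, 1], 0.5, [x1, x2])  # fires above the plane x1 + x2 = 0.5
    h2 = ltu([1, 1], 1.5, [x1, x2])  # fires above the plane x1 + x2 = 1.5
    # Output fires only when h1 is on and h2 is off, i.e. between the planes.
    return ltu([1, -1], 0.5, [h1, h2])


truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

In this sketch the points (0, 1) and (1, 0) fall between the two planes and are assigned to one class, while (0, 0) and (1, 1) fall outside and are assigned to the other.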

[Article contains additional citation context not shown here]

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


Mathematical Programming in Data Mining - Mangasarian (1996)   (18 citations)  Self-citation (Mangasarian)   (Correct)

.... subject to constraints, is a broad discipline that has been applied to a great variety of theoretical and applied problems such as operations research [29, 54], network problems [60, 53], game theory and economics [71, 35], engineering mechanics [57, 37], and more recently to machine learning [3, 61, 23, 48, 46]. In this paper we describe three recent mathematical programming based developments that are relevant to data mining: feature selection [45, 10], clustering [11], and robust representation [67]. We note at the outset that we do not plan to survey either the fields of data mining or mathematical ....

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.


A Parametric Optimization Method for Machine Learning - Bennett, Bredensteiner (1995)   (6 citations)  Self-citation (Bennett)   (Correct)

....Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180. Email bennek@rpi.edu, bredee@rpi.edu. This material is based on research supported by National Science Foundation Grant 949427. can be such that it minimizes the distances of the misclassified points from the separating plane [BM92]. In misclassification minimization the problem is to minimize the number of misclassified points. For a given problem, different error functions may result in better (or worse) separators in terms of generalization. In Figure 1, the plane obtained by minimizing the distances of the misclassified ....

....Thus we propose a hybrid approach (parametric misclassification minimization) that identifies the outliers and minimizes the distances of the remaining misclassified points. This parametric approach includes the linear program that minimizes the average misclassification error as a subproblem [BM92]. This research investigates mathematical programming methods for constructing decisions in decision trees. Classical decision tree algorithms such as CART [BFOS84] and ID3 [Qui84] use exhaustive search to find decisions based on a single input attribute. When the decisions are multivariate ....

[Article contains additional citation context not shown here]

K. P. Bennett and O. L. Mangasarian. Neural network training via linear programming. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 56--67, Amsterdam, 1992. North Holland.

CiteSeer.IST - Copyright Penn State and NEC