Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 220 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction. 1. Introduction: Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
RSVM: Reduced support vector machines
Data Mining Institute, Computer Sciences Department, University of Wisconsin, 2001
Cited by 160 (19 self)
An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1%
Smoothing Methods for Convex Inequalities and Linear Complementarity Problems
Mathematical Programming, 1993
Cited by 73 (6 self)
A smooth approximation $p(x, \alpha)$ to the plus function $\max\{x, 0\}$ is obtained by integrating the sigmoid function $1/(1 + e^{-\alpha x})$, commonly used in neural networks. By means of this approximation, linear and convex inequalities are converted into smooth, convex unconstrained minimization problems, the solution of which approximates the solution of the original problem to a high degree of accuracy for $\alpha$ sufficiently large. In the special case when a Slater constraint qualification is satisfied, an exact solution can be obtained for finite $\alpha$. Speedup over MINOS 5.4 was as high as 515 times for linear inequalities of size $1000 \times 1000$, and 580 times for convex inequalities with 400 variables. Linear complementarity problems are converted into a system of smooth nonlinear equations and are solved by a quadratically convergent Newton method. For monotone LCPs with as many as 400 variables, the proposed approach was as much as 85 times faster than Lemke's method. Key Words: Smo...
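The smoothing idea in this abstract can be checked numerically. Integrating the sigmoid $1/(1 + e^{-\alpha x})$ gives the closed form $p(x, \alpha) = x + \frac{1}{\alpha}\log(1 + e^{-\alpha x})$. The sketch below evaluates it in an overflow-safe way. The function name `smooth_plus` and the sample values of alpha are illustrative assumptions, not taken from the paper.

```python
import math


def smooth_plus(x: float, alpha: float) -> float:
    """Smooth approximation to max(x, 0): x + log(1 + exp(-alpha*x)) / alpha.

    Evaluated as max(x, 0) + log1p(exp(-alpha*|x|)) / alpha, which is the
    same value algebraically but never exponentiates a large positive number.
    """
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha


if __name__ == "__main__":
    # The gap to the plus function is largest at x = 0, where it equals
    # log(2) / alpha, so it shrinks as alpha grows.
    for alpha in (1.0, 10.0, 100.0):
        gap = smooth_plus(0.0, alpha) - max(0.0, 0.0)
        print(f"alpha={alpha:6.1f}  gap at 0 = {gap:.6f}")
```

Because the approximation lies on or above the plus function and its derivative is the sigmoid itself, a larger alpha trades a closer fit against a steeper transition near zero.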
A New Class Of Incremental Gradient Methods For Least Squares Problems
SIAM J. Optim., 1996
Cited by 66 (2 self)
The LMS method for linear least squares problems differs from the steepest descent method in that it processes data blocks one-by-one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit, and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well-suited for neural network training problems with large data sets. Furthermore, these methods allow the effective use of scaling based for example on diagonal or other approximations of the Hessian matrix. 1 Research supported by NSF under Grant 9300494DMI. 2 Dept. of Electrical Engineering and Computer Science, M...
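The two extremes contrasted in this abstract can be sketched on a tiny linear least squares problem. The code below is only an illustration of batch steepest descent versus a row-by-row LMS pass. It does not implement the one-parameter class or the hybrid methods the paper proposes, and the function names, data, and step size are assumptions chosen for the example.

```python
def residual(row, x, target):
    """Residual a.x - b for one data row."""
    return sum(a * xi for a, xi in zip(row, x)) - target


def steepest_descent(rows, targets, x, step, iterations):
    """Batch method: one update per pass, using the gradient of the full
    objective 0.5 * sum of squared residuals."""
    for _ in range(iterations):
        grad = [0.0] * len(x)
        for row, target in zip(rows, targets):
            r = residual(row, x, target)
            for j, a in enumerate(row):
                grad[j] += r * a
        x = [xj - step * gj for xj, gj in zip(x, grad)]
    return x


def lms(rows, targets, x, step, passes):
    """Incremental method: the parameter vector is adjusted after every
    data block (here, a single row)."""
    for _ in range(passes):
        for row, target in zip(rows, targets):
            r = residual(row, x, target)
            x = [xj - step * r * a for xj, a in zip(x, row)]
    return x


if __name__ == "__main__":
    rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    targets = [1.0, 2.0, 3.0]  # consistent system with solution (1, 2)
    print(steepest_descent(rows, targets, [0.0, 0.0], 0.1, 500))
    print(lms(rows, targets, [0.0, 0.0], 0.1, 500))
```

The data here are consistent, so both variants reach the same solution. With inconsistent data, a constant-step LMS pass generally keeps oscillating near the least squares solution rather than settling on it, which is the kind of late-stage behavior the hybrid methods in the abstract are meant to address.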
Mathematical Programming for Data Mining: Formulations and Challenges
INFORMS Journal on Computing, 1998
Cited by 61 (0 self)
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. Keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS Journal on Computing, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 9801, Computer Sciences Department, University of Wi...
A convergent incremental gradient method with constant step size
SIAM J. Optim., 2004
Cited by 60 (2 self)
An incremental gradient method for minimizing a sum of continuously differentiable functions is presented. The method requires a single gradient evaluation per iteration and uses a constant step size. For the case that the gradient is bounded and Lipschitz continuous, we show that the method visits regions in which the gradient is small infinitely often. Under certain unimodality assumptions, global convergence is established. In the quadratic case, a global linear rate of convergence is shown. The method is applied to distributed optimization problems arising in wireless sensor networks, and numerical experiments compare the new method with the standard incremental gradient method.
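One way to obtain a constant-step incremental scheme with a single gradient evaluation per iteration is to keep the most recent gradient of each component in a table and step along their average. The sketch below follows that idea on a one-dimensional sum of quadratics. Whether this matches the paper's method in detail is an assumption, and the function names, test problem, and step size are hypothetical choices for illustration. The standard cyclic incremental gradient method is included for contrast.

```python
def aggregated_incremental_gradient(grads, x, step, iterations):
    """Step along the average of stored component gradients, refreshing
    one stored gradient per iteration (cyclic order)."""
    n = len(grads)
    table = [g(x) for g in grads]  # initialization: one gradient per component
    total = sum(table)
    for k in range(iterations):
        i = k % n
        new = grads[i](x)          # the only gradient evaluation this iteration
        total += new - table[i]
        table[i] = new
        x -= step * total / n
    return x


def standard_incremental_gradient(grads, x, step, iterations):
    """Classical cyclic scheme: step along one component gradient at a time."""
    n = len(grads)
    for k in range(iterations):
        x -= step * grads[k % n](x)
    return x


if __name__ == "__main__":
    # Components f_i(x) = 0.5 * (x - c_i)^2; their sum is minimized at mean(c).
    centers = [1.0, 2.0, 6.0]
    grads = [lambda x, c=c: x - c for c in centers]
    print(aggregated_incremental_gradient(grads, 0.0, 0.1, 3000))
    print(standard_incremental_gradient(grads, 0.0, 0.1, 3000))
```

On this toy problem the aggregated variant settles at the minimizer 3.0, while the standard method with the same constant step ends each cycle at a point offset from it, since every individual step pulls toward a single component's minimizer.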
Nuclear Feature Extraction For Breast Tumor Diagnosis
1993
Cited by 55 (7 self)
Interactive image processing techniques, along with a linear-programming-based inductive classifier, have been used to create a highly accurate system for diagnosis of breast tumors. A small fraction of a fine needle aspirate slide is selected and digitized. With an interactive interface, the user initializes active contour models, known as snakes, near the boundaries of a set of cell nuclei. The customized snakes are deformed to the exact shape of the nuclei. This allows for precise, automated analysis of nuclear size, shape and texture. Ten such features are computed for each nucleus, and the mean value, largest (or "worst") value and standard error of each feature are found over the range of isolated cells. After 569 images were analyzed in this fashion, different combinations of features were tested to find those which best separate benign from malignant samples. Tenfold cross-validation accuracy of 97% was achieved using a single separating plane on three of the thirty features: ...
SSVM: A Smooth Support Vector Machine for Classification
Computational Optimization and Applications, 1999
Cited by 53 (4 self)
Smoothing methods, extensively used for solving important mathematical programming problems and applications, are applied here to generate and solve an unconstrained smooth reformulation of the support vector machine for pattern classification using a completely arbitrary kernel. We term such reformulation a smooth support vector machine (SSVM). A fast Newton-Armijo algorithm for solving the SSVM converges globally and quadratically. Numerical results and comparisons are given to demonstrate the effectiveness and speed of the algorithm. On six publicly available datasets, tenfold cross-validation correctness of SSVM was the highest compared with four other methods as well as the fastest. On larger problems, SSVM was comparable or faster than SVM light [17], SOR [23] and SMO [27]. SSVM can also generate a highly nonlinear separating surface such as a checkerboard.
Approximation Accuracy, Gradient Methods, and Error Bound for Structured Convex Optimization
2009
Cited by 38 (1 self)
Convex optimization problems arising in applications, possibly as approximations of intractable problems, are often structured and large scale. When the data are noisy, it is of interest to bound the solution error relative to the (unknown) solution of the original noiseless problem. Related to this is an error bound for the linear convergence analysis of first-order gradient methods for solving these problems. Example applications include compressed sensing, variable selection in regression, TV-regularized image denoising, and sensor network localization.