by Prasanth B. Nair, Andy J. Keane, Carla Brodley, Andrea Danyluk
Journal of Machine Learning Research
http://www.jmlr.org/papers/volume3/nair02a/nair02a.ps.gz
Abstract:
We present some greedy learning algorithms for building sparse nonlinear regression and classification models from observational data using Mercer kernels. Our objective is to develop efficient numerical schemes for reducing the training and runtime complexities of kernel-based algorithms applied to large datasets. In the spirit of Natarajan's greedy algorithm (Natarajan, 1995), we iteratively minimize the L2 loss function subject to a specified constraint on the degree of sparsity required of the final model until a specified stopping criterion is reached. We discuss various greedy criteria for basis selection and numerical schemes for improving the robustness and computational efficiency. Subsequently, algorithms based on residual minimization and thin QR factorization are presented for constructing sparse regression and classification models. During the course of the incremental model construction, the algorithms are terminated using model selection principles such as the minimum description length (MDL) and Akaike's information criterion (AIC). Finally, experimental results on benchmark data are presented to demonstrate the competitiveness of the algorithms developed in this paper.
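The greedy idea in the abstract (select one kernel basis function at a time so as to reduce the L2 residual, up to a sparsity limit) can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `gaussian_kernel` and `greedy_sparse_fit`, the Gaussian kernel, the correlation-based selection score, and the relative-residual tolerance are all assumptions made for illustration. The sketch refits with a plain least-squares solve at every step instead of the thin QR updates mentioned in the abstract, and it does not implement MDL or AIC stopping.

```python
import numpy as np


def gaussian_kernel(X, Z, width=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z (assumed kernel choice)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def greedy_sparse_fit(K, y, max_basis, tol=1e-8):
    """Greedily pick columns of K to reduce the L2 residual of y.

    Hypothetical helper: at each step, choose the unused column most
    correlated with the current residual, then refit all selected
    weights by least squares.
    """
    n = K.shape[1]
    available = np.ones(n, dtype=bool)
    col_norms = np.linalg.norm(K, axis=0)
    selected = []
    weights = np.zeros(0)
    residual = y.astype(float).copy()

    for _ in range(min(max_basis, n)):
        # Normalised correlation of each candidate column with the residual.
        scores = np.abs(K.T @ residual) / col_norms
        scores[~available] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        available[j] = False

        # Refit on the selected columns (a QR update would avoid this full solve).
        weights, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
        residual = y - K[:, selected] @ weights

        # Assumed stopping rule: relative residual below a tolerance.
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break

    return selected, weights
```

For training inputs `X` (one row per sample) and targets `y`, a call such as `greedy_sparse_fit(gaussian_kernel(X, X), y, max_basis=8)` returns the indices of the chosen basis centres and their weights. Predictions at new points `Z` are then `gaussian_kernel(Z, X[selected]) @ weights`.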
Citations
1004 | Experiments with a new boosting algorithm – Schapire - 1996
718 | Pattern recognition and neural networks – Ripley - 1996
543 | Additive logistic regression: a statistical view of boosting – Friedman, Hastie, et al.
441 | Atomic decomposition by basis pursuit – Chen, Donoho, et al. - 1999
204 | Sparse Bayesian learning and the relevance vector machine – Tipping
176 | Greedy function approximation: A gradient boosting machine – Friedman
172 | New Support Vector Algorithms – Schölkopf, Smola, et al.
155 | Regularization networks and support vector machines – Evgeniou, Pontil, et al. - 2000
149 | An equivalence between sparse approximation and Support Vector Machines – Girosi - 1998
132 | Parallel preconditioning with sparse approximate inverses – Grote, Huckle - 1997
123 | Using the Nyström Method to Speed Up Kernel Machines – Williams, Seeger - 2001
94 | Sparse greedy matrix approximation for machine learning – Smola, Schölkopf - 2000
83 | Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization – Daniel, Gragg, et al. - 1976
75 | Matching Pursuit in a Time-Frequency Dictionary – Mallat, Zhang - 1993
71 | Approximate inverse preconditioners via sparse-sparse iterations – Chow - 1998
63 | The Nature of Statistical Learning – Vapnik - 1996
59 | Sparse approximate solutions to linear systems – Natarajan - 1995
54 | Sparse greedy Gaussian process regression – Smola, Bartlett - 2001
37 | Approximate inverse techniques for block-partitioned matrices – Chow, Saad - 1997
31 | On selecting models for nonlinear time series – Judd, Mees - 1995
22 | Further analysis of the data by Akaike’s information criterion and the finite corrections. Commun. Stat. Theory Methods A7:13–26 – Sugiura - 1978
17 | Adaptive greedy techniques for approximate solution of large RBF systems – Schaback, Wendland
14 | On the optimality of the Backward Greedy Algorithm for the subset selection problem – Couvreur, Bresler - 2000
13 | Orthogonal least squares learning for radial basis function networks – Chen, Cowan, et al. - 1991
12 | Comparison of basis selection methods – Adler, Rao, et al. - 1996
5 | Boosting with the L2 loss: regression and classification – Bühlmann, Yu - 2001
5 | Local regularization assisted orthogonal least squares regression – Chen - 2001
5 | Algorithm 686: FORTRAN Subroutines for Updating the QR Decomposition – Reichel, Gragg - 1990
2 | On learning functions from noise-free and noisy examples via Occam's razor – Natarajan - 1999
1 | Model selection and the principle of minimum description length – H - 2001
1 | Boosting with the L2 loss: regression and classification. Research Report No. 98, Seminar für Statistik – Bühlmann, Yu - 2001