An introduction to kernel-based learning algorithms
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
Cited by 280 (46 self)
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
Classifying Single Trial EEG: Towards Brain Computer Interfacing
, 2002
Cited by 63 (28 self)
Driven by the progress in the field of single-trial analysis of EEG, there is a growing interest in brain computer interfaces (BCIs), i.e., systems that enable human subjects to control a computer only by means of their brain signals. In a pseudo-online simulation our BCI detects upcoming finger movements in a natural keyboard typing condition and predicts their laterality.
Boosting Bit Rates and Error Detection for the Classification of Fast-paced Motor Commands Based on Single-trial EEG Analysis
, 2003
Cited by 43 (22 self)
Brain-Computer-Interfaces (BCI) involve two coupled adapting systems: the human subject and the computer. In developing our BCI, our goal was to minimize the need for subject training and to impose the major learning load on the computer. To this end, we use behavioral paradigms that exploit single-trial EEG potentials preceding voluntary finger movements. Here, we report recent results on the basic physiology of such pre-movement event-related potentials (ERP): 1) We predict the laterality of imminent left vs. right hand finger movements in a natural keyboard typing condition and demonstrate that a single-trial classification based on the lateralized Bereitschaftspotential (BP) achieves good accuracies even at a pace as fast as 2 taps per second. Results for 4 out of 8 subjects reached a peak information transfer rate of more than 15 bits per minute (bpm); the 4 other subjects reached 6-10 bpm. 2) We detect cerebral error potentials from single false-response trials in a forced-choice task, reflecting the subject's recognition of an erroneous response. Based on a specifically tailored classification procedure that limits the rate of false positives at, e.g., 2%, the algorithm manages to detect 85% of error trials in 7/8 subjects. Thus, concatenating a primary single-trial BP-paradigm involving finger classification feedback with such secondary error detection could serve as an efficient on-line confirmation/correction tool for improvement of bit rates in a future BCI setting. As the present variant of the Berlin BCI (BBCI) is designed to achieve fast classifications in normally behaving subjects, it opens a new perspective for assistance of action control in time-critical behavioral contexts; the potential transfer to paralysed patients will require further study.
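The bit rates quoted in this abstract depend on both classification accuracy and decision pace. The following is a minimal sketch of how such a rate can be computed, assuming the commonly used Wolpaw definition of information transfer rate; the abstract does not say which definition the authors applied, and the function names `bits_per_decision` and `bits_per_minute` are hypothetical.

```python
import math


def bits_per_decision(accuracy, n_classes=2):
    """Information per decision in bits (Wolpaw definition, assumed here).

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_classes < 2:
        raise ValueError("need at least two classes")
    bits = math.log2(n_classes)
    # The P*log2(P) terms are taken as 0 in the limits P -> 0 and P -> 1.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def bits_per_minute(accuracy, decisions_per_minute, n_classes=2):
    """Scale the per-decision information by the decision pace."""
    return bits_per_decision(accuracy, n_classes) * decisions_per_minute
```

Under this definition, and assuming one binary decision per tap, a pace of 2 taps per second (120 decisions per minute) caps the rate at 120 bits per minute for perfect accuracy, and the rate drops to zero at chance level.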
Constructing Descriptive and Discriminative Nonlinear Features: Rayleigh Coefficients in Kernel Feature Spaces
, 2003
Cited by 30 (4 self)
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.
Generalized Spectral Bounds for Sparse LDA
- IN INTERNATIONAL CONFERENCE ON MACHINE LEARNING. ICML’06
, 2006
Cited by 18 (3 self)
We present a discrete spectral framework for the sparse or cardinality-constrained solution of a generalized Rayleigh quotient. This NP-hard combinatorial optimization problem is central to supervised learning tasks such as sparse LDA, feature selection and relevance ranking for classification. We derive a new generalized form of the Inclusion Principle for variational eigenvalue bounds, leading to exact and optimal sparse linear discriminants using branch-and-bound search. An efficient greedy (approximate) technique is also presented. The generalization performance of our sparse LDA algorithms is demonstrated with real-world UCI ML benchmarks and compared to a leading SVM-based gene selection algorithm for cancer classification.
Optimal kernel selection in kernel Fisher discriminant analysis
- In Proceedings of the Twenty-Third International Conference on Machine Learning
, 2006
Cited by 18 (1 self)
In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interior-point methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples.
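To make the KFDA setting concrete, here is a minimal sketch of two-class kernel Fisher discriminant analysis with a single fixed kernel. It illustrates only the baseline method the abstract starts from, not the paper's convex kernel selection procedure. The Gaussian kernel, the ridge term `mu`, and the names `rbf_kernel`, `kfd_fit`, and `kfd_project` are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kfd_fit(X, y, gamma=1.0, mu=1e-3):
    """Two-class KFDA; y must contain the labels 0 and 1.

    Returns expansion coefficients alpha so that the discriminant is
    f(x) = sum_i alpha[i] * k(x, X[i]).
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    N = np.zeros((n, n))          # within-class scatter in kernel form
    means = []                    # kernelized class means
    for c in (0, 1):
        Kc = K[:, y == c]         # columns belonging to class c
        l_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        centering = np.eye(l_c) - np.full((l_c, l_c), 1.0 / l_c)
        N += Kc @ centering @ Kc.T
    # Regularize N (an assumed ridge term) so the linear system is well posed.
    return np.linalg.solve(N + mu * np.eye(n), means[1] - means[0])


def kfd_project(X_new, X_train, alpha, gamma=1.0):
    """Project new points onto the learned discriminant direction."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

With this sign convention, class 1 projects to larger values than class 0 on average; one simple way to classify is to threshold midway between the two projected class means. The result depends on `gamma`, which is the kind of kernel choice the paper sets out to optimize.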
A Fast Iterative Algorithm for Fisher Discriminant using Heterogeneous Kernels
- IN PROCEEDINGS OF THE TWENTY-FIRST INTERNATIONAL CONFERENCE ON MACHINE LEARNING
, 2004
Cited by 12 (2 self)
We propose a fast iterative classification algorithm for Kernel Fisher Discriminant (KFD) using heterogeneous kernel models. In contrast with the standard KFD that requires the user to predefine a kernel function, we incorporate the task of choosing an appropriate kernel into the optimization problem to be solved. The choice of kernel is defined as a linear combination of kernels belonging to a potentially large family of different positive semidefinite kernels. The complexity of our algorithm does not increase significantly with respect to the number of kernels in the kernel family. Experiments on several benchmark datasets demonstrate that generalization performance of the proposed algorithm is not significantly different from that achieved by the standard KFD in which the kernel parameters have been tuned using cross validation. We also
Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel fisher discriminant analysis
- NEURAL COMPUTATION
, 2002
Cited by 12 (4 self)
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the non-convex optimization problem and the choice of the number of hidden units. In Support Vector Machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by mapping the input vector first in a nonlinear way to a high dimensional kernel-induced feature space in which a linear large margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function and the solution follows from a (convex) quadratic programming (QP) problem. In Least Squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least squares cost function and equality instead of inequality constraints and the solution follows from a linear system in the dual space. Implicitly, the least squares formulation corresponds to a regression formulation and is also related to kernel
Efficient Kernel Discriminant Analysis via Spectral Regression
Cited by 12 (2 self)
Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between class covariance and simultaneously minimizing the within class covariance. LDA can be performed either in the original input space or in the reproducing kernel Hilbert space (RKHS) into which data points are mapped, which leads to Kernel Discriminant Analysis (KDA). When the data are highly nonlinearly distributed, KDA can achieve better performance than LDA. However, computing the projective functions in KDA involves eigen-decomposition of the kernel matrix, which is very expensive when there are many training samples. In this paper, we present a new algorithm for kernel discriminant analysis, called Spectral Regression Kernel Discriminant Analysis (SRKDA). By using spectral graph analysis, SRKDA casts discriminant analysis into a regression framework which facilitates both efficient computation and the use of regularization techniques. Specifically, SRKDA only needs to solve a set of regularized regression problems and involves no eigenvector computation, which saves substantial computational cost. Moreover, the new formulation makes it easy to develop an incremental version of the algorithm that can fully reuse the computational results from the existing training samples. Extensive experiments on spoken letter, handwritten digit image and face image data demonstrate the effectiveness and efficiency of the proposed algorithm.
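The regression view described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: it assumes the regression targets are class indicator vectors orthogonalized against the all-ones vector, and that each target is fitted by kernel ridge regression with a Gaussian kernel. The regularizer `delta` and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def class_responses(y):
    """Build c - 1 regression targets for c >= 2 classes.

    Each target is constant within a class and orthogonal to the all-ones
    vector (assumed construction, obtained here with a QR factorization).
    """
    classes = np.unique(y)
    ones = np.ones((len(y), 1))
    # The last indicator is dropped: it is linearly dependent on the others.
    indicators = np.column_stack([(y == c).astype(float) for c in classes[:-1]])
    Q, _ = np.linalg.qr(np.hstack([ones, indicators]))
    return Q[:, 1:]               # discard the column spanned by the ones vector


def srkda_fit(X, y, gamma=1.0, delta=0.01):
    """Solve (K + delta * I) alpha = targets; no eigen-decomposition is needed."""
    K = rbf_kernel(X, X, gamma)
    targets = class_responses(y)
    return np.linalg.solve(K + delta * np.eye(len(y)), targets)


def srkda_transform(X_new, X_train, alpha, gamma=1.0):
    """Embed points into the (c - 1)-dimensional discriminant space."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The only heavy step in this sketch is a single regularized linear solve shared by all targets, which is the source of the computational saving the abstract refers to. The sign of each embedding dimension is arbitrary because it follows the sign chosen by the QR factorization.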
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces
, 2000
Cited by 11 (7 self)
We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis-space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.

