Optimal kernel selection in kernel Fisher discriminant analysis
- In Proceedings of the Twenty-Third International Conference on Machine Learning
, 2006
Cited by 18 (1 self)
In kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high-dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interior-point methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples.
Domain Adaptation via Transfer Component Analysis
Cited by 13 (8 self)
Domain adaptation solves a learning problem in a target domain by utilizing the training data in a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components, data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is that we propose a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components. Furthermore, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
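As a rough illustration of the Maximum Mean Discrepancy (MMD) named in the abstract above, the sketch below computes a biased empirical estimate of the squared MMD between a source and a target sample. It is not TCA and not the authors' code: the RBF kernel, the `gamma` value, the toy data, and all function names are assumptions made for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two points (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of squared MMD between two samples."""
    return (
        mean_kernel(source, source, gamma)
        + mean_kernel(target, target, gamma)
        - 2.0 * mean_kernel(source, target, gamma)
    )


# Hypothetical toy data: the target sample is the source shifted by 3.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = [(x + 3.0, y + 3.0) for x, y in source]

print(mmd_squared(source, source))  # identical samples
print(mmd_squared(source, target))  # shifted samples
```

A small value indicates that the two samples look alike under the chosen kernel; a method such as TCA would look for a projection that makes this quantity small across domains.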
Ensemble-based discriminant learning with boosting for face recognition
- IEEE Transactions on Neural Networks
, 2006
Cited by 6 (0 self)
In this paper, we propose a novel ensemble-based approach to boost the performance of traditional Linear Discriminant Analysis (LDA)-based methods used in face recognition. The ensemble-based approach is based on the recently emerged technique known as “boosting”. However, it is generally believed that boosting-like learning rules are not suited to a strong and stable learner such as LDA. To overcome this limitation, a novel weakness analysis theory is developed here. The theory attempts to boost a strong learner by increasing the diversity between the classifiers created by the learner, at the expense of decreasing their margins, so as to achieve a trade-off suggested by recent boosting studies for a low generalization error. In addition, a novel distribution accounting for the pairwise class discriminant information is introduced for effective interaction between the booster and the LDA-based learner. The integration of all these methodologies leads to the novel ensemble-based discriminant learning approach, capable of taking advantage of both the boosting and LDA techniques. Promising experimental results obtained on various difficult face recognition scenarios demonstrate the effectiveness of the proposed approach. We believe that this work is especially beneficial in extending the boosting framework to accommodate general (strong/weak) learners.
Gradient-based Optimization of Kernel-Target Alignment for Sequence Kernels Applied to Bacterial Gene Start Detection
- IEEE/ACM TRANS. COMPUT. BIOL. BIOINFORMATICS
Cited by 6 (6 self)
Biological data mining using kernel methods can be improved by a task-specific choice of the kernel function. Oligo kernels for genomic sequence analysis have proven to have a high discriminative power and to provide interpretable results. Oligo kernels that consider subsequences of different lengths can be combined and parameterized to increase their flexibility. For adapting these parameters efficiently, gradient-based optimization of the kernel-target alignment is proposed. The power of this new, general model selection procedure and the benefits of fitting kernels to problem classes are demonstrated by adapting oligo kernels for bacterial gene start detection.
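For readers unfamiliar with kernel-target alignment, the sketch below computes the standard alignment score between a kernel matrix and the ideal target matrix built from binary labels. It covers only the score itself, not the gradient-based optimization or the oligo kernels described in the abstract; the function names and toy matrices are assumptions made for illustration.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_target_alignment(kernel, labels):
    """Alignment between a kernel matrix and the target matrix y y^T.

    labels are assumed to be +1 / -1.
    """
    target = [[yi * yj for yj in labels] for yi in labels]
    numerator = frobenius_inner(kernel, target)
    denominator = math.sqrt(
        frobenius_inner(kernel, kernel) * frobenius_inner(target, target)
    )
    return numerator / denominator


# Hypothetical toy example with four labelled points.
labels = [1, 1, -1, -1]
ideal = [[yi * yj for yj in labels] for yi in labels]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print(kernel_target_alignment(ideal, labels))
print(kernel_target_alignment(identity, labels))
```

A kernel whose matrix matches the label structure scores close to 1, while an uninformative kernel scores lower, which is why the score can serve as an objective when tuning kernel parameters.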
Noise-based feature perturbation as a selection method for microarray data
, 2007
Cited by 2 (1 self)
DNA microarrays can monitor the expression levels of thousands of genes simultaneously, providing the opportunity to identify genes that are differentially expressed across different conditions. Microarray datasets are generally limited to a small number of samples with a large number of gene expressions, so feature selection becomes a very important aspect of the microarray classification problem. In this paper, a new feature selection method, feature perturbation by adding noise, is proposed to improve the performance of classification. The experimental results on a benchmark colon cancer dataset indicate that the proposed method can result in more accurate class predictions using a smaller set of features when compared to the SVM-RFE feature selection method. Key words: feature perturbation, microarray gene expression data, gene selection, classification
Design of multimodal dissimilarity spaces for retrieval of video documents
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2007
Support vector echo-state machine for chaotic time-series prediction
- IEEE Transactions on Neural Networks
Cited by 1 (0 self)
A novel chaotic time-series prediction method based on support vector machines and echo state mechanisms is proposed. The basic idea is to replace the “kernel trick” with a “reservoir trick” in dealing with nonlinearity, that is, to perform linear support vector regression in the high-dimensional “reservoir” state space, so that the solution benefits from the structural risk minimization principle. We call the resulting models Support Vector Echo State Machines (SVESMs). SVESMs are a special kind of recurrent neural network with a convex objective function, and their solution is globally optimal and unique. SVESMs are especially efficient in dealing with real-life nonlinear time series, and their generalization ability and robustness are obtained through a regularization operator and a robust loss function. The method is tested on the benchmark Mackey-Glass time-series prediction problem and applied to real-life time series such as monthly sunspot series and runoff series of the Yellow River, and the prediction results are promising.
Bayes Optimal Kernel Discriminant Analysis
Cited by 1 (1 self)
Kernel methods provide an efficient mechanism to derive nonlinear algorithms. In classification problems as well as in feature extraction, kernel-based approaches map the originally nonlinearly separable data into a space of intrinsically much higher dimensionality where the data is linearly separable and can be readily classified with existing and efficient linear methods. For a given kernel function, the main challenge is to determine the parameters of the kernel which map the original nonlinear problem to a linear one. This paper derives a Bayes optimal criterion for the selection of the kernel parameters in discriminant analysis. Our criterion selects the kernel parameters that maximize the (Bayes) classification accuracy in the kernel space. We also show how we can use the same criterion to do subclass selection in the kernel space for problems with multimodal class distributions. Extensive experimental evaluation demonstrates the superiority of the proposed criterion over the state of the art.
Computer Laboratory Learning compound noun semantics
, 2008
"... Technical reports published by the University of Cambridge ..."

