Results 1–10 of 10
Estimating the Support of a High-Dimensional Distribution
, 1999
Abstract

Cited by 766 (29 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
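As an illustration of the approach in this abstract, the following sketch uses scikit-learn's `OneClassSVM`, which implements this style of one-class estimator. The data, kernel width, and `nu` value are illustrative assumptions; `nu` plays the role of the a priori bound on the fraction of points allowed outside the estimated region S.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # training data drawn from P

# nu upper-bounds the fraction of training points treated as outliers;
# the RBF kernel gives the kernel expansion described in the abstract.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X)

# f(x) > 0 (prediction +1) inside the estimated region S, -1 outside it.
X_test = np.array([[0.0, 0.0], [6.0, 6.0]])
pred = clf.predict(X_test)
```

A point near the bulk of the data is labelled +1, while a point far from it is labelled -1.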
Dimensionality Reduction for Density Ratio Estimation in High-dimensional Spaces
Neural Networks, vol. 23, no. 1, pp. 44–59
, 2010
Abstract

Cited by 23 (16 self)
The ratio of two probability density functions has recently attracted interest in the machine learning and data mining communities, since it can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, and feature selection. Recently, several methods have been developed for directly estimating the density ratio without going through density estimation, and they were shown to work well in various practical problems. However, these methods still perform rather poorly when the dimensionality of the data domain is high. In this paper, we propose to incorporate a dimensionality reduction scheme into a density-ratio estimation procedure and experimentally show that the estimation accuracy in high-dimensional cases can be improved.
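A direct density-ratio estimator of the kind referenced above can be sketched as a regularized least-squares fit over a Gaussian kernel basis (in the spirit of uLSIF). The function name, bandwidth, regularization constant, and data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ulsif(x_nu, x_de, centers, sigma=1.0, lam=1e-3):
    """Fit r(x) ~ p_nu(x) / p_de(x) directly, without estimating either density."""
    def phi(x):  # Gaussian kernel features centred at `centers`
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_de, Phi_nu = phi(x_de), phi(x_nu)
    H = Phi_de.T @ Phi_de / len(x_de)   # second moment under the denominator
    h = Phi_nu.mean(axis=0)             # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(np.atleast_1d(np.asarray(x, dtype=float))) @ alpha

rng = np.random.default_rng(1)
x_nu = rng.normal(0.0, 1.0, 2000)  # numerator samples
x_de = rng.normal(0.0, 2.0, 2000)  # denominator samples
r = ulsif(x_nu, x_de, centers=np.linspace(-5, 5, 50))
```

For these distributions the true ratio peaks at the origin and decays away from it, which the estimate reproduces.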
Geodesic Gaussian kernels for value function approximation
, 2007
Abstract

Cited by 14 (6 self)
The least-squares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by the Markov decision process. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.
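The core idea, replacing the Euclidean distance inside the Gaussian kernel with the shortest-path (geodesic) distance on the MDP's state transition graph, can be sketched as follows. The toy five-state chain with one blocked edge is a hypothetical example, not the paper's experiment.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Hypothetical 5-state space with a "wall": 0-1-2 and 3-4 are connected,
# but the edge 2-3 is blocked, producing the kind of discontinuity the
# geodesic kernel respects and the plain Gaussian kernel smooths over.
n = 5
adj = np.zeros((n, n))
for a, b in [(0, 1), (1, 2), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0
geo = shortest_path(adj, directed=False)  # graph shortest-path distances

def geodesic_gaussian_kernel(i, j, sigma=1.0):
    # Gaussian of the geodesic distance instead of the Euclidean distance;
    # unreachable pairs get distance inf and hence kernel value 0.
    return np.exp(-geo[i, j] ** 2 / (2 * sigma ** 2))
```

Nearby states on the graph get large kernel values, while states separated by the wall get a kernel value of exactly zero.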
Kernel Least-Squares Temporal Difference Learning
Abstract

Cited by 13 (1 self)
Kernel methods have attracted much research interest recently since, by utilizing Mercer kernels, non-linear and non-parametric versions of conventional supervised or unsupervised learning algorithms can be implemented and better generalization abilities can usually be obtained. However, kernel methods in reinforcement learning have not been widely studied in the literature. In this paper, we present a novel kernel-based least-squares temporal-difference (TD) learning algorithm called KLSTD(λ), which can be viewed as the kernel version or non-linear form of the previous linear LSTD(λ) algorithms. By introducing a kernel-based non-linear mapping, the KLSTD(λ) algorithm is superior to conventional linear TD(λ) algorithms in value function prediction or policy evaluation problems with non-linear value functions. Furthermore, in KLSTD(λ), eligibility traces in kernel-based TD learning are derived to make use of data more efficiently, which differs from recent work on Gaussian processes in reinforcement learning. Experimental results on a typical value-function prediction problem of a Markov chain demonstrate the ...
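A minimal sketch of the kernelized-LSTD idea, without eligibility traces (i.e., closer to LSTD(0) over kernel features than to the full KLSTD(λ) algorithm the paper presents). The chain MDP, kernel centres, and bandwidth are illustrative assumptions.

```python
import numpy as np

# Represent V(s) as a kernel expansion V(s) = phi(s) @ w, with Gaussian
# kernel features centred on the states, and solve the LSTD fixed point.
centers = np.array([0.0, 1.0, 2.0, 3.0])
sigma, gamma = 0.3, 0.9

def phi(s):
    return np.exp(-(s - centers) ** 2 / (2 * sigma ** 2))

# One episode of a 4-state chain 0 -> 1 -> 2 -> 3 (terminal), reward 1 on
# the final step, so the true values are V(2)=1, V(1)=0.9, V(0)=0.81.
transitions = [(0.0, 0.0, 1.0, False), (1.0, 0.0, 2.0, False), (2.0, 1.0, 3.0, True)]

A, b = np.zeros((4, 4)), np.zeros(4)
for s, r, s_next, terminal in transitions:
    f = phi(s)
    f_next = np.zeros(4) if terminal else phi(s_next)
    A += np.outer(f, f - gamma * f_next)
    b += r * f
w = np.linalg.solve(A + 1e-6 * np.eye(4), b)  # small ridge for stability

def V(s):
    return phi(s) @ w
```

With near-orthogonal kernel features this recovers the discounted values of the chain to good accuracy.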
A Regularized Framework for Feature Selection in Face Detection and Authentication
Abstract

Cited by 3 (0 self)
This paper proposes a general framework for selecting features in the computer vision domain (i.e., learning descriptions from data), where the prior knowledge related to the application is confined to the early stages. The main building block is a regularization algorithm based on a penalty term enforcing sparsity. The overall strategy is also effective for training sets of limited size and achieves performance competitive with the state of the art. To show the versatility of the proposed strategy, we apply it to both face detection and authentication, implementing two modules of a monitoring system working in real time in our lab. Aside from the choices of the feature dictionary and the training data, which require prior knowledge of the problem, the proposed method is fully automatic. The very good results obtained in different applications speak for the generality and robustness of the framework.
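A sparsity-enforcing penalty of the kind this framework builds on can be illustrated with an l1-regularized least-squares fit. The synthetic data and scikit-learn's `Lasso` are stand-ins for illustration, not the paper's own regularization algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # 50 candidate features (a "dictionary")
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]                  # only the first 3 features are relevant
y = X @ w_true + 0.1 * rng.normal(size=200)

# The l1 penalty drives irrelevant coefficients exactly to zero, so the
# features with surviving (non-zero) coefficients are the selected subset.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

The relevant features survive the penalty while almost all of the noise features are zeroed out.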
Efficient Learning of Relational Object Class Models (DOI 10.1007/s11263-007-0091-7)
Abstract
We present an efficient method for learning part-based object class models from unsegmented images represented as sets of salient features. A model includes the parts' appearance, as well as location and scale relations between parts. The object class is generatively modeled using a simple Bayesian network with a central hidden node containing location and scale information, and nodes describing object parts. The model's parameters, however, are optimized to reduce a loss function of the training error, as in discriminative methods. We show how boosting techniques can be extended to optimize the proposed relational model, with complexity linear in the number of parts and the number of features per image. This efficiency allows our method to learn relational models with many parts and features. The method has an advantage over purely generative and purely discriminative approaches for learning from sets of salient features, since generative methods often use a small number of parts and features, while discriminative methods tend to ignore geometrical relations between parts. Experimental results are described, using some benchmark data sets and three sets of newly collected data, showing the relative merits of our method in recognition and localization tasks.
A Genetic Algorithm-based Multi-class Support Vector Machine for Mongolian Character Recognition
, 2007
Abstract
This paper proposes a hybrid genetic algorithm and support vector machine (GA-SVM) approach to address the Mongolian character recognition problem. As character recognition can be considered a multi-class classification problem, we devise a DAGSVM classifier. DAGSVM uses the One-Against-One technique to combine multiple binary SVM classifiers. The GA is used to select the multi-class SVM model parameters. Empirical results demonstrate that the GA-SVM approach achieves a good accuracy rate.
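A rough sketch of this pipeline: a multi-class SVM built from one-against-one binary classifiers (which is how scikit-learn's `SVC` handles multi-class problems internally), with the model parameters (C, gamma) chosen by search. A simple random search stands in for the GA, and the Iris data set stands in for character features; both substitutions are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Random search over (C, gamma) on log scales, scored by cross-validation;
# a GA would evolve a population of such candidates instead.
best_score, best_params = -1.0, None
for _ in range(10):
    C = 10 ** rng.uniform(-1, 2)
    gamma = 10 ** rng.uniform(-3, 0)
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, (C, gamma)
```

The selected parameters give a classifier that performs well under cross-validation on this toy data set.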
Anomaly Detection for Communication Network Monitoring Applications
Abstract
Functioning mobile telecommunication networks are taken for granted in present-day society. The network operator's objective is to optimise the network's capabilities in order to provide fluent connections for subscribers. Network management is based on the huge amounts of data that are recorded from all parts of the network. The data are used to monitor performance, to detect problems, and to provide novel knowledge for future planning. Anomalous events in the network provide a valuable source of information for network management. This thesis presents an interpretation of anomalies and the basic theory of how to detect them when the probability distribution is known. However, since in real-life applications the probability distribution is not known, the main focus is on methods that are based on distances. This thesis proposes procedures for anomaly detection and for summarising the information obtained about the anomalies. The procedures utilise clustering in both the anomaly detection and the further analysis of the anomalies. Scaling of variables affects the distances and the results of clustering. Therefore, methods to incorporate ex...
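A distance-based anomaly detection procedure of the kind the thesis describes, clustering assumed-normal data and then flagging points that lie far from every cluster centre, can be sketched as follows. The synthetic measurements, cluster count, and percentile threshold are hypothetical choices, not the thesis's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(300, 3))   # assumed-normal network measurements
anomalies = rng.normal(8.0, 1.0, size=(5, 3))  # a few strongly deviating records

# Scaling matters: distances, and hence the clustering, change with
# variable scale, as the thesis abstract points out.
scaler = StandardScaler().fit(normal)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(normal))

# Distance to the nearest cluster centre serves as the anomaly score;
# the threshold is set from the score distribution of the normal data.
score_normal = km.transform(scaler.transform(normal)).min(axis=1)
threshold = np.percentile(score_normal, 99)
score_anom = km.transform(scaler.transform(anomalies)).min(axis=1)
flagged = score_anom > threshold
```

Records far from all clusters score well above the threshold and are flagged for further analysis.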