Results 1 - 10 of 51
Variable Kernel Density Estimation
Annals of Statistics, 1992
Cited by 108 (4 self)
In this paper, we propose a method for robust kernel density estimation. We interpret a KDE with Gaussian kernel as the inner product between a mapped test point and the centroid of mapped training points in kernel feature space. Our robust KDE replaces the centroid with a robust estimate based on M-estimation [1]. The iteratively reweighted least squares (IRWLS) algorithm for M-estimation depends only on inner products, and can therefore be implemented using the kernel trick. We prove the IRWLS method monotonically decreases its objective value at every iteration for a broad class of robust loss functions. Our proposed method is applied to synthetic data and network traffic volumes, and the results compare favorably to the standard KDE.
Index Terms: kernel density estimation, M-estimator, outlier, kernel feature space, kernel trick
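The reweighting idea in this abstract can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes one-dimensional data, a Gaussian kernel with bandwidth `h`, and the Huber loss with threshold `a` as one example of a robust loss (the abstract does not name a specific loss). The function names are hypothetical. Distances in feature space are computed from kernel evaluations only, which is the kernel trick the abstract refers to.

```python
import math

def gauss_kernel(x, y, h):
    """1-D Gaussian kernel with bandwidth h (assumed kernel for this sketch)."""
    return math.exp(-(x - y) ** 2 / (2 * h * h)) / (math.sqrt(2 * math.pi) * h)

def robust_kde_weights(data, h, a, n_iter=50):
    """IRWLS-style reweighting in kernel feature space with a Huber loss.

    The weighted centroid is f = sum_j w_j * Phi(x_j). The squared distance
    ||Phi(x_i) - f||^2 = K_ii - 2 (K w)_i + w^T K w needs only kernel values.
    """
    n = len(data)
    K = [[gauss_kernel(xi, xj, h) for xj in data] for xi in data]
    w = [1.0 / n] * n  # start from the standard KDE (uniform weights)
    for _ in range(n_iter):
        Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
        wKw = sum(w[i] * Kw[i] for i in range(n))
        new_w = []
        for i in range(n):
            r = math.sqrt(max(K[i][i] - 2 * Kw[i] + wKw, 0.0))
            # Huber: psi(r)/r is 1 inside the threshold and a/r outside it
            new_w.append(1.0 if r <= a else a / r)
        total = sum(new_w)
        w = [v / total for v in new_w]  # weights stay on the simplex
    return w

def kde(x, data, w, h):
    """Weighted kernel density estimate at x."""
    return sum(wi * gauss_kernel(x, xi, h) for wi, xi in zip(w, data))

# Toy sample: six points near zero and one outlier at 8.0
data = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, 8.0]
weights = robust_kde_weights(data, h=0.5, a=0.5)
```

With uniform weights the outlier contributes as much as any other point; after reweighting, its weight falls below that of the points near zero, so the estimated density near 8.0 is reduced.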
Probability Density Estimation from Optimally Condensed Data Samples
IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 55 (0 self)
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator that provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L2 sense. While only requiring O(N^2) optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires O(N^3) optimization routines, and which has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to that provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced set density estimator with comparable accuracy to that of the full sample Parzen density estimator.
Index Terms: kernel density estimation, Parzen window, data condensation, sparse representation
Mode-finding for mixtures of Gaussian distributions
Dept. of Computer Science, University of Sheffield, 1999
Cited by 50 (8 self)
I consider the problem of finding all the modes of a mixture of multivariate Gaussian distributions, which has applications in clustering and regression. I derive exact formulas for the gradient and Hessian and give a partial proof that the number of modes cannot be more than the number of components, and that the modes are contained in the convex hull of the component centroids. Then, I develop two exhaustive mode search algorithms: one based on combined quadratic maximisation and gradient ascent and the other one based on a fixed-point iterative scheme. Appropriate values for the search control parameters are derived by taking into account theoretical results regarding the bounds for the gradient and Hessian of the mixture. The significance of the modes is quantified locally (for each mode) by error bars, or confidence intervals (estimated using the values of the Hessian at each mode); and globally by the sparseness of the mixture, measured by its differential entropy (estimated through bounds). I conclude with some reflections about bump-finding.
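A fixed-point scheme of the kind mentioned in this abstract can be illustrated for the one-dimensional case. Setting the mixture gradient to zero and solving for x gives the update x <- (sum_m r_m mu_m / sigma_m^2) / (sum_m r_m / sigma_m^2), where r_m is the weighted density of component m at x. The sketch below is an assumption-laden illustration, not the report's algorithm: it is restricted to 1-D, it starts one search from each component mean, and the function names and tolerances are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fixed_point_mode(x, weights, means, sigmas, tol=1e-10, max_iter=1000):
    """Iterate the stationary-point condition of the mixture density from x."""
    for _ in range(max_iter):
        resp = [w * normal_pdf(x, m, s)
                for w, m, s in zip(weights, means, sigmas)]
        num = sum(r * m / (s * s) for r, m, s in zip(resp, means, sigmas))
        den = sum(r / (s * s) for r, s in zip(resp, sigmas))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def find_modes(weights, means, sigmas, merge_tol=1e-4):
    """Start one search per component mean and merge coincident results."""
    modes = []
    for m in means:
        x = fixed_point_mode(m, weights, means, sigmas)
        if all(abs(x - y) > merge_tol for y in modes):
            modes.append(x)
    return sorted(modes)

# Two well-separated components give two modes; two close ones merge into one
separated = find_modes([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
overlapping = find_modes([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
```

A converged point is only a stationary point of the density; checking the sign of the second derivative there would confirm that it is a maximum.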
Continuous latent variable models for dimensionality reduction and sequential data reconstruction
2001
Sparse kernel density construction using orthogonal forward regression with leave-one-out test score and local regularization
IEEE Trans. Systems, Man and Cybernetics, Part B, 2004
Cited by 16 (7 self)
An automatic algorithm is derived for constructing kernel density estimates based on a regression approach that directly optimizes generalization capability. Computational efficiency of the density construction is ensured using an orthogonal forward regression, and the algorithm incrementally minimizes the leave-one-out test score. Local regularization is incorporated into the density construction process to further enforce sparsity. Examples are included to demonstrate the ability of the proposed algorithm to effectively construct a very sparse kernel density estimate with comparable accuracy to that of the full sample Parzen window density estimate.
DISTRIBUTED PARTICLE FILTER FOR TARGET TRACKING IN SENSOR NETWORKS
Cited by 12 (0 self)
In this paper, we present a distributed particle filter (DPF) for target tracking in a sensor network. The proposed DPF consists of two major steps. First, particle compression based on a support vector machine is performed to reduce the cost of transmission among sensors. Second, each sensor fuses the compressed information from its neighboring nodes using a consensus or gossip algorithm to estimate the target track. Computer simulations are included to verify the effectiveness of the proposed approach.
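The fusion step described above relies on consensus among neighboring sensors. As a rough illustration only, the sketch below runs a standard average-consensus iteration on scalar local estimates over a ring of five nodes. The scalar state, the ring topology, the step size, and the function names are all assumptions made for this example; the paper fuses compressed particle information, which is not reproduced here.

```python
def consensus_step(values, neighbors, step):
    """One synchronous update: each node moves toward its neighbors' values."""
    return [
        v + step * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]

def run_consensus(values, neighbors, step=0.3, n_iter=200):
    """Repeat the update; on a connected undirected graph with a step size
    below 1 / (max degree), all nodes approach the average of the inputs."""
    for _ in range(n_iter):
        values = consensus_step(values, neighbors, step)
    return values

# Ring of 5 sensors: node i exchanges values with nodes i-1 and i+1
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
local_estimates = [1.0, 4.0, 2.0, 8.0, 5.0]
fused = run_consensus(local_estimates, ring)
```

Each node communicates only with its two neighbors, yet every entry of `fused` approaches the network-wide average of the initial estimates, which is the property that makes consensus attractive when no fusion center is available.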
Distributed Data Fusion Using Support Vector Machines
Cited by 8 (0 self)
The basic quantity to be estimated in the Bayesian approach to data fusion is the conditional probability density function (CPDF). In recent times, computationally efficient particle filtering approaches have been gaining importance in estimating these CPDFs. In this approach, i.i.d. samples are used to represent the conditional probability densities. However, their application in data fusion is severely limited because the information is stored in the form of a large set of samples. In all practical data fusion systems that have limited communication bandwidth, broadcasting this probabilistic information, available as a set of samples, to the fusion center is impractical. Support vector machines, through statistical learning theory, provide a way of compressing information by generating optimal kernel-based representations. In this paper we use SVM to compress the probabilistic information available in the form of i.i.d. samples and apply it to solve the Bayesian data fusion problem. We demonstrate this technique on a multisensor tracking example.
Particle swarm optimization aided orthogonal forward regression for unified data modelling
IEEE Trans. Evol. Comput., 2010
Cited by 8 (4 self)
We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least squares algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross-validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications.
Inverse density as an inverse problem: The Fredholm equation approach (Technical Report 1304.5575). arXiv, 2013
Cited by 6 (1 self)
In this paper we address the problem of estimating the ratio q/p, where p is a density function and q is another density, or, more generally, an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular, when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred to as importance sampling in statistical inference and is also closely related to the problem of covariate shift in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it. Our approach is based on reformulating the problem of estimating q/p as an inverse problem in terms of an integral operator corresponding to a kernel, and thus reducing it to an integral equation, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization and kernel methods, leads to a principled kernel-based framework for constructing algorithms and for analyzing them theoretically. The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement. We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel in the case of densities defined on R^d, compact domains in R^d and smooth d-dimensional submanifolds of the Euclidean space. We also show experimental results including applications to classification and semi-supervised learning within the covariate shift framework and demonstrate some encouraging experimental comparisons. We also show how the parameters of our algorithms can be chosen in a completely unsupervised manner.
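The two relations this abstract builds on can be written out explicitly. The notation below is assumed for illustration and may differ from the paper's: k is a kernel, x_1, ..., x_n is a sample from p, and K_p denotes the integral operator weighted by p. The first line is the importance-sampling identity (valid when p is positive wherever q is); the second shows why q/p solves a Fredholm equation of the first kind whose left side can be approximated from a sample of p and whose right side depends only on q.

```latex
% Averaging f under q using a sample x_1, ..., x_n drawn from p
\mathbb{E}_{x \sim q}[f(x)]
  = \int f(x)\,\frac{q(x)}{p(x)}\,p(x)\,dx
  \approx \frac{1}{n}\sum_{i=1}^{n} f(x_i)\,\frac{q(x_i)}{p(x_i)}

% Integral operator (assumed notation) and the resulting equation for q/p
(\mathcal{K}_p g)(x) = \int k(x,y)\,g(y)\,p(y)\,dy,
\qquad
\Bigl(\mathcal{K}_p \tfrac{q}{p}\Bigr)(x) = \int k(x,y)\,q(y)\,dy
```

Inverting an equation of this kind is ill-posed in general, which is consistent with the abstract's emphasis on regularization.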