Outlier Detection with the Kernelized Spatial Depth Function
2008
Cited by 23 (4 self)
Statistical depth functions provide a “center-outward ordering” of multidimensional data from the “deepest” point. In this sense, depth functions can measure the “extremeness” or “outlyingness” of a data point with respect to a given data set. Hence they can detect outliers, i.e., observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. With a proper choice of kernel, the KSD can capture the local structure of a data set, which the spatial depth fails to do. We demonstrate this with the half-moon and ring-shaped data sets. Based on the KSD, we propose a novel outlier detection algorithm that declares an observation an outlier if its depth value is less than a threshold. The proposed algorithm is simple in structure: for a given kernel, the threshold is the only parameter. It applies both to a one-class learning setting, in which “normal” observations are given as the training data, and to a missing-label scenario, in which the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depth-based detector; these bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and on data sets from real applications, comparing the proposed outlier detector with existing methods. The KSD outlier detector demonstrates competitive performance.
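The depth-threshold detector described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it uses a Gaussian kernel with an arbitrary bandwidth `sigma`, and the helper names (`gaussian_kernel`, `kernelized_spatial_depth`, `ksd_outliers`) are hypothetical. The spatial depth D(x) = 1 - ||(1/n) sum_i (x - x_i)/||x - x_i|| || is evaluated in the kernel feature space, where every inner product reduces to kernel evaluations.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the bandwidth sigma is an illustrative choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, data, kernel=gaussian_kernel):
    """Spatial depth of x evaluated in the feature space induced by `kernel`.

    D(x) = 1 - ||(1/n) * sum_i u_i||, where u_i is the unit vector from phi(x_i)
    to phi(x). ||sum_i u_i||^2 is expanded into kernel evaluations, so phi is
    never formed. Terms with phi(x_i) == phi(x) are zero vectors and are skipped.
    """
    n = len(data)
    kxx = kernel(x, x)
    kx = [kernel(x, xi) for xi in data]
    # delta_i = ||phi(x) - phi(x_i)||
    delta = [math.sqrt(max(kxx + kernel(xi, xi) - 2.0 * kxi, 0.0))
             for xi, kxi in zip(data, kx)]
    active = [i for i in range(n) if delta[i] > 1e-12]
    total = 0.0
    for i in active:
        for j in active:
            # <phi(x) - phi(x_i), phi(x) - phi(x_j)> written with the kernel
            inner = kxx - kx[i] - kx[j] + kernel(data[i], data[j])
            total += inner / (delta[i] * delta[j])
    return 1.0 - math.sqrt(max(total, 0.0)) / n


def ksd_outliers(points, data, threshold, kernel=gaussian_kernel):
    """Declare a point an outlier when its depth is below `threshold`."""
    return [kernelized_spatial_depth(p, data, kernel) < threshold for p in points]
```

For example, with `grid = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]`, the call `ksd_outliers([(0, 0), (10, 10)], grid, threshold=0.3)` returns `[False, True]`. Note that with a Gaussian kernel a point far from all training data does not get depth 0: its depth approaches 1 - sqrt((1 + k_bar)/2), where k_bar is the average of k(x_i, x_j) over all pairs (i, j), including i = j. The threshold therefore has to be chosen relative to that floor.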
Nonparametric assessment of contamination in multivariate data using minimum-volume sets and FDR
2007
Cited by 7 (2 self)
Large, multivariate datasets from high-throughput instrumentation have become ubiquitous throughout the sciences. Frequently, it is of great interest to characterize the measurements in these datasets by the extent to which they represent ‘nominal’ versus ‘contaminated’ instances. However, often the nature of even the nominal patterns in the data is unknown and potentially quite complex, making their explicit parametric modeling a daunting task. In this paper, we introduce a nonparametric method for the simultaneous annotation of multivariate data (called MN-SCAnn), by which one may produce an annotated ranking of the observations, indicating the relative extent to which each may or may not be considered nominal, while making minimal assumptions on the nature of the nominal distribution. In our framework each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test, which in turn is accompanied by a certain false discovery rate. The combination of minimum volume set methods with false discovery rate principles, in the context of contaminated data, is new. Moreover, estimating the key underlying quantities requires that a number of issues be addressed. We illustrate MN-SCAnn through examples in two contexts: the pre-processing of cell-based assays in bioinformatics, and the detection of anomalous traffic patterns in Internet measurement studies.
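The false-discovery-rate side of such an annotation can be illustrated with the standard Benjamini-Hochberg step-up procedure. This sketch is not MN-SCAnn itself: it assumes each observation already has a p-value, and the hypothetical helper `empirical_p_value` obtains one by comparing an observation's outlyingness score with scores from a nominal reference sample. Roughly, this asks what fraction of nominal points falls outside the minimum-volume set whose boundary passes through the observation.

```python
def empirical_p_value(score, reference_scores):
    """Share of nominal reference scores at least as extreme as `score`.

    Higher scores mean more outlying; the +1 terms keep the estimate above 0.
    """
    exceed = sum(1 for r in reference_scores if r >= score)
    return (1 + exceed) / (1 + len(reference_scores))


def benjamini_hochberg(p_values, q=0.1):
    """Return the sorted indices rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank k with p_(k) <= k * q / m; all ranks up to k are rejected.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])
```

The rejected indices are the observations flagged as contaminated. Under independence (or positive dependence) of the p-values, this procedure controls the false discovery rate at level q.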
General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers
2012
Cited by 2 (2 self)
This paper is dedicated to the memory of Kesar Singh, an outstanding contributor to statistical science.
Spatial trimming, with applications to robustify sample spatial quantile and outlyingness functions, and to construct a new robust scatter estimator (submitted)
2010
Cited by 2 (2 self)
The spatial multivariate median has a long history as an alternative to the sample mean. Its transformation-retransformation (TR) sample version is affine equivariant, highly robust, and computationally easy. More recently, an entire TR spatial multivariate quantile function has been developed and applied in practice, along with related rank functions. However, as quantile levels move farther out, the robustness of the TR sample version, as measured by its breakdown point, decreases to zero. This is a serious limitation in applications such as outlier detection and setting inner 50%, 75%, and 90% quantile regions. Here we introduce a new device, “spatial trimming”, and with it solve two problems of general scope and application: (i) the need to robustify the TR sample spatial quantile function and its closely related depth, outlyingness, and rank functions, and (ii) the need for a computationally easy, robust, and affine equivariant scatter estimator. The improvements in robustness accomplished by spatial trimming are confirmed by improved breakdown points and illustrated using simulated and actual data. Other applications of spatial trimming are
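The spatial multivariate median discussed in this abstract minimizes the sum of Euclidean distances to the data points. Below is a minimal sketch of the classical Weiszfeld iteration for the plain sample version. It is neither the TR version nor spatial trimming, so it is not affine equivariant; the function name and tolerances are illustrative assumptions.

```python
import math


def spatial_median(data, tol=1e-9, max_iter=1000):
    """Approximate the sample spatial median argmin_m sum_i ||x_i - m||.

    Weiszfeld iteration: m <- (sum_i x_i / ||x_i - m||) / (sum_i 1 / ||x_i - m||).
    A data point that coincides with the current iterate is simply skipped,
    a common heuristic; modified Weiszfeld schemes treat that case more carefully.
    """
    dim = len(data[0])
    m = [sum(p[d] for p in data) / len(data) for d in range(dim)]  # start at the mean
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in data:
            dist = math.dist(p, m)
            if dist < 1e-12:
                continue
            w = 1.0 / dist
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        if den == 0.0:
            break  # every point coincides with m
        new_m = [v / den for v in num]
        if math.dist(new_m, m) < tol:
            return new_m
        m = new_m
    return m
```

For example, adding the point (100, 100) to the four corners of the unit square drags the mean to (20.4, 20.4). The spatial median, by contrast, stays on the diagonal strictly between (0.5, 0.5) and (1, 1).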
Robust Model-based Learning via Spatial-EM Algorithm
Abstract—This paper presents a new robust EM algorithm for finite mixture learning. The proposed Spatial-EM algorithm replaces the sample mean and sample covariance matrix in each M step with median-based location and rank-based scatter estimators, thereby enhancing the stability and robustness of the algorithm. It is robust to outliers and to initial values. Compared with many robust mixture learning methods, Spatial-EM has the advantages of simplicity of implementation and statistical efficiency. We apply Spatial-EM to supervised and unsupervised learning scenarios. More specifically, we propose robust clustering and outlier detection methods based on Spatial-EM. We apply the outlier detection method to taxonomic research on the discovery of novel fish species. Two real datasets are used for clustering analysis. Compared with the regular EM and many other existing methods such as K-median, X-EM and SVM, our method demonstrates superior performance and high robustness. Index Terms—Clustering, EM algorithm, finite mixture, spatial rank, outlier detection, robustness
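The rank-based scatter idea can be made concrete with two building blocks: the sample spatial rank function R(x) = (1/n) sum_i (x - x_i)/||x - x_i|| and the unscaled rank covariance matrix (1/n) sum_i R(x_i) R(x_i)^T. This is a hedged sketch of those building blocks only. The scatter estimator actually used in Spatial-EM may differ, including any rescaling to covariance units and any weighting by E-step membership probabilities. The function names are hypothetical.

```python
import math


def spatial_sign(v):
    """Unit vector v / ||v||, with the zero vector mapped to zero."""
    norm = math.sqrt(sum(c * c for c in v))
    return [0.0] * len(v) if norm == 0.0 else [c / norm for c in v]


def spatial_rank(x, data):
    """Sample spatial rank R(x) = (1/n) * sum_i sign(x - x_i); its norm is at most 1."""
    n, dim = len(data), len(x)
    r = [0.0] * dim
    for xi in data:
        s = spatial_sign([a - b for a, b in zip(x, xi)])
        for d in range(dim):
            r[d] += s[d] / n
    return r


def rank_covariance(data):
    """Unscaled rank covariance matrix (1/n) * sum_i R(x_i) R(x_i)^T."""
    dim = len(data[0])
    ranks = [spatial_rank(x, data) for x in data]
    cov = [[0.0] * dim for _ in range(dim)]
    for r in ranks:
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += r[a] * r[b] / len(data)
    return cov
```

A point near the center of the data has a spatial rank near the zero vector, and peripheral points have ranks with norm close to 1. Each outer-product term is therefore bounded, so no single far-away point can dominate the scatter estimate the way it can dominate a sample covariance.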
Computationally Easy Outlier Detection via Projection Pursuit with Finitely Many Directions
2011
Outlier detection methods are fundamental to all of data analysis. Ideally, they are robust, affine invariant, and computationally easy in any dimension. The powerful projection pursuit approach yields the “projection outlyingness”, which is affine invariant and highly robust and does not impose ellipsoidal contours as the Mahalanobis distance approach does. However, it is highly computationally intensive, because it is obtained by taking the supremum of univariate scaled deviation outlyingness over all projections of the data onto lines. Here we introduce several outlyingness functions based on a vector of scaled deviations taken over only finitely many directions, chosen to be approximately uniform over the unit hypersphere. A preliminary transformation of the data to a strong invariant coordinate system makes such vectors affine invariant. We establish useful foundational theory for finite vectors of scaled deviations on projections. Also, using artificial and real data sets, we compare our affine invariant outlyingness functions with the usual projection outlyingness and with robust Mahalanobis distance outlyingness.
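The finitely-many-directions idea can be sketched in two dimensions under the following assumptions. Directions are equally spaced angles on a half circle, each scaled deviation uses the median and MAD of the projected data, and the vector of scaled deviations is summarized by its maximum. The preliminary transformation to a strong invariant coordinate system, which the abstract says is what makes the vectors affine invariant, is omitted, so this sketch is not affine invariant. The function name is hypothetical.

```python
import math
from statistics import median


def finite_projection_outlyingness(x, data, n_dirs=16):
    """Max scaled deviation |u.x - med(u.X)| / MAD(u.X) over n_dirs directions (2-D).

    Directions are equally spaced on a half circle, since u and -u give the
    same absolute deviation. Directions with zero MAD are skipped.
    """
    worst = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        proj = [ux * p[0] + uy * p[1] for p in data]
        med = median(proj)
        mad = median([abs(z - med) for z in proj])
        if mad == 0.0:
            continue
        z = ux * x[0] + uy * x[1]
        worst = max(worst, abs(z - med) / mad)
    return worst
```

Replacing the maximum with another summary of the same vector (for example, its Euclidean norm) gives a different outlyingness function built on the same finitely many projections.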
unknown title
Abstract: Depth functions represent a powerful, recently emerging methodology in nonparametric multivariate inference. They provide multivariate notions of order statistics and generate quantile contours, outlyingness functions, and sign and rank functions. There are wide possibilities for constructing such functions, but these narrow considerably when criteria such as affine equivariance or invariance, computational ease, asymptotic behavior, and robustness are applied. We review basic definitions, leading examples, key properties, and selected applications.
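The Tukey halfspace depth is one leading example of a depth function. For a point x it is the smallest fraction of the data contained in a closed halfplane whose boundary passes through x. For simplicity, the sketch below approximates it in two dimensions over finitely many directions rather than using an exact algorithm. Because the minimum is then taken over only a subset of all halfplanes, the approximation can overestimate the exact depth but never underestimate it. The function name is hypothetical.

```python
import math


def approx_halfspace_depth(x, data, n_dirs=360):
    """Approximate Tukey halfspace depth of the 2-D point x.

    For each of n_dirs equally spaced unit directions u, count the data points
    in the closed halfplane {y : u.y >= u.x}; the depth is the smallest count
    divided by n.
    """
    n = len(data)
    best = n
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        level = ux * x[0] + uy * x[1]
        count = sum(1 for p in data if ux * p[0] + uy * p[1] >= level)
        best = min(best, count)
    return best / n
```

On the 3 x 3 grid with coordinates in {-1, 0, 1}, the center point has depth 5/9, and a point outside the convex hull of the data has depth 0.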
A Robust Sample Spatial Outlyingness Function
2011
Sample quantile, rank, and outlyingness functions play long-established roles in univariate exploratory data analysis. In recent years, various multivariate generalizations have been formulated, among which the “spatial” approach has become especially well developed, including fully affine equivariant/invariant versions with only a modest computational burden (Möttönen and Oja, 1995; Chaudhuri, 1996; Vardi and Zhang, 2002; Serfling, 2010; Oja, 2010). The only shortcoming of the spatial approach is that its robustness decreases to zero as the quantile or outlyingness level is chosen farther out from the center (Dang and Serfling, 2010). This is especially detrimental to exploratory data analysis procedures such as the detection of outliers and the delineation of the “middle” 50%, 75%, or 90% of the data set. Here we develop suitably robust versions using a trimming approach. The improvements in robustness are illustrated and characterized using simulated and actual data. Also, as a byproduct of the investigation, a new robust, affine equivariant, and computationally easy scatter estimator is introduced.
DISCUSSION
2009
With delight we most heartily congratulate Hallin, Paindaveine and Šiman (HPS) on a superb and stimulating paper. It uniquely impacts our thinking about regression quantiles, multivariate quantiles, and the halfspace depth. Here we examine this highly significant contribution from the standpoints of some perspectives on multivariate quantile and depth functions, some criteria to consider in choosing such functions, and some further points about the much-studied halfspace depth. We also raise a few technical issues and questions for consideration.
General perspectives on quantile and depth functions. In thinking about any new contribution to multivariate quantile functions, we may draw upon the following perspectives, which also clarify the univariate case in some respects.
(P1) In multivariate analysis, orientation to a “center” compensates for the lack of a natural order.
(P2) In the context of quantiles, the role of “center” is naturally given to the “median.”
(P3) The inverse of a quantile function is not the distribution F but rather the rank function.
(P4) Depth, outlyingness, quantile, and rank functions are equivalent (DOQR paradigm).
(P5) Quantile functions are best viewed as parameters or characteristics of the distribution F.
(P6) Equivalence between distribution and quantile functions is not an essential requirement.
Let us briefly elaborate on some of these points.
(P3). In the univariate case, a natural linear order makes it convenient and straightforward to define distribution and quantile functions as mutual inverses, F and F^{-1}. However, for extension to higher dimension, the equivalent median-oriented formulation is the most appropriate point of departure. That is, via u = 2p − 1, the usual quantile function F^{-1}(p), 0 < p < 1, may be represented
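Perspective (P3) can be checked numerically in the univariate case. Take the sign-based rank R(x) = (1/n) sum_i sign(x - x_i) and the median-oriented index u = 2p - 1. The rank function then satisfies R(x) = 2F_n(x) - 1 at any x that is not a sample point, so R sends the median to u = 0 and plays the role of the inverse of the u-indexed quantile function Q(u) = F^{-1}((1 + u)/2). The sample quantile convention F_n^{-1}(p) = x_(ceil(np)) and the function names are illustrative assumptions.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def rank_1d(x, data):
    """Univariate sign-based rank R(x) = (1/n) * sum_i sign(x - x_i), in [-1, 1]."""
    return sum(sign(x - xi) for xi in data) / len(data)


def ecdf(x, data):
    """Empirical distribution function F_n(x) = #{x_i <= x} / n."""
    return sum(1 for xi in data if xi <= x) / len(data)


def median_oriented_quantile(u, data):
    """Q(u) = F_n^{-1}((1 + u) / 2) for -1 < u < 1, with F_n^{-1}(p) = x_(ceil(n p))."""
    xs = sorted(data)
    p = (1.0 + u) / 2.0
    k = min(len(xs), max(1, math.ceil(len(xs) * p)))
    return xs[k - 1]
```

For the data [5, 1, 4, 2, 3], `median_oriented_quantile(0.0, data)` returns the median 3, and `rank_1d(3, data)` returns 0.0. This is the median-oriented pairing u = 0 described under (P2) and (P3).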
Robust Model-based Learning via Spatial-EM Algorithm
1041-4347 (c) 2013 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution