Results 1–10 of 71
Anomaly Detection: A Survey
, 2007
Abstract

Cited by 540 (5 self)
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
Surface Street Traffic Estimation
 In MobiSys
, 2007
Abstract

Cited by 73 (0 self)
In this paper, we propose a simple yet effective method of identifying traffic conditions on surface streets given location traces collected from on-road vehicles—this requires only GPS location data, plus infrequent low-bandwidth cellular updates. Unlike other systems, which simply display vehicle speeds on the road, our system characterizes unique traffic patterns on each road segment and identifies unusual traffic states on a segment-by-segment basis. We developed and evaluated the system by applying it to two sets of location traces. Evaluation results show that higher than 90% accuracy in characterization can be achieved after ten or more traversals are collected on a given road segment. We also show that traffic patterns on a road are very consistent over time, provided that the underlying road conditions do not change. This allows us to use a longer history in identifying traffic conditions with higher accuracy.
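The per-segment idea described above can be sketched very simply. This is not the paper's actual method, only a minimal illustration: each segment keeps a history of traversal mean speeds (in hypothetical km/h units), and a new traversal is flagged as unusual when its speed deviates strongly from that history.

```python
# Hypothetical sketch: flag unusual traffic on one road segment by
# comparing a new traversal's mean speed against the segment's history.
from statistics import mean, stdev

def is_unusual(history_speeds, new_speed, z_threshold=2.0):
    """Return True if new_speed deviates from the segment's history.

    history_speeds: mean speeds (km/h) of past traversals of this segment.
    """
    if len(history_speeds) < 10:   # the abstract reports ~10 traversals
        return False               # before a segment is characterized
    mu = mean(history_speeds)
    sigma = stdev(history_speeds)
    if sigma == 0:
        return new_speed != mu
    return abs(new_speed - mu) / sigma > z_threshold

# Consistent history around 40 km/h; a 12 km/h traversal is flagged.
history = [38, 41, 40, 39, 42, 40, 41, 39, 40, 41]
print(is_unusual(history, 12))   # True
print(is_unusual(history, 40))   # False
```

The minimum-traversal check mirrors the abstract's observation that roughly ten traversals are needed before a segment's pattern is characterized reliably.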
Learning minimum volume sets
 J. Machine Learning Res
, 2006
Abstract

Cited by 39 (7 self)
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
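The histogram estimator mentioned at the end of the abstract can be illustrated in one dimension. This is only a toy sketch of the idea, not the paper's procedure: with equal-width bins and µ as Lebesgue measure, the smallest-volume set with empirical mass at least α is obtained by taking the densest bins first.

```python
# Illustrative 1-D histogram minimum volume set: greedily take the
# densest equal-width bins until their empirical mass reaches alpha.
# With equal-width bins, count order equals density order, so this
# yields the fewest bins (smallest Lebesgue volume) with mass >= alpha.
from collections import Counter

def min_volume_bins(samples, bin_width, alpha):
    """Return the bin indices of a histogram-based minimum volume set
    with empirical P-measure at least alpha."""
    counts = Counter(int(x // bin_width) for x in samples)
    chosen, mass = set(), 0
    for b, c in counts.most_common():
        if mass / len(samples) >= alpha:
            break
        chosen.add(b)
        mass += c
    return chosen

# Mass concentrated near 0; alpha = 0.8 is covered by one dense bin.
data = [0.1, 0.2, 0.3, 0.15, 0.25, 0.35, 0.45, 0.05, 3.7, 9.2]
print(min_volume_bins(data, bin_width=0.5, alpha=0.8))   # → {0}
```

The two isolated points (3.7 and 9.2) fall outside the estimated set, which is exactly the anomaly-detection use the abstract describes.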
Stochastic model for power grid dynamics
 In Proc. of 40th Annual Hawaii International Conference on System Sciences
, 2007
Abstract

Cited by 33 (1 self)
We introduce a stochastic model that describes the quasi-static dynamics of an electric transmission network under perturbations introduced by random load fluctuations, random removal of system components from service, random repair times for the failed components, and random response times to implement optimal system corrections for removing line overloads in a damaged or stressed transmission network. We use a linear approximation to the network flow equations and apply linear programming techniques that optimize the dispatching of generators and loads in order to eliminate the network overloads associated with a damaged system. We also provide a simple model for the operator’s response to various contingency events that is not always optimal, due either to failure of the state estimation system or to an incorrect subjective assessment of the severity associated with these events. This further allows us to use a game-theoretic framework for casting the optimization of the operator’s response as the choice of the optimal strategy which minimizes the operating cost. We use a simple strategy space, the degree of tolerance to line overloads, which is an automatic control (optimization) parameter that can be adjusted to trade off automatic load shedding without propagating cascades against reduced load shedding with an increased risk of propagating cascades. The tolerance parameter is chosen to describe a smooth transition from a risk-averse to a risk-taking strategy. We present numerical results comparing the responses of two power grid systems to optimization approaches with different factors of risk, and select the best blackout-controlling parameter. PACS: 89.75.-k, 05.10.-a, 02.50.-r
Semi-Supervised Novelty Detection
, 2010
Abstract

Cited by 31 (3 self)
A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions.
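The Neyman-Pearson thresholding step in the reduction above admits a small sketch. The classifier itself is abstracted away here (any learner trained to separate the nominal sample, label 0, from the contaminated unlabeled sample, label 1, would produce the scores); what is shown is only how a desired false positive rate fixes the detection threshold from nominal scores.

```python
# Sketch of the final step of the reduction: given classifier scores,
# set the detection threshold at the (1 - fpr) empirical quantile of
# the nominal sample's scores, so roughly an fpr fraction of nominals
# exceed it (Neyman-Pearson style false-positive control).

def nominal_quantile_threshold(nominal_scores, fpr):
    s = sorted(nominal_scores)
    k = min(len(s) - 1, int((1 - fpr) * len(s)))
    return s[k]

def detect(score, threshold):
    return score > threshold   # True = declared a novelty

# Hypothetical scores: nominals cluster low, novelties score high.
nominal = [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 0.16]
thr = nominal_quantile_threshold(nominal, fpr=0.1)
print(detect(0.9, thr))   # novelty-like score -> True
print(detect(0.1, thr))   # nominal-like score -> False
```

The abstract's point is that, unlike the purely inductive approach, the scores themselves come from a classifier that has seen the contaminated sample, so the resulting detector adapts to the novelty distribution.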
How to compare different loss functions and their risks
, 2006
Abstract

Cited by 25 (2 self)
Many learning problems are described by a risk functional which in turn is defined by a loss function, and a straightforward and widely known approach to learning such problems is to minimize a (modified) empirical version of this risk functional. However, in many cases this approach suffers from substantial problems such as computational requirements in classification or robustness concerns in regression. In order to resolve these issues, many successful learning algorithms instead try to minimize a (modified) empirical risk of a surrogate loss function. Of course, such a surrogate loss must be “reasonably related” to the original loss function, since otherwise this approach cannot work well. For classification, good surrogate loss functions have recently been identified, and the relationship between the excess classification risk and the excess risk of these surrogate loss functions has been exactly described. However, beyond the classification problem, little is known about good surrogate loss functions. In this work we establish a general theory that provides powerful tools for comparing excess risks of different loss functions. We then apply this theory to several learning problems including (cost-sensitive) classification, regression, density estimation, and density level detection.
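The classification case mentioned in the abstract is the standard example of a surrogate loss: the hinge loss pointwise dominates the 0-1 loss, so its empirical risk upper-bounds the 0-1 empirical risk while being convex and thus tractable to minimize.

```python
# Example of a surrogate loss: the hinge loss upper-bounds the 0-1
# classification loss at every point, so minimizing the (convex)
# empirical hinge risk controls the hard-to-minimize 0-1 risk.

def zero_one(y, f):
    """0-1 loss for label y in {-1, +1} and real-valued score f."""
    return 1.0 if y * f <= 0 else 0.0

def hinge(y, f):
    """Hinge loss, the convex surrogate used by SVMs."""
    return max(0.0, 1.0 - y * f)

pairs = [(1, 0.8), (1, -0.3), (-1, -1.2), (-1, 0.4)]
for y, f in pairs:
    assert hinge(y, f) >= zero_one(y, f)   # surrogate dominates 0-1

risk_01 = sum(zero_one(y, f) for y, f in pairs) / len(pairs)
risk_hinge = sum(hinge(y, f) for y, f in pairs) / len(pairs)
print(round(risk_01, 3), round(risk_hinge, 3))
```

The paper's contribution is the reverse direction in general: bounding the excess risk of the original loss in terms of the excess risk of the surrogate, beyond the classification setting where this relationship was already known.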
Outlier Detection with the Kernelized Spatial Depth Function
, 2008
Abstract

Cited by 23 (4 self)
Statistical depth functions provide a “center-outward ordering” of multidimensional data from the “deepest” point. In this sense, depth functions can measure the “extremeness” or “outlyingness” of a data point with respect to a given data set; hence they can detect outliers – observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set where the spatial depth fails. We demonstrate this with the half-moon data and the ring-shaped data. Based on the KSD, we propose a novel outlier detection algorithm, by which an observation with a depth value less than a threshold is declared an outlier. The proposed algorithm is simple in structure: the threshold is the only parameter for a given kernel. It applies to a one-class learning setting, in which “normal” observations are given as the training data, as well as to a missing-label scenario where the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depth-based detector; these upper bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and data sets from real applications. The proposed outlier detector is compared with existing methods, and the KSD outlier detector demonstrates competitive performance.
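The plain (un-kernelized) spatial depth that the KSD generalizes is easy to state: the depth of x is one minus the norm of the average unit vector from x to the data points, so central points get depth near 1 and outliers near 0. The sketch below implements only this base notion in 2-D; the KSD replaces the Euclidean geometry with kernel-induced distances.

```python
# The (un-kernelized) 2-D spatial depth:
#   D(x) = 1 - || (1/n) * sum_i (x_i - x) / ||x_i - x|| ||
# For a central x the unit vectors cancel out (depth near 1); for an
# outlier they all point the same way (depth near 0).
import math

def spatial_depth(x, data):
    n = len(data)
    sx = sy = 0.0
    for (px, py) in data:
        dx, dy = px - x[0], py - x[1]
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue        # skip points coincident with x
        sx += dx / norm
        sy += dy / norm
    return 1.0 - math.hypot(sx / n, sy / n)

cloud = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
center_depth = spatial_depth((0, 0), cloud)
outlier_depth = spatial_depth((10, 10), cloud)
print(center_depth > outlier_depth)   # True: the center is deeper
```

The outlier-detection rule in the abstract then reduces to declaring x an outlier whenever its depth falls below the single threshold parameter.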
Asymptotic normality of plug-in level set estimates
 Annals of Applied Probability
, 2009
Abstract

Cited by 22 (2 self)
We establish the asymptotic normality of the G-measure of the symmetric difference between the level set and a plug-in-type estimator of it formed by replacing the density in the definition of the level set by a kernel density estimator. Our proof will highlight the efficacy of Poissonization methods in the treatment of large sample theory problems of this kind.
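A plug-in level set estimator of the kind analyzed here is simple to write down. The 1-D sketch below (with an arbitrary bandwidth and level, not values from the paper) replaces the unknown density with a Gaussian kernel density estimate and declares the estimated level set to be {x : fhat(x) ≥ γ}.

```python
# Plug-in level set sketch: estimate the density with a Gaussian KDE,
# then the estimated level set is {x : fhat(x) >= gamma}.
import math

def kde(x, samples, h=0.5):
    """Gaussian kernel density estimate at x with bandwidth h."""
    c = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def in_level_set(x, samples, gamma, h=0.5):
    return kde(x, samples, h) >= gamma

# Most mass near 0, plus one isolated point at 5.
samples = [0.0, 0.1, -0.2, 0.05, 0.15, -0.1, 5.0]
print(in_level_set(0.0, samples, gamma=0.2))   # dense region -> True
print(in_level_set(5.0, samples, gamma=0.2))   # sparse region -> False
```

The paper's result concerns the fluctuations of the measure of the symmetric difference between this estimated set and the true level set as the sample size grows.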
QP Algorithms with Guaranteed Accuracy and Run Time for Support Vector Machines
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 21 (3 self)
We describe polynomial-time algorithms that produce approximate solutions with guaranteed accuracy for a class of QP problems that are used in the design of support vector machine classifiers.
Adaptive Hausdorff Estimation of Density Level Sets
, 2007
Abstract

Cited by 20 (0 self)
Consider the problem of estimating the γ-level set G*_γ = {x : f(x) ≥ γ} of an unknown d-dimensional density function f based on n independent observations X1, ..., Xn from the density. This problem has been addressed under global error criteria related to the symmetric set difference. However, in certain applications such as anomaly detection and clustering, a more uniform mode of convergence is desirable to ensure that the estimated set is close to the target set everywhere. The Hausdorff error criterion provides this degree of uniformity and hence is more appropriate in such situations. It is known that the minimax optimal rate of convergence for the Hausdorff error is (n/log n)^(−1/(d+2α)) for level sets with Lipschitz boundaries, where the parameter α characterizes the regularity of the density around the level of interest. However, the estimators proposed in previous work achieve this rate for very restricted classes of sets (e.g., boundary-fragment and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes the existence of multiple connected components, which is fundamental to many applications such as clustering. Also, all previous work assumes knowledge of the density regularity as characterized by the parameter α. In this paper, we present a procedure that is adaptive to unknown regularity conditions and achieves near-minimax optimal rates of Hausdorff error convergence for a class of level sets with very general shapes and multiple connected components at arbitrary orientations.
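The Hausdorff error criterion central to this abstract can be illustrated on finite point sets: it is the worst-case distance from either set to the other, which is exactly why it is more uniform than a symmetric-difference measure.

```python
# Hausdorff distance between finite point sets A and B:
#   max( max_{a in A} dist(a, B), max_{b in B} dist(b, A) )
import math

def hausdorff(A, B):
    def directed(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

# Two sets that agree except for one far-away extra point in B: a
# symmetric-difference measure of the mismatch would be small, but
# the Hausdorff error is large -- the uniformity the abstract argues
# makes this criterion appropriate for anomaly detection.
A = [(0, 0), (1, 0), (0, 1)]
B = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(hausdorff(A, B))
```

A level set estimate that sprouts a small spurious component far from the true set is thus heavily penalized under the Hausdorff criterion even though its symmetric-difference error is negligible.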