Results 1–10 of 39
Semi-Supervised Novelty Detection
, 2010
Abstract

Cited by 31 (3 self)
A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions.
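The reduction described above can be sketched in a few lines: label the nominal sample 0, label the contaminated unlabeled sample 1, train any binary classifier between them, and flag test points assigned to class 1. The 1-D toy data, the k-NN vote, and the fixed 0.5 threshold below are illustrative stand-ins only; the paper would instead calibrate the threshold via Neyman-Pearson classification to meet a target false positive rate.

```python
# SSND sketch: nominal sample vs. unlabeled (possibly contaminated) sample,
# with a k-NN vote standing in for an arbitrary binary classifier.

def knn_vote(x, nominal, unlabeled, k=5):
    """Fraction of x's k nearest neighbours that come from the unlabeled sample."""
    tagged = [(abs(x - p), 0) for p in nominal] + [(abs(x - p), 1) for p in unlabeled]
    tagged.sort(key=lambda t: t[0])
    return sum(lab for _, lab in tagged[:k]) / k

nominal = [0.1 * i for i in range(-20, 21)]       # nominal class concentrated near 0
unlabeled = nominal[::2] + [5.0, 5.2, 5.4, 5.6]   # contaminated: nominal points plus novelties near 5

# Declare a novelty when the unlabeled-class vote exceeds the (illustrative) 0.5 threshold.
assert knn_vote(5.1, nominal, unlabeled) > 0.5    # point in the novelty region
assert knn_vote(0.0, nominal, unlabeled) <= 0.5   # point in the nominal region
```

The key point the sketch illustrates is that no model of the novelty distribution is needed: the unlabeled sample itself plays the role of the second class.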
Asymptotic normality of plug-in level set estimates
 Annals of Applied Probability
, 2009
Abstract

Cited by 22 (2 self)
We establish the asymptotic normality of the G-measure of the symmetric difference between the level set and a plug-in-type estimator of it, formed by replacing the density in the definition of the level set by a kernel density estimator. Our proof highlights the efficacy of Poissonization methods in the treatment of large-sample theory problems of this kind.
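The plug-in construction itself is simple to illustrate: estimate the density with a kernel density estimator f_hat and keep the points where f_hat exceeds the level. The 1-D sample, bandwidth h, and level gamma below are illustrative choices, not from the paper.

```python
# Plug-in level set estimate: {x : f_hat(x) >= gamma}, with f_hat a Gaussian KDE.
import math

def kde(x, sample, h=0.3):
    """Gaussian kernel density estimate of the density at x."""
    n = len(sample)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / (n * h * math.sqrt(2 * math.pi))

sample = [0.05 * i for i in range(-40, 41)]   # roughly uniform sample on [-2, 2]
gamma = 0.15                                  # level of interest
grid = [0.1 * i for i in range(-40, 41)]      # evaluation grid on [-4, 4]
plug_in_set = [x for x in grid if kde(x, sample) >= gamma]

assert kde(0.0, sample) >= gamma   # bulk of the support lies in the estimate
assert kde(3.5, sample) < gamma    # far tail is excluded
```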
Adaptive Hausdorff Estimation of Density Level Sets
, 2007
Abstract

Cited by 20 (0 self)
Consider the problem of estimating the γ-level set G*_γ = {x : f(x) ≥ γ} of an unknown d-dimensional density function f based on n independent observations X_1, ..., X_n from the density. This problem has been addressed under global error criteria related to the symmetric set difference. However, in certain applications such as anomaly detection and clustering, a more uniform mode of convergence is desirable to ensure that the estimated set is close to the target set everywhere. The Hausdorff error criterion provides this degree of uniformity and hence is more appropriate in such situations. It is known that the minimax optimal rate of convergence for the Hausdorff error is (n/log n)^(−1/(d+2α)) for level sets with Lipschitz boundaries, where the parameter α characterizes the regularity of the density around the level of interest. However, the estimators proposed in previous work achieve this rate only for very restricted classes of sets (e.g., boundary fragment and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes the existence of multiple connected components, which is fundamental to many applications such as clustering. Also, all previous work assumes knowledge of the density regularity as characterized by the parameter α. In this paper, we present a procedure that is adaptive to unknown regularity conditions and achieves near minimax optimal rates of Hausdorff error convergence for a class of level sets with very general shapes and multiple connected components at arbitrary orientations.
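For finite point samples the Hausdorff error criterion is concrete and easy to compute: it is the largest distance from any point of one set to the other set, taken in both directions. The two sampled sets below are illustrative.

```python
# Hausdorff distance between two finite 1-D point sets.

def hausdorff(A, B):
    """max over both directions of the largest point-to-set distance."""
    d = lambda P, Q: max(min(abs(p - q) for q in Q) for p in P)
    return max(d(A, B), d(B, A))

G = [0.0, 0.5, 1.0]       # target level set, sampled
G_hat = [0.1, 0.6, 1.2]   # estimated level set, sampled

assert abs(hausdorff(G, G_hat) - 0.2) < 1e-9   # worst-case deviation is 0.2
assert hausdorff(G, G) == 0.0                  # identical sets are at distance 0
```

Unlike the symmetric-difference measure, this criterion penalizes an estimate that is wrong anywhere, which is why it suits anomaly detection and clustering.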
Anomaly detection with score functions based on nearest neighbor graphs
 in NIPS
, 2009
Abstract

Cited by 18 (7 self)
We propose a novel nonparametric adaptive anomaly detection algorithm for high-dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below α, the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful at the specified false alarm level α when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes, and it can adapt to local structure such as local changes in dimensionality. We demonstrate the algorithm on both artificial and real data sets in high-dimensional feature spaces.
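A simplified stand-in for this kind of nearest-neighbor score (not the authors' exact statistic): rank a test point's k-NN radius against the leave-one-out k-NN radii of the nominal points, giving an empirical p-value that is then thresholded at α. The 1-D data, k, and α below are illustrative.

```python
# k-NN score sketch: small score = test point is farther from nominal data
# than almost all nominal points are from each other -> anomaly.

def knn_radius(x, pts, k=3):
    """Distance from x to its k-th nearest point in pts."""
    return sorted(abs(x - p) for p in pts)[k - 1]

def score(x, nominal, k=3):
    """Empirical p-value: fraction of nominal leave-one-out radii >= x's radius."""
    rx = knn_radius(x, nominal, k)
    radii = [knn_radius(nominal[i], nominal[:i] + nominal[i + 1:], k)
             for i in range(len(nominal))]
    return sum(r >= rx for r in radii) / len(radii)

nominal = [0.2 * i for i in range(25)]   # nominal sample on [0, 4.8]
alpha = 0.05                             # illustrative false alarm level
assert score(10.0, nominal) < alpha      # far-away point is declared anomalous
assert score(2.5, nominal) >= alpha      # point inside the bulk is not
```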
Low-Noise Density Clustering
Abstract

Cited by 18 (7 self)
We study density-based clustering under low-noise conditions. Our framework allows for sharply defined clusters, such as clusters on lower-dimensional manifolds. We show that accurate clustering is possible even in high dimensions. We propose two data-based methods for choosing the bandwidth, and we study the stability properties of density clusters. We show that a simple graph-based algorithm known as the “friends-of-friends” algorithm successfully approximates the high-density clusters.
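The friends-of-friends rule itself is easy to state in code: link any two points within a linking length eps and report the connected components of the resulting graph as clusters. The 1-D points and eps below are illustrative; the paper applies the rule to points retained in a high-density region after bandwidth selection.

```python
# Friends-of-friends sketch: connected components of the eps-neighbourhood graph.

def friends_of_friends(points, eps):
    clusters, assigned = [], [False] * len(points)
    for i in range(len(points)):
        if assigned[i]:
            continue
        stack, comp = [i], []
        assigned[i] = True
        while stack:                      # grow one connected component
            j = stack.pop()
            comp.append(points[j])
            for m in range(len(points)):
                if not assigned[m] and abs(points[m] - points[j]) <= eps:
                    assigned[m] = True
                    stack.append(m)
        clusters.append(sorted(comp))
    return clusters

pts = [0.0, 0.1, 0.2, 5.0, 5.1]
assert friends_of_friends(pts, eps=0.5) == [[0.0, 0.1, 0.2], [5.0, 5.1]]
```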
Geometric entropy minimization (GEM) for anomaly detection and localization
 In Proc. Advances in Neural Information Processing Systems (NIPS
, 2006
Machine learning approaches to network anomaly detection
 in Proceedings of the Second Workshop on Tackling Computer Systems Problems with Machine Learning (SysML
, 2007
Abstract

Cited by 17 (1 self)
Networks of various kinds often experience anomalous behaviour. Examples include attacks or large data transfers in IP networks, the presence of intruders in distributed video surveillance systems, and an automobile accident or untimely congestion in a road network. Machine learning techniques enable the development of anomaly detection algorithms that are nonparametric, adaptive to changes in the characteristics of normal behaviour in the relevant network, and portable across applications. In this paper we use two different datasets, pictures of a highway in Quebec taken by a network of webcams and IP traffic statistics from the Abilene network, as examples in demonstrating the applicability of two machine learning algorithms to network anomaly detection. We investigate the use of the block-based One-Class Neighbour Machine and the recursive Kernel-based Online Anomaly Detection algorithms.
Overlaying classifiers: A practical approach for optimal ranking
 Adv. Neural Inf. Process. Syst
, 2009
Abstract

Cited by 15 (7 self)
The ROC curve is one of the most widely used visual tools for evaluating the performance of scoring functions with respect to their capacity to discriminate between two populations. The goal of this paper is to propose a statistical learning method for constructing a scoring function with a nearly optimal ROC curve. In this bipartite setup, the target is known to be the regression function up to an increasing transform, and solving the optimization problem boils down to recovering the collection of level sets of the latter, which we interpret here as a continuum of imbricated classification problems. We propose a discretization approach, consisting of building a finite sequence of N classifiers by constrained empirical risk minimization and then constructing a piecewise constant scoring function s_N(x) by overlaying the resulting classifiers. Given the functional nature of the ROC criterion, the accuracy of the ranking induced by s_N(x) can be conceived in a variety of ways, depending on the distance chosen for measuring closeness to the optimal curve in ROC space. By relating the ROC curve of the resulting scoring function to piecewise linear approximations of the optimal ROC curve, we establish the consistency of the method as well as rate bounds controlling its generalization ability in sup-norm. Finally, we also highlight the fact that, as a byproduct, the proposed algorithm provides an accurate estimate of the optimal ROC curve.
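The overlaying idea can be illustrated with a toy family of nested set-valued classifiers, one per level: the score of a point is simply the number of classifiers that accept it, which yields a piecewise constant scoring function. The triangular toy density and levels below are illustrative stand-ins, not the paper's ERM-built classifiers.

```python
# Overlaying sketch: score = number of nested level-set classifiers accepting x,
# using the toy density f(x) = max(0, 1 - |x|).

def make_classifier(level):
    """Indicator of the upper level set {x : f(x) >= level} of the toy density."""
    return lambda x: max(0.0, 1.0 - abs(x)) >= level

levels = [0.2, 0.4, 0.6, 0.8]
classifiers = [make_classifier(u) for u in levels]

def s_N(x):
    """Piecewise constant scoring function obtained by overlaying the classifiers."""
    return sum(c(x) for c in classifiers)

assert s_N(0.0) == 4   # near the mode: accepted by all four classifiers
assert s_N(0.7) == 1   # f(0.7) = 0.3: only the 0.2-level classifier accepts
assert s_N(2.0) == 0   # outside the support: rejected by all
```

The resulting s_N ranks points exactly as the density does on this toy example, which is the sense in which overlaying recovers the optimal ranking.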
Nonparametric assessment of contamination in multivariate data using minimum-volume sets and FDR
, 2007
Abstract

Cited by 7 (2 self)
Large, multivariate datasets from high-throughput instrumentation have become ubiquitous throughout the sciences. Frequently, it is of great interest to characterize the measurements in these datasets by the extent to which they represent ‘nominal’ versus ‘contaminated’ instances. However, often the nature of even the nominal patterns in the data is unknown and potentially quite complex, making their explicit parametric modeling a daunting task. In this paper, we introduce a nonparametric method for the simultaneous annotation of multivariate data (called MN-SCAnn), by which one may produce an annotated ranking of the observations, indicating the relative extent to which each may or may not be considered nominal, while making minimal assumptions on the nature of the nominal distribution. In our framework each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test, which in turn is accompanied by a certain false discovery rate. The combination of minimum volume set methods with false discovery rate principles, in the context of contaminated data, is new. Moreover, estimation of the key underlying quantities requires that a number of issues be addressed. We illustrate MN-SCAnn through examples in two contexts: the preprocessing of cell-based assays in bioinformatics, and the detection of anomalous traffic patterns in Internet measurement studies.
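A standard ingredient of the false-discovery-rate side of such a framework is the Benjamini-Hochberg step-up rule on p-values, sketched here on illustrative numbers; the paper's contribution is to pair FDR control of this kind with minimum-volume-set tests, which the sketch does not attempt.

```python
# Benjamini-Hochberg step-up procedure on a list of p-values.

def benjamini_hochberg(pvals, q=0.1):
    """Return (sorted) indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:    # step-up condition: p_(k) <= k*q/m
            k = rank
    return sorted(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.62, 0.75]    # illustrative p-values
assert benjamini_hochberg(pvals, q=0.1) == [0, 1, 2, 3]
```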
Learning minimum volume sets with support vector machines
 in Proc. IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP
, 2006
Abstract

Cited by 5 (1 self)
Given a probability law P on d-dimensional Euclidean space, the minimum volume set (MV-set) with mass β, 0 < β < 1, is the set with smallest volume enclosing a probability mass of at least β. We examine the use of support vector machines (SVMs) for estimating an MV-set from a collection of data points drawn from P, a problem with applications in clustering and anomaly detection. We investigate both one-class and two-class methods. The two-class approach reduces the problem to Neyman-Pearson (NP) classification, where we artificially generate a second class of data points according to a uniform distribution. The simple approach to generating the uniform data suffers from the curse of dimensionality. In this paper we (1) describe the reduction of MV-set estimation to NP classification, (2) devise improved methods for generating artificial uniform data for the two-class approach, (3) advocate a new performance measure for systematic comparison of MV-set algorithms, and (4) establish a set of benchmark experiments to serve as a point of reference for future MV-set algorithms. We find that, in general, the two-class method performs more reliably.
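A rough 1-D sketch of the two-class idea (not the SVM method of the paper): generate an artificial "uniform" class over a box containing the data, then keep the region where real data points outvote uniform points among local neighbours. The deterministic toy samples and neighbourhood size k are illustrative assumptions.

```python
# Two-class MV-set sketch: real data (label 1) vs. artificial uniform class (label 0);
# a point is kept in the MV-set estimate when real data wins the local majority vote.

data = [0.02 * i for i in range(-50, 51)] + [-3.0, -2.0, 2.0, 3.0]   # mass concentrated on [-1, 1]
lo, hi = min(data), max(data)
step = (hi - lo) / 99
uniform = [lo + step * i for i in range(100)]   # deterministic stand-in for uniform draws on [lo, hi]

def in_mv_set(x, data, uniform, k=15):
    """Majority vote among the k nearest points: real data vs. artificial uniform class."""
    tagged = [(abs(x - p), 1) for p in data] + [(abs(x - p), 0) for p in uniform]
    tagged.sort(key=lambda t: t[0])
    return sum(lab for _, lab in tagged[:k]) > k // 2

assert in_mv_set(0.0, data, uniform)        # high-density core lies in the estimate
assert not in_mv_set(3.0, data, uniform)    # extreme tail is excluded
```

The curse of dimensionality the paper mentions shows up here directly: in high dimensions almost all uniform draws land far from the data, which is why the paper devises improved schemes for generating the artificial class.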