Results 1–10 of 33
Estimating the Support of a High-Dimensional Distribution
, 1999
Abstract

Cited by 766 (29 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
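The estimator described in this abstract became the standard one-class SVM, and an implementation is available in scikit-learn as OneClassSVM. A minimal sketch of the idea (the data and parameter values here are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))       # sample drawn from the unknown P
clf = OneClassSVM(kernel="rbf", nu=0.1)   # nu: a priori bound on the mass outside S
clf.fit(X_train)
# the decision function plays the role of f: positive on S, negative on the complement
inside = clf.predict(X_train)             # +1 inside the estimated region, -1 outside
frac_outside = (inside == -1).mean()      # roughly bounded by nu on the training set
```

The kernel expansion mentioned in the abstract lives in `clf.support_vectors_` and `clf.dual_coef_`: f is a weighted sum of kernel evaluations against a (typically small) subset of the training points.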
Support Vector Method for Novelty Detection
, 2000
Abstract

Cited by 160 (4 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. We provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
A classification framework for anomaly detection
 J. Machine Learning Research
, 2005
Abstract

Cited by 71 (6 self)
One way to describe anomalies is by saying that anomalies are not concentrated. This leads to the problem of finding level sets for the data generating density. We interpret this learning problem as a binary classification problem and compare the corresponding classification risk with the standard performance measure for the density level problem. In particular it turns out that the empirical classification risk can serve as an empirical performance measure for the anomaly detection problem. This allows us to compare different anomaly detection algorithms empirically, i.e. with the help of a test set. Based on the above interpretation we then propose a support vector machine (SVM) for anomaly detection. Finally, we establish universal consistency for this SVM and report some experiments which compare our SVM to other commonly used methods including the standard one-class SVM.
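The reduction this abstract describes, learning a density level set as a binary classification problem against a reference measure, can be sketched with a toy version. The uniform reference samples and the generic classifier below are illustrative assumptions, not the paper's SVM:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                  # samples from the unknown density
lo, hi = X.min(axis=0), X.max(axis=0)
U = rng.uniform(lo, hi, size=(500, 2))         # reference samples (uniform over a box)
Z = np.vstack([X, U])
y = np.r_[np.ones(500), np.zeros(500)]         # data vs. reference: a binary problem
clf = RandomForestClassifier(random_state=0).fit(Z, y)
# a low predicted probability of the "data" class marks low-density regions,
# i.e. points outside the estimated density level set
scores = clf.predict_proba(X)[:, 1]
anomalies = X[scores < 0.5]
```

Because the target is an ordinary classification risk, any held-out test set gives an empirical performance measure, which is exactly the point the abstract makes about comparing detectors.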
One-class collaborative filtering
 In ICDM 2008
, 2008
Abstract

Cited by 65 (1 self)
Many applications of collaborative filtering (CF), such as news item recommendation and bookmark recommendation, are most naturally thought of as one-class collaborative filtering (OCCF) problems. In these problems, the training data usually consist simply of binary data reflecting a user’s action or inaction, such as page visitation in the case of news item recommendation or webpage bookmarking in the bookmarking scenario. Usually this kind of data is extremely sparse (a small fraction are positive examples), therefore ambiguity arises in the interpretation of the non-positive examples. Negative examples and unlabeled positive examples are mixed together and we are typically unable to distinguish them. For example, we cannot really attribute a user not bookmarking a page to a lack of interest or lack of awareness of the page. Previous research addressing this one-class problem only considered it as a classification task. In this paper, we consider the one-class problem under the CF setting. We propose two frameworks to tackle OCCF. One is based on weighted low-rank approximation; the other is based on negative example sampling. The experimental results show that our approaches significantly outperform the baselines.
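A toy sketch of the weighted low-rank idea, the first of the two frameworks mentioned: observed positive entries get full weight, while the ambiguous zeros are down-weighted rather than trusted as negatives. The tiny matrix, rank, weights, and regularization below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def weighted_als(R, W, k=2, lam=0.1, iters=20, seed=0):
    """Weighted low-rank approximation R ~ U @ V.T via alternating least squares,
    where W holds a per-entry confidence weight."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(iters):
        for i in range(m):                       # solve each user row in turn
            Wi = np.diag(W[i])
            U[i] = np.linalg.solve(V.T @ Wi @ V + lam * np.eye(k), V.T @ Wi @ R[i])
        for j in range(n):                       # then each item row
            Wj = np.diag(W[:, j])
            V[j] = np.linalg.solve(U.T @ Wj @ U + lam * np.eye(k), U.T @ Wj @ R[:, j])
    return U, V

R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1.]])       # 1 = observed user action, 0 = ambiguous
W = np.where(R > 0, 1.0, 0.2)       # down-weight the unobserved entries
U, V = weighted_als(R, W)
pred = U @ V.T                      # scores for ranking the ambiguous zeros
```

The asymmetric weighting is what distinguishes this from plain matrix factorization: the zeros still pull predictions down, but only weakly, reflecting that they mix true negatives with unlabeled positives.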
PAC Learning from Positive Statistical Queries
 Proc. 9th International Conference on Algorithmic Learning Theory (ALT ’98)
, 1998
Abstract

Cited by 52 (3 self)
Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe such kinds of learning. We show that in order to make learning from positive data possible, extra information about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with the PAC model ([Val84]), the statistical query model ([Kea93]), and the constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e. with far less information than is assumed in previously used algorithms.
Learning minimum volume sets
 J. Machine Learning Res
, 2006
Abstract

Cited by 41 (9 self)
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
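The histogram-based estimators mentioned at the end admit a simple greedy sketch: rank bins by empirical density and keep the densest ones until their combined mass reaches α, so the kept bins form a small-volume set with P-measure at least α. This is a simplified illustration under Lebesgue reference measure, not the paper's exact rule:

```python
import numpy as np

def mv_set_histogram(X, alpha=0.9, bins=10):
    # rank histogram bins by count (densest first) and keep bins
    # until their combined empirical mass reaches alpha
    hist, edges = np.histogramdd(X, bins=bins)
    counts = hist.ravel()
    order = np.argsort(counts)[::-1]
    mass = np.cumsum(counts[order]) / len(X)
    n_keep = np.searchsorted(mass, alpha) + 1
    return counts, order[:n_keep]

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
counts, keep = mv_set_histogram(X, alpha=0.9)
covered = counts[keep].sum() / len(X)   # empirical P-mass of the estimated set >= alpha
```

Since every bin has the same volume, choosing the highest-count bins first minimizes the number of bins (hence the µ-measure) needed to reach mass α among histogram sets.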
SV Estimation of a Distribution's Support
, 1999
Abstract

Cited by 37 (2 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified 0 < ν ≤ 1. We propose an algorithm which approaches this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
Robust Novelty Detection with Single-Class MPM
 In Advances in Neural Information Processing Systems
, 2003
Abstract

Cited by 29 (1 self)
In this paper we consider the problem of novelty detection, presenting an algorithm that aims to find a minimal region in input space containing a fraction of the probability mass underlying a data set. This algorithm, the "single-class minimax probability machine (MPM)", is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. We present ...
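The distribution-free flavor of this guarantee can be illustrated with the multivariate Chebyshev bound underlying minimax probability machines: for any distribution with mean µ and covariance Σ, the probability that the squared Mahalanobis distance exceeds d² is at most 1/(1+d²). The sketch below uses the ellipsoidal special case with plug-in moment estimates; it is an illustration of the bound, not the paper's general convex-set optimization:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)
inv = np.linalg.inv(Sigma)
# multivariate Chebyshev: P((x-mu)^T Sigma^{-1} (x-mu) > d^2) <= 1/(1+d^2)
alpha = 0.1                        # target worst-case outside-probability
d2 = (1 - alpha) / alpha           # solving 1/(1+d^2) = alpha gives d^2 = 9
md2 = np.einsum('ij,jk,ik->i', X - mu, inv, X - mu)   # squared Mahalanobis distances
novel = md2 > d2                   # points outside the ellipsoid are flagged as novel
```

The bound holds for every distribution with those first two moments, which is what "making no further distributional assumptions" buys; the price is that the region is conservative for well-behaved data.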
Resampling approach for anomaly detection in multispectral images
 In Proceedings of SPIE, Vol. 5093 (23 September 2003): Shen, Sylvia S., and Lewis, Paul E. (Eds.), Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery IX
, 2003
Abstract

Cited by 29 (3 self)
We propose a novel approach for identifying the “most unusual” samples in a data set, based on a resampling of data attributes. The resampling produces a “background class” and then binary classification is used to distinguish the original training set from the background. Those in the training set that are most like the background (i.e., most unlike the rest of the training set) are considered anomalous. Although by their nature, anomalies do not permit a positive definition (if I knew what they were, I wouldn’t call them anomalies), one can make “negative definitions” (I can say what does not qualify as an interesting anomaly). By choosing different resampling schemes, one can identify different kinds of anomalies. For multispectral images, anomalous pixels correspond to locations on the ground with unusual spectral signatures or, depending on how feature sets are constructed, unusual spatial textures.
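A minimal sketch of this resampling scheme, assuming the background class is built by permuting each attribute independently (one of several resampling choices the approach allows; the data and classifier below are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=400)   # give the attributes joint structure
# background class: permute each attribute independently, which preserves the
# marginals but destroys the dependence between attributes
B = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
Z = np.vstack([X, B])
y = np.r_[np.ones(len(X)), np.zeros(len(B))]     # original vs. background
clf = RandomForestClassifier(random_state=0).fit(Z, y)
# training points the classifier scores as background-like are "most unusual"
bg_scores = clf.predict_proba(X)[:, 0]
most_unusual = np.argsort(bg_scores)[-10:]       # the ten most anomalous samples
```

The choice of resampling scheme is the "negative definition" the abstract mentions: permuting attributes declares that anything explainable by the marginals alone is not an interesting anomaly, so only violations of the joint structure get flagged.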