Results 1  10
of
54
Outlier Detection in Sensor Networks
 MobiHoc'07
, 2007
"... Outlier detection has many important applications in sensor networks, e.g., abnormal event detection, animal behavior change, etc. It is a difficult problem since global information about data distributions must be known to identify outliers. In this paper, we use a histogrambased method for outlie ..."
Abstract

Cited by 35 (1 self)
 Add to MetaCart
Outlier detection has many important applications in sensor networks, e.g., abnormal event detection, animal behavior change, etc. It is a difficult problem since global information about data distributions must be known to identify outliers. In this paper, we use a histogrambased method for outlier detection to reduce communication cost. Rather than collecting all the data in one location for centralized processing, we propose collecting hints (in the form of a histogram) about the data distribution, and using the hints to filter out unnecessary data and identify potential outliers. We show that this method can be used for detecting outliers in terms of two different definitions. Our simulation results show that the histogram method can dramatically reduce the communication cost.
Incremental local outlier detection for data streams
 In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining
, 2007
"... Abstract. Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (L ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
Abstract. Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (Local Outlier Factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides equivalent detection performance as the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm also dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point as well as deletion of an old data point influence only limited number of their closest neighbors and thus the number of updates per such insertion/deletion does not depend on the total number of points N in the data set. Our experiments performed on several simulated and real life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications. I.
Outlier Detection with the Kernelized Spatial Depth Function
, 2008
"... Statistical depth functions provide from the “deepest ” point a “centeroutward ordering” of multidimensional data. In this sense, depth functions can measure the “extremeness” or “outlyingness” of a data point with respect to a given data set. Hence they can detect outliers – observations that appe ..."
Abstract

Cited by 23 (4 self)
 Add to MetaCart
(Show Context)
Statistical depth functions provide from the “deepest ” point a “centeroutward ordering” of multidimensional data. In this sense, depth functions can measure the “extremeness” or “outlyingness” of a data point with respect to a given data set. Hence they can detect outliers – observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set while the spatial depth fails. We demonstrate this by the halfmoon data and the ringshaped data. Based on the KSD, we propose a novel outlier detection algorithm, by which an observation with a depth value less than a threshold is declared as an outlier. The proposed algorithm is simple in structure: the threshold is the only one parameter for a given kernel. It applies to a oneclass learning setting, in which “normal ” observations are given as the training data, as well as to a missing label scenario where the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depthbased detector. These upper bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and data sets from real applications. The proposed outlier detector is compared with existing methods. The KSD outlier detector demonstrates competitive performance.
Mining distancebased outliers from large databases in any metric space
 IN: KDD
, 2006
"... Let R be a set of objects. An object o ∈ R is an outlier, if there exist less than k objects in R whose distances to o are at most r. The values of k, r, and the distance metric are provided by a user at the run time. The objective is to return all outliers with the smallest I/O cost. This paper con ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
(Show Context)
Let R be a set of objects. An object o ∈ R is an outlier, if there exist less than k objects in R whose distances to o are at most r. The values of k, r, and the distance metric are provided by a user at the run time. The objective is to return all outliers with the smallest I/O cost. This paper considers a generic version of the problem, where no information is available for outlier computation, except for objects’ mutual distances. We prove an upper bound for the memory consumption which permits the discovery of all outliers by scanning the dataset 3 times. The upper bound turns out to be extremely low in practice, e.g., less than 1 % of R. Since the actual memory capacity of a realistic DBMS is typically larger, we develop a novel algorithm, which integrates our theoretical findings with carefullydesigned heuristics that leverage the additional memory to improve I/O efficiency. Our technique reports all outliers by scanning the dataset at most twice (in some cases, even once), and significantly outperforms the existing solutions by a factor up to an order of magnitude.
Interpreting and Unifying Outlier Scores
"... Outlier scores provided by different outlier models differ widely in their meaning, range, and contrast between different outlier models and, hence, are not easily comparable or interpretable. We propose a unification of outlier scores provided by various outlier models and a translation of the arbi ..."
Abstract

Cited by 21 (5 self)
 Add to MetaCart
(Show Context)
Outlier scores provided by different outlier models differ widely in their meaning, range, and contrast between different outlier models and, hence, are not easily comparable or interpretable. We propose a unification of outlier scores provided by various outlier models and a translation of the arbitrary “outlier factors ” to values in the range [0, 1] interpretable as values describing the probability of a data object of being an outlier. As an application, we show that this unification facilitates enhanced ensembles for outlier detection. 1
Converting output scores from outlier detection algorithms into probability estimates
 In Proc. of the Sixth IEEE Int. Conf. on Data Mining (ICDM’06
, 2006
"... Current outlier detection schemes typically output a numeric score representing the degree to which a given observation is an outlier. We argue that converting the scores into wellcalibrated probability estimates is more favorable for several reasons. First, the probability estimates allow us to se ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
Current outlier detection schemes typically output a numeric score representing the degree to which a given observation is an outlier. We argue that converting the scores into wellcalibrated probability estimates is more favorable for several reasons. First, the probability estimates allow us to select the appropriate threshold for declaring outliers using a Bayesian risk model. Second, the probability estimates obtained from individual models can be aggregated to build an ensemble outlier detection framework. In this paper, we present two methods for transforming outlier scores into probabilities. The first approach assumes that the posterior probabilities follow a logistic sigmoid function and learns the parameters of the function from the distribution of outlier scores. The second approach models the score distributions as a mixture of exponential and Gaussian probability functions and calculates the posterior probabilites via the Bayes ’ rule. We evaluated the efficacy of both methods in the context of threshold selection and ensemble outlier detection. We also show that the calibration accuracy improves with the aid of some labeled examples. 1
On evaluation of outlier rankings and outlier scores
 In Proceedings of the 12th SIAM international conference on data mining (SDM), Anaheim, CA
, 2012
"... Outlier detection research is currently focusing on the development of new methods and on improving the computation time for these methods. Evaluation however is rather heuristic, often considering just precision in the top k results or using the area under the ROC curve. These evaluation procedures ..."
Abstract

Cited by 12 (8 self)
 Add to MetaCart
(Show Context)
Outlier detection research is currently focusing on the development of new methods and on improving the computation time for these methods. Evaluation however is rather heuristic, often considering just precision in the top k results or using the area under the ROC curve. These evaluation procedures do not allow for assessment of similarity between methods. Judging the similarity of or correlation between two rankings of outlier scores is an important question in itself but it is also an essential step towards meaningfully building outlier detection ensembles, where this aspect has been completely ignored so far. In this study, our generalized view of evaluation methods allows both to evaluate the performance of existing methods as well as to compare different methods w.r.t. their detection performance. Our new evaluation framework takes into consideration the class imbalance problem and offers new insights on similarity and redundancy of existing outlier detection methods. As a result, the design of effective ensemble methods for outlier detection is considerably enhanced. 1
iScore: Measuring the Interestingness of Articles in a Limited User Environment
 In: IEEE Symposium on Computational Intelligence and Data Mining
, 2007
"... AbstractSearch engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not all relevant articles for the query may be interesting to a user. For example, if the article is old or yields little new information, the article would be uninteresting. Relevan ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
(Show Context)
AbstractSearch engines, such as Google, assign scores to news articles based on their relevancy to a query. However, not all relevant articles for the query may be interesting to a user. For example, if the article is old or yields little new information, the article would be uninteresting. Relevancy scores do not take into account what makes an article interesting, which would vary from user to user. Although methods such as collaborative filtering have been shown to be effective in recommendation systems, in a limited user environment there are not enough users that would make collaborative filtering effective. We present a general framework for defining and measuring the “interestingness ” of articles, incorporating userfeedback. We show 21 % improvement over traditional IR methods. I.
Isolationbased anomaly detection
"... Anomalies are data points that are few and different. As a result of these properties, we show that, anomalies are susceptible to a mechanism called isolation. This paper proposes a method called Isolation Forest (iForest) which detects anomalies purely based on the concept of isolation without empl ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
Anomalies are data points that are few and different. As a result of these properties, we show that, anomalies are susceptible to a mechanism called isolation. This paper proposes a method called Isolation Forest (iForest) which detects anomalies purely based on the concept of isolation without employing any distance or density measure—fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear timecomplexity and a small memoryrequirement, and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, oneclass SVM, LOF and Random Forests in terms of AUC, processing time, and it is robust against masking and swamping effects. iForest also works well in high dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in training sample.