### Citations

4018 | An Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context: ...n low variance of estimators and locality in density. (3) Δk = 2, k_min = 20, k_max = ⌊n/2⌋ − (⌊n/2⌋ mod 2) (make k_max even and hence j = k/2 an integer). 4 Experimental Evaluation We use precision and recall [22] to evaluate the effectiveness of MDV in comparison with LOF. In detail, let O and P denote the true outlier set and the predicted outlier set respectively; then precision ≡ |O ∩ P|/|P|, recall ≡ |O ∩ P|/...
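The precision and recall definitions quoted in this context are plain set ratios; a minimal sketch, with hypothetical outlier sets (the paper's actual data are not reproduced here):

```python
# Precision/recall as defined in the context: O = true outlier set,
# P = predicted outlier set (both hypothetical here).
def precision_recall(O, P):
    O, P = set(O), set(P)
    tp = len(O & P)  # |O ∩ P|
    precision = tp / len(P) if P else 0.0  # |O ∩ P| / |P|
    recall = tp / len(O) if O else 0.0     # |O ∩ P| / |O|
    return precision, recall

# Example: 3 of 4 predictions are correct, out of 5 true outliers.
print(precision_recall({1, 2, 3, 4, 5}, {1, 2, 3, 9}))  # → (0.75, 0.6)
```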

2228 | Finding groups in data: an introduction to cluster analysis, volume 344. Wiley-Interscience
- Kaufman, Rousseeuw
- 2009
Citation Context: ...eighborhood. They do not assume any prior distribution of the data and limit the counting of points to the neighborhood of each point. Corresponding to clustering algorithms that find convex clusters [12, 13], one such technique is the well-known DB(p, d)-outlier [14], where a point in a dataset T is an outlier if at least a fraction p of the points in T lie farther than distance d from it. A special case of DB...

2139 | Statistics for Spatial Data
- Cressie
- 1991
Citation Context: ...ors will also be close to one and it is impossible for LOF to identify them. 3 Outlier Detection with Multi-scale Deviation of Volume 3.1 Complete Spatial Randomness Complete spatial randomness (CSR) [19] refers to a lack of structure (pattern) in a spatial point process, where events (points regarded as a realization of events) are uniformly distributed in the study region A ⊂ ℝ^d. For any sub-regi...

1785 | A Density-Based Algorithm for Discovering Clusters
- Ester, Kriegel, et al.
- 1996
Citation Context: ...on the local density comparison only with the immediate neighbors. Corresponding to clustering algorithms capable of finding arbitrary-shape clusters consisting of points with similar local densities [16, 17], the notion of the local outlier factor (LOF) is proposed in [18], which measures the degree of outlierness based on the difference between the local density of a point and that of its k nearest neighbors. Comparati...

855 | UCI Repository of machine learning databases
- Murphy, Aha
- 1994
Citation Context: ....2 Real Data As mentioned in [23], in practice the performance of outlier detection approaches can be evaluated on real data by recovering data from rare classes. We choose from the UCI repository [24] two datasets, ionosphere and Wisconsin diagnostic breast cancer, both of which have two classes. All data from the majority class are treated as non-outliers. Then we randomly draw a few data from th...

516 | LOF: Identifying Density-Based Local Outliers
- Breunig, Kriegel, et al.
- 2000
Citation Context: ... Corresponding to clustering algorithms capable of finding arbitrary-shape clusters consisting of points with similar local densities [16, 17], the notion of the local outlier factor (LOF) is proposed in [18], which measures the degree of outlierness based on the difference between the local density of a point and that of its k nearest neighbors. Comparatively, the DB(p, d)-outlier cannot detect local outliers w.r.t. a n...
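The context describes LOF only informally. As a hedged sketch of the computation in the cited paper (reachability distance, local reachability density, then the factor itself), on a tiny made-up dataset:

```python
import math

# k nearest neighbors of x within T (x excluded).
def knn(x, T, k):
    return sorted((t for t in T if t != x), key=lambda t: math.dist(x, t))[:k]

# Distance from x to its k-th nearest neighbor.
def k_dist(x, T, k):
    return math.dist(x, knn(x, T, k)[-1])

# Local reachability density: inverse of the mean reachability distance,
# where reach(x, o) = max(k_dist(o), dist(x, o)).
def lrd(x, T, k):
    nbrs = knn(x, T, k)
    reach = [max(k_dist(o, T, k), math.dist(x, o)) for o in nbrs]
    return len(nbrs) / sum(reach)

# LOF: average ratio of the neighbors' lrd to x's own lrd;
# values well above 1 indicate a local outlier.
def lof(x, T, k):
    nbrs = knn(x, T, k)
    return sum(lrd(o, T, k) for o in nbrs) / (len(nbrs) * lrd(x, T, k))

T = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(lof((10, 10), T, k=2) > lof((0, 0), T, k=2))  # → True
```

The isolated point (10, 10) gets a LOF far above 1, while the points inside the dense cluster score close to 1, matching the "degree of outlierness" reading in the context.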

431 | Outliers in Statistical Data
- Barnett, Lewis
- 1994
Citation Context: ... it as an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. A similar definition also appeared in Barnett and Lewis’s book [2], which states that an outlier is an observation or subset of observations which appears to be inconsistent with the remainder of that set of data. Although outliers are often treated as noise or error in ma...

322 | Efficient algorithms for mining outliers from large data sets
- Ramaswamy, Rastogi, et al.
- 2000
Citation Context: ...ll-known DB(p, d)-outlier [14], where a point in a dataset T is an outlier if at least a fraction p of the points in T lie farther than distance d from it. A special case of the DB(p, d)-outlier is proposed in [15], where the distance to the k-th nearest neighbor is used to rank the outlierness. The strength of this definition includes its simplicity and its capture of the basic meaning of Hawkins’ definition. However,...
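The [15]-style ranking mentioned in this context scores each point by the distance to its k-th nearest neighbor; a minimal sketch on a made-up dataset:

```python
import math

# Distance from x to its k-th nearest neighbor in T (x itself excluded);
# larger values mean more outlying under the ranking described in [15].
def kth_nn_distance(x, T, k):
    dists = sorted(math.dist(x, t) for t in T if t != x)
    return dists[k - 1]

T = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = {t: kth_nn_distance(t, T, k=2) for t in T}
print(max(scores, key=scores.get))  # → (10, 10)
```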

298 | A First Course in Probability
- Ross
- 1998
Citation Context: ...es, we can see that Var(V_j) = j/(λ²(k+1)). For large j, if we approximate the gamma distribution with a Gaussian distribution, then kλ²S²_j/j is a chi-squared random variable with k degrees of freedom [20]. Therefore, Var(S²_j) ≈ 2j²/(kλ⁴). For accurate estimation of the true mean and variance, the estimators’ variances should be small, which means j cannot be too large. On the other hand, large...
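The variance claim in this context follows directly from the chi-squared variance; a worked step using the same symbols:

```latex
% If X = k\lambda^2 S_j^2 / j \sim \chi^2_k, then \operatorname{Var}(X) = 2k.
% Since S_j^2 = \tfrac{j}{k\lambda^2} X, scaling the variance gives
\operatorname{Var}(S_j^2)
  = \left(\frac{j}{k\lambda^2}\right)^{2} \operatorname{Var}(X)
  = \frac{j^{2}}{k^{2}\lambda^{4}} \cdot 2k
  = \frac{2j^{2}}{k\lambda^{4}}.
```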

277 | An efficient approach to clustering in large multimedia databases with noise
- Hinneburg, Keim
- 1998
Citation Context: ...on the local density comparison only with the immediate neighbors. Corresponding to clustering algorithms capable of finding arbitrary-shape clusters consisting of points with similar local densities [16, 17], the notion of the local outlier factor (LOF) is proposed in [18], which measures the degree of outlierness based on the difference between the local density of a point and that of its k nearest neighbors. Comparati...

276 | What makes patterns interesting in knowledge discovery systems
- Silberschatz, Tuzhilin
- 1996
Citation Context: ...er factor is the interestingness. A rule’s interestingness can be measured in terms of its unexpectedness, i.e., how much it changes the current belief of the whole system of all rules mined so far [7, 8]. 1.1 Problem Formulation Now we give the formal formulation of the outlier detection problem. Given a dataset partitioned into outliers and non-outliers, the problem of detecting outliers is essentially an...

235 | Outlier detection for high dimensional data
- Aggarwal, Yu
- 2001
Citation Context: ... to LOF. The average output size of MDV3 is 5.08, which is very close to 5, the number of true outliers. Besides, its precision is significantly higher than that of MDV and LOF. 4.2 Real Data As mentioned in [23], in practice the performance of outlier detection approaches can be evaluated on real data by recovering data from rare classes. We choose from the UCI repository [24] two datasets, ionosphere and...

221 | Adaptive fraud detection
- Fawcett, Provost
- 1997
Citation Context: ...t as finding clustering structure. Outlier detection has already found applications including deviation detection in large databases [3], discovering network intrusions [4, 5], detecting cellular fraud [6], etc. There are many similar problems in other fields. For instance, in association rule mining, an outlier is an interesting rule and the outlier factor is the interestingness. The rule’s interestin...

185 | Distance-Based Outliers: Algorithms and Applications
- Knorr, Ng, et al.
- 2000
Citation Context: ...ta and limit the counting of points to the neighborhood of each point. Corresponding to clustering algorithms that find convex clusters [12, 13], one such technique is the well-known DB(p, d)-outlier [14], where a point in a dataset T is an outlier if at least a fraction p of the points in T lie farther than distance d from it. A special case of the DB(p, d)-outlier is proposed in [15], where the distance to th...
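The DB(p, d)-outlier test in this context is a simple counting rule; a minimal sketch on a made-up dataset:

```python
import math

# DB(p, d)-outlier as defined in the context: x is an outlier in T if at
# least a fraction p of the points in T lie farther than distance d from it.
def is_db_outlier(x, T, p, d):
    far = sum(1 for t in T if math.dist(x, t) > d)
    return far >= p * len(T)

T = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(is_db_outlier((10, 10), T, p=0.7, d=3.0))  # → True
print(is_db_outlier((0, 0), T, p=0.7, d=3.0))    # → False
```

Note the global character the surrounding discussion points at: a single (p, d) pair applies to the whole dataset, which is why this definition misses outliers relative to a locally denser neighborhood.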

142 | CLARANS: A Method for Clustering Objects for Spatial Data Mining
- Ng, Han
- 2002
Citation Context: ...eighborhood. They do not assume any prior distribution of the data and limit the counting of points to the neighborhood of each point. Corresponding to clustering algorithms that find convex clusters [12, 13], one such technique is the well-known DB(p, d)-outlier [14], where a point in a dataset T is an outlier if at least a fraction p of the points in T lie farther than distance d from it. A special case of DB...

101 | A Linear Method for Deviation Detection in Large Databases
- Arning, Agrawal, et al.
- 1996
Citation Context: ...cal inference. So in a way, finding outliers is at least as important as finding clustering structure. Outlier detection has already found applications including deviation detection in large databases [3], discovering network intrusions [4, 5], detecting cellular fraud [6], etc. There are many similar problems in other fields. For instance, in association rule mining, an outlier is an interesting rule...

79 | Computing Depth Contours of Bivariate Point Clouds
- Ruts, Rousseeuw
- 1996
Citation Context: ..., prior knowledge about the distribution of the dataset is not always available. Furthermore, it is hard to justify model selection in advance, e.g., Gaussian over exponential. Depth-based approaches [9, 10] employ computational geometry to compute different layers of convex hulls and declare the objects in the outer layers as outliers. However, they suffer from the curse of dimensionality and cannot cope...

78 | Mining in a data-flow environment: Experience in network intrusion detection
- Stolfo, Lee, et al.
- 1999
Citation Context: ... outliers is at least as important as finding clustering structure. Outlier detection has already found applications including deviation detection in large databases [3], discovering network intrusions [4, 5], detecting cellular fraud [6], etc. There are many similar problems in other fields. For instance, in association rule mining, an outlier is an interesting rule and the outlier factor is the interest...

52 | Finding interesting patterns using user expectations
- Liu, Hsu, et al.
- 1999
Citation Context: ...er factor is the interestingness. A rule’s interestingness can be measured in terms of its unexpectedness, i.e., how much it changes the current belief of the whole system of all rules mined so far [7, 8]. 1.1 Problem Formulation Now we give the formal formulation of the outlier detection problem. Given a dataset partitioned into outliers and non-outliers, the problem of detecting outliers is essentially an...

51 | Fast Computation of 2-Dimensional Depth Contours
- Johnson, Kwok, et al.
- 1998
Citation Context: ..., prior knowledge about the distribution of the dataset is not always available. Furthermore, it is hard to justify model selection in advance, e.g., Gaussian over exponential. Depth-based approaches [9, 10] employ computational geometry to compute different layers of convex hulls and declare the objects in the outer layers as outliers. However, they suffer from the curse of dimensionality and cannot cope...

25 | A Fast Computer Intrusion Detection Algorithm based on Hypothesis Testing of Command Transition Probabilities
- DuMouchel, Schonlau
- 1998
Citation Context: ... outliers is at least as important as finding clustering structure. Outlier detection has already found applications including deviation detection in large databases [3], discovering network intrusions [4, 5], detecting cellular fraud [6], etc. There are many similar problems in other fields. For instance, in association rule mining, an outlier is an interesting rule and the outlier factor is the interest...

21 | Algorithms in Computational Geometry
- Edelsbrunner
- 1987
Citation Context: ... geometry to compute different layers of convex hulls and declare the objects in the outer layer as outliers. However, they suffer from the curse of dimensionality and cannot cope with high-dimensional data [11]. The remaining two categories are capable of dealing with multi-dimensional data and have mainly been developed in the database community recently. These techniques are closely related to the corresponding...

3 | How to detect and handle outliers, volume 16
- Iglewicz, Hoaglin
- 1993
Citation Context: ... and variance, and that is why we call the resulting score the trimmed z-score. The trimmed mean has been proven to be more efficient than the sample mean for estimating sample location in the presence of outliers [21]. In detail, for each point x_i’s k-th neighborhood, the set of k + 1 volume values (the subscript j is omitted) {v(x) : x ∈ N_k(x_i)} is first sorted in ascending order {v_1 ≤ v_2 ≤ ... ≤ v_{k+1}}. Let ...
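The context is truncated before it specifies the exact trimming rule, so the following is only a generic sketch of a trimmed z-score: the trim count m (how many of the smallest and largest sorted values are dropped before estimating location and scale) is a hypothetical parameter, not the paper's.

```python
import statistics

# Trimmed z-score sketch: estimate mean and standard deviation from the
# sorted sample with the m smallest and m largest values removed, then
# standardize the query value against those robust estimates.
def trimmed_z(value, sample, m=1):
    s = sorted(sample)[m:len(sample) - m]  # symmetric trim (assumed rule)
    mu = statistics.mean(s)
    sd = statistics.pstdev(s)
    return (value - mu) / sd if sd else 0.0

vols = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]
print(trimmed_z(5.0, vols) > 3)  # the extreme volume stands out → True
```

Because the extreme value 5.0 is trimmed away before the mean and standard deviation are estimated, it cannot mask itself, which is the efficiency argument the context attributes to [21].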