17 citations found.
E. M. Knorr and R. T. Ng. A unified notion of outliers: Properties and computation. In Knowledge Discovery and Data Mining, pages 219--222, 1997.


This paper is cited in the following contexts:
Detecting Graph-Based Spatial Outliers - Shekhar, Lu, Zhang (2002) (1 citation)

....The cost models for outlier query processing are analyzed in Section 4. Section 5 presents our experimental design. The experimental observations and results are shown in Section 6. We summarize our work in Section 7. 2. Related work and our contribution. Many outlier detection algorithms [1-3, 8, 9, 12, 14, 16] have been recently proposed. These methods can be broadly classified into two categories, namely set-based outlier detection methods and spatial-set-based outlier detection methods, as shown in Fig. 4. The set-based outlier detection algorithms [2, 7] consider the statistical distribution of ....

....and graph-based methods. The multi-dimensional metric-space-based methods model data sets as a collection of points in a multi-dimensional space, and provide tests based on concepts such as distance, density, and convex hull depth. Knorr and Ng presented the notion of distance-based outliers [8, 9]. For a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [13] proposed a formulation for distance-based outliers based on the distance of a point to its k-th nearest ....
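As a minimal illustration of the DB(p, D) definition quoted above, the following Python sketch tests every object directly against it. It assumes Euclidean distance and small in-memory data; the names `is_db_outlier` and `db_outliers` are hypothetical, and this is a literal O(N²) reading of the definition, not Knorr and Ng's own algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def is_db_outlier(o: Point, data: Sequence[Point], p: float, D: float) -> bool:
    """Literal reading of DB(p, D): o is an outlier if at least a fraction p
    of the objects in data lie at a (Euclidean) distance greater than D from o."""
    far = sum(1 for x in data if math.dist(o, x) > D)
    return far >= p * len(data)


def db_outliers(data: Sequence[Point], p: float, D: float) -> List[Point]:
    """Return every object of data that is a DB(p, D) outlier (O(N^2) scan)."""
    return [o for o in data if is_db_outlier(o, data, p, D)]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(db_outliers(points, p=0.75, D=3.0))  # [(10.0, 10.0)]
```

Here the isolated point (10, 10) has 4 of the 5 objects (80%) farther than D = 3, which meets p = 0.75; each corner point has only one such object.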

E. Knorr and R. Ng, A unified notion of outliers: Properties and computation, in: Proc. of the International Conference on Knowledge Discovery and Data Mining, 1997, pp. 219-222.


Detecting Graph-based Spatial Outliers - Shekhar, Lu, Zhang (2002) (1 citation)

....cost models for different outlier query processing are analyzed in Section 4. Section 5 presents our experimental design. The experimental observations and results are shown in Section 6. We summarize our work in Section 7. 2 Related Work and Our Contribution. Many outlier detection algorithms [1, 2, 3, 8, 9, 12, 14, 16] have been recently proposed. As shown in Figure 4, these methods can be broadly classified into two categories, namely set-based outlier detection methods and spatial-set-based outlier detection methods. The set-based outlier detection algorithms [2, 7] consider the statistical distribution of ....

....multi-dimensional metric-space-based methods model data sets as a collection of points in a multidimensional space, and provide tests based on concepts such as distance, density, and convex hull depth. We discuss different example tests now. Knorr and Ng presented the notion of distance-based outliers [8, 9]. For a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [13] proposed a formulation for distance-based outliers based on the distance of a point from its k-th nearest ....
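The Ramaswamy et al. formulation is cut off above. Assuming the usual reading, in which each point is scored by the distance D^k to its k-th nearest neighbor and the n highest-scoring points are reported as outliers, a brute-force Python sketch could look like this. The names `knn_distance` and `top_n_knn_outliers` are hypothetical; Euclidean distance is assumed and no index or pruning is used.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_distance(i: int, data: Sequence[Point], k: int) -> float:
    """Distance from data[i] to its k-th nearest neighbour (data[i] itself excluded)."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    dists = sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return dists[k - 1]


def top_n_knn_outliers(data: Sequence[Point], k: int, n: int) -> List[Point]:
    """Score every point by its k-th-nearest-neighbour distance; return the n largest."""
    scores = [(knn_distance(i, data, k), i) for i in range(len(data))]
    return [data[i] for _, i in heapq.nlargest(n, scores)]
```

In this sketch the user chooses k and n rather than a distance threshold D, so the result is a ranking instead of a yes/no test per object.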

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


A Unified Approach to Spatial Outliers Detection - Shekhar, Lu, Zhang (2003)

....and describe the future direction of our research in Section 7. This paper focuses on spatial outlier detection using a single attribute. Outlier detection in multi-dimensional space using multiple attributes is beyond the scope of this paper. 2 Related Work. Many outlier detection algorithms [1, 2, 3, 12, 13, 19, 21, 28] have been recently proposed. As shown in Figure 1(a), these methods can be broadly classified into two categories, namely one-dimensional (linear) outlier detection methods and multi-dimensional outlier detection methods. The one-dimensional outlier detection algorithms [2, 9] consider the ....

....depth. These methods do not distinguish between attribute dimensions and geospatial dimensions, and use all dimensions for defining the neighborhood as well as for comparison, as shown in Figure 1(b). We discuss representative methods now. Knorr and Ng presented the notion of distance-based outliers [12, 13]. As discussed in Example 2 of Section 1, for a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [20] proposed a formulation for distance-based outliers based on the ....

E. Knorr and R. Ng. A Unified Notion of Outliers: Properties and Computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


What's Spatial About Spatial Data Mining: Three Case.. - Shekhar, Huang, Wu, Lu.. (2001) (1 citation)

....definitions. In multidimensional-space-based outlier detection, the definition of spatial neighborhood is based on Euclidean distance, while in graph-based spatial outlier detection, the definition is based on graph connectivity. Many outlier detection algorithms [ABKS99, BL94, BKNS99, KN97, KN98, PS88, RR96, YSZ99] have been recently proposed, as shown in Figure 1.8. The set-based outlier detection algorithms [BL94, Joh92] consider the statistical distribution of attribute values, ignoring the spatial relationships among data objects. Numerous distribution-based outlier detection ....

....methods. The multi-dimensional-space-based methods model data sets as a collection of points in a multidimensional space, and provide tests based on concepts such as distance, density, and convex hull depth. Knorr and Ng presented the notion of distance-based outliers [KN97, KN98]. For a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [RRS] proposed a formulation for distance-based outliers based on the distance of a point from its k-th ....

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


What's Spatial About Spatial Data Mining: Three Case.. - Shekhar, Huang, Wu, Lu.. (2001) (1 citation)

....definitions. In multidimensional-space-based outlier detection, the definition of spatial neighborhood is based on Euclidean distance, while in graph-based spatial outlier detection, the definition is based on graph connectivity. Many outlier detection algorithms [ABKS99, BL94, BKNS99, KN97, KN98, PS88, RR96, YSZ99] have been recently proposed, as shown in Figure 8. The set-based outlier detection algorithms [BL94, Joh92] consider the statistical distribution of attribute values, ignoring the spatial relationships among data objects. Numerous distribution-based outlier detection ....

....of outlier detection methods. The multi-dimensional-space-based methods model data sets as a collection of points in a multidimensional space, and provide tests based on concepts such as distance, density, and convex hull depth. Knorr and Ng presented the notion of distance-based outliers [KN97, KN98]. For a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [RRS] proposed a formulation for distance-based outliers based on the distance of a point from its k-th ....

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


Detecting Graph-based Spatial Outliers: Algorithms and.. - Shekhar, Lu, Zhang (2001) (2 citations)

....cost models for different outlier query processing are analyzed in Section 4. Section 5 presents our experiment design. The experimental observations and results are shown in Section 6. We summarize our work in Section 7. 2 Related Work and Our Contribution. Many outlier detection algorithms [1, 2, 3, 8, 9, 13, 15, 17] have been recently proposed. As shown in Figure 4, these methods can be broadly classified into two categories, namely set-based outlier detection methods and spatial-set-based outlier detection methods. The set-based outlier detection algorithms [2, 7] consider the statistical distribution of ....

....multi-dimensional metric-space-based methods model data sets as a collection of points in a multidimensional space, and provide tests based on concepts such as distance, density, and convex hull depth. We discuss different example tests now. Knorr and Ng presented the notion of distance-based outliers [8, 9]. For a k-dimensional data set T with N objects, an object O in T is a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Ramaswamy et al. [14] proposed a formulation for distance-based outliers based on the distance of a point from its k-th nearest ....

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


Automated Identification of Errors in Data Sets - Maletic, Marcus (2000)

....establishment of some hypothesis on data, as shall be seen further in the paper, that makes the definition and identification of possible errors easier, without using domain knowledge. 3. Error Types and Methods Utilized. The implemented data cleansing tool focuses primarily on outlier detection [Knorr97, Barnett94]. As mentioned before, most of the existing tools and research are concentrated on the merge/purge problem, where outlier detection is not a concern. Moreover, all of these methods and most of the existing data mining methods consider outliers to be of no interest and to be ....

....problem, where outlier detection is not a concern. Moreover, all of these methods and most of the existing data mining methods consider outliers to be of no interest and to be removed. Almost all studies that consider outlier identification as their primary objective are in statistics [Knorr97]. The only other way in which outlier detection is currently used in data cleansing is purely by using data visualization tools to actually see the outliers [Simoudis95]. That process is entirely manual; that is, no automation in the outlier identification is done. The stated goal is to automate as ....

Knorr, E. M. and Ng, R. T.: "A Unified Notion of Outliers: Properties and Computation", Proceedings of KDD 97, pp. 219-222


Utilizing Association Rules for the Identification of Errors.. - Marcus, Maletic (2000)

....to existing patterns in the data. Combined techniques (partitioning and classification) are used to identify patterns that apply to most records. 3. Identify outlier records using clustering based on Euclidean distance. Existing clustering algorithms provide little support for identifying outliers [Knorr97, Murtagh83, Zhang97]. A combined clustering method is utilized along with the group average clustering algorithm [Yang99], considering the Euclidean distance between records. The results showed that statistical outlier detection methods could be successfully used to identify errors in a data set. The methods also ....

Knorr, E. M.; Ng, R. T.: "A Unified Notion of Outliers: Properties and Computation", Proceedings of KDD 97, pp. 219-222


Utilizing Association Rules for Identification of Possible.. - Marcus, Maletic (2000)

....to existing patterns in the data. Combined techniques (partitioning and classification) are used to identify patterns that apply to most records. 3. Identify outlier records using clustering based on Euclidean distance. Existing clustering algorithms provide little support for identifying outliers [Knorr97, Murtagh83, Zhang97]. A combined clustering method is utilized along with the group average clustering algorithm [Yang99], considering the Euclidean distance between records. The results showed that statistical outlier detection methods could be successfully used to identify errors in a data set. The implemented ....

Knorr, E. M.; Ng, R. T.: "A Unified Notion of Outliers: Properties and Computation", Proceedings of KDD 97, pp. 219-222


Data Cleansing: Beyond Integrity Analysis - Maletic, Marcus (2000) (8 citations)

....deviation, range, etc., based on Chebyshev's theorem [3, 4], considering the confidence intervals for each field [19]. Clustering: Identify outlier records using clustering based on Euclidean (or other) distance. Existing clustering algorithms provide little support for identifying outliers [22, 27, 42]. However, in some cases clustering the entire record space can reveal outliers that are not identified by field-level inspection [19]. The main drawback of this method is computational time. The clustering algorithms have high computational complexity. For large record spaces and large number ....
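The field-level check is only partially quoted. One common reading, sketched below under that assumption, flags values lying more than k standard deviations from the field mean; by Chebyshev's theorem, for any distribution at most 1/k² of the values can lie that far out, so the flagged set stays small without assuming normality. The function name `chebyshev_outliers` and the use of the population standard deviation are illustrative assumptions, not the authors' tool.

```python
import statistics
from typing import List, Sequence


def chebyshev_outliers(values: Sequence[float], k: float = 3.0) -> List[float]:
    """Flag values more than k population standard deviations from the mean.

    By Chebyshev's inequality, at most 1/k**2 of any distribution lies that far
    from its mean, whatever the shape of the distribution.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1 for the Chebyshev bound to be informative")
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values, mu)
    return [v for v in values if abs(v - mu) > k * sigma]
```

For a field such as `[10.0] * 19 + [100.0]` with k = 3, only 100.0 is flagged. Because an extreme value also inflates σ, very small fields can mask outliers under this test.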

Knorr, E. M. and Ng, R. T., "A Unified Notion of Outliers: Properties and Computation," in Proceedings of KDD 97, 1997, pp. 219-222.


Distance-Based Outliers: Algorithms and Applications - Knorr, Ng, Tucakov (2000) (11 citations)

....$D_0$ such that object $O$ is an outlier according to $\mathit{Def}$ iff $O$ is a $\mathrm{DB}(p_0, D_0)$ outlier. For a normal distribution, outliers can be considered to be points that lie 3 or more standard deviations (i.e., $3\sigma$) from the mean [13]. (Footnote 1: DB outliers are called unified outliers in our preliminary work [22].) Definition 2. Let $T$ be a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$. Define $\mathit{Def}_{\mathit{Normal}}$ as follows: $t \in T$ is an outlier iff $\frac{t - \mu}{\sigma} \geq 3$ or $\frac{t - \mu}{\sigma} \leq -3$. Lemma 1. $\mathrm{DB}(p, D)$ unifies $\mathit{Def}_{\mathit{Normal}}$ with p ....

....$t \in T$ is an outlier iff $\frac{t - \mu}{\sigma} \geq 3$ or $\frac{t - \mu}{\sigma} \leq -3$. Lemma 1. $\mathrm{DB}(p, D)$ unifies $\mathit{Def}_{\mathit{Normal}}$ with $p_0 = 0.9988$ and $D_0 = 0.13\sigma$, that is, $t$ is an outlier according to $\mathit{Def}_{\mathit{Normal}}$ iff $t$ is a $\mathrm{DB}(0.9988, 0.13\sigma)$ outlier. Proofs of all the lemmas in this section have already been documented [22]. Note that if the value $3\sigma$ in $\mathit{Def}_{\mathit{Normal}}$ is changed to some other value, such as $4\sigma$, the above lemma can easily be modified with the corresponding $p_0$ and $D_0$ to show that $\mathrm{DB}(p, D)$ still unifies the new definition of $\mathit{Def}_{\mathit{Normal}}$. The same general approach applies to a Student t distribution, ....
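Lemma 1 can be checked numerically. The Python sketch below works in standardized units (μ = 0, σ = 1, so D_0 = 0.13) and treats T as the idealized normal population rather than a finite sample. It computes the probability mass lying farther than D_0 from a point t and compares it with p_0 = 0.9988. The helper names are hypothetical.

```python
import math

P0 = 0.9988  # p_0 from Lemma 1
D0 = 0.13    # D_0 = 0.13 sigma, written in units of sigma (mu = 0, sigma = 1)


def normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_farther_than(t: float, d: float) -> float:
    """Probability mass of N(0, 1) lying at distance greater than d from t."""
    return 1.0 - (normal_cdf(t + d) - normal_cdf(t - d))


def is_db_outlier_normal(t: float, p: float = P0, d: float = D0) -> bool:
    """True if t is a DB(p, d) outlier with respect to the N(0, 1) population."""
    return fraction_farther_than(t, d) >= p


if __name__ == "__main__":
    for t in (0.0, 2.9, 3.0, 3.5):
        print(t, round(fraction_farther_than(t, D0), 5), is_db_outlier_normal(t))
```

For t = 3 the far fraction is about 0.99882, just above p_0, while for t = 2.9 it is about 0.99842, below it. Because p_0 and D_0 are rounded, the correspondence is exact only up to that rounding for points very close to 3σ.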

[Article contains additional citation context not shown here]

Knorr EM, Ng RT (1997) A unified notion of outliers: properties and computation. In: Proc KDD, pp 219--222. An extended version of this paper appears as: Knorr EM, Ng RT (1997) A unified approach for mining outliers. In: Proc CASCON, pp 236--248


FindOut: Finding Outliers in Very Large Datasets - Yu, Sheikholeslami, Zhang (1999) (2 citations)

....can yield the degree to which a data element causes the dissimilarity of the data set to increase. It looks for the subset of data that leads to the greatest reduction in Kolmogorov complexity for the amount of data discarded [2]. Knorr and Ng presented algorithms to detect distance-based outliers [12, 13]. They consider an object O in a dataset T a DB(p, D) outlier if at least a fraction p of the objects in T lies at a distance greater than D from O. Their index-based algorithm executes a range search with radius D for each object. If the number of objects in its D-neighborhood exceeds a threshold, the ....
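The quoted passage breaks off before naming the threshold. Under the DB(p, D) definition itself, an object O is an outlier exactly when at most M = N(1 - p) objects (O included) lie within distance D of it, which suggests the counting sketch below. The loop stops as soon as the count exceeds M. A plain linear scan stands in for the spatial index used by the index-based algorithm; the function name and the scan are assumptions, not Knorr and Ng's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def db_outliers_by_count(data: Sequence[Point], p: float, D: float) -> List[Point]:
    """DB(p, D) outliers via neighbourhood counting with early termination.

    From the definition, o is an outlier iff at most M = N * (1 - p) objects
    (o itself included) lie within distance D of o.
    """
    m = len(data) * (1.0 - p)
    outliers = []
    for o in data:
        count = 0
        # Linear scan standing in for a radius-D range query on a spatial index.
        for x in data:
            if math.dist(o, x) <= D:
                count += 1
                if count > m:  # too many close neighbours: o cannot be an outlier
                    break
        if count <= m:
            outliers.append(o)
    return outliers
```

The early exit is what makes counting cheaper than computing every distance: most non-outliers in a dense region are rejected after only a few neighbours have been found.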

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


Parallel Algorithm for Mining Outliers in Large Database - Hung, Cheung (1999)

.... discovery tasks can be classified into four general categories: (a) dependency detection (e.g., association rules [1]), (b) class identification (e.g., classification, data clustering [4, 12, 15]), (c) class description (e.g., concept generalization [5, 9]), and (d) exception/outlier detection [10, 11]. Most research has concentrated on the first three categories, while most of the existing work on outlier detection has been in the field of statistics [2, 6]. Although outliers have also been considered in some existing algorithms, they are not the main target and the algorithms only try to ....

.... only try to remove or tolerate them [4, 12, 15]. In fact, the identification of outliers can be applied in the areas of electronic commerce, credit card fraud detection, the analysis of performance statistics of professional athletes [8], and even the exploration of satellite or medical images [10]. For example, in a database of transactions containing sales information, most transactions would involve a small amount of money and items. Thus a typical fault detection can discover exceptions in the amount of money spent, type of items purchased, time and location. As a ....

[Article contains additional citation context not shown here]

Knorr, E. M., Ng, R. T. "A unified notion of outliers: Properties and computation," Proc. KDD, pages 219-222, 1997. An extended version of this paper appears as: Knorr, E. M., Ng, R. T. "A Unified Approach for Mining Outliers," Proc. 7th CASCON, pages 236-248, 1997.


Distinguishing Mislabeled Data from Correctly Labeled Data .. - Ilya Muchnik Sundara

No context found.

E. M. Knorr and R. T. Ng. A unified notion of outliers: Properties and computation. In Knowledge Discovery and Data Mining, pages 219--222, 1997.


LOCI: Fast Outlier Detection Using the Local.. - Papadimitriou.. (2002) (1 citation)

No context found.

E.M. Knorr and R.T. Ng. A unified notion of outliers: Properties and computation. In Proc. KDD 1997.


Parallel Mining of Outliers in Large Database - Edward Hung David

No context found.

Knorr, E. M., Ng, R. T. "A unified notion of outliers: Properties and computation," Proc. KDD, pages 219-222, 1997.


Detecting Graph-based Spatial Outliers - Shashi Shekhar Chang-Tien (2002) (1 citation)

No context found.

E. Knorr and R. Ng. A unified notion of outliers: Properties and computation. In Proc. of the International Conference on Knowledge Discovery and Data Mining, pages 219--222, 1997.


CiteSeer.IST - Copyright Penn State and NEC