  Parallel Mining of Outliers in Large Database

by Edward Hung, David W. Cheung
http://www.csis.hku.hk/~ehung/paper/pnl2.ps

Abstract:

Data mining is a new, important, and fast-growing database application. Outlier (exception) detection is one kind of data mining, with applications such as monitoring credit card fraud and criminal activity in electronic commerce. As databases grow in both size and number of attributes (dimensions), previously proposed detection methods for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm [13] is linear in the dimensionality but quadratic in the dataset size, which makes its cost unacceptable for large datasets. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the performance improvement of PENL is linear in the number of processors, as shown by a performance comparison between ENL and PENL under the Bulk Synchronous Parallel (BSP) model. The improvement is further verified by experiments on an IBM 9076 SP2 parallel computer system. The results show that mining outliers on a low-cost cluster of workstations interconnected by a commodity communication network is a very good choice.
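
To make the cost argument in the abstract concrete, here is a minimal Python sketch of a naive nested-loop search for distance-based outliers. It is not the paper's ENL or PENL algorithm. It assumes the DB(p, D) definition of Knorr and Ng (a point is an outlier when at least a fraction p of the dataset lies farther than distance D from it) and reads that definition as counting neighbors other than the point itself. The function name `nested_loop_outliers` and the sample data are hypothetical. The two nested loops over N points, each computing a k-dimensional distance, give the cost that is linear in the dimensionality and quadratic in the dataset size.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nested_loop_outliers(points, p, d):
    """Return indices of DB(p, d)-outliers using a naive nested loop.

    Assumption: a point is an outlier when at most n * (1 - p) of the
    other points lie within distance d of it.
    """
    n = len(points)
    max_neighbors = n * (1.0 - p)
    outliers = []
    for i, a in enumerate(points):          # outer loop: N candidates
        neighbors = 0
        is_outlier = True
        for j, b in enumerate(points):      # inner loop: N comparisons
            if i == j:
                continue
            if dist(a, b) <= d:             # O(k) work per distance
                neighbors += 1
                if neighbors > max_neighbors:
                    # Too many close neighbors: stop scanning early.
                    is_outlier = False
                    break
        if is_outlier:
            outliers.append(i)
    return outliers


# Hypothetical data: a tight cluster of four points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = nested_loop_outliers(points, p=0.9, d=1.0)
print(result)  # [4]
```

The early exit in the inner loop only helps for points that are not outliers. The worst case remains quadratic in the number of points, which is the cost the abstract describes as unacceptable for large datasets.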

Citations

1476 Mining Association Rules Between Sets of Items in Large Databases – Agrawal, Imielinski - 1993
532 A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise – Ester, Kriegel, et al. - 1996
445 Efficient and Effective Clustering Methods for Spatial Data – Ng, Han - 1994
320 A quantitative analysis and performance study for Similarity Search Methods in High Dimensional Spaces – Weber, Schek, et al. - 1998
148 Algorithms for mining distance-based outliers in large datasets – Knorr, Ng
122 LOF: Identifying density-based local outliers – Breunig, Kriegel, et al. - 2000
95 Identification of Outliers – Hawkins - 1980
34 Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining – Knorr, Ng - 1996
25 A unified notion of outliers: Properties and computation – Knorr, Ng - 1997
14 A User's Guide to MPI – Pacheco - 1998
4 On digital money and card technologies – Knorr - 1997
1 Parallel Mining of Outliers in Large Database 21 [3 – Bisseling, McColl - 1993
1 Parallel Algorithm for Mining Outliers in Large Database – Hung, Cheung