by Simon Hawkins, Hongxing He, Graham Williams, Rohan Baxter
In Proc. of the Fifth Int. Conf. on Data Warehousing and Knowledge Discovery (DaWaK02)
http://www.act.cmis.csiro.au/rohanb/PAPERS/dawak02.pdf
Abstract:
We consider the problem of finding outliers in large multivariate databases. Outlier detection can be applied during the data cleansing process of data mining to identify problems with the data itself, and to fraud detection where groups of outliers are often of particular interest. We use replicator neural networks (RNNs) to provide a measure of the outlyingness of data records. The performance of the RNNs is assessed using a ranked score measure. The effectiveness of the RNNs for outlier detection is demonstrated on two publicly available databases.
Citations
518 | A density-based algorithm for discovering clusters in large spatial databases with noise – Ester, Kriegel, et al. - 1996
433 | Efficient and Effective Clustering Methods for Spatial Data Mining – Ng, Han - 1994
296 | A learning algorithm for Boltzmann machines – Ackley, Hinton, et al. - 1985
144 | Algorithms for mining distance-based outliers in large datasets – Knorr, Ng - 1998
120 | Adaptive fraud detection – Fawcett, Provost - 1997
116 | LOF: Identifying Density-Based Local Outliers – Breunig, Kriegel, et al. - 2000
116 | Efficient algorithms for mining outliers from large data sets – Ramaswamy, Rastogi, et al. - 2000
115 | Birch: An Efficient Data Clustering Method for Very Large Databases – Zhang, Ramakrishnan, et al. - 1996
90 | Identification of outliers – Hawkins - 1980
74 | XGobi: Interactive dynamic graphics in the X window system with a link to S – Swayne, Cook, et al. - 1991
64 | Distance-based outliers: Algorithms and applications – Knorr, Ng, et al. - 2000
25 | On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms – Yamanishi, Takeuchi, et al. - 2004
19 | Replicator neural networks for universal optimal source coding – Hecht-Nielsen - 1995
17 | A fast computer intrusion detection algorithm based on hypothesis testing of command transition probabilities – DuMouchel, Schonlau - 1998
16 | Fast very robust methods for the detection of multiple outliers – Atkinson - 1994
9 | The integrated delivery of large-scale data mining: The ACSys data mining project – Williams, Altas, et al. - 1999
8 | An efficient approximation scheme for data mining tasks – Kollios, Gunopoulos, et al. - 2001
7 | Mining the knowledge mine: The hot spots methodology for mining large real world databases – Williams, Huang - 1997
5 | A unified approach for mining outliers – Knorr, Ng - 1997
3 | Equivalent error bars for neural network classifiers trained by Bayesian inference – Sykacek - 1997
2 | Detecting multivariate outliers by a grand tour – Bartkowiak, Szustalewicz - 1997
2 | A procedure for the detection of multivariate outliers – Kosinski - 1999