Feature Article: Mining Local Data Sources For Learning Global Cluster Models Via Local Model Exchange

by Xiao-Feng Zhang, Chak-Man Lam, William K. Cheung
http://www.comp.hkbu.edu.hk/~iib/2004/Dec/iib_vol4no2_article2.pdf

Abstract:

Distributed data mining has recently attracted considerable attention, as there are many cases where pooling distributed data for mining is prohibited, due to either huge data volume or data privacy. In this paper, we address the issue of learning a global cluster model, known as the latent class model, by mining distributed data sources. Most existing model learning algorithms (e.g., EM) require access to all the available training data. Instead, we study a methodology based on periodic model exchange and merging, and apply it to Web structure modeling. In addition, we have tested a number of variations of the basic idea, including confining the exchange to some privacy-friendly parameters and varying the number of distributed sources. Experimental results show that the proposed distributed learning scheme is effective, with accuracy close to that obtained when all the data are physically shared for learning. Our results also show empirically that sharing fewer model parameters, as a further mechanism for privacy control, does not result in significant performance degradation for our application.

Index Terms: Distributed data mining, model-based learning, latent class model, privacy preservation
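The idea of periodic model exchange and merging can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simple Bernoulli mixture as a stand-in for the latent class model, and it assumes a merge rule (averaging weighted by site size and class prior) that the abstract does not specify. All function names (`e_step`, `m_step`, `merge`, `distributed_em`) and parameters (`rounds`, `local_steps`, `eps`) are hypothetical. Each site runs a few EM steps on its own data, and only model parameters leave the site, never the raw records.

```python
import random

def e_step(data, prior, theta):
    """Responsibilities resp[n][k] under a Bernoulli mixture (assumed model)."""
    resp = []
    for x in data:
        weights = []
        for k in range(len(prior)):
            p = prior[k]
            for j, xj in enumerate(x):
                p *= theta[k][j] if xj else 1.0 - theta[k][j]
            weights.append(p)
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp

def m_step(data, resp, k_count, eps=1e-3):
    """Re-estimate parameters; eps smoothing keeps them strictly inside (0, 1)."""
    n, d = len(data), len(data[0])
    prior, theta = [], []
    for k in range(k_count):
        nk = sum(r[k] for r in resp)
        prior.append((nk + eps) / (n + k_count * eps))
        theta.append([
            (sum(r[k] * x[j] for r, x in zip(resp, data)) + eps) / (nk + 2 * eps)
            for j in range(d)
        ])
    return prior, theta

def merge(models, sizes):
    """Assumed merge rule: average local models, weighted by site size and class prior."""
    total = sum(sizes)
    k_count, d = len(models[0][0]), len(models[0][1][0])
    prior = [sum(n * m[0][k] for m, n in zip(models, sizes)) / total
             for k in range(k_count)]
    theta = [[sum(n * m[0][k] * m[1][k][j] for m, n in zip(models, sizes))
              / (total * prior[k]) for j in range(d)]
             for k in range(k_count)]
    return prior, theta

def distributed_em(sites, k_count, rounds=20, local_steps=2, seed=0):
    """Alternate local EM on each site with a periodic exchange-and-merge step."""
    rng = random.Random(seed)
    d = len(sites[0][0])
    model = ([1.0 / k_count] * k_count,
             [[rng.uniform(0.3, 0.7) for _ in range(d)] for _ in range(k_count)])
    sizes = [len(site) for site in sites]
    for _ in range(rounds):
        local_models = []
        for data in sites:
            prior, theta = model  # every site restarts from the shared model
            for _ in range(local_steps):
                resp = e_step(data, prior, theta)
                prior, theta = m_step(data, resp, k_count)
            local_models.append((prior, theta))
        model = merge(local_models, sizes)  # only parameters are exchanged
    return model
```

Calling `distributed_em(sites, k_count=2)` on a list of per-site binary feature matrices returns a merged `(prior, theta)` pair. Because every site restarts each round from the same merged model, the cluster indices stay aligned across sites, which is what makes a simple parameter average meaningful in this sketch.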
