Privacy-Preserving K-Means Clustering over Vertically Partitioned Data
In SIGKDD, 2003
Cited by 167 (10 self)
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
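The abstract does not spell out the protocol. The sketch below, written under our own assumptions, shows only the observation that makes vertical partitioning workable: squared Euclidean distance is a sum over attributes, so each site can compute its share of every entity-to-centroid distance locally, and once the cluster labels are known each site can update its own slice of every centroid. All function names are hypothetical, and the cross-site sum and argmin are done in the clear here, with none of the privacy protection the paper provides.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def partial_sq_distances(rows: Matrix, centroids: Matrix) -> Matrix:
    # Squared Euclidean distance restricted to the attributes this site holds.
    return [[sum((x - c) ** 2 for x, c in zip(row, cent)) for cent in centroids]
            for row in rows]

def assign(sites: List[Tuple[Matrix, Matrix]]) -> List[int]:
    # Full squared distance = sum of the per-site partial distances.
    # NOTE: computed in the clear; the paper replaces this step with a secure protocol.
    totals = None
    for rows, cents in sites:
        part = partial_sq_distances(rows, cents)
        totals = part if totals is None else [
            [a + b for a, b in zip(t, p)] for t, p in zip(totals, part)]
    return [min(range(len(d)), key=d.__getitem__) for d in totals]

def update(rows: Matrix, cents: Matrix, labels: List[int]) -> Matrix:
    # Each site recomputes only its own attribute slice of every centroid.
    new = []
    for j, old in enumerate(cents):
        members = [r for r, lab in zip(rows, labels) if lab == j]
        if not members:
            new.append(old)  # keep an empty cluster's centroid unchanged
        else:
            new.append([sum(col) / len(members) for col in zip(*members)])
    return new

def vertical_kmeans(site_rows: List[Matrix], init_idx: List[int], iters: int = 10):
    # Hypothetical initialisation: the chosen entities' values become the first centroids.
    cents = [[list(rows[i]) for i in init_idx] for rows in site_rows]
    labels: List[int] = []
    for _ in range(iters):
        labels = assign(list(zip(site_rows, cents)))
        cents = [update(rows, c, labels) for rows, c in zip(site_rows, cents)]
    return labels, cents
```

For example, with site A holding attribute x and site B holding attribute y for four entities, `vertical_kmeans([[[0.0], [0.0], [10.0], [10.0]], [[0.0], [1.0], [10.0], [11.0]]], init_idx=[0, 2])` returns the labels `[0, 0, 1, 1]`; site B's centroid slice is `[[0.5], [10.5]]`, and site A never sees the y values.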
Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 94 (6 self)
This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy-sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data-mining problems such as clustering, principal component analysis, and classification. This paper makes primary contributions on two different grounds. First, it explores Independent Component Analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
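A minimal sketch of why a random projection can preserve inner products, assuming a k × m matrix R with i.i.d. N(0, 1) entries and the scaling u = Rx / √k. Because E[RᵀR] = kI, the released vectors satisfy E[u · v] = x · y. We assume the data owners agree on R through a shared seed; the function names, seed, and dimensions are illustrative and not taken from the paper.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def shared_random_matrix(k: int, m: int, seed: int) -> Matrix:
    # Hypothetical: the data owners agree on R (k x m, entries i.i.d. N(0, 1))
    # by sharing a seed; we assume R is kept from the party doing the mining.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]

def project(R: Matrix, x: Vector) -> Vector:
    # u = (1 / sqrt(k)) * R x; with k < m the linear map has a nontrivial
    # null space, so x cannot be recovered uniquely from u.
    scale = 1.0 / math.sqrt(len(R))
    return [scale * sum(r * xj for r, xj in zip(row, x)) for row in R]

def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))

if __name__ == "__main__":
    m, k = 1000, 400
    R = shared_random_matrix(k, m, seed=7)
    x = [1.0] * m                    # held by party A
    y = [1.0] * 500 + [0.0] * 500    # held by party B; true dot(x, y) is 500
    u, v = project(R, x), project(R, y)  # only u and v are released
    print(dot(u, v))                 # a noisy estimate of 500
```

The estimate is unbiased, and for Gaussian R its variance is ((x · y)² + ‖x‖²‖y‖²)/k, so the accuracy of the estimate and the reduction in dimension (which is what hides x) pull against each other through the choice of k.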
Unsupervised Multiway Data Analysis: A Literature Survey
IEEE Transactions on Knowledge and Data Engineering, 2008
Cited by 82 (10 self)
Multiway data analysis captures multilinear structures in higher-order datasets, where data have more than two modes. Standard two-way methods commonly applied on matrices often fail to find the underlying structures in multiway arrays. With an increasing number of application areas, multiway data analysis has become popular as an exploratory analysis tool. We provide a review of significant contributions in the literature on multiway models and algorithms, as well as their applications in diverse disciplines including chemometrics, neuroscience, computer vision, and social network analysis.
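The abstract names no specific model. As one standard example of a multilinear model (our choice for illustration), the CANDECOMP/PARAFAC (CP) model writes a three-way array as a sum of R rank-one terms, x_ijk = Σ_r a_ir b_jr c_kr. The sketch builds such an array from hypothetical factor matrices A (I × R), B (J × R), and C (K × R), and also forms the mode-1 unfolding, the I × (J·K) matrix that a standard two-way method would operate on.

```python
from typing import List

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]

def cp_reconstruct(A: Matrix, B: Matrix, C: Matrix) -> Tensor3:
    # x[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]
    # The model stores (I + J + K) * R numbers instead of I * J * K.
    R = len(A[0])
    if len(B[0]) != R or len(C[0]) != R:
        raise ValueError("factor matrices must share the same number of components R")
    return [[[sum(a[r] * b[r] * c[r] for r in range(R)) for c in C]
             for b in B]
            for a in A]

def unfold_mode1(X: Tensor3) -> Matrix:
    # Flatten each I-slab into one row: an I x (J*K) matrix. A two-way method
    # applied to this matrix no longer "sees" the separate j and k modes.
    return [[x for row in slab for x in row] for slab in X]

if __name__ == "__main__":
    A = [[1, 0], [0, 1]]   # I = 2, R = 2
    B = [[1, 2], [3, 4]]   # J = 2
    C = [[1, 1]]           # K = 1
    X = cp_reconstruct(A, B, C)
    print(X)                # [[[1], [3]], [[2], [4]]]
    print(unfold_mode1(X))  # [[1, 3], [2, 4]]
```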
Distributed Data Mining: Algorithms, Systems, and Applications
2002
Cited by 70 (5 self)
This paper presents a brief overview of distributed data mining (DDM) algorithms, systems, applications, and emerging research directions. The paper is organized as follows. We first present related research on DDM and illustrate data distribution scenarios. Then DDM algorithms are reviewed. Subsequently, the architectural issues in DDM systems and future directions are discussed.
Distributed Clustering Based on Sampling Local Density Estimates
2003
Cited by 34 (2 self)
Huge amounts of data are stored in autonomous, geographically distributed sources. The discovery of previously unknown, implicit and valuable knowledge is a key aspect of the exploitation of such sources. In recent years several approaches to knowledge discovery and data mining, and in particular to clustering, have been developed, but only a few of them are designed for distributed data sources. We propose a novel distributed clustering algorithm based on nonparametric kernel density estimation, which takes into account the issues of privacy and communication costs that arise in a distributed environment.
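A minimal sketch of clustering from sampled local density estimates, under assumptions of our own: all sites agree on a one-dimensional sampling grid and a Gaussian kernel bandwidth h, and each sends only its kernel sums at the grid points, not its raw data. Because the global kernel density estimate is f(x) = (1/(N·h)) Σ K((x − x_i)/h) over all N points, it equals the sum of the local contributions divided by N. Each site then labels its own points by hill-climbing on the sampled global density to a mode. The grid, the bandwidth, the hill-climbing rule, and all function names are hypothetical, not the paper's exact algorithm.

```python
import math
from typing import List

def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def make_grid(lo: float, step: float, n: int) -> List[float]:
    # Hypothetical sampling grid shared by all sites.
    return [lo + i * step for i in range(n)]

def local_kernel_sums(points: List[float], grid: List[float], h: float) -> List[float]:
    # Each site samples sum_i K((g - x_i) / h) / h at the grid points;
    # only these samples leave the site.
    return [sum(gaussian_kernel((g - p) / h) for p in points) / h for g in grid]

def global_density(site_samples: List[List[float]], n_total: int) -> List[float]:
    # Global KDE at each grid point = (sum of local kernel sums) / total count.
    return [sum(vals) / n_total for vals in zip(*site_samples)]

def hill_climb(density: List[float], idx: int) -> int:
    # Step to the higher neighbour until a local maximum (density mode) is reached.
    while True:
        best = idx
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(density) and density[j] > density[best]:
                best = j
        if best == idx:
            return idx
        idx = best

def assign_local(points: List[float], density: List[float], lo: float, step: float) -> List[int]:
    # Label each local point by the grid index of the mode its nearest grid point climbs to.
    last = len(density) - 1
    return [hill_climb(density, min(max(round((p - lo) / step), 0), last)) for p in points]
```

For example, with a grid from −2.0 to 7.0 in steps of 0.1, h = 0.5, site 1 holding `[0.0, 0.2, 5.0]`, and site 2 holding `[0.1, 4.9, 5.1]`, the points near 0 on both sites receive one label and the points near 5 receive another, although neither site has seen the other's values.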
Privacy Preserving Clustering
In Proceedings of the 10th European Symposium on Research in Computer Security, 2005
Cited by 33 (0 self)
The freedom and transparency of information flow on the Internet has heightened concerns about privacy. Given a set of data items, clustering algorithms group similar items together. Clustering has many applications, such as customer-behavior analysis, targeted marketing, forensics, and bioinformatics. In this paper, we present the design and analysis of a privacy-preserving k-means clustering algorithm, where only the cluster means at the various steps of the algorithm are revealed to the participating parties. The crucial step in our privacy-preserving k-means is privacy-preserving computation of cluster means. We present two protocols (one based on oblivious polynomial evaluation and the second based on homomorphic encryption) for privacy-preserving computation of cluster means. We have a Java implementation of our algorithm. Using our implementation, we have performed a thorough evaluation of our privacy-preserving clustering algorithm on three data sets. Our evaluation demonstrates that privacy-preserving clustering is feasible; for example, our homomorphic-encryption-based algorithm finished clustering a large data set in approximately 66 seconds.
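The abstract does not give the protocol itself. As background for the homomorphic-encryption variant, the sketch below shows the property such protocols build on, using a toy Paillier cryptosystem (our choice of scheme for illustration; the abstract does not name one): multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts, so encrypted partial sums for a cluster can be combined without decrypting them. The tiny primes make this completely insecure, plaintexts must be integers in [0, n), and the question of who holds the key and how the final division into a mean is protected, which is what the paper's protocol addresses, is not handled here. All function names are hypothetical.

```python
import math
import random
from typing import Tuple

def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    # Toy Paillier keys with g = n + 1 (insecure: p and q are tiny).
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^(-1) mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    # c = (n + 1)^m * r^n mod n^2, for a random r coprime to n.
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv: Tuple[int, int], c: int) -> int:
    lam, mu = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n   # L(x) = (x - 1) / n
    return (L * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    # Homomorphic addition: Enc(a) * Enc(b) mod n^2 decrypts to a + b (mod n).
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    n, priv = keygen(293, 433)
    rng = random.Random(1)
    # Hypothetical partial sums of one cluster attribute held by two parties.
    c_sum = add_encrypted(n, encrypt(n, 120, rng), encrypt(n, 80, rng))
    c_cnt = add_encrypted(n, encrypt(n, 4, rng), encrypt(n, 6, rng))
    print(decrypt(n, priv, c_sum) / decrypt(n, priv, c_cnt))  # 200 / 10 = 20.0
```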
Unsupervised Distributed Clustering
2004
Cited by 27 (13 self)
Clustering can be defined as the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups, called clusters. The growing need for distributed clustering algorithms is attributed to the huge size of databases that is common nowadays. In this paper we propose a modification of a recently proposed algorithm, namely k-windows, that is able to achieve high-quality results in distributed computing environments.
Collective Mining of Bayesian Networks from Distributed Heterogeneous Data
2002
Cited by 25 (7 self)
We present a collective approach to learning a Bayesian network from distributed heterogeneous data. In this approach, we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. Another Bayesian network is learnt at the central site using the data transmitted from the local sites. The local and central Bayesian networks are combined to obtain a collective Bayesian network that models the entire data. Experimental results and theoretical justification that demonstrate the feasibility of our approach are presented.
Energy Consumption in Data Analysis for Onboard and Distributed Applications
Proceedings of the ICML'03 Workshop on Machine Learning Technologies for Autonomous Space Applications, 2003
Cited by 15 (1 self)
Energy consumption is an important issue in the growing number of data mining and machine learning applications for battery-powered embedded and mobile devices. It plays a critical role in determining the capabilities of a broad range of applications, such as space probes with onboard scientific missions, PDA-based monitoring of remote data streams, and event detection in sensor networks composed of battery-powered data sensors and lightweight data processing nodes.
Multi-Database Mining
2003
Cited by 15 (2 self)
Multi-database mining is an important research area because (1) there is an urgent need for analyzing data in different sources, (2) there are essential differences between mono- and multi-database mining, and (3) there are limitations in existing multi-database mining efforts. This paper designs a new multi-database mining process. Some research issues involving mining multi-databases, including database clustering and local pattern analysis, are discussed.