Results 1 - 10 of 136
State-of-the-art in Privacy Preserving Data Mining
- SIGMOD Record, 2004
"... Abstract We provide here an overview of the new and rapidly emerging research area of privacy preserving data mining. We also propose a classification hierarchy that sets the basis for analyzing the work which has been performed in this context. A detailed review of the work accomplished in this ar ..."
Cited by 159 (6 self)
Abstract:
We provide here an overview of the new and rapidly emerging research area of privacy preserving data mining. We also propose a classification hierarchy that sets the basis for analyzing the work which has been performed in this context. A detailed review of the work accomplished in this area is also given, along with the placement of each work within the classification hierarchy. A brief evaluation is performed, and some initial conclusions are drawn.
Random projection-based multiplicative data perturbation for privacy preserving distributed data mining
- IEEE Transactions on Knowledge and Data Engineering, 2006
"... This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matri ..."
Cited by 94 (6 self)
Abstract:
This paper explores the possibility of using multiplicative random projection matrices for privacy preserving distributed data mining. It specifically considers the problem of computing statistical aggregates like the inner product matrix, correlation coefficient matrix, and Euclidean distance matrix from distributed privacy-sensitive data possibly owned by multiple parties. This class of problems is directly related to many other data-mining problems such as clustering, principal component analysis, and classification. This paper makes two primary contributions. First, it explores Independent Component Analysis as a possible tool for breaching privacy in deterministic multiplicative perturbation-based models such as random orthogonal transformation and random rotation. Then, it proposes an approximate random projection-based technique to improve the level of privacy protection while still preserving certain statistical characteristics of the data. The paper presents extensive theoretical analysis and experimental results. Experiments demonstrate that the proposed technique is effective and can be successfully used for different types of privacy-preserving data mining applications.
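The multiplicative perturbation idea the abstract describes can be illustrated with a plain Johnson–Lindenstrauss-style random projection. This is a sketch of the general technique, not the paper's exact construction; the dimensions, the Gaussian projection matrix, and the data are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 20, 500, 250           # k < d: the released data has lower dimension
X = rng.standard_normal((n, d))  # privacy-sensitive records (rows)

# Random projection matrix R with i.i.d. N(0, 1/k) entries, so that
# E[R^T R] = I_d and statistical aggregates are preserved in expectation.
R = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ R.T                      # perturbed records, shape (n, k), released instead of X

# Pairwise squared Euclidean distances are approximately preserved.
def sq_dists(M):
    g = (M * M).sum(axis=1)
    return g[:, None] + g[None, :] - 2 * M @ M.T

D_true, D_est = sq_dists(X), sq_dists(Y)
iu = np.triu_indices(n, 1)                         # upper-triangle pair indices
rel = np.abs(D_est[iu] - D_true[iu]) / D_true[iu]  # relative error per pair
max_rel = rel.max()
```

Larger k gives better aggregate accuracy, while smaller k hides more about the original attributes, which is exactly the privacy/utility trade-off the paper analyzes.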
Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification
- In Proceedings of the 4th SIAM International Conference on Data Mining, 2004
"... analysis technique that has found applications in various areas. In this paper, we study some multivariate statistical analysis methods in Secure 2-party Computation (S2C) framework illustrated by the following scenario: two parties, each having a secret data set, want to conduct the statistical ana ..."
Cited by 89 (1 self)
Abstract:
... analysis technique that has found applications in various areas. In this paper, we study some multivariate statistical analysis methods in the Secure 2-party Computation (S2C) framework, illustrated by the following scenario: two parties, each holding a secret data set, want to conduct statistical analysis on their joint data, but neither is willing to disclose its private data to the other party or to any third party. Current statistical analysis techniques cannot be used directly to support this kind of computation, because they require all parties to send the necessary data to a central place. In this paper, we define two secure 2-party multivariate statistical analysis problems: the Secure 2-party Multivariate Linear Regression problem and the Secure 2-party Multivariate Classification problem. We have developed a practical security model and, based on it, a number of building blocks for solving these two problems.
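A standard building block in such 2-party computations (not necessarily the one this paper constructs) is additive secret sharing: each input is split into two random shares whose sum reconstructs it, so neither share alone reveals anything. A minimal sketch, with the modulus and the toy values chosen for illustration:

```python
import secrets

Q = 2**61 - 1  # modulus for the sharing ring (illustrative choice)

def share(x):
    """Split x into two additive shares with s1 + s2 = x (mod Q)."""
    s1 = secrets.randbelow(Q)
    return s1, (x - s1) % Q

def reconstruct(s1, s2):
    return (s1 + s2) % Q

# Party A's secret statistic and party B's secret statistic.
a, b = 1234, 5678
a1, a2 = share(a)  # A keeps a1 and sends a2 to B
b1, b2 = share(b)  # B keeps b2 and sends b1 to A

# Each party adds its local shares; combining the two share-sums yields
# a + b without either party revealing its own input.
sum_share_A = (a1 + b1) % Q
sum_share_B = (a2 + b2) % Q
total = reconstruct(sum_share_A, sum_share_B)
```

Because addition distributes over the shares, linear steps of a larger protocol (such as accumulating cross-party terms of a regression) can be done entirely on shares.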
Secure set intersection cardinality with application to association rule mining
- Journal of Computer Security, IOS Press (accepted for publication)
"... There has been concern over the apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. The process of data mining may use private data, leading to the potential for priva ..."
Cited by 64 (9 self)
Abstract:
There has been concern over an apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. However, the process of data mining may use private data, leading to the potential for privacy breaches. Secure Multiparty Computation shows that results can be produced without revealing the data used to generate them; the problem is that general techniques for secure multiparty computation do not scale to data-mining-sized computations. This paper presents an efficient protocol for securely determining the size of a set intersection, and shows how it can be used to generate association rules where multiple parties have different (and private) information about the same set of individuals.
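One classical way to compute set intersection cardinality without revealing the sets (a sketch of the general commutative-encryption approach, not necessarily this paper's exact protocol; the prime, key generation, and toy sets are illustrative) is to have each party exponentiate hashed items with a secret key, since (h^a)^b = (h^b)^a mod p:

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; exponentiation group for the sketch

def h(item):
    """Hash an item into the multiplicative group mod P."""
    v = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P
    return v if v > 1 else 2  # avoid degenerate 0/1 (negligible in practice)

def enc(values, key):
    """'Commutative encryption': E_k(v) = v^k mod P, so E_a(E_b(v)) = E_b(E_a(v))."""
    return {pow(v, key, P) for v in values}

def rand_key():
    """Secret exponent coprime to P-1, so v -> v^k is a bijection (no collisions)."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

key_a, key_b = rand_key(), rand_key()
set_a = {"alice", "bob", "carol", "erin"}
set_b = {"bob", "carol", "dave"}

# A encrypts its hashed items and sends them to B; B encrypts them again.
a_double = enc(enc({h(x) for x in set_a}, key_a), key_b)
# B encrypts its own items and sends them to A; A encrypts them again.
b_double = enc(enc({h(x) for x in set_b}, key_b), key_a)

# Doubly encrypted values match exactly when the underlying items match,
# so either side can count the intersection without seeing the other's items.
cardinality = len(a_double & b_double)
```

For association rules, the "items" would be transaction identifiers supporting a candidate itemset at each party, and the intersection size gives the joint support count.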
When Do Data Mining Results Violate Privacy?
- In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
"... Privacy-preserving data mining has concentrated on obtain-ing valid results when the input data is private. An extreme example is Secure Multiparty Computation-based methods, where only the results are revealed. However, this still leaves a potential privacy breach: Do the results themselves violate ..."
Cited by 60 (1 self)
Abstract:
Privacy-preserving data mining has concentrated on obtaining valid results when the input data is private. An extreme example is Secure Multiparty Computation-based methods, where only the results are revealed. However, this still leaves a potential privacy breach: do the results themselves violate privacy? This paper explores this issue, developing a framework under which this question can be addressed. Metrics are proposed, along with analysis showing that those metrics are consistent in the face of apparent problems.
Privacy-preserving decision trees over vertically partitioned data
- In Proceedings of the 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security
"... Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy preserving variant of the ID3 algorit ..."
Cited by 45 (2 self)
Abstract:
Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy-preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with the algorithm, we give a complete proof of security with a tight bound on the information revealed.
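The split criterion ID3 relies on, and which a privacy-preserving variant must compute without pooling the parties' attributes, is information gain. A minimal non-private sketch of that criterion; the toy attributes, records, and the idea that "outlook" and "windy" sit at different parties are all invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """ID3 split criterion: H(labels) minus expected entropy after splitting."""
    n = len(labels)
    split = {}
    for a, y in zip(attr_values, labels):
        split.setdefault(a, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy vertically partitioned data: one party holds "outlook", another holds
# "windy"; the class labels are handled per the protocol's setting.
outlook = ["sun", "sun", "rain", "rain", "sun", "rain"]
windy   = [True,  False, False,  True,  False, False]
play    = ["no",  "no",  "yes",  "no",  "no",  "yes"]

gains = {"outlook": info_gain(outlook, play), "windy": info_gain(windy, play)}
best = max(gains, key=gains.get)  # the attribute ID3 would split on first
```

In the vertically partitioned setting, each party can evaluate gains for its own attributes, and the protocol's job is to compare them and to maintain the split record counts securely.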
Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data
- In Proceedings of the ACM Symposium on Applied Computing, 2006
"... Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery tha ..."
Cited by 32 (4 self)
Abstract:
Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results while providing guarantees on the non-disclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from the data distributed at multiple parties, without disclosing the data of each party to others. We assume that data is horizontally partitioned: each party holds all attributes for its own subset of the records.
Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data, 2003
"... one. In many situations, data is split between multiple organizations. These organizations may want to utilize all of the data to create more accurate predictive models while revealing neither their training data / databases nor the instances to be classified. The Naive Bayes Classifier is a simple ..."
Cited by 30 (1 self)
Abstract:
... one. In many situations, data is split between multiple organizations. These organizations may want to utilize all of the data to create more accurate predictive models while revealing neither their training data/databases nor the instances to be classified. The Naive Bayes Classifier is a simple but efficient baseline classifier. In this paper, we present a privacy-preserving Naive Bayes Classifier for horizontally partitioned data.
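A Naive Bayes model built from horizontally partitioned data only needs global counts, which the parties can aggregate with a secure sum. A minimal sketch using pairwise one-time pads that cancel across parties (a generic secure-sum construction, not necessarily this paper's protocol; the hospital scenario and counts are invented):

```python
import secrets

Q = 2**61 - 1  # ring for the masked sums (illustrative choice)

def secure_sum(local_values):
    """Each party masks its value with pairwise pads that cancel across
    parties, then publishes only the masked value; only the total is learned."""
    n = len(local_values)
    # m[i][j] is a pad shared by parties i and j, with m[i][j] + m[j][i] = 0 mod Q.
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbelow(Q)
            m[i][j], m[j][i] = r, (-r) % Q
    published = [(v + sum(m[i])) % Q for i, v in enumerate(local_values)]
    return sum(published) % Q

# Three hospitals each count, locally, flu patients and flu patients with fever.
local_fever_and_flu = [30, 12, 8]
local_flu           = [50, 20, 10]

n_fever_flu = secure_sum(local_fever_and_flu)
n_flu = secure_sum(local_flu)

# Global Naive Bayes conditional estimate P(fever | flu), computed from
# the aggregate counts only; no hospital reveals its local counts.
p_fever_given_flu = n_fever_flu / n_flu
```

Repeating this for every (attribute value, class) pair yields all the conditional probability estimates the global classifier needs.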
Privacy-Preserving Distributed k-Anonymity, 2005
"... k-anonymity provides a measure of privacy protection by preventing re-identification of data to fewer than a group of k data items. While algorithms exist for producing k-anonymous data, the model has been that of a single source wanting to publish data. This paper presents a k-anonymity protocol w ..."
Cited by 27 (3 self)
Abstract:
k-anonymity provides a measure of privacy protection by ensuring that data cannot be re-identified to a group of fewer than k data items. While algorithms exist for producing k-anonymous data, the model has been that of a single source wanting to publish data. This paper presents a k-anonymity protocol for when the data is vertically partitioned between sites. A key contribution is a proof that the protocol preserves k-anonymity between the sites: while one site may have individually identifiable data, it learns nothing that violates k-anonymity with respect to the data at the other site. This is a fundamentally different distributed privacy definition than that of Secure Multiparty Computation, and it provides a better match with both ethical and legal views of privacy.
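The k-anonymity property itself is easy to state operationally: every combination of quasi-identifier values that appears in the released table must appear in at least k records. A minimal checker for that property (the toy table, attribute names, and choice of quasi-identifiers are illustrative; this sketches the property the protocol preserves, not the protocol itself):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every quasi-identifier combination occurs in >= k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

# Toy table: generalized age range and ZIP prefix are the quasi-identifiers;
# the diagnosis is the sensitive attribute.
table = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
]

ok_k2 = is_k_anonymous(table, ["age", "zip"], 2)  # each group has 2 records
ok_k3 = is_k_anonymous(table, ["age", "zip"], 3)
```

In the vertically partitioned setting of the paper, neither site sees the whole table, so the interesting question is guaranteeing this global property while each site checks only its own columns.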