Random projection-based multiplicative data perturbation for privacy preserving distributed data mining (2006)

by K Liu, H Kargupta, J Ryan
Venue: IEEE Trans. Knowl. Data Eng.

Results 1 - 10 of 94 citing documents, sorted by citation count:

An Attacker’s View of Distance Preserving Maps for Privacy Preserving Data Mining

by Kun Liu, Chris Giannella, Hillol Kargupta - Proc. PKDD, 2006
"... Abstract. We examine the effectiveness of distance preserving transformations in privacy preserving data mining. These techniques are potentially very useful in that some important data mining algorithms can be efficiently applied to the transformed data and produce exactly the same results as if ap ..."
Cited by 34 (2 self)
We examine the effectiveness of distance preserving transformations in privacy preserving data mining. These techniques are potentially very useful in that some important data mining algorithms can be efficiently applied to the transformed data and produce exactly the same results as if applied to the original data, e.g., distance-based clustering and k-nearest neighbor classification. However, the issue of how well the original data is hidden has, to our knowledge, not been carefully studied. We take a step in this direction by assuming the role of an attacker armed with two types of prior information regarding the original data. We examine how well the attacker can recover the original data from the transformed data and prior information. Our results offer insight into the vulnerabilities of distance preserving transformations.
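Since everything here turns on exactly distance-preserving maps, a minimal sketch of the setting (ours, assuming numpy) may help: an orthogonal matrix obtained by QR decomposition leaves all pairwise distances unchanged, which is precisely why distance-based mining gives identical results on the published data.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))        # original data: 100 records, 5 attributes

    # A random orthogonal matrix (QR of a Gaussian matrix yields one).
    Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    Y = X @ Q.T                          # published, transformed data

    def pairwise(A):
        d = A[:, None, :] - A[None, :, :]
        return np.sqrt((d ** 2).sum(axis=-1))

    # Distances are preserved exactly, so clustering or k-NN on Y
    # reproduces the results on X; the paper asks how well X stays hidden.
    assert np.allclose(pairwise(X), pairwise(Y))

An attacker who learns even a few (original, transformed) pairs can estimate Q by least squares, which is the flavor of vulnerability the paper studies.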

Citation Context

...ta. Neither of these perturbations preserves distance, and they are fundamentally different from the type we study, orthogonal transformations. To facilitate large scale data mining applications, Liu et al. [14] proposed an approach where the data is multiplied by a randomly generated matrix – in effect, the data is projected into a lower dimensional space. This technique preserves distance on expectation. H...
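For contrast with orthogonal maps, here is a sketch of the projection this context attributes to [14] (our numpy rendering; the usual 1/√k scaling is an assumption): the data is multiplied by a random k × d Gaussian matrix with k < d, so distances survive only in expectation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 1000, 50, 20                   # k < d: a dimensionality-reducing map

    X = rng.normal(size=(n, d))
    R = rng.normal(size=(k, d))              # entries i.i.d. N(0, 1)
    Y = X @ R.T / np.sqrt(k)                 # scaling makes E[<y_i, y_j>] = <x_i, x_j>

    # Johnson-Lindenstrauss flavor: each distance is only approximately
    # preserved, but the estimate is unbiased on expectation.
    print(np.linalg.norm(X[0] - X[1]))       # original distance
    print(np.linalg.norm(Y[0] - Y[1]))       # noisy estimate of it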

Time series compressibility and privacy

by Spiros Papadimitriou, Feifei Li, George Kollios, Philip S. Yu - In VLDB, 2007
"... In this paper we study the trade-offs between time series compressibility and partial information hiding and their fundamental implications on how we should introduce uncertainty about individual values by perturbing them. More specifically, if the perturbation does not have the same compressibility ..."
Cited by 18 (1 self)
In this paper we study the trade-offs between time series compressibility and partial information hiding, and their fundamental implications for how we should introduce uncertainty about individual values by perturbing them. More specifically, if the perturbation does not have the same compressibility properties as the original data, then it can be detected and filtered out, reducing uncertainty. Thus, by making the perturbation “similar” to the original data, we can preserve the structure of the data better while simultaneously making breaches harder. However, as data become more compressible, a fraction of the uncertainty can be removed if true values are leaked, revealing how they were perturbed. We formalize these notions, study the above trade-offs on real data, and develop practical schemes which strike a good balance and can also be extended for on-the-fly data hiding in a streaming environment.
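The filtering argument is concrete enough to demonstrate. A sketch (ours, assuming numpy): i.i.d. noise added to a smooth, highly compressible series is largely removed by even a crude moving average; noise shaped like the data would survive such filtering.

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.linspace(0, 4 * np.pi, 512)
    x = np.sin(t)                              # smooth, highly compressible series
    y = x + rng.normal(0.0, 0.3, size=t.size)  # published, perturbed series

    # A crude low-pass filter strips most of the white noise precisely
    # because the noise is incompressible while the signal is not.
    w = 15
    y_filtered = np.convolve(y, np.ones(w) / w, mode="same")

    print(np.std(y - x))           # intended uncertainty, about 0.3
    print(np.std(y_filtered - x))  # what actually remains after filtering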

Citation Context

... attributes. Existing methods can be broadly classified into two groups and operate either (i) by direct perturbation of individual attributes separately [4, 3, 18] or of entire records independently [24, 23, 32, 10], or (ii) by effectively swapping or concealing values among a small and appropriately chosen group of “neighboring” records [38, 2, 6, 33]. Although some of the prior work on relational data has cons...

Enabling search services on outsourced private spatial data

by Man Lung Yiu, Gabriel Ghinita, C. S. Jensen, P. Kalnis, 2010
"... Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, shou ..."
Cited by 17 (2 self)
Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, should be able to view the data. For instance, a real-estate company that owns a large database of properties wants to allow its paying customers to query for houses according to location. On the other hand, the untrusted service provider should not be able to learn the property locations and, e.g., sell the information to a competitor. To tackle the problem, we propose to transform the location datasets before uploading them to the service provider. The paper develops a spatial transformation that re-distributes the locations in space, and it also proposes a cryptographic-based transformation. The data owner selects the transformation key and shares it with authorized users. Without the key, it is infeasible to reconstruct the original data points from the ...
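This is not the paper's actual construction, but a sketch of the general shape of a key-driven spatial transformation (function names and parameter ranges are our assumptions): a secret rotation, scale, and translation derived from a key, which authorized users can invert.

    import numpy as np

    def make_key(seed):
        """Secret transformation key: rotation angle, scale, translation."""
        rng = np.random.default_rng(seed)
        return rng.uniform(0, 2 * np.pi), rng.uniform(0.5, 2.0), rng.uniform(-100, 100, size=2)

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    def transform(P, key):
        theta, s, t = key
        return s * (P @ rotation(theta).T) + t   # redistribute locations in space

    def invert(Q, key):
        theta, s, t = key
        return ((Q - t) / s) @ rotation(theta)   # only key holders can do this

    key = make_key(seed=42)                       # shared only with authorized users
    props = np.random.default_rng(7).uniform(0, 1000, size=(5, 2))
    outsourced = transform(props, key)
    assert np.allclose(invert(outsourced, key), props)

Note that a similarity transform like this still preserves distance ratios and hence leaks structure, which is presumably why the paper also develops a cryptographic-based transformation.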

A Survey of Quantification of Privacy Preserving Data Mining Algorithms

by Elisa Bertino, Dan Lin, Wei Jiang
"... Abstract The aim of privacy preserving data mining (PPDM) algorithms is to extract relevant knowledge from large amounts of data while protecting at the same time sensitive information. An important aspect in the design of such algorithms is the identification of suitable evaluation criteria and the ..."
Cited by 15 (0 self)
The aim of privacy preserving data mining (PPDM) algorithms is to extract relevant knowledge from large amounts of data while at the same time protecting sensitive information. An important aspect in the design of such algorithms is the identification of suitable evaluation criteria and the development of related benchmarks. Recent research in the area has devoted much effort to determining a trade-off between the right to privacy and the need for knowledge discovery. It is often the case that no privacy preserving algorithm exists that outperforms all the others on all possible criteria. Therefore, it is crucial to provide a comprehensive view of a set of metrics related to existing privacy preserving algorithms so that we can gain insights on how to design more effective measurement and PPDM algorithms. In this chapter, we review and summarize existing criteria and metrics for evaluating privacy preserving techniques.
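To give a flavor of such criteria, here are two simple proxies (our illustration, not the chapter's specific metrics): privacy scored by how badly an attacker's reconstruction misses the truth, and utility by how well pairwise distances survive perturbation.

    import numpy as np

    def privacy_rmse(X_orig, X_reconstructed):
        """Privacy proxy: attacker's reconstruction error (higher is better)."""
        return np.sqrt(np.mean((X_orig - X_reconstructed) ** 2))

    def utility_distance_error(X_orig, X_perturbed):
        """Utility proxy: mean relative error of pairwise distances (lower is better)."""
        def pairwise(A):
            d = A[:, None, :] - A[None, :, :]
            return np.sqrt((d ** 2).sum(axis=-1))
        D0, D1 = pairwise(X_orig), pairwise(X_perturbed)
        keep = D0 > 0
        return np.mean(np.abs(D0[keep] - D1[keep]) / D0[keep])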

Citation Context

...oise can be filtered out using certain signal processing techniques with very high accuracy. To avoid this problem, random projection-based multiplicative perturbation techniques have been proposed in [19]. Instead of adding some random values to the actual data, random matrices are used to project the set of original data points to a randomly chosen lower-dimensional space. However, the transformed da...

Compressive mechanism: Utilizing sparse representation in differential privacy.

by Yang D. Li, Zhenjie Zhang, Marianne Winslett, Yin Yang - In WPES, 2011
"... Abstract Differential privacy provides the first theoretical foundation with provable privacy guarantee against adversaries with arbitrary prior knowledge. The main idea to achieve differential privacy is to inject random noise into statistical query results. Besides correctness, the most important ..."
Cited by 12 (1 self)
Differential privacy provides the first theoretical foundation with provable privacy guarantees against adversaries with arbitrary prior knowledge. The main idea to achieve differential privacy is to inject random noise into statistical query results. Besides correctness, the most important goal in the design of a differentially private mechanism is to reduce the effect of random noise, ensuring that the noisy results can still be useful. This paper proposes the compressive mechanism, a novel solution on the basis of a state-of-the-art compression technique called compressive sensing. Compressive sensing is a decent theoretical tool for compact synopsis construction, using random projections. In this paper, we show that the amount of noise is significantly reduced from O(√n) to O(log n) when the noise insertion procedure is carried out on the synopsis samples instead of the original database. As an extension, we also apply the proposed compressive mechanism to solve the problem of continual release of statistical results. Extensive experiments using real datasets justify our accuracy claims.
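A heavily simplified sketch of the idea (our notation; the paper's decoder and noise calibration differ): take m random-projection measurements of a sparse count vector, add Laplace noise calibrated to the measurements' ℓ1 sensitivity instead of perturbing all n counts, then reconstruct.

    import numpy as np

    rng = np.random.default_rng(3)
    n, m, eps = 1024, 64, 1.0                 # m << n measurements, privacy budget eps

    x = np.zeros(n)                           # sparse histogram of counts
    x[rng.choice(n, size=10, replace=False)] = rng.integers(1, 50, size=10)

    Phi = rng.normal(size=(m, n)) / np.sqrt(m)
    y = Phi @ x                               # compact synopsis via random projection

    # One individual changes one count by 1, shifting y by a column of Phi,
    # so the l1 sensitivity of y is the largest column l1-norm.
    sens = np.abs(Phi).sum(axis=0).max()
    y_noisy = y + rng.laplace(0.0, sens / eps, size=m)

    # Stand-in reconstruction; the compressive mechanism proper would use
    # an l1-based compressive-sensing decoder to exploit the sparsity of x.
    x_hat, *_ = np.linalg.lstsq(Phi, y_noisy, rcond=None)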

Citation Context

...], computational biology [42], geophysical data analysis [31], communications [43] and so on. To the best of our knowledge, we are the first to apply compressive sensing to sensitive data analysis. [48, 47, 32] apply random projections to differential privacy. They show that the compressed data can be used for certain statistical tasks, and do not consider the reconstruction process of compressive sensing. ...

Cloud-enabled privacy-preserving collaborative learning for mobile sensing

by Bin Liu, Yurong Jiang, Fei Sha, Ramesh Govindan - In Proc. of ACM SenSys, 2012
"... In this paper, we consider the design of a system in which Internet-connected mobile users contribute sensor data as training samples, and collaborate on building a model for classification tasks such as activity or context recognition. Constructing the model can naturally be performed by a ser-vice ..."
Cited by 12 (1 self)
In this paper, we consider the design of a system in which Internet-connected mobile users contribute sensor data as training samples, and collaborate on building a model for classification tasks such as activity or context recognition. Constructing the model can naturally be performed by a service running in the cloud, but users may be more inclined to contribute training samples if the privacy of these data could be ensured. Thus, in this paper, we focus on privacy-preserving collaborative learning for the mobile setting, which addresses several competing challenges not previously considered in the literature: supporting complex classification methods like support vector machines, respecting mobile computing and communication constraints, and enabling user-determined privacy levels. Our approach, Pickle, ensures classification accuracy even in the presence of significantly perturbed training samples, is robust to methods that attempt to infer the original data or poison the model, and imposes minimal costs. We validate these claims using a user study, many real-world datasets and two different implementations of Pickle.

Citation Context

...ver, we set Q < P, so this reduces the dimensionality of each feature vector. A dimensionality-reducing transformation is more resilient to reconstruction attacks than a dimensionality-preserving one [32]. In Pickle, R is private to a participant, so it is not known to the cloud, nor to other participants (each participant generates his/her own private random matrix). This multiplicative perturbation by ...
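A sketch of the perturbation step as this context describes it (class and variable names are ours): each participant keeps a private Q × P projection matrix and uploads only projected samples, so neither the cloud nor other participants can invert the map.

    import numpy as np

    P, Q = 64, 32                     # feature dimension P, reduced dimension Q < P

    class Participant:
        def __init__(self, seed):
            # Private projection matrix: never shared with the cloud or peers.
            self.R = np.random.default_rng(seed).normal(size=(Q, P)) / np.sqrt(Q)

        def contribute(self, x):
            """Upload only the perturbed, dimensionality-reduced training sample."""
            return self.R @ x

    alice, bob = Participant(1), Participant(2)
    x = np.random.default_rng(0).normal(size=P)
    # The same raw sample yields different uploads under different private matrices.
    print(alice.contribute(x)[:3])
    print(bob.contribute(x)[:3])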

Near-optimal differentially private principal components

by Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha - In Proc. 26th Annual Conference on Neural Information Processing Systems (NIPS)
"... Principal components analysis (PCA) is a standard tool for identifying good lowdimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the pr ..."
Cited by 12 (1 self)
Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there is a large performance gap between the existing method and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling.
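For orientation, a sketch of the simplest baseline in this line of work (a SULQ-style noisy-covariance approach, not the paper's optimized method; the noise scale is left as a parameter rather than calibrated): perturb the second-moment matrix with symmetric Gaussian noise, then take its top eigenvectors.

    import numpy as np

    def dp_pca_subspace(X, k, noise_std, seed=0):
        """Noisy-covariance PCA sketch. For a formal (eps, delta) guarantee,
        noise_std must be calibrated to the sensitivity of X.T @ X / n
        (rows assumed normalized to unit norm)."""
        n, d = X.shape
        A = X.T @ X / n
        E = np.random.default_rng(seed).normal(0.0, noise_std, size=(d, d))
        A = A + (E + E.T) / 2                    # symmetric noise keeps A symmetric
        vals, vecs = np.linalg.eigh(A)
        return vecs[:, np.argsort(vals)[-k:]]    # top-k principal directions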

Citation Context

...udies privacy-preserving singular value decomposition in this model. Finally, dimension reduction through random projection has been considered as a technique for sanitizing data prior to publication [18]; our work differs from this line of work in that we offer differential privacy guarantees, and we only release the PCA subspace, not actual data. Independently, Kapralov and Talwar [16] have proposed...

Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking

by Feifei Li, Jimeng Sun, Spiros Papadimitriou, George A. Mihaila, Ioana Stanoi
"... We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be ..."
Cited by 11 (0 self)
We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.
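The key idea admits a compact batch sketch (ours; the paper's contribution is doing this incrementally on streams): draw noise whose covariance is proportional to the data's own covariance, so that correlation-aware filtering cannot separate it from the signal.

    import numpy as np

    def correlated_noise(X, budget, seed=0):
        """Noise matching the correlation structure of X, with total variance
        rescaled to `budget`; a batch stand-in for streaming correlation tracking."""
        rng = np.random.default_rng(seed)
        C = np.cov(X, rowvar=False)
        C = C * (budget / np.trace(C))           # fix the overall noise energy
        return rng.multivariate_normal(np.zeros(len(C)), C, size=X.shape[0])

    rng = np.random.default_rng(4)
    mix = np.array([[1.0, 0.9, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.1]])
    X = rng.normal(size=(500, 3)) @ mix          # correlated 3-attribute stream
    X_published = X + correlated_noise(X, budget=0.5)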

Citation Context

...levels of privacy guarantees [24]. Previous work has noticed this important tradeoff between privacy and utility and various techniques have been proposed to achieve a desired balance between the two [3, 23, 27, 10, 12, 28, 35, 16]. Prior related work [3, 2, 23, 20] consists of additive random perturbation for the offline, conventional relational data model, where the noise is distributed along the principal components of the o...

Compressed Regression

by Shuheng Zhou, John Lafferty, Larry Wasserman - In NIPS, 2007
"... Abstract Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by ..."
Cited by 10 (0 self)
Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
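A sketch of the compression step (ours, with scikit-learn's lasso standing in for the ℓ1-regularized estimator; sizes and the regularization strength are our choices): only the m compressed rows are released, and the sparse support can still be identified from them.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(5)
    n, p, m = 500, 40, 120                 # compress n rows down to m << n

    beta = np.zeros(p)
    beta[:4] = [3.0, -2.0, 1.5, 2.5]       # sparse true model
    X = rng.normal(size=(n, p))
    y = X @ beta + 0.1 * rng.normal(size=n)

    Phi = rng.normal(size=(m, n)) / np.sqrt(m)
    XZ, yZ = Phi @ X, Phi @ y              # only the compressed data is released

    fit = Lasso(alpha=0.05).fit(XZ, yZ)
    print(np.nonzero(np.abs(fit.coef_) > 0.5)[0])   # ideally the support {0, 1, 2, 3}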

Citation Context

...n privacy in statistical data analysis has a long history, going back at least to [3]. We refer to [6] for discussion and further pointers into this literature; recent work includes [16]. The work of [12] is closely related to our work at a high level, in that it considers low rank random linear transformations of either the row space or column space of the data X. The authors note the Johnson-Lindenst...

A game theoretic approach toward multi-party privacy-preserving distributed data mining

by Hillol Kargupta, Kamalika Das, Kun Liu - In Communication, 2007
"... Analysis of privacy-sensitive data in a multi-party environment often assumes that the parties are well-behaved and they abide by the protocols. Parties compute whatever is needed, communicate correctly following the rules, and do not collude with other parties for exposing third party sensitive dat ..."
Cited by 10 (5 self)
Analysis of privacy-sensitive data in a multi-party environment often assumes that the parties are well-behaved and abide by the protocols. Parties compute whatever is needed, communicate correctly following the rules, and do not collude with other parties to expose third-party sensitive data. This paper argues that most of these assumptions fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game where each party tries to maximize its own objectives. It develops a game-theoretic framework for developing and analyzing PPDM algorithms. It also presents an equilibrium analysis of such PPDM games and outlines a game-theoretic solution based on the concept of “cheap talk” borrowed from the economics and game theory literature.
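Purely to fix ideas (the numbers and functional form are ours, not the paper's): the equilibrium analysis works with per-party utilities that trade off the benefit of the mined global model against effort and the expected penalty for deviating, which mechanisms like cheap talk can raise.

    # Toy per-party payoff in a PPDM game; all values are illustrative.
    def party_payoff(follows_protocol, benefit=10.0, cost=2.0,
                     cheat_gain=4.0, penalty=15.0, p_catch=0.5):
        if follows_protocol:
            return benefit - cost
        # Deviating saves effort and may expose others' data, but risks
        # punishment; "cheap talk" style mechanisms raise p_catch or penalty.
        return benefit + cheat_gain - penalty * p_catch

    print(party_payoff(True), party_payoff(False))  # 8.0 vs 6.5: honesty pays more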

Citation Context

...overview of the various techniques that have been developed in this area. Existing techniques for privacy preserving data mining include data hiding using microaggregation [2], perturbation [3], [7], [16], [9] or anonymization [21], [5], rule hiding [4], secure multi-party computation [19] and distributed data mining. The main objective of data hiding is to transform the data or to design new computat...
