Results 11 - 20 of 648
Differential privacy and robust statistics
STOC'09, 2009
"... We show by means of several examples that robust statistical estimators present an excellent starting point for differentially private estimators. Our algorithms use a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR), and for which we give a formal definit ..."
Abstract
-
Cited by 91 (2 self)
We show by means of several examples that robust statistical estimators present an excellent starting point for differentially private estimators. Our algorithms use a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR), and for which we give a formal definition and general composition theorems.
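A minimal Python sketch of the Propose-Test-Release pattern described in this abstract; the helper dist_to_unsafe, the parameter names, and the ln(1/δ)/ε threshold are illustrative assumptions rather than the paper's exact algorithm:

    import numpy as np

    def propose_test_release(data, f, proposed_bound, dist_to_unsafe, eps, delta, rng=None):
        # Propose: `proposed_bound` is a candidate bound on the local sensitivity of f.
        # Test: privately check that the bound also holds on all nearby databases;
        # dist_to_unsafe(data, b) returns how many records would have to change before
        # the local sensitivity of f could exceed b (a hypothetical helper).
        rng = rng or np.random.default_rng()
        noisy_distance = dist_to_unsafe(data, proposed_bound) + rng.laplace(scale=1.0 / eps)
        if noisy_distance <= np.log(1.0 / delta) / eps:
            return None  # refuse to answer; refusing on borderline inputs is what keeps the test private
        # Release: the bound was privately certified, so Laplace noise scaled to it suffices.
        return f(data) + rng.laplace(scale=proposed_bound / eps)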
On the Geometry of Differential Privacy
2009
"... We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℜ n → ℜ non-adaptively. Here, the database is represented by a vector in ℜ n and proximity between databases is measured in the ℓ1-metric. We show that the noise complexity is ..."
Abstract
-
Cited by 89 (5 self)
We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℝ^n → ℝ^d non-adaptively. Here, the database is represented by a vector in ℝ^n and proximity between databases is measured in the ℓ1-metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. We use this connection to give tight upper and lower bounds on the noise complexity for any d ≤ n. We show that for d random linear queries of sensitivity 1, it is necessary and sufficient to add ℓ2-error Θ(min{d√d/ε, d√(log(n/d))/ε}) to achieve ε-differential privacy. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane conjecture, we can extend our results to arbitrary linear queries, giving nearly matching upper and lower bounds. Our bound translates to error O(min{d/ε, √(d log(n/d))/ε}) per answer. The best previous upper bound (Laplacian mechanism) gives a bound of O(min{d/ε, √n/ε}) per answer, while the best known lower bound was Ω(√d/ε). In contrast, our lower bound is strong enough to separate the concept of differential privacy from the notion of approximate differential privacy, where an upper bound of O(√d/ε) can be achieved.
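For context, a minimal sketch of the Laplace-mechanism baseline the abstract compares against (the O(d/ε)-per-answer upper bound); function and parameter names are illustrative:

    import numpy as np

    def laplace_answers(x, queries, eps, rng=None):
        # x: database represented as a vector in R^n (e.g., a histogram).
        # queries: d x n matrix whose rows are sensitivity-1 linear queries, so a
        # unit change in the database moves the d answers by at most d in L1-norm;
        # Laplace noise of scale d/eps per coordinate then gives eps-differential
        # privacy, i.e., error O(d/eps) per answer.
        rng = rng or np.random.default_rng()
        d = queries.shape[0]
        return queries @ x + rng.laplace(scale=d / eps, size=d)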
Privacy: Theory meets Practice on the Map
"... Abstract — In this paper, we propose the first formal privacy analysis of a data anonymization process known as the synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the ..."
Abstract
-
Cited by 88 (10 self)
In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application were collected by the U.S. Census Bureau, but due to privacy constraints, they cannot be used directly by the mapping program. Instead, we generate synthetic data that statistically mimic the original data while providing privacy guarantees. We use these synthetic data as a surrogate for the original data. We find that while some existing definitions of privacy are inapplicable to our target application, others are too conservative and render the synthetic data useless since they guard against privacy breaches that are very unlikely. Moreover, the data in our target application are sparse, and none of the existing solutions are tailored to anonymize sparse data. In this paper, we propose solutions to address the above issues.
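A generic sketch of the histogram-based synthetic data generation idea, purely as an illustration; the Dirichlet-smoothing step and its parameters are assumptions here, not the paper's calibrated procedure:

    import numpy as np

    def synthesize(counts, prior=1.0, n_synthetic=None, rng=None):
        # counts: numpy array of observed cell counts (e.g., origin-destination blocks).
        # Smooth the counts with a Dirichlet prior, sample a distribution from the
        # posterior, then draw synthetic records from it; the prior keeps rare,
        # privacy-sensitive cells from being reproduced exactly.
        rng = rng or np.random.default_rng()
        n_synthetic = int(counts.sum()) if n_synthetic is None else n_synthetic
        p = rng.dirichlet(counts + prior)
        return rng.multinomial(n_synthetic, p)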
Airavat: Security and Privacy for MapReduce
2009
"... The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized i ..."
Abstract
-
Cited by 82 (2 self)
The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized information flow control (DIFC) and differential privacy that provides strong security and privacy guarantees for MapReduce computations. Airavat allows users to use arbitrary mappers, prevents unauthorized leakage of sensitive data during the computation, and supports automatic declassification of the results when the latter do not violate individual privacy. Airavat minimizes the amount of trusted code in the system and allows users without security expertise to perform privacy-preserving computations on sensitive data. Our prototype implementation demonstrates the flexibility of Airavat on a wide variety of case studies. The prototype is efficient, with run-times on Amazon’s cloud computing infrastructure within 25% of a MapReduce system with no security.
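A toy sketch of the kind of noise-adding reducer an Airavat-style system depends on; Airavat itself is a Hadoop/Java system, so this standalone Python function only illustrates the clamp-then-add-Laplace-noise idea:

    import numpy as np

    def private_sum_reducer(values, lo, hi, eps, rng=None):
        # Mappers are untrusted, so each contribution is clamped to a declared
        # range [lo, hi]; this bounds how much adding or removing a single record
        # can change the sum, and Laplace noise scaled to that bound yields
        # eps-differential privacy for the released aggregate.
        rng = rng or np.random.default_rng()
        clamped = np.clip(values, lo, hi)
        sensitivity = max(abs(lo), abs(hi))  # one record's maximum effect on the sum
        return float(clamped.sum() + rng.laplace(scale=sensitivity / eps))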
Composition attacks and auxiliary information in data privacy
CoRR, 2008
"... Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channel ..."
Abstract
-
Cited by 78 (6 self)
Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations. 1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice, for a large class of currently proposed techniques. The class includes k-anonymity and several recent variants. 2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information. This resistance enables “stand-alone” design of anonymization schemes, without the need for explicitly keeping track of other releases. We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfies the property. This significantly enlarges the class of protocols known to enable modular design.
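A toy sketch of a composition (intersection) attack of the kind studied here; the data layout and the matches helper are hypothetical simplifications:

    def composition_attack(release_a, release_b, matches, target_qid):
        # Each release is a list of (generalized_group, sensitive_values) pairs, as
        # produced by a k-anonymizer; matches(target_qid, group) says whether the
        # target's known quasi-identifier is consistent with a generalized group
        # (both representations are assumptions of this sketch). Intersecting the
        # candidate sensitive values across two independently anonymized releases
        # of overlapping populations narrows what the adversary learns.
        candidates_a = {v for group, values in release_a for v in values if matches(target_qid, group)}
        candidates_b = {v for group, values in release_b for v in values if matches(target_qid, group)}
        return candidates_a & candidates_b  # a singleton means the attack fully succeeds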
A multiplicative weights mechanism for privacy-preserving data analysis
In FOCS, 2010
"... Abstract—We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentiall ..."
Abstract
-
Cited by 78 (7 self)
We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime’s dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes poly-logarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.
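A condensed sketch of the private multiplicative-weights idea described above; the threshold, learning rate, and privacy accounting are simplified assumptions, not the paper's analysis:

    import numpy as np

    def private_mw(histogram, queries, eps_round, threshold, eta=0.1, rng=None):
        # histogram: numpy array of counts over the data universe. Maintain a public
        # synthetic distribution x_hat; answer each linear query from x_hat unless a
        # noisy check says it is far from the true answer, in which case update x_hat
        # multiplicatively toward the truth. Privacy cost is paid mainly on updates.
        rng = rng or np.random.default_rng()
        n = histogram.sum()
        x_hat = np.full(len(histogram), 1.0 / len(histogram))
        answers = []
        for q in queries:                                     # q: vector in [0, 1]^universe
            true_ans = q @ histogram / n
            noisy_gap = (true_ans - q @ x_hat) + rng.laplace(scale=1.0 / (eps_round * n))
            if abs(noisy_gap) <= threshold:
                answers.append(q @ x_hat)                     # "lazy" round: answer from the public synopsis
            else:
                x_hat *= np.exp(eta * np.sign(noisy_gap) * q) # multiplicative-weights update
                x_hat /= x_hat.sum()
                answers.append(q @ x_hat)
        return answers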
No Free Lunch in Data Privacy
"... Differential privacy is a powerful tool for providing privacypreserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that i ..."
Abstract
-
Cited by 78 (6 self)
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a no-free-lunch theorem, which defines nonprivacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob’s participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled as “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated – this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.
Differentially private empirical risk minimization
2010
"... Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical ris ..."
Abstract
-
Cited by 77 (6 self)
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. We prove theoretical results showing that our algorithms preserve privacy and give generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
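A minimal sketch of the output-perturbation baseline mentioned in this abstract, for L2-regularized logistic regression; the sensitivity bound assumes a 1-Lipschitz loss and feature vectors of norm at most 1, and the scikit-learn usage is an illustrative assumption, not the authors' code:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def output_perturbation(X, y, eps, lam, rng=None):
        # Train L2-regularized logistic regression (objective: average loss plus
        # (lam/2)*||w||^2), then add noise to the learned weights. With a 1-Lipschitz
        # loss and ||x_i|| <= 1, the L2 sensitivity of the minimizer is at most
        # 2/(n*lam), so noise with density proportional to exp(-(n*lam*eps/2)*||b||)
        # gives eps-differential privacy; such noise can be sampled as a
        # Gamma-distributed norm times a uniformly random direction.
        rng = rng or np.random.default_rng()
        n, d = X.shape
        w = LogisticRegression(C=1.0 / (lam * n), fit_intercept=False).fit(X, y).coef_.ravel()
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        norm = rng.gamma(shape=d, scale=2.0 / (n * lam * eps))
        return w + norm * direction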
Quantifying Location Privacy
IEEE Symposium on Security and Privacy, 2011
"... It is a well-known fact that the progress of personal communication devices leads to serious concerns about privacy in general, and location privacy in particular. As a response to these issues, a number of Location-Privacy Protection Mechanisms (LPPMs) have been proposed during the last decade. Ho ..."
Abstract
-
Cited by 76 (18 self)
It is a well-known fact that the progress of personal communication devices leads to serious concerns about privacy in general, and location privacy in particular. As a response to these issues, a number of Location-Privacy Protection Mechanisms (LPPMs) have been proposed during the last decade. However, their assessment and comparison remains problematic because of the absence of a systematic method to quantify them. In particular, the assumptions about the attacker’s model tend to be incomplete, with the risk of a possibly wrong estimation of the users’ location privacy. In this paper, we address these issues by providing a formal framework for the analysis of LPPMs; it captures, in particular, the prior information that might be available to the attacker, and various attacks that he can perform. The privacy of users and the success of the adversary in his location-inference attacks are two sides of the same coin. We revise location privacy by giving a simple, yet comprehensive, model to formulate all types of location-information disclosure attacks. Thus, by formalizing the adversary’s performance, we propose and justify the right metric to quantify location privacy. We clarify the difference between three aspects of the adversary’s inference attacks, namely their accuracy, certainty, and correctness. We show that correctness determines the privacy of users. In other words, the expected estimation error of the adversary is the metric of users’ location privacy. We rely on well-established statistical methods to formalize and implement the attacks in a tool: the Location-Privacy Meter that measures the location privacy of mobile users, given various LPPMs. In addition to evaluating some example LPPMs, by using our tool, we assess the appropriateness of some popular metrics for location privacy: entropy and k-anonymity. The results show a lack of satisfactory correlation between these two metrics and the success of the adversary in inferring the users’ actual locations.
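A small sketch of the correctness (expected estimation error) metric the abstract argues for; representing the adversary's output as a posterior over a discrete set of candidate locations is an assumption of this illustration:

    import numpy as np

    def expected_estimation_error(posterior, candidate_locations, true_location):
        # posterior: adversary's probability over candidate locations (sums to 1).
        # candidate_locations: array of shape (m, 2) of coordinates.
        # Location privacy is measured as the expected distance between the
        # adversary's estimate and the true location -- higher error means more
        # privacy, regardless of how accurate or certain the adversary believes it is.
        distances = np.linalg.norm(candidate_locations - true_location, axis=1)
        return float(posterior @ distances)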