Results 1 - 10 of 212
Differential privacy: A survey of results
- In Theory and Applications of Models of Computation, 2008
"... Abstract. Over the past five years a new approach to privacy-preserving ..."
Abstract
-
Cited by 258 (0 self)
- Add to MetaCart
(Show Context)
Over the past five years a new approach to privacy-preserving ...
A learning theory approach to non-interactive database privacy
- In Proceedings of the 40th annual ACM symposium on Theory of computing, 2008
"... In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data usefu ..."
Abstract
-
Cited by 220 (25 self)
- Add to MetaCart
(Show Context)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that, for any halfspace query, will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
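For reference, the guarantee that every abstract in this listing appeals to is ε-differential privacy. One standard statement (omitting the (ε, δ) relaxation) is:

    \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\, \Pr[\,M(D') \in S\,]
    \quad \text{for all output sets } S \text{ and all databases } D, D' \text{ differing in one record.}

Here a counting query asks what fraction of the records satisfies a given predicate, so changing one record moves the true answer by at most 1/n.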
A firm foundation for private data analysis
- Commun. ACM
"... In the information realm, loss of privacy is usually associated with failure to control access to information, to control the flow of information, or to control the purposes for which information is employed. Differential privacy arose in a context in which ensuring privacy is a challenge even if al ..."
Abstract
-
Cited by 139 (3 self)
- Add to MetaCart
(Show Context)
In the information realm, loss of privacy is usually associated with failure to control access to information, to control the flow of information, or to control the purposes for which information is employed. Differential privacy arose in a context in which ensuring privacy is a challenge even if all these control problems are solved: privacy-preserving statistical analysis of data. The problem of statistical disclosure control – revealing accurate statistics about a set of respondents while preserving the privacy of individuals – has a venerable history, with an extensive literature spanning statistics, theoretical computer science, security, databases, and cryptography (see, for example, the excellent survey [1], the discussion of related work in [2] and the Journal of Official Statistics 9 (2), dedicated to confidentiality and disclosure control). This long history ...
What Can We Learn Privately?
- 49th Annual IEEE Symposium on Foundations of Computer Science, 2008
"... Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or sp ..."
Abstract
-
Cited by 99 (9 self)
- Add to MetaCart
(Show Context)
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate the general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
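One common way to formalize the question posed in this abstract (stated informally here; the paper's formal definition pins down the parameters and sample-complexity requirements) is that a private PAC learner A must satisfy two conditions simultaneously:

    \text{(privacy)}\;\; \Pr[A(S) \in T] \le e^{\varepsilon}\, \Pr[A(S') \in T]
    \quad \text{for all samples } S, S' \text{ differing in one labeled example and all output sets } T,

    \text{(learning)}\;\; \Pr_{S \sim P^{m}}\!\big[\operatorname{err}_{P}(A(S)) \le \alpha\big] \ge 1 - \beta
    \quad \text{for some sample size } m = \operatorname{poly}(1/\alpha, \log(1/\beta), \ldots).

The tension the paper studies is that the privacy condition must hold for every pair of neighboring samples, while the learning condition only has to hold on average over samples drawn from the distribution P.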
Universally Utility-Maximizing Privacy Mechanisms
"... A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Privacy can be rigor ..."
Abstract
-
Cited by 97 (2 self)
- Add to MetaCart
(Show Context)
A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism’s output distribution is nearly the same whether a given database row is included or excluded. The goal of this paper is strong and general utility guarantees, subject to differential privacy. We pursue mechanisms that guarantee near-optimal utility to every potential user, independent of its side information (modeled as a prior distribution over query results) and preferences (modeled via a loss function). Our main result is: for each fixed count query and differential privacy level, there is a geometric mechanism M∗ — a discrete variant of the simple and well-studied Laplace mechanism — that is simultaneously expected loss-minimizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user u, no matter what its side information and preferences, derives as much utility from M∗ as from interacting with a differentially private mechanism Mu that is optimally tailored to u. More precisely, for every user u there is an optimal mechanism ...
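As a concrete illustration of the mechanism discussed above, here is a minimal Python sketch of the geometric mechanism for a single count query of sensitivity 1 (the function name and structure are mine; the paper's optimality and user-remapping arguments are not reproduced here):

import numpy as np

def geometric_mechanism(true_count, epsilon, rng=None):
    # Releases true_count plus two-sided geometric noise Z with
    # P[Z = z] proportional to alpha**abs(z), alpha = exp(-epsilon),
    # which is epsilon-differentially private for sensitivity-1 counts.
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.exp(-epsilon)
    # A two-sided geometric variable is the difference of two i.i.d.
    # geometric variables on {0, 1, 2, ...} with success probability 1 - alpha.
    g1 = rng.geometric(1.0 - alpha) - 1
    g2 = rng.geometric(1.0 - alpha) - 1
    return true_count + (g1 - g2)

# Example: privately release how many records satisfy some predicate.
print(geometric_mechanism(true_count=427, epsilon=0.5))

Roughly, the paper's point is that each user can then post-process this integer output according to their own prior and loss function, and thereby does as well as with a mechanism tailored specifically to them.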
Differential privacy and robust statistics
- STOC'09, 2009
"... We show by means of several examples that robust statistical estimators present an excellent starting point for differentially private estimators. Our algorithms use a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR), and for which we give a formal definit ..."
Abstract
-
Cited by 91 (2 self)
- Add to MetaCart
(Show Context)
We show by means of several examples that robust statistical estimators present an excellent starting point for differentially private estimators. Our algorithms use a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR), and for which we give a formal definition and general composition theorems.
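A minimal sketch of the Propose-Test-Release pattern described above (illustrative names and structure only; the paper's formal definition, its distance measure, and its composition theorems are not captured here). The caller supplies a function estimating how many records would have to change before the proposed sensitivity bound fails; the test perturbs that distance with Laplace noise and releases only when the noisy distance is comfortably large, giving an (ε, δ)-style guarantee:

import numpy as np

def propose_test_release(data, statistic, distance_to_instability,
                         proposed_bound, epsilon, delta, rng=None):
    # statistic(data):               estimator to release (e.g., a robust location estimate)
    # distance_to_instability(data): number of records that must change before the
    #                                local sensitivity of `statistic` exceeds `proposed_bound`
    #                                (application-specific; assumed supplied by the caller)
    rng = np.random.default_rng() if rng is None else rng
    # Privately test whether the proposed bound is safe on this database.
    noisy_distance = distance_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_distance <= np.log(1.0 / delta) / epsilon:
        return None  # refuse to answer rather than risk a privacy breach
    # Release the statistic with noise calibrated to the proposed bound.
    return statistic(data) + rng.laplace(scale=proposed_bound / epsilon)

Here distance_to_instability is application-specific; the paper works this out for several robust estimators.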
Differentially Private Aggregation of Distributed Time-Series with Transformation and Encryption
"... We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where (i) individual users wish to publish temporally co ..."
Abstract
-
Cited by 88 (3 self)
- Add to MetaCart
We propose the first differentially private aggregation algorithm for distributed time-series data that offers good practical utility without any trusted server. This addresses two important challenges in participatory data-mining applications where (i) individual users wish to publish temporally correlated time-series data (such as location traces, web history, personal health data), and (ii) an untrusted third-party aggregator wishes to run aggregate queries on the data. To ensure differential privacy for time-series data despite the presence of temporal correlation, we propose the Fourier Perturbation Algorithm (FPAk). Standard differential privacy techniques perform poorly for time-series data. To answer n queries, such techniques can add noise of Θ(n) to each query answer, making the answers practically useless if n is large. Our FPAk algorithm perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPAk improves the expected error from Θ(n) to roughly Θ(k), where k is the number of Fourier coefficients that can (approximately) reconstruct all the n query answers. Our experiments show that k ≪ n for many real-life datasets, resulting in a huge error improvement for FPAk. To deal with the absence of a trusted central server, we propose the Distributed Laplace Perturbation Algorithm (DLPA) to add noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale with a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far by reducing the computational load per user from O(U) to O(1), where U is the number of users.
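A minimal sketch of the Fourier perturbation idea (my own simplification: the Laplace scale below is illustrative only, whereas the paper calibrates it to the L2 sensitivity of the full answer sequence; the distributed DLPA protocol is not shown):

import numpy as np

def fourier_perturbation(answers, k, epsilon, sensitivity, rng=None):
    # Keep only the first k DFT coefficients of the n query answers,
    # perturb them with Laplace noise, and reconstruct by inverse DFT.
    rng = np.random.default_rng() if rng is None else rng
    n = len(answers)
    coeffs = np.fft.rfft(answers)
    kept = coeffs[:k].copy()
    scale = np.sqrt(k) * sensitivity / epsilon  # illustrative calibration, not the paper's
    kept += rng.laplace(scale=scale, size=k) + 1j * rng.laplace(scale=scale, size=k)
    padded = np.zeros_like(coeffs)
    padded[:k] = kept
    return np.fft.irfft(padded, n=n)

# Example: 365 daily counts summarized by k = 10 low-frequency coefficients.
noisy_series = fourier_perturbation(np.random.poisson(100, size=365),
                                    k=10, epsilon=1.0, sensitivity=1.0)

The error saving comes from perturbing only k coefficients instead of all n answers; the cost is the approximation error from discarding the high-frequency coefficients.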
On the complexity of differentially private data release: efficient algorithms and hardness results
- In STOC, 2009
"... ..."
(Show Context)
Airavat: Security and Privacy for MapReduce
- 2009
"... The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized i ..."
Abstract
-
Cited by 82 (2 self)
- Add to MetaCart
(Show Context)
The cloud computing paradigm, which involves distributed computation on multiple large-scale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized information flow control (DIFC) and differential privacy that provides strong security and privacy guarantees for MapReduce computations. Airavat allows users to use arbitrary mappers, prevents unauthorized leakage of sensitive data during the computation, and supports automatic declassification of the results when the latter do not violate individual privacy. Airavat minimizes the amount of trusted code in the system and allows users without security expertise to perform privacy-preserving computations on sensitive data. Our prototype implementation demonstrates the flexibility of Airavat on a wide variety of case studies. The prototype is efficient, with run-times on Amazon’s cloud computing infrastructure within 25% of a MapReduce system with no security.
No Free Lunch in Data Privacy
"... Differential privacy is a powerful tool for providing privacypreserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that i ..."
Abstract
-
Cited by 78 (6 self)
- Add to MetaCart
(Show Context)
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a no-free-lunch theorem, which defines non-privacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data-generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob’s participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled as “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated – this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.