Results 11–20 of 220
On the Geometry of Differential Privacy
, 2009
"... We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℜ n → ℜ nonadaptively. Here, the database is represented by a vector in ℜ n and proximity between databases is measured in the ℓ1metric. We show that the noise complexity is ..."
Abstract

Cited by 89 (5 self)
 Add to MetaCart
We consider the noise complexity of differentially private mechanisms in the setting where the user asks d linear queries f: ℝ^n → ℝ non-adaptively. Here, the database is represented by a vector in ℝ^n and proximity between databases is measured in the ℓ1 metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. We use this connection to give tight upper and lower bounds on the noise complexity for any d ≤ n. We show that for d random linear queries of sensitivity 1, it is necessary and sufficient to add ℓ2 error Θ(min{d√d/ε, d√(log(n/d))/ε}) to achieve ε-differential privacy. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane conjecture, we can extend our results to arbitrary linear queries, giving nearly matching upper and lower bounds. Our bound translates to error O(min{d/ε, √(d log(n/d))/ε}) per answer. The best previous upper bound (the Laplace mechanism) gives a bound of O(min{d/ε, √n/ε}) per answer, while the best known lower bound was Ω(√d/ε). In contrast, our lower bound is strong enough to separate the concept of differential privacy from the notion of approximate differential privacy, where an upper bound of O(√d/ε) can be achieved.
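The Laplace-mechanism baseline that this abstract improves on can be sketched as follows (a minimal illustration under our own naming; the histogram representation and query matrix are assumptions, not the authors' code):

```python
import numpy as np

def laplace_mechanism(db, queries, epsilon, rng=None):
    """Answer d sensitivity-1 linear queries on a database db in R^n
    under epsilon-differential privacy.

    Moving the database by 1 in the l1 metric changes each of the d
    answers by at most 1, so the l1 sensitivity of the whole answer
    vector is at most d, and Laplace noise of scale d/epsilon per
    coordinate suffices. This is the O(d/epsilon)-per-answer baseline
    the paper's geometric analysis beats for large d.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = queries.shape[0]
    return queries @ db + rng.laplace(scale=d / epsilon, size=d)
```

With a very large ε the noise is negligible and the answers approach the exact query values, which makes the calibration easy to sanity-check.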
On the complexity of differentially private data release: efficient algorithms and hardness results
 In STOC
, 2009
"... ..."
(Show Context)
Composition attacks and auxiliary information in data privacy
 CoRR
, 2008
"... Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channel ..."
Abstract

Cited by 78 (6 self)
 Add to MetaCart
Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations. 1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice for a large class of currently proposed techniques; the class includes k-anonymity and several recent variants. 2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information. This resistance enables “stand-alone” design of anonymization schemes, without the need to explicitly keep track of other releases. We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfies it. This significantly enlarges the class of protocols known to enable modular design.
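The core of a composition attack can be shown with a toy example (entirely hypothetical data and structures, for illustration only): two organizations independently publish generalized, k-anonymity-style records about an overlapping population, and intersecting the candidate sensitive values they leave open for a shared individual collapses the adversary's uncertainty.

```python
def composition_attack(release_a, release_b, quasi_id):
    """Intersect the sensitive-value candidate sets that two
    independent anonymized releases leave open for the same
    quasi-identifier group. Each release alone may look safe
    (several candidates); together they can pin down one value."""
    return release_a[quasi_id] & release_b[quasi_id]

# Two hospitals both treat the same patient and publish generalized
# records; each leaves three candidate diagnoses for his group.
group = "zip=021**, age=30-40"
hospital_a = {group: {"flu", "diabetes", "asthma"}}
hospital_b = {group: {"diabetes", "cancer", "ulcer"}}
```

Here the intersection is the singleton {"diabetes"}, even though each release on its own offers three plausible diagnoses for the group.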
A multiplicative weights mechanism for privacypreserving data analysis
 In FOCS
, 2010
"... Abstract—We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacypreserving answers to queries as they arrive. Our primary contribution is a new differentiall ..."
Abstract

Cited by 78 (7 self)
 Add to MetaCart
(Show Context)
We consider statistical data analysis in the interactive setting, in which a trusted curator maintains a database of sensitive information about individual participants and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime's dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database; only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution (one that does not place too much weight on any single data item), accuracy remains as above, and the running time becomes polylogarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.
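The heart of such a mechanism, a multiplicative-weights update that nudges a synthetic distribution over the data universe toward a noisy query answer, can be sketched in a single step (a simplification with our own names; the learning rate and thresholding logic of the real mechanism are omitted):

```python
import numpy as np

def mw_step(x, query, noisy_answer, eta=0.1):
    """One multiplicative-weights update. x is the mechanism's
    current distribution over the data universe; query is a 0/1
    vector selecting universe items (a counting query). If the noisy
    answer exceeds the current estimate, weight shifts onto the
    selected items, and vice versa; renormalizing keeps x a
    probability distribution."""
    estimate = query @ x
    sign = 1.0 if noisy_answer > estimate else -1.0
    x = x * np.exp(eta * sign * query)
    return x / x.sum()
```

Because the update only reads the (already private) noisy answer, repeating it over many queries costs no extra privacy beyond the answers themselves.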
No Free Lunch in Data Privacy
"... Differential privacy is a powerful tool for providing privacypreserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that i ..."
Abstract

Cited by 78 (6 self)
 Add to MetaCart
(Show Context)
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy. First, we use a no-free-lunch theorem, which defines non-privacy as a game, to argue that it is not possible to provide privacy and utility without making assumptions about how the data are generated. Then we explain where assumptions are needed. We argue that privacy of an individual is preserved when it is possible to limit the inference of an attacker about the participation of the individual in the data-generating process. This is different from limiting the inference about the presence of a tuple (for example, Bob's participation in a social network may cause edges to form between pairs of his friends, so that it affects more than just the tuple labeled “Bob”). The definition of evidence of participation, in turn, depends on how the data are generated; this is how assumptions enter the picture. We explain these ideas using examples from social network research as well as tabular data for which deterministic statistics have been previously released. In both cases the notion of participation varies, the use of differential privacy can lead to privacy breaches, and differential privacy does not always adequately limit inference about participation.
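The social-network point can be made concrete with a small sketch (our own illustration, not the authors' code): noise calibrated to tuple-level (edge-level) sensitivity does not mask an individual whose participation affects many tuples at once.

```python
import numpy as np

def noisy_edge_count(edges, epsilon, rng):
    """Edge-level differentially private release of an edge count:
    sensitivity 1, since adding or removing one *edge* changes the
    count by exactly 1, so Lap(1/epsilon) noise suffices at the
    tuple level."""
    return len(edges) + rng.laplace(scale=1.0 / epsilon)

# But Bob's participation causes edges among his k friends; removing
# Bob removes k edges, and Lap(1/epsilon) noise does not hide a shift
# of size k once k >> 1/epsilon -- inference about Bob's participation
# is not limited even though edge-level DP holds.
```

This is exactly the gap the paper highlights between protecting a tuple and protecting evidence of participation.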
Interactive Privacy via the Median Mechanism
 In The 42nd ACM Symposium on the Theory of Computing
, 2010
"... We define a new interactive differentially private mechanism — the median mechanism — for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy me ..."
Abstract

Cited by 72 (15 self)
 Add to MetaCart
(Show Context)
We define a new interactive differentially private mechanism, the median mechanism, for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a superpolynomial factor, even in the non-interactive setting.
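How query correlations get exploited can be sketched in one step (a heavy simplification with our own names): maintain a set of candidate databases still consistent with past answers, and answer a new query with the median of its values over the candidates. When most candidates already agree, the query is "easy" and can be answered without spending fresh privacy budget.

```python
import numpy as np

def median_answer(candidates, query):
    """Answer a predicate query with the median of its values over
    candidate databases consistent with answers given so far. In the
    actual mechanism, only 'hard' queries (where the candidates
    disagree with the true database) consume privacy budget; 'easy'
    queries are answered from the median for free."""
    return float(np.median([query(db) for db in candidates]))
```

If earlier answers were accurate, most surviving candidates resemble the true database, so the median tracks the true answer for correlated queries.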
Boosting the accuracy of differentially private histograms through consistency
 Proc. VLDB Endow
, 2010
"... We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post ..."
Abstract

Cited by 67 (5 self)
 Add to MetaCart
We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post-processing phase, we compute the consistent input most likely to have produced the noisy output. The final output is differentially private and consistent, but in addition, it is often much more accurate. We show, both theoretically and experimentally, that these techniques can be used to estimate the degree sequence of a graph very precisely, and to compute a histogram that can support arbitrary range queries accurately.
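One instance of such a consistency constraint is that a sorted degree sequence must be monotone; projecting the noisy answers onto that constraint set in least squares is the classic pool-adjacent-violators computation (a sketch under that assumption, not the paper's exact algorithm):

```python
def pav_nondecreasing(y):
    """Least-squares projection of y onto non-decreasing sequences
    (pool adjacent violators). Running this on noisy differentially
    private answers is pure post-processing, so privacy is preserved
    while the error often drops substantially."""
    blocks = []  # each block: [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the last block's mean dips below the previous one's
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Because post-processing of a differentially private output cannot weaken the privacy guarantee, the projection is free from a privacy standpoint.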
Differential privacy under continual observation
 In STOC
, 2010
"... Differential privacy is a recent notion of privacy tailored to privacypreserving data analysis [10]. Up to this point, research on differentially private data analysis has focused on the setting of a trusted curator holding a large, static, data set; thus every computation is a “oneshot ” object: ..."
Abstract

Cited by 65 (3 self)
 Add to MetaCart
(Show Context)
Differential privacy is a recent notion of privacy tailored to privacy-preserving data analysis [10]. Up to this point, research on differentially private data analysis has focused on the setting of a trusted curator holding a large, static data set; thus every computation is a “one-shot” object: there is no point in computing something twice, since the result will be unchanged, up to any randomness introduced for privacy. However, many applications of data analysis involve repeated computations, either because the entire goal is one of monitoring (e.g., of traffic conditions, search trends, or incidence of influenza) or because the goal is some kind of adaptive optimization (e.g., placement of data to minimize access costs). In these cases, the algorithm must permit continual observation of the system's state. We therefore initiate a study of differential privacy under continual observation. We identify the problem of maintaining a counter in a privacy-preserving manner and show its wide applicability to many different problems.
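The counter primitive is commonly realized with a dyadic ("binary tree") decomposition of the stream; the sketch below is our own simplified version (splitting the budget evenly across the O(log T) levels; class and method names are illustrative). Each dyadic block gets noised once, so any prefix count is a sum of only O(log T) noisy nodes, and each input bit touches only O(log T) nodes.

```python
import numpy as np

class TreeCounter:
    """Sketch of a tree-based private counter under continual
    observation: every dyadic block of the stream receives one
    Laplace-noised partial sum, reused across all later queries."""

    def __init__(self, T, epsilon, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        levels = int(np.ceil(np.log2(T))) + 1
        self.scale = levels / epsilon      # split the budget across levels
        self.psums = {}                    # (level, index) -> noisy partial sum
        self.buf = []                      # raw stream, held by the curator

    def update(self, bit):
        self.buf.append(bit)

    def query(self):
        """Noisy count of all bits seen so far, via dyadic decomposition."""
        t = len(self.buf)
        total, pos = 0.0, 0
        while pos < t:
            # largest dyadic block starting at pos that fits in [0, t)
            level = 0
            while pos % (2 ** (level + 1)) == 0 and pos + 2 ** (level + 1) <= t:
                level += 1
            key = (level, pos // (2 ** level))
            if key not in self.psums:      # noise each block only once
                block_sum = sum(self.buf[pos:pos + 2 ** level])
                self.psums[key] = block_sum + self.rng.laplace(scale=self.scale)
            total += self.psums[key]
            pos += 2 ** level
        return total
```

Reusing the noisy partial sums is what keeps the per-step error polylogarithmic in T rather than growing with the number of queries.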
A Simple and Practical Algorithm for Differentially Private Data Release
"... We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simp ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
(Show Context)
We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
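The combination described above fits in a short loop. This is our own compact rendition with an illustrative budget split and update scaling, not the authors' exact constants: each round, the Exponential Mechanism selects a high-error query, a Laplace measurement answers it, and a Multiplicative Weights step updates the synthetic histogram.

```python
import numpy as np

def mwem(hist, queries, rounds, epsilon, rng=None):
    """MWEM sketch. hist is the true histogram; queries is a 0/1
    matrix, one counting query per row. Each round spends half its
    epsilon/(rounds) share selecting a high-error query and half
    measuring it."""
    rng = np.random.default_rng() if rng is None else rng
    n = hist.sum()
    synth = np.full(hist.shape, n / hist.size)   # start uniform
    eps_round = epsilon / (2.0 * rounds)
    for _ in range(rounds):
        errors = np.abs(queries @ hist - queries @ synth)
        # exponential mechanism over queries (scores shifted for stability)
        probs = np.exp(eps_round * (errors - errors.max()) / 2.0)
        q = queries[rng.choice(len(queries), p=probs / probs.sum())]
        measurement = q @ hist + rng.laplace(scale=1.0 / eps_round)
        # multiplicative-weights update toward the noisy measurement
        synth = synth * np.exp(q * (measurement - q @ synth) / (2.0 * n))
        synth *= n / synth.sum()                 # keep total mass fixed
    return synth
```

The synthetic histogram (or the average of its iterates) can then answer all queries in the class, not just the measured ones.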
Privacypreserving logistic regression
"... This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacypreserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacypreserving logistic regression algorithm. Th ..."
Abstract

Cited by 58 (2 self)
 Add to MetaCart
(Show Context)
This paper addresses the important trade-off between privacy and learnability when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We show that for certain data distributions, this algorithm has poor learning generalization compared with standard regularized logistic regression. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [7], and we provide learning guarantees. We show that our algorithm performs almost as well as standard regularized logistic regression in terms of generalization error. Experiments demonstrate improved learning performance of our method versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.
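The "solve a perturbed optimization problem" idea can be sketched as follows. This is our simplified rendition with plain gradient descent and illustrative noise constants; the paper's analysis fixes the exact distribution and scale of the perturbation vector b.

```python
import numpy as np

def private_logreg(X, y, epsilon, lam=0.1, steps=2000, lr=0.1, rng=None):
    """Objective perturbation sketch: instead of noising the learned
    weights (the sensitivity method), add a random linear term b.w to
    the regularized logistic loss and return the minimizer of the
    perturbed objective. Labels y are in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # b: uniform random direction, Gamma-distributed norm (illustrative)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    b = rng.gamma(shape=d, scale=2.0 / epsilon) * u
    w = np.zeros(d)
    for _ in range(steps):
        sigma = 1.0 / (1.0 + np.exp(y * (X @ w)))   # logistic loss gradients
        grad = -(X.T @ (y * sigma)) / n + lam * w + b / n
        w -= lr * grad
    return w
```

Because the randomness lives in the objective rather than the output, the error it induces scales with the strength of regularization instead of the sensitivity of the unregularized solution.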