Results 1-10 of 94
A learning theory approach to noninteractive database privacy
In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008
Cited by 217 (25 self)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
Differential privacy via wavelet transforms
In ICDE, 2010
Cited by 97 (8 self)
Privacy-preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ε-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve ε-differential privacy, however, offer little data utility. In particular, if the output dataset is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.
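The "transform first, then add noise" idea in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional frequency vector whose length is a power of two and uses a plain averaging Haar transform. The function names and the single `scale` parameter are hypothetical; the paper's own per-coefficient noise calibration is not reproduced here, so this sketch by itself makes no claim about a specific ε guarantee.

```python
import random


def haar_forward(values):
    """Averaging Haar transform: [overall mean, details from coarse to fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # finer levels stay at the back
        current = averages
    return current + details


def haar_inverse(coeffs):
    """Invert haar_forward by expanding one level at a time."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded = []
        for avg, diff in zip(current, diffs):
            expanded.extend((avg + diff, avg - diff))
        current = expanded
    return current


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, scale, rng):
    """Perturb the wavelet coefficients (not the raw counts), then invert."""
    coeffs = haar_forward(counts)
    noisy = [c + laplace_noise(scale, rng) for c in coeffs]
    return haar_inverse(noisy)
```

A range-count query would then be answered by summing the relevant entries of the vector returned by `noisy_histogram`.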
Universally Utility-Maximizing Privacy Mechanisms
Cited by 95 (2 self)
A mechanism for releasing information about a statistical database with sensitive data must resolve a tradeoff between utility and privacy. Publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism’s output distribution is nearly the same whether or not a given database row is included or excluded. The goal of this paper is strong and general utility guarantees, subject to differential privacy. We pursue mechanisms that guarantee near-optimal utility to every potential user, independent of its side information (modeled as a prior distribution over query results) and preferences (modeled via a loss function). Our main result is: for each fixed count query and differential privacy level, there is a geometric mechanism M* (a discrete variant of the simple and well-studied Laplace mechanism) that is simultaneously expected loss-minimizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user u, no matter what its side information and preferences, derives as much utility from M* as from interacting with a differentially private mechanism M_u that is optimally tailored to u. More precisely, for every user u there is an optimal mecha
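As a minimal sketch of the geometric mechanism named in this abstract: assuming a count query of sensitivity 1 and a privacy level ε > 0, the untruncated mechanism adds two-sided geometric noise Z with Pr[Z = z] proportional to α^|z|, where α = e^(-ε). The function names below are illustrative and not taken from the paper.

```python
import math
import random


def two_sided_geometric_pmf(z, epsilon):
    """Pr[Z = z] for two-sided geometric noise with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    return (1.0 - alpha) / (1.0 + alpha) * alpha ** abs(z)


def sample_geometric(alpha, rng):
    """Failures before the first success, with success probability 1 - alpha."""
    # Inverse-CDF sampling: P(G >= k) = alpha ** k.
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return int(math.floor(math.log(u) / math.log(alpha)))


def geometric_mechanism(true_count, epsilon, rng):
    """Return the count plus two-sided geometric noise."""
    alpha = math.exp(-epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    noise = sample_geometric(alpha, rng) - sample_geometric(alpha, rng)
    return true_count + noise
```

Adjacent noise values differ in probability by exactly a factor of e^ε, which is the discrete counterpart of the Laplace density ratio.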
Differential privacy and robust statistics
STOC'09, 2009
Cited by 90 (2 self)
We show by means of several examples that robust statistical estimators present an excellent starting point for differentially private estimators. Our algorithms use a new paradigm for differentially private mechanisms, which we call Propose-Test-Release (PTR), and for which we give a formal definition and general composition theorems.
On the complexity of differentially private data release: efficient algorithms and hardness results
In STOC, 2009
Interactive Privacy via the Median Mechanism
In The 42nd ACM Symposium on the Theory of Computing, 2010
Cited by 71 (15 self)
We define a new interactive differentially private mechanism — the median mechanism — for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for noninteractive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a superpolynomial factor, even in the noninteractive setting.
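The Laplace baseline that this abstract compares against can be sketched as follows. The sketch assumes counting queries of sensitivity 1 and an even split of the privacy budget ε across the k queries (basic composition). These assumptions and all names are illustrative and not taken from the paper.

```python
import random


def laplace_scale(num_queries, epsilon):
    """Per-query noise scale when epsilon is split evenly over the queries."""
    return num_queries / epsilon


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer_queries(database, predicates, epsilon, rng):
    """Answer each counting query with its own independent Laplace noise."""
    scale = laplace_scale(len(predicates), epsilon)
    answers = []
    for predicate in predicates:
        true_count = sum(1 for row in database if predicate(row))
        answers.append(true_count + laplace_noise(scale, rng))
    return answers
```

Under these assumptions the per-query noise scale grows linearly with the number of queries, which is the dependence a mechanism exploiting correlations among queries aims to avoid.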
Privacy-preserving logistic regression
Cited by 57 (2 self)
This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression, and perturbing the learned classifier with noise proportional to the sensitivity. We show that for certain data distributions, this algorithm has poor learning generalization, compared with standard regularized logistic regression. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [7], and we provide learning guarantees. We show that our algorithm performs almost as well as standard regularized logistic regression, in terms of generalization error. Experiments demonstrate improved learning performance of our method, versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.
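The sensitivity-based approach described first in this abstract (perturbing an already trained classifier) might look roughly like the sketch below. It relies on assumptions that the abstract does not state: feature vectors of norm at most 1, an L2 sensitivity bound of 2/(nλ) for the regularized classifier, and noise with density proportional to exp(-(nελ/2)·‖η‖). Training is omitted, and all names are hypothetical.

```python
import math
import random


def sample_noise_vector(dim, n, epsilon, lam, rng):
    """Draw eta with density proportional to exp(-(n * epsilon * lam / 2) * ||eta||)."""
    # Assumed sensitivity 2 / (n * lam) gives a radial scale of 2 / (n * epsilon * lam).
    scale = 2.0 / (n * epsilon * lam)
    # In dim dimensions the norm of eta follows a Gamma(dim, scale) distribution.
    norm = rng.gammavariate(dim, scale)
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(x * x for x in direction))
    return [norm * x / length for x in direction]


def perturb_classifier(weights, n, epsilon, lam, rng):
    """Add the sampled noise vector to a trained weight vector."""
    noise = sample_noise_vector(len(weights), n, epsilon, lam, rng)
    return [w + b for w, b in zip(weights, noise)]
```

Because the noise scale shrinks as 1/(nελ), the perturbation is small for large training sets and grows as the regularization parameter λ or the privacy level ε gets smaller.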
Differential privacy for statistics: What we know and what we want to learn
2009
Cited by 43 (1 self)
We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC-sponsored workshop on data privacy in May 2008.
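For reference, the standard definition of ε-differential privacy can be written as follows. The symbols M (a randomized mechanism), D and D' (databases differing in a single row), and S (a set of possible outputs) are notation chosen here and not taken from the survey.

```latex
% M is \varepsilon-differentially private if, for all databases D, D'
% differing in one row and for every set S of outputs:
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S].
\]
```

Smaller values of ε force the two output distributions closer together, so the presence or absence of any single row has less influence on what is released.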
Data mining with differential privacy
In KDD 2010, 2010
Cited by 42 (0 self)
We consider the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework. Differential privacy requires that computations be insensitive to changes in any particular individual’s record, thereby restricting data leaks through the results. The privacy-preserving interface ensures unconditionally safe access to the data and does not require any privacy expertise from the data miner. However, as we show in the paper, a naive use of the interface to construct privacy-preserving data mining algorithms could lead to inferior data mining results. We address this problem by considering the privacy and the algorithmic requirements simultaneously, focusing on decision tree induction as a sample application. The privacy mechanism has a profound effect on the performance of the methods chosen by the data miner. We demonstrate that this choice could make the difference between an accurate classifier and a completely useless one. Moreover, an improved algorithm can achieve the same level of accuracy and privacy as the naive implementation but with an order of magnitude fewer learning samples.