A learning theory approach to non-interactive database privacy
In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008
Cited by 220 (25 self)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy-preserving polynomial-time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
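To make the net-based idea concrete, here is a toy sketch in the spirit of the mechanism described above, not the authors' own construction: it runs the exponential mechanism over every multiset of a fixed small size m drawn from a tiny discrete domain and favors candidates whose worst-case error on a set of normalized counting queries is small. The function names, the choice of m, and the threshold queries used below are hypothetical. Enumerating all candidates takes time exponential in m, which matches the abstract's premise of ignoring computational constraints.

```python
import math
import random
from itertools import combinations_with_replacement
from typing import Callable, Hashable, Optional, Sequence, Tuple

Query = Callable[[Hashable], bool]


def fraction(db: Sequence[Hashable], q: Query) -> float:
    """Normalized counting query: fraction of rows satisfying predicate q."""
    return sum(1 for x in db if q(x)) / len(db)


def small_db_mechanism(
    db: Sequence[Hashable],
    domain: Sequence[Hashable],
    queries: Sequence[Query],
    m: int,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Tuple[Hashable, ...]:
    """Hypothetical sketch: sample a size-m synthetic database with the exponential mechanism."""
    rng = rng or random.Random()
    n = len(db)
    # Every multiset of size m over the domain is a candidate synthetic database.
    candidates = list(combinations_with_replacement(domain, m))
    # Worst-case error of each candidate over the whole query class.
    errors = [max(abs(fraction(db, q) - fraction(c, q)) for q in queries) for c in candidates]
    best = min(errors)
    # The score -error has sensitivity 1/n, so the weight is exp(-epsilon * n * error / 2).
    # Subtracting the best error rescales all weights equally and avoids underflow.
    weights = [math.exp(-epsilon * n * (e - best) / 2.0) for e in errors]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

For example, with `db = [0] * 60 + [3] * 40`, threshold queries `x <= t` for `t in range(4)`, `m = 5` and `epsilon = 5.0`, the only zero-error candidate is `(0, 0, 0, 3, 3)` and every other candidate has error at least 0.2, so that candidate is returned with overwhelming probability.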
A multiplicative weights mechanism for privacy-preserving data analysis
In FOCS, 2010
Cited by 78 (7 self)
We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime's dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database; only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution (a distribution that does not place too much weight on any single data item), accuracy remains as above, and the running time becomes polylogarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.
Interactive Privacy via the Median Mechanism
In The 42nd ACM Symposium on the Theory of Computing, 2010
Cited by 72 (15 self)
We define a new interactive differentially private mechanism, the median mechanism, for answering arbitrary predicate queries that arrive online. Given fixed accuracy and privacy constraints, this mechanism can answer exponentially more queries than the previously best known interactive privacy mechanism (the Laplace mechanism, which independently perturbs each query result). With respect to the number of queries, our guarantee is close to the best possible, even for non-interactive privacy mechanisms. Conceptually, the median mechanism is the first privacy mechanism capable of identifying and exploiting correlations among queries in an interactive setting. We also give an efficient implementation of the median mechanism, with running time polynomial in the number of queries, the database size, and the domain size. This efficient implementation guarantees privacy for all input databases, and accurate query results for almost all input distributions. The dependence of the privacy on the number of queries in this mechanism improves over that of the best previously known efficient mechanism by a super-polynomial factor, even in the non-interactive setting.
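The Laplace baseline that the abstract compares against can be sketched as follows. This is a minimal version assuming unit-sensitivity counting queries and basic sequential composition, so each of the k answers receives Laplace noise of scale k/ε; the function names are hypothetical, and the median mechanism itself is not shown.

```python
import random
from typing import Callable, List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_answers(
    db: Sequence[object],
    predicates: Sequence[Callable[[object], bool]],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Answer k counting queries, perturbing each one independently.

    Each count changes by at most 1 when one row changes, so Laplace noise of
    scale k / epsilon per query gives epsilon-differential privacy overall
    under basic composition.
    """
    rng = rng or random.Random()
    scale = len(predicates) / epsilon
    return [sum(1 for row in db if p(row)) + laplace_noise(scale, rng) for p in predicates]
```

Under these assumptions the per-query noise grows linearly in the number of queries k, which is the dependence the median mechanism is designed to improve.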
A Simple and Practical Algorithm for Differentially Private Data Release
Cited by 62 (2 self)
We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
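The select, measure, update loop that combines the Exponential Mechanism with Multiplicative Weights can be sketched as below. This is a plain, unoptimized reading of the algorithm, assuming the data are given as a histogram of counts over a small finite domain and each linear query is a 0/1 vector over that domain; the helper names are hypothetical. Each round spends ε/(2T) on query selection and ε/(2T) on a Laplace measurement, and the output is the average of the T intermediate histograms.

```python
import math
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def answer(hist: Sequence[float], query: Sequence[int]) -> float:
    """Evaluate a linear query, given as a 0/1 vector over the domain, on a histogram."""
    return sum(h * q for h, q in zip(hist, query))


def mwem(
    hist: Sequence[int],
    queries: Sequence[Sequence[int]],
    epsilon: float,
    T: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a synthetic histogram built from T rounds of select / measure / update."""
    rng = rng or random.Random()
    n = sum(hist)                 # number of records, assumed > 0
    N = len(hist)
    A = [n / N] * N               # start from the uniform histogram
    avg = [0.0] * N
    eps_t = epsilon / (2 * T)
    for _ in range(T):
        # 1. Exponential mechanism: favor queries the current histogram answers badly.
        scores = [abs(answer(A, q) - answer(hist, q)) for q in queries]
        top = max(scores)
        weights = [math.exp(eps_t * (s - top) / 2.0) for s in scores]
        q = rng.choices(queries, weights=weights, k=1)[0]
        # 2. Laplace measurement of the chosen query.
        measured = answer(hist, q) + laplace_noise(2.0 * T / epsilon, rng)
        # 3. Multiplicative weights update toward the measurement, renormalized to mass n.
        err = measured - answer(A, q)
        A = [a * math.exp(qi * err / (2.0 * n)) for a, qi in zip(A, q)]
        total = sum(A)
        A = [a * n / total for a in A]
        avg = [s + a / T for s, a in zip(avg, A)]
    return avg
```

Every intermediate histogram is rescaled to total mass n, so the averaged output is itself a synthetic histogram with n records that can answer any linear query, not only the T selected ones.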
Iterative constructions and private data release
In Proc. of the 9th Theory of Cryptography Conference (TCC), 2012
Cited by 41 (16 self)
In this paper we study the problem of approximately releasing the cut function of a graph while preserving differential privacy, and give new algorithms (and new analyses of existing algorithms) in both the interactive and non-interactive settings. Our algorithms in the interactive setting are achieved by revisiting the problem of releasing differentially private, approximate answers to a large number of queries on a database. We show that several algorithms for this problem fall into the same basic framework, and are based on the existence of objects which we call iterative database construction (IDC) algorithms. We give a new generic framework in which new (efficient) IDC algorithms give rise to new (efficient) interactive private query release mechanisms. Our modular analysis simplifies and tightens the analysis of previous algorithms, leading to improved bounds. We then give a new IDC algorithm (and therefore a new private, interactive query release mechanism) based on the Frieze/Kannan low-rank matrix decomposition. This new release mechanism gives an improvement on prior work in a range of parameters where the size of the database is comparable to the size of the data universe (such as releasing all cut queries on dense graphs). We also give a non-interactive algorithm for efficiently releasing private synthetic data for graph cuts with error $O(|V|^{1.5})$. Our algorithm is based on randomized response and a non-private implementation of the SDP-based, constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we give a reduction based on the IDC framework showing that an efficient, private algorithm for computing sufficiently accurate rank-1 matrix approximations would lead to an improved efficient algorithm for releasing private synthetic data for graph cuts. We leave finding such an algorithm as our main open problem.
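The randomized response step underlying the non-interactive cut algorithm can be sketched as follows, assuming edge-level privacy (neighboring graphs differ in one edge). Each vertex pair's edge indicator is flipped with probability 1/(1 + e^ε), which makes that indicator ε-differentially private, and cut sizes are then estimated with an unbiased correction. The names are hypothetical, and the Alon and Naor SDP-based step that the abstract uses to turn the noisy graph into synthetic data is deliberately omitted.

```python
import math
import random
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[int, int]


def randomized_response_graph(
    n_vertices: int,
    edges: Iterable[Edge],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Tuple[Dict[Edge, int], float]:
    """Flip each of the n(n-1)/2 edge indicators with probability 1 / (1 + e^epsilon)."""
    rng = rng or random.Random()
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    present: Set[Edge] = {(min(u, v), max(u, v)) for u, v in edges}
    noisy: Dict[Edge, int] = {}
    for pair in combinations(range(n_vertices), 2):  # pairs (u, v) with u < v
        bit = 1 if pair in present else 0
        noisy[pair] = 1 - bit if rng.random() < p_flip else bit
    return noisy, p_flip


def estimated_cut(noisy: Dict[Edge, int], p_flip: float, side: Iterable[int]) -> float:
    """Unbiased estimate of the number of edges between S and its complement."""
    S = set(side)
    total = 0.0
    for (u, v), y in noisy.items():
        if (u in S) != (v in S):
            # E[y] = p + (1 - 2p) * x, so (y - p) / (1 - 2p) is unbiased for the true bit x.
            total += (y - p_flip) / (1.0 - 2.0 * p_flip)
    return total
```

This sketch answers one cut at a time from the noisy pairs and only shows where the privacy comes from; the guarantee in the abstract concerns releasing synthetic data that is accurate for all cuts simultaneously.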
The Price of Privately Releasing Contingency Tables and the Spectra of Random Matrices with Correlated Rows
Cited by 39 (5 self)
Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we consider lower bounds on how much distortion (noise) is necessary in these tables to provide privacy guarantees when the data being summarized is sensitive. We extend a line of recent work on lower bounds on noise for private data analysis [9, 14, 15, 16] to a natural and important class of functionalities. Our investigation also leads to new results on the spectra of random matrices with correlated rows. Consider a database $D$ consisting of $n$ rows (one per individual), each row comprising $d$ binary attributes. For any subset $T$ of attributes of size $|T| = k$, the marginal table for $T$ has $2^k$ entries; each entry counts how many times in the database a particular setting of these attributes occurs. We provide lower bounds for releasing $k$-attribute marginal tables under (i) minimal privacy, a general privacy notion which captures a large class of privacy definitions, and (ii) differential privacy, a rigorous notion of privacy that has received extensive recent study. Our main contributions are:
• We give efficient polynomial-time attacks which allow an adversary to reconstruct sensitive information given insufficiently perturbed marginal table releases. Using these reconstruction attacks, ...
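The definition of a k-attribute marginal table translates directly into code. The sketch below, with hypothetical names and rows given as tuples of 0/1 values, computes the 2^k exact counts for a chosen attribute subset T. It is the object whose noisy release the paper studies; neither the reconstruction attacks nor any noise addition is shown.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple


def marginal_table(db: Sequence[Sequence[int]], attrs: Sequence[int]) -> Dict[Tuple[int, ...], int]:
    """For each of the 2^k settings of the chosen binary attributes, count matching rows."""
    counts = Counter(tuple(row[a] for a in attrs) for row in db)
    # Include settings that never occur, so the table always has exactly 2^k entries.
    return {setting: counts[setting] for setting in product((0, 1), repeat=len(attrs))}
```

For rows `(0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)` and attributes `(0, 1)`, the table is `{(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): 1}`.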
A Statistical Framework for Differential Privacy
Cited by 39 (4 self)
One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database $X$ and outputs a random database $Z$ according to a distribution $Q_n(\cdot \mid X)$. Differential privacy is a particular privacy requirement developed by computer scientists in which $Q_n(\cdot \mid X)$ is required to be insensitive to changes in one data point in $X$. This makes it difficult to infer from $Z$ whether a given individual is in the original database $X$. We consider differential privacy from a statistical perspective. We consider several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
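For reference, the generic exponential mechanism of McSherry and Talwar samples an output r with probability proportional to exp(ε·u(X, r)/(2Δu)), where Δu is the sensitivity of the utility u. The sketch below assumes a finite candidate set and a toy utility (the number of records equal to r, sensitivity 1); in the statistical setting of the abstract the candidates would instead be databases or distributions scored by closeness to the empirical distribution, which is not shown. Names are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

R = TypeVar("R")


def exponential_mechanism(
    data: Sequence[object],
    candidates: Sequence[R],
    utility: Callable[[Sequence[object], R], float],
    sensitivity: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> R:
    """Sample r with probability proportional to exp(epsilon * u(data, r) / (2 * sensitivity))."""
    rng = rng or random.Random()
    scores = [utility(data, r) for r in candidates]
    best = max(scores)
    # Shifting by the best score rescales all weights equally and avoids overflow.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def count_utility(data: Sequence[object], r: object) -> float:
    """Toy utility: how many records equal r (changes by at most 1 per record)."""
    return float(sum(1 for x in data if x == r))
```

With 50 copies of `"a"`, 5 of `"b"` and 1 of `"c"` and ε = 10, the weight of `"b"` relative to `"a"` is exp(-225), so `"a"` is returned essentially always; a smaller ε flattens the distribution.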
Learning in a large function space: Privacy-preserving mechanisms for SVM learning
CoRR, abs/0911.5708. Submitted, 2009
Cited by 38 (4 self)
The ubiquitous need for analyzing privacy-sensitive information (including health records, personal communications, product ratings, and social network data) is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner product uniformly approximates the desired feature space inner product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility, meaning that the private classifier is pointwise close to the non-private classifier with high probability, is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally, we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
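The abstract does not name the large-scale learning technique it borrows; the best-known method of that kind is random Fourier features (Rahimi and Recht), which we assume here purely for illustration. The sketch draws D random frequencies so that the inner product of the finite-dimensional features approximates the Gaussian kernel exp(-||x - y||^2 / (2σ^2)). Names are hypothetical, and the private part of the mechanism (releasing a perturbed classifier, with privacy argued via algorithmic stability) is not shown.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def make_random_fourier_features(
    dim: int, n_features: int, sigma: float, rng: Optional[random.Random] = None
) -> Callable[[Sequence[float]], List[float]]:
    """Map x to sqrt(2/D) * cos(w_j . x + b_j), j = 1..D, approximating a Gaussian kernel."""
    rng = rng or random.Random()
    # For k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), frequencies are drawn from N(0, sigma^-2 I).
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in zip(ws, bs)]

    return features


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact Gaussian (RBF) kernel, for comparison with the feature inner product."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

The approximation error shrinks roughly like 1/sqrt(D), so D trades the accuracy of the kernel approximation against the dimension of the weight vector that would have to be released privately.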
Publishing Set-Valued Data via Differential Privacy
Cited by 31 (13 self)
Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example k-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are not adequate in terms of both utility and scalability in the context of set-valued data due to its high dimensionality. We demonstrate that set-valued data could be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is $(\epsilon, \delta)$-useful for the class of counting queries, the foundation of many data mining tasks. Through extensive experiments on real-life set-valued datasets, we show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets.