Results 1–10 of 62
A learning theory approach to non-interactive database privacy
 In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), 2008
Abstract

Cited by 220 (25 self)
In this paper we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.
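For context, a counting query answered with the baseline Laplace mechanism can be sketched in a few lines. This is a generic illustration only, not the paper's net-based synthetic-data mechanism; the database, predicate, and parameters are hypothetical:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_count(records, predicate, epsilon, rng):
    """Answer 'what fraction of records satisfy predicate?' with eps-DP.
    Changing one record shifts the fraction by at most 1/n, so Laplace
    noise with scale 1/(n * epsilon) suffices."""
    n = len(records)
    true_frac = sum(predicate(r) for r in records) / n
    return true_frac + laplace_noise(1.0 / (n * epsilon), rng)

rng = random.Random(0)
db = [{"age": a} for a in range(18, 80)]  # 62 hypothetical records
ans = private_count(db, lambda r: r["age"] >= 65, epsilon=0.5, rng=rng)
```

Note the error here scales with the number of queries answered; the paper's point is that far better dependence is possible for large query classes.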
Iterative constructions and private data release
 In Proceedings of the 9th Theory of Cryptography Conference (TCC), 2012
Abstract

Cited by 41 (16 self)
In this paper we study the problem of approximately releasing the cut function of a graph while preserving differential privacy, and give new algorithms (and new analyses of existing algorithms) in both the interactive and non-interactive settings. Our algorithms in the interactive setting are achieved by revisiting the problem of releasing differentially private, approximate answers to a large number of queries on a database. We show that several algorithms for this problem fall into the same basic framework, and are based on the existence of objects which we call iterative database construction (IDC) algorithms. We give a new generic framework in which new (efficient) IDC algorithms give rise to new (efficient) interactive private query release mechanisms. Our modular analysis simplifies and tightens the analysis of previous algorithms, leading to improved bounds. We then give a new IDC algorithm (and therefore a new private, interactive query release mechanism) based on the Frieze/Kannan low-rank matrix decomposition. This new release mechanism gives an improvement on prior work in a range of parameters where the size of the database is comparable to the size of the data universe (such as releasing all cut queries on dense graphs). We also give a non-interactive algorithm for efficiently releasing private synthetic data for graph cuts with error O(|V|^1.5). Our algorithm is based on randomized response and a non-private implementation of the SDP-based, constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we give a reduction based on the IDC framework showing that an efficient, private algorithm for computing sufficiently accurate rank-1 matrix approximations would lead to an improved efficient algorithm for releasing private synthetic data for graph cuts. We leave finding such an algorithm as our main open problem.
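Randomized response on edge indicators, which the abstract names as one ingredient of the non-interactive algorithm, can be sketched as follows. This is a minimal illustration assuming edge-level privacy; the Alon–Naor cut-norm post-processing step is not implemented here:

```python
import math
import random

def randomized_response_bit(bit, epsilon, rng):
    """Keep the bit with probability e^eps/(1+e^eps), flip otherwise.
    This is eps-DP for a single edge indicator."""
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit

def release_edges(adj, epsilon, rng):
    """Apply randomized response independently to every potential edge
    of an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adj)
    noisy = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            b = randomized_response_bit(adj[i][j], epsilon, rng)
            noisy[i][j] = noisy[j][i] = b
    return noisy

def unbias(noisy_bit, epsilon):
    """Unbiased estimate of the original bit from its noisy release:
    E[noisy] = p*b + (1-p)*(1-b)  =>  b_hat = (noisy - (1-p)) / (2p - 1)."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    return (noisy_bit - (1 - p)) / (2 * p - 1)
```

Summing unbiased edge estimates across a cut gives an estimate of the cut value; the per-edge variance is what drives the O(|V|^1.5) error stated in the abstract.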
The Johnson-Lindenstrauss Transform itself preserves differential privacy
 In IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), 2012
Answering n^{2+o(1)} counting queries with differential privacy is hard
 CoRR
, 2012
Abstract

Cited by 18 (5 self)
A central problem in differentially private data analysis is how to design efficient algorithms capable of answering large numbers of counting queries on a sensitive database. Counting queries are of the form "What fraction of individual records in the database satisfy the property q?" We prove that if one-way functions exist, then there is no algorithm that takes as input a database D ∈ ({0,1}^d)^n and k = Θ(n^2) arbitrary efficiently computable counting queries, runs in time poly(d, n), and returns an approximate answer to each query, while satisfying differential privacy. We also consider the complexity of answering "simple" counting queries, and make some progress in this direction by showing that the above result holds even when we require that the queries are computable by constant-depth (AC^0) circuits. Our result is almost tight because it is known that Ω(n^2) counting queries can be answered efficiently while satisfying differential privacy. Moreover, many more than n^2 queries (even exponential in n) can be answered in exponential time. We prove our results by extending the connection between differentially private query release and cryptographic traitor-tracing schemes to the setting where the queries are given to the sanitizer as input, and by constructing a traitor-tracing scheme that is secure in this setting.
The Geometry of Differential Privacy: The Sparse and Approximate Cases
, 2012
Abstract

Cited by 16 (5 self)
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work [BLR08, RR10, DRV10, HT10, HR10, LHR+10, BDKT12]. For a given set of d linear queries over a database x ∈ R^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [HT10, BDKT12] give an O(log^2 d) approximation to the optimal mechanism. Our first contribution is to give an O(log^2 d) approximation guarantee for the case of (ε, δ)-differential privacy. Our mechanism is simple, efficient and adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [MN12], using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when d > n = ‖x‖_1. The lower bounds used in the previous approximation algorithm no longer apply, and in fact better mechanisms are known in this setting [BLR08, RR10, HR10, GHRU11, GRU12]. Our second main contribution is to give an (ε, δ)-differentially private mechanism that for a given query set A and an upper bound n on ‖x‖_1, has mean squared error within polylog(d, N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the ℓ_1 ball. Additionally, we show a similar polylogarithmic approximation guarantee for the best ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. A with entries in {0, 1}, there is an ε-differentially private mechanism with expected error Õ(√n) per query, improving on the Õ(n^{2/3}) bound of [BLR08], and matching the lower bound implied by [DN03] up to logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
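For readers unfamiliar with the Gaussian noise addition the abstract builds on, here is the plain, uncorrelated baseline. The paper's contribution is choosing *correlated* Gaussian noise; this sketch's σ calibration is the standard √(2 ln(1.25/δ))/ε bound, and the neighboring-database convention (histograms differing by 1 in one coordinate) is an assumption of the sketch:

```python
import math
import random

def gaussian_mechanism(x, queries, epsilon, delta, rng):
    """Answer linear queries q·x with (eps, delta)-DP by adding
    i.i.d. Gaussian noise. Assuming neighboring histograms differ by
    1 in one coordinate, the L2 sensitivity of the answer vector is
    the maximum column norm of the query matrix."""
    n_cols = len(x)
    sens = max(
        math.sqrt(sum(q[j] ** 2 for q in queries)) for j in range(n_cols)
    )
    # Classical calibration for (eps, delta)-DP.
    sigma = sens * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [
        sum(qj * xj for qj, xj in zip(q, x)) + rng.gauss(0, sigma)
        for q in queries
    ]

rng = random.Random(0)
answers = gaussian_mechanism([1.0, 2.0, 3.0], [[1, 1, 1], [1, 0, 0]],
                             epsilon=0.5, delta=1e-6, rng=rng)
```

Correlating the noise across queries, as the paper does, can shrink the error substantially when the queries overlap.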
Privacy-preserving GWAS data sharing
 In ICDMW
, 2011
Abstract

Cited by 16 (0 self)
Traditional statistical methods for the confidentiality protection of statistical databases do not scale well to deal with GWAS (genome-wide association studies) databases and external information on them. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially private approach to penalized logistic regression.
Keywords: contingency tables; differential privacy; genome-wide association studies (GWAS); chi-square statistics; logistic regression; p-values; single nucleotide polymorphism (SNP).
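A differentially private minor allele frequency, the simplest of the releases described above, can be sketched with Laplace noise. The sensitivity bound below assumes each individual contributes two alleles; the exact calibration in the paper may differ:

```python
import math
import random

def laplace(scale, rng):
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_maf(genotypes, epsilon, rng):
    """genotypes: per-individual minor-allele counts in {0, 1, 2}.
    The MAF is (sum of counts) / (2n). Changing one individual's
    genotype moves the numerator by at most 2, so the frequency moves
    by at most 2/(2n) = 1/n: Laplace(1/(n*eps)) noise gives eps-DP."""
    n = len(genotypes)
    maf = sum(genotypes) / (2 * n)
    return maf + laplace(1.0 / (n * epsilon), rng)
```

The chi-square statistics and p-values mentioned in the abstract need their own sensitivity analyses, which are the paper's main technical content.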
Faster algorithms for privately releasing marginals
 In ICALP, 2012
Abstract

Cited by 14 (4 self)
We study the problem of releasing k-way marginals of a database D ∈ ({0,1}^d)^n, while preserving differential privacy. The answer to a k-way marginal query is the fraction of D's records x ∈ {0,1}^d with a given value in each of a given set of up to k columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et al., PODS '07). We give an algorithm that runs in time d^{O(√k)} and releases a private summary capable of answering any k-way marginal query with at most ±.01 error on every query, as long as the database is sufficiently large. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with non-trivial worst-case accuracy guarantees in time substantially smaller than the number of k-way marginal queries, which is d^{Θ(k)} (for k ≪ d).
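A k-way marginal query itself is easy to state in code; the sketch below answers a single marginal with baseline Laplace noise. This is not the paper's d^{O(√k)} summary-release algorithm, and the example database is made up:

```python
import math
import random

def laplace(scale, rng):
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def kway_marginal(db, spec, epsilon=None, rng=None):
    """db: list of binary records (tuples over {0,1}^d).
    spec: dict {column index: required bit}, with up to k entries.
    Returns the fraction of records matching all required bits, plus
    Laplace(1/(n*eps)) noise when epsilon is given (one record changes
    the fraction by at most 1/n)."""
    n = len(db)
    frac = sum(all(r[j] == b for j, b in spec.items()) for r in db) / n
    if epsilon is not None:
        frac += laplace(1.0 / (n * epsilon), rng)
    return frac

db = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
exact = kway_marginal(db, {0: 1, 2: 1})  # records with x0=1 and x2=1
```

Answering each of the ~d^k marginals this way wastes privacy budget; the paper's summary answers them all from one release.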
On significance of the least significant bits for differential privacy
 In Proceedings of the 2012 ACM conference on Computer and communications security, CCS ’12
, 2012
Abstract

Cited by 12 (0 self)
We describe a new type of vulnerability present in many implementations of differentially private mechanisms. In particular, all four publicly available general-purpose systems for differentially private computations are susceptible to our attack. The vulnerability is based on irregularities of floating-point implementations of the privacy-preserving Laplacian mechanism. Unlike its mathematical abstraction, the textbook sampling procedure results in a porous distribution over double-precision numbers that allows one to breach differential privacy with just a few queries into the mechanism. We propose a mitigating strategy and prove that it satisfies differential privacy under some mild assumptions on the available implementation of floating-point arithmetic.
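The textbook inverse-CDF sampler the attack targets looks like this; the comments note why its double-precision output set is porous. The paper's mitigation is not shown here:

```python
import math
import random

def textbook_laplace(scale, rng):
    """Textbook inverse-CDF Laplace sampler -- the construction whose
    double-precision behavior the paper shows is 'porous': only values
    of the form -scale*sign(u)*log(1-2|u|) for representable uniforms
    u are reachable, and the attainable set differs between nearby
    means, which is what leaks information."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

# Because rng.random() returns a multiple of 2^-53, the sampler's
# range is a fixed discrete subset of the doubles, not a continuum.
rng = random.Random(1)
outputs = {textbook_laplace(1.0, rng) for _ in range(10_000)}
```

The attack queries the mechanism and checks which gaps in this discrete set the answers fall into; the mitigation forces outputs onto a grid that is identical regardless of the true answer.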
Differentially private histogram publishing through lossy compression
 In ICDM
, 2012
Abstract

Cited by 11 (1 self)
Differential privacy has emerged as one of the most promising privacy models for private data release. It can be used to release different types of data, and, in particular, histograms, which provide useful summaries of a dataset. Several differentially private histogram releasing schemes have been proposed recently. However, most of them directly add noise to the histogram counts, resulting in undesirable accuracy. In this paper, we propose two sanitization techniques that exploit the inherent redundancy of real-life datasets in order to boost the accuracy of histograms. They lossily compress the data and sanitize the compressed data. Our first scheme is an optimization of the Fourier Perturbation Algorithm (FPA) presented in [13]. It improves the accuracy of the initial FPA by a factor of 10. The other scheme relies on clustering and exploits the redundancy between bins. Our extensive experimental evaluation over various real-life and synthetic datasets demonstrates that our techniques preserve very accurate distributions and considerably improve the accuracy of range queries over attributed histograms.
Keywords: differential privacy; histogram; lossy compression; Fourier transform; clustering.
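The direct noise-on-counts baseline that the abstract says most schemes use, and that the paper improves on, takes only a few lines. The bin edges and ε below are illustrative:

```python
import math
import random

def laplace(scale, rng):
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def noisy_histogram(values, bins, epsilon, rng):
    """Direct eps-DP histogram release: each record lands in exactly
    one bin, so adding one record changes one count by 1, and
    Laplace(1/eps) noise per bin suffices (parallel composition)."""
    counts = [0] * len(bins)
    for v in values:
        for i, (lo, hi) in enumerate(bins):
            if lo <= v < hi:
                counts[i] += 1
                break
    return [c + laplace(1.0 / epsilon, rng) for c in counts]
```

Because every bin gets independent noise, range queries summing many bins accumulate error; compressing the histogram first (the paper's approach) spends the budget on fewer, more informative coefficients.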
Accurate and Efficient Private Release of Datacubes and Contingency Tables
Abstract

Cited by 9 (3 self)
A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. However, once the strategy is fixed, its evaluation can be done efficiently, using standard linear algebraic methods. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to optimize the accuracy of a given strategy by answering some strategy queries more accurately than others, based on the target queries. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries (equivalently, subsets of the data cube), we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise.
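The strategy-query template described above can be illustrated with the classic prefix-sum strategy for range queries. This sketch splits noise uniformly across the strategy queries, which is exactly the kind of allocation the paper proposes to optimize:

```python
import math
import random

def laplace(scale, rng):
    """Sample from Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def noisy_prefix_sums(hist, epsilon, rng):
    """Strategy queries: all prefix sums of the histogram. One record
    changes every prefix sum containing its bin, up to len(hist) of
    them, so the L1 sensitivity of the whole vector is len(hist).
    Noise is split uniformly here; a smarter per-query allocation,
    as in the paper, can do strictly better."""
    scale = len(hist) / epsilon
    prefix, total = [], 0
    for c in hist:
        total += c
        prefix.append(total + laplace(scale, rng))
    return prefix

def range_query(prefix, lo, hi):
    """Reconstruct the count of bins [lo, hi] from two strategy answers."""
    left = prefix[lo - 1] if lo > 0 else 0.0
    return prefix[hi] - left
```

Every range query is answered from exactly two noisy strategy answers, so its error is independent of the range length; choosing the strategy and its noise allocation well is the subject of the paper.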