Results 1–10 of 47
Boosting the accuracy of differentially private histograms through consistency
Proc. VLDB Endow., 2010
Cited by 67 (5 self)
We show that it is possible to significantly improve the accuracy of a general class of histogram queries while satisfying differential privacy. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post-processing phase, we compute the consistent input most likely to have produced the noisy output. The final output is differentially private and consistent, but in addition, it is often much more accurate. We show, both theoretically and experimentally, that these techniques can be used for estimating the degree sequence of a graph very precisely, and for computing a histogram that can support arbitrary range queries accurately.
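The consistency idea above can be sketched in a few lines: release noisy counts for a parent range and its two child bins, then project the noisy answers onto the constraint that the parent equals the sum of its children. This is a minimal sketch assuming a single two-child hierarchy; the paper handles general query sets and shows the projection is the constrained least-squares (maximum-likelihood) estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0

# True counts: a parent bin and its two children (parent = sum of children).
children = np.array([40.0, 60.0])
parent = children.sum()

# Independent Laplace noise on each released count.
noisy_parent = parent + rng.laplace(scale=1.0 / eps)
noisy_children = children + rng.laplace(scale=1.0 / eps, size=2)

# Post-processing: least-squares projection onto the constraint
# parent = sum(children). With residual r, shift each value by r/3.
r = noisy_parent - noisy_children.sum()
consistent_parent = noisy_parent - r / 3.0
consistent_children = noisy_children + r / 3.0
```

Because the true counts satisfy the constraint and the projection is orthogonal, the consistent estimate is never further from the truth (in L2) than the raw noisy answers.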
Distance makes the types grow stronger: A calculus for differential privacy
In ICFP, 2010
Cited by 54 (4 self)
We want assurances that sensitive information will not be disclosed when aggregate data derived from a database is published. Differential privacy offers a strong statistical guarantee that the effect of the presence of any individual in a database will be negligible, even when an adversary has auxiliary knowledge. Much of the prior work in this area consists of proving algorithms to be differentially private one at a time; we propose to streamline this process with a functional language whose type system automatically guarantees differential privacy, allowing the programmer to write complex privacy-safe query programs in a flexible and compositional way. The key novelty is the way our type system captures function sensitivity, a measure of how much a function can magnify the distance between similar inputs: well-typed programs not only can’t go wrong, they can’t go too far on nearby inputs. Moreover, by introducing a monad for random computations, we can show that the established definition of differential privacy falls out naturally as a special case of this soundness principle. We develop examples including known differentially private algorithms, privacy-aware variants of standard functional programming idioms, and compositionality principles for differential privacy.
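The sensitivity notion that the type system tracks statically can be illustrated dynamically: calibrate Laplace noise to how much the query can change when one record is added or removed. A minimal sketch (the function and variable names are mine, not from the Fuzz language itself):

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(value, sensitivity, eps, rng):
    """eps-DP release of a query whose output changes by at most
    `sensitivity` when one record is added or removed."""
    return value + rng.laplace(scale=sensitivity / eps)

db = np.array([3.0, 7.0, 1.0, 9.0, 4.0])

# A counting query has sensitivity 1, so Laplace(1/eps) noise suffices.
noisy_count = laplace_mechanism(float(len(db)), sensitivity=1.0, eps=0.5, rng=rng)

# A sum over values clipped to [0, 10] has sensitivity 10 and needs more noise.
noisy_sum = laplace_mechanism(float(np.clip(db, 0, 10).sum()),
                              sensitivity=10.0, eps=0.5, rng=rng)
```

The language's contribution is that this bookkeeping (which query has which sensitivity, and whether the noise is scaled correctly) is checked by the type system rather than by hand.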
Differentially private histogram publication
In ICDE, 2012
Cited by 26 (3 self)
Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data, with strong privacy guarantees against adversaries with arbitrary background knowledge. Existing studies on DP mostly focus on simple aggregations such as counts. This paper investigates the publication of DP-compliant histograms, an important analytical tool for showing the distribution of a random variable, e.g., hospital bill size for certain patients. Compared to simple aggregations whose results are purely numerical, a histogram query is inherently more complex, since it must also determine its structure, i.e., the ranges of the bins. As we demonstrate in the paper, a DP-compliant histogram with finer bins may actually lead to significantly lower accuracy than a coarser one, since the former requires stronger perturbations in order to satisfy DP. Moreover, the histogram structure itself may reveal sensitive information, which further complicates the problem. Motivated by this, we propose two novel algorithms, namely NoiseFirst and StructureFirst, for computing DP-compliant histograms. Their main difference lies in the relative order of the noise injection and histogram structure computation steps. NoiseFirst has the additional benefit that it can improve the accuracy of an already published DP-compliant histogram computed using a naïve method. Going one step further, we extend both solutions to answer arbitrary range queries. Extensive experiments, using several real data sets, confirm that the proposed methods output highly accurate query answers, and consistently outperform existing competitors.
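The NoiseFirst ordering (noise injection before structure computation) can be sketched as pure post-processing: add per-bin Laplace noise, then merge bins and replace each merged range by its average, which averages the noise down. This simplified sketch uses a fixed merge; the actual algorithm chooses the merge boundaries by dynamic programming.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 1.0

# True unit-width histogram with two flat regions.
hist = np.array([10.0, 10.0, 10.0, 10.0, 50.0, 50.0, 50.0, 50.0])
noisy = hist + rng.laplace(scale=1.0 / eps, size=hist.size)

def merge_ranges(noisy, ranges):
    """Replace each bin in a merged range by the range's average count."""
    out = noisy.copy()
    for lo, hi in ranges:
        out[lo:hi] = out[lo:hi].mean()
    return out

# Assumed merge structure for illustration; NoiseFirst would find it itself.
smoothed = merge_ranges(noisy, [(0, 4), (4, 8)])
```

When the merged ranges really are flat in the true histogram, this averaging is an orthogonal projection onto a subspace containing the truth, so it cannot increase the L2 error.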
The Geometry of Differential Privacy: The Sparse and Approximate Cases
2012
Cited by 16 (5 self)
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work [BLR08, RR10, DRV10, HT10, HR10, LHR+10, BDKT12]. For a given set of d linear queries over a database x ∈ R^N, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [HT10, BDKT12] give an O(log² d) approximation to the optimal mechanism. Our first contribution is to give an O(log² d) approximation guarantee for the case of (ε, δ)-differential privacy. Our mechanism is simple, efficient and adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [MN12], using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when d > n, where n is an upper bound on ‖x‖1. The lower bounds used in the previous approximation algorithm no longer apply, and in fact better mechanisms are known in this setting [BLR08, RR10, HR10, GHRU11, GRU12]. Our second main contribution is to give an (ε, δ)-differentially private mechanism that, for a given query set A and an upper bound n on ‖x‖1, has mean squared error within polylog(d, N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the ℓ1 ball. Additionally, we show a similar polylogarithmic approximation guarantee for the best ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. A with entries in {0, 1}, there is an ε-differentially private mechanism with expected error Õ(√n) per query, improving on the Õ(n^(2/3)) bound of [BLR08], and matching the lower bound implied by [DN03] up to logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
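A minimal sketch of the Gaussian-noise baseline for linear queries, with i.i.d. noise calibrated to the L2 sensitivity of the query matrix (the paper's mechanism instead adds carefully *correlated* Gaussian noise chosen from the geometry of A; that refinement is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)

# d = 3 linear queries over a histogram x in R^N (N = 4), as rows of A.
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 1]], dtype=float)
x = np.array([5.0, 3.0, 8.0, 2.0])

# (eps, delta)-DP Gaussian mechanism: noise scaled to the L2 sensitivity
# of A. One individual changes one coordinate of x by 1, so the
# sensitivity is the largest column norm of A.
eps, delta = 1.0, 1e-5
l2_sens = np.linalg.norm(A, axis=0).max()
sigma = l2_sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

noisy_answers = A @ x + rng.normal(scale=sigma, size=A.shape[0])
```

The point of the paper is that shaping the noise covariance to A (rather than using i.i.d. noise as above) is what achieves the O(log² d) approximation to the optimal error.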
Privacy-Preserving Stream Aggregation with Fault Tolerance
Cited by 15 (0 self)
We consider applications where an untrusted aggregator would like to collect privacy-sensitive data from users and compute aggregate statistics periodically. For example, imagine a smart grid operator who wishes to aggregate the total power consumption of a neighborhood every ten minutes, or a market researcher who wishes to track the fraction of the population watching ESPN on an hourly basis. We design novel mechanisms that allow an aggregator to accurately estimate such statistics, while offering provable guarantees of user privacy against the untrusted aggregator. Our constructions are resilient to user failure and compromise, and can efficiently support dynamic joins and leaves. Our constructions also exemplify the clear advantage of combining applied cryptography and differential privacy techniques.
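The algebraic core of such aggregation schemes can be sketched without the cryptography: give users random shares that sum to zero, so the aggregator learns only the total. This toy sketch assumes a trusted one-time setup and omits the distributed DP noise; the real constructions derive the shares from pairwise keys and are what makes the scheme fault-tolerant.

```python
import numpy as np

rng = np.random.default_rng(3)
n_users = 5
values = np.array([2.0, 4.0, 1.0, 3.0, 5.0])  # private per-user readings

# Random shares forced to sum to zero (in the real scheme each user
# derives its share non-interactively from pairwise secrets).
shares = rng.normal(size=n_users)
shares -= shares.mean()

# Each user reports value + share; individual reports look random,
# but the shares cancel in the sum.
reports = values + shares
total = reports.sum()
```

Each report in isolation reveals essentially nothing about the underlying value; only the sum of all reports is meaningful, which is exactly the aggregator-oblivious property the paper formalizes.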
Compressive mechanism: Utilizing sparse representation in differential privacy.
In WPES, 2011
Cited by 12 (1 self)
Differential privacy provides the first theoretical foundation with a provable privacy guarantee against adversaries with arbitrary prior knowledge. The main idea used to achieve differential privacy is to inject random noise into statistical query results. Besides correctness, the most important goal in the design of a differentially private mechanism is to reduce the effect of the random noise, ensuring that the noisy results can still be useful. This paper proposes the compressive mechanism, a novel solution based on the state-of-the-art compression technique called compressive sensing. Compressive sensing is a well-suited theoretical tool for compact synopsis construction, using random projections. In this paper, we show that the amount of noise is significantly reduced from O(√n) to O(log n) when the noise insertion procedure is carried out on the synopsis samples instead of the original database. As an extension, we also apply the proposed compressive mechanism to solve the problem of continual release of statistical results. Extensive experiments using real datasets justify our accuracy claims.
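A rough sketch of the measurement step, showing why the injected noise mass drops: perturb m random-projection samples instead of all n database entries. The matrix and sizes below are illustrative assumptions, and the ℓ1-minimization reconstruction step of compressive sensing is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
eps = 1.0
n, m = 1024, 40          # database size vs. number of compressed samples

# A sparse signal: most entries are zero.
x = np.zeros(n)
x[rng.choice(n, size=5, replace=False)] = rng.uniform(10, 20, size=5)

# Compressive mechanism sketch: take m random projections of x and add
# Laplace noise to those m synopsis samples rather than to all n entries.
phi = rng.normal(size=(m, n)) / np.sqrt(m)
noisy_synopsis = phi @ x + rng.laplace(scale=1.0 / eps, size=m)

# Naive baseline for comparison: per-entry noise on the full database.
direct_noise = rng.laplace(scale=1.0 / eps, size=n)
```

With m on the order of log n measurements sufficing for sparse signals, the total noise injected scales with m instead of n, which is the source of the O(√n) → O(log n) improvement claimed above.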
Accurate and Efficient Private Release of Datacubes and Contingency Tables
Cited by 9 (3 self)
A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which includes basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of the output while giving a rigorous privacy guarantee. Most results follow a common template: pick a “strategy” set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies. However, once the strategy is fixed, its evaluation can be done efficiently using standard linear-algebraic methods. In this paper, we propose a new approach that balances accuracy and efficiency: we show how to optimize the accuracy of a given strategy by answering some strategy queries more accurately than others, based on the target queries. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries (equivalently, subsets of the data cube), we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set, at minimal extra cost in terms of time and noise.
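The common template the abstract describes, answer a “strategy” set A with noise and reconstruct target queries W from those answers, can be sketched with the simplest possible choices (identity strategy, uniform noise); the paper's contribution, allocating different noise scales to different strategy queries, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(11)
eps = 1.0
x = np.array([6.0, 2.0, 9.0, 3.0])

# Strategy queries A (here: one count per cell) and target queries W
# (here: range queries over the same cells).
A = np.eye(4)
W = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 1]], dtype=float)

# Answer the strategy privately, then reconstruct the targets by
# least squares: W x is estimated as W A^+ (A x + noise).
noisy_strategy = A @ x + rng.laplace(scale=1.0 / eps, size=A.shape[0])
reconstructed = W @ np.linalg.pinv(A) @ noisy_strategy
```

The reconstruction is unbiased (applying it to noiseless answers recovers W x exactly); what the paper optimizes is the variance, by spending more of the privacy budget on the strategy queries that matter most for W.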
Real-Time Aggregate Monitoring with Differential Privacy
Cited by 9 (5 self)
(Show Context)
Sharing real-time aggregate statistics of private data greatly benefits the public by enabling data mining that sheds light on important phenomena, such as influenza outbreaks and traffic congestion. However, releasing time-series data with a standard differential privacy mechanism has limited utility due to high correlation between data values. We propose FAST, an adaptive system to release real-time aggregate statistics under differential privacy with improved utility. To minimize overall privacy cost, FAST adaptively samples long time-series according to detected data dynamics. To improve the accuracy of data release per time stamp, filtering is used to predict data values at non-sampling points and to estimate true values from noisy observations at sampling points. Our experiments with three real data sets confirm that FAST improves the accuracy of time-series release and has excellent performance even under very small privacy cost.
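The sampling idea can be sketched with a fixed schedule: spend privacy budget only at sampled timestamps and carry the last release forward elsewhere. The schedule below is an assumption for illustration; FAST itself chooses sampling points adaptively and replaces the carry-forward with Kalman-filter prediction and correction.

```python
import numpy as np

rng = np.random.default_rng(9)
eps_total = 1.0

series = np.array([100.0, 102.0, 101.0, 150.0, 152.0, 151.0, 149.0, 150.0])
sample_points = [0, 3, 6]                 # fixed schedule for this sketch
eps_per = eps_total / len(sample_points)  # budget split across samples only

released = np.empty_like(series)
last = 0.0
for t in range(series.size):
    if t in sample_points:
        # Spend budget: noisy observation at a sampled timestamp.
        last = series[t] + rng.laplace(scale=1.0 / eps_per)
    # Non-sampled timestamps reuse the last release at zero privacy cost.
    released[t] = last
```

Because the series is highly correlated, the prediction at non-sampled points is often close to the truth, while each sampled point gets a larger per-query budget than uniform per-timestamp noising would allow.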
Differentially Private Filtering
Cited by 8 (2 self)
Emerging systems such as smart grids or intelligent transportation systems often require end-user applications to continuously send information to external data aggregators performing monitoring or control tasks. This can result in an undesirable loss of privacy for the users in exchange for the benefits provided by the application. Motivated by this trend, we introduce privacy concerns in a system-theoretic context, and address here the problem of releasing filtered signals that respect the privacy of the input data stream. We rely on a formal notion of privacy introduced in the database literature, called differential privacy, which provides strong privacy guarantees against adversaries with arbitrary side information, and extend this notion to dynamic systems. We then describe methods to approximate a given filter by a differentially private version, so that the distortion introduced by the privacy mechanism is minimized. Two specific scenarios are considered, where users either provide independent input signals or contribute events to a single integer-valued stream.
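One baseline in this space can be sketched directly: perturb the input stream for privacy, then pass it through a stable linear filter, trading a little signal distortion for substantial noise attenuation. The moving-average filter below is an illustrative assumption; the paper's methods instead design the approximating filter itself to minimize the distortion added by the privacy mechanism.

```python
import numpy as np

rng = np.random.default_rng(13)
eps = 1.0

# A smooth signal (two periods of a sine wave) standing in for a stream.
signal = 10.0 * np.sin(np.linspace(0, 4 * np.pi, 200))

# Input perturbation: Laplace noise on the stream gives differential
# privacy; any post-processing of the noisy stream stays private.
noisy_in = signal + rng.laplace(scale=1.0 / eps, size=signal.size)

# A stable length-9 moving-average filter attenuates the noise variance
# by roughly 9x while only mildly distorting the slow signal.
kernel = np.ones(9) / 9.0
filtered = np.convolve(noisy_in, kernel, mode="same")
```

For slowly varying signals the filter's distortion is small compared to the noise it removes, which is the accuracy/privacy trade-off the paper formalizes and optimizes in a systems-theory setting.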
Private Matchings and Allocations
In STOC'14, 2014
Cited by 8 (6 self)
We consider a private variant of the classical allocation problem: given k goods and n agents with individual, private valuation functions over bundles of goods, how can we partition the goods amongst the agents to maximize social welfare? An important special case is when each agent desires at most one good, and specifies her (private) value for each good: in this case, the problem is exactly the maximum-weight matching problem in a bipartite graph. Private matching and allocation problems have not been considered in the differential privacy literature, and for good reason: they are plainly impossible to solve under differential privacy. Informally, the allocation must match agents to their preferred goods in order to maximize social welfare, but this preference is exactly what agents wish to hide! Therefore, we consider the problem under the relaxed constraint of joint differential privacy: for any agent i, no coalition of agents excluding i should be able to learn about the valuation function of agent i. In this setting, the full allocation is no longer published; instead, each agent is told what good to get. We first show that with a small number of identical copies of each good, it is possible to efficiently and accurately solve the maximum-weight matching problem while guaranteeing joint differential privacy. We then consider the more general allocation problem, when bidder valuations satisfy the gross substitutes condition. Finally, we prove that the allocation problem cannot be solved to nontrivial accuracy under joint differential privacy without requiring multiple copies of each type of good.