Results 1  10
of
91
Smooth sensitivity and sampling in private data analysis
 In STOC
, 2007
"... We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data ..."
Abstract

Cited by 173 (16 self)
 Add to MetaCart
(Show Context)
We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instancebased additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x — a measure of variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals ’ privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instancebased noise in the context of data privacy. Our framework raises many interesting algorithmic questions. Namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to kSED (kmeans) clustering and learning mixtures of Gaussians.
What Can We Learn Privately?
 49TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE
, 2008
"... Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large reallife data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or sp ..."
Abstract

Cited by 99 (9 self)
 Add to MetaCart
(Show Context)
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large reallife data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in the contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
Airavat: Security and Privacy for MapReduce
, 2009
"... The cloud computing paradigm, which involves distributed computation on multiple largescale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized i ..."
Abstract

Cited by 82 (2 self)
 Add to MetaCart
(Show Context)
The cloud computing paradigm, which involves distributed computation on multiple largescale datasets, will become successful only if it ensures privacy, confidentiality, and integrity for the data belonging to individuals and organizations. We present Airavat, a novel integration of decentralized information flow control (DIFC) and differential privacy that provides strong security and privacy guarantees for MapReduce computations. Airavat allows users to use arbitrary mappers, prevents unauthorized leakage of sensitive data during the computation, and supports automatic declassification of the results when the latter do not violate individual privacy. Airavat minimizes the amount of trusted code in the system and allows users without security expertise to perform privacypreserving computations on sensitive data. Our prototype implementation demonstrates the flexibility of Airavat on a wide variety of case studies. The prototype is efficient, with runtimes on Amazon’s cloud computing infrastructure within 25 % of a MapReduce system with no security.
A multiplicative weights mechanism for privacypreserving data analysis
 In FOCS
, 2010
"... Abstract—We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacypreserving answers to queries as they arrive. Our primary contribution is a new differentiall ..."
Abstract

Cited by 78 (7 self)
 Add to MetaCart
Abstract—We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacypreserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worstcase accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime’s dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution — a distribution that does not place too much weight on any single data item — accuracy remains as above, and the running time becomes polylogarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions. I.
Differential privacy for statistics: What we know and what we want to learn
, 2009
"... We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008. ..."
Abstract

Cited by 44 (1 self)
 Add to MetaCart
(Show Context)
We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.
A Statistical Framework for Differential Privacy
"... One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·X). Differential p ..."
Abstract

Cited by 39 (4 self)
 Add to MetaCart
One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(·X). Differential privacy is a particular privacy requirement developed by computer scientists in which Qn(·X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective. We consider several datarelease mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the probability that the empirical distribution concentrates in a small ball around the true distribution.
Learning in a large function space: Privacypreserving mechanisms for SVM learning
 CoRR, abs/0911.5708. Submitted
, 2009
"... Abstract. The ubiquitous need for analyzing privacysensitive information— including health records, personal communications, product ratings, and social network data—is driving significant interest in privacypreserving data analysis across several research communities. This paper explores the rele ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
Abstract. The ubiquitous need for analyzing privacysensitive information— including health records, personal communications, product ratings, and social network data—is driving significant interest in privacypreserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a highdimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finitedimensional feature mappings and for (potentially infinitedimensional) mappings with translationinvariant kernels. In the latter case, our mechanism borrows a technique from largescale learning to learn in a finitedimensional feature space whose innerproduct uniformly approximates the desired feature space innerproduct (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility—when the private classifier is pointwise close to the nonprivate classifier with high probability—is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM. 1
The differential privacy frontier (extended abstract
 In TCC
, 2009
"... Abstract. We review the definition of differential privacy and briefly survey a handful of very recent contributions to the differential privacy frontier. 1 Background Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions, wh ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
(Show Context)
Abstract. We review the definition of differential privacy and briefly survey a handful of very recent contributions to the differential privacy frontier. 1 Background Differential privacy is a strong privacy guarantee for an individual’s input to a (randomized) function or sequence of functions, which we call a privacy mechanism. Informally, the guarantee says that the behavior of the mechanism is essentially unchanged independent of whether any individual opts into or opts out of the data set. Designed for statistical analysis, for example, of health or census data, the definition protects the privacy of individuals, and small groups of individuals, while permitting very different outcomes in the case of very different data sets. We begin by recalling some differential privacy basics. While the frontier of a vibrant area is always in flux, we will endeavor to give an impression of the state of the art by surveying a handful of extremely recent advances
Privacypreserving statistical estimation with optimal convergence rate
 In Proceedings on 43th Annual ACM Symposium on Theory of Computing
, 2011
"... Consider an analyst who wants to release aggregate statistics about a data set containing sensitive information. Using differentially private algorithms guarantees that the released statistics reveal very little about any particular record in the data set. In this paper we study the asymptotic prope ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
Consider an analyst who wants to release aggregate statistics about a data set containing sensitive information. Using differentially private algorithms guarantees that the released statistics reveal very little about any particular record in the data set. In this paper we study the asymptotic properties of differentially private algorithms for statistical inference. We show that for a large class of statistical estimators T and input distributions P, there is a differentially private estimator AT with the same asymptotic distribution as T. That is, the random variables AT (X) andT (X) converge in distribution when X consists of an i.i.d. sample from P of increasing size. This implies that AT (X) is essentially as good as the original statistic T (X) for statistical inference, for sufficiently large samples. Our technique applies to (almost) any pair T,P such that T is asymptotically normal on i.i.d. samples from P —in particular, to parametric maximum likelihood estimators and estimators for logistic and linear regression under standard regularity conditions. A consequence of our techniques is the existence of lowspace streaming algorithms whose output converges to the same asymptotic distribution as a given estimator T (for the same class of estimators and input distributions as above).
Distributed Private Data Analysis: On Simultaneously Solving How and What
, 2009
"... We examine the combination of two directions in the field of privacy concerning computations over distributed private inputs – secure function evaluation (SFE) and differential privacy. While in both the goal is to privately evaluate some function of the individual inputs, the privacy requirements a ..."
Abstract

Cited by 24 (2 self)
 Add to MetaCart
We examine the combination of two directions in the field of privacy concerning computations over distributed private inputs – secure function evaluation (SFE) and differential privacy. While in both the goal is to privately evaluate some function of the individual inputs, the privacy requirements are significantly different. The general feasibility results for SFE suggest a natural paradigm for implementing differentially private analyses distributively: First choose what to compute, i.e., a differentially private analysis; Then decide how to compute it, i.e., construct an SFE protocol for this analysis. We initiate an examination whether there are advantages to a paradigm where both decisions are made simultaneously. In particular, we investigate under which accuracy requirements it is beneficial to adapt this paradigm for computing a collection of functions including binary sum, gap threshold, and approximate median queries. Our results imply that when computing the binary sum of n distributed inputs then: • When we require that the error is o ( √ n) and the number of rounds is constant, there is no benefit in the new paradigm. • When we allow an error of O ( √ n), the new paradigm yields more efficient protocols when we consider protocols that compute symmetric functions. Our results also yield new separations between the local and global models of computations for private data analysis.