Results 1 - 5 of 5
Bounds on the Sample Complexity for Private Learning and Private Data Release
Abstract
Abstract. Learning is a task that generalizes many of the analyses that are applied to collections of data, and in particular, collections of sensitive individual information. Hence, it is natural to ask what can be learned while preserving individual privacy. [Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith; FOCS 2008] initiated such a discussion. They formalized the notion of private learning as a combination of PAC learning and differential privacy, and investigated which concept classes can be learned privately. Somewhat surprisingly, they showed that, ignoring time complexity, every PAC learning task could be performed privately with polynomially many samples, and in many natural cases this could even be done in polynomial time. While these results seem to equate non-private and private learning, there is still a significant gap: the sample complexity of (non-private) PAC learning is crisply characterized in terms of the VC-dimension of the concept class, whereas this relationship is lost in the constructions of private learners, which generally exhibit a higher sample complexity. Looking into this gap, we examine several private learning tasks and give tight bounds on their sample complexity. In particular, we show strong separations between the sample complexities of proper and improper private learners (no such separation exists for non-private learners), and between the sample complexities of efficient and inefficient proper private learners. Our results show that VC-dimension is not the right measure for characterizing the sample complexity of proper private learning. We also examine the task of private data release (as initiated by [Blum, Ligett, and Roth; STOC 2008]) and give new lower bounds on its sample complexity. Our results show that the logarithmic dependence on the size of the instance space is essential for private data release.
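The abstract above contrasts sample complexity with the VC-dimension of a concept class, i.e. the size of the largest point set the class can label in all possible ways. As a minimal illustration (the function names and the tiny threshold/point classes are ours, not from the paper), VC-dimension can be computed by brute force for small finite classes:

```python
from itertools import combinations

def shatters(concepts, points):
    """True if the concepts realize all 2^|points| labelings of the points."""
    labelings = {tuple(c(x) for x in points) for c in concepts}
    return len(labelings) == 2 ** len(points)

def vc_dimension(concepts, domain):
    """Largest k such that some k-subset of the domain is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(concepts, pts) for pts in combinations(domain, k)):
            d = k
        else:
            break  # shattering is monotone: no larger set can be shattered
    return d

domain = list(range(10))
# thresholds c_t(x) = 1 iff x >= t: monotone, so no pair (1, 0) is realizable
thresholds = [lambda x, t=t: int(x >= t) for t in range(11)]
# point functions c_p(x) = 1 iff x == p: the labeling (1, 1) is unrealizable
points = [lambda x, p=p: int(x == p) for p in range(10)]
print(vc_dimension(thresholds, domain))  # 1
print(vc_dimension(points, domain))      # 1
```

Both toy classes have VC-dimension 1, yet (as the papers below show) their private sample complexities can behave very differently, which is the sense in which VC-dimension fails to characterize proper private learning.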
Fingerprinting codes and the price of approximate differential privacy
In STOC, 2014
Abstract
We show new lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database D ∈ ({0,1}^d)^n has the form "What fraction of the individual records in the database satisfy the property q?" We show that in order to answer an arbitrary set Q of ≫ nd counting queries on D to within error ±α it is necessary that n ≥ Ω̃(√d · log|Q| / (α²ε)). This bound is optimal up to polylogarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS '10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and (ε, δ)-differential privacy is asymptotically larger than what is required merely for accuracy, which is O(log|Q| / α²). In addition, we show that our lower bound holds for the specific case of k-way marginal queries (where |Q| = 2^k · (d choose k)) when α is a constant. Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO '95; Tardos, STOC '03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.
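To make the object of study concrete, here is a small sketch of a counting query and a noisy answer to it. This is illustrative only and is not the paper's mechanism: the paper lower-bounds large *sets* of queries under (ε, δ)-privacy, while the sketch answers a single query with the standard Laplace mechanism for pure ε-privacy; the database, predicate, and function names are ours.

```python
import math
import random

def counting_query(db, q):
    """Exact fraction of records in db satisfying the predicate q."""
    return sum(1 for row in db if q(row)) / len(db)

def laplace_answer(db, q, eps):
    """Noisy eps-DP answer to one counting query.
    Changing a single record moves the fraction by at most 1/len(db),
    so Laplace noise with scale 1/(eps * len(db)) suffices."""
    u = random.random() - 0.5  # inverse-CDF sample of Laplace noise
    noise = -math.copysign(1.0, u) * math.log(1 - 2 * abs(u)) / (eps * len(db))
    return counting_query(db, q) + noise

# toy database D in ({0,1}^d)^n with d = 3, n = 4
db = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
q = lambda row: row[0] == 1   # property q: "first attribute is 1"
print(counting_query(db, q))  # 0.5
```

The point of the paper's bound is that answering many such queries at once forces the noise (equivalently, the required n) to grow, by roughly a √d factor beyond what accuracy alone demands.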
The large margin mechanism for differentially private maximization
In Advances in Neural Information Processing Systems 27, Curran Associates, Inc.
Dual Query: Practical Private Query Release for High Dimensional Data
Abstract
We present a practical, differentially private algorithm for answering a large number of queries on high-dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that it performs well in practice. For example, our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement over the state of the art by multiple orders of magnitude.
Sample Complexity Bounds on Differentially Private Learning via Communication Complexity
2014
Abstract
In this work we analyze the sample complexity of classification by differentially private algorithms. Differential privacy is a strong and well-studied notion of privacy introduced by Dwork et al. [2006] that ensures that the output of an algorithm leaks little information about the data point provided by any of the participating individuals. The sample complexity of private PAC and agnostic learning was studied in a number of prior works starting with [Kasiviswanathan et al., 2011], but a number of basic questions remain open [Beimel et al., 2010, Chaudhuri and Hsu, 2011, Beimel et al., 2013a,b]. Our main contribution is an equivalence between the sample complexity of differentially private learning of a concept class C (denoted SCDP(C)) and the randomized one-way communication complexity of the evaluation problem for concepts from C. Using this equivalence we prove the following bounds:
• SCDP(C) = Ω(LDim(C)), where LDim(C) is Littlestone's dimension, which characterizes the number of mistakes in the online mistake-bound learning model [Littlestone, 1987]. This result implies that SCDP(C) is different from the VC-dimension of C, resolving one of the main open questions from prior work.
• For any t, there exists a class C such that LDim(C) = 2 but SCDP(C) ≥ t.
• For any t, there exists a class C such that the sample complexity of (pure) α-differentially private PAC learning is Ω(t/α), but the sample complexity of the relaxed (α, β)-differentially private PAC learning is O(log(1/β)/α). This resolves an open problem from [Beimel et al., 2013b].
We also obtain simpler proofs for a number of known related results. Our equivalence builds on a characterization of sample complexity by Beimel et al. [2013a], and our bounds rely on a number of known results from communication complexity.
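For intuition about the LDim quantity in the first bullet: Littlestone's dimension is the depth of the deepest complete mistake tree, and for tiny finite classes it can be computed directly from its standard recursive definition. The implementation and the toy classes below are our own illustrative sketch, not material from the paper.

```python
from functools import lru_cache

def ldim(concepts):
    """Littlestone dimension of a finite class over a finite domain.
    Each concept is a tuple of {0,1} labels, one per domain point."""
    return _ldim(frozenset(concepts))

@lru_cache(maxsize=None)
def _ldim(cs):
    if len(cs) <= 1:
        return 0  # no mistakes can be forced on a (sub)class of size <= 1
    n = len(next(iter(cs)))
    best = 0
    for i in range(n):
        # split the class on the label it assigns to point i
        zeros = frozenset(c for c in cs if c[i] == 0)
        ones = cs - zeros
        if zeros and ones:
            # the adversary forces one mistake, then recurses on the
            # branch that is worse for it (hence min over the two labels)
            best = max(best, 1 + min(_ldim(zeros), _ldim(ones)))
    return best

# point functions over a 6-element domain: LDim = 1
points = [tuple(int(i == p) for i in range(6)) for p in range(6)]
# nested thresholds over a 4-element domain: LDim = 2 (binary search on t)
thresholds = [tuple(int(i >= t) for i in range(4)) for t in range(5)]
print(ldim(points), ldim(thresholds))  # 1 2
```

Note that both classes have VC-dimension 1 while their Littlestone dimensions differ, a small instance of the gap between the two measures that the paper's bounds revolve around.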