Private Empirical Risk Minimization Beyond the Worst Case: The Effect of the Constraint Set Geometry
, 2014
Abstract
Empirical Risk Minimization (ERM) is a standard technique in machine learning, where a model is selected by minimizing a loss function over a constraint set. When the training dataset consists of private information, it is natural to use a differentially private ERM algorithm, and this problem has been the subject of a long line of work [CM08, KST12, JKT12, ST13a, DJW13, JT14, BST14, Ull14]. A private ERM algorithm outputs an approximate minimizer of the loss function, and its error can be measured as the difference from the optimal value of the loss function. When the constraint set is arbitrary, the required error bounds are fairly well understood [BST14]. In this work, we show that the geometric properties of the constraint set can be used to derive significantly better results. Specifically, we show that a differentially private version of Mirror Descent leads to error bounds of the form Õ(G_C/n) for a Lipschitz loss function, improving on the Õ(√p/n) bounds in [BST14]. Here p is the dimensionality of the problem, n is the number of data points in the training set, and G_C denotes the Gaussian width of the constraint set C that we optimize over. We show similar improvements for strongly convex functions, and for smooth functions. In addition, we show that when the loss function is Lipschitz with respect to the ℓ1 norm and C is ℓ1-bounded, a differentially private version of the Frank-Wolfe algorithm gives error bounds of the form Õ(n^{-2/3}). This captures the important and common case of sparse linear regression (LASSO), where the data x_i satisfy ‖x_i‖_∞ ≤ 1 and we optimize over the ℓ1 ball. We show new lower bounds for this setting that, together with known bounds, imply that all our upper bounds are tight.
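The private Frank-Wolfe idea mentioned in this abstract can be sketched as follows. This is an illustrative reading only, not the paper's calibrated algorithm: the step size, the Laplace noise scale, and the names `private_frank_wolfe` and `grad_fn` are our assumptions.

```python
import numpy as np

def private_frank_wolfe(grad_fn, p, steps, eps, rng):
    """Illustrative differentially private Frank-Wolfe over the unit l1 ball.

    Each iterate moves toward a vertex (+/- a standard basis vector) chosen
    by report-noisy-max over the 2p vertex scores; the Laplace scale used
    here is a placeholder, not a calibrated privacy analysis.
    """
    w = np.zeros(p)
    for t in range(steps):
        g = grad_fn(w)
        # Score <v, -g> for each vertex +e_i (first p) and -e_i (last p).
        scores = np.concatenate([-g, g])
        scores = scores + rng.laplace(scale=steps / eps, size=2 * p)
        j = int(np.argmax(scores))
        v = np.zeros(p)
        v[j % p] = 1.0 if j < p else -1.0
        gamma = 2.0 / (t + 2)          # standard Frank-Wolfe step size
        w = (1 - gamma) * w + gamma * v
    return w
```

With the noise effectively switched off (very large eps), the iterates converge toward the ℓ1-constrained minimizer at the usual O(1/t) Frank-Wolfe rate; the point of the vertex-based update is that only the 2p vertex scores need to be privatized per step.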
Private Empirical Risk Minimization, Revisited
, 2014
Abstract
In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal nonprivate running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for (ε, 0)- and (ε, δ)-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contribution of each data point to the loss function is smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.
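The Lipschitz, bounded-domain setting above is commonly attacked with noisy gradient descent. A minimal sketch under assumed constants follows; the noise calibration (naive composition over T steps) and the names `noisy_projected_gd` and `grad_fn` are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def noisy_projected_gd(grad_fn, p, n, L, radius, eps, delta, T, eta, rng):
    """Sketch of (eps, delta)-DP ERM by noisy projected gradient descent.

    Gaussian noise is scaled to the average gradient's l2 sensitivity
    (2L/n for an L-Lipschitz per-example loss) with a naive composition
    over T steps; the constants are illustrative only.
    """
    sigma = (2 * L / n) * np.sqrt(2 * T * np.log(1.25 / delta)) / eps
    w = np.zeros(p)
    for _ in range(T):
        g = grad_fn(w) + rng.normal(scale=sigma, size=p)
        w = w - eta * g
        norm = np.linalg.norm(w)
        if norm > radius:              # project back onto the l2 ball
            w = w * (radius / norm)
    return w
```

On a strongly convex objective such as mean estimation, the iterates settle near the true minimizer with fluctuations governed by sigma, which shrinks as the dataset size n grows.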
Answering Query Workloads with Optimal Error under Blowfish Privacy
, 2014
Abstract
Recent work has proposed a privacy framework, called Blowfish, that generalizes differential privacy in order to generate principled relaxations. Blowfish privacy definitions take as input an additional parameter called a policy graph, which specifies which properties about individuals should be hidden from an adversary. An open question is whether Blowfish privacy definitions indeed permit mechanisms that incur significantly lower error for query answering compared to differentially private mechanisms. In this paper, we answer this question and explore error bounds of sets of linear counting queries under different Blowfish policy graphs. We begin by generalizing the matrix mechanism lower bound of Li and Miklau (called the SVD bound) for differential privacy to find an analogous lower bound for our privacy framework. We show that for many query workloads and instantiations of the framework, we can achieve a much lower error bound than differential privacy. Next, we develop tools that use the existing literature on optimal or near optimal strategies for answering workloads under differential privacy to develop near optimal strategies for answering workloads under our privacy framework. We provide applications of these by finding strategies for a few popular classes of queries. In particular, we find strategies to answer histogram queries and multidimensional range queries under different instantiations of our privacy framework. We believe the tools we develop will be useful for finding strategies to answer many other classes of queries with low error.
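The matrix (strategy) mechanism referenced above answers a strategy A with noise and recombines the noisy answers to cover the workload W. A minimal sketch, with the function name and the identity-strategy example being our assumptions; real instantiations choose A to minimize the resulting error:

```python
import numpy as np

def strategy_mechanism(W, A, x, eps, rng):
    """Sketch of the matrix (strategy) mechanism for linear counting queries.

    Laplace noise is scaled to the l1 sensitivity of the strategy A (its
    largest column l1 norm); workload answers are recovered from the noisy
    strategy answers via the pseudoinverse. Assumes W lies in A's row space.
    """
    sens = np.abs(A).sum(axis=0).max()
    noisy = A @ x + rng.laplace(scale=sens / eps, size=A.shape[0])
    return W @ np.linalg.pinv(A) @ noisy
```

For example, a prefix-sum workload can be answered through the identity strategy; better strategies (hierarchical, wavelet) trade per-query sensitivity against reconstruction error, which is exactly the optimization the SVD-style lower bounds constrain.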
Combinatorial discrepancy for boxes via the ellipsoid-infinity norm
Abstract
The ellipsoid-infinity norm of a real m × n matrix A, denoted by ‖A‖_{E∞}, is the minimum ℓ∞ norm of a 0-centered ellipsoid E ⊆ R^m that contains all column vectors of A. This quantity, introduced by the second author and Talwar in 2013, is polynomial-time computable and approximates the hereditary discrepancy herdisc A as follows: ‖A‖_{E∞}/O(log m) ≤ herdisc A ≤ ‖A‖_{E∞} · O(√(log m)). Here we show that both of the inequalities are asymptotically tight in the worst case, and we provide a simplified proof of the first inequality. We establish a number of favorable properties of ‖·‖_{E∞}, such as the triangle inequality and multiplicativity w.r.t. the Kronecker (or tensor) product. We then demonstrate on several examples the power of the ellipsoid-infinity norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of Ω(log^{d−1} n) for the d-dimensional Tusnády problem, asking for the combinatorial discrepancy of an n-point set in R^d with respect to axis-parallel boxes. For d > 2, this improves the previous best lower bound, which was of order approximately log^{(d−1)/2} n, and it comes close to the best known upper bound of O(log^{d+1/2} n), for which we also obtain a new, very simple proof.
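For intuition, the discrepancy quantities this abstract is about can be computed by brute force on tiny matrices (exponential time, purely illustrative; the function names are ours):

```python
import itertools
import numpy as np

def disc(A):
    """Combinatorial discrepancy: min over +/-1 colorings x of ||A x||_inf."""
    n = A.shape[1]
    return min(np.abs(A @ np.array(x)).max()
               for x in itertools.product((-1, 1), repeat=n))

def herdisc(A):
    """Hereditary discrepancy: max of disc over all column submatrices."""
    n = A.shape[1]
    return max(disc(A[:, list(S)])
               for r in range(1, n + 1)
               for S in itertools.combinations(range(n), r))
```

herdisc A maximizes disc over all restrictions to column subsets, which is the quantity the ellipsoid-infinity norm approximates efficiently; the brute force above is only feasible for a handful of columns.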
Nearly Optimal Private Convolution
Abstract
We study algorithms for computing the convolution of a private input x with a public input h, while satisfying the guarantees of (ε, δ)-differential privacy. Convolution is a fundamental operation, intimately related to Fourier transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give an algorithm for computing convolutions which satisfies (ε, δ)-differential privacy and is nearly optimal for every public h, i.e., it is instance optimal with respect to the public input. We prove optimality via spectral lower bounds on the hereditary discrepancy of convolution matrices. Our algorithm is very efficient – it is essentially no more computationally expensive than a Fast Fourier Transform.
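A natural baseline for this problem (not the instance-optimal algorithm of the paper) is the plain Gaussian mechanism on the circular-convolution output; the ℓ2-sensitivity argument and the name `private_convolution` below are our assumptions for the sketch:

```python
import numpy as np

def private_convolution(x, h, eps, delta, rng):
    """Baseline (eps, delta)-DP circular convolution via the Gaussian
    mechanism. Changing one entry of x by 1 changes the output by a cyclic
    shift of h, so the output's l2 sensitivity is ||h||_2.
    """
    n = len(x)
    # Circular convolution computed in the Fourier domain (FFT cost).
    conv = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h), n=n)
    sigma = np.linalg.norm(h) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return conv + rng.normal(scale=sigma, size=n)
```

This baseline adds the same noise magnitude to every coordinate; the paper's improvement comes from shaping the noise to the spectrum of h, which the hereditary-discrepancy lower bounds show is essentially the best possible for each fixed public h.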