Results 1  10
of
353
A Stochastic Gradient Method with an Exponential Convergence Rate for StronglyConvex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258
, 2012
"... We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in ..."
Abstract

Cited by 76 (11 self)
 Add to MetaCart
(Show Context)
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly. 1
A Simple and Practical Algorithm for Differentially Private Data Release
"... We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simp ..."
Abstract

Cited by 56 (2 self)
 Add to MetaCart
(Show Context)
We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques. 1.
Authorized private keyword search over encrypted data in cloud computing
, 2011
"... Abstract—In cloud computing, clients usually outsource their data to the cloud storage servers to reduce the management costs. While those data may contain sensitive personal information, the cloud servers cannot be fully trusted in protecting them. Encryption is a promising way to protect the confi ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
(Show Context)
Abstract—In cloud computing, clients usually outsource their data to the cloud storage servers to reduce the management costs. While those data may contain sensitive personal information, the cloud servers cannot be fully trusted in protecting them. Encryption is a promising way to protect the confidentiality of the outsourced data, but it also introduces much difficulty to performing effective searches over encrypted information. Most existing works do not support efficient searches with complex query conditions, and care needs to be taken when using them because of the potential privacy leakages about the data owners to the data users or the cloud server. In this paper, using online Personal Health Record (PHR) as a case study, we first show the necessity of search capability authorization that reduces the privacy exposure resulting from the search results, and establish a scalable framework for Authorized Private Keyword Search (APKS) over encrypted cloud data. We then propose two novel solutions for APKS based on a recent cryptographic primitive, Hierarchical Predicate Encryption (HPE). Our solutions enable efficient multidimensional keyword searches with range query, allow delegation and revocation of search capabilities. Moreover, we enhance the query privacy which hides users ’ query keywords against the server. We implement our scheme on a modern workstation, and experimental results demonstrate its suitability for practical usage. I.
Minimizing Finite Sums with the Stochastic Average Gradient
, 2013
"... We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method’s iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradie ..."
Abstract

Cited by 34 (2 self)
 Add to MetaCart
(Show Context)
We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method’s iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than blackbox SG methods. The convergence rate is improved from O(1 / √ k) to O(1/k) in general, and when the sum is stronglyconvex the convergence rate is improved from the sublinear O(1/k) to a linear convergence rate of the form O(ρ k) for ρ < 1. Further, in many cases the convergence rate of the new method is also faster than blackbox deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of nonuniform sampling strategies. 1
Differentially Private Data Release for Data Mining
 Proc. KDD 11
, 2011
"... Privacypreserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary’s background knowledge. Most ..."
Abstract

Cited by 31 (6 self)
 Add to MetaCart
(Show Context)
Privacypreserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary’s background knowledge. Most of the existing solutions that ensure differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the noninteractive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed noninteractive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.
LargeScale Sparse Principal Component Analysis with Application to Text Data
"... Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the ..."
Abstract

Cited by 22 (1 self)
 Add to MetaCart
(Show Context)
Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination preprocessing result, coupled with the favorable fact that features in reallife data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing firstorder ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a userinterpretable way, providing an attractive alternative approach to topic models. 1
Kernel Methods for Deep Learning
"... We introduce a new family of positivedefinite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernelbased architectures that we call multilayer kernel machi ..."
Abstract

Cited by 20 (2 self)
 Add to MetaCart
We introduce a new family of positivedefinite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernelbased architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
Fastfood — Approximating Kernel Expansions in Loglinear Time
"... Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates ..."
Abstract

Cited by 19 (1 self)
 Add to MetaCart
(Show Context)
Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matrices when combined with diagonal Gaussian matrices exhibit properties similar to dense Gaussian random matrices. Yet unlike the latter, Hadamard and diagonal matrices are inexpensive to multiply and store. These two matrices can be used in lieu of Gaussian matrices in Random Kitchen Sinks (Rahimi & Recht, 2007) and thereby speeding up the computation for a large range of kernel functions. Specifically, Fastfood requires O(n log d) time and O(n) storage to compute n nonlinear basis functions in d dimensions, a significant improvement from O(nd) computation and storage, without sacrificing accuracy. We prove that the approximation is unbiased and has low variance. Extensive experiments show that we achieve similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory. These improvements, especially in terms of memory usage, make kernel methods more practical for applications that have large training sets and/or require realtime prediction. 1.
NearOptimal Adaptive Compressed Sensing
, 1306
"... Abstract—This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be nearoptimal in that it succeeds at the lowest possible signaltonoiseratio (SNR) levels. Like tradition ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
(Show Context)
Abstract—This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be nearoptimal in that it succeeds at the lowest possible signaltonoiseratio (SNR) levels. Like traditional compressed sensing based on random nonadaptive design matrices, the CASS algorithm requires only k log n measurements to recover a ksparse signal of dimension n. However,CASSsucceedsatSNR levels that are a factor log n less than required by standard compressed sensing. From the point of view of constructing and implementing the sensing operation as well as computing the reconstruction, the proposed algorithm is substantially less computationally intensive than standard compressed sensing. CASS is also demonstrated to perform considerably better in practice through simulation. To the best of our knowledge, this is the first demonstration of an adaptive compressed sensing algorithm with nearoptimal theoretical guarantees and excellent practical performance. This paper also shows that methods like compressed sensing, group testing, and pooling have an advantage beyond simply reducing the number of measurements or tests – adaptive versions of such methods can also improve detection and estimation performance when compared to nonadaptive direct (uncompressed) sensing. I.
SpeedBoost: Anytime Prediction with Uniform NearOptimality
"... We present SpeedBoost, a natural extension of functional gradient descent, for learning anytime predictors, which automatically trade computation time for predictive accuracy by selecting from a set of simpler candidate predictors. These anytime predictors not only generate approximate predictions r ..."
Abstract

Cited by 19 (2 self)
 Add to MetaCart
We present SpeedBoost, a natural extension of functional gradient descent, for learning anytime predictors, which automatically trade computation time for predictive accuracy by selecting from a set of simpler candidate predictors. These anytime predictors not only generate approximate predictions rapidly, but are capable of using extra resources at prediction time, when available, to improve performance. We also demonstrate how our framework can be used to select weak predictors which target certain subsets of the data, allowing for efficient use of computational resources on difficult examples. We also show that variants of the SpeedBoost algorithm produce predictors which are provably competitive with any possible sequence of weak predictors with the same total complexity. 1