An interior-point method for large-scale ℓ1-regularized logistic regression (2007)

by K. Koh, S.-J. Kim, S. Boyd
Venue: J Mach Learn Res
Results 1 - 10 of 72

Regularization paths for generalized linear models via coordinate descent

by Jerome Friedman, Trevor Hastie, Rob Tibshirani, 2009
Cited by 77 (3 self)
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
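The idea of cyclical coordinate descent along a regularization path can be illustrated with a minimal sketch. It covers only the simplest case named in the abstract (linear regression with an ℓ1 penalty) and is not the authors' implementation; the function names `soft_threshold`, `lasso_cd` and `lasso_path`, the objective scaling (1/(2n))·||y − Xb||² + λ·||b||₁, and the fixed sweep count are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, beta0=None, n_sweeps=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1.

    Hypothetical helper: a fixed number of sweeps stands in for a real
    convergence test.
    """
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    resid = y - X @ beta                  # full residual for the current beta
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * ||x_j||^2 for each column
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                  # an all-zero column cannot be fitted
            # (1/n) * x_j^T (partial residual that excludes coordinate j)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep the residual in sync
            beta[j] = new
    return beta

def lasso_path(X, y, n_lambdas=5, ratio=0.01):
    """Solve on a decreasing grid of penalties, warm-starting each solve."""
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest penalty giving beta = 0
    lams = lam_max * np.logspace(0.0, np.log10(ratio), n_lambdas)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        beta = lasso_cd(X, y, lam, beta0=beta)
        path.append(beta.copy())
    return lams, np.array(path)
```

Warm-starting each solve from the previous solution is what makes computing the whole path cheap in this sketch: neighbouring penalties usually have nearby solutions, so few sweeps are needed per grid point.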

Trust region Newton method for large-scale logistic regression

by Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi - In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007
Cited by 35 (5 self)
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi-Newton approach for logistic regression. We also compare it with existing linear SVM implementations.

Regularization and feature selection in least-squares temporal difference learning (full version). Available at http://ai.stanford.edu/~kolter

by J. Zico Kolter, Andrew Y. Ng, 2009
Cited by 33 (1 self)
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can over-fit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of ℓ1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the ℓ1-regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.

An Empirical Study of Context in Object Detection

by Santosh K. Divvala, Derek Hoiem, James H. Hays, Alexei A. Efros, Martial Hebert
Cited by 24 (3 self)
This paper presents an empirical evaluation of the role of context in a contemporary, challenging object detection task – the PASCAL VOC 2008. Previous experiments with context have mostly been done on home-grown datasets, often with non-standard baselines, making it difficult to isolate the contribution of contextual information. In this work, we present our analysis on a standard dataset, using top-performing local appearance detectors as baseline. We evaluate several different sources of context and ways to utilize it. While we employ many contextual cues that have been used before, we also propose a few novel ones including the use of geographic context and a new approach for using object spatial support.

Highly undersampled magnetic resonance image reconstruction via homotopic ℓ0-minimization

by Joshua Trzasko - IEEE Trans. Med. Imaging, 2009
Cited by 14 (1 self)
Any reduction in scan time offers a number of potential benefits ranging from high-temporal-rate observation of physiological processes to improvements in patient comfort. Following recent developments in Compressive Sensing (CS) theory, several authors have demonstrated that certain classes of MR images which possess sparse representations in some transform domain can be accurately reconstructed from very highly undersampled K-space data by solving a convex ℓ1-minimization problem. Although ℓ1-based techniques are extremely powerful, they inherently require a degree of over-sampling above the theoretical minimum sampling rate to guarantee that exact reconstruction can be achieved. In this paper, we propose a generalization of the Compressive Sensing paradigm based on homotopic approximation of the ℓ0 quasi-norm and show how MR image reconstruction can be pushed even further below the Nyquist limit and significantly closer to the theoretical bound. Following a brief review of standard Compressive Sensing methods and the developed theoretical extensions, several example MRI reconstructions from highly undersampled K-space data are presented.

Fingerprinting the datacenter: Automated classification of performance crises

by Peter Bodík, Dawn B. Woodard, Moises Goldszmidt, Armando Fox, Hans Andersen - In Proceedings of EuroSys’10, 2010
Cited by 14 (1 self)
Contemporary datacenters comprise hundreds or thousands of machines running applications requiring high availability and responsiveness. Although a performance crisis is easily detected by monitoring key end-to-end performance indicators (KPIs) such as response latency or request throughput, the variety of conditions that can lead to KPI degradation makes it difficult to select appropriate recovery actions. We propose and evaluate a methodology for automatic classification and identification of crises, and in particular for detecting whether a given crisis has been seen before, so that a known solution may be immediately applied. Our approach is based on a new and efficient representation of the datacenter’s state called a fingerprint, constructed by statistical selection and summarization of the hundreds of performance metrics typically collected on such systems. Our evaluation uses 4 months of trouble-ticket data from a production datacenter with hundreds of machines running a 24x7 enterprise-class user-facing application. In experiments in a realistic and rigorous operational setting, our approach provides operators the information necessary to initiate recovery actions with 80% correctness in an average of 10 minutes, which is 50 minutes earlier than the deadline provided to us by the operators. To the best of our knowledge this is ...

SLEP: Sparse Learning with Efficient Projections, Arizona State University, 2009. [Online]. Available: http://www.public.asu.edu/~jye02/Software/SLEP

by Jun Liu, Shuiwang Ji, Jieping Ye - Annals of Applied Statistics, 2007
Cited by 13 (5 self)
Abstract not found

Algorithms for sparse linear classifiers in the massive data setting, 2006. Manuscript. Available from www.stat.rutgers.edu/~madigan/papers

by Suhrid Balakrishnan, David Madigan, 2005
Cited by 12 (0 self)
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive datasets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.

Signal Restoration with Overcomplete Wavelet Transforms: Comparison of Analysis and Synthesis Priors

by Ivan W. Selesnick, Mário A. T. Figueiredo
Cited by 10 (1 self)
The variational approach to signal restoration calls for the minimization of a cost function that is the sum of a data fidelity term and a regularization term, the latter term constituting a ‘prior’. A synthesis prior represents the sought signal as a weighted sum of ‘atoms’. On the other hand, an analysis prior models the coefficients obtained by applying the forward transform to the signal. For orthonormal transforms, the synthesis prior and analysis prior are equivalent; however, for overcomplete transforms the two formulations are different. We compare analysis and synthesis ℓ1-norm regularization with overcomplete transforms for denoising and deconvolution.
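The distinction drawn in this abstract can be written out as a pair of optimization problems. The notation below is an assumption made for illustration and does not come from the abstract: y is the observed signal, H the degradation operator (the identity for denoising, a convolution for deconvolution), F the matrix whose columns are the atoms, A the forward transform, and λ > 0 the regularization weight.

```latex
% Synthesis prior: the signal is x = F w, and the atom weights w are penalized.
\hat{w} = \arg\min_{w} \; \tfrac{1}{2}\,\lVert y - H F w \rVert_2^2 + \lambda \lVert w \rVert_1,
\qquad \hat{x} = F \hat{w}

% Analysis prior: the transform coefficients A x of the signal are penalized.
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert y - H x \rVert_2^2 + \lambda \lVert A x \rVert_1
```

When F is orthonormal and A = F^{-1} = F^{T}, the substitution w = A x turns one problem into the other, which matches the equivalence stated in the abstract. For an overcomplete F the map from w to x is many-to-one, so the two problems generally have different solutions.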

Hunting for problems with Artemis

by Gabriela F. Creţu-Ciocârlie, Mihai Budiu, Moises Goldszmidt
Cited by 9 (2 self)
Artemis is a modular application designed for analyzing and troubleshooting the performance of large clusters running datacenter services. Artemis is composed of four modules: (1) distributed log collection and data extraction, (2) a database storing the extracted data, (3) an interactive visualization tool for exploring the data, and (4) a plug-in interface (and a set of sample plug-ins) allowing users to implement data analysis tools including (a) the extraction and construction of new features from the basic measurements collected, and (b) the implementation and invocation of statistical and machine learning algorithms and tools. In this paper we describe each of these components and then we illustrate the power of the plug-in architecture by presenting a case study using Artemis to analyze a Dryad application running on a 240-machine cluster.