Results 11–20 of 94
SampleRank: Learning preferences from atomic gradients
In NIPS WS on Advances in Ranking, 2009
Abstract

Cited by 16 (5 self)
Large templated factor graphs with complex structure that changes during inference have been shown to provide state-of-the-art experimental results on tasks such as identity uncertainty and information integration. However, learning parameters in these models is difficult because computing the gradients requires expensive inference routines. In this paper we propose an online algorithm that instead learns preferences over hypotheses from the gradients between the atomic steps of inference. Although there are a combinatorial number of ranking constraints over the entire hypothesis space, a connection to the framework of sampled convex programs reveals a polynomial bound on the number of rankings that need to be satisfied in practice. We further apply ideas from passive-aggressive algorithms to our update rules, enabling us to extend recent work in confidence-weighted classification to structured prediction problems. We compare our algorithm to structured perceptron, contrastive divergence, and persistent contrastive divergence, demonstrating substantial error reductions on two real-world problems (20% over contrastive divergence).
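The atomic-gradient idea in this abstract amounts to a perceptron-style update on pairs of neighboring hypotheses visited during inference. Below is a minimal sketch of one such step; the feature map, objective scores, and learning rate are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def samplerank_step(theta, phi_a, phi_b, score_a, score_b, lr=0.1):
    """One SampleRank-style atomic update (a sketch, not the paper's exact code).

    phi_a, phi_b: feature vectors of two neighboring hypotheses from inference.
    score_a, score_b: objective ("truth") scores of the two hypotheses.
    If the model's ranking of the pair disagrees with the objective's
    ranking, take a perceptron-style step on the feature difference."""
    if score_a == score_b:
        return theta                   # no preference to learn from a tie
    if score_b > score_a:              # orient the pair so `a` is preferred
        phi_a, phi_b = phi_b, phi_a
    diff = phi_a - phi_b
    if theta @ diff <= 0:              # model ranks the pair incorrectly
        theta = theta + lr * diff      # raise the score of the better hypothesis
    return theta
```

On a toy task where the objective prefers all-ones bit vectors, repeated atomic updates drive the weights toward preferring the target hypothesis.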
Online Learning for Group Lasso
Abstract

Cited by 16 (2 self)
We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Unlike traditional batch-mode group lasso algorithms, which suffer from inefficiency and poor scalability, our proposed algorithm operates in an online mode and scales well: at each iteration the weight vector is updated according to a closed-form solution based on the average of previous subgradients. The proposed online algorithm is therefore very efficient and scalable, with worst-case time complexity and memory cost both in the order of O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity at both the group level and the individual feature level, we extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and graph lasso. Finally, we demonstrate the merits of our algorithm through experiments on both synthetic and real-world datasets.
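A closed-form update driven by the average of past subgradients is the signature of dual-averaging methods. The sketch below shows one such update with a group-L2 penalty, which zeroes out whole groups at once; the sqrt(t) step scaling and the `lam`, `gamma` constants are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def group_rda_update(gbar, t, groups, lam, gamma=1.0):
    """Closed-form weight update from the running average of subgradients
    (a regularized-dual-averaging sketch with a group-L2 penalty).

    gbar:   average of the subgradients seen so far (length-d array)
    t:      current iteration count
    groups: list of index arrays, one per feature group
    Whole groups whose average-gradient norm is below `lam` are set
    exactly to zero, which yields group-level sparsity."""
    w = np.zeros_like(gbar)
    scale = np.sqrt(t) / gamma
    for g in groups:
        norm = np.linalg.norm(gbar[g])
        if norm > lam:                                   # group survives thresholding
            w[g] = -scale * (1.0 - lam / norm) * gbar[g]
        # else: w[g] stays exactly zero
    return w
```

Each step costs O(d) time and memory, matching the complexity claim in the abstract.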
Multi-domain learning by confidence-weighted parameter combination
In Machine Learning, 2009
Abstract

Cited by 16 (2 self)
State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high-quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning separate models for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source-domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.
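Since each confidence-weighted classifier carries a per-feature variance, one natural combination rule is a precision-weighted average of their means, so that parameters a classifier is confident about dominate. This is a sketch of that one rule under a diagonal-covariance assumption; the paper explores several combination schemes:

```python
import numpy as np

def combine_cw_classifiers(mus, sigmas):
    """Precision-weighted combination of confidence-weighted classifiers
    (an illustrative rule: averaging Gaussians by per-feature precision).

    mus:    list of mean weight vectors, one per source domain
    sigmas: list of per-feature variance vectors (diagonal covariances)
    Returns the combined mean and combined per-feature variance."""
    mus = np.asarray(mus, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float)   # small variance = high weight
    total = precisions.sum(axis=0)
    mu = (precisions * mus).sum(axis=0) / total
    return mu, 1.0 / total
```

With two classifiers that are each confident about a different feature, the combined model inherits the confident estimate for each one.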
Maximum Entropy Discrimination Markov Networks
, 2008
Abstract

Cited by 15 (8 self)
Standard max-margin structured prediction methods concentrate directly on the input-output mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian max-margin structured learning, using expected margin constraints to define a feasible distribution subspace and applying the maximum entropy principle to choose the best distribution from this subspace. We show that MaxEntNet subsumes the standard max-margin Markov networks (M3N) as a special case where the predictive model is assumed to be linear and the parameter prior is a standard normal. Based on this understanding, we propose the Laplace max-margin Markov networks (LapM3N), which use a Laplace prior instead of the standard normal. We show that the adoption of a Laplace prior on the parameters makes LapM3N enjoy the properties expected from a sparsified M3N. Unlike L1-regularized maximum likelihood estimation, which sets small weights to zero to achieve sparsity, LapM3N weights the parameters posteriorly, and features with smaller weights are shrunk more. This posterior weighting effect makes LapM3N more stable with respect to the magnitudes of the regularization coefficients and more generalizable.
PhishDef: URL names say it all
In CoRR
Abstract

Cited by 14 (3 self)
Phishing is an increasingly sophisticated method of stealing personal user information using sites that pretend to be legitimate. In this paper, we take the following steps to identify phishing URLs. First, we carefully select lexical features of the URLs that are resistant to obfuscation techniques used by attackers. Second, we evaluate the classification accuracy when using only lexical features, both automatically and hand-selected, vs. when using additional features. We show that lexical features are sufficient for all practical purposes. Third, we thoroughly compare several classification algorithms, and we propose to use an online method (AROW) that is able to overcome noisy training data. Based on the insights gained from our analysis, we propose PhishDef, a phishing detection system that uses only URL names and combines the above three elements. PhishDef is highly accurate (when compared to state-of-the-art approaches over real datasets), lightweight (thus appropriate for online and client-side deployment), proactive (based on online classification rather than blacklists), and resilient to training data inaccuracies (thus enabling the use of large noisy training data).
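The AROW method referenced above (Adaptive Regularization of Weights; Crammer et al., 2009) keeps a Gaussian over the weights and makes confidence-aware updates, which is what lends it robustness to label noise. A minimal dense sketch of one update follows; a real URL classifier would keep the covariance diagonal for scalability:

```python
import numpy as np

def arow_update(mu, sigma, x, y, r=1.0):
    """One AROW update on example (x, y), y in {-1, +1}.

    mu:    mean weight vector
    sigma: covariance matrix (d x d); large entries = low confidence
    r:     regularization parameter controlling update aggressiveness
    Updates only when the hinge loss is positive; the covariance along x
    shrinks with every update, so noisy late examples move mu less."""
    margin = y * (mu @ x)
    v = x @ sigma @ x                         # variance (uncertainty) along x
    if margin < 1.0:                          # suffered hinge loss
        beta = 1.0 / (v + r)
        alpha = max(0.0, 1.0 - margin) * beta
        mu = mu + alpha * y * (sigma @ x)     # confidence-weighted step
        sigma = sigma - beta * np.outer(sigma @ x, sigma @ x)
    return mu, sigma
```

A single update from the prior (mu = 0, sigma = I) moves the mean toward the example and shrinks the variance only along the observed direction.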
Learning with noisy labels
In Advances in Neural Information Processing Systems 26, 2013
Abstract

Cited by 14 (2 self)
In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, the random label noise is class-conditional: the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of i.i.d. data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with a weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a remarkable consequence: methods used in practice, such as the biased SVM and weighted logistic regression, are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with recently proposed methods for dealing with label noise on several benchmark datasets.
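The first approach, the unbiased estimator, admits a short closed form: correct the loss on the observed label by subtracting a scaled loss on the opposite label. A sketch (labels in {-1, +1}; `rho_pos`, `rho_neg` are the class-conditional flip probabilities, assumed known):

```python
def unbiased_loss(loss, t, y, rho_pos, rho_neg):
    """Unbiased-estimator correction of a surrogate loss under
    class-conditional label noise (a sketch of the paper's first method).

    loss(t, y): any surrogate loss on prediction t and label y in {-1, +1}
    rho_pos:    P(label flipped | true y = +1)
    rho_neg:    P(label flipped | true y = -1)
    The expectation of this corrected loss over the noisy label equals
    the clean loss, provided rho_pos + rho_neg < 1."""
    rho_y  = rho_pos if y == 1 else rho_neg   # flip prob. of the observed label's class
    rho_my = rho_neg if y == 1 else rho_pos   # flip prob. of the opposite class
    z = 1.0 - rho_pos - rho_neg               # normalizer; must be positive
    return ((1.0 - rho_my) * loss(t, y) - rho_y * loss(t, -y)) / z
```

Unbiasedness can be checked directly: averaging the corrected loss over the noise distribution recovers the clean loss exactly.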
Online Multimodal Deep Similarity Learning with Application to Image Retrieval
Abstract

Cited by 10 (1 self)
Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, in which the assumption of linearity limits their capacity to measure similarity on complex patterns in real-world applications; and (ii) they are often designed for learning distance metrics on unimodal data, and may not effectively handle similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pre-trained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms on multimodal image retrieval tasks, in which the encouraging results validate the effectiveness of the proposed technique.
Learning to Detect Malicious URLs
Exploiting Feature Covariance in High-Dimensional Online Learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011
Abstract

Cited by 10 (2 self)
Malicious Web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This article explores how to detect malicious Web sites from the lexical and host-based features of their URLs. We show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms, they also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. We develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider. From these features and labels, we are able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
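The lexical side of such a feature pipeline is simple to sketch: split the URL on delimiters and hash the resulting tokens into a fixed-dimension sparse binary vector. This is an illustrative sketch only; the article's actual feature set also includes host-based signals (e.g. WHOIS and geolocation data) that are omitted here:

```python
import hashlib
import re

def lexical_features(url, dim=2**10):
    """Hashed bag-of-tokens lexical features for a URL (a sketch).

    Tokens are the delimiter-separated pieces of the lowercased URL,
    hashed into `dim` bins ("hashing trick") so the feature space stays
    fixed even as new tokens appear in the URL stream."""
    tokens = [t for t in re.split(r"[:/\.\?=\-_&]", url.lower()) if t]
    feats = {}
    for tok in tokens:
        # stable hash (md5) so features are reproducible across runs
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        feats[idx] = 1.0
    return feats
```

The fixed dimension is what lets an online learner with O(d) state, such as the confidence-weighted methods above, consume an unbounded URL feed.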
Confidence Weighted Mean Reversion Strategy for On-Line Portfolio Selection
Abstract

Cited by 10 (4 self)
This paper proposes a novel online portfolio selection strategy named “Confidence Weighted Mean Reversion” (CWMR). Inspired by the mean reversion principle and the confidence-weighted online learning technique, CWMR models a portfolio vector as a Gaussian distribution and sequentially updates the distribution by following the mean reversion trading principle. The CWMR strategy is able to effectively exploit the power of mean reversion for online portfolio selection. Extensive experiments on various real markets demonstrate the effectiveness of our strategy in comparison with the state of the art.
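The mean reversion trading principle itself can be sketched with a simple passive-aggressive point update: when last period's return is high enough, shift weight toward the assets that just underperformed, expecting their prices to revert. Note this is only a simplified sketch of the principle; the full CWMR algorithm updates a Gaussian distribution over portfolios, not a point vector, and its update rule differs:

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def mean_reversion_step(b, x, eps=0.5):
    """One passive-aggressive mean-reversion step (illustrative sketch).

    b:   current portfolio weights (non-negative, summing to 1)
    x:   last period's price relatives (price_t / price_{t-1})
    eps: return threshold; above it we bet on reversion
    If the realized return b.x exceeds eps, move weight away from assets
    that outperformed the mean and toward those that underperformed."""
    ret = b @ x
    if ret <= eps:
        return b                                   # passive: keep the portfolio
    xbar = x.mean()
    denom = np.linalg.norm(x - xbar) ** 2
    tau = (ret - eps) / denom if denom > 0 else 0.0
    return simplex_projection(b - tau * (x - xbar))
```

After a period in which one asset rises and another falls, the update shifts weight toward the faller, which is the reversion bet described in the abstract.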
Maximum Relative Margin and Data-Dependent Regularization
In Journal of Machine Learning Research
Abstract

Cited by 8 (1 self)
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of the data and preferentially separate classes along large-spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather relative to the spread of the data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function, while maximum absolute margin corresponds to an ℓ2-norm constraint on the classification function. Interestingly, the proposed improvements require only simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.
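The core quantity, margin relative to the spread of the projections, is easy to evaluate directly. The sketch below only computes this ratio for a given classifier to illustrate the idea; the actual relative margin machine solves a constrained optimization rather than evaluating such a ratio:

```python
import numpy as np

def relative_margin(w, b, X, y):
    """Margin of (w, b) on (X, y) measured relative to the data spread
    along w (an illustrative evaluation, not the article's solver).

    Absolute margin: min_i y_i (w.x_i + b) / ||w||.
    Relative margin: that minimum divided by the range of the projections
    w.x_i + b, so a direction is not preferred merely because the data
    happen to be widely spread along it."""
    proj = X @ w + b
    abs_margin = np.min(y * proj) / np.linalg.norm(w)
    spread = proj.max() - proj.min()
    return abs_margin / spread
```

Scaling the data along w inflates the absolute margin but leaves the relative margin unchanged, which is exactly the affine-transformation sensitivity the article corrects.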