Results 1  10
of
327
Online passiveaggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 435 (24 self)
 Add to MetaCart
(Show Context)
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new algorithms and accompanying loss bounds for hingeloss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
Informationtheoretic metric learning
 in NIPS 2006 Workshop on Learning to Compare Examples
, 2007
"... We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a lowrank kernel learning problem. Spe ..."
Abstract

Cited by 359 (15 self)
 Add to MetaCart
(Show Context)
We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a lowrank kernel learning problem. Specifically, we minimize the Burg divergence of a lowrank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural informationtheoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, the running time of our method is faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning. 1
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 351 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
A Framework for Collaborative, ContentBased and Demographic Filtering
 ARTIFICIAL INTELLIGENCE REVIEW
, 1999
"... We discuss learning a profile of user interests for recommending information sources such as Web pages or news articles. We describe the types of information available to determine whether to recommend a particular page to a particular user. This information includes the content of the page, the rat ..."
Abstract

Cited by 319 (6 self)
 Add to MetaCart
We discuss learning a profile of user interests for recommending information sources such as Web pages or news articles. We describe the types of information available to determine whether to recommend a particular page to a particular user. This information includes the content of the page, the ratings of the user on other pages and the contents of these pages, the ratings given to that page by other users and the ratings of these other users on other pages and demographic information about users. We describe how each type of information may be used individually and then discuss an approach to combining recommendations from multiple sources. We illustrate each approach and the combined approach in the context of recommending restaurants.
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
, 2003
"... Convex programming involves a convex set F R and a convex function c : F ! R. The goal of convex programming is to nd a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some ..."
Abstract

Cited by 298 (4 self)
 Add to MetaCart
(Show Context)
Convex programming involves a convex set F R and a convex function c : F ! R. The goal of convex programming is to nd a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of in nitesimal gradient ascent, and the results here imply that generalized in nitesimal gradient ascent (GIGA) is universally consistent.
ContextSensitive Learning Methods for Text Categorization
 ACM Transactions on Information Systems
, 1996
"... this article, we will investigate the performance of two recently implemented machinelearning algorithms on a number of large text categorization problems. The two algorithms considered are setvalued RIPPER, a recent rulelearning algorithm [Cohen A earlier version of this article appeared in Proc ..."
Abstract

Cited by 291 (13 self)
 Add to MetaCart
this article, we will investigate the performance of two recently implemented machinelearning algorithms on a number of large text categorization problems. The two algorithms considered are setvalued RIPPER, a recent rulelearning algorithm [Cohen A earlier version of this article appeared in Proceedings of the 19th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR) pp. 307315
Training Algorithms for Linear Text Classifiers
, 1996
"... Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the WidrowHoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides pe ..."
Abstract

Cited by 276 (12 self)
 Add to MetaCart
Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the WidrowHoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is presented showing WidrowHoff and EG to be more effective than the widely used Rocchio algorithm on several categorization and routing tasks. 1 Introduction Document retrieval, categorization, routing, and filtering systems often are based on classification. That is, the IR system decides for each document which of two or more classes it belongs to, or how strongly it belongs to a class, in order to accomplish the IR task of interest. For instance, the two classes may be the documents relevant to and not relevant to a particular user, and the system may rank documents based on how likely it i...
Learning to resolve natural language ambiguities: A unified approach
 In Proceedings of the National Conference on Artificial Intelligence. 806813. Segond F., Schiller A., Grefenstette & Chanod F.P
, 1998
"... distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context word associations and synand machine learning algorithms for natural language tactic patterns in this case are suffl ..."
Abstract

Cited by 172 (78 self)
 Add to MetaCart
(Show Context)
distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context word associations and synand machine learning algorithms for natural language tactic patterns in this case are sufflcicnt to identify disambiguation tasks and observe tha they can bc recast as learning linear separators in the feature space. the correct form. Each of the methods makes a priori assumptions, which Many of these arc important standalone problems it employs, given the data, when searching for its hy but even more important is thei role in many applicapothesis. Nevertheless, as we show, it searches a space tions including speech recognition, machine translation, that is as rich as the space of all linear separators. information extraction and intelligent humanmachine We use this to build an argument for a data driven interaction. Most of the ambiguity resolution problems approach which merely searches for a good linear sepa are at the lower level of the natural language inferences rator in the feature space, without further assumptions chain; a wide range and a large number of ambigui
Contentbased recommendation systems
 THE ADAPTIVE WEB: METHODS AND STRATEGIES OF WEB PERSONALIZATION. VOLUME 4321 OF LECTURE NOTES IN COMPUTER SCIENCE
, 2007
"... This chapter discusses contentbased recommendation systems, i.e., systems that recommend an item to a user based upon a description of the item and a profile of the user’s interests. Contentbased recommendation systems may be used in a variety of domains ranging from recommending web pages, news ..."
Abstract

Cited by 163 (0 self)
 Add to MetaCart
(Show Context)
This chapter discusses contentbased recommendation systems, i.e., systems that recommend an item to a user based upon a description of the item and a profile of the user’s interests. Contentbased recommendation systems may be used in a variety of domains ranging from recommending web pages, news articles, restaurants, television programs, and items for sale. Although the details of various systems differ, contentbased recommendation systems share in common a means for describing the items that may be recommended, a means for creating a profile of the user that describes the types of items the user likes, and a means of comparing items to the user profile to determine what to recommend. The profile is often created and updated automatically in response to feedback on the desirability of items that have been presented to the user. A common scenario for modern recommendation systems is a Web application with which a user interacts. Typically, a system presents a summary list of items to a user, and the user selects among the items to receive more details on an item or to interact
A SNoWbased face detector
 Adbances in Neural Information Processing System 12, pp 855 861
, 2000
"... mhyang~vision.ai.uiuc.edu danr~cs.uiuc.edu ahuja~vision.ai.uiuc.edu A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learned feature space and i ..."
Abstract

Cited by 146 (17 self)
 Add to MetaCart
(Show Context)
mhyang~vision.ai.uiuc.edu danr~cs.uiuc.edu ahuja~vision.ai.uiuc.edu A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoWbased approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoWbased method are significantly more efficient than with other methods. 1