Results 1–10 of 62
Classification using discriminative restricted Boltzmann machines
 In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, ACM, 2008
Abstract

Cited by 93 (13 self)
Recently, many applications for Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feedforward neural network classifiers, and are not considered as a standalone solution to classification problems. In this paper, we argue that RBMs provide a self-contained framework for deriving competitive nonlinear classifiers. We present an evaluation of different learning algorithms for RBMs which aim at introducing a discriminative component to RBM training and improving their performance as classifiers. This approach is simple in that RBMs are used directly to build a classifier, rather than as a stepping stone. Finally, we demonstrate how discriminative RBMs can also be successfully employed in a semi-supervised setting.
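The closed-form class posterior that such a discriminative RBM evaluates can be sketched as follows. This is a hypothetical NumPy illustration of the standard classification-RBM formula; the parameter names (weights W and U, biases c and d) are assumptions of this sketch, not the paper's code:

```python
import numpy as np

def rbm_predict_proba(x, W, U, c, d):
    """Exact p(y|x) for a classification RBM with visible input x,
    hidden-visible weights W (n_hidden, n_visible), hidden-label
    weights U (n_hidden, n_classes), hidden biases c, label biases d.
    Up to normalization:
        log p(y|x) = d_y + sum_j softplus(c_j + U_{jy} + (W x)_j)."""
    act = c[:, None] + U + (W @ x)[:, None]              # (n_hidden, n_classes)
    log_scores = d + np.logaddexp(0.0, act).sum(axis=0)  # softplus over hidden units
    log_scores -= log_scores.max()                       # numerical stability
    p = np.exp(log_scores)
    return p / p.sum()

# toy usage: 5 visible units, 8 hidden units, 3 classes
rng = np.random.default_rng(0)
W, U = rng.normal(size=(8, 5)), rng.normal(size=(8, 3))
c, d = rng.normal(size=8), rng.normal(size=3)
p = rbm_predict_proba(rng.normal(size=5), W, U, c, d)
```

Because this posterior is tractable, no sampling is needed at test time, which is what makes direct discriminative training of the RBM feasible.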
Principled hybrids of generative and discriminative models
 In CVPR ’06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006
Abstract

Cited by 75 (1 self)
When labelled training data is plentiful, discriminative techniques are widely used since they give excellent generalization performance. However, for large-scale applications such as object recognition, hand labelling of data is expensive, and there is much interest in semi-supervised techniques based on generative models in which the majority of the training data is unlabelled. Although the generalization performance of generative models can often be improved by ‘training them discriminatively’, they can then no longer make use of unlabelled data. In an attempt to gain the benefit of both generative and discriminative approaches, heuristic procedures have been proposed [2, 3] which interpolate between these two extremes by taking a convex combination of the generative and discriminative objective functions. In this paper we adopt a new perspective which says that there is only one correct way to train a given model, and that a ‘discriminatively trained’ generative model is fundamentally a new model [7]. From this viewpoint, generative and discriminative models correspond to specific choices for the prior over parameters. As well as giving a principled interpretation of ‘discriminative training’, this approach opens the door to very general ways of interpolating between generative and discriminative extremes through alternative choices of prior. We illustrate this framework using both synthetic data and a practical example in the domain of multi-class object recognition. Our results show that, when the supply of labelled training data is limited, the optimum performance corresponds to a balance between the purely generative and the purely discriminative.
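The convex combination of objectives that the heuristic hybrids interpolate can be sketched on a deliberately tiny invented model (1-D inputs, two classes, shared unit variance); everything here, including the parameterization theta = (mu0, mu1, prior log-odds b), is an assumption of the sketch, not the paper's model:

```python
import numpy as np

def blended_objective(theta, X, y, alpha):
    """alpha * generative log-likelihood + (1 - alpha) * discriminative
    log-likelihood for a toy 1-D two-class Gaussian model with shared
    unit variance. alpha = 1 is purely generative, alpha = 0 purely
    discriminative."""
    mu0, mu1, b = theta
    mu = np.where(y == 1, mu1, mu0)
    prior1 = 1.0 / (1.0 + np.exp(-b))
    # generative: log p(x, y) = log N(x; mu_y, 1) + log p(y)
    log_joint = (-0.5 * (X - mu) ** 2 - 0.5 * np.log(2 * np.pi)
                 + np.where(y == 1, np.log(prior1), np.log(1 - prior1)))
    # discriminative: log p(y|x) via Bayes rule reduces to a logistic in x
    s = (mu1 - mu0) * X + 0.5 * (mu0 ** 2 - mu1 ** 2) + b
    log_cond = np.where(y == 1, -np.logaddexp(0, -s), -np.logaddexp(0, s))
    return alpha * log_joint.sum() + (1 - alpha) * log_cond.sum()

theta = (0.0, 1.0, 0.0)
X = np.array([-0.2, 1.3, 0.8])
y = np.array([0, 1, 1])
vals = [blended_objective(theta, X, y, a) for a in (0.0, 0.5, 1.0)]
```

The paper's point is that a suitable prior over parameters can replace this ad hoc interpolation while still spanning the same two extremes.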
An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators
2008
Abstract

Cited by 52 (3 self)
Statistical and computational concerns have motivated parameter estimators based on various forms of likelihood, e.g., joint, conditional, and pseudolikelihood. In this paper, we present a unified framework for studying these estimators, which allows us to compare their relative (statistical) efficiencies. Our asymptotic analysis suggests that modeling more of the data tends to reduce variance, but at the cost of being more sensitive to model misspecification. We present experiments validating our analysis.
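Of the likelihoods compared above, pseudolikelihood is the least standard: it replaces the joint likelihood with a product of full conditionals. A minimal sketch for a binary pairwise MRF (the model and parameter names are assumed for illustration, not taken from the paper):

```python
import numpy as np

def log_pseudolikelihood(x, J, b):
    """sum_i log p(x_i | x_-i) for a binary pairwise MRF with
    x in {0,1}^n, symmetric coupling matrix J (zero diagonal), and
    field b; the conditional logit of x_i given the rest is
    b_i + sum_j J_ij x_j."""
    field = b + J @ x
    # log sigmoid(field) where x_i = 1, log(1 - sigmoid(field)) where x_i = 0
    return np.where(x == 1, -np.logaddexp(0, -field),
                    -np.logaddexp(0, field)).sum()

# with no couplings and no field, every conditional is exactly 1/2
lp = log_pseudolikelihood(np.array([1.0, 0.0]), np.zeros((2, 2)), np.zeros(2))
```

Each conditional avoids the partition function entirely, which is the computational appeal; the paper's asymptotic analysis quantifies the statistical price paid for that shortcut.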
A discriminative framework for modeling object classes
 In Proc. IEEE CVPR, 2005
Abstract

Cited by 45 (5 self)
Here we explore a discriminative learning method on underlying generative models for the purpose of discriminating between object categories. Visual recognition algorithms learn models from a set of training examples. Generative models learn their representations by considering data from a single class. Generative models are popular in computer vision for many reasons, including their ability to elegantly incorporate prior knowledge and to handle correspondences between object parts and detected features. However, generative models are often inferior to discriminative models during classification tasks. We study a discriminative approach to learning object categories which maintains the representational power of generative learning, but trains the generative models in a discriminative manner. The discriminatively trained models perform better during classification tasks as a result of selecting discriminative sets of features. We conclude by proposing a multi-class object recognition system which initially trains object classes in a generative manner, identifies subsets of similar classes with high confusion, and finally trains models for these subsets in a discriminative manner to realize gains in classification performance.
Nguyen, “Image super-resolution using support vector regression”
 IEEE Trans. Image Proc., 2007
Abstract

Cited by 31 (1 self)
Abstract—A thorough investigation of the application of support vector regression (SVR) to the super-resolution problem is conducted through various frameworks. Prior to the study, the SVR problem is enhanced by finding the optimal kernel. This is done by formulating the kernel learning problem in SVR form as a convex optimization problem, specifically a semidefinite programming (SDP) problem. An additional constraint is added to reduce the SDP to a quadratically constrained quadratic programming (QCQP) problem. After this optimization, investigation of the relevancy of SVR to super-resolution proceeds with the possibility of using a single and general support vector regression for all image content, and the results are impressive for small training sets. This idea is improved upon by observing structural properties in the discrete cosine transform (DCT) domain to aid in learning the regression. Further improvement involves a combination of classification and SVR-based techniques, extending works in resolution synthesis. This method, termed kernel resolution synthesis, uses specific regressors for isolated image content to describe the domain through a partitioned look of the vector space, thereby yielding good results. Index Terms—Kernel matrix, nonlinear regression, resolution synthesis, super-resolution, support vector regression (SVR).
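The patch-to-pixel regression at the core of this approach can be sketched with kernel ridge regression standing in for SVR (and a plain RBF kernel standing in for the SDP-learned one); the toy image, patch size, and hyperparameters below are all assumptions of the sketch:

```python
import numpy as np

def rbf_kernel(A, B, gamma=2.0):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
hi = rng.random((16, 16))        # toy "high-res" image
lo = hi[::2, ::2]                # naive 2x downsample

# regress each high-res pixel on its 3x3 low-res neighbourhood
X, y = [], []
for i in range(1, lo.shape[0] - 1):
    for j in range(1, lo.shape[1] - 1):
        X.append(lo[i - 1:i + 2, j - 1:j + 2].ravel())
        y.append(hi[2 * i, 2 * j])
X, y = np.array(X), np.array(y)

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)  # ridge dual weights
pred = rbf_kernel(X[:5], X) @ alpha                    # first 5 predictions
```

The paper goes further by learning the kernel itself and by dispatching patches to content-specific regressors, but the low-res-context-to-high-res-pixel mapping above is the shared core.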
Dropout Training as Adaptive Regularization
Abstract

Cited by 31 (3 self)
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
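The stated equivalence can be written down directly for logistic regression: the quadratic (first-order) dropout penalty is an L2 penalty weighted by the diagonal of the observed Fisher information. A sketch under assumed notation (dropout rate delta, design matrix X, weights w), not the authors' code:

```python
import numpy as np

def dropout_quadratic_penalty(X, w, delta):
    """First-order dropout regularizer for logistic regression: an L2
    penalty on w after scaling each coordinate by the diagonal of the
    observed Fisher information, sum_n p_n (1 - p_n) x_ni^2, times the
    dropout noise factor delta / (1 - delta)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))                       # model probabilities
    fisher_diag = ((p * (1 - p))[:, None] * X ** 2).sum(0)  # diag Fisher estimate
    return 0.5 * delta / (1.0 - delta) * (w ** 2 * fisher_diag).sum()

rng = np.random.default_rng(1)
X, w = rng.normal(size=(20, 4)), rng.normal(size=4)
pen = dropout_quadratic_penalty(X, w, 0.5)
```

The same quantity can be read either as per-feature noise variance times the loss curvature or as Fisher-scaled L2, which is the equivalence the abstract refers to.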
Semi-supervised classification with hybrid generative/discriminative methods
 In KDD, 2007
Abstract

Cited by 30 (3 self)
We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the trade-off between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low-density regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest overall mean accuracy across all tasks.
Robust Bayesian linear classifier ensembles
 In Machine Learning: ECML 2005, 2005
Abstract

Cited by 18 (0 self)
Abstract. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
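The expectation-maximization route to maximum a posteriori mixture weights can be sketched as follows; the interface (a matrix P of per-classifier probabilities for the true labels, a symmetric Dirichlet prior alpha0) is an assumption of this sketch, not the authors' formulation:

```python
import numpy as np

def em_mixture_weights(P, alpha0=1.0, n_iter=100):
    """MAP mixture weights over K fixed classifiers via EM.
    P[n, k] = probability classifier k assigns the true label of
    example n. alpha0 > 1 adds a symmetric Dirichlet prior on the
    weights; alpha0 = 1 reduces to maximum likelihood."""
    n, K = P.shape
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        r = w * P
        r /= r.sum(1, keepdims=True)   # E-step: responsibilities per example
        w = r.sum(0) + (alpha0 - 1.0)  # M-step with Dirichlet pseudo-counts
        w /= w.sum()
    return w

# classifier 0 is consistently better, so EM should upweight it
P = np.array([[0.9, 0.2], [0.8, 0.1], [0.7, 0.3]])
w = em_mixture_weights(P)
```

Larger alpha0 pulls the mixture toward uniform averaging, matching the paper's observation that uniform averaging is a strong baseline in misspecified model spaces.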
Discriminatively Trained Markov Model for Sequence Classification
Abstract

Cited by 12 (3 self)
In this paper, we propose a discriminative counterpart of the directed Markov models of order k−1, or MM(k−1), for sequence classification. MM(k−1) models capture dependencies among neighboring elements of a sequence. The parameters of the classifiers are initialized based on the maximum likelihood estimates for their generative counterparts. We derive gradient-based update equations for the parameters of the sequence classifiers in order to maximize the conditional likelihood function. Results of our experiments with data sets drawn from biological sequence classification (specifically protein function and subcellular localization) and text classification applications show that the discriminatively trained sequence classifiers outperform their generative counterparts, confirming the benefits of discriminative training when the primary objective is classification. Our experiments also show that the discriminatively trained MM(k−1) sequence classifiers are competitive with the computationally much more expensive Support Vector Machines trained using k-gram representations of sequences.
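The generative counterpart that initializes the discriminative classifier can be sketched for the order-1 case: class-conditional transition tables with Laplace smoothing, classifying by maximum class log-likelihood. Names and the smoothing choice are assumptions of the sketch:

```python
import numpy as np

class MarkovSequenceClassifier:
    """Order-1 generative Markov model per class; the count-based
    estimates below are the kind of initialization the discriminative
    training described above starts from."""

    def fit(self, seqs, labels, alphabet):
        self.alphabet = {s: i for i, s in enumerate(alphabet)}
        A = len(alphabet)
        self.classes = sorted(set(labels))
        self.logT = {}
        for c in self.classes:
            counts = np.ones((A, A))            # Laplace smoothing
            for seq, lab in zip(seqs, labels):
                if lab != c:
                    continue
                for a, b in zip(seq, seq[1:]):  # count observed transitions
                    counts[self.alphabet[a], self.alphabet[b]] += 1
            self.logT[c] = np.log(counts / counts.sum(1, keepdims=True))
        return self

    def predict(self, seq):
        def ll(c):                              # log-likelihood under class c
            return sum(self.logT[c][self.alphabet[a], self.alphabet[b]]
                       for a, b in zip(seq, seq[1:]))
        return max(self.classes, key=ll)

# toy usage: alternating vs paired two-letter sequences
clf = MarkovSequenceClassifier().fit(["abababab", "aabbaabb"], [0, 1], "ab")
```

Discriminative training as described in the paper would then adjust these log-transition tables by gradient ascent on the conditional likelihood rather than stopping at the count-based estimates.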
Learning probabilistic models of tree edit distance
2008
Abstract

Cited by 11 (6 self)
Nowadays, there is a growing interest in machine learning and pattern recognition for tree-structured data. Trees actually provide a suitable structural representation to deal with complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years in order to improve its computational efficiency. However, used in its classical form, the tree ED needs a priori fixed edit costs which are often difficult to tune, leaving little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first one builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, providing then a discriminative model. To tackle …