
Discriminative, generative and imitative learning. (2002)

by T Jebara
Results 11 - 20 of 43

Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

by Franz Pernkopf, Jeff A. Bilmes, Russ Greiner
"... We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features. They are based on the conditional mutual information and classification rate (i.e., risk), respecti ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features, based on the conditional mutual information and the classification rate (i.e., risk), respectively. Given an ordering, we can find a discriminative structure with O(N^{k+1}) score evaluations (where the constant k is the tree-width of the sub-graph over the attributes). We present results on 25 data sets from the UCI repository, for phonetic classification using the TIMIT database, for a visual surface inspection task, and for two handwritten digit recognition tasks. We provide classification performance for both discriminative and generative parameter learning on both discriminatively and generatively structured networks. The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (greedy) Bayesian network learning approach, but does so with a ∼10-40× speedup. We also show that the advantages of generatively parameterized, discriminatively structured Bayesian network classifiers still hold in the case of missing features, a case where generative classifiers have an advantage over discriminative classifiers.
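As a rough illustration of the first of the two orderings, the sketch below greedily ranks discrete features by empirical (conditional) mutual information with the class variable. It is a simplification of mine, not the authors' exact procedure: it conditions only on the previously selected feature rather than a richer conditioning set, and it omits the structure search that follows.

    import numpy as np

    def mutual_information(x, y):
        """Empirical mutual information I(X;Y) in nats, for integer-coded arrays."""
        joint = np.zeros((x.max() + 1, y.max() + 1))
        for xi, yi in zip(x, y):
            joint[xi, yi] += 1
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    def conditional_mi(x, y, z):
        """I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z=z), estimated empirically."""
        return sum((z == v).mean() * mutual_information(x[z == v], y[z == v])
                   for v in np.unique(z))

    def order_features(X, c):
        """Greedily order the columns of X by (conditional) MI with the class c."""
        remaining = list(range(X.shape[1]))
        order = [max(remaining, key=lambda i: mutual_information(X[:, i], c))]
        remaining.remove(order[0])
        while remaining:
            prev = order[-1]
            nxt = max(remaining, key=lambda i: conditional_mi(X[:, i], c, X[:, prev]))
            order.append(nxt)
            remaining.remove(nxt)
        return order

Roughly, the point of such an order is that each attribute's candidate parents can then be restricted to its predecessors, which is what caps the number of score evaluations.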

Citation Context

...es. 3. Discriminative Learning in Generative Models A dichotomy exists between the two primary approaches to statistical pattern classifiers, generative and discriminative (Bishop and Lasserre, 2007; Jebara, 2001; Ng and Jordan, 2002; Bishop, 2006; Raina et al., 2004; Juang et al., 1997; Juang and Katagiri, 1992; Bahl et al., 1986). Under generative models, what is learned is a model of the joint probability ...

Nonparametric max-margin matrix factorization for collaborative prediction

by Minjie Xu, Jun Zhu, Bo Zhang - In Advances in Neural Information Processing Systems 25, 2012
"... Abstract We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margi ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages.
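For intuition about the objective being made Bayesian and nonparametric here, below is a heavily simplified fixed-rank subgradient sketch of max-margin matrix factorization itself: hinge loss on binary ratings plus Frobenius-norm regularization. The number of factors k is fixed by hand, which is precisely the choice the paper's nonparametric model removes; all names and hyperparameters are illustrative, not the authors' method.

    import numpy as np

    def mmmf_step(U, V, R, mask, lam=0.1, lr=0.01):
        """One subgradient step on hinge-loss matrix factorization.

        R    : matrix of +/-1 ratings; mask marks the observed entries.
        U, V : latent factors (n_users x k, n_items x k).
        Loss : sum over observed (i,j) of max(0, 1 - R_ij * (U V^T)_ij)
               + lam * (||U||_F^2 + ||V||_F^2)
        """
        scores = U @ V.T
        active = mask & (R * scores < 1.0)     # entries where the hinge is active
        G = np.where(active, -R, 0.0)          # subgradient w.r.t. the scores
        U_new = U - lr * (G @ V + 2 * lam * U)
        V_new = V - lr * (G.T @ U + 2 * lam * V)
        return U_new, V_new

    rng = np.random.default_rng(0)
    R = np.sign(rng.standard_normal((20, 30)))
    mask = rng.random((20, 30)) < 0.3
    U = 0.1 * rng.standard_normal((20, 5))
    V = 0.1 * rng.standard_normal((30, 5))
    for _ in range(200):
        U, V = mmmf_step(U, V, R, mask)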

Citation Context

...ides an elegant way to integrate discriminative max-margin learning and Bayesian generative modeling. In fact, MED subsumes SVM as a special case and has been extended to incorporate latent variables [5, 18] and perform structured output prediction [21]. Recent work has further extended MED to unite Bayesian nonparametrics and max-margin learning [20, 19], which have been largely treated as isolated topi...

Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling

by Dong-Qing Zhang, 2005
"... The automatic analysis and indexing of visual content in unconstrained domain are impor-tant and challenging problems for a variety of multimedia applications. Much of the prior research work deals with the problems by modeling images and videos as feature vectors, such as global histogram or block- ..."
Abstract - Cited by 6 (1 self) - Add to MetaCart
The automatic analysis and indexing of visual content in unconstrained domains are important and challenging problems for a variety of multimedia applications. Much of the prior research work deals with these problems by modeling images and videos as feature vectors, such as global histograms or block-based representations. Despite substantial research efforts on analysis and indexing algorithms based on this representation, their performance remains unsatisfactory. This dissertation attempts to explore the problem from a different perspective through a part-based representation, where images and videos are represented as a collection of parts with their appearance and relational features. Such a representation is partly motivated by human vision research showing that the human vision system adopts a similar mechanism to perceive images. Although part-based representation has been investigated for decades, most of the prior work has focused on ad hoc or deterministic approaches, which require manual design of the models and often perform poorly on real-world images or videos due to their inability to model uncertainty and noise. The main focus of this thesis instead is on incorporating statistical modeling and machine learning techniques into the ...
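As a plain illustration of the representation the dissertation argues for (the type and field names are mine, not the author's), an image becomes a set of parts plus pairwise relations rather than a single feature vector:

    from dataclasses import dataclass, field

    import numpy as np

    @dataclass
    class Part:
        appearance: np.ndarray           # e.g., a local descriptor of the region
        location: tuple[float, float]    # position of the part in the image

    @dataclass
    class PartBasedImage:
        parts: list[Part]
        # relational features between parts i and j (e.g., relative position
        # or appearance similarity), keyed by the index pair
        relations: dict[tuple[int, int], np.ndarray] = field(default_factory=dict)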

Confidence and Margin-Based MMI/MPE Discriminative Training for Offline Handwriting Recognition

by Philippe Dreuw, Georg Heigold, Hermann Ney - INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION, 2011
"... We present a novel confidence- and marginbased discriminative training approach for model adaptation of a hidden Markov model (HMM) based handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM syst ..."
Abstract - Cited by 6 (4 self) - Add to MetaCart
We present a novel confidence- and margin-based discriminative training approach for model adaptation of a hidden Markov model (HMM) based handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM systems that try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used to train writer-independent handwriting models. For model adaptation during decoding, an unsupervised confidence-based discriminative training on a word and frame level within a two-pass decoding process is proposed. The proposed methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by 33% relative compared to an ML trained baseline system. On the large-vocabulary line recognition task of the IAM English handwriting database, the word error rate is decreased by 25% relative.
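For reference, the MMI criterion underlying this training has, in broad strokes, the following shape (my notation; the paper's exact formulation, e.g. the scaling of acoustic and language-model scores, differs in details):

    F_MMI(\theta) = \sum_r \log \frac{p_\theta(X_r \mid W_r)\, P(W_r)}{\sum_W p_\theta(X_r \mid W)\, P(W)}

That is, the reference transcription W_r of each training observation sequence X_r must outscore all competing word sequences W, rather than merely fit well on its own as in ML training; the MPE criterion instead weights competing hypotheses by their phone-level accuracy.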

Selection of generative models in classification

by Guillaume Bouchard, Gilles Celeux - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
"... Abstract—This paper is concerned with the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is prop ..."
Abstract - Cited by 6 (1 self) - Add to MetaCart
This paper is concerned with the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is proposed. This criterion takes into account the decisional purpose of a model by minimizing the integrated classification entropy. It provides an interesting alternative to the cross-validated error rate, which is computationally expensive. The asymptotic behavior of the BEC criterion is presented. Numerical experiments on both simulated and real data sets show that BEC performs better than the BIC criterion at selecting a model that minimizes the classification error rate, and provides performance analogous to the cross-validated error rate. Index Terms: generative classification, integrated likelihood, integrated conditional likelihood, classification entropy, cross-validated error rate, AIC and BIC criteria.
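The identity behind such a criterion is the decomposition of the conditional (classification-relevant) likelihood into two joint-model quantities; roughly (my paraphrase of the construction, not the paper's exact definition),

    \log p(y \mid x, m) = \log p(x, y \mid m) - \log p(x \mid m)

and BEC approximates the right-hand side by plugging parameter estimates into the two terms, so that model selection targets the fit of p(y|x), which drives classification error, rather than the joint fit that BIC rewards.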

Citation Context

...ds to the plug-in classifier δ(X) = argmax_{k ∈ {1,...,K}} p(X, Y_k; m̂) (4), where m̂ is an estimator of the parameter m based on the training data. This approach is known as generative classification [20], [35]. The maximum-likelihood (ml) estimator based on the class-conditional distributions is a popular estimator in which the joint likelihood of the input x = (x_1, ..., x_n) and output y = (y_1, ..., y_n) ...

Confidence-Based Discriminative Training for Model Adaptation in Offline Arabic Handwriting Recognition

by Georg Heigold, Hermann Ney
"... We present a novel confidence-based discriminative training for model adaptation approach for an HMM based Arabic handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood trained HMM systems and try to adapt their mode ..."
Abstract - Cited by 5 (3 self) - Add to MetaCart
We present a novel confidence-based discriminative training approach for model adaptation of an HMM-based Arabic handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood trained HMM systems that try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Discriminative training based on the Maximum Mutual Information criterion is used to train writer-independent handwriting models. For model adaptation during decoding, an unsupervised confidence-based discriminative training on a word and frame level within a two-pass decoding process is proposed. Additionally, the training criterion is extended to incorporate a margin term. The proposed methods are evaluated on the IFN/ENIT Arabic handwriting database, where the proposed novel adaptation approach decreases the word error rate by 33% relative.
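The margin extension mentioned in the abstract can be sketched as a reweighting of competing hypotheses by their error. One generic rendering (mine, not necessarily the paper's exact form) modifies the MMI denominator to

    F(\theta) = \sum_r \log \frac{p_\theta(X_r \mid W_r)\, P(W_r)}{\sum_W p_\theta(X_r \mid W)\, P(W)\, e^{\rho A(W, W_r)}}

where A(W, W_r) measures the error of hypothesis W against the reference W_r and ρ sets the margin scale: high-error competitors must now be beaten by a correspondingly larger score gap, which is what connects the criterion to SVM-style training of log-linear models, as the citation context below notes.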

Citation Context

...native training using the Maximum Mutual Information (MMI) criterion, which is modified by a margin term. This margin term can be interpreted as an additional observation-dependent prior weakening the true prior [9], and is identical to the SVM optimization problem of log-linear models [7]. The most common way for unsupervised adaptation is the use of the automatic transcription of a previous recognition pass ...

Gini-support vector machine: Quadratic entropy based multi-class probability regression

by Shantanu Chakrabartty, Gert Cauwenberghs, Alex Smola - Journal of Machine Learning Research, 2007
"... Many classification tasks require estimation of output class probabilities for use as confidence scores or for inference integrated with other models. Probability estimates derived from large margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large margin c ..."
Abstract - Cited by 4 (2 self) - Add to MetaCart
Many classification tasks require estimation of output class probabilities for use as confidence scores or for inference integrated with other models. Probability estimates derived from large margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large margin classification to GiniSVM maximum entropy multi-class probability regression. GiniSVM combines a quadratic (Gini-Simpson) entropy based agnostic model with a kernel based similarity model. A form of Huber loss in the GiniSVM primal formulation elucidates a connection to robust estimation, further corroborated by the impulsive noise filtering property of the reverse water-filling procedure used to arrive at normalized classification margins. The GiniSVM normalized classification margins directly provide estimates of class conditional probabilities, approximating kernel logistic regression (KLR) but at reduced computational cost. As with other SVMs, GiniSVM produces a sparse kernel expansion and is trained by solving a quadratic program under linear constraints. GiniSVM training is efficiently implemented by sequential minimal optimization or by growth transformation on probability functions. Results on synthetic and benchmark data, including speaker verification and face detection data, show improved classification performance and increased tolerance to imprecision over soft-margin SVM and KLR.
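The quadratic (Gini-Simpson) entropy that replaces the Shannon entropy here is, for a class-probability vector p,

    H_G(p) = 1 - \sum_i p_i^2

Because this objective is quadratic in p rather than involving p log p, the maximum-entropy probability regression becomes a quadratic program of the familiar SVM kind and can retain a sparse kernel expansion, whereas the Shannon-entropy version recovers kernel logistic regression and inherits its non-sparsity, as the citation context below points out.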

Citation Context

... of this paper is to describe a unifying framework for SVM based classification that directly produces probability scores. Previous work in this area used Shannon entropy in a large margin framework (Jebara, 2001), which led directly to KLR and hence inherited its potential disadvantages of non-sparsity. One of the important contributions of the paper is exploration of links between maximum entropy based learn...

Learning Methods for Sequential Decision Making with Imperfect Representations

by Shivaram Kalyanakrishnan, 2011
"... ..."
Abstract - Cited by 4 (1 self) - Add to MetaCart
Abstract not found

Lower Bounds on the Redundancy of Natural Images

by Reshad Hosseini, Fabian Sinz, Matthias Bethge
"... The light intensities of natural images exhibit a high degree of redundancy. Knowing the exact amount of their statistical dependencies is important for biological vision as well as compression and coding applications but estimat-ing the total amount of redundancy, the multi-information, is intrinsi ..."
Abstract - Cited by 4 (3 self) - Add to MetaCart
The light intensities of natural images exhibit a high degree of redundancy. Knowing the exact amount of their statistical dependencies is important for biological vision as well as for compression and coding applications, but estimating the total amount of redundancy, the multi-information, is intrinsically hard. The common approach is to estimate the multi-information for patches of increasing sizes and divide by the number of pixels. Here, we show that the limiting value of this sequence, the multi-information rate, can be better estimated by using another limiting process based on measuring the mutual information between a pixel and a causal neighborhood of increasing size around it. Although in principle this method has been known for decades, its superiority for estimating the multi-information rate of natural images has not been fully exploited yet. Either method provides a lower bound on the multi-information rate, but the mutual information based sequence converges much faster to the multi-information rate than the conventional method does. Using this fact, we provide improved estimates of the multi-information rate of natural images and a better understanding of its underlying spatial structure.
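As a toy version of the causal-neighborhood estimator, the sketch below computes a Gaussian approximation of the mutual information between a pixel and a causal neighborhood of preceding pixels, via the variance reduction achieved by linear prediction. Everything here (the neighborhood shape, the Gaussian/linear assumption) is a simplification of mine; the paper's estimates rest on much richer density models.

    import numpy as np

    def causal_mi_gaussian(img, radius):
        """Gaussian estimate of I(pixel; causal neighborhood), in nats.

        The causal neighborhood of pixel (r, c) consists of the pixels in the
        preceding rows of the patch plus those to its left in the same row.
        """
        H, W = img.shape
        feats, targets = [], []
        for r in range(radius, H):
            for c in range(radius, W - radius):
                patch = img[r - radius:r + 1, c - radius:c + radius + 1].ravel()
                # drop the target pixel and the (non-causal) pixels to its right
                feats.append(patch[:-radius - 1])
                targets.append(img[r, c])
        X = np.asarray(feats)
        y = np.asarray(targets)
        X = np.hstack([X, np.ones((len(X), 1))])    # bias column
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        # for jointly Gaussian variables, I(X;Y) = 0.5 * log(Var(y) / Var(y|X))
        return 0.5 * np.log(y.var() / resid.var())

Growing the radius and watching the estimate rise mirrors the limiting process the abstract describes.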

Citation Context

...kelihood for the conditional distribution, which would be equivalent to minimizing the average log-loss of the conditional distribution. Based on Jebara’s work on conditional expectation maximization (Jebara, 2002) we developed a new algorithm (see Appendix II) that we used to optimize the conditional likelihood for the GSM model. The result of this optimization is shown in Fig. 6(a). In this way we obtained o...

Learning from Partially Labeled Data

by Martin Szummer, Tommi Jaakkola, Tomaso Poggio - Massachusetts Inst. of Technology, 2002
"... The Problem: Learning from data with both labeled training points (x,y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but theycannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabel ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
The Problem: Learning from data with both labeled training points (x, y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but they cannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabeled data distribution, but do not exploit the labels. Thus, this task falls between traditional supervised and unsupervised learning. Motivation: Supervised learning performance improves with larger training data sets. Unfortunately, it is often infeasible to obtain labels for large training sets. Assigning labels can require expensive resources such as human labor or laboratory tests. In some cases ground truth labels are impossible to obtain, e.g. if the necessary measurements can no longer be made, or if the labels will be given only in the future. In contrast, unlabeled training data is frequently easy to obtain in large quantities, and can outnumber the amount of labeled data by a large factor. For example, it is expensive to collect image databases of only faces, but it is cheap to collect arbitrary imagery with occasional faces, e.g. by crawling the world wide web, or by pointing a video camera out the window. There are also developmental motivations for studying the process of learning from partially labeled data. Children acquire language mainly by listening and imitating, with very limited feedback from adults. Human beings also excel at other partially labeled learning tasks, such as visual discrimination with hyperacuity [2]. Previous Work: Learning from partially labeled data is not well understood from the theoretical perspective. Labeled ...