Results 11 - 20 of 43
Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
"... We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features. They are based on the conditional mutual information and classification rate (i.e., risk), respecti ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features, based on the conditional mutual information and the classification rate (i.e., risk), respectively. Given an ordering, we can find a discriminative structure with O(N^{k+1}) score evaluations, where the constant k is the tree-width of the sub-graph over the attributes. We present results on 25 data sets from the UCI repository, for phonetic classification using the TIMIT database, for a visual surface inspection task, and for two handwritten digit recognition tasks. We provide classification performance for both discriminative and generative parameter learning on both discriminatively and generatively structured networks. The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves classification accuracy on par with the best discriminative (greedy) Bayesian network learning approach, but with a speedup of roughly 10-40x. We also show that the advantages of discriminatively structured generative Bayesian network classifiers still hold in the case of missing features, a case where generative classifiers have an advantage over discriminative ones.
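To make the ordering step concrete, here is a minimal sketch that ranks discrete features by their empirical mutual information with the class label. The paper's heuristic uses conditional mutual information (and a classification-rate variant), so this simplified unconditional score and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in nats for discrete arrays."""
    n = len(x)
    pxy = Counter(zip(x, y))            # joint counts
    px, py = Counter(x), Counter(y)     # marginal counts
    # I = sum over (a,b) of p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def order_features(X, y):
    """Rank the features of X (n_samples x N) by informativeness about y."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]     # most informative feature first
```

Given such an order, the paper's procedure then searches for a discriminative structure consistent with it, which is what bounds the number of score evaluations.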
Nonparametric max-margin matrix factorization for collaborative prediction
- In Advances in Neural Information Processing Systems 25, 2012
"... Abstract We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margi ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
We present a probabilistic formulation of max-margin matrix factorization and build on it a nonparametric Bayesian model that automatically resolves the unknown number of latent factors. Our work demonstrates a successful integration of Bayesian nonparametrics and max-margin learning, two conventionally separate paradigms with complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on the large-scale MovieLens and EachMovie data sets appear to justify these dual advantages.
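For context, the point-estimate objective that max-margin matrix factorization is built on can be written in a few lines. The sketch below shows the classical hinge-loss formulation for binary ratings, a quantity the paper's nonparametric Bayesian model generalizes; it is not the paper's model, and all names are illustrative.

```python
import numpy as np

def mmmf_objective(U, V, R, mask, C=1.0):
    """Classical max-margin matrix factorization objective (hinge loss).

    R holds binary ratings in {-1, +1}; mask marks observed entries.
    U (n_users x k) and V (n_items x k) are the latent factors whose
    number k the paper's nonparametric prior infers automatically."""
    margins = R * (U @ V.T)                  # signed margin at every cell
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss per cell
    reg = 0.5 * (np.sum(U ** 2) + np.sum(V ** 2))
    return reg + C * np.sum(hinge * mask)    # penalize observed cells only
```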
Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling
, 2005
"... The automatic analysis and indexing of visual content in unconstrained domain are impor-tant and challenging problems for a variety of multimedia applications. Much of the prior research work deals with the problems by modeling images and videos as feature vectors, such as global histogram or block- ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
The automatic analysis and indexing of visual content in unconstrained domains are important and challenging problems for a variety of multimedia applications. Much of the prior research deals with these problems by modeling images and videos as feature vectors, such as global histograms or block-based representations. Despite substantial research effort on analysis and indexing algorithms based on this representation, their performance remains unsatisfactory. This dissertation explores the problem from a different perspective through a part-based representation, where images and videos are represented as a collection of parts with their appearance and relational features. Such representation is partly motivated by human vision research showing that the human visual system adopts a similar mechanism to perceive images. Although part-based representation has been investigated for decades, most of the prior work has focused on ad hoc or deterministic approaches, which require manual design of the models and often perform poorly on real-world images or videos due to their inability to model uncertainty and noise. The main focus of this thesis instead is on incorporating statistical modeling and machine learning techniques into the
Confidence and Margin-Based MMI/MPE Discriminative Training for Offline Handwriting Recognition
- International Journal of Document Analysis and Recognition, 2011
"... We present a novel confidence- and marginbased discriminative training approach for model adaptation of a hidden Markov model (HMM) based handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM syst ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
We present a novel confidence- and margin-based discriminative training approach for adapting a hidden Markov model (HMM)-based handwriting recognition system to different handwriting styles and their variations. Most current approaches are maximum-likelihood (ML) trained HMM systems that try to adapt their models to different writing styles using writer-adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used to train writer-independent handwriting models. For model adaptation during decoding, we propose an unsupervised confidence-based discriminative training at the word and frame level within a two-pass decoding process. The proposed methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by 33% relative compared to an ML-trained baseline system. On the large-vocabulary line recognition task of the IAM English handwriting database, the word error rate is decreased by 25% relative.
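For reference, the MMI criterion that such training starts from has the standard form below, writing $X_r$ for the r-th training observation sequence, $W_r$ for its correct transcription, and $\lambda$ for the HMM parameters; the paper's margin and confidence extensions are omitted here.

\[
F_{\mathrm{MMI}}(\lambda) = \sum_{r} \log \frac{p_{\lambda}(X_r \mid W_r)\, P(W_r)}{\sum_{W} p_{\lambda}(X_r \mid W)\, P(W)}
\]

Maximizing this raises the posterior probability of the correct transcription against all competing word sequences, which is what makes the training discriminative rather than purely generative.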
Selection of generative models in classification
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
"... Abstract—This paper is concerned with the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is prop ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
This paper is concerned with the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is proposed. This criterion takes into account the decisional purpose of a model by minimizing the integrated classification entropy. It provides an interesting alternative to the cross-validated error rate, which is computationally expensive. The asymptotic behavior of the BEC criterion is presented. Numerical experiments on both simulated and real data sets show that BEC performs better than the BIC criterion at selecting a model that minimizes the classification error rate, and provides performance comparable to the cross-validated error rate. Index Terms: Generative classification, integrated likelihood, integrated conditional likelihood, classification entropy, cross-validated error rate, AIC and BIC criteria.
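A one-line identity helps show what such a criterion targets. Using the abstract's own notions of integrated likelihood and integrated conditional likelihood, the classification-relevant part of the fit separates as

\[
\log f(\mathbf{y} \mid \mathbf{x}) = \log f(\mathbf{x}, \mathbf{y}) - \log f(\mathbf{x}),
\]

so a decision-oriented criterion can score the integrated conditional likelihood instead of the joint fit that BIC rewards. This identity is only motivation; the precise definition of BEC is given in the paper.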
Confidence-Based Discriminative Training for Model Adaptation in Offline Arabic Handwriting Recognition
"... We present a novel confidence-based discriminative training for model adaptation approach for an HMM based Arabic handwriting recognition system to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood trained HMM systems and try to adapt their mode ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
We present a novel confidence-based discriminative training approach for model adaptation of an HMM-based Arabic handwriting recognition system, designed to handle different handwriting styles and their variations. Most current approaches are maximum-likelihood trained HMM systems that try to adapt their models to different writing styles using writer-adaptive training, unsupervised clustering, or additional writer-specific data. Discriminative training based on the maximum mutual information criterion is used to train writer-independent handwriting models. For model adaptation during decoding, we propose an unsupervised confidence-based discriminative training at the word and frame level within a two-pass decoding process. Additionally, the training criterion is extended to incorporate a margin term. The proposed methods are evaluated on the IFN/ENIT Arabic handwriting database, where the proposed adaptation approach decreases the word error rate by 33% relative.
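The two-pass scheme can be summarized in a few lines of pseudocode. The sketch below is a hypothetical outline under the assumption that a decoder, a word-level confidence scorer, and a discriminative update routine are available; every name here (decode, word_confidence, adapt) is a placeholder, not the paper's implementation.

```python
def two_pass_adaptation(model, pages, decode, word_confidence, adapt, thresh=0.9):
    """Hypothetical sketch of unsupervised confidence-based adaptation.

    Pass 1 decodes with the writer-independent model; words whose
    confidence clears the threshold become pseudo-labels for a
    discriminative (MMI-style) update. Pass 2 decodes with the
    adapted model. All callables are placeholders."""
    first_pass = [decode(model, x) for x in pages]
    pseudo_labels = [(x, w) for x, hyp in zip(pages, first_pass)
                     for w in hyp if word_confidence(model, x, w) >= thresh]
    adapted = adapt(model, pseudo_labels)   # e.g. MMI update on confident words
    return [decode(adapted, x) for x in pages]
```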
Gini-support vector machine: Quadratic entropy-based multi-class probability regression
- Journal of Machine Learning Research, 2007
"... Many classification tasks require estimation of output class probabilities for use as confidence scores or for inference integrated with other models. Probability estimates derived from large margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large margin c ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Many classification tasks require estimates of output class probabilities for use as confidence scores or for inference integrated with other models. Probability estimates derived from large-margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large-margin classification to GiniSVM maximum-entropy multi-class probability regression. GiniSVM combines a quadratic (Gini-Simpson) entropy-based agnostic model with a kernel-based similarity model. A form of Huber loss in the GiniSVM primal formulation elucidates a connection to robust estimation, further corroborated by the impulsive-noise filtering property of the reverse water-filling procedure used to arrive at normalized classification margins. The GiniSVM normalized classification margins directly provide estimates of class-conditional probabilities, approximating kernel logistic regression (KLR) at reduced computational cost. As with other SVMs, GiniSVM produces a sparse kernel expansion and is trained by solving a quadratic program under linear constraints. GiniSVM training is efficiently implemented by sequential minimal optimization or by growth transformation on probability functions. Results on synthetic and benchmark data, including speaker verification and face detection data, show improved classification performance and increased tolerance to imprecision compared with soft-margin SVM and KLR.
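The quadratic entropy at the core of GiniSVM is straightforward to compute. The snippet below shows the standard Gini-Simpson definition next to Shannon entropy for comparison; it illustrates the textbook quantity only, not the paper's training procedure.

```python
import numpy as np

def gini_simpson_entropy(p):
    """Quadratic (Gini-Simpson) entropy: 1 - sum_k p_k**2."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def shannon_entropy(p):
    """Shannon entropy in nats, for comparison."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Both are maximized by the uniform distribution, e.g. over 4 classes:
print(gini_simpson_entropy([0.25] * 4))   # 0.75
print(shannon_entropy([0.25] * 4))        # ~1.386 (= ln 4)
```

The quadratic form is what yields a quadratic program with linear constraints, keeping GiniSVM training as tractable as a standard SVM.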
Lower Bounds on the Redundancy of Natural Images
"... The light intensities of natural images exhibit a high degree of redundancy. Knowing the exact amount of their statistical dependencies is important for biological vision as well as compression and coding applications but estimat-ing the total amount of redundancy, the multi-information, is intrinsi ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
The light intensities of natural images exhibit a high degree of redundancy. Knowing the exact amount of their statistical dependencies is important for biological vision as well as for compression and coding applications, but estimating the total amount of redundancy, the multi-information, is intrinsically hard. The common approach is to estimate the multi-information for patches of increasing size and divide by the number of pixels. Here, we show that the limiting value of this sequence, the multi-information rate, can be better estimated by another limiting process based on measuring the mutual information between a pixel and a causal neighborhood of increasing size around it. Although in principle this method has been known for decades, its superiority for estimating the multi-information rate of natural images has not been fully exploited yet. Either method provides a lower bound on the multi-information rate, but the mutual-information-based sequence converges much faster to the multi-information rate than the conventional method does. Using this fact, we provide improved estimates of the multi-information rate of natural images and a better understanding of its underlying spatial structure.
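The connection between the two limiting processes follows from the chain rule for multi-information, a standard identity:

\[
I(X_1,\dots,X_N) \;=\; \sum_{i=1}^{N} H(X_i) - H(X_1,\dots,X_N) \;=\; \sum_{i=2}^{N} I\!\left(X_i;\, X_1,\dots,X_{i-1}\right),
\]

so dividing the patch multi-information by the number of pixels averages all terms of the sum, including the small early ones, while the mutual information between a pixel and a large causal neighborhood estimates the limiting summand directly, which is why that sequence converges faster.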
Learning from Partially Labeled Data
- Massachusetts Institute of Technology, 2002
"... The Problem: Learning from data with both labeled training points (x,y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but theycannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabel ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
The Problem: Learning from data with both labeled training points (x, y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but they cannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabeled data distribution, but do not exploit the labels. Thus, this task falls between traditional supervised and unsupervised learning. Motivation: Supervised learning performance improves with larger training sets. Unfortunately, it is often infeasible to obtain labels for large training sets. Assigning labels can require expensive resources such as human labor or laboratory tests. In some cases ground-truth labels are impossible to obtain, e.g., if the necessary measurements can no longer be made, or if the labels will be given only in the future. In contrast, unlabeled training data is frequently easy to obtain in large quantities and can outnumber the labeled data by a large factor. For example, it is expensive to collect image databases of only faces, but it is cheap to collect arbitrary imagery with occasional faces, e.g., by crawling the world wide web or by pointing a video camera out the window. There are also developmental motivations for studying the process of learning from partially labeled data. Children acquire language mainly by listening and imitating, with very limited feedback from adults. Human beings also excel at other partially labeled learning tasks, such as visual discrimination with hyperacuity [2]. Previous Work: Learning from partially labeled data is not well understood from a theoretical perspective. Labeled