Results 1 - 10
of
17
Efficient co-regularised least squares regression
- in ICML’06
, 2006
"... In many applications, unlabelled examples are inexpensive and easy to obtain. Semisupervised approaches try to utilise such examples to reduce the predictive error. In this paper, we investigate a semi-supervised least squares regression algorithm based on the co-learning approach. Similar to other ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
In many applications, unlabelled examples are inexpensive and easy to obtain. Semisupervised approaches try to utilise such examples to reduce the predictive error. In this paper, we investigate a semi-supervised least squares regression algorithm based on the co-learning approach. Similar to other semisupervised algorithms, our base algorithm has cubic runtime complexity in the number of unlabelled examples. To be able to handle larger sets of unlabelled examples, we devise a semi-parametric variant that scales linearly in the number of unlabelled examples. Experiments show a significant error reduction by co-regularisation and a large runtime improvement for the semi-parametric approximation. Last but not least, we propose a distributed procedure that can be applied without collecting all data at a single site. 1.
Learning to integrate web taxonomies
- Journal of Web Semantics
, 2004
"... We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be to build classifiers through machine learning and then use these classifiers to classify objects from the source taxonomies into categories of the master taxonomy. However, conventional machine learning algorithms totally ignore the availability of the source taxonomies. In fact, source and master taxonomies often have common categories under different names or other more complex semantic overlaps. We introduce two techniques that exploit the semantic overlap between the source and master taxonomies to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these proposed add-on techniques can enhance various machine learning algorithms to achieve substantial improvements in performance for taxonomy integration.
Multiview discriminative sequential learning
- Proceedings of the European Conference on Machine Learning
, 2005
"... 1 Introduction The problem of labeling observation sequences has applications that range fromlanguage processing tasks such as named entity recognition, part-of-speech tagging, and information extraction to biological tasks in which the instances areoften DNA strings. Traditionally, sequence models ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
1 Introduction The problem of labeling observation sequences has applications that range fromlanguage processing tasks such as named entity recognition, part-of-speech tagging, and information extraction to biological tasks in which the instances areoften DNA strings. Traditionally, sequence models such as the hidden Markov model and variants thereof have been applied to the label sequence learningproblem. Learning procedures for generative models adjust the parameters such that the joint likelihood of training observations and label sequences is maxi-mized. By contrast, from the application point of view the true benefit of a label sequence predictor corresponds to its ability to find the correct label sequencegiven an observation sequence.
A General Model for Multiple View Unsupervised Learning
, 2008
"... Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics and social network analysis. Since different representations could have very different statistical properties ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics and social network analysis. Since different representations could have very different statistical properties, how to learn a consensus pattern from multiple representations is a challenging problem. In this paper, we propose a general model for multiple view unsupervised learning. The proposed model introduces the concept of mapping function to make the different patterns from different pattern spaces comparable and hence an optimal pattern can be learned from the multiple patterns of multiple representations. Under this model, we formulate two specific models for
Bayesian Co-Training
"... We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fai ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach. 1
Estimation of mixture models using Co-EM
- In Proceedings of the ICML Workshop on Learning with Multiple Views
, 2005
"... We study estimation of mixture models for problems in which multiple views of the instances are available. Examples of this setting include clustering web pages or research papers that have intrinsic (text) and extrinsic (references) attributes. Our optimization criterion quantifies the likelihood a ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We study estimation of mixture models for problems in which multiple views of the instances are available. Examples of this setting include clustering web pages or research papers that have intrinsic (text) and extrinsic (references) attributes. Our optimization criterion quantifies the likelihood and the consensus among models in the individual views; maximizing this consensus minimizes a bound on the risk of assigning an instance to an incorrect mixture component. We derive an algorithm that maximizes this criterion. Empirically, we observe that the resulting clustering method incurs a lower cluster entropy than regular EM for web pages, research papers, and many text collections. 1.
Multi-View Hidden Markov Perceptrons
"... Discriminative learning techniques for sequential data have proven to be more effective than generative models for named entity recognition, information extraction, and other tasks of discrimination. ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Discriminative learning techniques for sequential data have proven to be more effective than generative models for named entity recognition, information extraction, and other tasks of discrimination.
ON SEMI-SUPERVISED KERNEL METHODS
"... Semi-supervised learning is an emerging computational paradigm for learning from limited supervision by utilizing large amounts of inexpensive, unsupervised observations. Not only does this paradigm carry appeal as a model for natural learning, but it also has an increasing practical need in most if ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Semi-supervised learning is an emerging computational paradigm for learning from limited supervision by utilizing large amounts of inexpensive, unsupervised observations. Not only does this paradigm carry appeal as a model for natural learning, but it also has an increasing practical need in most if not all applications of machine learning – those where abundant amounts of data can be cheaply and automatically collected but manual labeling for the purposes of training learning algorithms is often slow, expensive, and error-prone. In this thesis, we develop families of algorithms for semi-supervised inference. These algorithms are based on intuitions about the natural structure and geometry of probability distributions that underlie typical datasets for learning. The classical framework of Regularization in Reproducing Kernel Hilbert Spaces (which is the basis of state-of-the-art supervised algorithms such as SVMs) is extended in several ways to utilize unlabeled data. These extensions are embodied in the following contributions: (1) Manifold Regularization is based on the assumption that high-dimensional
Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
"... Learning from multi-view data is important in many applications, such as image classification and annotation. In this paper, we present a large-margin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent sp ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Learning from multi-view data is important in many applications, such as image classification and annotation. In this paper, we present a large-margin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent space Markov network that fulfills a weak conditional independence assumption that multi-view observations and response variables are independent given a set of latent variables. We provide efficient inference and parameter estimation methods for the latent subspace model. Finally, we demonstrate the advantages of large-margin learning on real video and web image data for discovering predictive latent representations and improving the performance on image classification, annotation and retrieval. 1
Discriminative Models for Semi-Supervised Natural Language Learning
"... An interesting question surrounding semisupervised learning for NLP is: should we use discriminative models or generative models? Despite the fact that generative models have been frequently employed in a semi-supervised setting since the early days of the statistical revolution in NLP, we advocate ..."
Abstract
- Add to MetaCart
An interesting question surrounding semisupervised learning for NLP is: should we use discriminative models or generative models? Despite the fact that generative models have been frequently employed in a semi-supervised setting since the early days of the statistical revolution in NLP, we advocate the use of discriminative models. The ability of discriminative models to handle complex, high-dimensional feature spaces and their strong theoretical guarantees have made them a very appealing alternative to their generative counterparts. Perhaps more importantly, discriminative models have been shown to offer competitive performance on a variety of sequential and structured learning tasks in NLP that are

