Results 1-10 of 175
Semi-Supervised Learning Literature Survey
2006
Cited by 757 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Large scale transductive SVMs
JMLR
Cited by 92 (5 self)
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at
Trading convexity for scalability
ICML06, 23rd International Conference on Machine Learning, 2006
Cited by 88 (3 self)
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how nonconvexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
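The abstract does not spell out the loss it uses, so the following is only a minimal sketch under an assumption: that the non-convex loss is a clipped hinge ("ramp") loss written as the difference of two convex hinge losses, which is the kind of convex-plus-concave split that concave-convex programming operates on. The function names and the clipping parameter `s = -1.0` are illustrative choices, not taken from the listing.

```python
def hinge(z, s=1.0):
    """Shifted hinge loss H_s(z) = max(0, s - z); convex in z."""
    return max(0.0, s - z)


def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z): a convex term plus a concave term.

    The value s = -1.0 is an assumed clipping point for illustration.
    """
    return hinge(z, 1.0) - hinge(z, s)


# Margins z = y * f(x); strongly negative values are badly misclassified points.
margins = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0]
losses = [ramp(z) for z in margins]
```

Because the ramp loss is flat (capped at `1 - s`) for margins below `s`, badly misclassified points contribute no gradient, which is consistent with the abstract's remark that training errors need no longer be support vectors.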
Large Scale Semi-supervised Linear SVMs
2006
Cited by 73 (9 self)
Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease of implementation, interpretability and empirical performance. In this work, we present a family of semi-supervised linear support vector classifiers that are designed to handle partially labeled sparse datasets with possibly very large number of examples and features. At their core, our algorithms employ recently developed modified finite Newton techniques. Our contributions in this paper are as follows: (a) We provide an implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets. (b) We propose a variant of TSVM that involves multiple switching of labels. Experimental results show that this variant provides an order of magnitude further improvement in training efficiency. (c) We present a new algorithm for semi-supervised learning based on a Deterministic Annealing (DA) approach. This algorithm alleviates the problem of local minimum in the TSVM optimization procedure while also being computationally attractive. We conduct an empirical study on several document classification tasks which confirms the value of our methods in large scale semi-supervised settings.
Optimization Techniques for Semi-Supervised Support Vector Machines
Cited by 66 (6 self)
Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a nonconvex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms is studied together, under a common experimental setting.
Deep learning via semi-supervised embedding
International Conference on Machine Learning, 2008
Cited by 66 (5 self)
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.
A novel transductive SVM for the semi-supervised classification of remote sensing images
IEEE Trans. Geoscience and Remote Sensing, 2006
Cited by 63 (8 self)
Abstract—This paper introduces a semi-supervised classification method that exploits both labeled and unlabeled samples for addressing ill-posed problems with support vector machines (SVMs). The method is based on recent developments in statistical learning theory concerning transductive inference and in particular transductive SVMs (TSVMs). TSVMs exploit specific iterative algorithms which gradually search a reliable separating hyperplane (in the kernel space) with a transductive process that incorporates both labeled and unlabeled samples in the training phase. Based on an analysis of the properties of the TSVMs presented in the literature, a novel modified TSVM classifier designed for addressing ill-posed remote-sensing problems is proposed. In particular, the proposed technique: 1) is based on a novel transductive procedure that exploits a weighting strategy for unlabeled patterns, based on a time-dependent criterion; 2) is able to mitigate the effects of suboptimal model selection (which is unavoidable in the presence of small-size training sets); and 3) can address multiclass cases. Experimental results confirm the effectiveness of the proposed method on a set of ill-posed remote-sensing classification problems representing different operative conditions. Index Terms—Ill-posed problems, labeled and unlabeled patterns, machine learning, remote sensing, semi-supervised classification, support vector machines (SVMs), transductive inference.
Multiclass multiple kernel learning
In ICML. ACM
Cited by 61 (4 self)
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.
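To make "linear combinations of kernels" concrete, here is a small sketch of evaluating a weighted sum of base kernels. It only illustrates the combined kernel itself, not the paper's optimization over the weights; the two base kernels, the `gamma` value, the sample points, and all function names are assumptions chosen for illustration. Weights are required to be non-negative, since a non-negative combination of positive semidefinite kernels is again positive semidefinite.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel; gamma = 0.5 is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(x, y, kernels, weights):
    """Weighted sum of base kernels with non-negative weights."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


kernels = [linear_kernel, rbf_kernel]
x, y = [1.0, 2.0], [0.5, -1.0]

# A sparse weight vector switches a kernel off entirely (kernel selection).
only_linear = combined_kernel(x, y, kernels, [1.0, 0.0])
mixed = combined_kernel(x, y, kernels, [0.3, 0.7])
```

With the weight vector `[1.0, 0.0]` the combination reduces to the linear kernel alone, which is the sense in which sparse coefficients turn kernel combination into kernel selection.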
Diffrac: a discriminative and flexible framework for clustering
In Advances in Neural Information Processing Systems 20, 2007
Cited by 54 (11 self)
We present a novel linear clustering framework (DIFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non-linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.
Image classification with segmentation graph kernels
In Proc. CVPR, 2007
Cited by 47 (12 self)
We propose a family of kernels between images, defined as kernels between their respective segmentation graphs. The kernels are based on soft matching of subtree-patterns of the respective graphs, leveraging the natural structure of images while remaining robust to the associated segmentation process uncertainty. Indeed, output from morphological segmentation is often represented by a labelled graph, each vertex corresponding to a segmented region, with edges joining neighboring regions. However, such image representations have mostly remained underused for learning tasks, partly because of the observed instability of the segmentation process and the inherent hardness of inexact graph matching with uncertain graphs. Our kernels count common virtual substructures amongst images, which enables to perform efficient supervised classification of natural images with a support vector machine. Moreover, the kernel machinery allows us to take advantage of recent advances in kernel-based learning: i) semi-supervised learning reduces the required number of labelled images, while ii) multiple kernel learning algorithms efficiently select the most relevant similarity measures between images within our family.