• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Semi-Supervised Classification by Low Density Separation (2005)

by Olivier Chapelle, Alexander Zien
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 77
Next 10 →

Semi-Supervised Learning Literature Survey

by Xiaojin Zhu , 2006
"... We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter ..."
Abstract - Cited by 268 (7 self) - Add to MetaCart
We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However the author plans to update the online version frequently to incorporate the latest development in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf

Manifold regularization: A geometric framework for learning from examples

by Mikhail Belkin, Partha Niyogi, Vikas Sindhwani, Peter Bartlett - Journal of Machine Learning Research , 2004
"... We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning al ..."
Abstract - Cited by 197 (12 self) - Add to MetaCart
We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including Support Vector Machines and Regularized Least Squares can be obtained as special cases. We utilize properties of Reproducing Kernel Hilbert spaces to prove new Representer theorems that provide theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches) we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally we have a brief discussion of unsupervised and fully supervised learning within our general framework. 1.

Large scale transductive svms

by Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou, Thorsten Joachims - JMLR
"... We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is a ..."
Abstract - Cited by 38 (2 self) - Add to MetaCart
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at

Trading convexity for scalability

by Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou - ICML06, 23rd International Conference on Machine Learning , 2006
"... Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show h ..."
Abstract - Cited by 33 (2 self) - Add to MetaCart
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how non-convexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs. 1.

Large Scale Semi-supervised Linear SVMs

by Vikas Sindhwani, et al. , 2006
"... Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease ..."
Abstract - Cited by 30 (6 self) - Add to MetaCart
Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease of implementation, interpretability and empirical performance. In this work, we present a family of semi-supervised linear support vector classifiers that are designed to handle partially-labeled sparse datasets with possibly very large number of examples and features. At their core, our algorithms employ recently developed modified finite Newton techniques. Our contributions in this paper are as follows: (a) We provide an implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets. (b) We propose a variant of TSVM that involves multiple switching of labels. Experimental results show that this variant provides an order of magnitude further improvement in training efficiency. (c) We present a new algorithm for semi-supervised learning based on a Deterministic Annealing (DA) approach. This algorithm alleviates the problem of local minimum in the TSVM optimization procedure while also being computationally attractive. We conduct an empirical study on several document classification tasks which confirms the value of our methods in large scale semisupervised settings.

Multiclass multiple kernel learning

by Alexander Zien, Cheng Soon Ong - In ICML. ACM
"... In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature ..."
Abstract - Cited by 26 (3 self) - Add to MetaCart
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets. 1.

Deep learning via semi-supervised embedding

by Jason Weston, Frédéric Ratle - International Conference on Machine Learning , 2008
"... We show how nonlinear embedding algorithms popular for use with shallow semisupervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to ..."
Abstract - Cited by 21 (3 self) - Add to MetaCart
We show how nonlinear embedding algorithms popular for use with shallow semisupervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques. 1.

A continuation method for semi-supervised svms

by Olivier Chapelle, Mingmin Chi, Alexander Zien - In International Conference on Machine Learning , 2006
"... Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However their main problem is that the optimization problem is non-convex and has many local minima, whic ..."
Abstract - Cited by 17 (3 self) - Add to MetaCart
Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification: their objective function favors decision boundaries which do not cut clusters. However their main problem is that the optimization problem is non-convex and has many local minima, which often results in suboptimal performances. In this paper we propose to use a global optimization technique known as continuation to alleviate this problem. Compared to other algorithms minimizing the same objective function, our continuation method often leads to lower test errors. 1.

Multiple Instance Learning for Sparse Positive Bags

by Razvan C. Bunescu, Raymond J. Mooney - In ICML , 2007
"... We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a ..."
Abstract - Cited by 17 (0 self) - Add to MetaCart
We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a positive bag is positive. Using both artificial and real-world data, we experimentally demonstrate that our approach achieves greater accuracy than state-of-the-art MIL methods when positive bags are sparse, and performs competitively when they are not. In particular, our approach is the best performing method for image region classification. 1.

Deterministic annealing for semi-supervised kernel machines

by Vikas Sindhwani, S. Sathiya Keerthi, Olivier Chapelle - In International Conference on Machine Learning , 2006
"... An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat unknown labels as additional optimization variables. For marginbased loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimi ..."
Abstract - Cited by 16 (7 self) - Add to MetaCart
An intuitive approach to utilizing unlabeled data in kernel-based classification algorithms is to simply treat unknown labels as additional optimization variables. For marginbased loss functions, one can view this approach as attempting to learn low-density separators. However, this is a hard optimization problem to solve in typical semisupervised settings where unlabeled data is abundant. The popular Transductive SVM algorithm is a label-switching-retraining procedure that is known to be susceptible to local minima. In this paper, we present a global optimization framework for semi-supervised Kernel machines where an easier problem is parametrically deformed to the original hard problem and minimizers are smoothly tracked. Our approach is motivated from deterministic annealing techniques and involves a sequence of convex optimization problems that are exactly and efficiently solved. We present empirical results on several synthetic and real world datasets that demonstrate the effectiveness of our approach. 1.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University