Learning Structural SVMs with Latent Variables
Cited by 215 (2 self)
It is well known in statistics and machine learning that the combination of latent (or hidden) variables and observed variables offers more expressive power than models with observed variables alone. Latent variables …
Large scale transductive SVMs
JMLR
Cited by 93 (5 self)
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides, for the first time, a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at …
Trading convexity for scalability
ICML '06, 23rd International Conference on Machine Learning, 2006
Cited by 90 (3 self)
Convex learning algorithms, such as Support Vector Machines (SVMs), are often seen as highly desirable because they offer strong practical properties and are amenable to theoretical analysis. However, in this work we show how nonconvexity can provide scalability advantages over convexity. We show how concave-convex programming can be applied to produce (i) faster SVMs where training errors are no longer support vectors, and (ii) much faster Transductive SVMs.
Maximum margin clustering made practical.
IEEE Transactions on Neural Networks, 2009
Learning on the border: Active learning in imbalanced data classification
In Proc. ACM Conf. on Information and Knowledge Management (CIKM '07), 2007
Cited by 51 (3 self)
This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning, which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that, with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
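The small-pool querying idea in this abstract can be sketched in a few lines (a simplification, not the paper's implementation; the function name, pool size, and the fixed linear scorer below are illustrative): instead of scanning the entire unlabeled set, draw a small random pool and query the instance closest to the current decision boundary, i.e. the one the classifier is least certain about.

```python
import random

def query_from_small_pool(unlabeled, decision_fn, pool_size=59):
    """Pick the next instance to label without scanning the full dataset:
    draw a small random pool, then return the pool member closest to the
    current decision boundary (smallest |f(x)|), i.e. the least certain one."""
    pool = random.sample(unlabeled, min(pool_size, len(unlabeled)))
    return min(pool, key=lambda x: abs(decision_fn(x)))

# Toy 1-D demo with a fixed linear scorer f(x) = x - 0.5: the query lands
# near the boundary at 0.5 even though only part of the data is scanned.
data = [i / 100 for i in range(100)]
x = query_from_small_pool(data, lambda v: v - 0.5)
```

Because each query touches only `pool_size` candidates, the cost per query is independent of the dataset size, which is what makes the approach applicable to very large datasets.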
Weakly-supervised hashing in kernel space
In Proceedings of Computer Vision and Pattern Recognition, 2010
Cited by 50 (4 self)
The explosive growth of vision data motivates recent studies on efficient data indexing methods such as locality-sensitive hashing (LSH). Most existing approaches perform hashing in an unsupervised way. In this paper we move one step forward and propose a supervised hashing method, the LAbel-regularized Max-margin Partition (LAMP) algorithm. The proposed method generates hash functions in a weakly-supervised setting, where a small portion of sample pairs are manually labeled as "similar" or "dissimilar". We formulate the task as a Constrained Convex-Concave Procedure (CCCP), which can be relaxed into a series of convex sub-problems solvable with efficient Quadratic Programming (QP). The proposed hashing method has further distinguishing characteristics: 1) most existing LSH approaches rely on linear feature representations; unfortunately, kernel tricks are often more natural for gauging the similarity between visual objects in vision research, which corresponds to possibly infinite-dimensional Hilbert spaces. The proposed LAMP has natural support for kernel-based feature representations. 2) Traditional hashing methods assume uniform data distributions: typically, the collision probability of two samples in hash buckets is determined only by pairwise similarity, unrelated to the contextual data distribution. In contrast, we provide a collision bound that goes beyond pairwise data interaction, based on Markov random field theory. Extensive empirical evaluations are conducted on five widely-used benchmarks. It takes only several seconds to generate a new hashing function, and the adopted random supporting-vector scheme makes the LAMP algorithm scalable to large-scale problems. Experimental results validate the superiority of the LAMP algorithm over state-of-the-art kernel-based hashing methods.
On the convergence of the concave-convex procedure
In NIPS Workshop on Optimization for Machine Learning, 2009
Cited by 48 (1 self)
The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms, such as sparse support vector machines (SVMs), transductive SVMs, and sparse principal component analysis. Though widely used in many applications, the convergence behavior of CCCP has received little specific attention. Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill's global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill's theory as a powerful and general framework for the convergence issues of iterative algorithms, having also been used to prove the convergence of algorithms such as expectation-maximization and generalized alternating minimization. We provide a rigorous analysis of the convergence of CCCP by addressing two questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
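The CCCP iteration this abstract analyzes can be sketched compactly: for f(x) = u(x) − v(x) with u and v convex, each step linearizes v at the current iterate and minimizes the resulting convex upper bound. The toy d.c. program below is illustrative (its convex subproblem has a closed form, so no solver is needed):

```python
def cccp(u_argmin, v_grad, x0, tol=1e-8, max_iter=100):
    """Generic CCCP loop for f(x) = u(x) - v(x), with u and v convex.
    u_argmin(g) solves the convex subproblem argmin_x u(x) - g*x;
    v_grad(x) returns a (sub)gradient of v at x."""
    x = x0
    for _ in range(max_iter):
        x_new = u_argmin(v_grad(x))  # minimize the convex majorizer at x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy d.c. program: f(x) = x**2 - 2*|x|, with local minima at x = +1 and -1.
u_argmin = lambda g: g / 2.0                 # argmin_x x**2 - g*x
v_grad = lambda x: 2.0 if x >= 0 else -2.0   # subgradient of v(x) = 2*|x|
x_star = cccp(u_argmin, v_grad, x0=0.3)      # converges to the minimum at +1
```

Each iterate solves one convex program, so the sequence of objective values is non-increasing; the questions the paper addresses are precisely when such a sequence reaches a stationary point and when the iterates themselves converge.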
Tighter and convex maximum margin clustering
In AISTATS, 2009
Cited by 41 (14 self)
The maximum margin principle has been successfully applied to many supervised and semi-supervised problems in machine learning. Recently, this principle was extended to clustering, referred to as Maximum Margin Clustering (MMC), and achieved promising performance in recent studies. To avoid the problem of local minima, MMC can be solved globally via a convex semidefinite programming (SDP) relaxation. Although many efficient approaches have been proposed to alleviate the computational burden of SDP, convex MMCs still do not scale to medium-sized data sets. In this paper, we propose a novel convex optimization method, LG-MMC, which maximizes the margin of opposite clusters via "label generation". We show that LG-MMC is much more scalable than existing convex approaches, and that our convex relaxation is tighter than state-of-the-art convex MMCs. Experiments on seventeen UCI datasets and the MNIST dataset show significant improvements over existing MMC algorithms.
Active policy learning for robot planning and exploration under uncertainty
In Proceedings of Robotics: Science and Systems, 2007
Cited by 39 (5 self)
This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially observed sequential decision processes. The algorithm is tested in the domain of robot navigation and exploration under uncertainty. In this setting, the expected cost to be minimized is a function of the belief state (filtering distribution). This filtering distribution is in turn nonlinear and subject to discontinuities, which arise because of constraints in the robot motion and control models. As a result, the expected cost is non-differentiable and very expensive to simulate. The new algorithm overcomes the first difficulty and reduces the number of required simulations as follows. First, it assumes that previous simulations have been carried out, returning values of the expected cost for different corresponding policy parameters. Second, it fits a Gaussian process (GP) regression model to these values, so as to approximate the expected cost as a function of the policy parameters. Third, it uses the GP predicted mean and variance to construct a statistical measure that determines which policy parameters should be used in the next simulation. The process is then repeated using the new parameters and the newly gathered expected-cost observation. Since the objective is to find the policy parameters that minimize the expected cost, this iterative active learning approach effectively trades off between exploration (in regions where the GP variance is large) and exploitation (where the GP mean is low). In our experiments, a robot uses the proposed algorithm to plan an optimal path for accomplishing a series of tasks, while maximizing the information about its pose and map estimates. These estimates are obtained with a standard filter for simultaneous localization and mapping. Upon gathering new observations, the robot updates the state estimates and is able to replan a new path in the spirit of open-loop feedback control.
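The fit/acquire/simulate loop described above can be sketched under strong simplifying assumptions: a 1-D policy parameter, a noise-free toy "simulator", an RBF kernel, and a lower-confidence-bound acquisition standing in for the paper's statistical measure. All names below are illustrative, not the paper's implementation.

```python
import math

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on scalars (lengthscale ls)."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(X, y, x, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at a test point x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x) for a in X]
    mean = sum(k * w for k, w in zip(k_star, solve(K, y)))
    var = rbf(x, x) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)

def next_policy_param(X, y, candidates, kappa=2.0):
    """Lower-confidence-bound acquisition: prefer low predicted cost
    (exploitation) and high predictive variance (exploration)."""
    def lcb(th):
        m, v = gp_posterior(X, y, th)
        return m - kappa * math.sqrt(v)
    return min(candidates, key=lcb)

# Toy "simulator": the expected cost is (theta - 0.7)**2, minimized at 0.7.
cost = lambda th: (th - 0.7) ** 2
X = [0.0, 0.5, 1.0]
y = [cost(th) for th in X]
grid = [i / 20 for i in range(21)]
for _ in range(5):  # active-learning loop: acquire, simulate, refit
    th = next_policy_param(X, y, grid)
    X.append(th)
    y.append(cost(th))
best = X[min(range(len(y)), key=lambda i: y[i])]
```

Each loop iteration spends exactly one simulation at the parameter the acquisition deems most informative, which is how the method keeps the number of expensive simulations small.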
Multi-instance multi-label learning
Artificial Intelligence
Cited by 38 (16 self)
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects, and thus cannot capture more information from them by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms: InsDif works by transforming single instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they achieve better performance than learning from the single-instance or single-label examples directly.
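The degeneration strategy behind MimlBoost and MimlSvm can be illustrated with a small data-structure sketch (a simplification; the actual algorithms differ in how each degenerated task is then solved, and the names below are illustrative): a MIML problem over a label set splits into one binary multi-instance problem per label.

```python
from dataclasses import dataclass, field

@dataclass
class MIMLExample:
    instances: list                           # a bag of feature vectors
    labels: set = field(default_factory=set)  # multiple class labels

def degenerate(examples, label_space):
    """Split one MIML task into one binary multi-instance task per label:
    each bag keeps its instances and gets a boolean target saying whether
    that label applies to the bag."""
    return {lbl: [(ex.instances, lbl in ex.labels) for ex in examples]
            for lbl in label_space}

# Two toy bags: an image tagged {sky, sea} and one tagged {sky} only.
bags = [
    MIMLExample(instances=[[0.1, 0.9], [0.8, 0.2]], labels={"sky", "sea"}),
    MIMLExample(instances=[[0.2, 0.7]], labels={"sky"}),
]
tasks = degenerate(bags, label_space=["sky", "sea"])
```

After this step, any off-the-shelf multi-instance learner can be trained per label; the information loss the abstract mentions comes from treating the labels of one bag independently, which is what D-MimlSvm avoids.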