Results 1  10
of
318
Efficient Additive Kernels via Explicit Feature Maps
"... Maji and Berg [13] have recently introduced an explicit feature map approximating the intersection kernel. This enables efficient learning methods for linear kernels to be applied to the nonlinear intersection kernel, expanding the applicability of this model to much larger problems. In this paper ..."
Abstract

Cited by 244 (9 self)
 Add to MetaCart
(Show Context)
Maji and Berg [13] have recently introduced an explicit feature map approximating the intersection kernel. This enables efficient learning methods for linear kernels to be applied to the nonlinear intersection kernel, expanding the applicability of this model to much larger problems. In this paper we generalize this idea, and analyse a large family of additive kernels, called homogeneous, in a unified framework. The family includes the intersection, Hellinger’s, and χ2 kernels commonly employed in computer vision. Using the framework we are able to: (i) provide explicit feature maps for all homogeneous additive kernels along with closed form expression for all common kernels; (ii) derive corresponding approximate finitedimensional feature maps based on the Fourier sampling theorem; and (iii) quantify the extent of the approximation. We demonstrate that the approximations have indistinguishable performance from the full kernel on a number of standard datasets, yet greatly reduce the train/test times of SVM implementations. We show that the χ2 kernel, which has been found to yield the best performance in most applications, also has the most compact feature representation. Given these train/test advantages we are able to obtain a significant performance improvement over current state of the art results based on the intersection kernel. 1.
MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
"... Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a maxmargin supervised topic model for both continuous and categorical response variab ..."
Abstract

Cited by 93 (27 self)
 Add to MetaCart
Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a maxmargin supervised topic model for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the maxmargin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction. We develop efficient variational methods for posterior inference and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihoodbased topic models on movie review and 20 Newsgroups data sets. 1.
Recognizing Human Actions from Still Images with Latent Poses
"... We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, ..."
Abstract

Cited by 86 (7 self)
 Add to MetaCart
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an adhoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results. 1.
Metric learning to rank
 In Proceedings of the 27th annual International Conference on Machine Learning (ICML
, 2010
"... We study metric learning as a problem of information retrieval. We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precisi ..."
Abstract

Cited by 60 (9 self)
 Add to MetaCart
We study metric learning as a problem of information retrieval. We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precisionatk, MRR, MAP or NDCG. We demonstrate experimental results on standard classification data sets, and a largescale online dating recommendation problem. 1.
Learning human activities and object affordances from rgbd videos. IJRR
, 2013
"... such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 subactivities on average, which includes reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challeng ..."
Abstract

Cited by 59 (16 self)
 Add to MetaCart
(Show Context)
such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 subactivities on average, which includes reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challenging task given the variability across individuals in performing each subactivity, and other environment induced conditions such as cluttered background and viewpoint changes. (See Fig. 2 for some examples.) In most previous works, object detection and activity recognition have been addressed as separate tasks. Only recently, some works have shown that modeling mutual context is beneficial (Gupta et al., 2009; Yao and FeiFei, 2010). The key idea in our work is to note that, in activity detection, it is sometimes more informative to know how an object is being used (associated affordances, Gibson, 1979) rather than knowing what the object is (i.e. the object category). For example, both chair and sofa might be categorized as ‘sittable, ’ and a cup might be categorized as both ‘drinkable ’ and ‘pourable. ’ Note that the affordances of an object change over time depending on its use, e.g., a pitcher may first be reachable, then movable and finally pourable. In addition to helping activity recognition, recognizing object affordances is important by itself because of their use in robotic applications (e.g., Kormushev et al., 2010; Jiang et al., 2012a; Jiang and Saxena, 2012). We propose a method to learn human activities by modarXiv:1210.1207v2
Stochastic blockcoordinate frankwolfe optimization for structural svms. arXiv preprint:1207.4747
, 2012
"... We propose a randomized blockcoordinate variant of the classic FrankWolfe algorithm for convex optimization with blockseparable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full FrankWolfe algorithm. We also show that, w ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
We propose a randomized blockcoordinate variant of the classic FrankWolfe algorithm for convex optimization with blockseparable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full FrankWolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the blockcoordinate FrankWolfe algorithm allows us to compute the optimal stepsize and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers. 1.
Learning hierarchical poselets for human parsing
 In CVPR’11
"... We consider the problem of human parsing with partbased models. Most previous work in partbased models only considers rigid parts (e.g. torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate for human parsing. In this paper, we ..."
Abstract

Cited by 52 (2 self)
 Add to MetaCart
(Show Context)
We consider the problem of human parsing with partbased models. Most previous work in partbased models only considers rigid parts (e.g. torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate for human parsing. In this paper, we introduce hierarchical poselets – a new representation for human parsing. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g. torso + left arm). In the extreme case, they can be the whole bodies. We develop a structured model to organize poselets in a hierarchical way and learn the model parameters in a maxmargin framework. We demonstrate the superior performance of our proposed approach on two datasets with aggressive pose variations. 1.
SelfPaced Learning for Latent Variable Models
, 2010
"... Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than conside ..."
Abstract

Cited by 51 (5 self)
 Add to MetaCart
Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative selfpaced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the selfpaced learning algorithm outperforms the state of the art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding
, 2010
"... We address the problem of understanding an indoor scene from a single image in terms of recovering the layouts of the faces (floor, ceiling, walls) and furniture. A major challenge of this task arises from the fact that most indoor scenes are cluttered by furniture and decorations, whose appearanc ..."
Abstract

Cited by 44 (1 self)
 Add to MetaCart
(Show Context)
We address the problem of understanding an indoor scene from a single image in terms of recovering the layouts of the faces (floor, ceiling, walls) and furniture. A major challenge of this task arises from the fact that most indoor scenes are cluttered by furniture and decorations, whose appearances vary drastically across scenes, and can hardly be modeled (or even handlabeled) consistently. In this paper we tackle this problem by introducing latent variables to account for clutters, so that the observed image is jointly explained by the face and clutter layouts. Model parameters are learned in the maximum margin formulation, which is constrained by extra prior energy terms that define the role of the latent variables. Our approach enables taking into account and inferring indoor clutter layouts without handlabeling of the clutters in the training set. Yet it outperforms the stateoftheart method of Hedau et al. [4] that requires clutter labels.
A unified framework for multitarget tracking and collective activity recognition
 In ECCV
, 2012
"... Abstract. We present a coherent, discriminative framework for simultaneously tracking multiple people and estimating their collective activities. Instead of treating the two problems separately, our model is grounded in the intuition that a strong correlation exists between a person’s motion, their ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
(Show Context)
Abstract. We present a coherent, discriminative framework for simultaneously tracking multiple people and estimating their collective activities. Instead of treating the two problems separately, our model is grounded in the intuition that a strong correlation exists between a person’s motion, their activity, and the motion and activities of other nearby people. Instead of directly linking the solutions to these two problems, we introduce a hierarchy of activity types that creates a natural progression that leads from a specific person’s motion to the activity of the group as a whole. Our model is capable of jointly tracking multiple people, recognizing individual activities (atomic activities), the interactions between pairs of people (interaction activities), and finally the behavior of groups of people (collective activities). We also propose an algorithm for solving this otherwise intractable joint inference problem by combining belief propagation with a version of the branch and bound algorithm equipped with integer programming. Experimental results on challenging video datasets demonstrate our theoretical claims and indicate that our model achieves the best collective activity classification results to date.