Results 1 -
9 of
9
Discriminative models for multi-class object layout
"... Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from ima ..."
Abstract
-
Cited by 51 (5 self)
- Add to MetaCart
Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image
A Sequential Dual Method for Large Scale Multi-Class Linear SVMs
, 2008
"... Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse thro ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
Efficient training of direct multi-class formulations of linear Support Vector Machines is very useful in applications such as text classification with a huge number examples as well as features. This paper presents a fast dual method for this training. The main idea is to sequentially traverse through the training set and optimize the dual variables associated with one example at a time. The speed of training is enhanced further by shrinking and cooling heuristics. Experiments indicate that our method is much faster than state of the art solvers such as bundle, cutting plane and exponentiated gradient methods.
Detecting actions, poses, and objects with relational phraselets. ECCV
, 2012
"... Abstract. We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model selfocclusions and interact ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract. We present a novel approach to modeling human pose, together with interacting objects, based on compositional models of local visual interactions and their relations. Skeleton models, while flexible enough to capture large articulations, fail to accurately model selfocclusions and interactions. Poselets and Visual Phrases address this limitation, but do so at the expense of requiring a large set of templates. We combine all three approaches with a compositional model that is flexible enough to model detailed articulations but still captures occlusions and object interactions. Unlike much previous work on action classification, we do not assume test images are labeled with a person, and instead present results for “action detection ” in an unlabeled image. Notably, for each detection, our model reports back a detailed description including an action label, articulated human pose, object poses, and occlusion flags. We demonstrate that modeling occlusion is crucial for recognizing human-object interactions. We present results on the PASCAL Action
A Note on Structural Extensions of SVMs
, 2009
"... This document is intended as an introduction to structural extensions of support vector machines for those who are familiar with logistic regression (binary and multinomial) and discrete-state probabilistic graphical models (in particular, conditional random fields). No prior knowledge about support ..."
Abstract
- Add to MetaCart
This document is intended as an introduction to structural extensions of support vector machines for those who are familiar with logistic regression (binary and multinomial) and discrete-state probabilistic graphical models (in particular, conditional random fields). No prior knowledge about support vector machines is assumed. The outline is as follows
Int J Comput Vis DOI 10.1007/s11263-011-0439-x Discriminative Models for Multi-Class Object Layout
, 2010
"... Abstract Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each c ..."
Abstract
- Add to MetaCart
Abstract Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands
Recognizing Proxemics in Personal Photos
"... Proxemics is the study of how people interact. We present a computational formulation of visual proxemics by attempting to label each pair of people in an image with a subset of physically-based “touch codes. ” A baseline approach would be to first perform pose estimation and then detect the touch c ..."
Abstract
- Add to MetaCart
Proxemics is the study of how people interact. We present a computational formulation of visual proxemics by attempting to label each pair of people in an image with a subset of physically-based “touch codes. ” A baseline approach would be to first perform pose estimation and then detect the touch codes based on the estimated joint locations. We found that this sequential approach does not perform well because pose estimation step is too unreliable for images of interacting people, due to difficulties with occlusion and limb ambiguities. Instead, we propose a direct approach where we build an articulated model tuned for each touch code. Each such model contains two people, connected in an appropriate manner for the touch code in question. We fit this model to the image and then base classification on the fitting error. Experiments show that this approach significantly outperforms the sequential baseline as well as other related approches. 1.
DOI 10.1007/s11263-011-0439-x Discriminative Models for Multi-Class Object Layout
"... Abstract Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each c ..."
Abstract
- Add to MetaCart
Abstract Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics. We formulate parameter estimation in our model as a max-margin learning problem. Given training images with The Marr Prize is awarded to the best paper(s) at the biannual flagship vision conference, the IEEE International Conference on Computer Vision (ICCV). This paper is an extended and re-reviewed journal version of the 2009 prize-winning conference paper.
Analyzing 3D Objects in Cluttered Images
"... We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to ..."
Abstract
- Add to MetaCart
We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

