Results 1 - 10 of 33
Evaluation of Super-Voxel Methods for Early Video Processing
"... Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a singl ..."
Abstract
-
Cited by 43 (7 self)
- Add to MetaCart
(Show Context)
Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study five supervoxel algorithms in the context of what we consider to be a good supervoxel: namely, spatiotemporal uniformity, object/region boundary detection, region compression and parsimony. For the evaluation we propose a comprehensive suite of 3D volumetric quality metrics to measure these desirable supervoxel characteristics. We use three benchmark video data sets with a variety of content types and varying amounts of human annotations. Our findings provide conclusive evidence that the hierarchical graph-based and segmentation-by-weighted-aggregation methods perform best, and almost equally well, on nearly all the metrics, and are the methods of choice under our proposed assumptions.
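As a concrete illustration of the kind of volumetric metric such an evaluation relies on, the sketch below computes a simple 3D boundary-recall score between a ground-truth label volume and a supervoxel labeling. The (T, H, W) layout, the tolerance radius r, and the function names are assumptions made for illustration, not the paper's actual benchmark code.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def boundary_mask(labels):
    """True where a voxel's label differs from its +t, +y or +x neighbour."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:-1, :, :] |= labels[:-1, :, :] != labels[1:, :, :]
    b[:, :-1, :] |= labels[:, :-1, :] != labels[:, 1:, :]
    b[:, :, :-1] |= labels[:, :, :-1] != labels[:, :, 1:]
    return b

def boundary_recall_3d(gt_labels, sv_labels, r=1):
    """Fraction of ground-truth boundary voxels within r voxels of a supervoxel boundary."""
    gt_b = boundary_mask(gt_labels)
    sv_b = boundary_mask(sv_labels)
    # Tolerance: dilate the supervoxel boundary by r voxels with a max filter.
    sv_b_dilated = maximum_filter(sv_b.astype(np.uint8), size=2 * r + 1).astype(bool)
    hits = np.logical_and(gt_b, sv_b_dilated).sum()
    return hits / max(gt_b.sum(), 1)
```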
SEEDS: Superpixels extracted via energy-driven sampling
- In European Conference on Computer Vision, ECCV (7), 2012
"... Abstract. Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color homogeneity. The optimization is accomplished by sophisticated methods that progressiv ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
(Show Context)
Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color homogeneity. The optimization is accomplished by sophisticated methods that progressively build the superpixels, typically by adding cuts or growing superpixels. As a result, they are computationally too expensive for real-time applications. We introduce a new approach based on a simple hill-climbing optimization. Starting from an initial superpixel partitioning, it continuously refines the superpixels by modifying the boundaries. We define a robust and fast-to-evaluate energy function based on enforcing color similarity between the boundaries and the superpixel color histogram. In a series of experiments, we show that we achieve an excellent compromise between accuracy and efficiency. We achieve performance comparable to the state of the art, but in real time on a single Intel i7 CPU at 2.8 GHz.
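A minimal sketch of the hill-climbing idea, assuming the image has been quantized into color bins and an initial grid labeling is given. Only pixel-level boundary moves driven by color-histogram agreement are shown; the block-level coarse-to-fine updates and connectivity checks of the full method are omitted, and the function and variable names are assumptions.

```python
import numpy as np

def seeds_like_refine(labels, bins, n_bins, iters=5):
    """Hill-climbing over boundary pixels; bins[y, x] is the color-bin index of each pixel."""
    h, w = labels.shape
    n_sp = labels.max() + 1
    hist = np.zeros((n_sp, n_bins), dtype=np.int64)       # color histogram per superpixel
    np.add.at(hist, (labels.ravel(), bins.ravel()), 1)
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                cur, b = labels[y, x], bins[y, x]
                if hist[cur].sum() <= 1:                  # never empty a superpixel
                    continue
                for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    cand = labels[ny, nx]
                    if cand == cur:
                        continue
                    # Move the pixel if its color bin is relatively more frequent
                    # in the neighbouring superpixel's histogram.
                    if hist[cand, b] / hist[cand].sum() > hist[cur, b] / hist[cur].sum():
                        hist[cur, b] -= 1
                        hist[cand, b] += 1
                        labels[y, x] = cand
                        break
    return labels
```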
Submodular Dictionary Learning for Sparse Coding
"... A greedy-based approach to learn a compact and discriminative dictionary for sparse representation is presented. We propose an objective function consisting of two components: entropy rate of a random walk on a graph and a discriminative term. Dictionary learning is achieved by finding a graph topol ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
A greedy approach to learning a compact and discriminative dictionary for sparse representation is presented. We propose an objective function consisting of two components: the entropy rate of a random walk on a graph and a discriminative term. Dictionary learning is achieved by finding a graph topology that maximizes the objective function. By exploiting the monotonicity and submodularity of the objective function and the matroid constraint, we present a highly efficient greedy optimization algorithm. It is more than an order of magnitude faster than several recently proposed dictionary learning approaches. Moreover, the greedy algorithm gives a near-optimal solution with a (1/2)-approximation bound. Our approach yields dictionaries with the property that feature points from the same class have very similar sparse codes. Experimental results demonstrate that our approach outperforms several recently proposed dictionary learning approaches.
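The optimization pattern described here is the standard greedy scheme for a monotone submodular objective under a matroid constraint. A generic skeleton is sketched below, with `gain` and `independent` standing in for the paper's entropy-rate-plus-discriminative objective and its matroid; these are placeholders, not the paper's code.

```python
def greedy_matroid(candidates, gain, independent):
    """Greedily build a set S, adding the feasible element with the largest marginal gain.

    gain(S, e)     -> marginal gain of adding e to S (monotone submodular objective)
    independent(S) -> True if S satisfies the matroid constraint
    """
    S, remaining = set(), set(candidates)
    while remaining:
        best, best_gain = None, 0.0
        for e in remaining:
            if not independent(S | {e}):
                continue
            g = gain(S, e)
            if g > best_gain:
                best, best_gain = e, g
        if best is None:          # no feasible element gives positive gain
            break
        S.add(best)
        remaining.discard(best)
    return S
```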
Image segmentation by cascaded region agglomeration
- In CVPR, 2013
"... We propose a hierarchical segmentation algorithm that starts with a very fine oversegmentation and gradually merges regions using a cascade of boundary classifiers. This approach allows the weights of region and boundary features to adapt to the segmentation scale at which they are applied. The stag ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
We propose a hierarchical segmentation algorithm that starts with a very fine oversegmentation and gradually merges regions using a cascade of boundary classifiers. This approach allows the weights of region and boundary features to adapt to the segmentation scale at which they are applied. The stages of the cascade are trained sequentially, with an asymmetric loss to maximize boundary recall. On six segmentation data sets, our algorithm achieves the best performance under most region-quality measures, and does so with fewer segments than prior work. Our algorithm is also highly competitive in the dense oversegmentation (superpixel) regime under boundary-based measures.
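One stage of such a cascade might look like the sketch below: boundaries of a region adjacency graph are scored by a trained classifier and the least boundary-like ones are merged first. The function names, the use of `predict_proba`, and the stopping threshold are assumptions; in particular, re-extracting edge features after each merge, which the real cascade requires, is omitted here.

```python
import heapq

def agglomerate_stage(regions, edges, edge_features, clf, merge_thresh=0.5):
    """Merge region pairs whose boundary the classifier scores as unlikely to be real."""
    heap = []
    for a, b in edges:
        # Assumes a binary classifier where class 1 means "true boundary".
        p_boundary = clf.predict_proba([edge_features(a, b)])[0][1]
        heapq.heappush(heap, (p_boundary, a, b))

    parent = {r: r for r in regions}          # union-find over region ids
    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    while heap:
        p, a, b = heapq.heappop(heap)
        if p >= merge_thresh:                 # remaining boundaries look real; stop
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                   # merge (features are not re-extracted here)
    return {r: find(r) for r in regions}
```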
Articulated Pose Estimation with Parts Connectivity using Discriminative Local Oriented Contours
"... This paper proposes contour-based features for artic-ulated pose estimation. Most of recent methods are de-signed using tree-structured models with appearance evalu-ation only within the region of each part. While these mod-els allow us to speed up global optimization in localizing the whole parts, ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
This paper proposes contour-based features for articulated pose estimation. Most recent methods are designed using tree-structured models, with appearance evaluated only within the region of each part. While these models allow fast global optimization when localizing all parts, useful appearance cues between neighboring parts are missed. Our work focuses on how to evaluate parts connectivity using contour cues. Unlike previous work, we locally evaluate parts connectivity only along the orientation between neighboring parts, within the region where they overlap. This adaptive localization of the features is required to suppress the adverse effects of nuisance edges, such as those of background clutter and clothing textures, and to reduce computational cost. Discriminative training of the contour features further improves estimation accuracy. Experimental results verify the effectiveness of our contour-based features.
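A plausible reading of the connectivity evaluation, sketched under assumptions: sample an oriented edge response along the line between two part centers, restricted to the overlap of their bounding boxes. The box representation and the `oriented_edges` callable are illustrative, not the authors' exact feature.

```python
import numpy as np

def box_center(box):
    y0, x0, y1, x1 = box
    return 0.5 * (y0 + y1), 0.5 * (x0 + x1)

def box_intersection(a, b):
    y0, x0 = max(a[0], b[0]), max(a[1], b[1])
    y1, x1 = min(a[2], b[2]), min(a[3], b[3])
    return (y0, x0, y1, x1) if y0 < y1 and x0 < x1 else None

def connectivity_score(oriented_edges, box_a, box_b, n_samples=20):
    """oriented_edges(y, x, theta) -> contour response at that point and orientation."""
    (ya, xa), (yb, xb) = box_center(box_a), box_center(box_b)
    theta = np.arctan2(yb - ya, xb - xa)        # orientation between the two parts
    overlap = box_intersection(box_a, box_b)
    if overlap is None:
        return 0.0
    scores = []
    for t in np.linspace(0.0, 1.0, n_samples):  # sample along the inter-part line
        y, x = ya + t * (yb - ya), xa + t * (xb - xa)
        if overlap[0] <= y <= overlap[2] and overlap[1] <= x <= overlap[3]:
            scores.append(oriented_edges(y, x, theta))
    return float(np.mean(scores)) if scores else 0.0
```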
Matching-CNN meets KNN: Quasi-parametric human parsing
- CVPR
"... Both parametric and non-parametric approaches have demonstrated encouraging performances in the human parsing task, namely segmenting a human image into sev-eral semantic regions (e.g., hat, bag, left arm, face). In this work, we aim to develop a new solution with the advan-tages of both methodologi ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Both parametric and non-parametric approaches have demonstrated encouraging performance on the human parsing task, namely segmenting a human image into several semantic regions (e.g., hat, bag, left arm, face). In this work, we aim to develop a new solution with the advantages of both methodologies, namely supervision from annotated data and the flexibility to use newly annotated (possibly uncommon) images, and present a quasi-parametric human parsing model. Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image. Given a testing image, we first retrieve its KNN images from the annotated/manually-parsed human image corpus. Then each semantic region in each KNN image is matched with confidence to the testing image using M-CNN, and the matched regions from all KNN images are further fused, followed by a superpixel smoothing procedure to obtain the ultimate human parsing result. The M-CNN differs from the classic CNN [12] in that tailored cross-image matching filters are introduced to characterize the matching between the testing image and the semantic region of a KNN image. The cross-image matching filters are defined at different convolutional layers, each aiming to capture a particular range of displacements. Comprehensive evaluations on a large dataset with 7,700 annotated human images demonstrate the significant performance gain of the quasi-parametric model over the state-of-the-art methods [29, 30] for the human parsing task.
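The overall pipeline can be summarized as the sketch below, where `retrieve_knn`, `mcnn_match`, and `superpixel_smooth` are placeholders for the KNN retrieval, the Matching CNN, and the smoothing step; they are assumptions standing in for the paper's components, not its API.

```python
import numpy as np

def parse_human(test_image, corpus, k=5, n_labels=18):
    """Quasi-parametric parsing: KNN exemplars vote through a matching CNN."""
    exemplars = retrieve_knn(test_image, corpus, k)          # annotated KNN images
    h, w = test_image.shape[:2]
    votes = np.zeros((n_labels, h, w), dtype=np.float32)
    for ex in exemplars:
        for label, region_mask in ex.semantic_regions():
            # M-CNN predicts how confidently (and where) this exemplar region
            # matches in the test image; warped_mask is the displaced region.
            confidence, warped_mask = mcnn_match(test_image, ex.image, region_mask)
            votes[label] += confidence * warped_mask
    fused = votes.argmax(axis=0)                             # fuse all matched regions
    return superpixel_smooth(fused, test_image)              # smooth along superpixels
```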
Road Boundary Detection in Challenging Scenarios
"... Abstract—This paper presents a new approach for automatic road detection in traffic cameras. The technique proposed here detects the dominant road boundary and estimates the vanishing point in images captured by traffic cameras under a wide range of lighting and environmental conditions, e.g., in im ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
This paper presents a new approach for automatic road detection in traffic cameras. The technique proposed here detects the dominant road boundary and estimates the vanishing point in images captured by traffic cameras under a wide range of lighting and environmental conditions, e.g., images of unlit highways captured at night. The approach starts by segmenting the traffic scene into a number of superpixel regions. The contours of these regions are used to generate a large number of edges, which are organized into collinear clusters using hierarchical bottom-up clustering. A confidence level is assigned to each cluster using a statistical approach and the best clusters are chosen. Pairs of clusters with high confidence levels are then ranked and filtered according to image perspective and activity. The top-ranked pair is selected as the road boundary. The proposed technique is tested on a real-world dataset collected from the Ontario 401 traffic surveillance system. Experimental results demonstrate a distinct speedup and an improvement in accuracy over the state-of-the-art Gabor-filter-based technique in detecting the dominant road boundary in challenging scenarios.
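A rough sketch of the collinear-grouping step, under the assumption that each contour edge segment is summarized by line parameters (orientation, offset) and that segments are merged bottom-up when their parameters are close. The thresholds and the simple running-mean clusters are illustrative, not the paper's statistical confidence model.

```python
import numpy as np

def line_params(p0, p1):
    """Undirected orientation and signed offset of the line through two points."""
    (y0, x0), (y1, x1) = p0, p1
    theta = np.arctan2(y1 - y0, x1 - x0) % np.pi
    rho = x0 * np.sin(theta) - y0 * np.cos(theta)   # perpendicular distance from origin
    return theta, rho

def cluster_collinear(segments, theta_tol=np.deg2rad(5.0), rho_tol=10.0):
    """Greedy bottom-up grouping; angle wrap-around at pi is ignored for brevity."""
    clusters = []                                    # each: [mean_theta, mean_rho, members]
    for seg in segments:
        theta, rho = line_params(*seg)
        for c in clusters:
            if abs(c[0] - theta) < theta_tol and abs(c[1] - rho) < rho_tol:
                n = len(c[2])
                c[0] = (c[0] * n + theta) / (n + 1)  # update running means
                c[1] = (c[1] * n + rho) / (n + 1)
                c[2].append(seg)
                break
        else:
            clusters.append([theta, rho, [seg]])
    # More supporting segments -> higher confidence that the cluster is a boundary.
    return sorted(clusters, key=lambda c: len(c[2]), reverse=True)
```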
Constraints as Features
"... In this paper, we introduce a new approach to constrained clustering which treats the constraints as features. Our method augments the original feature space with additional dimensions, each of which derived from a given Cannot-link constraints. The specified Cannot-link pair gets extreme coordinate ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
In this paper, we introduce a new approach to constrained clustering which treats the constraints as features. Our method augments the original feature space with additional dimensions, each derived from a given Cannot-link constraint. The specified Cannot-link pair gets extreme coordinate values, and the remaining points get coordinate values that reflect the spatial influence of the constrained pair on them. After all the new features have been added, a standard unconstrained clustering algorithm, such as k-means or spectral clustering, can be applied. We demonstrate the efficacy of our method for active semi-supervised learning applied to image segmentation and compare it to alternative methods. We also evaluate the performance of our method on the four most commonly evaluated datasets from the UCI machine learning repository.
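A minimal sketch of the augmentation, assuming one extra coordinate per Cannot-link pair that is +1 at one constrained point, -1 at the other, and interpolates smoothly elsewhere. The particular influence function (a normalized distance difference) and the weight are assumptions, not the paper's exact construction.

```python
import numpy as np

def augment_with_cannot_links(X, cannot_links, weight=1.0):
    """Append one coordinate per Cannot-link pair (a, b): +1 at a, -1 at b, smooth elsewhere."""
    extra = np.zeros((X.shape[0], len(cannot_links)))
    for j, (a, b) in enumerate(cannot_links):
        d_a = np.linalg.norm(X - X[a], axis=1)
        d_b = np.linalg.norm(X - X[b], axis=1)
        # Normalised distance difference: exactly +1 at point a and -1 at point b.
        extra[:, j] = (d_b - d_a) / (d_a + d_b + 1e-12)
    return np.hstack([X, weight * extra])

# Usage: augment, then run any unconstrained clusterer, e.g.
#   from sklearn.cluster import KMeans
#   X_aug = augment_with_cannot_links(X, [(3, 17), (42, 5)])
#   labels = KMeans(n_clusters=4, n_init=10).fit_predict(X_aug)
```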
Deep Hierarchical Parsing for Semantic Segmentation
- 2015
"... This paper proposes a learning-based approach to scene parsing inspired by the deep Re-cursive Context Propagation Network (RCPN). RCPN is a deep feed-forward neural network that utilizes the contextual information from the entire image, through bottom-up followed by top-down context propagation via ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
This paper proposes a learning-based approach to scene parsing inspired by the deep Recursive Context Propagation Network (RCPN). RCPN is a deep feed-forward neural network that utilizes contextual information from the entire image through bottom-up followed by top-down context propagation via random binary parse trees. This improves the feature representation of every super-pixel in the image for better classification into semantic categories. We analyze RCPN and propose two novel contributions to further improve the model. We first analyze the learning of RCPN parameters and discover the presence of bypass error paths in the computation graph of RCPN that can hinder contextual propagation. We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function. Second, we use an MRF on the parse-tree nodes to model the hierarchical dependency present in the output. Both modifications provide performance boosts over the original RCPN, and the new system achieves state-of-the-art performance on the Stanford Background, SIFT Flow and Daimler Urban datasets.
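A schematic of the first modification, under assumptions: the classification loss of internal parse-tree nodes is added to the leaf (super-pixel) loss so that gradients must flow through the context-propagation path. `classify`, `cross_entropy`, the internal-node labels, and the weighting `lam` are placeholders, not the authors' implementation; the MRF term is not shown.

```python
def rcpn_loss(tree, classify, cross_entropy, lam=0.5):
    """Original leaf loss plus a weighted classification loss on internal parse-tree nodes."""
    leaf_loss, internal_loss = 0.0, 0.0
    for node in tree.nodes():
        scores = classify(node.enhanced_feature)   # class scores from the node's feature
        if node.is_leaf():                         # super-pixel nodes: original RCPN loss
            leaf_loss += cross_entropy(scores, node.label)
        else:                                      # internal nodes: new term that blocks
            internal_loss += cross_entropy(scores, node.label)  # the bypass error paths
    return leaf_loss + lam * internal_loss
```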