Segmentation and recognition using structure from motion point clouds
In ECCV, 2008
Cited by 29 (8 self)
We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds and, unlike existing approaches, does not need appearance-based descriptors. Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that accurate segmentation and recognition are indeed possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving both qualitative and quantitative performance.
[Fig. 1 (panels: input video frame; reconstructed 3D point cloud; automatic segmentation). The proposed algorithm uses 3D point clouds estimated from videos such as the pictured driving sequence (with ground truth inset). Having trained on point clouds from other driving sequences, our new motion and structure features, based purely on the point cloud, perform 11-class semantic segmentation of each test frame. The colors in the ground truth and inferred segmentation indicate category labels.]
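Projecting 3D cues back to the 2D image plane can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with intrinsics K and a world-to-camera pose (R, t), and uses "height above the camera" as one hypothetical example cue. The function names `mat_vec` and `project_height_cue` are our own.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def project_height_cue(points: List[Vec3], K: Mat3, R: Mat3, t: Vec3,
                       width: int, height: int) -> Dict[Tuple[int, int], float]:
    """Project world points into the image and store a per-pixel height cue.

    Hypothetical cue: height of each point above the camera centre, i.e. -y in
    camera coordinates (y points down). If several points land on the same
    pixel, the last one wins.
    """
    cue_map: Dict[Tuple[int, int], float] = {}
    for X in points:
        # World -> camera coordinates: Xc = R X + t
        Xc = tuple(a + b for a, b in zip(mat_vec(R, X), t))
        if Xc[2] <= 0:
            continue  # point lies behind the camera
        # Camera -> homogeneous pixel coordinates: (u, v, w) = K Xc
        u, v, w = mat_vec(K, Xc)
        col, row = int(round(u / w)), int(round(v / w))
        if 0 <= col < width and 0 <= row < height:
            cue_map[(row, col)] = -Xc[1]
    return cue_map
```

The result is sparse, as the point clouds described above are; a real system would still need to turn such sparse samples into dense per-pixel features before a decision forest can use them.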
Object class segmentation using random forests
In BMVC, 2008
Cited by 13 (2 self)
This work investigates the use of Random Forests for class-based pixel-wise segmentation of images. The contribution of this paper is three-fold. First, we show that apparently quite dissimilar classifiers (such as nearest neighbour matching to texton class histograms) can be mapped onto a Random Forest architecture. Second, based on this insight, we show that the performance of such classifiers can be improved by incorporating the spatial context and discriminative learning that arise naturally in the Random Forest framework. Finally, we show that the ability of Random Forests to combine multiple features leads to a further increase in performance when textons, colour, filter banks, and HOG features are used simultaneously. The benefit of the multi-feature classifier is demonstrated through extensive experimentation on existing labelled image datasets, on which the method equals or exceeds the state of the art.
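One common way for a Random Forest to bring spatial context into pixel-wise segmentation is a split test on how many pixels carrying a given texton fall inside a rectangle offset from the pixel being classified. The sketch below assumes this feature form for illustration (it is not necessarily the paper's exact feature) and uses an integral image so each count costs four lookups. All names are hypothetical.

```python
from typing import List, Tuple


def texton_integral(texton_map: List[List[int]], texton: int) -> List[List[int]]:
    """Summed-area table of the indicator 'pixel has this texton', with a zero border."""
    h, w = len(texton_map), len(texton_map[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += 1 if texton_map[r][c] == texton else 0
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_count(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Texton count over rows [top, bottom) and cols [left, right), clipped to the image."""
    h, w = len(ii) - 1, len(ii[0]) - 1
    top, bottom = max(0, min(h, top)), max(0, min(h, bottom))
    left, right = max(0, min(w, left)), max(0, min(w, right))
    if top >= bottom or left >= right:
        return 0
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def split_test(ii: List[List[int]], pixel: Tuple[int, int],
               offset_rect: Tuple[int, int, int, int], threshold: int) -> bool:
    """True (go left) if the rectangle offset from `pixel` holds more than `threshold` textons."""
    r, c = pixel
    dr0, dc0, dr1, dc1 = offset_rect
    return rect_count(ii, r + dr0, c + dc0, r + dr1, c + dc1) > threshold
```

During training, each node would typically sample candidate (texton, rectangle, threshold) triples at random and keep the one that best separates the class labels; one integral image per texton is computed once per image and shared by all nodes.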
Learning to find object boundaries using motion cues
In IEEE International Conference on Computer Vision (ICCV), 2007
Cited by 13 (3 self)
While great strides have been made in detecting and localizing specific objects in natural images, the bottom-up segmentation of unknown, generic objects remains a difficult challenge. We believe that occlusion can provide a strong cue for object segmentation and “pop-out”, but detecting an object’s occlusion boundaries using appearance alone is a difficult problem in itself. If the camera or the scene is moving, however, that motion provides an additional powerful indicator of occlusion. Thus, we use standard appearance cues (e.g. brightness/color gradient) in addition to motion cues that capture subtle differences in the relative surface motion (i.e. parallax) on either side of an occlusion boundary. We describe a learned local classifier and global inference approach which provide a framework for combining and reasoning about these appearance and motion cues to estimate which region boundaries of an initial over-segmentation correspond to object/occlusion boundaries in the scene. Through results on a dataset which contains short videos with labeled boundaries, we demonstrate the effectiveness of motion cues for this task.
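The relative-motion (parallax) cue can be illustrated by comparing the average optical flow on either side of a candidate boundary: a large difference suggests that the two sides belong to surfaces at different depths. This is a hypothetical, simplified feature written for illustration, not the paper's learned classifier; it assumes a dense flow field and the pixel sets for each side are already available.

```python
import math
from typing import List, Sequence, Tuple

Flow = Tuple[float, float]


def mean_flow(flows: Sequence[Flow]) -> Flow:
    """Average (dx, dy) over a non-empty set of flow vectors."""
    n = len(flows)
    return (sum(f[0] for f in flows) / n, sum(f[1] for f in flows) / n)


def parallax_cue(flow_field: List[List[Flow]],
                 side_a: Sequence[Tuple[int, int]],
                 side_b: Sequence[Tuple[int, int]]) -> float:
    """Magnitude of the difference in mean flow between the two sides of a boundary.

    Both sides must be non-empty lists of (row, col) pixel coordinates.
    """
    ma = mean_flow([flow_field[r][c] for r, c in side_a])
    mb = mean_flow([flow_field[r][c] for r, c in side_b])
    return math.hypot(ma[0] - mb[0], ma[1] - mb[1])
```

A value like this would be one input among several (alongside appearance cues such as gradient strength) to a local boundary classifier.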
Moving Object Extraction with a Hand-held Camera
Cited by 8 (3 self)
This paper presents a new method to detect and accurately extract the moving object from a video sequence taken by a hand-held camera. To extract a high-quality moving foreground, previous approaches usually assume that the background is static or undergoes only a planar-perspective transformation. Our method, based on robust motion estimation, can handle challenging videos in which the background contains complex depth and the camera undergoes unknown motion. We propose an appearance and structure consistency constraint in 3D warping to robustly model the background, which greatly improves foreground separation, even at object boundaries. The estimated dense motion field and the bilayer segmentation are refined iteratively, alternating between continuous and discrete optimization. Experimental results of high-quality moving object extraction from challenging videos demonstrate the effectiveness of our method.
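Modeling the background by 3D warping amounts to predicting where a background pixel of one frame appears in another frame, given its depth and the relative camera motion. The sketch below assumes a pinhole camera (focal lengths fx, fy, principal point cx, cy), a relative rotation R and translation t from frame i to frame j, and a known depth; the color check is a hypothetical stand-in for the appearance-consistency idea, and the paper's actual constraint may differ.

```python
from typing import Optional, Sequence, Tuple


def warp_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float,
               R: Sequence[Sequence[float]], t: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Predict where pixel (u, v) of frame i, at the given depth, appears in frame j."""
    # Back-project (u, v) into frame i's camera coordinates
    X = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid motion into frame j's camera coordinates: Xj = R X + t
    Xj = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if Xj[2] <= 0:
        return None  # point falls behind camera j
    # Pinhole projection into frame j
    return (fx * Xj[0] / Xj[2] + cx, fy * Xj[1] / Xj[2] + cy)


def appearance_consistent(color_i: Sequence[float], color_j: Sequence[float],
                          tol: float = 20.0) -> bool:
    """Hypothetical check: colors agree within `tol` (Euclidean distance in RGB)."""
    return sum((a - b) ** 2 for a, b in zip(color_i, color_j)) ** 0.5 <= tol
```

A pixel whose warped color disagrees with the observed color in the other frame is then a candidate for the moving foreground rather than the background.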
Occlusion boundaries from motion: Low-level detection and mid-level reasoning
International Journal of Computer Vision, 2009
Cited by 6 (0 self)
The boundaries of objects in an image are often considered a nuisance to be “handled” due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks. While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on exploiting subtle relative-motion cues present at occlusion boundaries. We demonstrate that these cues, when combined with more standard appearance information, are useful for detecting occlusion boundaries locally. We also present a novel mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.
Incorporating on-demand stereo for real time recognition
In CVPR, 2007
Cited by 4 (3 self)
A new method for localising and recognising hand poses and objects in real time is presented. This problem is important in vision-driven applications where it is natural for a user to combine hand gestures and real objects when interacting with a machine, for example using a real eraser to remove words from a document displayed on an electronic surface. In this paper, the task of simultaneously recognising object classes and hand gestures and detecting touch events is cast as a single classification problem. A random forest algorithm is employed which adaptively selects and combines a minimal set of appearance, shape and stereo features to achieve maximum class discrimination for a given image. This minimal set leads to both efficiency at run time and good generalisation. Unlike previous stereo works, which explicitly construct disparity maps, here the stereo matching costs are used directly as a visual cue and are computed only on demand, i.e. only for pixels where they are necessary for recognition, which improves efficiency. The proposed method is assessed on a database of a variety of objects and hand poses selected for interacting on a flat surface in an office environment.
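Computing stereo costs only on demand can be sketched with lazy, cached evaluation: a matching cost for a given pixel and disparity is computed the first time a tree node queries it and reused afterwards. The cost used here (sum of absolute differences over a small window on a rectified pair) and the class name `OnDemandStereoCost` are assumptions for illustration, not the paper's exact formulation.

```python
from typing import Dict, List, Tuple


class OnDemandStereoCost:
    """Lazily computes and caches SAD matching costs for a rectified stereo pair."""

    def __init__(self, left: List[List[int]], right: List[List[int]], radius: int = 1):
        self.left, self.right, self.radius = left, right, radius
        self.cache: Dict[Tuple[int, int, int], int] = {}
        self.evaluations = 0  # number of costs actually computed

    @staticmethod
    def _pixel(img: List[List[int]], r: int, c: int) -> int:
        # Clamp coordinates at the image border
        r = min(max(r, 0), len(img) - 1)
        c = min(max(c, 0), len(img[0]) - 1)
        return img[r][c]

    def cost(self, r: int, c: int, disparity: int) -> int:
        """SAD between the window at (r, c) in `left` and at (r, c - disparity) in `right`."""
        key = (r, c, disparity)
        if key not in self.cache:
            self.evaluations += 1
            k = self.radius
            self.cache[key] = sum(
                abs(self._pixel(self.left, r + dr, c + dc)
                    - self._pixel(self.right, r + dr, c + dc - disparity))
                for dr in range(-k, k + 1) for dc in range(-k, k + 1))
        return self.cache[key]
```

A split node that tests stereo evidence would call `cost(...)` only for pixels that actually reach it, so pixels resolved by appearance or shape tests alone never pay for stereo matching.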
LIVEcut: Learning-based Interactive Video Segmentation by Evaluation of Multiple Propagated Cues
Cited by 4 (0 self)
Video sequences contain many cues that may be used to segment objects in them, such as color, gradient, color adjacency, shape, temporal coherence, camera and object motion, and easily trackable points. This paper introduces LIVEcut, a novel method for interactively selecting objects in video sequences by extracting and leveraging as much of this information as possible. Using a graph-cut optimization framework, LIVEcut propagates the selection forward frame by frame, allowing the user to correct any mistakes along the way. Enhanced methods of extracting many of the features are provided. To use the most accurate information from the various potentially conflicting features, each feature is automatically weighted locally based on its estimated accuracy in the previous, implicitly validated frame. Feature weights are further updated by learning from the user corrections required in the previous frame. The effectiveness of LIVEcut is shown through timing comparisons to other interactive methods, accuracy comparisons to unsupervised methods, and qualitatively through selections on various video sequences.
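Accuracy-based weighting of potentially conflicting cues can be sketched as follows: each cue's foreground probabilities are scored against the labels of the previous, validated frame, and those scores become weights in a weighted average for the current frame. Passing only the pixels of a local neighbourhood gives local weights. The 0.5 threshold, the fallback to uniform weights, and the function names are assumptions for illustration; the update from user corrections is not modeled.

```python
from typing import Dict, List


def cue_accuracy(prob: List[float], labels: List[int]) -> float:
    """Fraction of pixels where thresholding the cue at 0.5 matches the validated label."""
    correct = sum(1 for p, y in zip(prob, labels) if (p >= 0.5) == (y == 1))
    return correct / len(labels)


def combine_cues(prev_probs: Dict[str, List[float]], prev_labels: List[int],
                 cur_probs: Dict[str, List[float]]) -> List[float]:
    """Weighted average of per-cue foreground probabilities.

    Weights are each cue's accuracy on the previous frame; `cur_probs` must use
    the same cue names as `prev_probs`.
    """
    weights = {name: cue_accuracy(p, prev_labels) for name, p in prev_probs.items()}
    total = sum(weights.values())
    if total == 0:
        # No cue agreed with the validated labels; fall back to uniform weights
        weights = {name: 1.0 for name in cur_probs}
        total = float(len(cur_probs))
    n = len(next(iter(cur_probs.values())))
    return [sum(weights[name] * cur_probs[name][i] for name in cur_probs) / total
            for i in range(n)]
```

The combined probabilities would then feed the unary terms of the graph-cut energy for the current frame.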
Robust bilayer segmentation and motion/depth estimation with a handheld camera
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
Cited by 4 (2 self)
Extracting high-quality dynamic foreground layers from a video sequence is a challenging problem due to the coupling of color, motion, and occlusion. Many approaches assume that the background scene is static or undergoes a planar perspective transformation. In this paper, we relax these restrictions and present a comprehensive system for accurately computing object motion, layer, and depth information. A novel algorithm that combines different cues to extract the foreground layer is proposed, in which a voting-like scheme robust to outliers is employed in the optimization. The system can handle difficult examples in which the background is nonplanar and the camera moves freely during video capture. Our work has several applications, such as high-quality view interpolation and video editing.
Fast Multi-Class Image Annotation with Random Subwindows and Multiple Output Randomized Trees
Cited by 2 (1 self)
Keywords: image annotation, machine learning, decision trees, extremely randomized trees, structured outputs.
This paper addresses image annotation, i.e. labelling the pixels of an image with classes from a finite set of predefined classes. We propose a new method which extracts a sample of subwindows from a set of annotated images in order to train a subwindow annotation model, using the extremely randomized trees ensemble method appropriately extended to handle high-dimensional output spaces. A pixel of an unseen image is annotated by aggregating the annotations of the subwindows that contain it. The proposed method is compared to a more basic approach that predicts the class of a pixel from a single window centered on that pixel, and to other state-of-the-art image annotation methods. In terms of accuracy, the proposed method significantly outperforms the basic method and performs well relative to the state of the art, while being more generic, conceptually simpler, and more computationally efficient than those methods.
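The aggregation step can be illustrated independently of the tree ensemble: each subwindow prediction assigns a class-probability vector to every pixel it covers, and a pixel's final label is the class with the highest summed probability over all subwindows containing it. The predictions below would normally come from the multiple-output trees; here their data layout and the function name are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# A subwindow prediction: (top, left, per-pixel class probabilities for an h x w window)
SubwindowPred = Tuple[int, int, List[List[Sequence[float]]]]


def aggregate_annotations(preds: Sequence[SubwindowPred], height: int, width: int,
                          n_classes: int) -> List[List[int]]:
    """Sum class probabilities from all subwindows covering each pixel, then take the argmax."""
    scores = [[[0.0] * n_classes for _ in range(width)] for _ in range(height)]
    for top, left, probs in preds:
        for dr, row in enumerate(probs):
            for dc, p in enumerate(row):
                r, c = top + dr, left + dc
                if 0 <= r < height and 0 <= c < width:
                    for k in range(n_classes):
                        scores[r][c][k] += p[k]
    # Ties (and uncovered pixels) resolve to the lowest class index
    return [[max(range(n_classes), key=lambda k: scores[r][c][k]) for c in range(width)]
            for r in range(height)]
```

For example, with two overlapping 1 x 2 subwindows on a 1 x 3 image, the middle pixel receives votes from both, which is how overlapping subwindows smooth the final annotation.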
Skin Detection: A Random Forest Approach
Cited by 1 (1 self)
Skin detection is used in applications ranging from face detection, tracking body parts and hand gesture analysis to retrieval and blocking of objectionable content. For robust skin segmentation and detection, we investigate color classification based on random forests. A random forest is a statistical framework with very high generalization accuracy and quick training times. The random forest approach is used with the IHLS color space for raw pixel-based skin detection. We evaluate random forest-based skin detection and compare it with Bayesian network, Multilayer Perceptron, SVM, AdaBoost, Naive Bayes and RBF network classifiers. Results on a database of 8991 images with manually annotated pixel-level ground truth show that, with the IHLS color space, the random forest approach outperforms the other approaches. We also show the effect of increasing the number of trees grown for the random forest: with fewer trees we get faster training times, and with 10 trees we get the highest F-score.
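The IHLS color space separates luminance, saturation and hue, which gives a per-pixel feature vector a classifier can split on. The sketch below converts one RGB pixel using the commonly cited IHLS definitions as we recall them (Rec. 709 luminance weights, saturation as max minus min, and an arccos-based hue); treat the exact formulas as our assumption rather than something stated in this abstract. Hue is undefined for gray pixels and is returned as `None`.

```python
import math
from typing import Optional, Tuple


def rgb_to_ihls(r: float, g: float, b: float) -> Tuple[float, float, Optional[float]]:
    """Return (luminance Y, saturation S, hue H in degrees, or None for gray pixels)."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    s = max(r, g, b) - min(r, g, b)
    if s == 0:
        return y, s, None  # achromatic pixel: hue undefined
    num = r - g / 2 - b / 2
    # Equals sqrt(((r-g)^2 + (r-b)^2 + (g-b)^2) / 2), so it is > 0 whenever s > 0
    den = math.sqrt(r * r + g * g + b * b - r * g - r * b - g * b)
    # Clamp to [-1, 1] to guard against rounding before acos
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = 360.0 - theta if b > g else theta
    return y, s, h
```

Because hue is an angle, a raw-pixel classifier may do better with (cos H, sin H) than with H itself, so that 359 and 1 degrees end up close together; this is a design choice, not something the abstract specifies.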

