Unsupervised Object Discovery and Tracking in Video Collections (2015)
Citations
10600 | Introduction to algorithms
- Cormen, Leiserson, et al.
- 1995
Citation Context ...se according to their confidence scores. Only the selected regions are considered during optimization for efficiency. The objective of Eq.(1) is then efficiently optimized by dynamic programming (DP) =-=[6, 26]-=-. Note that using the p best tubes (p = 5 in all our experiments) for each video at each iteration (except the last one), instead of retaining only one candidate, increases the robustness of our appro... |
3735 | Histograms of oriented gradients for human detection. CVPR
- Dalal, Triggs
- 2005
Citation Context ...rance and geometric consistency. Assume two sets of region proposals have been extracted from vt and vu: Rt = R(vt) and Ru = R(vu). Let rt = (ft, lt) ∈ Rt be a region with its 8 × 8 HOG descriptor ft =-=[6, 13]-=- and its location lt, i.e., position and scale. The score for match m = (rt, ru) is decomposed into an appearance term ma = (ft, fu) and a geometry term mg = (lt, lu). Let x denote the location offset... |
839 | ImageNet: a large-scale hierarchical image database
- Deng, Dong
Citation Context ...ion. 1. Introduction Visual learning and interpretation is traditionally formulated as a supervised classification problem, with manually selected bounding boxes acting as (strong) supervisory signal =-=[7, 9]-=-. To reduce human effort and subjective biases in manual annotation, recent work has addressed the discovery and localization of objects from weakly-annotated or even unlabelled datasets [4, 5, 8, 26,... |
700 | Object tracking: A survey
- Yilmaz, Javed, et al.
- 2006
Citation Context ...to detect object candidates. Similar approaches have been proposed for salient region detection [16], image cosegmentation [31, 32], and image colocalization [4]. Conventional object tracking methods =-=[35]-=- usually require annotations for at least one frame [12, 14, 34], or object detectors trained for target classes in a supervised manner [1, 2, 22]. Our method does not require such supervision and ins... |
649 | The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results
- Everingham, Gool, et al.
- 2012
Citation Context ...ion. 1. Introduction Visual learning and interpretation is traditionally formulated as a supervised classification problem, with manually selected bounding boxes acting as (strong) supervisory signal =-=[7, 9]-=-. To reduce human effort and subjective biases in manual annotation, recent work has addressed the discovery and localization of objects from weakly-annotated or even unlabelled datasets [4, 5, 8, 26,... |
315 | Using multiple segmentations to discover objects and their extent in image collections - Russell, Efros, et al. - 2006 |
197 | Discovering object categories in image collections
- Sivic, Russell, et al.
- 2005
Citation Context ...g tube for each video. We dub this iterative process a discovery and tracking procedure since finding the tubes maximizing foreground confidence across videos is akin to unsupervised object discovery =-=[4, 10, 11, 24, 27]-=-, whereas finding the tubes maximizing temporal consistency within a video is similar to object tracking [1, 2, 12, 22, 34, 35]. Interestingly, because we update the matching neighborhood structure at... |
185 | Small codes and large image databases for recognition
- Torralba, Fergus, et al.
- 2008
Citation Context ...pdate the neighborhood structure N by k nearest neighbor retrieval for each localized object region. At the first iteration, the nearest neighbor search is based on distances between GIST descriptors =-=[30]-=- of frames as the tube r is initialized as the entire video. From the second iteration, the metric is defined as the appearance similarity between potential object regions localized at the previous it... |
145 | Object segmentation by long term analysis of point trajectories
- Brox, Malik
- 2010
Citation Context .... Prest et al. [23] generate spatio-temporal tubes of object candidates, and select one of these per video through energy minimization. Since the candidate tubes rely only on clusters of point tracks =-=[3]-=-, this ... [Figure 1 (panels: Object Discovery across Videos; Tracking Object within Video). Given a noisy collection of videos, dominant objects are automatically localized as spatio-temporal tubes. The discovery ...] |
111 | Struck: Structured output tracking with kernels.
- Hare, Saffari, et al.
- 2011
Citation Context ...n proposed for salient region detection [16], image cosegmentation [31, 32], and image colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame =-=[12, 14, 34]-=-, or object detectors trained for target classes in a supervised manner [1, 2, 22]. Our method does not require such supervision and instead alternates discovery and tracking of object candidates. The... |
93 | Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects - Pirsiavash, Ramanan, et al. - 2011 |
81 | Stable multi-target tracking in real-time surveillance video
- Benfold, Reid
- 2011
Citation Context ...ge colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame [12, 14, 34], or object detectors trained for target classes in a supervised manner =-=[1, 2, 22]-=-. Our method does not require such supervision and instead alternates discovery and tracking of object candidates. The problem we address is closely related to video object colocalization [15, 23], wh... |
78 | Online Multi-Person Tracking-by-Detection from a Single, Uncalibrated Camera.
- Breitenstein, Reichlin, et al.
- 2010
Citation Context ...ge colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame [12, 14, 34], or object detectors trained for target classes in a supervised manner =-=[1, 2, 22]-=-. Our method does not require such supervision and instead alternates discovery and tracking of object candidates. The problem we address is closely related to video object colocalization [15, 23], wh... |
77 | Online object tracking: A benchmark.
- Wu, Lim, et al.
- 2013
Citation Context ...n proposed for salient region detection [16], image cosegmentation [31, 32], and image colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame =-=[12, 14, 34]-=-, or object detectors trained for target classes in a supervised manner [1, 2, 22]. Our method does not require such supervision and instead alternates discovery and tracking of object candidates. The... |
74 | Unsupervised Learning of Categories from Sets of Partially Matching Image Features
- Grauman, Darrell
- 2006
Citation Context ...g tube for each video. We dub this iterative process a discovery and tracking procedure since finding the tubes maximizing foreground confidence across videos is akin to unsupervised object discovery =-=[4, 10, 11, 24, 27]-=-, whereas finding the tubes maximizing temporal consistency within a video is similar to object tracking [1, 2, 12, 22, 34, 35]. Interestingly, because we update the matching neighborhood structure at... |
69 | Discriminative decorrelation for clustering and classification.
- Hariharan, Malik, et al.
- 2012
Citation Context ...rance and geometric consistency. Assume two sets of region proposals have been extracted from vt and vu: Rt = R(vt) and Ru = R(vu). Let rt = (ft, lt) ∈ Rt be a region with its 8 × 8 HOG descriptor ft =-=[6, 13]-=- and its location lt, i.e., position and scale. The score for match m = (rt, ru) is decomposed into an appearance term ma = (ft, fu) and a geometry term mg = (lt, lu). Let x denote the location offset... |
60 | Key-segments for video object segmentation
- Lee, Kim, et al.
- 2011
Citation Context ...tion or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used [3, 19, 20], while assuming that points from the same object have similar tracks. In =-=[17, 21]-=-, appearances of potential object and background are modeled and combined with motion information for the task. These methods produce results for individual videos and do not investigate relationships... |
52 | Learning object class detectors from weakly annotated video
- Prest, Leistner, et al.
- 2012
Citation Context ...y still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent research has started to handle the similar task in videos =-=[15, 23, 25, 33]-=-, and has shown that exploiting the space-time structure of the world, which is absent in static images, e.g., motion information, may be crucial for achieving object discovery or localization with le... |
51 | Localizing objects while learning their appearance.
- Deselaers, Alexe, et al.
- 2010
Citation Context ...signal [7, 9]. To reduce human effort and subjective biases in manual annotation, recent work has addressed the discovery and localization of objects from weakly-annotated or even unlabelled datasets =-=[4, 5, 8, 26, 28]-=-. However, this task is difficult and most approaches today still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent ... |
34 | Unsupervised detection of regions of interest using iterative link analysis - Kim, Torralba - 2009 |
34 | Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions
- Ochs, Brox
- 2011
Citation Context ...cts in our experiments of Section 5.3. Our setting is also related to object segmentation or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used =-=[3, 19, 20]-=-, while assuming that points from the same object have similar tracks. In [17, 21], appearances of potential object and background are modeled and combined with motion information for the task. These ... |
25 | Prime object proposals with randomized Prim’s algorithm.
- Manen, Guillaumin, et al.
- 2013
Citation Context ...We consider a set of videos v, each consisting of T frames (images) vt (t = 1, ..., T), and denote by R(vt) a set of candidate regions identified in vt by some separate bottom-up proposal process =-=[18]-=-. Every region proposal is represented by a box in this paper. We also associate with vt a matching neighborhood N(vt) formed by the k closest frames wu among all videos w ≠ v, according to a robust ... |
25 | Higher order motion models and spectral clustering.
- Ochs, Brox
- 2012
Citation Context ...cts in our experiments of Section 5.3. Our setting is also related to object segmentation or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used =-=[3, 19, 20]-=-, while assuming that points from the same object have similar tracks. In [17, 21], appearances of potential object and background are modeled and combined with motion information for the task. These ... |
24 | Fast object segmentation in unconstrained video
- Papazoglou, Ferrari
- 2013
Citation Context ...tion or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used [3, 19, 20], while assuming that points from the same object have similar tracks. In =-=[17, 21]-=-, appearances of potential object and background are modeled and combined with motion information for the task. These methods produce results for individual videos and do not investigate relationships... |
24 | Discriminative segment annotation in weakly labeled video
- Tang, Sukthankar, et al.
- 2013
Citation Context ... objects they contain. Video object cosegmentation aims to segment a detailed mask of common object out of videos. This problem has been addressed with weak supervision such as object class per video =-=[29]-=- and additional labels for a few frames that indicate whether the frames contain target object or not [33]. 1.2. Proposed approach We consider a set of videos v, each consisting of T frames (images) v... |
15 | Clustering by composition: unsupervised discovery of image categories.
- Faktor, Irani
- 2013
Citation Context ...g tube for each video. We dub this iterative process a discovery and tracking procedure since finding the tubes maximizing foreground confidence across videos is akin to unsupervised object discovery =-=[4, 10, 11, 24, 27]-=-, whereas finding the tubes maximizing temporal consistency within a video is similar to object tracking [1, 2, 12, 22, 34, 35]. Interestingly, because we update the matching neighborhood structure at... |
13 | Multi-fold mil training for weakly supervised object localization.
- Cinbis, Verbeek, et al.
- 2014
Citation Context ...signal [7, 9]. To reduce human effort and subjective biases in manual annotation, recent work has addressed the discovery and localization of objects from weakly-annotated or even unlabelled datasets =-=[4, 5, 8, 26, 28]-=-. However, this task is difficult and most approaches today still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent ... |
13 | Efficient image and video co-localization with Frank-Wolfe algorithm.
- Joulin, Tang, et al.
- 2014
Citation Context ...y still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent research has started to handle the similar task in videos =-=[15, 23, 25, 33]-=-, and has shown that exploiting the space-time structure of the world, which is absent in static images, e.g., motion information, may be crucial for achieving object discovery or localization with le... |
13 | Image co-segmentation via consistent functional maps
- Wang, Huang, et al.
- 2013
Citation Context ... The discovery part establishes correspondences between frames across videos to detect object candidates. Similar approaches have been proposed for salient region detection [16], image cosegmentation =-=[31, 32]-=-, and image colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame [12, 14, 34], or object detectors trained for target classes in a supervise... |
12 | Spatio-temporal object detection proposals.
- Oneata, Revaud, et al.
- 2014
Citation Context ...k supervision such as an object label per video [33] and additional labels for a few frames that indicate whether the frames contain a target object or not [38]. Finally, spatio-temporal proposals of =-=[16, 24]-=- and action localization [39, 42] are relevant to our work as they also return spatio-temporal tubes as output. However, our method localizes an object through a single volume, whereas the proposals [... |
12 | Propagative Hough Voting for Human Activity Recognition
- Yu, Yuan, et al.
- 2012
Citation Context ...abel per video [33] and additional labels for a few frames that indicate whether the frames contain a target object or not [38]. Finally, spatio-temporal proposals of [16, 24] and action localization =-=[39, 42]-=- are relevant to our work as they also return spatio-temporal tubes as output. However, our method localizes an object through a single volume, whereas the proposals [16, 24] form a large number of hy... |
11 | Bayesian joint topic modelling for weakly supervised object localisation
- Shi, Hospedales, et al.
- 2013
Citation Context ...signal [7, 9]. To reduce human effort and subjective biases in manual annotation, recent work has addressed the discovery and localization of objects from weakly-annotated or even unlabelled datasets =-=[4, 5, 8, 26, 28]-=-. However, this task is difficult and most approaches today still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent ... |
10 | Generalized background subtraction based on hybrid inference by belief propagation and bayesian filtering
- Kwak, Lim, et al.
Citation Context ...gmentation or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used [3, 22, 23], assuming that points from the same object have similar tracks. In =-=[19, 20, 25, 35]-=-, the appearance of potential object and background regions is modeled and combined with motion information. These methods produce results for individual videos and do not investigate relationships be... |
8 | Action localization with tubelets from motion.
- Jain, Gemert, et al.
- 2014
Citation Context ...k supervision such as an object label per video [33] and additional labels for a few frames that indicate whether the frames contain a target object or not [38]. Finally, spatio-temporal proposals of =-=[16, 24]-=- and action localization [39, 42] are relevant to our work as they also return spatio-temporal tubes as output. However, our method localizes an object through a single volume, whereas the proposals [... |
5 | Orderless tracking through model-averaged posterior estimation
- Hong, Kwak, et al.
Citation Context ...n proposed for salient region detection [16], image cosegmentation [31, 32], and image colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame =-=[12, 14, 34]-=-, or object detectors trained for target classes in a supervised manner [1, 2, 22]. Our method does not require such supervision and instead alternates discovery and tracking of object candidates. The... |
5 | Video object proposals
- Sharir, Tuytelaars
Citation Context ...y still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent research has started to handle the similar task in videos =-=[15, 23, 25, 33]-=-, and has shown that exploiting the space-time structure of the world, which is absent in static images, e.g., motion information, may be crucial for achieving object discovery or localization with le... |
5 | Detecting human action as the spatio-temporal tube of maximum mutual information
- Wang, Wang, et al.
- 2014
Citation Context ...abel per video [33] and additional labels for a few frames that indicate whether the frames contain a target object or not [38]. Finally, spatio-temporal proposals of [16, 24] and action localization =-=[39, 42]-=- are relevant to our work as they also return spatio-temporal tubes as output. However, our method localizes an object through a single volume, whereas the proposals [16, 24] form a large number of hy... |
4 | Video object discovery and co-segmentation with extremely weak supervision.
- Wang, Hua, et al.
- 2014
Citation Context ...y still lay significantly behind strongly-supervised methods. With the ever growing popularity of video sharing sites such as YouTube, recent research has started to handle the similar task in videos =-=[15, 23, 25, 33]-=-, and has shown that exploiting the space-time structure of the world, which is absent in static images, e.g., motion information, may be crucial for achieving object discovery or localization with le... |
3 | Unsupervised multi-class joint image segmentation
- Wang, Huang, et al.
Citation Context ... The discovery part establishes correspondences between frames across videos to detect object candidates. Similar approaches have been proposed for salient region detection [16], image cosegmentation =-=[31, 32]-=-, and image colocalization [4]. Conventional object tracking methods [35] usually require annotations for at least one frame [12, 14, 34], or object detectors trained for target classes in a supervise... |
2 | Video segmentation with spatiotemporal tubes
- Trichet, Nevatia
- 2013
Citation Context ...gmentation or cosegmentation in videos. For video object segmentation, clusters of long-term point tracks have been used [3, 22, 23], assuming that points from the same object have similar tracks. In =-=[19, 20, 25, 35]-=-, the appearance of potential object and background regions is modeled and combined with motion information. These methods produce results for individual videos and do not investigate relationships be... |
1 | Unsupervised object discovery and localization in the wild: Part-based matching using bottom-up region proposals
- Cho, Kwak, et al.
Citation Context ...riments) for each video at each iteration except the last one, instead of retaining only one candidate at each iteration, increases the robustness of our approach. This agrees with the conclusions of =-=[4]-=- in the still image domain, and has also been confirmed empirically by our experiments. We obtain p best tubes by sequential DPs, which iteratively remove the best tube and re-run DP again.1 5. Implem... |
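Illustrative note: the citation contexts above mention that the p best tubes are obtained by sequential dynamic programs, which iteratively remove the best tube and re-run DP. The sketch below is one possible reading of that description, not the authors' implementation. The names `best_tube` and `p_best_tubes`, the score layout (`unary[t][i]` for region i of frame t, `pairwise(t, i, j)` for linking region i of frame t to region j of frame t + 1), and the choice to "remove" a tube by banning the regions it used are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

NEG_INF = float("-inf")
Tube = Tuple[List[int], float]  # (region index per frame, total score)


def best_tube(
    unary: Sequence[Sequence[float]],
    pairwise: Callable[[int, int, int], float],
    banned: Set[Tuple[int, int]],
) -> Optional[Tube]:
    """Viterbi-style DP: choose one region per frame so that the sum of
    unary scores and consecutive-frame pairwise scores is maximal.
    Assumes every frame has at least one candidate region."""
    num_frames = len(unary)
    # score[t][i]: best total score of a partial tube ending at region i of frame t
    score = [[NEG_INF] * len(u) for u in unary]
    back = [[-1] * len(u) for u in unary]
    for i, s in enumerate(unary[0]):
        if (0, i) not in banned:
            score[0][i] = s
    for t in range(1, num_frames):
        for j, s in enumerate(unary[t]):
            if (t, j) in banned:
                continue
            for i, prev in enumerate(score[t - 1]):
                if prev == NEG_INF:
                    continue
                cand = prev + pairwise(t - 1, i, j) + s
                if cand > score[t][j]:
                    score[t][j] = cand
                    back[t][j] = i
    end = max(range(len(score[-1])), key=lambda i: score[-1][i])
    total = score[-1][end]
    if total == NEG_INF:
        return None  # no complete tube is left
    # Follow back-pointers from the last frame to the first.
    path = [end]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, total


def p_best_tubes(
    unary: Sequence[Sequence[float]],
    pairwise: Callable[[int, int, int], float],
    p: int,
) -> List[Tube]:
    """Sequential DP: take the best tube, ban its regions, and re-run."""
    banned: Set[Tuple[int, int]] = set()
    tubes: List[Tube] = []
    for _ in range(p):
        result = best_tube(unary, pairwise, banned)
        if result is None:
            break
        tubes.append(result)
        banned.update(enumerate(result[0]))
    return tubes


if __name__ == "__main__":
    # Toy scores: 3 frames, 2 candidate regions each (made-up numbers).
    toy_unary = [[1.0, 0.5], [0.2, 0.9], [0.7, 0.6]]
    # Hypothetical temporal term: small penalty for switching region index.
    toy_pairwise = lambda t, i, j: 0.0 if i == j else -0.3
    for path, total in p_best_tubes(toy_unary, toy_pairwise, p=2):
        print(path, round(total, 3))
```

Banning whole regions is the simplest way to make the second run return a different tube; a softer variant could penalize overlap with earlier tubes instead, and the source snippet does not say which form is used.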