## Unsupervised Temporal Commonality Discovery

Citation Context ...mporal words for the subsequence in the interval [b1, e1]. Another notable benefit of the histogram representation is that it allows for fast recursive computation using the concept of integral image =-=[29]-=-. That is, for frame t, we accumulate the sum of ϕ A[1,t] of the histograms up to t. Using this structure, we can efficiently compute the histogram for any subsequence A[t1, t2] as ϕ A[t1,t2] = ϕ A[1,... |
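
The integral-image idea in this excerpt can be sketched in a few lines. This is a minimal illustration, not the authors' code: the excerpt is cut off before the full identity, so the sketch assumes the standard prefix-sum form, where the histogram of A[t1, t2] equals the cumulative histogram up to t2 minus the cumulative histogram up to t1 − 1. The function names and the 0-based word indices are hypothetical.

```python
def prefix_histograms(words, num_bins):
    """Return prefix, where prefix[t] is the histogram of the first t frames.

    prefix[0] is the all-zero histogram, so frames are effectively 1-indexed.
    """
    prefix = [[0] * num_bins]
    for w in words:
        row = prefix[-1][:]   # copy the previous cumulative histogram
        row[w] += 1           # count the temporal word of the current frame
        prefix.append(row)
    return prefix


def segment_histogram(prefix, t1, t2):
    """Histogram of the inclusive, 1-indexed segment [t1, t2] by subtraction."""
    return [hi - lo for hi, lo in zip(prefix[t2], prefix[t1 - 1])]


# Sequence A = [1, 2, 2, 1] from the excerpts, written with 0-based word indices.
words = [0, 1, 1, 0]
prefix = prefix_histograms(words, num_bins=3)
hist_1_3 = segment_histogram(prefix, 1, 3)
hist_2_4 = segment_histogram(prefix, 2, 4)
```

Building the prefix table costs one pass over the sequence; after that, every segment histogram costs one subtraction per bin regardless of segment length.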

Citation Context ...tate R from Q; end. Assign the optimal rectangle r∗ ← R. 3.3 Construction of a Bounding Function. Representation of signals: Throughout the paper we will use the Bag of Temporal Words (BoTW) model =-=[26,32]-=- to represent video segments. Observe that any features that can be discretized into histograms can fit into our framework. In BoTW the codebook is built using a clustering method (e.g., k-means) to ... |

Citation Context ...ences in a longer video. Let Q be the query sequence we want to find in the target video T. We can modify (1) by fixing one of the pairwise sequences: $\min_{b_t, e_t} d(\varphi_{T[b_t, e_t]}, \varphi_Q)$ s.t. $e_t - b_t \ge \ell$. =-=(14)-=- The problem now becomes simpler, but it is still an integer program. Nevertheless, Algorithm 1 can be applied again to find the optimal match efficiently. Searching for multiple segments can also ... |
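
To make the query-search objective in (14) concrete, here is a brute-force baseline. It is explicitly not Algorithm 1 (the branch-and-bound search the excerpt refers to); it simply enumerates every admissible segment of T and keeps the closest one. The function names, the choice of ℓ1 as the distance d, and the 0-based indexing are assumptions made for illustration, and the constraint is implemented literally as e − b ≥ ℓ.

```python
def histogram(words, num_bins):
    """Count how often each temporal word (0-based index) occurs."""
    counts = [0] * num_bins
    for w in words:
        counts[w] += 1
    return counts


def l1_distance(p, q):
    """Sum of absolute bin-wise differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def naive_query_search(target, query, num_bins, ell):
    """Exhaustively minimize d(hist(T[b..e]), hist(Q)) subject to e - b >= ell.

    Returns (distance, b, e) with 0-based inclusive indices, or None if no
    segment satisfies the constraint. Ties keep the first segment found.
    """
    q_hist = histogram(query, num_bins)
    best = None
    for b in range(len(target)):
        for e in range(b + ell, len(target)):
            d = l1_distance(histogram(target[b:e + 1], num_bins), q_hist)
            if best is None or d < best[0]:
                best = (d, b, e)
    return best


target = [1, 1, 0, 0, 2, 1]
query = [0, 0, 2]
best_match = naive_query_search(target, query, num_bins=3, ell=2)
```

The double loop visits a quadratic number of segments and rebuilds each histogram from scratch, which is the cost that a bounded search is meant to avoid.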

Citation Context ...nds, we show in the following exemplar constructions of bounds between histograms, i.e., ℓ1, intersection, and χ² distance, which have been widely applied to many tasks such as object recognition =-=[9,13]-=- and action recognition [7,11,14,16,22]. 1) Bounding ℓ1 distance: Applying the operators min/max on (2), we get $\min(h_b^-, k_b^-) \le \min(h_b, k_b) \le \min(h_b^+, k_b^+)$ and $\max(h_b^-, k_b^-) \le \max(h_b, k_b) \le \max(h_b^+, k_b^+)$. (4) Reordering both th... |
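
The min/max inequalities in (4) lead to per-bin bounds on the ℓ1 distance through the identity |h − k| = max(h, k) − min(h, k). The sketch below is one way to write that down; the excerpt is truncated before the paper's final expressions, so the exact form used by the authors may differ. It assumes unnormalized bin counts and takes the smallest-segment and largest-segment histograms (h⁻, h⁺, k⁻, k⁺) as given inputs; the function name is hypothetical.

```python
def l1_bounds(h_minus, h_plus, k_minus, k_plus):
    """Lower and upper bounds on sum_b |h_b - k_b|.

    Assumes h_minus[b] <= h_b <= h_plus[b] and k_minus[b] <= k_b <= k_plus[b]
    for every bin b (unnormalized counts).
    """
    lower = 0
    upper = 0
    for hm, hp, km, kp in zip(h_minus, h_plus, k_minus, k_plus):
        # max(h, k) >= max(h-, k-) and min(h, k) <= min(h+, k+);
        # the difference is clipped at zero because |h - k| is never negative.
        lower += max(0, max(hm, km) - min(hp, kp))
        # max(h, k) <= max(h+, k+) and min(h, k) >= min(h-, k-).
        upper += max(hp, kp) - min(hm, km)
    return lower, upper


# Candidate histograms for A range between [1, 0, 0] and [2, 2, 0];
# B is fixed at [2, 0, 1], so its lower and upper histograms coincide.
bounds = l1_bounds([1, 0, 0], [2, 2, 0], [2, 0, 1], [2, 0, 1])
```

For this input the true distances of the two candidates [2, 2, 0] and [1, 2, 0] to [2, 0, 1] are 3 and 4, both inside the returned interval.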

Citation Context ...these works detect motifs within only one sequence, but TCD considers two (or more) sequences. Moreover, it is unclear how these techniques can be robust to noise. The longest common subsequence (LCS) =-=[10,17,21]-=- is also related to TCD. The LCS problem consists of finding the longest subsequence that is common within a set of sequences (often just two) [21,31]. Closer to our work is the algorithm for longest ... |

Citation Context ...ed discovery of visual patterns in images has been a long-standing computer vision problem driven by applications to cosegmentation [8,15,20], learning grammars of images [34], detecting irregularity =-=[6]-=- and automatic tagging [23]. Although recently there have been several works on unsupervised discovery of visual patterns in images, a relatively unexplored problem in computer vision is to discover com... |

Citation Context ...exemplar constructions of bounds between histograms, i.e., ℓ1, intersection, and χ² distance, which have been widely applied to many tasks such as object recognition [9,13] and action recognition =-=[7,11,14,16,22]-=-. 1) Bounding ℓ1 distance: Applying the operators min/max on (2), we get $\min(h_b^-, k_b^-) \le \min(h_b, k_b) \le \min(h_b^+, k_b^+)$ and $\max(h_b^-, k_b^-) \le \max(h_b, k_b) \le \max(h_b^+, k_b^+)$. (4) Reordering both th... |

Citation Context ...art. We just want to illustrate the versatility of our approach. 5 Experimental Results We evaluated our approach on two experiments. First, we discovered common facial events in the RU-FACS database =-=[5]-=-. Second, we found multiple common human actions in the CMU-Mocap dataset [1]. The code is available at http://www.humansensing.cs.cmu.edu/software/tcd.html. 5.1 Common Facial Events Discovery This exper... |

Citation Context ...greater than ℓ. To show an example, consider two 1-D sequences A = [1, 2, 2, 1] and B = [1, 1, 3]. Suppose we use ℓ1 distance, set the minimal length ℓ = 3, and represent their 3-bin histograms as ϕ A=-=[1,4]-=- = [2, 2, 0], ϕ A[1,3] = [1, 2, 0] and ϕB = [2, 0, 1]. Hereby we can conclude by showing the distances: dℓ1(ϕ A[1,4], ϕB) = 3 < 4 = dℓ1 (ϕ A[1,3], ϕB). Differences from ESS [13] and STBB [32]: Althoug... |
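
The arithmetic in this example can be checked directly. The snippet below recomputes the 3-bin histograms and the two ℓ1 distances from the sequences given in the excerpt; the helper names are hypothetical, and values 1 to 3 are mapped to bins by subtracting one.

```python
def histogram(seq, num_bins):
    """Count occurrences of the values 1..num_bins in seq."""
    counts = [0] * num_bins
    for v in seq:
        counts[v - 1] += 1
    return counts


def l1_distance(p, q):
    """Sum of absolute bin-wise differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


A = [1, 2, 2, 1]
B = [1, 1, 3]

phi_A_1_4 = histogram(A, 3)      # whole sequence A[1,4]
phi_A_1_3 = histogram(A[:3], 3)  # first three frames A[1,3]
phi_B = histogram(B, 3)

d_long = l1_distance(phi_A_1_4, phi_B)   # |2-2| + |2-0| + |0-1| = 3
d_short = l1_distance(phi_A_1_3, phi_B)  # |1-2| + |2-0| + |0-1| = 4
```

The longer segment A[1,4] is closer to B than the length-3 segment A[1,3], which is why the optimal segment can exceed the minimal length ℓ.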

Citation Context ...ry. 1 Introduction Unsupervised discovery of visual patterns in images has been a long-standing computer vision problem driven by applications to cosegmentation [8,15,20], learning grammars of images =-=[34]-=-, detecting irregularity [6] and automatic tagging [23]. Although recently there have been several works on unsupervised discovery of visual patterns in images, a relatively unexplored problem in comput... |

Citation Context ...three billion possible matchings that need to be computed at different lengths and locations. Therefore, the naive approach is computationally prohibitive for reasonable length sequences. Inspired by =-=[13,32]-=- that used the branch and bound (B&B) algorithm to efficiently search for optimal image patches or video volumes, we propose to adopt B&B for searching simultaneously over all possible segments in eac... |

Citation Context ...| 2) Bounding intersection distance: Given two normalized histograms $\varphi_A = \{h_1, \ldots, h_K\}$ and $\varphi_B = \{k_1, \ldots, k_K\}$, we define their intersection distance by the Hilbert space representation =-=[24]-=-: $d_\cap(\varphi_A, \varphi_B) = -\sum_{b=1}^{K} \min(h_b, k_b)$. (10) By (3) and (4), we can find its lower bound and upper bound: $l_\cap(R) = -\sum_{b=1}^{K} \min\bigl(h_b^+/|A^-|,\ k_b^+/|B^-|\bigr)$ and $u_\cap(R) = -\sum_{b=1}^{K} \min(h_b^-/|A^+|,\ k_b^-/|B$... |
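
As a small illustration of the intersection distance in (10), the sketch below normalizes two count histograms and evaluates the negated sum of bin-wise minima. It covers only the distance itself, not the bounds, since the excerpt truncates the upper bound. The function names are hypothetical, and the histograms are those of A = [1, 2, 2, 1] and B = [1, 1, 3] from the worked example elsewhere in these excerpts.

```python
def normalize(counts):
    """Scale a count histogram so that its bins sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def intersection_distance(p, q):
    """Negated histogram intersection of two normalized histograms."""
    return -sum(min(a, b) for a, b in zip(p, q))


phi_A = normalize([2, 2, 0])  # [0.5, 0.5, 0.0]
phi_B = normalize([2, 0, 1])  # [2/3, 0.0, 1/3]

d_AB = intersection_distance(phi_A, phi_B)
d_AA = intersection_distance(phi_A, phi_A)
```

Under this definition the distance ranges from −1 for identical normalized histograms to 0 for histograms with no overlapping mass, so smaller values mean more similar segments.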

Citation Context ...es unsupervised search of commonalities in video sequences. Also, note that there are several studies that address the problem of event detection or sequence labeling of human actions in video (e.g., =-=[12,27,32]-=-). However, unlike TCD, those studies require learning a set of classifiers from training data. (2) Formulate the TCD as an integer optimization problem and propose an efficient B&B algorithm that fin... |

Citation Context ...ch video sequence (see Fig. 1). This study makes two main contributions: (1) Introduce the new problem of unsupervised TCD. While there exist studies that address commonality discovery in images =-=[8,15,20,30]-=-, to the best of our knowledge there is little work that tackles unsupervised search of commonalities in video sequences. Also, note that there are several studies that address the problem of event de... |

Citation Context ...l B&B is much more efficient than the naive search. Note that the optimal discovered sequences can be of length greater than ℓ. To show an example, consider two 1-D sequences A = [1, 2, 2, 1] and B = =-=[1, 1, 3]-=-. Suppose we use ℓ1 distance, set the minimal length ℓ = 3, and represent their 3-bin histograms as ϕ A[1,4] = [2, 2, 0], ϕ A[1,3] = [1, 2, 0] and ϕB = [2, 0, 1]. Hereby we can conclude by showing the... |

Citation Context ...and bound, temporal commonality discovery. 1 Introduction Unsupervised discovery of visual patterns in images has been a long-standing computer vision problem driven by applications to cosegmentation =-=[8,15,20]-=-, learning grammars of images [34], detecting irregularity [6] and automatic tagging [23]. Although recently there have been several works on unsupervised discovery of visual patterns in images, a relat... |

Citation Context ...terns in images has been a long-standing computer vision problem driven by applications to cosegmentation [8,15,20], learning grammars of images [34], detecting irregularity [6] and automatic tagging =-=[23]-=-. Although recently there have been several works on unsupervised discovery of visual patterns in images, a relatively unexplored problem in computer vision is to discover common temporal patterns among... |

Citation Context ...ifferent subjects. We represented features as the distances between the height of lips and teeth, angles for mouth corners and SIFT descriptors in the points tracked by Active Appearance Models (AAM) =-=[33]-=- (see Fig. 4(a) for an illustration). We built a 1,000-entry codebook on a random subset of 50,000 feature vectors (see Sec. 3.3). W.-S. Chu, F. Zhou, and F. De la Torre. SW1 SW2 SW3 (a) SR = √2, SS... |

Citation Context ... a dictionary of atomic actions. Zhou et al. [33] combined spectral clustering and dynamic time warping to cluster time series, and applied it to learn taxonomies of facial expressions. Turaga et al. =-=[28]-=- used extensions of switching linear dynamical systems for clustering human actions in video sequences. However, if we cluster two sequences that only have one segment in common, previous methods for ... |

Citation Context ...rs only similar segments and avoids the need to cluster the entire video, which is computationally expensive and prone to local minima. Another unsupervised technique related to TCD is motif detection =-=[18,19]-=-. Time series motif algorithms find repeated patterns within a single sequence. Minnen et al. [18] discovered motifs as high-density regions in the space of all subsequences. Mueen and Keogh [19] furt... |

Citation Context ... clustering algorithms for unsupervised discovery of human actions. Wang et al. [30] exploited deformable template matching of shape and context in static images to discover action classes. Si et al. =-=[25]-=- learned an event grammar by clustering event co-occurrence into a dictionary of atomic actions. Zhou et al. [33] combined spectral clustering and dynamic time warping to cluster time series, and appl... |

Citation Context ...y show that in general B&B is much more efficient than the naive search. Note that the optimal discovered sequences can be of length greater than ℓ. To show an example, consider two 1-D sequences A = =-=[1, 2, 2, 1]-=- and B = [1, 1, 3]. Suppose we use ℓ1 distance, set the minimal length ℓ = 3, and represent their 3-bin histograms as ϕ A[1,4] = [2, 2, 0], ϕ A[1,3] = [1, 2, 0] and ϕB = [2, 0, 1]. Hereby we can concl... |

Citation Context ...to noise. The longest common subsequence (LCS) [10,17,21] is also related to TCD. The LCS problem consists of finding the longest subsequence that is common within a set of sequences (often just two) =-=[21,31]-=-. Closer to our work is the algorithm for longest consecutive common subsequence (LCCS) [31] that finds the longest contiguous part of original sequences (e.g., videos). However, different from TCD, t... |