Recognizing human actions: A local SVM approach
In ICPR, 2004
Cited by 758 (20 self)
Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other related approaches for action recognition.
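As a rough illustration of how a video might be represented by its local features before SVM classification, the sketch below quantizes feature descriptors against a codebook and compares the resulting histograms with a kernel. The codebook, the chi-square kernel, and the names `feature_histogram` and `chi2_kernel` are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def feature_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments for one video.

    descriptors: (n_features, dim) array of local feature descriptors.
    codebook:    (n_words, dim) array of cluster centers (assumed given).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def chi2_kernel(h1, h2, gamma=1.0, eps=1e-12):
    """Exponential chi-square kernel between two histograms (an assumed choice)."""
    chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    return float(np.exp(-gamma * chi2))

# Hypothetical toy data: two codewords, three descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
video = np.array([[0.0, 1.0], [9.0, 9.0], [10.0, 11.0]])
hist = feature_histogram(video, codebook)
```

A Gram matrix of such kernel values over all training videos could then be passed to any SVM implementation that accepts precomputed kernels.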
A kernel between sets of vectors
In International Conference on Machine Learning, 2003
Cited by 130 (8 self)
In various application domains, including image recognition, it is natural to represent each example as a set of vectors. With a base kernel we can implicitly map these vectors to a Hilbert space and fit a Gaussian distribution to the whole set using Kernel PCA. We define our kernel between examples as Bhattacharyya’s measure of affinity between such Gaussians. The resulting kernel is computable in closed form and enjoys many favorable properties, including graceful behavior under transformations, potentially justifying the vector set representation even in cases when more conventional representations also exist.
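The closed-form affinity can be illustrated in a simplified setting. The sketch below fits a Gaussian to each vector set directly in input space and evaluates the Bhattacharyya coefficient between the two Gaussians. It omits the Hilbert-space mapping and the Kernel PCA step described in the abstract, and the function names and the regularization constant `reg` are assumptions made for this example.

```python
import numpy as np

def fit_gaussian(X, reg=1e-3):
    """Mean and regularized covariance of a set of row vectors."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    # The ridge term keeps the covariance invertible for small sets.
    cov = centered.T @ centered / X.shape[0] + reg * np.eye(X.shape[1])
    return mu, cov

def bhattacharyya_kernel(X, Y, reg=1e-3):
    """Bhattacharyya coefficient between Gaussians fitted to two vector sets."""
    mu1, S1 = fit_gaussian(X, reg)
    mu2, S2 = fit_gaussian(Y, reg)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    maha = diff @ np.linalg.solve(S, diff)
    # Log-determinants are used for numerical stability.
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    distance = 0.125 * maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(np.exp(-distance))
```

The value is 1 for two identical sets and decays toward 0 as the fitted Gaussians move apart.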
Learning over Sets using Kernel Principal Angles
Journal of Machine Learning Research, 2003
Cited by 106 (2 self)
We consider the problem of learning with instances defined over a space of sets of vectors. We derive a new positive definite kernel f(A, B) defined over pairs of matrices A, B based on the concept of principal angles between two linear subspaces. We show that the principal angles can be recovered using only inner products between pairs of column vectors of the input matrices, thereby allowing the original column vectors of A, B to be mapped onto arbitrarily high-dimensional feature spaces.
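The claim that principal angles depend only on inner products can be checked numerically in the linear case. The sketch below computes the cosines of the principal angles from the Gram matrices A^T A, A^T B and B^T B alone, and compares them with the usual QR-plus-SVD route. It assumes A and B have full column rank. The function names are made up for this example, and this is not the paper's kernel f(A, B) itself, only the angle computation it builds on.

```python
import numpy as np

def inv_sqrt_psd(G, eps=1e-12):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(G)
    w = np.clip(w, eps, None)
    return (V / np.sqrt(w)) @ V.T

def principal_cosines_from_gram(G_aa, G_ab, G_bb):
    """Cosines of principal angles using only inner products of columns."""
    M = inv_sqrt_psd(G_aa) @ G_ab @ inv_sqrt_psd(G_bb)
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def principal_cosines_qr(A, B):
    """Reference computation via orthonormal bases of the column spaces."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

Because the first function sees only Gram matrices, the plain inner products can in principle be swapped for kernel evaluations, which is the step that allows the columns to live in a high-dimensional feature space.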
Canonical correlation analysis of video volume tensors for action categorization and detection
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009
Cited by 58 (0 self)
This paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find a suitable representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool to inspect linear relations between two sets of vectors, to that of two multiway data arrays (or tensors). The proposed method analyzes video volumes as inputs, avoiding the difficult problem of explicit motion estimation required in traditional methods, and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification by a simple Nearest Neighbor classifier. We, moreover, propose an automatic action detection method, which performs 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data set showed that the proposed method is significantly better than various state-of-the-art methods with respect to accuracy. Our method has low time complexity and does not require any major tuning parameters.
Index Terms: Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
Feature Selection for Unsupervised and Supervised Inference: the Emergence of Sparsity in a Weighted-based Approach
School of Eng. and CS, June 2003. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003)
Cited by 57 (3 self)
The problem of selecting a subset of relevant features in a potentially overwhelming quantity of data is classic and found in many branches of science; examples in computer vision, text processing and, more recently, bioinformatics are abundant. In this work we present a definition of "relevancy" based on spectral properties of the Affinity (or Laplacian) of the features' measurement matrix. The feature selection process is then based on a continuous ranking of the features defined by a least-squares optimization process. A remarkable property of the feature relevance function is that sparse solutions for the ranking values naturally emerge as a result of a "biased non-negativity" of a key matrix in the process. As a result, a simple least-squares optimization process converges onto a sparse solution, i.e., a selection of a subset of features which forms a local maximum over the relevance function. The feature selection algorithm can be embedded in both unsupervised and supervised inference problems, and empirical evidence shows that the feature selections typically achieve high accuracy even when only a small fraction of the features are relevant.
A system identification approach for video-based face recognition
Proceedings of International Conference on Pattern Recognition, 2004
Cited by 51 (4 self)
The paper poses video-to-video face recognition as a dynamical system identification and classification problem. Video-to-video means that both gallery and probe consist of videos. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of ARMA model is based on its ability to take care of the change in appearance while modeling the dynamics of pose, expression etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression and illumination variation in the video data used for experiments.
Local velocity-adapted motion events for spatiotemporal recognition
CVIU, 2007
Cited by 45 (8 self)
In this paper we address the problem of motion recognition using event-based local motion representations. We assume that similar patterns of motion contain similar events with consistent motion across image sequences. Using this assumption, we formulate the problem of motion recognition as a matching of corresponding events in image sequences. To enable the matching, we present and evaluate a set of motion descriptors exploiting the spatial and the temporal coherence of motion measurements between corresponding events in image sequences. As motion measurements may depend on the relative motion of the camera, we also present a mechanism for local velocity adaptation of events and evaluate its influence when recognizing image sequences subjected to different camera motions. When recognizing motion, we compare the performance of a nearest neighbor (NN) classifier with the performance of a support vector machine (SVM). We also compare event-based motion representations to motion representations by global histograms. An experimental evaluation on a large video database with human actions demonstrates the advantage of the proposed scheme for event-based motion representation in combination with SVM classification. The particular advantage of event-based representations and velocity adaptation is further emphasized when recognizing human actions in unconstrained scenes with complex and nonstationary backgrounds.
Cue integration through discriminative accumulation
In Proc. CVPR’04
Cited by 39 (12 self)
Object recognition systems aiming to work in real world settings should use multiple cues in order to achieve robustness. We present a new cue integration scheme which extends the idea of cue accumulation to discriminative classifiers. We derive and test the scheme for Support Vector Machines (SVMs), but we also show that it is easily extendible to any large margin classifier. Interestingly, in the case of one-class SVMs, the scheme can be interpreted as a new class of Mercer kernels for multiple cues. Experimental comparison with a probabilistic accumulation scheme is favorable to our method. Comparison with a voting scheme shows that our method may suffer as the number of object classes increases. Based on these results, we propose a recognition algorithm consisting of a decision tree where decisions at each node are taken using our accumulation scheme. Results obtained using this new algorithm compare very favorably to accumulation (both probabilistic and discriminative) and voting schemes.
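One plausible reading of accumulating cues over discriminative classifiers is a weighted sum of per-cue decision values followed by an argmax over classes. The sketch below assumes each cue already has a trained classifier that outputs one decision value per class. The function name `accumulate_decisions`, the uniform default weights, and the toy scores are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def accumulate_decisions(cue_scores, weights=None):
    """Combine per-cue decision values and pick the winning class.

    cue_scores: (n_cues, n_classes) array; row p holds the decision values
                (e.g. SVM margins) that cue p assigns to each class.
    weights:    optional (n_cues,) array of cue weights; uniform if omitted.
    """
    scores = np.asarray(cue_scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    total = weights @ scores  # accumulated decision value per class
    return int(total.argmax()), total

# Hypothetical decision values from two cues over three classes.
scores = [[0.2, -0.1, 0.5],
          [0.9, -0.3, 0.1]]
label, total = accumulate_decisions(scores)
```

Changing the weights shifts how much each cue contributes, so a cue that is unreliable for a given task can be down-weighted without retraining the per-cue classifiers.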
A Framework for 3D Object Recognition Using the Kernel Constrained Mutual Subspace Method
In Proc. of ACCV, 2006
Cited by 18 (6 self)
This paper introduces the kernel constrained mutual subspace method (KCMSM) and provides a new framework for 3D object recognition by applying it to multiple view images. KCMSM is a kernel method for classifying a set of patterns. An input pattern x is mapped into the high-dimensional feature space F via a nonlinear function φ, and the mapped pattern φ(x) is projected onto the kernel generalized difference subspace, which represents the difference among subspaces in the feature space F. KCMSM classifies an input set based on the canonical angles between the input subspace and a reference subspace. This subspace is generated from the mapped patterns on the kernel generalized difference subspace, using principal component analysis. This framework is similar to conventional kernel methods using canonical angles; however, the method is different in that it includes a powerful feature extraction step for the classification of the subspaces in the feature space F by projecting the data onto the kernel generalized difference subspace. The validity of our method is demonstrated by experiments in a 3D object recognition task using multiview images.
Subspace distance analysis with application to adaptive Bayesian algorithm for face recognition
Pattern Recognition, 2006
Cited by 14 (1 self)
We propose subspace distance measures to analyze the similarity between intrapersonal face subspaces, which characterize the variations between face images of the same individual. We call the conventional intrapersonal subspace Average Intrapersonal Subspace (AIS) because the image differences often come from a large number of persons. An intrapersonal subspace is referred to as Specific Intrapersonal Subspace (SIS) if the image differences are from just one person. We demonstrate that SIS varies significantly from person to person, and most SISs are not similar to AIS. Based on these observations, we introduce the maximum a posteriori (MAP) adaptation to the problem of SIS estimation, and apply it to the Bayesian face recognition algorithm. Experimental results show that the adaptive Bayesian algorithm outperforms the nonadaptive Bayesian algorithm as well as Eigenface and Fisherface methods if a small number of adaptation images are available.