Wilson, A., Bobick, A.: Learning visual behavior for gesture analysis. In: Proceedings of the IEEE Symposium on Computer Vision, Florida, USA (1995)
....the computer vision community to address action recognition problems in which time variation is significant. Starner and Pentland [6] have developed a real-time HMM-based system for recognizing sentence-level American Sign Language (ASL) without explicitly modeling the fingers. Wilson and Bobick [7] have proposed an approach for gesture analysis that incorporates multiple representations into the HMM framework. Our work differs from theirs in that we take an agent-centered view and incorporate an extensive description of the agent's gaze, head and hand movements. 3 Attention-based Action ....
Wilson, A., Bobick, A.: Learning visual behavior for gesture analysis. In: Proceedings of the IEEE Symposium on Computer Vision, Florida, USA (1995)
....[Figure 1: Architecture of the Demonstration Based Programming System] ....recognizing American Sign Language (ASL) symbols [9]. Wilson and Bobick developed a system for classifying human motions using a variation on the standard HMM framework [12]. Brand et al. recognize Tai Chi motions with another variant on the standard HMM framework [3]. There are many others working on gesture recognition with HMMs, but these contributed most to our inspiration for this paper. Our HMM is described in sections 3 and 4.2. 3 Hidden Markov Models Hidden ....
A. Wilson and A. Bobick. Learning visual behavior for gesture analysis. In Proceedings of the IEEE Symposium on Computer Vision, Coral Gables, Florida, 1995.
....was by Yamato et al. [11] to recognize tennis swings, using as features a 25 × 25 pixel binarized camera image. Schlenzig et al. [6] used HMMs to recognize gestures in sequences, using a rotation-invariant representation of a binary image, processed through a neural net. Wilson and Bobick [10] incorporated multiple representations in an HMM framework, using eigenimage weights as features. In a non-HMM approach, Cui and Weng [3] used maximally discriminating (MDF) image weights as features, and a vector quantization of a low-dimensional MDF space as a distance metric. Stereo tracking ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
.... (see, for example, [57, 72, 118]). It is also envisioned that action recognition can be used in systems for helping people with disabilities [165] and, in general, in human-machine interaction [122]. However, it is debatable if the pattern-recognition-based approaches in use today (for example [26, 43, 181, 182]) are appropriate to handle the recognition of human actions. Recent work in vision has stressed the necessity of adequate symbolic representations for action [97, 111, 159]. The framework presented in this thesis supports this view and extends the previous research by proposing more expressive ....
....belief we challenge is that finite state machines are good representations for the temporal structure of human action. Finite state machines are embedded in the commonly used models for gesture and action recognition (especially in the probabilistic versions, using dynamic programming [40] or HMMs [26, 165, 181]). The drawback of such models is that they cannot efficiently handle parallel actions or events. In particular, the number of nodes necessary to represent parallel actions increases exponentially with the number of actions. Our research proposes temporal models based on networks of temporal ....
[Article contains additional citation context not shown here]
A. Wilson and A. F. Bobick. "Learning Visual Behavior for Gesture Analysis", Proc. of the IEEE-PAMI International Symposium on Computer Vision, Coral Gables, Florida, pp. 229-234. November. 1995.
....templates. Schlenzig et al. [15] use hidden Markov models to recognize hello, good-bye, and rotate. While Baum-Welch re-estimation was not implemented, this study shows the continuous gesture recognition capabilities of HMMs by recognizing gesture sequences. Recently, Wilson and Bobick [21] explore incorporating multiple representations in HMM frameworks. 4 Hidden Markov Modeling While a substantial body of literature exists on HMM technology [1, 8, 13, 23], this section briefly outlines a traditional discussion of the algorithms. After outlining the fundamental theory in training ....
A. Wilson and A. Bobick. Learning visual behavior for gesture analysis. Proc. IEEE Int'l. Symp. on Comp. Vis., Nov. 1995.
....Starner was able to build an HMM system capable of recognizing forty American Sign Language gestures in a real-time system. The features are computed using a system that tracks hands wearing colored gloves. The system is capable of recognizing gestures with an accuracy of 97%. Wilson and Bobick [39] develop a state-based method of learning visual behavior of gestures in an image sequence. Multiple representations are fed into an HMM and the input's overall membership in a given state is determined by which representation best describes the input. Darrell and Pentland have explored a ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, FL, November 1995.
....head. [Figure 4: Multiple Pose Eigenfaces. Mean templates (E0) are shown on the left along with the first 4 eigenvectors (E1 to E4).] ....metric [9]. This scheme can be viewed as a multiple-observer system where separate eigenspaces are simultaneously competing in describing the input image (see [12] and [4] for related work). Examples of eigenfaces for multiple poses (at the same spatial location) are shown in Figure 4. The key difference between the view-based and parametric representations can be understood by considering the geometry of facespace. In the high-dimensional vector space of an ....
Wilson, A., and Bobick, A., "Learning visual behavior for gesture analysis", to appear, Proc. International Symposium on Computer Vision, Coral Gables, November 1995.
....were not reported with the standard accuracy measures accepted in the speech and handwriting recognition communities, and training and testing databases were often identical or dependent in some manner. Since this time, HMM-based gesture recognizers for other tasks have appeared in the literature [21, 2], and, last year, several HMM-based continuous sign language systems were demonstrated. In a submission to UIST '97, Liang and Ouhyoung's work in Taiwanese Sign Language [8] shows very encouraging results with a glove-based recognizer. This HMM-based system recognizes 51 postures, 8 orientations, ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....Especially for work in which the recovery of the complete three-dimensional structure of a scene is not deemed necessary, HMMs are increasingly being used. Significant successes have been obtained in gesture recognition with a more simple view-based approach for which HMMs are appropriate [11, 12, 14]. Starner achieved a recognition rate of ninety-seven percent (on a forty-word lexicon) using a two-dimensional representation of the hands. In Staying Alive, eighteen different T'ai Chi movements have meaning in the virtual world. Table 1 contains a list of some of the gestures with a brief ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....et al. [21] use hidden Markov models to recognize hello, good-bye, and rotate. While Baum-Welch re-estimation was not implemented, this study shows the continuous gesture recognition capabilities of HMMs by recognizing gesture sequences. Closer to the task of this paper, Wilson and Bobick [28] explore incorporating multiple representations in HMM frameworks, and Campbell et al. [2] use an HMM-based gesture system to recognize 18 T'ai Chi gestures with 98% accuracy. 4 Tracking Hands in Video Previous systems have shown that, given some constraints, relatively detailed models of the ....
A. Wilson and A. Bobick. "Learning visual behavior for gesture analysis." Proc. IEEE Int'l. Symp. on Comp. Vis., Nov. 1995.
....They offer dynamic time warping, an efficient learning algorithm and clear Bayesian semantics. HMMs have been prominently and successfully used in speech recognition and, more recently, in handwriting recognition. However, their application to visual recognition purposes is more recent [46], [41], [42], [38]. HMMs are usually depicted rolled out in time, as figure 1 illustrates. [Figure 1: Graphical Representation of Real-time left-to-right Hidden Markov Models] The posterior state sequence probability in a HMM is given by $P(S|O) = P_{s_1} p_{s_1}(o_1) \prod_{t=2}^{T} p_{s_t}(o_t) P_{s_t|s_{t-1}}$, ....
A. Wilson and A. Bobick. Learning visual behavior for gesture analysis. In IEEE International Symposium on Computer Vision, 1995.
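The product quoted in the excerpt above (an initial-state term, then one emission term and one transition term per time step) can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary layout and the names `prior`, `transition`, `emission` and `log_state_sequence_score` are hypothetical and do not come from any of the cited papers, and the observations are assumed to be discrete. Note that the product as written equals the joint probability of states and observations, which differs from the posterior only by the constant factor P(O).

```python
import math


def log_state_sequence_score(states, observations, prior, transition, emission):
    """Log of P_{s1} p_{s1}(o1) * prod_{t=2..T} p_{st}(ot) P_{st|st-1}.

    Hypothetical data layout (an assumption for this sketch):
      prior[s]          initial probability of state s
      transition[r][s]  probability of moving from state r to state s
      emission[s][o]    probability of observing symbol o in state s
    """
    if not states or len(states) != len(observations):
        raise ValueError("states and observations must be non-empty and equal in length")

    first = states[0]
    score = math.log(prior[first]) + math.log(emission[first][observations[0]])
    for t in range(1, len(states)):
        prev, cur = states[t - 1], states[t]
        score += math.log(transition[prev][cur])            # P_{st|st-1}
        score += math.log(emission[cur][observations[t]])   # p_{st}(ot)
    return score


# Toy two-state model with made-up numbers, for illustration only.
prior = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {"x": 0.5, "y": 0.5}, "b": {"x": 0.2, "y": 0.8}}
score = log_state_sequence_score(["a", "b"], ["x", "y"], prior, transition, emission)
```

Working in log space avoids underflow on long sequences; exponentiating `score` recovers the plain product for short ones.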
....Especially for work in which the recovery of the complete three-dimensional structure of a scene is not deemed necessary, HMMs are increasingly being used. Significant successes have been obtained in gesture recognition with a more simple view-based approach for which HMMs are appropriate [10, 11, 13]. Starner achieved a recognition rate of ninety-seven percent (on a forty-word lexicon) using a two-dimensional representation of the hands. In Staying Alive, eighteen different T'ai Chi movements have meaning in the virtual world. A pictorial description of some of the gestures can be found in ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....in a multitude of frameworks, where an action is described by a sequence of 2-D instances/poses of the object. Many methods require a normalized image of the object (usually with no background) for representation. For example, Cui et al. [6], Darrell and Pentland [7], and also Wilson and Bobick [33] present results using actions (mostly hand gestures) where the actual grayscale images (with no background) are used in the representation for the action. Though hand appearances remain fairly similar over a wide range of people, with the obvious exception of skin pigmentation, actions that ....
Wilson, A. and A. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....the system must consider at any given instance. Feature space size. Single-agent action representations for computer vision have generally used features such as an agent's velocity, acceleration, and shape characteristics over time without considering other objects in an environment (e.g. see [40, 48]). The set of all possible perceptual features used for recognition is generally quite small (e.g. less than about 20 parameters). A system observing a multiagent environment, however, can compute the relative states of agents over time and agents with respect to other groups of agents, leading ....
A. D. Wilson and A. F. Bobick, Learning visual behavior for gesture analysis, in Proc. IEEE Int'l. Symp. on Computer Vision, Coral Gables, Florida, Nov. 1995.
....the system must consider at any given instance. Feature space size. Single-agent action representations for computer vision have generally used features like an agent's velocity, acceleration, and shape characteristics over time without considering other objects in an environment (e.g. see [48, 40]). The set of all possible perceptual features used for recognition is generally quite small (e.g. less than about 20 parameters). A system observing a multi-agent environment, however, can compute the relative states of agents over time and agents with respect to other groups of agents, leading ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Computer Vision, Coral Gables, Florida, Nov. 1995.
....is a close facsimile of the input image, the input likely belongs to the set of hand images. [Figure 2: The set of training images of the hand used in computing a basis set for hand images. Each image is 30 x 30 pixels.] We similarly use an eigenvector decomposition to match hand images in [7]. For Luxomatic, 100 images of the hand were segmented manually from an image sequence. These images were used in the computation of an eigenvector basis set; the five eigenvectors accounting for most of the variance were kept. The training images are shown in Figure 2. 1.2 Radial Basis ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....In human communication, sometimes how a gesture is performed carries significant meaning. ASL, for example, is subject to complex grammatical processes that operate on multiple simultaneous levels [21]. One approach is to explicitly model the space of variation exhibited by a class of signals. In [27], we apply HMMs to the task of hand gesture recognition from video by training an eigenvector basis set of the images at each state. An image's membership to each state is a function of the residual of the reconstruction of the image using the state's eigenvectors. The state membership is thus ....
A.D. Wilson and A.F. Bobick, "Learning Visual Behavior for Gesture Analysis," Proc. IEEE Int'l. Symp. Computer Vision, Coral Gables, Fla., Nov. 1995.
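The excerpt above describes state membership as a function of the residual left after reconstructing an image from a state's eigenvectors. A minimal sketch of that idea follows, assuming the state's basis vectors are already available and orthonormal. The function names, the flat-list image representation, and the Gaussian-style mapping from residual to membership (with the hypothetical parameter `sigma`) are assumptions made for illustration; the excerpt does not say which function the authors used.

```python
import math


def reconstruction_residual(image, mean, basis):
    """Squared error left after projecting (image - mean) onto the basis.

    `image` and `mean` are flat lists of pixel values; `basis` is a list of
    orthonormal vectors of the same length (assumed precomputed, e.g. from an
    eigenvector decomposition of a state's training images).
    """
    centered = [x - m for x, m in zip(image, mean)]
    reconstruction = [0.0] * len(centered)
    for vector in basis:
        coefficient = sum(c * v for c, v in zip(centered, vector))
        for i, v in enumerate(vector):
            reconstruction[i] += coefficient * v
    return sum((c - r) ** 2 for c, r in zip(centered, reconstruction))


def state_membership(image, mean, basis, sigma=1.0):
    """Map the residual to a value in (0, 1]; this mapping is an assumption."""
    residual = reconstruction_residual(image, mean, basis)
    return math.exp(-residual / (2.0 * sigma ** 2))


# Toy 3-pixel "images" with a one-vector basis, for illustration only.
mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0]]
inside = state_membership([5.0, 0.0, 0.0], mean, basis)   # lies in the subspace
outside = state_membership([1.0, 2.0, 3.0], mean, basis)  # mostly outside it
```

An image that the state's basis reconstructs exactly gets membership 1, and membership falls toward 0 as the residual grows, so competing states can be compared by how well each one explains the same image.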
No context found.
Wilson, A.D. & Bobick, A.F. 1995 "Learning visual behavior for gesture analysis," Proc. IEEE Int'l Symp. Computer Vision, Coral Gables, Florida.
....process (POMDP). In [1] we used hand-tuned HMMs using temporal properties to recognize two broad classes of natural, spontaneous gesture. Campbell and Bobick [6] search for orthogonal projections of the feature space to find the most diagnostic projections in order to classify ballet steps. In [20], we apply HMMs to the task of hand gesture recognition from video by training an eigenvector basis set of the images at each state. An image's membership to each state is a function of the residual of the reconstruction of the image using the state's eigenvectors. The state membership is thus ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....for each gesture to be recognized. This usually proceeds by collecting a number of examples of the gesture, computing the mean gesture and quantifying the variance seen in the examples. The hope is that this description will generalize to the actual test data. Examples of this approach include [9, 1, 13, 4, 10]. This typical pattern recognition approach may be well suited to the recognition of stylized or literal gesture, such as the gestures made by a user navigating aeronautical data in a virtual reality system by contorting their hands. These actions are less gestures than particular literal ....
....The approach is to use a Markovian state description, but with the traditional use of transition probabilities replaced with an explicit model of duration. 4.1 Markovian states with duration modeling Although Hidden Markov Models have been a popular technique for the recognition of gesture (see [14, 9, 10, 13]), we note that in our system the states are not hidden. In particular, our analysis of natural gesture types in section 2 identifies rest (R), transition (T), and stroke (S) states. The properties of these states are known and can be characterized by similarity in appearance to a rest state, ....
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
....six frames later. The program cannot see him during this time, but it knows he's there. Understanding time can be either explicit, as in the above example, or implicit, captured in the representation of action. One example that we will expand upon later is our work in gesture recognition [3, 22]. In this work gesture is represented either deterministically by an explicit sequence of states through which the hand must move, or probabilistically by a hidden Markov model. In both cases the requirement that the interpretation be consistent with the temporal constraints of the domain is ....
.... the notion of multiple models, multiple ways of describing a set of sensor data [15]; 2) makes explicit the idea that a given phase of a gesture is constrained to be within some small subspace of possible human motions; and 3) represents time as a probabilistic trajectory through states [22]. The basic idea is that the different models need to approximate the (small) subspace associated with a particular state and membership in a state is determined by how well the state models can represent the current observation. The parsing of the entire gesture is accomplished by finding a ....
[Article contains additional citation context not shown here]
A. D. Wilson and A. F. Bobick. Learning visual behavior for gesture analysis. In Proc. IEEE Int'l. Symp. on Comp. Vis., Coral Gables, Florida, November 1995.
No context found.
A. Wilson, A. Bobick, Learning visual behavior for gesture analysis, IEEE International Symposium on Computer Vision, 1995.
No context found.
A. Wilson and A. Bobick, "Learning visual behavior for gesture analysis," in IEEE International Symposium on Computer Vision, 1995.