Results 1–10 of 15
Factorized Latent Spaces with Structured Sparsity
Abstract

Cited by 37 (3 self)
Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view can effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multi-view learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation.
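The "alternately solving two convex problems" recipe in this abstract can be pictured with a toy NumPy sketch: an ISTA sparse-coding step for a latent code shared by two views, alternated with least-squares dictionary updates. Plain l1 regularization stands in for the paper's structured-sparsity norm, and all sizes and settings below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(Z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def factorize(X1, X2, k=6, lam=0.1, outer=20, inner=10, seed=0):
    """Alternate two convex subproblems: sparse codes Z (via ISTA), then
    per-view dictionaries D1, D2 (least squares). Plain l1 stands in for
    the structured-sparsity norm of the paper, so this is only a sketch."""
    rng = np.random.default_rng(seed)
    n = X1.shape[0]
    Z = rng.standard_normal((n, k)) * 0.1
    D1 = rng.standard_normal((k, X1.shape[1]))
    D2 = rng.standard_normal((k, X2.shape[1]))
    for _ in range(outer):
        # Convex subproblem 1: lasso in Z via ISTA (dictionaries fixed).
        step = 1.0 / np.linalg.norm(D1 @ D1.T + D2 @ D2.T, 2)
        for _ in range(inner):
            grad = (Z @ D1 - X1) @ D1.T + (Z @ D2 - X2) @ D2.T
            Z = soft_threshold(Z - step * grad, step * lam)
        # Convex subproblem 2: least-squares dictionary update (codes fixed).
        D1 = np.linalg.lstsq(Z, X1, rcond=None)[0]
        D2 = np.linalg.lstsq(Z, X2, rcond=None)[0]
    err = np.linalg.norm(X1 - Z @ D1) + np.linalg.norm(X2 - Z @ D2)
    return Z, D1, D2, err

# Two views generated from one sparse latent code (synthetic toy data).
rng = np.random.default_rng(1)
Z_true = rng.standard_normal((100, 4)) * (rng.random((100, 4)) < 0.5)
X1 = Z_true @ rng.standard_normal((4, 8)) + 0.01 * rng.standard_normal((100, 8))
X2 = Z_true @ rng.standard_normal((4, 8)) + 0.01 * rng.standard_normal((100, 8))
Z, D1, D2, err = factorize(X1, X2)
baseline = np.linalg.norm(X1) + np.linalg.norm(X2)
```

Each subproblem is convex on its own (a lasso in Z, ordinary least squares in each dictionary), which is the structural point the abstract makes; the joint problem remains non-convex.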
Deep Canonical Correlation Analysis
Abstract

Cited by 17 (4 self)
We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. Parameters of both transformations are jointly learned to maximize the (regularized) total correlation. It can be viewed as a nonlinear extension of the linear method canonical correlation analysis (CCA). It is an alternative to the nonparametric method kernel canonical correlation analysis (KCCA) for learning correlated nonlinear transformations. Unlike KCCA, DCCA does not require an inner product, and has the advantages of a parametric method: training time scales well with data size and the training data need not be referenced when computing the representations of unseen instances. In experiments on two real-world datasets, we find that DCCA learns representations with significantly higher correlation than those learned by CCA and KCCA. We also introduce a novel non-saturating sigmoid function based on the cube root that may be useful more generally in feedforward neural networks.
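DCCA itself needs a deep network stack, but the linear CCA baseline it extends is compact enough to sketch. The following NumPy routine is an illustrative reconstruction, not the paper's code: it computes canonical projections by whitening each view and taking an SVD of the cross-covariance.

```python
import numpy as np

def linear_cca(X, Y, k=1, reg=1e-6):
    """Classical linear CCA: whiten each view, then SVD the cross-covariance.

    Returns per-view projections A, B and the top-k canonical correlations.
    (Baseline only; DCCA replaces the linear maps with neural networks.)"""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)                           # Cxx = Lx Lx^T
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Ly, np.linalg.solve(Lx, Cxy).T).T  # Lx^-1 Cxy Ly^-T
    U, s, Vt = np.linalg.svd(T)
    A = np.linalg.solve(Lx.T, U[:, :k])    # maps view 1 into canonical space
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return A, B, s[:k]

# Two noisy linear views of one shared latent signal (toy data).
rng = np.random.default_rng(0)
z = rng.standard_normal((2000, 1))
X = z @ np.array([[1.0, -0.5, 0.3]]) + 0.1 * rng.standard_normal((2000, 3))
Y = z @ np.array([[0.8, 1.0, -0.3, 0.5]]) + 0.1 * rng.standard_normal((2000, 4))
A, B, corrs = linear_cca(X, Y, k=1)
c = np.corrcoef((X @ A)[:, 0], (Y @ B)[:, 0])[0, 1]
```

Because both views are noisy copies of the same latent signal, the top canonical correlation comes out close to 1.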
Factorized Orthogonal Latent Spaces
Abstract

Cited by 16 (3 self)
Existing approaches to multi-view learning are particularly effective when the views are either independent (i.e., multi-kernel approaches) or fully dependent (i.e., shared latent spaces). However, in real scenarios, these assumptions are almost never truly satisfied. Recently, two methods have attempted to tackle this problem by factorizing the information and learning separate latent spaces for modeling the shared (i.e., correlated) and private (i.e., independent) parts of the data. However, these approaches are very sensitive to parameter settings or initialization. In this paper we propose a robust approach to factorizing the latent space into shared and private spaces by introducing orthogonality constraints, which penalize redundant latent representations. Furthermore, unlike previous approaches, we simultaneously learn the structure and dimensionality of the latent spaces by relying on a regularizer that encourages the latent space of each data stream to be low-dimensional. To demonstrate the benefits of our approach, we apply it to two existing shared latent space models that assume full dependence of the views, the sGPLVM and the sKIE, and show that our constraints improve the performance of these models on the task of pose estimation from monocular images.
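The orthogonality constraints described here can be pictured as a penalty on correlations between shared and private latent coordinates. The toy gradient-descent sketch below is an illustration under assumed sizes and step sizes, not the actual FOLS objective; it shows such a penalty driving the two subspaces apart.

```python
import numpy as np

def ortho_penalty(Zs, Zp):
    """Frobenius-norm penalty on correlations between shared latent
    coordinates Zs and a view's private coordinates Zp; it is zero when
    the two sets of latent dimensions are mutually orthogonal."""
    return np.sum((Zs.T @ Zp) ** 2)

rng = np.random.default_rng(1)
Zs = rng.standard_normal((50, 3))   # shared latent coordinates (assumed sizes)
Zp = rng.standard_normal((50, 2))   # private latent coordinates
before = ortho_penalty(Zs, Zp)
for _ in range(200):                # plain gradient descent on Zp only
    Zp -= 0.001 * 2.0 * Zs @ (Zs.T @ Zp)   # gradient of the penalty w.r.t. Zp
after = ortho_penalty(Zs, Zp)
```

After a few hundred steps the component of the private coordinates lying inside the shared subspace is essentially removed, which is the redundancy the penalty targets.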
Visual Speech Synthesis by Modelling Coarticulation Dynamics Using a Non-Parametric Switching State-Space Model
Abstract

Cited by 5 (0 self)
We present a novel approach to speech-driven facial animation using a non-parametric switching state-space model based on Gaussian processes. The model is an extension of the shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking-head corpus are jointly modelled using the proposed method. The switching states are found using variable-length Markov models trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech.
Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model
Abstract

Cited by 2 (1 self)
In this work, synthesis of facial animation is done by modelling the mapping between facial motion and speech using the shared Gaussian process latent variable model. Both data streams are processed separately and subsequently coupled together to yield a shared latent space. This method allows coarticulation to be modelled by placing a dynamical model on the latent space. Synthesis of novel animation is done by first obtaining intermediate latent points from the audio data and then using a Gaussian process mapping to predict the corresponding visual data. Statistical evaluation of generated visual features against ground-truth data compares favourably with known methods of speech animation. The generated videos are found to show proper synchronisation with audio and exhibit correct facial dynamics.
Relatively-paired space analysis
In BMVC, 2013
Abstract

Cited by 1 (0 self)
Discovering a latent common space between different modalities plays an important role in cross-modality pattern recognition. Existing techniques often require absolutely-paired observations as training data, and are incapable of capturing more general semantic relationships between cross-modality observations. This greatly limits their applications. In this paper, we propose a general framework for learning a latent common space from relatively-paired observations (i.e., two observations from different modalities are more likely paired than another two). Relative-pairing information is encoded using relative proximities of observations in the latent common space. By building a discriminative model and maximizing a distance margin, a projection function that maps observations into the latent common space is learned for each modality. Cross-modality pattern recognition can then be carried out in the latent common space. To evaluate its performance, the proposed framework has been applied to cross-pose face recognition and feature fusion. Experimental results demonstrate that the proposed framework outperforms other state-of-the-art approaches.
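The margin idea here, that an observation should sit closer in the common space to its more-likely-paired counterpart than to an unlikely one, can be sketched as a triplet hinge loss over linear projections. The code below is a simplified stand-in for the paper's max-margin objective; the projections, learning rate, and data are illustrative assumptions.

```python
import numpy as np

def relative_pair_loss(P1, P2, x, y_pos, y_neg, margin=1.0):
    """Hinge loss asking that x (modality 1) be closer in the common space
    to its more-likely-paired sample y_pos (modality 2) than to y_neg,
    by a margin. A simplified stand-in for the paper's objective."""
    a, b, c = P1 @ x, P2 @ y_pos, P2 @ y_neg
    return max(0.0, margin + np.sum((a - b) ** 2) - np.sum((a - c) ** 2))

rng = np.random.default_rng(0)
P1 = rng.standard_normal((2, 4)) * 0.1   # projection for modality 1
P2 = rng.standard_normal((2, 3)) * 0.1   # projection for modality 2
x = rng.standard_normal(4)
y_pos, y_neg = rng.standard_normal(3), rng.standard_normal(3)

before = relative_pair_loss(P1, P2, x, y_pos, y_neg)
for _ in range(300):  # subgradient descent on a single triplet
    a, b, c = P1 @ x, P2 @ y_pos, P2 @ y_neg
    if 1.0 + np.sum((a - b) ** 2) - np.sum((a - c) ** 2) <= 0:
        break  # margin satisfied, hinge inactive
    g_a = 2 * (a - b) - 2 * (a - c)  # gradient of the loss w.r.t. a = P1 x
    P1 -= 0.01 * np.outer(g_a, x)
    P2 -= 0.01 * (np.outer(2 * (b - a), y_pos) + np.outer(2 * (a - c), y_neg))
after = relative_pair_loss(P1, P2, x, y_pos, y_neg)
```

A real training run would sum this loss over many relative-pairing triplets per modality pair; the single-triplet loop just shows the descent direction.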
Non-Linear Domain Adaptation with Boosting
Abstract

Cited by 1 (0 self)
A common assumption in machine vision is that the training and test samples are drawn from the same distribution. However, there are many problems where this assumption is grossly violated, as in biomedical applications where different acquisitions can generate drastic variations in the appearance of the data due to changing experimental conditions. This problem is accentuated with 3D data, for which annotation is very time-consuming, limiting the amount of data that can be labeled in new acquisitions for training. In this paper we present a multi-task learning algorithm for domain adaptation based on boosting. Unlike previous approaches that learn task-specific decision boundaries, our method learns a single decision boundary in a shared feature space, common to all tasks. We use the boosting trick to learn a non-linear mapping of the observations in each task, with no need for specific a priori knowledge of its global analytical form. This yields a largely parameter-free domain adaptation approach that successfully leverages learning on new tasks where labeled data is scarce. We evaluate our approach on two challenging biomedical datasets and achieve a significant improvement over the state of the art.
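The contrast with task-specific boundaries can be illustrated by pooling samples from several tasks and fitting one boosted-stump classifier to all of them. This is generic AdaBoost, not the paper's boosting-trick mapping, and the two-task construction below is a made-up toy.

```python
import numpy as np

def fit_stumps(X, y, rounds=10):
    """Minimal AdaBoost with axis-aligned decision stumps. All tasks'
    samples are pooled, so one decision boundary is shared across tasks."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                      # exhaustive stump search
            for t in np.unique(X[:, j]):
                for s in (1.0, -1.0):
                    pred = np.where(X[:, j] > t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        alpha = 0.5 * np.log((1 - err + 1e-10) / (err + 1e-10))
        pred = np.where(X[:, j] > t, s, -s)
        w *= np.exp(-alpha * y * pred)          # reweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)

# Two 'tasks' whose nuisance dimensions differ but whose boundary is shared.
rng = np.random.default_rng(0)
Xa = rng.uniform(-1, 1, (150, 3))
Xb = rng.uniform(-1, 1, (150, 3)) + np.array([0.0, 0.5, -0.5])
X = np.vstack([Xa, Xb])
y = np.where(X[:, 0] > 0, 1.0, -1.0)            # shared boundary: x0 > 0
model = fit_stumps(X, y, rounds=5)
acc = np.mean(predict(model, X) == y)
```

Because the true boundary depends only on the shared feature, the pooled ensemble recovers it despite the per-task shifts in the other dimensions.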
FOLS: Factorized Orthogonal Latent Spaces
Abstract
Many machine learning problems inherently involve multiple views. Kernel combination approaches to multiview learning [1] are particularly effective when the views are independent. In contrast, other methods take advantage
Monocular 3D Gait Tracking in Surveillance Scenes
In IEEE Transactions on Cybernetics
Abstract
Gait recognition can potentially provide a noninvasive and effective biometric authentication from a distance. However, the performance of gait recognition systems will suffer in real surveillance scenarios with multiple interacting individuals and where the camera is usually placed at a significant angle and distance from the floor. We present a methodology for view-invariant monocular 3D human pose tracking in man-made environments in which we assume that observed people move on a known ground plane. First, we model 3D body poses and camera viewpoints with a low-dimensional manifold and learn a generative model of the silhouette from this manifold to a reduced set of training views. During the online stage, 3D body poses are tracked using recursive Bayesian sampling conducted jointly over the scene’s ground plane and the pose-viewpoint manifold. For each sample, the homography that relates the corresponding training plane to the image points is calculated using the dominant 3D directions of the scene, the sampled location on the ground plane and the sampled camera view. Each regressed silhouette shape is projected using this homographic transformation and matched in the image to estimate its likelihood. Our framework successfully tracks 3D human walking poses in a 3D environment while exploring only a four-dimensional state space. In our experimental evaluation, we demonstrate the significant improvements of the homographic alignment over a commonly used similarity transformation and provide quantitative pose tracking results for monocular sequences with high perspective effect from the CAVIAR dataset.
Index Terms: Monocular gait tracking, 3D body pose, video surveillance, view-invariance, particle filtering.
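The recursive Bayesian sampling step is an ordinary particle filter. The toy below tracks a 2-D ground-plane position, standing in for the paper's four-dimensional pose-viewpoint state, with a Gaussian likelihood replacing the silhouette-matching likelihood; all noise levels and the motion model are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, steps = 1000, 30
obs_noise, proc_noise = 0.1, 0.05

# Ground-truth trajectory on the ground plane, plus noisy observations.
truth = np.cumsum(np.full((steps, 2), 0.1), axis=0)   # drifts (0.1, 0.1)/step
obs = truth + obs_noise * rng.standard_normal((steps, 2))

particles = rng.standard_normal((n_particles, 2)) * 0.5  # start near origin
estimates = []
for z in obs:
    # Predict: drift-plus-noise motion model.
    particles += 0.1 + proc_noise * rng.standard_normal((n_particles, 2))
    # Weight: Gaussian observation likelihood (silhouette match in the paper).
    w = np.exp(-np.sum((particles - z) ** 2, axis=1) / (2 * obs_noise ** 2))
    w /= w.sum()
    estimates.append(w @ particles)                  # posterior-mean estimate
    # Resample (multinomial) to concentrate particles on likely states.
    idx = rng.choice(n_particles, size=n_particles, p=w)
    particles = particles[idx]

err = np.linalg.norm(np.array(estimates) - truth, axis=1).mean()
```

The same predict-weight-resample loop carries over to the paper's setting once the state holds ground-plane location plus manifold coordinates and the likelihood scores the projected silhouette against the image.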