Results 11–20 of 180
Spectral Clustering and Embedding with Hidden Markov Models
Abstract

Cited by 20 (1 self)
Clustering has recently enjoyed progress via spectral methods which group data using only pairwise affinities and avoid parametric assumptions. While spectral clustering of vector inputs is straightforward, extensions to structured data or time-series data remain less explored. This paper proposes a clustering method for time-series data that couples nonparametric spectral clustering with parametric hidden Markov models (HMMs). HMMs add some beneficial structural and parametric assumptions, such as Markov properties and hidden state variables, which are useful for clustering. This article shows that using probabilistic pairwise kernel estimates between parametric models provides improved experimental results for unsupervised clustering and visualization of real and synthetic datasets. Results are compared with a fully parametric baseline method (a mixture of hidden Markov models) and a nonparametric baseline method (spectral clustering with nonparametric time-series kernels).
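The coupling this abstract describes — fit an HMM per sequence, form a pairwise affinity matrix from probabilistic kernel estimates between the fitted models, then cluster spectrally — rests on a standard spectral embedding step. A minimal sketch of that step only, assuming the HMM-to-HMM affinities have already been computed (the `affinity` argument is a generic stand-in, not the paper's kernel):

```python
import numpy as np

def spectral_clusters(affinity, k):
    """Spectral embedding from a symmetric pairwise affinity matrix.

    Normalized spectral clustering: embed each item via the top-k
    eigenvectors of the symmetrically normalized affinity, then
    normalize rows to the unit sphere (Ng-Jordan-Weiss style). A
    k-means pass over the rows would finish the clustering.
    """
    d = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}
    m = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; take the last k
    vals, vecs = np.linalg.eigh(m)
    emb = vecs[:, -k:]
    # Row-normalize so cluster membership shows up as direction
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return emb
```

With a near-block-diagonal affinity, rows of the embedding from the same block point in nearly the same direction, so any simple geometric clustering of the rows recovers the groups.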
Towards less supervision in activity recognition from wearable sensors
 In 10th IEEE International Symposium on Wearable Computers, 2006
Abstract

Cited by 19 (1 self)
Activity recognition has gained a lot of interest in recent years due to its potential and usefulness for context-aware wearable computing. However, most approaches for activity recognition rely on supervised learning techniques, limiting their applicability in real-world scenarios and their scalability to large amounts of activities and training data. State-of-the-art activity recognition algorithms can roughly be divided into two groups by choice of classifier: one group uses generative models, the other discriminative approaches. This paper presents a method for activity recognition which combines a generative model with a discriminative classifier in an integrated approach. The generative part of the algorithm extracts and learns structure in activity data without any labeling or supervision. The discriminative part then uses a small but labeled subset of the training data to train a discriminative classifier. In experiments we show that this scheme attains high recognition rates even though only a subset of the training data is labeled. The trade-off between labeling effort and recognition performance is also analyzed and discussed.
Nonextensive information theoretic kernels on measures
 J. of Mach. Learning
, 2009
Abstract

Cited by 19 (5 self)
Positive definite kernels on probability measures have recently been applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce Jensen-Tsallis (JT) q-differences, a nonextensive generalization of the JS divergence, and define a k-th order JT q-difference between stochastic processes. We then define a new family of nonextensive mutual information kernels, which allows weights to be assigned to their arguments, and which includes the Boolean, JS, and linear kernels as particular cases. Nonextensive string kernels are also defined that generalize the p-spectrum kernel. We illustrate the performance of these kernels on text categorization tasks, in which documents are modeled both as bags of words and as sequences of characters.
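The classical building blocks this abstract generalizes — Shannon entropy, the JS divergence built from it, and the Tsallis q-entropy that replaces Shannon's in the Jensen-Tsallis q-difference — are easy to state concretely. A minimal sketch (function names are ours, not the paper's; the JT q-difference itself reduces to the JS divergence at q = 1):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def tsallis_entropy(p, q):
    """Tsallis q-entropy; recovers Shannon entropy (in nats) as q -> 1."""
    p = np.asarray(p, dtype=float)
    if q == 1.0:
        pp = p[p > 0]
        return -np.sum(pp * np.log(pp))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def js_divergence(p, r):
    """Classical Jensen-Shannon divergence (bits): H(m) - (H(p)+H(r))/2."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    m = 0.5 * (p + r)
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(r))
```

The JS divergence is bounded in [0, 1] bits, reaching 1 only for distributions with disjoint support; that boundedness is one reason it is attractive as a kernel ingredient.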
THE NATURAL LANGUAGE OF PLAYLISTS
Abstract

Cited by 19 (2 self)
We propose a simple, scalable, and objective evaluation procedure for playlist generation algorithms. Drawing on standard techniques for statistical natural language processing, we characterize playlist algorithms as generative models of strings of songs belonging to some unknown language. To demonstrate the procedure, we compare several playlist algorithms derived from content, semantics, and metadata. We then develop an efficient algorithm to learn an optimal combination of simple playlist algorithms. Experiments on a large collection of naturally occurring playlists demonstrate the efficacy of the evaluation procedure and learning algorithm.
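Treating playlists as strings in an unknown language suggests the standard language-model yardstick: held-out likelihood or perplexity under a fitted model. A toy sketch with a Laplace-smoothed bigram model over song IDs (an assumed stand-in; the paper's playlist models are richer):

```python
import math
from collections import Counter

def bigram_model(playlists, vocab_size, alpha=1.0):
    """Fit a Laplace-smoothed bigram model over song IDs.

    Returns a function prob(prev, cur) giving the smoothed probability
    of transitioning from song `prev` to song `cur`.
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for pl in playlists:
        for prev, cur in zip(pl, pl[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + alpha) / (prev_counts[prev] + alpha * vocab_size)

    return prob

def perplexity(prob, playlist):
    """Per-transition perplexity of one held-out playlist."""
    logps = [math.log(prob(a, b)) for a, b in zip(playlist, playlist[1:])]
    return math.exp(-sum(logps) / len(logps))
```

Lower perplexity on held-out playlists means the model assigns the observed song sequences higher probability, which is the comparison the evaluation procedure makes objective.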
A generative theory of similarity
 In CogSci
, 2005
Abstract

Cited by 18 (6 self)
We propose that similarity judgments are inferences about generative processes, and that two objects appear similar when they are likely to have been generated by the same process. We present a formal model based on this idea, and suggest that it may be particularly useful for explaining high-level judgments of similarity. We compare our model to featural and transformational accounts, and describe an experiment where it outperforms a transformational model.
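The core idea — two objects look similar when they were probably generated by the same process — can be written as a marginal-likelihood ratio. A toy discrete sketch (the setup and names are hypothetical, not the paper's formal model):

```python
def same_process_score(x1, x2, processes):
    """Score how likely two observations share a generative process.

    processes: list of (prior, likelihood) pairs, where likelihood is a
    dict mapping observations to probabilities. The score is the ratio
    P(x1, x2 | same process) / (P(x1) P(x2)); values above 1 mean the
    pair co-occurs more often under a shared process than by chance.
    """
    joint = sum(pr * lik.get(x1, 0.0) * lik.get(x2, 0.0) for pr, lik in processes)
    px1 = sum(pr * lik.get(x1, 0.0) for pr, lik in processes)
    px2 = sum(pr * lik.get(x2, 0.0) for pr, lik in processes)
    return joint / (px1 * px2)
```

With two processes each concentrated on its own observations, a pair drawn from the same concentration scores above 1 and a cross pair scores below 1, mirroring the intuition that "generated together" reads as "similar."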
Combining audio content and social context for semantic music discovery
 Proc. 32nd ACM SIGIR
, 2009
Abstract

Cited by 17 (1 self)
When attempting to annotate music, it is important to consider both acoustic content and social context. This paper explores techniques for collecting and combining multiple sources of such information for the purpose of building a query-by-text music retrieval system. We consider two representations of the acoustic content (related to timbre and harmony) and two social sources (social tags and web documents). We then compare three algorithms that combine these information sources: calibrated score averaging (CSA), RankBoost, and kernel combination support vector machines (KC-SVM). We demonstrate empirically that each of these algorithms is superior to algorithms that use individual information sources.
Extracting key-substring-group features for text classification
 In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’06)
, 2006
Abstract

Cited by 16 (0 self)
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostly focused on different variants of generative Markov chain models. Although discriminative machine learning methods like the Support Vector Machine (SVM) have been quite successful in text classification with word features, it is neither effective nor efficient to apply them straightforwardly taking all substrings in the corpus as features. In this paper, we propose to partition all substrings into statistical equivalence groups, and then pick those groups which are important (in the statistical sense) as features (named key-substring-group features) for text classification. In particular, we propose a suffix tree based algorithm that can extract such features in linear time (with respect to the total number of characters in the corpus). Our experiments on English, Chinese and Greek datasets show that SVM with key-substring-group features can achieve outstanding performance for various text classification tasks.
Generalized Kernel-based Visual Tracking
Abstract

Cited by 16 (2 self)
Kernel-based mean shift (MS) trackers have proven to be a promising alternative to stochastic particle filtering trackers. Despite their popularity, MS trackers have two fundamental drawbacks: (1) the template model can only be built from a single image; (2) it is difficult to adaptively update the template model. In this work we generalize the plain MS trackers and attempt to overcome these two limitations. It is well known that modeling and maintaining a representation of a target object is an important component of a successful visual tracker. However, little work has been done on building a robust template model for kernel-based MS tracking. In contrast to building a template from a single frame, we train a robust object representation model from a large amount of data. Tracking is viewed as a binary classification problem, and a discriminative classification rule is learned to distinguish between the object and background. We adopt a support vector machine (SVM) for training. The tracker is then implemented by maximizing the classification score. An iterative optimization scheme very similar to MS is derived for this purpose. Compared with the plain MS tracker, it is now much easier to incorporate online template adaptation to cope with inherent changes during the course of tracking. To this end, a sophisticated online support vector machine is used. We demonstrate successful localization and tracking on various data sets. Index Terms—Kernel-based tracking, mean shift, particle filter, support vector machine, global mode seeking.
A Reproducing Kernel Hilbert Space Framework for InformationTheoretic Learning
Abstract

Cited by 13 (8 self)
This paper provides a functional analysis perspective of information-theoretic learning (ITL) by defining bottom-up a reproducing kernel Hilbert space (RKHS) uniquely determined by the symmetric nonnegative definite kernel function known as the cross-information potential (CIP). The CIP, as an integral of the product of two probability density functions, characterizes similarity between two stochastic functions. We prove the existence of a one-to-one congruence mapping between the ITL RKHS and the Hilbert space spanned by square integrable probability density functions. Therefore, all the statistical descriptors in the original information-theoretic learning formulation can be rewritten as algebraic computations on deterministic functional vectors in the ITL RKHS, instead of limiting the functional view to the estimators as is commonly done in kernel methods. A connection between the ITL RKHS and kernel approaches interested in quantifying the statistics of the projected data is also established. Index Terms—Cross-information potential, information-theoretic learning (ITL), kernel function, probability density function,
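The CIP is the integral of the product of two densities; with Gaussian Parzen windows on one-dimensional samples, that integral has a closed form as an average of Gaussians with doubled variance over all cross pairs. A minimal sketch of this standard Parzen-style estimator (the function name and the fixed kernel width are our choices, not the paper's notation):

```python
import numpy as np

def cross_information_potential(x, y, sigma=1.0):
    """Parzen estimate of ∫ p(z) q(z) dz for 1-D samples x ~ p, y ~ q.

    Placing a Gaussian of width sigma on each sample of each density,
    the integral of the product reduces to a Gaussian of variance
    2*sigma^2 averaged over all cross pairs (x_i, y_j).
    """
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[None, :]
    s2 = 2.0 * sigma ** 2  # variance of the convolved kernel
    g = np.exp(-(x - y) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return g.mean()
```

Overlapping sample sets yield a large CIP and well-separated ones a small CIP, which is why it serves as a similarity measure between the underlying stochastic sources.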
Priors for Diversity in Generative Latent Variable Models
Abstract

Cited by 13 (4 self)
Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in a topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.
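The diversity preference comes from the determinantal form: an L-ensemble DPP scores a subset of items by the determinant of the corresponding kernel submatrix, so near-duplicate items (large off-diagonal entries) drive the determinant, and hence the subset's probability, toward zero. A minimal sketch of the unnormalized score:

```python
import numpy as np

def dpp_unnormalized_prob(L, subset):
    """Unnormalized L-ensemble DPP probability of a subset: det(L_S).

    L is a positive definite kernel matrix over all items; subset is a
    list of item indices. The true probability divides by det(L + I).
    """
    idx = np.asarray(subset)
    return np.linalg.det(L[np.ix_(idx, idx)])
```

For a pair of items, det(L_S) = L_ii * L_jj - L_ij^2: the more similar the two items under the kernel, the less probable it is that both appear, which is exactly the repulsion used here to keep mixture components and topics from collapsing onto each other.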