Results 1 – 10 of 269
Estimating the Support of a High-Dimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 783 (29 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
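The ν-parameterized estimator this abstract describes is widely reimplemented; a minimal sketch using scikit-learn's OneClassSVM (a later implementation, not the authors' original code; data and parameters are illustrative):

```python
# nu plays the role of the a priori specified bound on the fraction
# of training points allowed to fall outside the estimated region S.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                 # samples from P

# RBF kernel expansion; nu=0.1 bounds the outlier fraction a priori
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

X_test = np.vstack([rng.normal(size=(5, 2)), [[4.0, 4.0]]])
print(clf.predict(X_test))  # +1 inside the estimated region S, -1 outside
```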
From frequency to meaning : Vector space models of semantics
 Journal of Artificial Intelligence Research
, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract

Cited by 347 (3 self)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
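As an illustration of the term–document class of VSMs surveyed here, a minimal sketch (the toy corpus and scikit-learn's tf-idf weighting are illustrative choices, not the paper's):

```python
# Build a (documents x terms) tf-idf matrix and compare documents by
# cosine similarity -- the basic term-document VSM recipe.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "vector space models of semantics",
]
X = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(X))  # 3x3 pairwise document similarities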
Convolution Kernels for Natural Language
 Advances in Neural Information Processing Systems 14
, 2001
"... We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural ..."
Abstract

Cited by 340 (7 self)
We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
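A minimal sketch of one such kernel, a subtree-counting tree kernel in the style of Collins and Duffy (the tuple encoding of trees and the decay value are illustrative assumptions):

```python
# Parse trees are encoded as nested tuples (label, child1, child2, ...);
# leaves are plain strings. The kernel counts shared subtrees via the
# usual recursion C(n1, n2); "decay" downweights larger subtrees.

def _collect(tree, nodes):
    """Collect all internal nodes of a tree."""
    if isinstance(tree, str):
        return
    nodes.append(tree)
    for child in tree[1:]:
        _collect(child, nodes)

def _signature(node):
    """Production at a node: its label plus the labels of its children."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def _C(n1, n2, decay):
    """Weighted count of common subtrees rooted at n1 and n2."""
    if _signature(n1) != _signature(n2):
        return 0.0
    prod = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            continue  # terminal children contribute a factor of 1
        prod *= 1.0 + _C(c1, c2, decay)
    return prod

def tree_kernel(t1, t2, decay=0.5):
    nodes1, nodes2 = [], []
    _collect(t1, nodes1)
    _collect(t2, nodes2)
    return sum(_C(n1, n2, decay) for n1 in nodes1 for n2 in nodes2)

# Two small parse trees sharing an NP subtree.
t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "ran")))
t2 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "slept")))
print(tree_kernel(t1, t2))
```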
On kernel target alignment
 Advances in Neural Information Processing Systems 14
, 2002
"... Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, the kernel function is often chosen using trialanderror heuristics. In this paper we address the problem of measuring the degree of agreem ..."
Abstract

Cited by 298 (8 self)
Kernel-based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. A quantitative measure of agreement is important from both a theoretical and practical point of view. We propose a quantity to capture this notion, which we call Alignment. We study its theoretical properties, and derive a series of simple algorithms for adapting a kernel to the labels and vice versa. This produces a series of novel methods for clustering and transduction, kernel combination and kernel selection. The algorithms are tested on two publicly available datasets and are shown to exhibit good performance.
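The alignment quantity is simple to compute; a minimal sketch, assuming the usual definition as a normalized Frobenius inner product between a kernel matrix and the ideal target kernel yyᵀ (data and kernel choice are illustrative):

```python
import numpy as np

def alignment(K1, K2):
    """Normalized Frobenius inner product <K1, K2> / sqrt(<K1,K1><K2,K2>)."""
    inner = (K1 * K2).sum()
    return inner / np.sqrt((K1 * K1).sum() * (K2 * K2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sign(X[:, 0])                                      # labels in {-1, +1}
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))  # RBF kernel matrix
print(alignment(K, np.outer(y, y)))                       # kernel-target alignment
```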
Gaussian process latent variable models for visualisation of high dimensional data
 Advances in Neural Information Processing Systems
, 2004
"... We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the ex ..."
Abstract

Cited by 230 (13 self)
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real-world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.
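As background to the variational treatment, a minimal sketch of the underlying (non-Bayesian) GP-LVM: optimize the latent coordinates by maximizing the GP marginal likelihood. Kernel hyperparameters, data, and dimensions are illustrative assumptions, and the paper's variational lower bound is not implemented here:

```python
import numpy as np
from scipy.optimize import minimize

def rbf(X, lengthscale=1.0, variance=1.0, noise=1e-2):
    """RBF kernel matrix with a small noise term for stability."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))

def neg_log_lik(x_flat, Y, Q):
    """Negative GP marginal likelihood of D independent outputs, up to a constant."""
    N, D = Y.shape
    X = x_flat.reshape(N, Q)
    K = rbf(X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, Y)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * D * logdet + 0.5 * (Y * alpha).sum()

# Toy data: 20 points in 5-D; learn a 2-D latent embedding.
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 5))
X0 = rng.normal(scale=0.1, size=(20, 2))
res = minimize(neg_log_lik, X0.ravel(), args=(Y, 2), method="L-BFGS-B")
X_latent = res.x.reshape(20, 2)
```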
A Generalized Representer Theorem
 In Proceedings of the Annual Conference on Computational Learning Theory
, 2001
"... Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and ..."
Abstract

Cited by 222 (17 self)
Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite-dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.
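In the notation this abstract suggests, the generalized statement can be sketched as follows (the precise conditions on the cost c and the regularizer g are given in the paper):

```latex
% Sketch: minimizing a regularized risk over an RKHS \mathcal{H} with kernel k,
\min_{f \in \mathcal{H}} \;
  c\bigl((x_1, y_1, f(x_1)), \ldots, (x_m, y_m, f(x_m))\bigr)
  + g\bigl(\lVert f \rVert_{\mathcal{H}}\bigr),
% with g strictly monotonically increasing, admits a minimizer of the form
f^{*}(\cdot) \;=\; \sum_{i=1}^{m} \alpha_i \, k(\cdot, x_i),
% i.e. the solution lies in the span of the mapped training examples.
```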
Perspectives on system identification
 Plenary talk, Proceedings of the 17th IFAC World Congress, Seoul, South Korea
, 2008
"... System identification is the art and science of building mathematical models of dynamic systems from observed inputoutput data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. As such, it is an ubiquitous ne ..."
Abstract

Cited by 167 (3 self)
System identification is the art and science of building mathematical models of dynamic systems from observed input-output data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. As such, it is a ubiquitous necessity for successful applications. System identification is a very large topic, with different techniques that depend on the character of the models to be estimated: linear, nonlinear, hybrid, nonparametric etc. At the same time, the area can be characterized by a small number of leading principles, e.g. to look for sustainable descriptions by proper decisions in the triangle of model complexity, information content in the data, and effective validation. The area has many facets and there are many approaches and methods. A tutorial or a survey in a few pages is not quite possible. Instead, this presentation aims at giving an overview of the “science” side, i.e. basic principles and results, and at pointing to open problem areas in the practical “art” side of how to approach and solve a real problem.
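To make the notion of fitting models to input-output data concrete, a minimal sketch of the simplest case, least-squares estimation of a linear ARX model (the second-order structure and toy data are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
u = rng.normal(size=N)                        # input signal
y = np.zeros(N)
for t in range(2, N):                         # "true" system plus noise
    y[t] = 1.5 * y[t-1] - 0.7 * y[t-2] + 0.5 * u[t-1] + 0.05 * rng.normal()

# Regressor matrix for the model y(t) = a1*y(t-1) + a2*y(t-2) + b1*u(t-1)
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print(theta)                                  # estimates of a1, a2, b1
```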
Support Vector Method for Novelty Detection
, 2000
"... Suppose you are given some dataset drawn from an underlying probability distributionPand you want to estimate a “simple ” subsetSof input space such that the probability that a test point drawn from P lies outside of Sequals some a priori specified between0and1. We propose a m ethod to approach this ..."
Abstract

Cited by 164 (4 self)
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified ν between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. We provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
Iterative quantization: A procrustean approach to learning binary codes
 In Proc. of the IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR)
, 2011
"... This paper addresses the problem of learning similaritypreserving binary codes for efficient retrieval in largescale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zerocentered data so as to minimize the quantization error of mapping t ..."
Abstract

Cited by 157 (6 self)
This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other state-of-the-art methods.
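A minimal sketch of the alternation the abstract describes: PCA-project the zero-centered data, then alternate between binarizing and solving the orthogonal Procrustes problem for the rotation (dimensions, iteration count, and the random toy data are illustrative assumptions):

```python
import numpy as np

def itq(X, n_bits=8, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    X = X - X.mean(0)                       # zero-center
    # PCA projection to n_bits dimensions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T
    # random orthogonal initialization for the rotation R
    R, _ = np.linalg.qr(rng.normal(size=(n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.sign(V @ R)                  # fix R, update binary codes
        U, _, Wt = np.linalg.svd(V.T @ B)   # fix B, orthogonal Procrustes
        R = U @ Wt
    return np.sign(V @ R) > 0, R

codes, R = itq(np.random.default_rng(1).normal(size=(1000, 64)))
```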
Performance Animation from Low-dimensional Control Signals
 ACM Transactions on Graphics
, 2005
"... This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a lowcost, easytouse system that might someday be practical for home use. The lowdimensional control signals from the user's performance are supplement ..."
Abstract

Cited by 129 (18 self)
This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a low-cost, easy-to-use system that might someday be practical for home use. The low-dimensional control signals from the user's performance are supplemented by a database of prerecorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. We demonstrate the power of this approach with real-time control of six different behaviors using two video cameras and a small set of retroreflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.
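A minimal sketch of the local-model idea: for each frame of low-dimensional control input, find nearby database examples and fit a local linear map from control signals to full-body poses. The array shapes, ridge term, and toy data are illustrative assumptions, not the paper's system:

```python
import numpy as np

def reconstruct_pose(control, ctrl_db, pose_db, k=20, ridge=1e-3):
    """control: (c,) query; ctrl_db: (N, c) control signals; pose_db: (N, d) poses."""
    d2 = ((ctrl_db - control) ** 2).sum(1)
    idx = np.argsort(d2)[:k]                         # k closest database examples
    C = np.hstack([ctrl_db[idx], np.ones((k, 1))])   # affine features
    P = pose_db[idx]
    # ridge-regularized local linear model C W ~ P
    W = np.linalg.solve(C.T @ C + ridge * np.eye(C.shape[1]), C.T @ P)
    return np.append(control, 1.0) @ W

rng = np.random.default_rng(0)
ctrl_db, pose_db = rng.normal(size=(500, 6)), rng.normal(size=(500, 60))
pose = reconstruct_pose(rng.normal(size=6), ctrl_db, pose_db)
```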