Results 11–20 of 41
Hidden Markov tree models for semantic class induction
Édouard Grave, Inria Sierra Project-Team
Abstract

Cited by 7 (3 self)
In this paper, we propose a new method for semantic class induction. First, we introduce a generative model of sentences, based on dependency trees, which takes homonymy into account. Our model can thus be seen as a generalization of Brown clustering. Second, we describe an efficient algorithm to perform inference and learning in this model. Third, we apply our method to two large datasets (10^8 tokens, 10^5 word types) and demonstrate that the classes induced by our algorithm improve performance over Brown clustering on semi-supervised supersense tagging and named entity recognition.
A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language
Abstract

Cited by 5 (0 self)
The Brown clustering algorithm (Brown et al., 1992) is widely used in natural language processing (NLP) to derive lexical representations that are then used to improve performance on various NLP problems. The algorithm assumes an underlying model that is essentially an HMM, with the restriction that each word in the vocabulary is emitted from a single state. A greedy, bottom-up method is then used to find the clustering; this method does not have a guarantee of finding the correct underlying clustering. In this paper we describe a new algorithm for clustering under the Brown et al. model. The method relies on two steps: first, the use of canonical correlation analysis to derive a low-dimensional representation of words; second, a bottom-up hierarchical clustering over these representations. We show that given a sufficient number of training examples sampled from the Brown et al. model, the method is guaranteed to recover the correct clustering. Experiments show that the method recovers clusters of comparable quality to the algorithm of Brown et al. (1992), but is an order of magnitude more efficient.
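The two-step recipe in this abstract (a CCA-style low-dimensional word representation, then bottom-up hierarchical clustering) can be sketched on a toy word/context count matrix. This is not the authors' algorithm: the correspondence-analysis-style scaled SVD below merely stands in for the full CCA step, and the vocabulary and counts are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy word/context co-occurrence counts (rows: words, cols: right
# neighbours); vocabulary and counts are invented for illustration.
vocab = ["cat", "dog", "paris", "london", "run", "walk"]
C = np.array([
    [9, 8, 0, 1, 0, 1],
    [8, 9, 1, 0, 1, 0],
    [0, 1, 9, 8, 0, 0],
    [1, 0, 8, 9, 1, 1],
    [0, 1, 0, 0, 9, 8],
    [1, 0, 1, 1, 8, 9],
], dtype=float)

# Step 1 (stand-in for the CCA step): correspondence-analysis scaling
# followed by a low-rank SVD gives low-dimensional word representations.
P = C / C.sum()
r = P.sum(axis=1, keepdims=True)      # word marginals
c = P.sum(axis=0, keepdims=True)      # context marginals
M = P / np.sqrt(r @ c)
U, s, _ = np.linalg.svd(M, full_matrices=False)
k = 3
X = U[:, :k] * s[:k]                  # one k-dim vector per word

# Step 2: bottom-up hierarchical clustering over the representations.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
clusters = dict(zip(vocab, labels.tolist()))
print(clusters)
```

On this toy matrix the three word groups (animals, cities, verbs) end up in three distinct clusters; the real algorithm additionally comes with the sample-complexity guarantee the abstract states.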
Nudging the envelope of direct transfer methods for multilingual named entity recognition
 In WILS-NAACL
, 2012
Abstract

Cited by 4 (0 self)
In this paper, we study direct transfer methods for multilingual named entity recognition. Specifically, we extend the method recently proposed by Täckström et al. (2012), which is based on cross-lingual word cluster features. First, we show that by using multiple source languages, combined with self-training for target language adaptation, we can achieve significant improvements compared to using only single-source direct transfer. Second, we investigate how the direct transfer system fares against a supervised target language system and conclude that between 8,000 and 16,000 word tokens need to be annotated in each target language to match the best direct transfer system. Finally, we show that we can significantly improve target language performance, even after annotating up to 64,000 tokens in the target language, by simply concatenating source and target language annotations.
An efficient approach to integrating radius information into multiple kernel learning
 IEEE TSMC-B
, 2012
Abstract

Cited by 4 (4 self)
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) to be a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL, however, not only incurs significant computational overhead but may also adversely affect kernel learning performance, because this radius is notoriously sensitive to outliers. Inspired by the relationship between the radius of the MEB and the trace of the total data scattering matrix, this paper proposes to incorporate the latter into MKL instead. In particular, to justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the 2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves ...
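A minimal numeric illustration of the sensitivity the abstract mentions (this is a toy sketch, not the paper's MKL formulation; the centroid-based radius bound and the normalisation are my assumptions): one outlier inflates the squared MEB radius far more than the trace of the scattering matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))            # well-behaved 2-D data
X_out = np.vstack([X, [[100.0, 100.0]]])     # the same data plus one outlier

def meb_radius(X):
    # Crude stand-in for the MEB radius: the largest distance to the
    # centroid (an upper bound, enough to show the sensitivity).
    c = X.mean(axis=0)
    return np.linalg.norm(X - c, axis=1).max()

def scatter_trace(X):
    # Trace of the total scattering matrix, here normalised: the
    # average squared distance to the centroid.
    c = X.mean(axis=0)
    return float(((X - c) ** 2).sum() / len(X))

# The squared radius (the quantity in the radius-margin bound) explodes
# far more than the scatter trace when the single outlier is added.
print(meb_radius(X) ** 2, meb_radius(X_out) ** 2)
print(scatter_trace(X), scatter_trace(X_out))
```

The radius depends only on the single most extreme point, while the trace averages over all points, which is one intuition for why the trace is the gentler surrogate.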
New Subsampling Algorithms for Fast Least Squares Regression
Abstract

Cited by 4 (1 self)
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n ≫ p). We propose three methods which solve the big-data problem by subsampling the covariance matrix, using either single- or two-stage estimation. All three run in time linear in the input size, i.e., O(np), and our best method, Uluru, gives an error bound of O(√(p/n)) which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed-design setting (with Randomized Hadamard preconditioning) as well as the sub-Gaussian random-design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d. sub-Gaussian, one can subsample directly, without the expensive Randomized Hadamard preconditioning, at no loss of accuracy.
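A rough sketch of the single-stage idea as I read it from the abstract (not the authors' Uluru method): estimate the covariance term X'X from a uniform row subsample while keeping the full X'y term. The subsample size m and the plain uniform sampling with no Hadamard preconditioning are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.standard_normal((n, p))              # i.i.d. sub-Gaussian design
beta = np.arange(1.0, p + 1.0)               # true coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

# Reference: full OLS, beta_hat = (X'X)^{-1} X'y, costing O(n p^2).
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Single-stage subsampling: estimate the covariance term X'X from a
# rescaled uniform row subsample, but keep the full X'y term.
m = n // 4                                    # subsample size (illustrative)
idx = rng.choice(n, size=m, replace=False)
Xs = X[idx]
cov_sub = (Xs.T @ Xs) * (n / m)
beta_sub = np.linalg.solve(cov_sub, X.T @ y)

print(np.linalg.norm(beta_full - beta), np.linalg.norm(beta_sub - beta))
```

With i.i.d. sub-Gaussian rows as generated here, the subsampled estimate stays close to the true coefficients, which matches the abstract's claim that preconditioning can be skipped in this regime.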
Spectral learning of latent-variable PCFGs: Algorithms and sample complexity
 Journal of Machine Learning Research
, 2014
Abstract

Cited by 4 (1 self)
We introduce a spectral learning algorithm for latent-variable PCFGs.
Efficient Dimensionality Reduction for Canonical Correlation Analysis
Abstract

Cited by 3 (1 self)
We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality-reduction transform to reduce the size of the input matrices, and then applies any standard CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA of the original pair of matrices with provable guarantees, while requiring asymptotically fewer operations than the state-of-the-art exact algorithms.
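The scheme can be sketched with a dense Gaussian sketch standing in for the paper's randomized transform (a dense Gaussian matrix is not actually fast to apply; it is only an illustration, and all sizes here are invented): sketch both tall matrices down to m ≪ n rows, then run any standard CCA on the small pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, k = 8000, 4, 3, 2
# A tall-and-thin pair of views sharing a k-dimensional latent signal.
Z = rng.standard_normal((n, k))
A = Z @ rng.standard_normal((k, p)) + 0.1 * rng.standard_normal((n, p))
B = Z @ rng.standard_normal((k, q)) + 0.1 * rng.standard_normal((n, q))

def canonical_correlations(A, B):
    # Any standard CCA routine works here; this one whitens each view
    # with a QR factorisation and takes an SVD of the cross-product.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

exact = canonical_correlations(A, B)

# Randomized dimensionality reduction: sketch both views down to
# m << n rows, then run the same CCA on the small matrices.
m = 1000
S = rng.standard_normal((m, n)) / np.sqrt(m)   # dense Gaussian sketch
approx = canonical_correlations(S @ A, S @ B)

print(exact[:k], approx[:k])
```

The leading canonical correlations of the sketched pair closely track those of the original pair, which is the kind of guarantee the abstract refers to (the paper proves it for a fast structured transform rather than this dense one).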
Large scale canonical correlation analysis with iterative least squares. arXiv preprint arXiv:1407.4508
, 2014
Efficient learning for spoken language understanding tasks with word embedding based pretraining
 In Proceedings of Interspeech
, 2015
Abstract

Cited by 1 (1 self)
Spoken language understanding (SLU) tasks such as goal estimation and intention identification from a user's commands are essential components in spoken dialog systems. In recent years, neural network approaches have shown great success in various SLU tasks. However, one major difficulty of SLU is that annotating collected data can be expensive, which often leaves insufficient data for a task. The performance of a neural network trained under low-resource conditions is usually inferior because of overtraining. To improve performance, this paper investigates unsupervised training methods with large-scale corpora, based on word embeddings and latent topic models, to pre-train the SLU networks. In order to capture long-term characteristics over the entire dialog, we propose a novel Recurrent Neural Network (RNN) architecture. The proposed RNN uses two sub-networks to model the different time scales represented by word and turn sequences. The combination of pre-training and the RNN gives an 18% relative error reduction compared to a baseline system.
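A minimal numpy sketch of the two-time-scale idea (not the authors' architecture; all dimensions, weights, and the plain tanh cells are assumptions): a fast sub-network summarizes each turn's words, and a slow sub-network consumes one summary per turn, so its state evolves at the turn level.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_w, d_t = 8, 16, 16            # invented sizes

# Word-level (fast) and turn-level (slow) sub-network parameters.
Wx = 0.1 * rng.standard_normal((d_w, d_in))
Wh = 0.1 * rng.standard_normal((d_w, d_w))
Ux = 0.1 * rng.standard_normal((d_t, d_w))
Uh = 0.1 * rng.standard_normal((d_t, d_t))

def word_rnn(words):
    # Fast time scale: one recurrent update per word within a turn.
    h = np.zeros(d_w)
    for x in words:
        h = np.tanh(Wx @ x + Wh @ h)
    return h                           # summary vector for the turn

def dialog_rnn(turns):
    # Slow time scale: one update per turn, consuming the summaries,
    # so the state can carry long-term, dialog-level characteristics.
    s = np.zeros(d_t)
    for turn in turns:
        s = np.tanh(Ux @ word_rnn(turn) + Uh @ s)
    return s

# A toy dialog: 3 turns of 5 word embeddings each.
dialog = [[rng.standard_normal(d_in) for _ in range(5)] for _ in range(3)]
state = dialog_rnn(dialog)
print(state.shape)
```

In the paper's setting the final dialog-level state would feed a classifier for goal or intention labels; here it simply demonstrates the two nested recurrences.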
Learning canonical correlations of paired tensor sets via tensortovector projection
 In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013
Abstract

Cited by 1 (1 self)
Canonical correlation analysis (CCA) is a useful technique for measuring the relationship between two sets of vector data. For paired tensor data sets, we propose a multilinear CCA (MCCA) method. Unlike existing multilinear variations of CCA, MCCA extracts uncorrelated features under two architectures while maximizing paired correlations. Through a pair of tensor-to-vector projections, one architecture enforces zero correlation within each set while the other enforces zero correlation between different pairs of the two sets. We take a successive and iterative approach to solving the problem. Experiments on matching faces of different poses show that MCCA outperforms CCA and 2D-CCA while using far fewer features. In addition, the fusion of the two architectures leads to performance improvement, indicating complementary information.