Results 1-10 of 20
A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language
Abstract

Cited by 5 (0 self)
The Brown clustering algorithm (Brown et al., 1992) is widely used in natural language processing (NLP) to derive lexical representations that are then used to improve performance on various NLP problems. The algorithm assumes an underlying model that is essentially an HMM, with the restriction that each word in the vocabulary is emitted from a single state. A greedy, bottom-up method is then used to find the clustering; this method does not have a guarantee of finding the correct underlying clustering. In this paper we describe a new algorithm for clustering under the Brown et al. model. The method relies on two steps: first, the use of canonical correlation analysis to derive a low-dimensional representation of words; second, a bottom-up hierarchical clustering over these representations. We show that given a sufficient number of training examples sampled from the Brown et al. model, the method is guaranteed to recover the correct clustering. Experiments show that the method recovers clusters of comparable quality to the algorithm of Brown et al. (1992), but is an order of magnitude more efficient.
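The two steps in the abstract can be sketched concretely. The following is a minimal illustration, not the paper's exact estimator: a CCA-like scaled SVD of a word-context count matrix, followed by greedy bottom-up clustering of the resulting vectors. The scaling heuristic and all function names here are assumptions.

```python
import numpy as np

def spectral_word_vectors(cooc, k):
    """Low-dimensional word vectors from a word-context count matrix
    via a CCA-style scaled SVD (a simplified stand-in for step one)."""
    row = cooc.sum(axis=1, keepdims=True)   # word marginal counts
    col = cooc.sum(axis=0, keepdims=True)   # context marginal counts
    # Scale counts by inverse square roots of the marginals,
    # approximating a correlation matrix between words and contexts.
    scaled = cooc / np.sqrt(row * col + 1e-12)
    U, _, _ = np.linalg.svd(scaled, full_matrices=False)
    return U[:, :k]                         # one k-dim vector per word

def agglomerative_clusters(vectors, n_clusters):
    """Greedy bottom-up merging of nearest cluster centroids (step two)."""
    clusters = [[i] for i in range(len(vectors))]
    centroids = [v.copy() for v in vectors]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)      # merge b into a
        centroids.pop(b)
        centroids[a] = np.mean([vectors[i] for i in clusters[a]], axis=0)
    return clusters
```

On a toy count matrix with two obvious word groups, the recovered clusters match the groups.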
Contrastive Learning Using Spectral Methods
Abstract

Cited by 5 (2 self)
In many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
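The core intuition, using background statistics to modify foreground moments, can be illustrated with second moments alone. The paper itself works with higher-order moments and tensor decompositions, so this is only a hedged sketch; the weight `gamma` is an illustrative parameter, not the paper's derived correction.

```python
import numpy as np

def contrastive_components(fg, bg, k, gamma=1.0):
    """Estimate k foreground-specific directions by subtracting a
    scaled background second-moment matrix from the foreground one.
    A toy illustration of the background-correction idea only."""
    m_fg = fg.T @ fg / len(fg)        # foreground second moment
    m_bg = bg.T @ bg / len(bg)        # coarse background second moment
    corrected = m_fg - gamma * m_bg   # remove background structure
    vals, vecs = np.linalg.eigh(corrected)
    order = np.argsort(vals)[::-1]    # top eigenvectors are the
    return vecs[:, order[:k]]         # foreground-specific directions
```

With background variation along the first axis and an extra foreground signal along the second, the top corrected eigenvector recovers the foreground-specific direction.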
Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison
Abstract

Cited by 4 (1 self)
Probabilistic latent-variable models are a powerful tool for modelling structured data. However, traditional expectation-maximization methods of learning such models are both computationally expensive and prone to local minima. In contrast to these traditional methods, recently developed learning algorithms based upon the method of moments are both computationally efficient and provide strong statistical guarantees. In this work we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages. By rephrasing these methods upon a common theoretical ground, introducing novel theoretical results where necessary, we provide a clear comparison, making explicit the statistical assumptions upon which each method relies. With this theoretical grounding, we then provide an in-depth empirical analysis of the methods on both real and synthetic data with the goal of elucidating performance trends and highlighting important implementation details.
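One family of moment-based methods compared in this line of work, spectral learning of weighted automata from a Hankel matrix, can be sketched in the standard SVD formulation. The prefix/suffix basis and the rank are assumed known here, which real implementations must choose from data.

```python
import numpy as np

def spectral_wfa(H, H_sigma, h_P, h_S, n):
    """Recover a rank-n weighted automaton from Hankel estimates:
    H[u, v] = f(uv), H_sigma[s][u, v] = f(u s v), h_P[u] = f(u),
    h_S[v] = f(v).  A minimal sketch of the SVD-based method."""
    U, lam, Vt = np.linalg.svd(H)
    V = Vt[:n].T                        # top-n right singular vectors
    pinv = np.linalg.pinv(H @ V)
    alpha1 = V.T @ h_S                  # initial weight vector
    alphaInf = pinv @ h_P               # final weight vector
    A = {s: pinv @ Hs @ V for s, Hs in H_sigma.items()}
    return alpha1, A, alphaInf

def wfa_eval(alpha1, A, alphaInf, string):
    """Compute f(string) = alpha1^T A_{x1} ... A_{xT} alphaInf."""
    v = alpha1
    for s in string:
        v = A[s].T @ v                  # apply each symbol's operator
    return float(v @ alphaInf)
```

For a rank-1 stochastic language (an illustrative unigram model with stopping probability 0.5 and symbol probabilities p(a)=0.3, p(b)=0.2), the recovered automaton reproduces string probabilities exactly.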
Spectral Learning of Refinement HMMs
, 2013
Abstract

Cited by 4 (1 self)
We derive a spectral algorithm for learning the parameters of a refinement HMM. This method is simple, efficient, and can be applied to a wide range of supervised sequence labeling tasks. Like other spectral methods, it avoids the problem of local optima and provides a consistent estimate of the parameters. Our experiments on a phoneme recognition task show that when equipped with informative feature functions, it performs significantly better than a supervised HMM and competitively with EM.
Spectral learning of latent-variable PCFGs: Algorithms and sample complexity
 Journal of Machine Learning Research
, 2014
Abstract

Cited by 4 (1 self)
We introduce a spectral learning algorithm for latent-variable PCFGs.
Model-based word embeddings from decompositions of count matrices
, 2015
Abstract

Cited by 3 (0 self)
This work develops a new statistical understanding of word embeddings induced from transformed count data. Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient and effective recovery of model parameters with lexical semantics. We further show in experiments that these techniques empirically outperform existing spectral methods on word similarity and analogy tasks, and are also competitive with other popular methods such as WORD2VEC and GLOVE.
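A hedged sketch of the kind of estimator discussed: an element-wise square-root count transformation followed by a CCA-style scaled SVD. The exact scaling below is illustrative rather than the paper's derivation, and the function name is an assumption.

```python
import numpy as np

def cca_embeddings(counts, k):
    """Word embeddings from a word-context count matrix: square-root
    transform, then a CCA-style scaled SVD (an illustrative sketch)."""
    X = np.sqrt(counts)                       # variance-stabilizing transform
    row = X.sum(axis=1, keepdims=True)        # transformed word marginals
    col = X.sum(axis=0, keepdims=True)        # transformed context marginals
    scaled = X / np.sqrt(row * col + 1e-12)   # approximate CCA whitening
    U, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return U[:, :k] * s[:k]                   # scale dims by singular values
```

On a toy matrix where words 0 and 1 share contexts, their embeddings are closer (by cosine similarity) than embeddings of words with disjoint contexts.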
Diversity in spectral learning for natural language parsing
 In Proceedings of EMNLP
, 2015
Abstract

Cited by 3 (1 self)
We describe an approach to create a diverse set of predictions with spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models, where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral algorithm for L-PCFGs that is fast and leads to compact models. Our experiments on natural language parsing, for English and German, show that we get a significant improvement over baselines comparable to the state of the art. For English we achieve an F1 score of 90.18, and for German an F1 score of 83.38.
A Provably Correct Learning Algorithm for Latent-Variable PCFGs
Abstract

Cited by 3 (2 self)
We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
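The paper's matrix-decomposition step is specific to L-PCFG co-occurrence statistics. As a generic stand-in only (explicitly not the paper's algorithm), here is how a nonnegative low-rank decomposition of a co-occurrence matrix can be computed with Lee-Seung multiplicative updates, the kind of factorization that such pipelines feed into a subsequent EM stage.

```python
import numpy as np

def nmf(V, r, iters=500, seed=0):
    """Factor a nonnegative co-occurrence matrix V ~= W @ H using
    Lee-Seung multiplicative updates.  A generic illustration of the
    low-rank decomposition step, not the paper's method."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1              # positive random init
    H = rng.random((r, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update right factor
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update left factor
    return W, H
```

On an exactly rank-2 nonnegative matrix, the updates drive the reconstruction error close to zero.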
Latent-Variable Synchronous CFGs for Hierarchical Translation
Abstract

Cited by 2 (0 self)
Data-driven refinement of nonterminal categories has been demonstrated to be a reliable technique for improving monolingual parsing with PCFGs. In this paper, we extend these techniques to learn latent refinements of single-category synchronous grammars, so as to improve translation performance. We compare two estimators for this latent-variable model: one based on EM, and a spectral algorithm based on the method of moments. We evaluate their performance on a Chinese–English translation task. The results indicate that we can achieve significant gains over the baseline with both approaches, but the moments-based estimator in particular is both faster and performs better than EM.
Learning Grammar with Explicit Annotations for Subordinating Conjunctions in Chinese
 In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics Student Research Workshop
, 2014
Abstract

Cited by 1 (1 self)
Data-driven approaches to parsing may suffer from data sparsity when entirely unsupervised. External knowledge has been shown to be an effective way to alleviate this problem. Subordinating conjunctions impose important constraints on Chinese syntactic structures. This paper proposes a method to develop a grammar with hierarchical category knowledge of subordinating conjunctions as explicit annotations. First, each part-of-speech tag of the subordinating conjunctions is annotated with the most general category in the hierarchical knowledge. These categories are human-defined to represent distinct syntactic constraints, and provide an appropriate starting point for splitting. Second, based on the data-driven state-split approach, we establish a mapping from each automatically refined subcategory to the one in the hierarchical knowledge. The data-driven splitting of these categories is then restricted by the knowledge to avoid over-refinement. Experiments demonstrate that constraining grammar learning with the hierarchical knowledge improves parsing performance significantly over the baseline.