CiteSeerX

Results 1 - 10 of 29

Evaluation methods for topic models

by Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, David Mimno - In ICML, 2009
"... A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean me ..."
Abstract - Cited by 111 (10 self) - Add to MetaCart
A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean
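
For readers skimming these results, the harmonic mean estimator mentioned in this abstract can be sketched in a few lines. The Python snippet below is illustrative only (function and variable names are my own, not from the paper): it turns per-sample held-out log-likelihoods into the harmonic mean estimate of the held-out log-probability, working in log space for stability.

    import numpy as np

    def harmonic_mean_log_evidence(heldout_loglikes):
        # heldout_loglikes[s] = log p(held-out words | z_s, model), where the
        # z_s are S samples from the posterior over topic assignments.
        # The harmonic mean estimator is known to have very high variance;
        # this sketch only illustrates the computation itself.
        lls = np.asarray(heldout_loglikes, dtype=float)
        S = lls.size
        neg = -lls
        m = neg.max()
        # log of the mean of 1/p, computed stably via log-sum-exp
        log_mean_inv = m + np.log(np.exp(neg - m).sum()) - np.log(S)
        return -log_mean_inv  # estimate of log p(held-out words | model)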

Pachinko allocation: DAG-structured mixture models of topic correlations

by Wei Li, Andrew McCallum - In Proceedings of the 23rd International Conference on Machine Learning, 2006
"... Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitr ..."
Abstract - Cited by 181 (8 self)
a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out

Replicated softmax: an undirected topic model

by Ruslan Salakhutdinov, Geoffrey Hinton - In Advances in Neural Information Processing Systems
"... We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this m ..."
Abstract - Cited by 67 (14 self)
in terms of both the log-probability of held-out documents and the retrieval accuracy.
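
As a quick reminder of what such an undirected model looks like, the energy function of a Replicated Softmax RBM over a document of D words is usually written as below. This is the standard textbook form reproduced from memory, so treat the exact bias-scaling convention as an assumption rather than a quotation from the paper:

    E(V, h) = - \sum_{j=1}^{F} \sum_{k=1}^{K} W_{jk} \hat{v}_k h_j
              - \sum_{k=1}^{K} b_k \hat{v}_k
              - D \sum_{j=1}^{F} a_j h_j,
    \qquad \hat{v}_k = \sum_{i=1}^{D} v_i^k

where v_i^k indicates that word position i takes vocabulary index k, h is the binary hidden (topic) layer, and the softmax weights are shared across all positions in the document.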

Mixtures of hierarchical topics with pachinko allocation

by David Mimno, Wei Li, Andrew McCallum - In Proceedings of the 24th International Conference on Machine Learning, 2007
"... The four-level pachinko allocation model (PAM) (Li & McCallum, 2006) represents correlations among topics using a DAG structure. It does not, however, represent a nested hierarchy of topics, with some topical word distributions representing the vocabulary that is shared among several more specif ..."
Abstract - Cited by 64 (2 self)
improvements in likelihood of held-out documents, as well as mutual information between automatically-discovered topics and human-generated categories such as journals.

Effective Document-Level Features for Chinese Patent Word Segmentation

by Si Li, Nianwen Xue
"... A patent is a property right for an inven-tion granted by the government to the in-ventor. Patents often have a high con-centration of scientific and technical terms that are rare in everyday language. How-ever, some scientific and technical terms usually appear with high frequency only in one speci ..."
Abstract
specific patent. In this paper, we propose a pragmatic approach to Chinese word segmentation on patents where we train a sequence labeling model based on a group of novel document-level features. Experiments show that the accuracy of our model reached 96.3% (F1 score) on the development set and 95

Evaluating topic models for information retrieval

by Xing Yi, James Allan - In Proceedings of CIKM, 2008
"... We explore the utility of different types of topic models, both probabilistic and not, for retrieval purposes. We show that: (1) topic models are effective for document smoothing; (2) more elaborate topic models that capture topic dependencies provide no additional gains; (3) smoothing documents by ..."
Abstract - Cited by 7 (0 self)
by using their similar documents is as effective as smoothing them by using topic models; (4) topics discovered on the whole corpus are too coarse-grained to be useful for query expansion. Experiments to measure topic models' ability to predict held-out likelihood confirm past results on small corpora
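
For orientation, "document smoothing" with a topic model in the language-modeling retrieval framework typically means interpolating a document's maximum-likelihood word distribution with the distribution implied by its topic mixture. The form below is a generic sketch of that idea; the exact interpolation (and any collection-model term) used by Yi and Allan may differ:

    p(w \mid d) = \lambda \, p_{ML}(w \mid d)
                  + (1 - \lambda) \sum_{k=1}^{K} p(w \mid z = k) \, p(z = k \mid d)

Here p(z = k | d) is the inferred topic mixture of the document and lambda controls how much weight the observed words receive relative to the topic model.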

Accounting for Burstiness in Topic Models

by Gabriel Doyle, Charles Elkan
"... Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once i ..."
Abstract - Cited by 21 (0 self)
in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation
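
To make the burstiness point concrete, the Dirichlet compound multinomial (Polya) likelihood this abstract refers to can be written down directly. The Python sketch below is illustrative only; the function name and the inclusion of the multinomial coefficient are my choices, not taken from the paper.

    import numpy as np
    from scipy.special import gammaln

    def dcm_log_pmf(counts, alpha):
        # Log-probability of a word-count vector under the Dirichlet compound
        # multinomial: once a word has appeared, its effective pseudo-count
        # grows, so further occurrences become more likely (burstiness).
        x = np.asarray(counts, dtype=float)
        a = np.asarray(alpha, dtype=float)
        N, A = x.sum(), a.sum()
        coeff = gammaln(N + 1.0) - gammaln(x + 1.0).sum()  # multinomial coefficient
        return coeff + gammaln(A) - gammaln(N + A) + (gammaln(x + a) - gammaln(a)).sum()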

Omnifluent™ English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation

by Evgeny Matusov, Gregor Leusch
"... This paper describes OmnifluentTM Translate – a state-of-the-art hybrid MT system capable of high-quality, high-speed translations of text and speech. The system participated in the English-to-French and Russian-to-English WMT evaluation tasks with competitive results. The features which contributed ..."
Abstract
contributed the most to high translation quality were training data sub-sampling methods, document-specific models, as well as rule-based morphological normalization for Russian. The latter improved the baseline Russian-to-English BLEU score from 30.1 to 31.3% on a held-out test set.

The Polylingual Labeled Topic Model

by Lisa Posch, Arnim Bleier, Markus Strohmaier
"... Abstract. In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple lan-guages with separate topic distributions for each language while restrict-ing the permitted ..."
Abstract
the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language setting on a dataset from the social science domain. Our experiments show that our model outperforms LDA and Labeled LDA in terms of held-out perplexity and that it produces
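
Since several of these results compare models by held-out perplexity, it may help to recall the standard definition used throughout the topic-modeling literature (stated generically here, not quoted from this paper):

    \mathrm{perplexity}(D_{test}) = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)

where w_d denotes the held-out words of document d and N_d its length; lower perplexity indicates a better predictive model.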

Non-parametric estimation of topic hierarchies from texts with hierarchical Dirichlet processes

by Elias Zavitsanos, Georgios Paliouras, George A. Vouros, David Blei, 2011
"... This paper presents hHDP, a hierarchical algorithm for representing a document collection as a hierarchy of latent topics, based on Dirichlet process priors. The hierarchical nature of the algorithm refers to the Bayesian hierarchy that it comprises, as well as to the hierarchy of the latent topics. ..."
Abstract - Cited by 2 (0 self)