Results 1–10 of 77
Topics over time: A non-Markov continuous-time model of topical trends
In SIGKDD, 2006
Cited by 251 (10 self)
Abstract:
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document’s timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics’ occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
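The generative process this abstract describes, with each topic paired with a continuous distribution over timestamps, can be sketched roughly as follows. The use of Beta distributions over normalized timestamps follows the Topics-over-Time paper; the toy topics and parameters here are purely illustrative:

```python
import random

def sample_tot_document(alpha, phi, psi, n_words, rng):
    """Sample one document from a Topics-over-Time-style generative
    process (a sketch): a per-document topic mixture from a symmetric
    Dirichlet, then for each token a word from the topic's word
    distribution and a timestamp from the topic's Beta distribution."""
    k = len(phi)                                  # number of topics
    theta = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(theta)
    theta = [t / s for t in theta]                # Dirichlet via gammas
    words, stamps = [], []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        t = rng.betavariate(*psi[z])              # timestamp in [0, 1]
        words.append(w)
        stamps.append(t)
    return words, stamps

rng = random.Random(0)
phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]          # toy word distributions
psi = [(2.0, 8.0), (8.0, 2.0)]                    # an early vs. a late topic
words, stamps = sample_tot_document(0.5, phi, psi, 20, rng)
```

Because each topic's timestamp distribution is continuous, no discretization of the time axis is needed; a topic that peaks early simply has a Beta density skewed toward 0.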
A Bayesian framework for word segmentation: Exploring the effects of context
In 46th Annual Meeting of the ACL, 2009
Cited by 110 (30 self)
Abstract:
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution
In ICML, 2006
Cited by 53 (2 self)
Abstract:
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Pólya distribution, is a model for text documents that takes into account burstiness: the fact that if a word occurs once in a document, it is likely to occur repeatedly. We derive a new family of distributions that are approximations to DCM distributions and constitute an exponential family, unlike DCM distributions. We use these so-called EDCM distributions to obtain insights into the properties of DCM distributions, and then derive an algorithm for EDCM maximum-likelihood training that is many times faster than the corresponding method for DCM distributions. Next, we investigate expectation-maximization with EDCM components and deterministic annealing as a new clustering algorithm for documents. Experiments show that the new algorithm is competitive with the best methods in the literature, and superior from the point of view of finding models with low perplexity.
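The EDCM approximation can be made concrete: for small beta_w, Gamma(beta_w + x_w) / Gamma(beta_w) is approximately beta_w * Gamma(x_w) for each word with x_w >= 1, which turns the DCM likelihood into an exponential-family form. A minimal sketch comparing the exact and approximate log-likelihoods on a toy count vector (parameter values are illustrative):

```python
from math import lgamma, log

def dcm_loglik(counts, beta):
    """Exact DCM (multivariate Pólya) log-likelihood of a count vector."""
    n, s = sum(counts), sum(beta)
    ll = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    ll += lgamma(s) - lgamma(s + n)
    ll += sum(lgamma(b + c) - lgamma(b) for b, c in zip(beta, counts))
    return ll

def edcm_loglik(counts, beta):
    """EDCM approximation: for small beta_w,
    Gamma(b + c) / Gamma(b) ~= b * Gamma(c) for each word with c >= 1,
    which makes the resulting family exponential."""
    n, s = sum(counts), sum(beta)
    ll = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    ll += lgamma(s) - lgamma(s + n)
    ll += sum(log(b) + lgamma(c)
              for b, c in zip(beta, counts) if c >= 1)
    return ll

beta = [0.01] * 5                 # small parameters: the bursty regime
counts = [3, 1, 0, 2, 0]
exact = dcm_loglik(counts, beta)
approx = edcm_loglik(counts, beta)
```

In the small-beta regime typical of fitted text models, the two values are close, and the EDCM form admits much faster maximum-likelihood training because its sufficient statistics are simple.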
Topic models over text streams: a study of batch and online unsupervised learning
In Proc. 7th SIAM Int’l. Conf. on Data Mining
Cited by 35 (1 self)
Abstract:
Topic modeling techniques have widespread use in text data mining applications. Some applications use batch models, which perform clustering on the document collection in aggregate. In this paper, we analyze and compare the performance of three recently proposed batch topic models: Latent Dirichlet Allocation (LDA), Dirichlet Compound Multinomial (DCM) mixtures and von Mises-Fisher (vMF) mixture models. In cases where offline clustering on complete document collections is infeasible due to resource and response-rate constraints, online unsupervised clustering methods that process incoming data incrementally are necessary. To this end, we propose online variants of vMF, EDCM and LDA. Experiments on large real-world document collections, in both the offline and online settings, demonstrate that though LDA is a good model for finding word-level topics, vMF finds better document-level topic clusters more efficiently, which is often important in text mining applications. Finally, we propose a practical heuristic for hybrid topic modeling, which learns online topic models on streaming text and intermittently runs batch topic models on aggregated documents offline. Such a hybrid model is useful for several applications (e.g., dynamic topic-based aggregation of user-generated content in social networks) that need a good trade-off between the performance of batch offline algorithms and the efficiency of incremental online algorithms.
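An online vMF-style clusterer of the kind described above can be sketched, in its simplest form, as online spherical k-means: L2-normalize each incoming vector, assign it to the centroid with the highest cosine similarity, and nudge that centroid back onto the unit sphere. This is a minimal sketch under that reduction, not the paper's algorithm; the learning rate and toy data are illustrative:

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def online_spherical_kmeans(stream, centroids, eta=0.05):
    """One pass of an online vMF-style clusterer (a sketch): each
    incoming vector is L2-normalized, assigned to the centroid with
    highest cosine similarity, and that centroid is moved toward the
    vector and re-projected onto the unit sphere."""
    centroids = [_normalize(c) for c in centroids]
    labels = []
    for doc in stream:
        d = _normalize(doc)
        sims = [sum(a * b for a, b in zip(c, d)) for c in centroids]
        k = max(range(len(sims)), key=sims.__getitem__)
        centroids[k] = _normalize(
            [c + eta * (x - c) for c, x in zip(centroids[k], d)])
        labels.append(k)
    return labels, centroids

# toy stream: two documents near each of two directions
stream = [[1, 0.1, 0], [0.9, 0.2, 0], [0, 0.1, 1], [0, 0.2, 0.8]]
labels, cents = online_spherical_kmeans(stream, [[1, 0, 0], [0, 0, 1]])
```

Because each document is touched once and only one centroid is updated per document, the cost per arrival is O(k·d), which is what makes this family attractive for streams.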
DivRank: The Interplay of Prestige and Diversity in Information Networks
2010
Cited by 27 (3 self)
Abstract:
Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the data items based on their centrality or prestige in the network. Beyond prestige, diversity has been recognized as a crucial objective in ranking, aiming at providing non-redundant, high-coverage information in the top-ranked results. Nevertheless, existing network-based ranking approaches either disregard the concern of diversity, or handle it with non-optimized heuristics, usually based on greedy vertex selection. We propose a novel ranking algorithm, DivRank, based on a reinforced random walk in an information network. This model automatically balances the prestige and the diversity of the top-ranked vertices in a principled way. DivRank not only has a clear optimization explanation, but also connects well to classical models in mathematics and network science. We evaluate DivRank using empirical experiments on three different networks as well as a text summarization task. DivRank outperforms existing network-based ranking methods in terms of enhancing diversity in prestige.
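The reinforced random walk behind DivRank can be sketched with its pointwise approximation: a vertex's current visit probability re-weights the transitions into it, so mass concentrates on a few prestigious vertices and their neighbors are pushed down. The parameter values and toy graph below are illustrative, not the paper's settings:

```python
def divrank(adj, alpha=0.25, d=0.85, iters=200):
    """Pointwise DivRank sketch: a reinforced random walk in which the
    transition into a vertex is re-weighted by that vertex's current
    visit probability, then renormalized per source vertex, with a
    PageRank-style teleport mixed in."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # "organic" transitions: self-loop with probability (1 - alpha)
    p0 = [[(1 - alpha) * (i == j) + alpha * adj[i][j] / deg[i]
           for j in range(n)] for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        for u in range(n):
            z = sum(p0[u][w] * p[w] for w in range(n))  # normalizer
            for v in range(n):
                new[v] += p[u] * p0[u][v] * p[v] / z
        p = [(1 - d) / n + d * x for x in new]
    return p

# toy graph: two triangles joined by one bridge edge (vertices 0 and 3)
adj = [[0, 1, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
scores = divrank(adj)
```

Unlike greedy re-ranking heuristics, the diversity effect here emerges from the walk's rich-get-richer dynamics rather than from post-hoc vertex removal.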
Accounting for Burstiness in Topic Models
Cited by 21 (0 self)
Abstract:
Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
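The burstiness mechanism the DCM encodes is the Pólya urn: each time a word is drawn, its weight grows, so repeats of that word become more likely. A toy simulation contrasting urn draws with an i.i.d. multinomial baseline (prior weights and sample sizes are illustrative):

```python
import random

def polya_urn_counts(beta, n_draws, rng):
    """Draw n_draws words from a Pólya urn with prior weights beta:
    every draw adds 1 to the drawn word's weight, so early draws make
    the same word more likely later -- burstiness."""
    counts = [0] * len(beta)
    for _ in range(n_draws):
        weights = [b + c for b, c in zip(beta, counts)]
        w = rng.choices(range(len(beta)), weights=weights)[0]
        counts[w] += 1
    return counts

rng = random.Random(1)
# with small prior weights the draws are bursty: a handful of words
# dominate even though all ten words start out exchangeable
bursty = polya_urn_counts([0.1] * 10, 50, rng)

iid = [0] * 10
for _ in range(50):                  # uniform multinomial baseline
    iid[rng.randrange(10)] += 1
```

Comparing the two count vectors over repeated runs shows the urn's counts are far more concentrated, which is exactly the heavy-tailed within-document behavior a plain multinomial topic model cannot express.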
Information-based models for ad hoc IR
In SIGIR ’10, Conference on Research and Development in Information Retrieval, 2010
Cited by 18 (5 self)
Abstract:
We introduce in this paper the family of information-based models for ad hoc information retrieval. These models draw their inspiration from a long-standing hypothesis in IR, namely the fact that the difference in the behaviors of a word at the document and collection levels brings information on the significance of the word for the document. This hypothesis has been exploited in the 2-Poisson mixture models, in the notion of eliteness in BM25, and more recently in DFR models. We show here that, combined with notions related to burstiness, it can lead to simpler and better models.
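One member of this family is the log-logistic information model: a query term contributes the information -log P(T >= t), where P(T >= t) = lambda_w / (lambda_w + t) under a log-logistic law, lambda_w is estimated from document frequency, and t is a length-normalized term frequency. The sketch below assumes a simple logarithmic length normalization; the function name, parameters, and constants are illustrative:

```python
import math

def log_logistic_score(query_tf, doc_tf, doc_len, avg_len,
                       df, n_docs, c=1.0):
    """Sketch of a log-logistic information-based retrieval score: a
    term's contribution is -log P(T >= t | lambda_w), with
    P(T >= t) = lambda_w / (lambda_w + t), lambda_w = df_w / N, and t
    the length-normalized term frequency. Rarer-than-expected heavy
    use of a term yields more information, hence a higher score."""
    score = 0.0
    for w, qtf in query_tf.items():
        tf = doc_tf.get(w, 0)
        if tf == 0:
            continue
        t = tf * math.log(1.0 + c * avg_len / doc_len)  # normalized tf
        lam = df[w] / n_docs
        score += qtf * -math.log(lam / (lam + t))
    return score

doc_tf = {"model": 3, "retrieval": 1}
# same document, same tf: the rarer term (lower df) scores higher
s_common = log_logistic_score({"model": 1}, doc_tf, 100, 120,
                              {"model": 50}, 1000)
s_rare = log_logistic_score({"model": 1}, doc_tf, 100, 120,
                            {"model": 5}, 1000)
```

The document-versus-collection contrast from the abstract shows up directly: lambda_w is a collection-level expectation, and t is the document-level observation measured against it.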
PLISS: Detecting and Labeling Places Using Online Change-Point Detection
Cited by 18 (1 self)
Abstract:
We introduce PLISS (Place Labeling through Image Sequence Segmentation), a novel technique for place recognition and categorization from visual cues. PLISS operates on video or image streams and works by segmenting them into pieces corresponding to distinct places in the environment. An online Bayesian change-point detection framework that detects changes to model parameters is used to segment the image stream. Unlike current place recognition methods, in addition to using previously learned place models for labeling, PLISS can also detect and learn a previously unknown place or place category in an online manner. Moreover, since both the inferred boundaries of places (change-points) and the place labels are fully probabilistic, they can indicate when the inference is uncertain. New places and categories are detected using a systematic statistical hypothesis testing framework. We present extensive experiments on a large and difficult image dataset. We validate our claims by comparing results obtained using different types of features and by comparing results from PLISS against the state of the art.
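The online Bayesian change-point machinery can be sketched with a minimal run-length filter: Gaussian observations with known variance, a Normal prior on the mean, and a constant hazard rate. PLISS's actual observation models over image features are richer, so this is only the skeleton, with illustrative parameters:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bocpd_map_runs(xs, hazard=1 / 50, mu0=0.0, var0=4.0, obs_var=1.0):
    """Minimal online Bayesian change-point sketch: filter the
    run-length posterior, with each run length carrying conjugate
    Normal sufficient statistics. Returns the MAP run length after
    each observation -- a drop back toward 0 signals a detected
    change."""
    r = [1.0]                  # P(run length = i); starts at length 0
    n, s = [0], [0.0]          # per-run-length count and sum
    maps = []
    for x in xs:
        # posterior predictive for each current run length
        pred = []
        for i in range(len(r)):
            prec = 1 / var0 + n[i] / obs_var
            mu = (mu0 / var0 + s[i] / obs_var) / prec
            pred.append(normal_pdf(x, mu, obs_var + 1 / prec))
        growth = [r[i] * pred[i] * (1 - hazard) for i in range(len(r))]
        cp = sum(r[i] * pred[i] * hazard for i in range(len(r)))
        r = [cp] + growth
        z = sum(r)
        r = [p / z for p in r]
        n = [0] + [c + 1 for c in n]
        s = [0.0] + [t + x for t in s]
        maps.append(max(range(len(r)), key=r.__getitem__))
    return maps

# 30 points around 0 followed by 30 points around 5: the MAP run
# length grows steadily and then resets shortly after the shift
xs = [0.1, -0.2, 0.0] * 10 + [5.1, 4.8, 5.0] * 10
runs = bocpd_map_runs(xs)
```

The probabilistic run-length posterior is what gives the uncertainty estimates the abstract mentions: a flat posterior over run lengths means the segmentation itself is uncertain.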
Latent Class Models for Algorithm Portfolio Methods
Cited by 18 (4 self)
Abstract:
Different solvers for computationally difficult problems such as satisfiability (SAT) perform best on different instances. Algorithm portfolios exploit this phenomenon by predicting solvers’ performance on specific problem instances, then shifting computational resources to the solvers that appear best suited. This paper develops a new approach to the problem of making such performance predictions: natural generative models of solver behavior. Two are proposed, both following from an assumption that problem instances cluster into latent classes: a mixture of multinomial distributions, and a mixture of Dirichlet compound multinomial distributions. The latter model extends the former to capture burstiness, the tendency of solver outcomes to recur. These models are integrated into an algorithm portfolio architecture and used to run standard SAT solvers on competition benchmarks. This approach is found to be competitive with the most prominent existing portfolio, SATzilla, which relies on domain-specific, hand-selected problem features; the latent class models, in contrast, use minimal domain knowledge. Their success suggests that these models can lead to more powerful and more general algorithm portfolio methods.
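The simpler of the two latent-class models, a mixture of multinomials over solver outcomes, can be fit with a short EM loop. This is a sketch: the outcome vocabulary, smoothing constant, and deterministic initialization below are illustrative, not the paper's setup:

```python
import math

def em_multinomial_mixture(counts, k, iters=50):
    """EM for a mixture of multinomials: each instance's outcome-count
    vector is assumed drawn from one of k latent classes, each with
    its own multinomial over outcomes. Returns mixing weights and the
    class multinomials."""
    n, v = len(counts), len(counts[0])
    pi = [1.0 / k] * k
    # spread the initial multinomials so classes can differentiate
    theta = [[1.0 + (i + j) % k for j in range(v)] for i in range(k)]
    theta = [[p / sum(row) for p in row] for row in theta]
    for _ in range(iters):
        # E-step: responsibilities from per-class log-likelihoods
        resp = []
        for x in counts:
            logp = [math.log(pi[c]) +
                    sum(xw * math.log(theta[c][w])
                        for w, xw in enumerate(x))
                    for c in range(k)]
            m = max(logp)
            w = [math.exp(l - m) for l in logp]
            z = sum(w)
            resp.append([wi / z for wi in w])
        # M-step: re-estimate weights and multinomials (with smoothing)
        pi = [sum(r[c] for r in resp) / n for c in range(k)]
        theta = []
        for c in range(k):
            num = [0.01 + sum(resp[i][c] * counts[i][w] for i in range(n))
                   for w in range(v)]
            z = sum(num)
            theta.append([x / z for x in num])
    return pi, theta

# toy data: instances where outcome A dominates vs. where B dominates
counts = [[9, 1], [8, 2], [1, 9], [2, 8]]
pi, theta = em_multinomial_mixture(counts, k=2)
```

Swapping each class's multinomial for a DCM gives the burstier variant the abstract describes, at the cost of a non-exponential-family M-step.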