Results 1 - 10 of 12
A Military History of the Western World, 1956
"... Survival outcomes Ejaculate specimens Prostate cancer detection Background: Determining whether men diagnosed with early prostate cancer (PCa) will live long enough to benefit from interventions with curative intent is difficult. Although validated instruments for predicting patient survival are ava ..."
Abstract - Cited by 12 (0 self)
Keywords: Survival outcomes; Ejaculate specimens; Prostate cancer detection. Background: Determining whether men diagnosed with early prostate cancer (PCa) will live long enough to benefit from interventions with curative intent is difficult. Although validated instruments for predicting patient survival are available, these do not have clinical utility and so are not used routinely in practice. Objective: To test the hypothesis that volunteers who provided ejaculate specimens had a high survival rate at 10 and 15 yr and beyond. Design, setting, and participants: A total of 290 patients investigated because of high serum prostate-specific antigen donated ejaculate specimens for research between January 1992 and May 2003. The median age at the time of ejaculation was 63.5 yr. Of these donors, 153 were diagnosed with PCa and followed up to December 31, 2013. Outcome measurements and statistical analysis: Survival outcomes were compared
Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality
- In Proc. of the Europ. Chap. of the Assoc. for Computational Linguistics, 2014
"... Topic models based on latent Dirichlet al-location and related methods are used in a range of user-focused tasks including doc-ument navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two ta ..."
Abstract - Cited by 7 (0 self)
Topic models based on latent Dirichlet allocation and related methods are used in a range of user-focused tasks including document navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provide recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.
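To make the kind of automatic evaluation described above concrete, here is a minimal sketch of one widely used coherence score: average normalized PMI (NPMI) over pairs of a topic's top terms, estimated from document co-occurrence counts. It illustrates the general idea rather than the paper's exact toolkit; the function name and toy corpus are invented for the example.

```python
import math
from itertools import combinations

def npmi_coherence(top_terms, documents, eps=1e-12):
    """Average normalized PMI over all pairs of a topic's top terms,
    with probabilities estimated from document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    # Document frequency of each single term.
    df = {w: sum(1 for d in doc_sets if w in d) for w in top_terms}
    scores = []
    for w1, w2 in combinations(top_terms, 2):
        joint = sum(1 for d in doc_sets if w1 in d and w2 in d)
        if joint == 0:
            scores.append(-1.0)          # terms never co-occur: minimal NPMI
            continue
        p1, p2 = df[w1] / n_docs, df[w2] / n_docs
        p12 = joint / n_docs
        pmi = math.log(p12 / (p1 * p2))
        scores.append(pmi / (-math.log(p12) + eps))
    return sum(scores) / len(scores)

# Toy example: a coherent topic should score higher than a mixed one.
docs = [["cell", "gene", "protein"], ["gene", "protein", "dna"],
        ["market", "stock", "price"], ["stock", "price", "trade"]]
print(npmi_coherence(["gene", "protein", "dna"], docs))   # relatively high
print(npmi_coherence(["gene", "stock", "price"], docs))   # relatively low
```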
Automatic Labelling of Topic Models Learned from Twitter by Summarisation
"... Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media poses however new challenges since topics may characterise novel ..."
Abstract - Cited by 2 (0 self)
Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media, however, poses new challenges, since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here, since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and rely only on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels which capture event-related context compared to the top-n terms returned by LDA.
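As a rough illustration of the "dominant terms in related documents" idea the abstract describes, the sketch below builds a crude frequency-based label from the documents assigned to a topic. It is a stand-in for the actual summarisation algorithms compared in the paper; the function name, stopword handling, and toy tweets are assumptions for the example.

```python
from collections import Counter

def label_topic(topic_docs, stopwords=frozenset(), n_terms=3):
    """Generate a crude topic label from the dominant terms of the
    documents most strongly associated with a latent topic.

    topic_docs: list of token lists assigned to the topic.
    Returns the n_terms most frequent non-stopword terms as a label.
    """
    counts = Counter(tok.lower() for doc in topic_docs for tok in doc
                     if tok.lower() not in stopwords)
    return " ".join(term for term, _ in counts.most_common(n_terms))

tweets = [["olympic", "torch", "relay", "london"],
          ["torch", "relay", "crowds", "london"],
          ["olympic", "opening", "ceremony", "london"]]
print(label_topic(tweets, stopwords={"the", "a"}))  # "london olympic torch"
```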
Evaluating topic representations for exploring document collections
- Journal of the Association for Information Science and Technology
"... Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic brows-ers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; ..."
Abstract - Cited by 1 (1 self)
Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; that is, the top-n words with the highest conditional probability within the topic. Other topic representations, such as textual and image labels, have also been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, with topics presented in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
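The "term list" representation the abstract treats as the standard baseline is straightforward to compute; the sketch below extracts the top-n highest-probability words per topic from a topic-word matrix. The matrix, vocabulary, and function name are toy stand-ins for any fitted topic model.

```python
import numpy as np

vocab = np.array(["match", "goal", "team", "vote", "party", "election"])
topic_word = np.array([
    [0.30, 0.25, 0.25, 0.08, 0.07, 0.05],   # a sports-like topic
    [0.05, 0.05, 0.05, 0.30, 0.25, 0.30],   # a politics-like topic
])

def term_list(topic_word, vocab, n=3):
    """Return the top-n term list for each topic (rows of topic_word)."""
    return [vocab[np.argsort(row)[::-1][:n]].tolist() for row in topic_word]

print(term_list(topic_word, vocab))
# [['match', 'team', 'goal'], ['election', 'vote', 'party']]
```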
Measuring the Similarity between Automatically Generated Topics
"... Previous approaches to the problem of measuring similarity between automati-cally generated topics have been based on comparison of the topics ’ word probability distributions. This paper presents alterna-tive approaches, including ones based on distributional semantics and knowledge-based measures, ..."
Abstract - Cited by 1 (0 self)
Previous approaches to the problem of measuring similarity between automatically generated topics have been based on comparison of the topics' word probability distributions. This paper presents alternative approaches, including ones based on distributional semantics and knowledge-based measures, evaluated by comparison with human judgements. The best performing methods provide reliable estimates of topic similarity comparable with human performance and should be used in preference to the word probability distribution measures used previously.
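As a point of reference for the word-probability-distribution baseline the abstract contrasts with distributional-semantics and knowledge-based measures, the sketch below scores topic similarity as one minus the Jensen-Shannon divergence between two topic-word distributions. The function name and toy distributions are assumptions; the paper's own measures are not reproduced here.

```python
import numpy as np

def js_similarity(p, q, eps=1e-12):
    """1 minus the Jensen-Shannon divergence (log base 2, so JSD is in [0, 1])
    between two topic-word probability distributions over a shared vocabulary."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2((a + eps) / (b + eps))))

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))

# Near-identical topics vs. unrelated topics over the same 5-word vocabulary.
t1 = [0.40, 0.30, 0.20, 0.10, 0.00]
t2 = [0.35, 0.35, 0.20, 0.10, 0.00]
t3 = [0.00, 0.00, 0.10, 0.30, 0.60]
print(js_similarity(t1, t2))   # close to 1
print(js_similarity(t1, t3))   # much lower
```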
unknown title
"... rs aig elin ch ed ity o prop d b terms of both their coherence and associated generality, using a combination of existing and new mea-e disco ent co et alloc metho of the terms used to describe a particular topic, despite the obser-vation that evaluation methods such as perplexity are often not corr ..."
Abstract
[...] terms of both their coherence and associated generality, using a combination of existing and new measures [...] of the terms used to describe a particular topic, despite the observation that evaluation methods such as perplexity are often not correlated with human judgements of topic quality (Chang, Boyd-Graber, Gerrish, Wang, & Blei, 2009). However, a number of measures have been proposed in recent years for the measurement [...] for example, using the top N highest-ranked terms from an NMF basis vector. In our previous work, we generated topics using LDA and NMF with two particular corpora, where a [...] analysis of the corresponding term descriptors found the most readily-interpretable topics to be discovered by NMF (O'Callaghan, Greene, Conway, Carthy, & Cunningham, 2013). An example of the issues we encountered can be illustrated with the following topics that were discovered by LDA and NMF for the same value of k within a corpus of online news articles (described in further detail in
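To illustrate the LDA-versus-NMF descriptor comparison the (partially garbled) abstract above refers to, the sketch below fits both models on the same toy corpus with scikit-learn and prints the top-term descriptors of each, i.e. the top N highest-ranked terms per topic or basis vector. The corpus, the value of k, and the preprocessing choices are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["election results and party candidates", "voters backed the party",
        "team wins the league final", "players scored in the final match"]
k, top_n = 2, 4

def descriptors(components, vocab):
    """Top-n highest-weighted terms for each topic / basis vector."""
    return [vocab[np.argsort(row)[::-1][:top_n]].tolist() for row in components]

# LDA on raw term counts.
counts = CountVectorizer(stop_words="english")
lda = LatentDirichletAllocation(n_components=k, random_state=0)
lda.fit(counts.fit_transform(docs))

# NMF on tf-idf weights.
tfidf = TfidfVectorizer(stop_words="english")
nmf = NMF(n_components=k, random_state=0, max_iter=500)
nmf.fit(tfidf.fit_transform(docs))

print("LDA:", descriptors(lda.components_, counts.get_feature_names_out()))
print("NMF:", descriptors(nmf.components_, tfidf.get_feature_names_out()))
```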
Summarizing topical contents from PubMed documents using a thematic analysis
"... Improving the search and browsing ex-perience in PubMedr is a key compo-nent in helping users detect information of interest. In particular, when explor-ing a novel field, it is important to pro-vide a comprehensive view for a specific subject. One solution for providing this panoramic picture is to ..."
Abstract
Improving the search and browsing experience in PubMed® is a key component in helping users detect information of interest. In particular, when exploring a novel field, it is important to provide a comprehensive view of a specific subject. One solution for providing this panoramic picture is to find sub-topics from a set of documents. We propose a method that finds sub-topics that we refer to as themes and computes representative titles based on a set of documents in each theme. The method combines a thematic clustering algorithm and the Pool Adjacent Violators algorithm to induce significant themes. Then, for each theme, a title is computed using PubMed document titles and theme-dependent term scores. We tested our system on five disease sets from OMIM® and evaluated the results based on normalized point-wise mutual information and MeSH® terms. For both performance measures, the proposed approach outperformed LDA. The quality of theme titles was also evaluated by comparing them with manually created titles.
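The Pool Adjacent Violators (PAV) algorithm named in the abstract is a generic procedure for least-squares isotonic regression; the sketch below is a plain-Python version of that generic algorithm, not the paper's theme-induction pipeline, and the example sequence is invented.

```python
def pool_adjacent_violators(y):
    """Pool Adjacent Violators: least-squares isotonic (non-decreasing) fit.
    Adjacent values that violate monotonicity are pooled into their mean.
    Returns the fitted sequence, same length as y."""
    # Each block stores [sum, count]; its pooled value is sum / count.
    merged = []
    for value in y:
        merged.append([value, 1])
        # While the last block's mean drops below the previous one's, pool them.
        while (len(merged) > 1 and
               merged[-1][0] / merged[-1][1] < merged[-2][0] / merged[-2][1]):
            s, c = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += c
    fitted = []
    for s, c in merged:
        fitted.extend([s / c] * c)
    return fitted

print(pool_adjacent_violators([1, 3, 2, 4, 3, 5]))
# [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```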
Text, Topics, and Turkers: A Consensus Measure for Statistical Topics
"... Topic modeling is an important tool in social media anal-ysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human inte ..."
Abstract
Topic modeling is an important tool in social media analysis, allowing researchers to quickly understand large text corpora by investigating the topics underlying them. One of the fundamental problems of topic models lies in how to assess the quality of the topics from the perspective of human interpretability. How well can humans understand the meaning of topics generated by statistical topic modeling algorithms? In this work we advance the study of this question by introducing Topic Consensus: a new measure that calculates the quality of a topic through investigating its consensus with some known topics underlying the data. We view the quality of the topics from three perspectives: 1) topic interpretability, 2) how documents relate to the underlying topics, and 3) how interpretable the topics are when the corpus has an underlying categorization. We provide insights into how well the results of Mechanical Turk match automated methods for calculating topic quality. The probability distribution of the words in the topic best fit the Topic Coherence measure, in terms of both correlation and identifying the best topics.
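The abstract does not spell out how Topic Consensus is computed, so the sketch below is only a loose stand-in for the general idea: score a learned topic's word distribution by its similarity to the closest known category distribution. The cosine-similarity choice, function name, and toy distributions are all assumptions, not the paper's definition.

```python
import numpy as np

def consensus_with_known_topics(topic, known_topics):
    """Illustrative stand-in: score a learned topic's word distribution by its
    highest cosine similarity to any of the known categories' distributions."""
    t = np.asarray(topic, dtype=float)
    K = np.asarray(known_topics, dtype=float)
    sims = K @ t / (np.linalg.norm(K, axis=1) * np.linalg.norm(t))
    return float(sims.max())

# Known category distributions over a shared 4-word vocabulary.
known = [[0.5, 0.4, 0.1, 0.0],
         [0.0, 0.1, 0.4, 0.5]]
print(consensus_with_known_topics([0.45, 0.45, 0.10, 0.00], known))  # high
print(consensus_with_known_topics([0.25, 0.25, 0.25, 0.25], known))  # lower
```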
TrendMiner Consortium (PU), 2013. This document is part of the TrendMiner research project (No. 287863), partially funded by the FP7-ICT Programme.
"... D3.2.1 Clustering models for discovery of regional and demographic variation ..."
Abstract
D3.2.1 Clustering models for discovery of regional and demographic variation
Andreas Both
"... halle.de Quantifying the coherence of a set of statements is a long standing problem with many potential applications that has attracted researchers from different sciences. The special case of measuring coherence of topics has been recently stud-ied to remedy the problem that topic models give no g ..."
Abstract
Quantifying the coherence of a set of statements is a long-standing problem with many potential applications that has attracted researchers from different sciences. The special case of measuring the coherence of topics has recently been studied to remedy the problem that topic models give no guarantee of the interpretability of their output. Several benchmark datasets have been produced that record human judgements of the interpretability of topics. We are the first to propose a framework that allows existing word-based coherence measures, as well as new ones, to be constructed by combining elementary components. We conduct a systematic search of the space of coherence measures using all publicly available topic relevance data for the evaluation. Our results show that new combinations of components outperform existing measures with respect to correlation with human ratings. Finally, we outline how our results can be transferred to further applications in the context of text mining, information retrieval and the world wide web.
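The "elementary components" view described above can be pictured as a small pipeline of interchangeable parts: segment the topic's words, estimate probabilities, apply a confirmation measure to each segment, and aggregate. The sketch below wires up one such combination (word pairs, document co-occurrence probabilities, PMI, arithmetic mean); the component choices and names are illustrative, not the best-performing configuration from the paper.

```python
import math
from itertools import combinations

def segment_pairs(words):                       # segmentation component
    return list(combinations(words, 2))

def doc_probabilities(words, documents):        # probability estimation component
    docs = [set(d) for d in documents]
    n = len(docs)
    p = {w: sum(1 for d in docs if w in d) / n for w in words}
    p.update({(a, b): sum(1 for d in docs if a in d and b in d) / n
              for a, b in combinations(words, 2)})
    return p

def pmi(pair, p, eps=1e-12):                    # confirmation measure component
    a, b = pair
    return math.log((p[(a, b)] + eps) / (p[a] * p[b] + eps))

def mean(values):                               # aggregation component
    return sum(values) / len(values)

def coherence(words, documents, segment=segment_pairs,
              estimate=doc_probabilities, confirm=pmi, aggregate=mean):
    """Assemble a coherence measure from the four pluggable components."""
    p = estimate(words, documents)
    return aggregate([confirm(pair, p) for pair in segment(words)])

docs = [["ice", "snow", "cold"], ["snow", "cold", "winter"], ["sun", "beach", "heat"]]
print(coherence(["snow", "cold", "winter"], docs))
```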