Results 1 - 10 of 768
Centroid-Based Summarization of Multiple Documents: Sentence Extraction, Utility-Based Evaluation, and User Studies
, 2000
"... We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also des.cdbe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multipl ..."
Abstract
-
Cited by 356 (19 self)
- Add to MetaCart
(Show Context)
We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.
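The abstract does not spell out the scoring step; a minimal sketch of the general centroid idea (not the authors' MEAD implementation, and with a simplified TF-IDF weighting) might look like this:

```python
# Minimal sketch of centroid-based extractive summarization.
# Illustration of the general idea only, not the MEAD system itself:
# the centroid is the average TF-IDF vector of the document cluster, and
# sentences are scored by the centroid weight of the words they contain.
import math
from collections import Counter

def tfidf_centroid(docs):
    """Average TF-IDF vector over a cluster of tokenized documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    centroid = Counter()
    for doc in docs:
        tf = Counter(doc)
        for w, c in tf.items():
            centroid[w] += (c / len(doc)) * math.log(n / df[w] + 1.0)
    return {w: v / n for w, v in centroid.items()}

def summarize(docs, sentences, k=2):
    """Pick the k sentences whose words carry the most centroid weight."""
    centroid = tfidf_centroid(docs)
    scored = [(sum(centroid.get(w, 0.0)
                   for w in set(t.strip(".,") for t in s.lower().split())), s)
              for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)[:k]]

if __name__ == "__main__":
    docs = [["the", "quake", "hit", "the", "city"],
            ["rescue", "teams", "reached", "the", "city", "after", "the", "quake"]]
    sents = ["The quake hit the city.",
             "Rescue teams reached the city.",
             "Weather was mild."]
    print(summarize(docs, sents, k=1))
```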
Diversifying Search Results
, 2009
"... We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the r ..."
Abstract
-
Cited by 283 (5 self)
- Add to MetaCart
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and where both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher in these generalized metrics compared to results produced by commercial search engines.
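A minimal sketch of the diversification intuition, assuming a known category distribution for the query and per-category document quality scores; the paper's actual objective and update rule may differ in detail:

```python
# Hedged sketch of greedy result diversification over a query-category
# distribution: repeatedly pick the document that best serves the categories
# not yet covered by earlier picks, then discount those categories.
def diversify(p_category, quality, k):
    """
    p_category: {category: P(category | query)}
    quality:    {doc: {category: P(doc satisfies user | category)}}
    k:          number of results to return
    """
    remaining = dict(p_category)          # probability mass still "unsatisfied"
    selected = []
    candidates = set(quality)
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda d: sum(remaining.get(c, 0.0) * v
                                     for c, v in quality[d].items()))
        selected.append(best)
        candidates.remove(best)
        for c, v in quality[best].items():   # discount the categories just covered
            remaining[c] = remaining.get(c, 0.0) * (1.0 - v)
    return selected

if __name__ == "__main__":
    p = {"fruit": 0.6, "company": 0.4}
    q = {"d1": {"fruit": 0.9}, "d2": {"fruit": 0.8}, "d3": {"company": 0.7}}
    print(diversify(p, q, k=2))   # picks d1, then d3 rather than a second fruit page
```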
LexRank: Graph-based lexical centrality as salience in text summarization
- Journal of Artificial Intelligence Research
, 2004
"... We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a doc ..."
Abstract
-
Cited by 266 (9 self)
- Add to MetaCart
(Show Context)
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
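A compact sketch of the construction described above: a thresholded cosine-similarity graph over sentences, followed by PageRank-style power iteration to obtain eigenvector centrality. The threshold and damping values here are illustrative, not the paper's settings:

```python
# Sketch of LexRank-style sentence centrality: build a cosine-similarity
# graph over sentences, threshold it, and run power iteration.
import math
from collections import Counter

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    toks = [[w.strip(".,") for w in s.lower().split()] for s in sentences]
    n = len(toks)
    # Thresholded adjacency matrix from intra-sentence cosine similarity.
    adj = [[1.0 if i != j and cosine(toks[i], toks[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    deg = [sum(row) or 1.0 for row in adj]     # avoid division by zero
    scores = [1.0 / n] * n
    for _ in range(iters):                     # power iteration
        scores = [(1 - damping) / n +
                  damping * sum(adj[j][i] / deg[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores

if __name__ == "__main__":
    sents = ["The quake hit the city.",
             "The city was hit hard by the quake.",
             "Officials promised aid."]
    print(lexrank(sents))
```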
Summarizing Text Documents: Sentence Selection and Evaluation Metrics
- In Research and Development in Information Retrieval
, 1999
"... Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for incl ..."
Abstract
-
Cited by 236 (7 self)
- Add to MetaCart
(Show Context)
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The potential linguistic ones were derived from an analysis of news-wire summaries. To evaluate these features we use a normalized version of precision-recall curves, with a baseline of random sentence selection, and we analyze the properties of such a baseline. We illustrate our discussions with empirical results showing the importance of corpus-dependent baseline summarization standards, compression ratios and carefully crafted long queries.
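The abstract does not list the features or their weights; the following toy example only illustrates the idea of ranking sentences by a weighted combination of features, with invented features and weights:

```python
# Toy illustration of ranking sentences by a weighted feature combination.
# The features and weights are invented for the example; the paper's
# statistical and linguistic features are not reproduced here.
def score_sentence(sent, position, query_terms, weights):
    tokens = [t.strip(".,") for t in sent.lower().split()]
    features = {
        "query_overlap": sum(t in query_terms for t in tokens) / max(len(tokens), 1),
        "lead_position": 1.0 / (1 + position),          # earlier sentences score higher
        "length_penalty": -abs(len(tokens) - 20) / 20.0,
    }
    return sum(weights[f] * v for f, v in features.items())

def rank(sentences, query_terms, weights):
    scored = [(score_sentence(s, i, query_terms, weights), s)
              for i, s in enumerate(sentences)]
    return [s for _, s in sorted(scored, reverse=True)]

if __name__ == "__main__":
    w = {"query_overlap": 1.0, "lead_position": 0.5, "length_penalty": 0.2}
    sents = ["Floods displaced thousands in the region.",
             "Weather forecasts were issued earlier.",
             "Relief agencies responded to the floods."]
    print(rank(sents, {"floods", "relief"}, w))
```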
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
- In Proceedings of SIGIR
, 2003
"... We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, v ..."
Abstract
-
Cited by 217 (6 self)
- Add to MetaCart
We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking depends on the other documents in the ranking, violating the assumption of independent relevance made by most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
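The MMR strategy mentioned above repeatedly selects the candidate that balances relevance to the query against redundancy with the documents already chosen; a minimal sketch, with placeholder similarity functions and trade-off parameter rather than the paper's language-model formulation:

```python
# Minimal sketch of maximal marginal relevance (MMR) re-ranking: each step
# picks the candidate that is relevant to the query but least redundant with
# what has already been selected.
def mmr(query, docs, sim, k, lam=0.7):
    """
    query: query representation
    docs:  list of candidate documents
    sim:   similarity function sim(a, b) -> float
    k:     number of documents to select
    lam:   trade-off between relevance and novelty
    """
    selected = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def mmr_score(d):
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * sim(d, query) - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

if __name__ == "__main__":
    # Toy similarity: Jaccard word overlap on whitespace tokens.
    def jaccard(a, b):
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    docs = ["jaguar the animal in the wild",
            "jaguar the animal and its habitat",
            "jaguar cars new model"]
    print(mmr("jaguar", docs, jaccard, k=2))
```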
Functional Analysis and
, 1957
"... the effect of cosmetic essence by independent component ..."
Abstract
-
Cited by 183 (0 self)
- Add to MetaCart
(Show Context)
the effect of cosmetic essence by independent component
Information fusion in the context of multi-document summarization
- In Proceedings of ACL
, 1999
"... We present a method to automatically generate a concise summary by identifying and synthe-sizing similar elements across related text from a set of multiple documents. Our approach is unique in its usage of language generation to reformulate the wording of the summary. 1 ..."
Abstract
-
Cited by 168 (22 self)
- Add to MetaCart
(Show Context)
We present a method to automatically generate a concise summary by identifying and synthesizing similar elements across related text from a set of multiple documents. Our approach is unique in its usage of language generation to reformulate the wording of the summary.
Less is More -- Probabilistic Models for Retrieving Fewer Relevant Documents
, 2006
"... Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there ..."
Abstract
-
Cited by 148 (1 self)
- Add to MetaCart
Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user’s information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. We consider a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. While doing so may be computationally intractable, we show that a simple greedy optimization algorithm that approximately optimizes the given objectives produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle.
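A hedged sketch of the contrast the abstract describes, assuming an illustrative intent-mixture relevance model: ranking by probability of relevance versus greedily ranking to maximize the probability that at least one of the top-k results is relevant:

```python
# Illustrative comparison of PRP ranking with greedy ranking that directly
# maximizes P(>=1 relevant in the top k) when relevance depends on an
# uncertain query intent. The numbers and model are made up for the example.
def p_at_least_one(ranking, k, intents):
    """P(>=1 relevant in top k) under a mixture over query intents.
    intents: list of (probability, {doc: P(relevant | intent)})."""
    total = 0.0
    for p_intent, rel in intents:
        p_none = 1.0
        for d in ranking[:k]:
            p_none *= 1.0 - rel.get(d, 0.0)
        total += p_intent * (1.0 - p_none)
    return total

def greedy_rank(docs, k, intents):
    """Greedily add the document that most increases P(>=1 relevant in top k)."""
    ranking, remaining = [], list(docs)
    while remaining and len(ranking) < k:
        best = max(remaining,
                   key=lambda d: p_at_least_one(ranking + [d], k, intents))
        ranking.append(best)
        remaining.remove(best)
    return ranking

if __name__ == "__main__":
    # Two intents: most users mean intent A, some mean intent B.
    intents = [(0.7, {"a1": 0.8, "a2": 0.8}), (0.3, {"b1": 0.9})]
    docs = ["a1", "a2", "b1"]
    prp = sorted(docs, reverse=True,
                 key=lambda d: sum(p * r.get(d, 0.0) for p, r in intents))
    grd = greedy_rank(docs, 2, intents)
    print("PRP top-2:   ", prp[:2], p_at_least_one(prp, 2, intents))
    print("greedy top-2:", grd, p_at_least_one(grd, 2, intents))
```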
Constrained Parametric Min-Cuts for Automatic Object Segmentation
, 2010
"... We present a novel framework for generating and rankingplausibleobjectshypothesesin animage using bottom-up processes and mid-level cues. The object hypotheses arerepresented as figure-ground segmentations, and are extracted automatically, withoutpriorknowledgeabout properties of individual object c ..."
Abstract
-
Cited by 123 (11 self)
- Add to MetaCart
(Show Context)
We present a novel framework for generating and ranking plausible object hypotheses in an image using bottom-up processes and mid-level cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge about properties of individual object classes, by solving a sequence of constrained parametric min-cut problems (CPMC) on a regular image grid. We then learn to rank the object hypotheses by training a continuous model to predict how plausible the segments are, given their mid-level region properties. We show that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC09 segmentation dataset. It achieves the same average best segmentation covering as the best performing technique to date [2], 0.61, when using just the top 7 ranked segments instead of the full hierarchy in [2]. Our method achieves 0.78 average best covering using 154 segments. In a companion paper [18], we also show that the algorithm achieves state-of-the-art results when used in a segmentation-based recognition pipeline.