Results 1 - 10 of 49
A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
In Proc. of SIGIR, 2006
"... The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we s ..."
Abstract
-
Cited by 56 (11 self)
The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency is often used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency-based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
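The composition functions studied here can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' system: the stopword list and the averaging composition are assumptions. It scores sentences by the average probability of their content words; sum or product would be alternative compositions.

# Minimal sketch of frequency-based sentence scoring (illustrative, not the paper's code).
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "on"}

def content_words(sentence):
    return [w.lower() for w in sentence.split() if w.lower() not in STOPWORDS]

def score_sentences(sentences):
    # Content-word probabilities estimated from the input documents themselves.
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values()) or 1
    p = {w: c / total for w, c in counts.items()}

    def avg_prob(sentence):
        # Composition function: average content-word probability.
        # A context-sensitive variant would down-weight words already covered
        # by selected sentences to reduce repetition.
        words = content_words(sentence)
        return sum(p[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=avg_prob, reverse=True)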
Exploiting E-mail Structure to Improve Summarization
2002
"... This paper presents the design and implementation of a system to summarize e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly-found features, to generate summaries. The system uses existing software designed to summarize single text documents. Such software ..."
Abstract
-
Cited by 46 (1 self)
This paper presents the design and implementation of a system to summarize e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly found features, to generate summaries. The system uses existing software designed to summarize single text documents. Such software typically performs best on well-authored, formal documents. E-mail messages, however, are typically neither well-authored nor formal. As a result, existing summarization software gives a poor summary of e-mail messages. To remedy this poor performance, our system pre-processes e-mail messages using heuristics to remove e-mail signatures, header fields, and quoted text from parent messages. We also present a heuristics-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, we discuss conclusions from a pilot user study of the summarization system, and conclude with areas for further investigation.
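As a rough illustration of the kind of pre-processing heuristics described here, the sketch below strips quoted text, stray header fields, and a conventional signature block from a message body. The specific patterns are illustrative assumptions, not the authors' exact rules.

# Illustrative e-mail cleanup heuristics (assumed patterns, not the paper's rules).
import re

def clean_email(body: str) -> str:
    lines = []
    for line in body.splitlines():
        if line.startswith(">"):                              # quoted text from parent messages
            continue
        if re.match(r"^(From|To|Cc|Subject|Date):", line):    # stray header fields
            continue
        if line.strip() == "--":                              # conventional signature delimiter
            break                                             # drop the signature and anything after it
        lines.append(line)
    return "\n".join(lines).strip()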
Event-Based Extractive Summarization
In Proceedings of the ACL Workshop on Summarization, 2004
"... Most approaches to extractive summarization define a set of features upon which selection of sentences is based, using algorithms independent of the features themselves. We propose a new set of features based on low-level, atomic events that describe relationships between important actors in a docum ..."
Abstract
-
Cited by 43 (0 self)
Most approaches to extractive summarization define a set of features upon which selection of sentences is based, using algorithms independent of the features themselves. We propose a new set of features based on low-level, atomic events that describe relationships between important actors in a document or set of documents. We investigate the effect this new feature has on extractive summarization, compared with a baseline feature set consisting of the words in the input documents, and with state-of-the-art summarization systems. Our experimental results indicate not only that the event-based features offer an improvement in summary quality over words as features, but that this effect is more pronounced for more sophisticated summarization methods that avoid redundancy in the output.
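To make the notion of atomic events concrete, the sketch below (an approximation assuming spaCy for parsing, not the paper's exact extraction procedure) counts entity-verb-entity triples and scores a sentence by the corpus frequency of the events it contains.

# Approximate atomic-event extraction: named-entity pairs linked by a sentence verb.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

def atomic_events(text):
    doc = nlp(text)
    events = []
    for sent in doc.sents:
        ents = [e.text for e in sent.ents]
        verbs = [t.lemma_ for t in sent if t.pos_ == "VERB"]
        for a, b in zip(ents, ents[1:]):      # adjacent entity pairs as actors
            for v in verbs:                   # sentence verbs as the connecting relation
                events.append((a, v, b))
    return Counter(events)

def event_score(sentence_events, corpus_event_counts):
    # Weight a sentence by how frequent its events are across the input documents.
    return sum(corpus_event_counts[e] for e in sentence_events)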
A formal model for information selection in multi-sentence text extraction
In Proceedings of the International Conference on Computational Linguistics (COLING), 2004
"... Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units with an associated mapping between these two d ..."
Abstract
-
Cited by 27 (1 self)
Selecting important information while accounting for repetitions is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units with an associated mapping between these two dimensions. This representation is then used to describe the task of selecting textual units for a summary or answer as a formal optimization task. We provide approximation algorithms and empirically validate the performance of the proposed model when used with two very different sets of features, words and atomic events.
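The selection task described here resembles a budgeted maximum-coverage problem, for which a greedy algorithm is the standard approximation. The sketch below is an illustrative reading of the model, not the paper's algorithm: textual units are sentences, conceptual units are weighted words or events, and each step picks the sentence that adds the most uncovered conceptual weight.

# Greedy maximum-coverage selection of textual units over conceptual units (illustrative).
def greedy_select(sentences, concepts_of, weight, budget):
    """sentences: list of ids; concepts_of: id -> set of conceptual units;
    weight: conceptual unit -> importance; budget: max number of sentences."""
    covered, summary = set(), []
    while len(summary) < budget:
        best, best_gain = None, 0.0
        for s in sentences:
            if s in summary:
                continue
            gain = sum(weight.get(c, 0.0) for c in concepts_of[s] - covered)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:          # no remaining sentence adds new conceptual units
            break
        summary.append(best)
        covered |= concepts_of[best]
    return summary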
Multi-dimensional scattered ranking methods for geographic information retrieval
GeoInformatica, 2005
"... Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar ..."
Abstract
-
Cited by 17 (2 self)
Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar documents are spread throughout the ranked list rather than placed consecutively. The effect of this is that documents close together in the ranked list carry less redundant information. We present various ranking methods of this type, efficient algorithms to implement them, and experiments to show the outcome of the methods.
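One simple way to realize scattered ranking is a greedy re-ranking pass that trades relevance against similarity to already-placed documents. The sketch below is an illustration under that reading; the similarity functions and the alpha mixing weight are assumptions, not the paper's specific methods.

# Greedy scattered re-ranking: relevant documents that are similar (textually and
# spatially) to already-ranked ones are pushed further down the list (illustrative).
def scatter_rank(docs, relevance, text_sim, spatial_sim, alpha=0.5):
    remaining, ranked = list(docs), []
    while remaining:
        def penalty(d):
            if not ranked:
                return 0.0
            return max(alpha * text_sim(d, r) + (1 - alpha) * spatial_sim(d, r)
                       for r in ranked)
        best = max(remaining, key=lambda d: relevance[d] - penalty(d))
        ranked.append(best)
        remaining.remove(best)
    return ranked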
Language Models for Hierarchical Summarization
2003
"... Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language ..."
Abstract
-
Cited by 12 (2 self)
Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language model to characterize the documents that will be summarized and then apply a graph-theoretic algorithm to determine the best topic words for the hierarchical summary. This work is very different from previous attempts to generate topic hierarchies because it relies on statistical analysis and language modeling to identify descriptive words for a document and organize the words in a hierarchical structure. We compare
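One common way to turn word statistics into a topic hierarchy is a subsumption-style test over document co-occurrence; the sketch below illustrates that idea and is only a loose approximation of the dissertation's language-model and graph-theoretic approach. The threshold and the attachment rule are assumptions.

# Illustrative subsumption-style hierarchy over topic words (not the dissertation's algorithm).
from collections import defaultdict

def build_hierarchy(doc_words, threshold=0.8):
    """doc_words: list of word sets, one per document; returns child -> parent."""
    docs_with = defaultdict(set)
    for i, words in enumerate(doc_words):
        for w in words:
            docs_with[w].add(i)
    parents = {}
    for b in docs_with:
        for a in docs_with:
            if a == b or len(docs_with[a]) <= len(docs_with[b]):
                continue
            overlap = len(docs_with[a] & docs_with[b]) / len(docs_with[b])
            if overlap >= threshold:          # a co-occurs with most of b's documents
                parents.setdefault(b, a)      # attach b under the broader topic word a
    return parents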
World Wide Web Site Summarization
Web Intelligence and Agent Systems: An International Journal, 2002
"... Summaries of Web sites help Web users get an idea of the site contents without having to spend time browsing the sites. Currently, manually constructed summaries of Web sites by volunteer experts are available, such as the DMOZ Open Directory Project. This research is directed towards automating the ..."
Abstract
-
Cited by 10 (6 self)
Summaries of Web sites help Web users get an idea of the site contents without having to spend time browsing the sites. Currently, manually constructed summaries of Web sites by volunteer experts are available, such as the DMOZ Open Directory Project. This research is directed towards automating the Web site summarization task. To achieve this objective, an approach which applies machine learning and natural language processing techniques is developed to summarize a Web site automatically. The information content of the automatically generated summaries is compared, via a formal evaluation process involving human subjects, to DMOZ summaries, home page browsing and time-limited site browsing, for a number of academic and commercial Web sites. Statistical evaluation of the scores of the answers to a list of questions about the sites demonstrates that the automatically generated summaries convey the same information to the reader as DMOZ summaries do, and more information than the two browsing options.
Term-Based Clustering and Summarization of Web Page Collections
In Advances in Artificial Intelligence, Proceedings of the Seventeenth Conference of the Canadian Society for Computational Studies of Intelligence, 2004
"... Abstract. Effectively summarizing Web page collections becomes more and more critical as the amount of information continues to grow on the World Wide Web. A concise and meaningful summary of a Web page collection, which is generated automatically, can help Web users understand the essential topics ..."
Abstract
-
Cited by 8 (3 self)
Effectively summarizing Web page collections becomes more and more critical as the amount of information continues to grow on the World Wide Web. A concise and meaningful summary of a Web page collection, which is generated automatically, can help Web users understand the essential topics and main contents covered in the collection quickly without spending much browsing time. However, automatically generating coherent summaries as good as human-authored summaries is a challenging task since Web page collections often contain diverse topics and contents. This research aims towards clustering of Web page collections using automatically extracted topical terms, and automatic summarization of the resulting clusters. We experiment with word- and term-based representations of Web documents and demonstrate that term-based clustering significantly outperforms word-based clustering with much lower dimensionality. The summaries of computed clusters are informative and meaningful, which indicates that clustering and summarization of large Web page collections is promising for alleviating the information overload problem.
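As an illustration of term-based versus word-based representations, the sketch below clusters pages with TF-IDF vectors built over two-word terms, a crude stand-in for the paper's automatically extracted topical terms, using scikit-learn's KMeans.

# Term-based clustering sketch with scikit-learn (bigrams approximate extracted terms).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_pages(page_texts, n_clusters=5):
    vectorizer = TfidfVectorizer(ngram_range=(2, 2), stop_words="english")
    X = vectorizer.fit_transform(page_texts)   # term-based document vectors
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    return labels, vectorizer.get_feature_names_out()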
Relevance of Cluster size in MMR based Summarizer: A Report 11-742: Self-paced lab in Information Retrieval
"... Maximal Marginal Relevance Multi Document (MMR-MD) uses passage clustering to choose passages with large coverage and to aid in reducing redundancy. It is expected that the Quality of Summary (QoS) would directly depend on the cluster granularity. The objective of this work is to study the relevance ..."
Abstract
-
Cited by 6 (0 self)
Maximal Marginal Relevance Multi-Document (MMR-MD) summarization uses passage clustering to choose passages with large coverage and to aid in reducing redundancy. It is expected that the quality of summary (QoS) would depend directly on the cluster granularity. The objective of this work is to study the relevance of the granularity of passage clusters to QoS in the context of the MMR-MD summarizer. This has been done, and the results on the Document Understanding Conference (DUC-2002) data set are reported. This report also presents an overview of extractive summarization methods, features useful in selecting summary sentences, and strategies and metrics for evaluating summaries. Observations on passage clustering by a bottom-up approach, followed by results of the study on QoS versus cluster granularity, are then presented. Based on the observations from this study, a new method for extractive summarization is also proposed for future work.

1 Problem Statement

Maximal Marginal Relevance summarization, presented in [6], is a cluster-based, extractive summarization method where passages are first clustered based on similarity, prior to the selection of passages that form the extractive summary of the documents. Passage clustering is a main component of this system, which aims to extract the most relevant sentences of the documents while keeping the summary non-redundant. The goal of this work is to study how the quality of the summary varies with the granularity of clusters. In this report we present the conclusions of this study, after presenting an introduction to automatic document summarization and an overview of the methods of summarization from the current literature.
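For reference, the MMR criterion at the heart of the summarizer can be sketched as follows. This is a generic illustration of MMR selection, not the report's MMR-MD implementation; the lambda weight and the similarity functions are placeholders, and cluster granularity would enter through how passage_sim compares passages grouped into clusters.

# Generic MMR selection: balance query relevance against redundancy with chosen passages.
def mmr_select(passages, query_sim, passage_sim, k, lam=0.7):
    selected, candidates = [], list(passages)
    while candidates and len(selected) < k:
        def mmr(p):
            redundancy = max((passage_sim(p, s) for s in selected), default=0.0)
            return lam * query_sim(p) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected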
A Comparison of Keyword- and Keyterm-Based Methods for Automatic Web Site Summarization
In Technical Report WS-04-01, Papers from the AAAI’04 Workshop on Adaptive Text Extraction and Mining, 2004
"... ..."
(Show Context)