, Stanford, California, March 1998. American Association for Artificial Intelligence.
....information retrieval (IR). In the context of IR, summarization has at least two possibilities: it can be used in preprocessing or in postprocessing. Many researchers have dealt with the latter, by providing the user with query focused summaries of retrieved documents for efficient relevance preview [1, 5, 12, 22, 24]. In contrast, little work has been reported regarding the former, which involves the use of generic summaries instead of fulltexts as the source of document index terms to improve retrieval effectiveness and/or efficiency. For convenience, we treat full text as a single word in this paper. While ....
....our experiments suggest that the term refinement effect is in fact the dominant factor (See Section 4.1). 2.3 Compression Ratio The compression ratio (CR) of a summary is the ratio of the summary length to the corresponding fulltext length, as measured in characters [22], words [1], or sentences [5]. Morris et al. reported that their 20% and 30% summaries were as informative as the fulltexts for a task of solving English reading comprehension questions [9]. Although this work is sometimes misquoted as if they discovered the optimal CR for any kind of text for any purpose [1, 27], it is not ....
[Article contains additional citation context not shown here]
Jing, H., Barzilay, R., McKeown, K. and Elhadad, M.: Summarization Evaluation Methods: Experiments and Analysis, Working Notes of the AAAI-98 Spring Symposium on Intelligent Text Summarization, pp. 60-68 (1998).
....difficulty lies in translating the results into ranking of the summaries. Both methods are thoroughly reviewed in the TIPSTER 2 project. Attempts to compare different methods using these evaluation tools concluded that improvements are required in order to establish differences between summarizers [10]. For the ideal based analysis Hongyan et al. [10] suggested that the binary precision and recall measures are too sensitive to provide a robust evaluation, and, for the task based analysis, the measures used do not translate to any indicative measure of quality. Among the various attempts to ....
....of the summaries. Both methods are thoroughly reviewed in the TIPSTER 2 project. Attempts to compare different methods using these evaluation tools concluded that improvements are required in order to establish differences between summarizers [10]. For the ideal based analysis Hongyan et al. [10] suggested that the binary precision and recall measures are too sensitive to provide a robust evaluation, and, for the task based analysis, the measures used do not translate to any indicative measure of quality. Among the various attempts to improve these evaluation techniques, D. Marcu [14] ....
[Article contains additional citation context not shown here]
H. Jing, R. Barzilay, C. McKeown, and M. Elhadad. Summarization evaluation methods: Experiments and analysis, 1998.
....textual cohesion, balance and coverage, it is possible to produce domain independent summaries that are indicative [12]. 3. Summary Evaluation In order to compare the quality of summaries produced by different ATS systems it is important to have some form of standard evaluation. Hongyan et al. [5] suggest that one of the main failings in the field of automatic summarisation is the lack of just such a methodology. Many developers adopt non standard techniques that are only suitable for their particular implementation, making direct comparison across systems impossible. However, this does ....
Hongyan, J., Barzilay, R., McKeown, C. Elhadad, M. (1998). Summarization Evaluation Methods: Experiments and Analysis. In AAAI Spring Symposium on Intelligent Text Summarization, Technical Report SS-98-06 of the AAAI, pp. 60-68. Stanford, CA, USA.
....has been to measure the similarity between summaries that are produced automatically and by hand. However, this evaluation method has been criticized because it assumes that there is only one correct summary. A task-based evaluation scheme has been recently adopted as a new way of evaluating summaries (Jing et al. 1998; Mani et al. 1998; Tombros and Sanderson, 1998). It evaluates the performance of a summarization system in a given task, such as information retrieval and text categorization. This paper compares ten different summarization methods based on information retrieval tasks. To evaluate the system ....
.... keep the percentage of documents relevant to the query higher than 50% because a smaller number of relevant documents would make the results of the experiments less reliable. The average length of the queries is 3.2 words, and the average length of the .... [Footnote 1: We admit that we should make more thorough experiments with multiple summary lengths, since different summary lengths will yield different results (Jing et al. 1998; Mittal et al. 1999).] [Footnote 2: BMIR-J2 was constructed by the SIG Database Systems of the Information Processing Society of Japan, in collaboration with the Real World Computing Partnership.]
[Article contains additional citation context not shown here]
Jing, H., R. Barzilay, K. McKeown, and M. Elhadad, 1998. Summarization Evaluation Methods: Experiments and Analysis. In Intelligent Text Summarization. AAAI Press, pages 51--59.
....for generic summaries (an indicative summary) 3) establishing whether summaries can answer a specified set of questions (an informative summary) by comparison to an ideal summary. In each task, the summaries were rated in terms of confidence in decision, intelligibility and length. Jing et al. [10] performed a pilot experiment (40 sentences) in which they examined the precision-recall performance of three summarization systems. They found that different systems achieved their best performance at different lengths (compression ratios). They also found the same results for determining ....
Jing, H., Barzilay, R., McKeown, K., and Elhadad, M. Summarization evaluation methods experiments and analysis. In AAAI Intelligent Text Summarization Workshop (Stanford, CA, Mar. 1998), pp. 60--68.
.... unseen trigrams) using the Good-Turing estimate [9]. For the final two steps we used the publicly available CMU-Cambridge Language Modelling Toolkit [4]. 6 Evaluation Summarization research has grappled for years with the issue of how to perform a rigorous evaluation of a summarization system [8, 10, 12]. We have not solved that problem here, but nonetheless present a series of quantitative and qualitative assessments of the functionality of the various components of ocelot. 6.1 Measuring word overlap We begin by examining the behavior of the simplest of the proposed gisting algorithms. To this ....
Jing, H., Barzilay, R., McKeown, K., and Elhadad, M. Summarization evaluation methods experiments and analysis. In AAAI Intelligent Text Summarization Workshop (Mar. 1998), pp. 60--68.
.... research is notorious for its lack of adequate corpora, a situation that prevents rapid progress in the field: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance [Edmundson, 1968; Kupiec et al. 1995; Teufel and Moens, 1997; Jing et al. 1998; Marcu, 1999]. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. 1.2 Towards ....
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization evaluation methods: Experiments and analysis. In Proceedings of the AAAI--98 Spring Symposium on Intelligent Text Summarization, pages 60--68, Stanford, 1998.
....information is repeated in the articles, 3) the ability for high compression ratios from the selection of multiple articles to summarize, and (4) an inherent temporal dimension due to the nature of news. Furthermore, there has been a focus on single document summarization for newswire articles [17, 23]. The second domain will be that of the internet. Due to the diverse nature of web pages and their varying amount of useful information content, a multi document summarization engine could assist users in rapidly locating pages and sections of interest. Reports indicate that users tend not to ....
....for generic summaries (an indicative summary) 3) establishing whether summaries can answer a specified set of questions (an informative summary) by comparison to an ideal summary. In each task, the summaries were rated in terms of confidence in decision, intelligibility and length. Jing et al. [17] performed a pilot experiment (40 sentences) in which they examined the precision-recall performance of three summarization systems. They found that different systems achieved their best performance at different lengths (compression ratios). They also found the same results for determining ....
[Article contains additional citation context not shown here]
H. Jing, R. Barzilay, K. McKeown, and M. Elhadad. Summarization evaluation methods experiments and analysis. In AAAI Intelligent Text Summarization Workshop, pages 60--68, Stanford, CA, Mar. 1998.
....(an indicative summary) 3) establishing whether summaries can answer a specified set of questions (an informative summary) by comparison to a human generated model summary. In each task, the summaries were rated in terms of confidence in decision, intelligibility and length. Jing et al. [10] performed a pilot experiment (for 40 sentence articles) in which they examined the precision-recall performance of three summarization systems. They found that different systems achieved their best performance at different lengths (compression ratios). They also found the same results for ....
Jing, H., Barzilay, R., McKeown, K., and Elhadad, M. Summarization evaluation methods experiments and analysis. In AAAI Intelligent Text Summarization Workshop (Stanford, CA, Mar. 1998), pp. 60--68.
No context found.
No context found.
H. Jing, R. Barzilay, K. McKeown, and M. Elhadad. 1998. Summarization evaluation methods: Experiments and analysis. In AAAI Symposium on Intelligent Summarization.
No context found.
H. Jing, R. Barzilay, K. McKeown, and M. Elhadad. Summarization evaluation methods: Experiments and analysis. In AAAI Symposium on Intelligent Summarization, 1998.
....based summarization algorithm at its core. To judge the implemented system's performance, we performed a ranking evaluation that tests our performance against two other systems. This type of evaluation nicely avoids the difficult problem of having to produce canonical summaries from human subjects [Jing et al. 1998] and having to reconcile different, but equally ideal summaries. We chose the lead based method and a TF-IDF based method as the two competing techniques. We collected a set of ten new long test articles for the evaluation, which we partitioned into two sets of five. Because of time limitations, we ....
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization Evaluation Methods: Experiments and Analysis. In The Working Notes of AAAI Symposium on Intelligent Summarization, Stanford University, CA, March 1998.
....among human subjects, 40 documents were selected; for each document, 10 summaries were constructed by 5 human subjects using sentence extraction. Each subject constructed 2 summaries of a document: one at 10% length and the other at 20% [footnote 1]. For convenience, percent of length was computed in terms of number of sentences. A total of 16 summaries were produced for each document. The documents were selected from the TREC collection [5]. They are news articles on computers, terrorism, hypnosis and nuclear treaties. The average length .... [Footnote 1: According to [14], ideal summary based evaluation is ex....]
....chains as a knowledge source for sentence extraction. As discussed in the introduction, we expect that combining lexical chains with additional knowledge sources will improve the precision and recall of the system. More extensive evaluation results, both intrinsic and task based, are provided in [14] and [1]. 5 Limitations and Future Work We have identified the following main problems with our method: - Sentence granularity: all our methods extract whole sentences as single units. This has several drawbacks: long sentences have significantly higher likelihood to be selected, they also ....
Hong-Yan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization evaluation methods: Experiments and analysis. In Proceedings of AAAI-98 Symposium, 1998 [to appear].
....have only 46% overlap. The two most probable reasons for this high percent agreement are the style of the TREC articles and our restriction on uniform length. Statistical Significance Using the same methodology as in (Passonneau & Litman 1993; Hearst 1994; Marcu 1997) we applied .... [Footnote 1: According to (Jing et al. 1998), ideal summary based evaluation is extremely sensitive to the required summary length. Therefore, we use an evaluation of 10% and 20% summaries in order to decrease the bias of the length factor.]
            Microsoft        Lexical Chain
Length      Prec   Recall    Prec   Recall
10%         33     37        61     67
20%         32     39        47     64
Table 2: ....
....chains as a knowledge source for sentence extraction. As discussed in the introduction, we expect that combining lexical chains with additional knowledge sources will improve the precision and recall of the system. More extensive evaluation results, both intrinsic and task based, are provided in (Jing et al. 1998) and (Barzilay 1997). 5 Limitations and Future Work We have identified the following main problems with our method: Sentence granularity: all our methods extract whole sentences as single units. This has several drawbacks: long sentences have significantly higher likelihood to be selected, ....
Jing, H.; Barzilay, R.; McKeown, K.; and Elhadad, M. 1998. Summarization evaluation methods: Experiments and analysis. In Proceedings of AAAI-98 Symposium. Stanford University, Stanford, California: American Association for Artificial Intelligence.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad, Summarization Evaluation Methods: Experiments and Analysis, In Working Notes, AAAI Spring Symposium on Intelligent Text Summarization, Stanford, CA, April 1998.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. 1998. Summarization evaluation methods: Experiments and analysis. In Proceedings of the AAAI Symposium on Intelligent Summarization, Stanford University, CA, March 23-25.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization evaluation methods: Experiments and analysis. In AAAI Symposium on Intelligent Summarization, 1998.
No context found.
Hongyan Jing, Kathleen McKeown, Regina Barzilay, and Michael Elhadad. Summarization Evaluation Methods: Experiments and Analysis. In Intelligent Text Summarization. Papers from the 1998.
No context found.
H. Jing, R. Barzilay, K. McKeown, and M. Elhadad. Summarization Evaluation Methods: Experiments and Analysis. Working notes, AAAI Spring Symposium on Intelligent Text Summarization, Stanford, CA, April, 1998.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization evaluation methods: Experiments and analysis. In Symposium on Intelligent Text Summarization, pages 60-68, Stanford, California, March 1998. American Association for Artificial Intelligence.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. Summarization evaluation methods: Experiments and analysis. In Symposium on Intelligent Text Summarization, pages 60--68, Stanford, California, March 1998. American Association for Artificial Intelligence.
No context found.
Hongyan Jing, Regina Barzilay, Kathleen McKeown, and Michael Elhadad. 1998. Summarization evaluation methods: Experiments and analysis. In Proceedings of the AAAI--98 Spring Symposium on Intelligent Text Summarization, pages 60--68, Stanford, March 23--25.