Results 1 - 10 of 79
Individual and domain adaptation in sentence planning for dialogue
- Journal of Artificial Intelligence Research (JAIR), 2007
"... One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promisin ..."
Abstract
-
Cited by 20 (5 self)
One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.
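As a rough illustration of the n-gram feature idea mentioned in this abstract (and not the MATCH system's actual ranker), the sketch below scores alternative candidate realizations with a linear model over unigram and bigram counts; the candidate sentences and feature weights are invented for the example, whereas the real sentence planner learns its weights from rated training data.

```python
# Illustrative sketch: rank candidate realizations with a linear model over
# n-gram features. Candidates and weights are invented; a trainable sentence
# planner would learn such weights from human-rated training examples.
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def score(candidate, weights):
    """Dot product of the candidate's n-gram counts with the weight vector."""
    feats = ngram_features(candidate)
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())


candidates = [
    "Babbo has excellent food and excellent service.",
    "Babbo has excellent food. It also has excellent service.",
]
# Toy weights standing in for parameters a learned ranker would provide.
weights = {"excellent food": 1.2, "excellent service": 1.0, "it also": -0.3}

best = max(candidates, key=lambda c: score(c, weights))
print(best)
```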
Extrinsic Summarization Evaluation: A Decision Audit Task
"... Abstract. In this work we describe a large-scale extrinsic evaluation of automatic speech summarization technologies for meeting speech. The particular task is a decision audit, wherein a user must satisfy a complex information need, navigating several meetings in order to gain an understanding of h ..."
Abstract
-
Cited by 19 (7 self)
In this work we describe a large-scale extrinsic evaluation of automatic speech summarization technologies for meeting speech. The particular task is a decision audit, wherein a user must satisfy a complex information need, navigating several meetings in order to gain an understanding of how and why a given decision was made. We compare the usefulness of extractive and abstractive technologies in satisfying this information need, and assess the impact of automatic speech recognition (ASR) errors on user performance. We employ several evaluation methods for participant performance, including post-questionnaire data, human subjective and objective judgments, and an analysis of participant browsing behaviour.
A publicly available annotated corpus for supervised email summarization
, 2008
"... Annotated email corpora are necessary for evaluation and training of machine learning summarization techniques. The scarcity of corpora has been a limiting factor for research in this field. We describe our process of creating a new annotated email thread corpus that will be made publicly available. ..."
Abstract
-
Cited by 19 (4 self)
Annotated email corpora are necessary for evaluation and training of machine learning summarization techniques. The scarcity of corpora has been a limiting factor for research in this field. We describe our process of creating a new annotated email thread corpus that will be made publicly available. We present the trade-offs of the different annotation methods that could be used.
Cross-language document summarization based on machine translation quality prediction
- Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
"... Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, ..."
Abstract
-
Cited by 13 (2 self)
Cross-language document summarization is the task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which often results in cross-language summaries of very poor quality, in both readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.
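The pipeline described above combines a predicted translation-quality score with an informativeness score before choosing which English sentences to translate. A minimal sketch of that combination step follows; the toy scores, the interpolation weight and the helper names are assumptions for illustration, and the paper's actual quality predictor is an SVM regressor trained over sentence features.

```python
# Minimal sketch: combine a predicted translation-quality score with an
# informativeness score, then pick sentences for the cross-language summary.
# The scores and the 0.3/0.7 weighting are invented for illustration.

def combined_score(informativeness, quality, alpha=0.3):
    """Interpolate informativeness with predicted translation quality."""
    return (1 - alpha) * informativeness + alpha * quality


sentences = [
    ("Sentence A ...", 0.9, 0.4),   # (text, informativeness, predicted quality)
    ("Sentence B ...", 0.7, 0.9),
    ("Sentence C ...", 0.5, 0.8),
]

ranked = sorted(sentences, key=lambda s: combined_score(s[1], s[2]), reverse=True)
summary_candidates = [text for text, _, _ in ranked[:2]]
print(summary_candidates)  # these English sentences would then be translated
```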
Using Graded-Relevance Metrics for Evaluating Community QA Answer Selection
"... Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of “good ” answer ..."
Abstract
-
Cited by 12 (5 self)
Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of “good” answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments.
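One family of graded-relevance metrics that fits this setting is nDCG computed over a system's ranked answer list, with graded labels pooled from multiple assessors. The sketch below computes standard nDCG on invented grades; the NTCIR-8 CQA task used its own specific metrics, so this is only a generic illustration of graded-relevance evaluation.

```python
# Generic graded-relevance evaluation sketch (nDCG). The relevance grades are
# invented; in the CQA setting they would come from multiple assessors judging
# each posted answer, rather than from the Best Answer flag alone.
import math


def dcg(grades):
    """Discounted cumulative gain of a list of graded relevance values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(system_grades, cutoff=None):
    """nDCG at an optional cutoff: system DCG divided by the ideal DCG."""
    g = system_grades[:cutoff] if cutoff else system_grades
    ideal = sorted(system_grades, reverse=True)[:len(g)]
    return dcg(g) / dcg(ideal) if dcg(ideal) > 0 else 0.0


# Grades of the answers in the order a system ranked them (0 = bad, 2 = good).
system_ranking = [2, 0, 1, 2, 0]
print(round(ndcg(system_ranking, cutoff=3), 3))
```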
Comparative Document Summarization via Discriminative Sentence Selection
"... Given a collection of document groups, a quick question is what are the differences in these groups. In this paper, we study a novel problem of summarizing the differences between document groups. A discriminative sentence selection method is proposed to extract the most discriminative sentences whi ..."
Abstract
-
Cited by 11 (4 self)
Given a collection of document groups, a natural question is what the differences among these groups are. In this paper, we study the novel problem of summarizing the differences between document groups. A discriminative sentence selection method is proposed to extract the most discriminative sentences, which represent the specific characteristics of each document group. Experiments on real-world data sets demonstrate the effectiveness of our proposed method.
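As a rough illustration of the idea (not necessarily the authors' exact objective), one can score each sentence by how strongly its terms are associated with its own group relative to the other groups, for example with a log-odds style weight. The sketch below does that with add-one smoothing over toy documents; all data and scoring details are invented.

```python
# Rough illustration of discriminative sentence selection: score a sentence by
# summing log-odds of its terms appearing in its own group vs. another group.
# The toy documents and the scoring details are invented for the example.
import math
from collections import Counter


def term_counts(docs):
    """Aggregate word counts over a group of documents."""
    c = Counter()
    for d in docs:
        c.update(d.lower().split())
    return c


def sentence_score(sentence, own_counts, other_counts):
    """Sum of smoothed log-odds of the sentence's words for its own group."""
    own_total = sum(own_counts.values())
    other_total = sum(other_counts.values())
    score = 0.0
    for w in sentence.lower().split():
        p_own = (own_counts[w] + 1) / (own_total + len(own_counts))
        p_other = (other_counts[w] + 1) / (other_total + len(other_counts))
        score += math.log(p_own / p_other)
    return score


group_a = ["battery life is long", "the battery charges fast"]
group_b = ["the screen is bright", "screen colours look vivid"]
counts_a, counts_b = term_counts(group_a), term_counts(group_b)

best_a = max(group_a, key=lambda s: sentence_score(s, counts_a, counts_b))
print(best_a)  # the sentence most specific to group A
```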
Trainable speaker-based referring expression generation
- In Twelfth Conference on Computational Natural Language Learning (CoNLL 2008), 2008
"... Previous work in referring expression generation has explored general purpose techniques for attribute selection and surface realization. However, most of this work did not take into account: a) stylistic differences between speakers; or b) trainable surface realization approaches that combine seman ..."
Abstract
-
Cited by 10 (1 self)
Previous work in referring expression generation has explored general purpose techniques for attribute selection and surface realization. However, most of this work did not take into account: a) stylistic differences between speakers; or b) trainable surface realization approaches that combine semantic and word order information. In this paper we describe and evaluate several end-to-end referring expression generation algorithms that take into consideration speaker style and use data-driven surface realization techniques.
Document summarization via guided sentence compression
, 2013
"... Joint compression and summarization has been used recently to generate high quality summaries. However, such word-based joint optimization is computationally expensive. In this paper we adopt the ‘sentence compression + sentence selection ’ pipeline approach for compressive summarization, but propos ..."
Abstract
-
Cited by 9 (4 self)
Joint compression and summarization has been used recently to generate high quality summaries. However, such word-based joint optimization is computationally expensive. In this paper we adopt the ‘sentence compression + sentence selection’ pipeline approach for compressive summarization, but propose to perform summary guided compression, rather than generic sentence-based compression. To create an annotated corpus, the human annotators were asked to compress sentences while explicitly given the important summary words in the sentences. Using this corpus, we train a supervised sentence compression model using a set of word-, syntax-, and document-level features. During summarization, we use multiple compressed sentences in the integer linear programming framework to select salient summary sentences. Our results on the TAC 2008 and 2011 summarization data sets show that by incorporating the guided sentence compression model, our summarization system can yield significant performance gain as compared to the state-of-the-art.
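The selection stage sketched in this abstract is an integer linear program: choose compressed sentences that maximise the weight of covered summary concepts subject to a length budget. The toy example below solves that objective exactly by brute force over a tiny candidate set (a real system would hand the same objective to an ILP solver); the candidate sentences, concept weights and budget are invented.

```python
# Sketch of the summary-sentence selection objective: maximise the weight of
# covered concepts subject to a length budget. Solved by brute force for a toy
# instance; a real system would express the same objective as an ILP.
from itertools import combinations

# Each candidate compressed sentence: (identifier, length in words, concepts covered).
candidates = [
    ("s1", 8, {"budget", "vote"}),
    ("s2", 6, {"vote", "date"}),
    ("s3", 10, {"budget", "date", "venue"}),
]
concept_weights = {"budget": 3.0, "vote": 2.0, "date": 1.5, "venue": 1.0}
length_budget = 15

best_value, best_subset = 0.0, ()
for r in range(len(candidates) + 1):
    for subset in combinations(candidates, r):
        if sum(length for _, length, _ in subset) > length_budget:
            continue  # exceeds the summary length budget
        covered = set().union(*(c for _, _, c in subset)) if subset else set()
        value = sum(concept_weights[c] for c in covered)
        if value > best_value:
            best_value, best_subset = value, subset

print([name for name, _, _ in best_subset], best_value)
```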
An Assessment of the Accuracy of Automatic Evaluation in Summarization
"... Automatic evaluation has greatly facilitated system development in summarization. At the same time, the use of automatic evaluation has been viewed with mistrust by many, as its accuracy and correct application are not well understood. In this paper we provide an assessment of the automatic evaluati ..."
Abstract
-
Cited by 8 (1 self)
Automatic evaluation has greatly facilitated system development in summarization. At the same time, the use of automatic evaluation has been viewed with mistrust by many, as its accuracy and correct application are not well understood. In this paper we provide an assessment of the automatic evaluations used for multi-document summarization of news. We outline our recommendations about how any evaluation, manual or automatic, should be used to find statistically significant differences between summarization systems. We identify the reference automatic evaluation metrics, ROUGE-1 and ROUGE-2, that appear to best emulate human pyramid and responsiveness scores on four years of NIST evaluations. We then demonstrate the accuracy of these metrics in reproducing human judgements about the relative content quality of pairs of systems, and present an empirical assessment of the relationship between statistically significant differences between systems according to manual evaluations and the differences according to automatic evaluations. Finally, we present a case study of how new metrics should be compared to the reference evaluation, as we search for even more accurate automatic measures.
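For reference, ROUGE-1 and ROUGE-2 are essentially unigram and bigram recall of a system summary against reference summaries. A minimal, unsmoothed sketch of that computation is shown below; the official ROUGE toolkit adds stemming, stopword options and multi-reference handling, and the example texts are invented.

```python
# Minimal sketch of ROUGE-N recall: the fraction of reference n-grams that
# also appear in the system summary (with clipped counts). The official ROUGE
# toolkit adds stemming, stopword removal and multi-reference aggregation.
from collections import Counter


def ngrams(text, n):
    """Counter of word n-grams in the text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n):
    """Clipped n-gram recall of the system summary against one reference."""
    sys_ng, ref_ng = ngrams(system, n), ngrams(reference, n)
    overlap = sum(min(count, sys_ng[g]) for g, count in ref_ng.items())
    total = sum(ref_ng.values())
    return overlap / total if total else 0.0


reference = "the council approved the budget on friday"
system = "the budget was approved by the council"
print(round(rouge_n(system, reference, 1), 3), round(rouge_n(system, reference, 2), 3))
```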
Click the Search Button and Be Happy: Evaluating Direct and Immediate Information Access
"... We define Direct Information Access as a type of information access where there is no user operation such as clicking or scrolling between the user’s click on the search button and the user’s information acquisition; we define Immediate Information Access as a type of information access where the us ..."
Abstract
-
Cited by 8 (6 self)
We define Direct Information Access as a type of information access where there is no user operation such as clicking or scrolling between the user’s click on the search button and the user’s information acquisition; we define Immediate Information Access as a type of information access where the user can locate the relevant information within the system output very quickly. Hence, a Direct and Immediate Information Access (DIIA) system is expected to satisfy the user’s information need very quickly with its very first response. We propose a nugget-based evaluation framework for DIIA, which takes nugget positions into account in order to evaluate the ability of a system to present important nuggets first and to minimise the amount of text the user has to read. To demonstrate the integrity, usefulness and limitations of our framework, we built a Japanese DIIA test collection with 60 queries and over 2,800 nuggets as well as an offset-based nugget match evaluation interface, and conducted experiments with manual and automatic runs. The results suggest our proposal is a useful complement to traditional ranked retrieval evaluation based on document relevance.
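The central idea of the position-aware nugget framework is that a nugget found early in the system output should count for more than one buried deep in the text. The sketch below illustrates one simple way to realise that, discounting each matched nugget's weight by the amount of text read before reaching it; the linear discount, the patience constant, and the weights and offsets are invented, and this is not the paper's exact metric definition.

```python
# Illustrative position-discounted nugget score: each matched nugget's weight
# is discounted by the offset (amount of text read) at which it is found.
# The linear discount, patience constant, weights and offsets are invented;
# they are not the exact metric defined in the paper.

def position_discounted_score(matched_nuggets, all_nuggets, patience=500):
    """matched_nuggets: list of (weight, character offset in the system output).
    all_nuggets: list of weights of every gold nugget, used for normalisation."""
    gained = sum(w * max(0.0, 1 - offset / patience)
                 for w, offset in matched_nuggets)
    ideal = sum(all_nuggets)  # every nugget found at offset 0
    return gained / ideal if ideal else 0.0


gold_weights = [3.0, 2.0, 1.0]
matches = [(3.0, 40), (1.0, 350)]   # (nugget weight, offset where it appears)
print(round(position_discounted_score(matches, gold_weights), 3))
```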