Results 1 - 5 of 5
Stochastic language generation in a dialogue system: Toward a domain independent generator
- In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue, 2004
Cited by 9 (1 self)
This paper describes Acorn, a sentence planner and surface realizer for dialogue systems. Improvements to previous stochastic word-forest based approaches are described, countering recent criticism of this class of algorithms for their slow speed. An evaluation of the approach with semantic input shows runtimes of a fraction of a second and presents results that suggest it is also portable across domains.
Evaluating evaluation methods for generation in the presence of variation
- In Proceedings of CICLing 2005, 2005
Cited by 8 (1 self)
Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation.
Introducing shared task evaluation to NLG: The TUNA shared task evaluation challenges
- In Emiel Krahmer and Mariët Theune, editors, Empirical Methods in Natural Language Generation, volume 5790 of Lecture Notes in Artificial Intelligence (LNAI), 2010
Cited by 5 (1 self)
Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STECs, which focused on Referring Expression Generation (REG), have been part of this development since its inception. This chapter looks back on the experience of organising the three TUNA Challenges, which came to an end in 2009. While we discuss the role of the STECs in yielding a substantial body of research on the REG problem, which has opened new avenues for future research, our main focus is on the role of different evaluation methods in assessing the output quality of REG algorithms, and on the relationship between such methods.
Text content and task performance in the evaluation of a natural language generation system
- In Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2009
Cited by 1 (1 self)
An important question in the evaluation of Natural Language Generation systems concerns the relationship between textual characteristics and task performance. If the results of task-based evaluation can be correlated to properties of the text, there are better prospects for improving the system. The present paper investigates this relationship by focusing on the outcomes of a task-based evaluation of a system that generates summaries of patient data, attempting to correlate these with the results of an analysis of the system’s texts, compared to a set of gold standard human-authored summaries.
Comparing the performance of two TAG-based surface realisers using controlled grammar traversal
- Author manuscript, published in "23rd International Conference on Computational Linguistics (COLING 2010): Posters (2010)", 2011

