Dependency tree based sentence compression
In Proceedings of the Fifth International Natural Language Generation Conference, Association for Computational Linguistics, 2008
Cited by 23 (2 self)
Abstract: We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains results comparable or superior to the state of the art. We demonstrate that the choice of the parser affects the performance of the system. We also apply the method to German and report the results of an evaluation with humans.
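The subtree-removal idea described in this abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: the example tree, the node importance scores, and the pruning threshold are all invented for the sketch.

```python
# Toy sketch of sentence compression by pruning dependency subtrees:
# dropping a node drops everything it governs.

def prune(tree, scores, root, threshold):
    """Return the kept nodes: skip any subtree whose root scores
    below the threshold (assumed scoring scheme, for illustration)."""
    kept = []
    def visit(node):
        if scores[node] < threshold:
            return  # removing a node removes its whole subtree
        kept.append(node)
        for child in tree.get(node, []):
            visit(child)
    visit(root)
    return kept

# Invented dependency tree for "He quickly read the long report yesterday"
tree = {
    "read": ["He", "quickly", "report", "yesterday"],
    "report": ["the", "long"],
}
scores = {"read": 1.0, "He": 0.9, "quickly": 0.2,
          "report": 0.8, "the": 0.7, "long": 0.3, "yesterday": 0.2}

kept = prune(tree, scores, "read", 0.5)
# keeps the head, subject, and object subtree; drops the adjuncts
```

Because whole subtrees are removed, the surviving nodes always form a connected tree, which is what keeps the compressed output grammatically plausible.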
Sentence fusion via dependency graph compression
In Proceedings of EMNLP, 2008
Cited by 22 (0 self)
Abstract: We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.
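The compress-a-graph-to-a-tree step can be illustrated with a small stand-in for the integer linear program: an exhaustive search that maximizes total word weight subject to the connectivity constraint. The words, edge list, weights, and length budget below are assumptions made for the sketch, not the paper's actual model.

```python
# Toy stand-in for ILP-based graph compression: enumerate node subsets,
# keep the highest-weight connected one rooted at the main verb.
from itertools import combinations

def best_subtree(edges, weights, root, max_words):
    """edges: (parent, child) pairs of a dependency tree.
    Mimics the ILP objective (maximize word weights) and its
    connectivity constraints by brute-force enumeration."""
    nodes = list(weights)
    parent = {c: p for p, c in edges}
    best, best_score = None, float("-inf")
    for r in range(1, max_words + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            if root not in chosen:
                continue
            # connectivity: every chosen non-root node's parent is chosen
            if any(n != root and parent.get(n) not in chosen for n in chosen):
                continue
            score = sum(weights[n] for n in chosen)
            if score > best_score:
                best, best_score = chosen, score
    return best

# Invented example: "She wrote a famous novel quickly"
edges = [("wrote", "She"), ("wrote", "novel"), ("novel", "a"),
         ("novel", "famous"), ("wrote", "quickly")]
weights = {"wrote": 2.0, "She": 1.5, "novel": 1.8,
           "a": 0.5, "famous": 0.4, "quickly": 0.3}
kept = best_subtree(edges, weights, "wrote", max_words=4)
```

A real ILP solver handles this in polynomial practice on much larger graphs; the enumeration here only conveys what the objective and constraints encode.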
Sentence Compression for Automated Subtitling: A Hybrid Approach
Cited by 20 (0 self)
Abstract: In this paper a sentence compression tool is described. We describe how an input sentence is analysed using, among other components, a tagger, a shallow parser and a subordinate clause detector, and how, based on this analysis, several compressed versions of the sentence are generated, each with an associated estimated probability. These probabilities were estimated from a parallel transcript/subtitle corpus. To avoid ungrammatical sentences, the tool also makes use of a number of rules. The evaluation was done at three different pronunciation speeds, with average sentence reduction rates of 40% to 17%. The number of reasonable reductions ranges between 32.9% and 51%, depending on the average estimated pronunciation speed.
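The selection step implied by this abstract, choosing among probability-ranked compressed versions so the subtitle fits the available reading time, can be sketched as below. The candidate sentences, probabilities, and the characters-per-second reading rate are all invented assumptions, not values from the paper.

```python
# Toy sketch: pick the most probable compression that fits the
# subtitle's character budget (budget derived from an assumed rate).

def pick_subtitle(candidates, seconds_available, chars_per_second=15):
    """candidates: list of (text, estimated_probability) pairs."""
    budget = seconds_available * chars_per_second
    fitting = [(t, p) for t, p in candidates if len(t) <= budget]
    if not fitting:  # nothing fits: fall back to the shortest version
        return min(candidates, key=lambda tp: len(tp[0]))[0]
    return max(fitting, key=lambda tp: tp[1])[0]

candidates = [
    ("The minister announced a new budget plan today", 0.70),
    ("Minister announced new budget plan", 0.55),
    ("New budget plan announced", 0.40),
]
chosen = pick_subtitle(candidates, seconds_available=2)
```

Higher pronunciation speed leaves less display time per subtitle, which is why the reported reduction rates vary with the estimated speed.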
Robust document image understanding technologies
In HDP ’04: Proceedings of the 1st ACM Workshop on Hardcopy Document Processing, pages 9–14, 2004
Cited by 6 (0 self)
Abstract: No existing document image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents of interest to industrial and government agency users. Ideally, users should be able to search, access, examine, and navigate among document images as effectively as they can among encoded data files, using familiar interfaces and tools as fully as possible. We are investigating novel algorithms and software tools at the frontiers of document image analysis, information retrieval, text mining, and visualization that will assist in the full integration of such documents into collections of textual document images as well as “born digital” documents. Our approaches emphasize versatility first: that is, methods which work reliably across the broadest possible range of documents.
Summarizing noisy documents
In Proceedings of the Symposium on Document Image Understanding Technology, 2003
Cited by 6 (3 self)
Abstract: We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise level of a document. We conclude by proposing possible ways of improving the performance of noisy document summarization.
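The noise-level experiments described here require controllably corrupting clean text. A minimal sketch of that setup, with an invented character-confusion table rather than any table from the paper, looks like this:

```python
# Toy sketch: inject OCR-like character confusions into clean text
# at a controllable noise level, to test summarizer robustness.
import random

# Invented confusion pairs typical of OCR errors (l/1, o/0, m/rn, ...)
CONFUSIONS = {"l": "1", "o": "0", "e": "c", "i": "l", "m": "rn"}

def add_ocr_noise(text, noise_level, seed=0):
    """Replace each confusable character with probability noise_level;
    seeded so a given noise level is reproducible across runs."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in CONFUSIONS and rng.random() < noise_level:
            out.append(CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Sweeping `noise_level` from 0 upward and summarizing each corrupted copy is one way to reproduce the degradation curve the abstract reports.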
From Extracts to Abstracts: Human Summary Production Operations for Computer-Aided Summarisation
2007
Cited by 2 (1 self)
Abstract: This paper presents a classification and evaluation of human summary production operations used to transform extracts into more concise, coherent and readable abstracts. Computer-aided summarisation (CAS) allows a user to post-edit an automatically produced extract to improve it. However, unlike other areas of summarisation, no guidance is available to users of CAS systems to help them complete their task. The research reported here addresses this by examining linguistic operations used by a human summariser to transform extracts into abstracts. An evaluation shows that the operations are useful: they improve coherence when applied to extracts.