Cut-and-Paste Text Summarization (2001)

by H Jing

Results 1 - 6 of 6

Dependency tree based sentence compression

by Michael Strube - In Proceedings of the Fifth International Natural Language Generation Conference. Association for Computational Linguistics, 2008
"... We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains result comparable or superior to the state of the art. We demonstrate that the choice of ..."
Abstract - Cited by 23 (2 self) - Add to MetaCart
We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains results comparable or superior to the state of the art. We demonstrate that the choice of the parser affects the performance of the system. We also apply the method to German and report the results of an evaluation with humans.

Citation Context

... systems are extractive, sentence compression techniques are a common way to deal with redundancy in their output. In recent years, a number of approaches to sentence compression have been developed (Jing, 2001; Knight & Marcu, 2002; Gagnon & Da Sylva, 2005; Turner & Charniak, 2005; Clarke & Lapata, 2008, inter alia). Many explicitly rely on a language model, usually a trigram model, to produce grammatical ...
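
The compression idea summarized above (remove low-value subtrees from a dependency parse, then read the surviving words back off in sentence order) can be illustrated with a short stand-alone sketch. The toy tree, the per-node importance scores and the pruning threshold below are invented for illustration and are not taken from the cited paper; a real system would score subtrees from corpus statistics or a trained model.

# Minimal sketch of compression by removing dependency subtrees.
# The toy tree, the "importance" scores and the threshold are assumptions
# made for illustration; they are not part of the cited system.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    index: int                      # position in the original sentence
    children: List["Node"] = field(default_factory=list)
    importance: float = 1.0         # assumed relevance score for the subtree

def prune(node: Node, threshold: float) -> Node:
    """Drop every child subtree whose root scores below the threshold."""
    node.children = [prune(c, threshold)
                     for c in node.children if c.importance >= threshold]
    return node

def linearize(node: Node) -> str:
    """Emit the surviving words in their original sentence order."""
    collected, stack = [], [node]
    while stack:
        n = stack.pop()
        collected.append(n)
        stack.extend(n.children)
    return " ".join(n.word for n in sorted(collected, key=lambda n: n.index))

# Toy parse of "the committee finally approved the lengthy report yesterday"
root = Node("approved", 3, [
    Node("committee", 1, [Node("the", 0)]),
    Node("finally", 2, importance=0.3),
    Node("report", 6, [Node("the", 4), Node("lengthy", 5, importance=0.2)]),
    Node("yesterday", 7, importance=0.4),
])

print(linearize(prune(root, threshold=0.5)))
# -> the committee approved the report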

Sentence fusion via dependency graph compression

by Michael Strube - In Proceedings of EMNLP, 2008
"... We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize ..."
Abstract - Cited by 22 (0 self) - Add to MetaCart
We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.

Citation Context

...s of the evaluation. The conclusions follow in the final section. 2 Related Work Most studies on text-to-text generation concern sentence compression, where the input consists of exactly one sentence (Jing, 2001; Hori & Furui, 2004; Clarke & Lapata, 2008, inter alia). In such a setting, redundancy, incompleteness and compatibility 1 We follow Barzilay & McKeown (2005) and refer to aggregation within text-to-te...
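
The fusion method summarized in this entry compresses a dependency graph to a tree with integer linear programming. The sketch below shows the general shape of such a formulation on a toy graph. It assumes the open-source PuLP library for the ILP; the graph, the edge scores, the word budget and the constraints are simplified assumptions and do not reproduce the paper's actual model, which also handles tree alignment, semantic compatibility checks via GermaNet/Wikipedia, and linearization.

# Sketch of tree extraction from a dependency graph via integer linear
# programming. Assumes the open-source PuLP library; the toy graph, the
# edge scores and the word budget are invented, and the constraints are a
# simplified stand-in for the paper's actual formulation.

from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

# (head, dependent) edges of a small graph, with assumed informativeness scores
edges = {
    ("ROOT", "won"): 2.0,
    ("won", "Smith"): 1.5,
    ("won", "prize"): 1.4,
    ("prize", "the"): 0.6,
    ("won", "unexpectedly"): 0.3,
    ("won", "in"): 0.4,
    ("in", "2008"): 0.5,
}
words = {w for edge in edges for w in edge if w != "ROOT"}
budget = 4  # keep at most four words

prob = LpProblem("graph_to_tree", LpMaximize)
x = {e: LpVariable(f"edge_{i}", cat=LpBinary) for i, e in enumerate(edges)}
y = {w: LpVariable(f"word_{w}", cat=LpBinary) for w in words}

prob += lpSum(edges[e] * x[e] for e in edges)                 # keep informative edges

for w in words:                                               # tree property:
    prob += lpSum(x[e] for e in edges if e[1] == w) == y[w]   # one head per kept word

for (head, dep), var in x.items():                            # connectivity:
    if head != "ROOT":                                        # an edge needs its head word
        prob += var <= y[head]

prob += lpSum(y[w] for w in words) <= budget                  # compression budget

prob.solve()
print([e for e in edges if value(x[e]) > 0.5])

On this toy input the solver keeps the subject and object subtrees and drops the adverbial and temporal modifiers, which is the intended behaviour of a budgeted tree extraction, not a claim about the paper's outputs.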

Sentence Compression for Automated Subtitling: A Hybrid Approach

by Vincent Vandeghinste, Yi Pan
"... In this paper a sentence compression tool is described. We describe how an input sentence gets analysed by using a.o. a tagger, a shallow parser and a subordinate clause detector, and how, based on this analysis, several compressed versions of this sentence are generated, each with an associated est ..."
Abstract - Cited by 20 (0 self) - Add to MetaCart
In this paper a sentence compression tool is described. We describe how an input sentence is analysed using, among others, a tagger, a shallow parser and a subordinate clause detector, and how, based on this analysis, several compressed versions of the sentence are generated, each with an associated estimated probability. These probabilities were estimated from a parallel transcript/subtitle corpus. To avoid ungrammatical sentences, the tool also makes use of a number of rules. The evaluation was done on three different pronunciation speeds, averaging sentence reduction rates of 40% to 17%. The number of reasonable reductions ranges between 32.9% and 51%, depending on the average estimated pronunciation speed.
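
One component this abstract mentions, deletion probabilities estimated from a parallel transcript/subtitle corpus, can be caricatured in a few lines. The chunk labels and the tiny aligned "corpus" below are invented for illustration; the cited tool's shallow analysis pipeline and grammar rules are not reproduced here.

# Sketch of estimating keep/drop probabilities per chunk type from aligned
# transcript/subtitle pairs. The chunk labels and the tiny aligned "corpus"
# are invented; the cited tool's analysis and rule set are not reproduced.

from collections import Counter

# Each aligned sentence pair is reduced to (chunk_type, kept_in_subtitle) flags.
aligned_pairs = [
    [("NP", True), ("ADV", False), ("VP", True), ("PP", False)],
    [("NP", True), ("VP", True), ("SUB_CLAUSE", False)],
    [("ADV", False), ("NP", True), ("VP", True), ("PP", True)],
]

seen, kept = Counter(), Counter()
for pair in aligned_pairs:
    for chunk_type, was_kept in pair:
        seen[chunk_type] += 1
        kept[chunk_type] += was_kept

for chunk_type in sorted(seen):
    print(f"P(keep | {chunk_type}) = {kept[chunk_type] / seen[chunk_type]:.2f}")

A compressor could then score each candidate reduction by the probabilities of the chunks it keeps or drops, with a separate set of grammar rules filtering out ungrammatical candidates, mirroring the hybrid design the abstract describes.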

Robust document image understanding technologies

by Henry S. Baird, Daniel Lopresti, Brian D. Davison, William M. Pottenger - In HDP ’04: Proceedings of the 1st ACM Workshop on Hardcopy Document Processing, pages 9–14, 2004
"... No existing document image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents of interest to industrial and government agency users. Ideally, users should be able to search, access, examine, and navigate among docu ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
No existing document image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents of interest to industrial and government agency users. Ideally, users should be able to search, access, examine, and navigate among document images as effectively as they can among encoded data files, using familiar interfaces and tools as fully as possible. We are investigating novel algorithms and software tools at the frontiers of document image analysis, information retrieval, text mining, and visualization that will assist in the full integration of such documents into collections of textual document images as well as “born digital” documents. Our approaches emphasize versatility first: that is, methods which work reliably across the broadest possible range of documents.

Citation Context

...s. In particular, we broke down the summarization process into four steps: sentence boundary detection, preprocessing (part-of-speech tagging [35] and syntactic parsing), extraction, and post-editing [25]. We tested each step on noisy documents and analyzed the errors that arose, finding that these modules suffered significant degradation as the noise level in the document increased. We also studied h...

Summarizing noisy documents

by Hongyan Jing, Daniel Lopresti, Chilin Shih - In Proceedings of the Symposium on Document Image Understanding Technology, 2003
"... We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to ..."
Abstract - Cited by 6 (3 self) - Add to MetaCart
We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise level of a document. We conclude by proposing possible ways of improving the performance of noisy document summarization.

Citation Context

...o study in our experiment. Three of four documents were from the data collection used in the Text REtrieval Conferences (TREC) [10] and one was from a Telecommunications corpus we collected ourselves [13]. All were professionally written news articles, each containing from 200 to 800 words (the shortest document was 9 sentences and the longest was 38 sentences). For each document, we created 10 noisy ...
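
The experimental setup described in this entry and in the Baird et al. entry above, re-running each summarization stage on increasingly noisy copies of a clean document, presupposes a way to generate those copies. A minimal, purely illustrative noise injector is sketched below; it substitutes random characters at a given rate, which is a much cruder model than real OCR error patterns.

# Illustrative noise injector for producing progressively degraded copies of
# a clean document. Random character substitution is an assumption; real OCR
# errors are more structured (merged/split characters, segmentation errors).

import random
import string

def add_noise(text: str, char_error_rate: float, seed: int = 0) -> str:
    """Replace roughly char_error_rate of the alphanumeric characters."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(string.ascii_lowercase)
        if ch.isalnum() and rng.random() < char_error_rate else ch
        for ch in text
    )

clean = "The committee approved the lengthy report yesterday."
for rate in (0.0, 0.05, 0.10, 0.20):
    print(f"{rate:.2f}: {add_noise(clean, rate)}")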

From Extracts to Abstracts: Human Summary Production Operations for Computer-Aided Summarisation

by Laura Hasler, 2007
"... This paper presents a classification and evaluation of human summary production operations used to transform extracts into more concise, coherent and readable abstracts. Computer-aided summarisation (CAS) allows a user to post-edit an automatically produced extract to improve it. However, unlike oth ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
This paper presents a classification and evaluation of human summary production operations used to transform extracts into more concise, coherent and readable abstracts. Computer-aided summarisation (CAS) allows a user to post-edit an automatically produced extract to improve it. However, unlike other areas of summarisation, no guidance is available to users of CAS systems to help them complete their task. The research reported here addresses this by examining linguistic operations used by a human summariser to transform extracts into abstracts. An evaluation proves that the operations are useful; they do improve coherence when applied to extracts.