  Layout and language: An efficient algorithm for detecting text blocks based on spatial and linguistic evidence (2001) [1 citation, 0 self]

by Matthew Hurst
In Document Recognition and Retrieval VIII
ftp://ftp.cogsci.ed.ac.uk/pub/matth/spie01.ps

Abstract:

The ability to accurately detect those areas in plain text documents that consist of contiguous text is an important pre-process to many applications. This paper introduces a novel method that uses both spatial and linguistic knowledge in an accurate manner to provide an initial analysis of the document. This initial analysis may then be extended to provide a complete analysis of the text areas in the document.

Citations

12 Using white space for automated document structuring – Rus, Summers - 1994
10 Automatic Discovery of Logical Document Structure – Summers - 1998
8 A paper-to-HTML table converting system – Kieninger, Dengel - 1998