Automated Labeling of Zones from Scanned Documents
Proceedings SDIUT99, 1999
Abstract
The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), is developing an automated system, the Medical Article Record System (MARS), to identify and convert bibliographic information from printed biomedical journals to electronic format for inclusion in the MEDLINE database. This paper describes one aspect of this ongoing effort: the automated labeling of zones from scanned images with labels such as titles, authors, affiliations, and abstracts. This labeling is based on features calculated from optical character recognition (OCR) output, neural network models, machine learning methods, and a set of rules that is derived from an analysis of the page layout for each journal and from generic typesetting knowledge for English text. Several learning systems are considered including back-propagation neural networks, decision trees, and rule-based systems. Experiments are carried out on a variety of medica...
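As a rough illustration of the rule-based side of such zone labeling, the following minimal sketch assigns a label to a zone from a few simple features. Everything in it is an assumption made for illustration: the `Zone` fields, the thresholds, the keyword list, and the rule order are hypothetical and are not taken from MARS or from the paper, which derives its rules from per-journal layout analysis and also considers neural networks and decision trees.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical per-zone features; not the feature set used by MARS."""
    text: str          # OCR text of the zone
    top: float         # vertical position: 0.0 = top of page, 1.0 = bottom
    font_size: float   # mean character height from OCR, in points
    word_count: int    # number of words in the zone


# Illustrative keyword list (an assumption, not from the paper).
AFFILIATION_KEYWORDS = ("university", "department", "institute", "hospital")


def label_zone(zone: Zone, body_font_size: float = 10.0) -> str:
    """Assign a label using hand-written, illustrative rules."""
    text = zone.text.strip()
    lower = text.lower()
    # Long zones, or zones that announce themselves, are treated as abstracts.
    if lower.startswith("abstract") or zone.word_count > 80:
        return "abstract"
    # Large type near the top of the page suggests a title.
    if zone.top < 0.3 and zone.font_size >= 1.4 * body_font_size:
        return "title"
    # Institutional keywords suggest an affiliation.
    if any(keyword in lower for keyword in AFFILIATION_KEYWORDS):
        return "affiliation"
    # Short, comma-separated zones in the upper half suggest an author list.
    if zone.top < 0.5 and zone.word_count <= 30 and "," in text:
        return "author"
    return "other"
```

In a learning-based variant, the same features would be passed to a trained classifier instead of fixed rules; the sketch shows only the hand-written case.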

