Semi-automatic ground truth generation for chart image recognition
In: Workshop on Document Analysis Systems (DAS), 2006
Cited by 7 (2 self)
Abstract. While research on scientific chart recognition is being carried out, there is no suitable standard that can be used to evaluate the overall performance of the chart recognition results. In this paper, a system for semi-automatic chart ground truth generation is introduced. Using the system, the user is able to extract multiple levels of ground truth data. The role of the user is to perform verification and correction and to input values where necessary. The system carries out automatic tasks such as text block detection and line detection. It can effectively reduce the time needed to generate ground truth data compared with fully manual processing. We tested the system on 115 images. The images and the generated ground truth data are available to the public.
Quality Assurance in High Volume Document Digitization: A Survey
2006
Cited by 2 (0 self)
Keywords: quality assurance, document image analysis, OCR, digital library
Quality assurance (QA) plays a critical role in high volume document digitization projects by making sure that the specified quality standard is reached under cost and time constraints. This paper takes a systematic view of this issue by summarizing and abstracting related existing work: quality bottlenecks and technical solutions throughout the whole processing pipeline, including cataloging, capture, image analysis and recognition, and error cascading; and various strategies for conducting cost-effective QA, such as combining auto-QA and manual QA, batch QA, special QA user interfaces, and open source QA.
PixLabeler: User Interface for Pixel-Level Labeling of Elements in Document Images
Cited by 2 (1 self)
We present a user interface design for labeling elements in document images at a pixel level. Labels are represented by overlay color, which might map to such terms as “handwriting”, “machine print”, “graphics”, etc. The primary purpose is to streamline processes for manual production of groundtruth data, which is necessary for training algorithms and evaluating performance. Unlike general paint-type programs, the UI design is targeted specifically toward selection of collections of foreground pixels that are likely to be meaningful elements in a document image analysis context. Our implementation, called PixLabeler, is available for download and allows customized plug-ins for bootstrapping according to the labeling task.
Generating Ground Truthed Dataset: Automatic or Semi-automatic?
Abstract. Ground truthing tools mainly fall into two categories: automatic and semi-automatic. In this paper, we first discuss the pros and cons of the two approaches. We then report our own work on designing and implementing systems for generating a chart image dataset and multilevel ground truth data. Both semi-automatic and automatic approaches were adopted, resulting in two independent systems. The dataset and the ground truth data are publicly available so that other researchers can use them to evaluate and compare the performance of different systems.

