Results 1 - 10 of 22
Machine printed text and handwriting identification in noisy document images
- IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
"... Abstract—In this paper, we address the problem of the identification of text in noisy document images. We are especially focused on segmenting and identifying between handwriting and machine printed text because: 1) Handwriting in a document often indicates corrections, additions, or other supplemen ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
Abstract—In this paper, we address the problem of the identification of text in noisy document images. We are especially focused on segmenting and identifying between handwriting and machine printed text because: 1) Handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field-based (MRF) approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections. Index Terms—Text identification, handwriting identification, Markov random field, postprocessing, noisy document image enhancement, document analysis.
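A minimal Python sketch of the classification step, assuming each segmented region has already been reduced to a fixed-length feature vector: sklearn's LinearDiscriminantAnalysis implements Fisher's linear discriminant, the classifier family the abstract names. The features and training data below are hypothetical stand-ins, and the MRF refinement is not shown.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    CLASSES = ["printed", "handwriting", "noise"]

    # Hypothetical per-region features (stroke-width variance, edge
    # density, size statistics, ...); real values would come from the image.
    rng = np.random.default_rng(0)
    X_train = rng.random((300, 6))
    y_train = rng.integers(0, 3, size=300)

    # Fisher classifier: one discriminant model over the three classes.
    clf = LinearDiscriminantAnalysis().fit(X_train, y_train)

    def classify_regions(features):
        # An MRF over neighbouring regions would smooth these initial
        # labels afterwards (not shown here).
        return [CLASSES[i] for i in clf.predict(features)]

    print(classify_regions(rng.random((5, 6))))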
Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
"... this paper, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) First, we create mutually exclusive training and test data sets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimizatio ..."
Abstract
-
Cited by 38 (5 self)
- Add to MetaCart
(Show Context)
In this paper, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) First, we create mutually exclusive training and test data sets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimization procedure is then used to search automatically for the optimal parameter values of the segmentation algorithms on the training data set, 4) the segmentation algorithms are then evaluated on the test data set, and, finally, 5) a statistical and error analysis is performed to give the statistical significance of the experimental results. In particular, instead of the ad hoc and manual approach typically used in the literature for training algorithms, we pose the automatic training of algorithms as an optimization problem and use the Simplex algorithm to search for the optimal parameter value. A paired-model statistical analysis and an error analysis are then conducted to provide confidence intervals for the experimental results of the algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, of which three are representative research algorithms and the other two are well-known commercial products, on 978 images from the University of Washington III data set. It is found that the performance indices (average textline accuracy) of the Voronoi, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which, in turn, is significantly better than that of X-Y cut.
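Step 3 can be made concrete with a short runnable sketch: the segmenter's average textline accuracy on the training set is treated as a black-box function of its free parameters and maximized with the Simplex (Nelder-Mead) algorithm. The quadratic surrogate and the two parameter names below are made up; a real run would segment and score every training image.

    import numpy as np
    from scipy.optimize import minimize

    def avg_accuracy(params):
        # Hypothetical surrogate peaking at (5.0, 1.5); stands in for
        # segmenting the training set with `params` and scoring the
        # results against the ground truth.
        t_spacing, t_height = params
        return 1.0 - 0.01 * (t_spacing - 5.0) ** 2 - 0.05 * (t_height - 1.5) ** 2

    # Nelder-Mead minimizes, so negate the accuracy.
    res = minimize(lambda p: -avg_accuracy(p), x0=np.array([2.0, 1.0]),
                   method="Nelder-Mead")
    print("optimal parameters:", res.x)  # approaches [5.0, 1.5]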
Object Count/Area Graphs for the Evaluation of Object Detection and Segmentation Algorithms
- International Journal on Document Analysis and Recognition
"... Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these ..."
Abstract
-
Cited by 34 (4 self)
- Add to MetaCart
(Show Context)
Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these two rectangles. However, these measures have several drawbacks: they don't give intuitive information about the proportion of the correctly detected objects and the number of false alarms, and they cannot be accumulated across multiple images without creating ambiguity in their interpretation. Furthermore, quantitative and qualitative evaluation is often mixed, resulting in ambiguous measures. In this paper, we propose a new approach which tackles these problems. The performance of a detection algorithm is illustrated intuitively by performance graphs which present object level precision and recall depending on constraints on detection quality. In order to compare different detection algorithms, a representative single performance value is computed from the graphs. The influence of the test database on the detection performance is illustrated by performance/generality graphs. The evaluation method can be applied to different types of object detection algorithms. It has been tested on different text detection algorithms, among which are the participants of the ICDAR 2003 text detection competition.
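A sketch of the object-level counting idea in Python, with assumed quality thresholds rather than the paper's exact scheme: a detection counts as correct only when its area precision and area recall against some ground-truth box both pass the constraints.

    def area(b):
        x0, y0, x1, y1 = b
        return max(0, x1 - x0) * max(0, y1 - y0)

    def intersection(a, b):
        return area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))

    def object_level_pr(detections, groundtruth, t_prec=0.4, t_rec=0.8):
        matched, correct = set(), 0
        for d in detections:
            for i, g in enumerate(groundtruth):
                inter = intersection(d, g)
                # Both per-pair quality constraints must hold.
                if area(d) and area(g) and inter / area(d) >= t_prec \
                        and inter / area(g) >= t_rec:
                    correct += 1
                    matched.add(i)
                    break
        precision = correct / len(detections) if detections else 0.0
        recall = len(matched) / len(groundtruth) if groundtruth else 0.0
        return precision, recall

    dets = [(10, 10, 60, 30), (100, 100, 120, 110)]  # second box is a false alarm
    gts = [(12, 10, 62, 32)]
    print(object_level_pr(dets, gts))  # (0.5, 1.0)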
Document Structure Analysis and Performance Evaluation
, 1999
"... Document Structure Analysis and Performance Evaluation by Jisheng Liang Chair of Supervisory Committee Professor Robert M. Haralick Electrical Engineering The goal of the document structure analysis is to find an optimal solution to partition the set of glyphs on a given document to a hierarchical t ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
The goal of document structure analysis is to find an optimal solution to partition the set of glyphs on a given document into a hierarchical tree structure where entities within the hierarchy are associated with their physical properties and semantic labels. In this dissertation, we present a unified document structure extraction algorithm that is probability based, where the probabilities are estimated from an extensive training set of various kinds of measurements of distances between the terminal and non-terminal entities with which the algorithm works. The off-line probabilities estimated in the training then drive all decisions in the on-line segmentation module. An iterative, relaxation-like method is used to find the partitioning solution that maximizes the joint probability. This approach can be uniformly applied to the cons...
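A loose Python sketch of the flavour of this approach, under the assumption that an offline-trained distance model drives the online grouping: the Gaussian over horizontal gaps below is a made-up stand-in for the estimated distributions, and a single greedy pass replaces the dissertation's iterative relaxation.

    import math

    def p_same_line(gap, mean=4.0, std=2.0):
        # Hypothetical trained model: how likely a gap this wide separates
        # two glyphs that belong to the same text-line.
        return math.exp(-((gap - mean) ** 2) / (2 * std ** 2))

    def group_glyphs(x_positions, threshold=0.1):
        # Extend the current group while the trained probability of the
        # next gap stays above the threshold.
        groups, current = [], [x_positions[0]]
        for prev, cur in zip(x_positions, x_positions[1:]):
            if p_same_line(cur - prev) >= threshold:
                current.append(cur)
            else:
                groups.append(current)
                current = [cur]
        groups.append(current)
        return groups

    print(group_glyphs([0, 4, 9, 14, 40, 45, 49]))  # [[0, 4, 9, 14], [40, 45, 49]]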
Extraction Of Special Effects Caption Text Events From Digital Video
, 2003
"... The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a di#cult problem ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
The popularity of digital video is increasing rapidly. To help users navigate libraries of video, algorithms that automatically index video based on content are needed. One approach is to extract text appearing in video, which often reflects a scene's semantic content. This is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Most work so far has made restrictive assumptions about the nature of text occurring in video. Such work is therefore not directly applicable to unconstrained, general-purpose video. In addition, most work so far has focused only on detecting the spatial extent of text in individual video frames. However, text occurring in video usually persists for several seconds. This constitutes a text event that should be entered only once in the video index. Therefore it is also necessary to determine the temporal extent of text events. This is a non-trivial problem because text may move, rotate, grow, shrink, or otherwise change over time. Such text effects are common in television programs and commercials but so far have received little attention in the literature. This paper discusses detecting, binarizing, and tracking caption text in general-purpose MPEG-1 video. Solutions are proposed for each of these problems and compared with existing work found in the literature.
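The temporal-extent problem can be sketched simply in Python: per-frame detections are chained into text events when a box overlaps an open event's last box from the previous frame. The moving, rotating, and growing effects the paper targets defeat plain IoU association, so this is illustrative only.

    def iou(a, b):
        x0, y0 = max(a[0], b[0]), max(a[1], b[1])
        x1, y1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x1 - x0) * max(0, y1 - y0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / float(area(a) + area(b) - inter)

    def build_events(frames, min_iou=0.5):
        # frames: one list of text boxes per video frame.
        open_events, closed = [], []
        for t, boxes in enumerate(frames):
            boxes, still_open = list(boxes), []
            for ev in open_events:
                match = next((b for b in boxes
                              if iou(ev["boxes"][-1], b) >= min_iou), None)
                if match is None:
                    closed.append(ev)              # caption disappeared
                else:
                    ev["boxes"].append(match)      # caption persists
                    ev["end"] = t
                    boxes.remove(match)
                    still_open.append(ev)
            # every unmatched box starts a new event in this frame
            open_events = still_open + [{"start": t, "end": t, "boxes": [b]}
                                        for b in boxes]
        return closed + open_events

    frames = [[(10, 10, 80, 30)], [(11, 10, 81, 30)], []]
    print([(e["start"], e["end"]) for e in build_events(frames)])  # [(0, 1)]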
A Methodology for Empirical Performance Evaluation of Page Segmentation Algorithms
- In Proceedings of SPIE Conference on Document Recognition and Retrieval
, 1999
"... Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous page segmentation algorithms have been proposed, there is relatively less literature on comparative evaluation --- empirical or theoretical --- of these algorithms. For the exist ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous page segmentation algorithms have been proposed, there is relatively little literature on comparative evaluation --- empirical or theoretical --- of these algorithms. For the existing performance evaluation methods, two crucial components are usually missing: 1) automatic training of algorithms with free parameters and 2) statistical and error analysis of experimental results. In this thesis, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) First, we create mutually exclusive training and test datasets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimization procedure is then used to search automatically for the optimal parameter values of the segmentation algorithms, 4) the segmentation algorithms are then evaluated on the test dataset, and finally 5) ...
ICDAR 2003 robust reading competitions: entries, results and future directions
- International Journal on Document Analysis and Recognition, Special Issue on Camera-Based Text and Document Recognition 7(2–3)
, 2005
"... This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. W ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
(Show Context)
This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets, and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break down the robust reading problem into three sub-problems, and run competitions for each stage, and also a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hoped to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involved storing detailed results of applying each algorithm to each image in the data sets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to have any entries. We give a brief description of each entry, and present the results of this contest, showing cases where the leading entries succeed and fail. We also describe an algorithm for combining the outputs of the individual text locators, and show how the combination scheme improves on any of the individual systems.
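The abstract does not spell out the combination scheme, so here is one plausible voting-style sketch in Python (thresholds assumed, not the authors' algorithm): pooled boxes survive only when a second locator confirms them.

    def iou(a, b):
        x0, y0 = max(a[0], b[0]), max(a[1], b[1])
        x1, y1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x1 - x0) * max(0, y1 - y0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / float(area(a) + area(b) - inter)

    def combine(locator_outputs, min_votes=2, min_iou=0.5):
        # locator_outputs: one list of boxes per text-locating system.
        kept = []
        for i, boxes in enumerate(locator_outputs):
            for box in boxes:
                votes = 1 + sum(  # the proposing system counts as one vote
                    any(iou(box, other) >= min_iou for other in out)
                    for j, out in enumerate(locator_outputs) if j != i)
                already = any(iou(box, k) >= min_iou for k in kept)
                if votes >= min_votes and not already:
                    kept.append(box)
        return kept

    a = [(10, 10, 100, 40), (200, 50, 260, 70)]
    b = [(12, 11, 101, 41)]
    print(combine([a, b]))  # only the box both systems agree on survives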
Document Layout Structure Extraction Using Bounding Boxes of Different Entities
- Proc. Workshop on Applications of Computer Vision
, 1996
"... This paper presents an efficient technique for document page layout structure extraction and classification by analyzing the spatial configuration of the bounding boxes of different entities on the given image. The algorithm segments an image into a list of homogeneous zones. The classification algo ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
This paper presents an efficient technique for document page layout structure extraction and classification by analyzing the spatial configuration of the bounding boxes of different entities on the given image. The algorithm segments an image into a list of homogeneous zones. The classification algorithm labels each zone as text, table, line-drawing, halftone, ruling, or noise. The text-lines and words are extracted within text zones and neighboring text-lines are merged to form text-blocks. The tabular structure is further decomposed into row and column items. Finally, the document layout hierarchy is produced from these extracted entities.
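The text-block formation step can be illustrated with a small Python sketch (the gap threshold is assumed, not the paper's): consecutive text-line boxes are merged into a block while the vertical gap between them stays small relative to the line height.

    def merge_lines_into_blocks(lines, gap_factor=1.5):
        # lines: (x0, y0, x1, y1) text-line boxes, sorted top-to-bottom.
        blocks, current = [], [lines[0]]
        for prev, line in zip(lines, lines[1:]):
            height = prev[3] - prev[1]
            gap = line[1] - prev[3]
            if gap <= gap_factor * height:
                current.append(line)       # same block: small inter-line gap
            else:
                blocks.append(current)     # large gap: start a new block
                current = [line]
        blocks.append(current)
        return blocks

    lines = [(0, 0, 200, 10), (0, 14, 200, 24), (0, 80, 200, 90)]
    print([len(b) for b in merge_lines_into_blocks(lines)])  # [2, 1]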
A Framework for the Assessment of Text Extraction Algorithms on Complex Colour Images
- Proceedings of the 9th IAPR Int. Workshop on Document Analysis Systems, ACM Press
"... The availability of open, ground-truthed datasets and clear performance metrics is a crucial factor in the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally suffered from a lack of a comprehen ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
The availability of open, ground-truthed datasets and clear performance metrics is a crucial factor in the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally suffered from the lack of a comprehensive performance evaluation framework. Such a framework is extremely difficult to specify, and the corresponding pixel-level accurate ground-truth information is tedious to define. In this paper, we discuss the challenges and technical issues associated with developing such a framework. Then, we describe a complete framework for the evaluation of text extraction methods at multiple levels, provide a detailed ground-truth specification and present a case study on how this framework can be used in a real-life situation.
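The lowest of those evaluation levels can be sketched in a few lines of Python: pixel-accurate precision/recall of an extracted text mask against the ground-truth mask. The higher levels and the ground-truth specification the paper describes are not shown; this is illustrative, not the paper's protocol.

    import numpy as np

    def pixel_scores(predicted, groundtruth):
        # predicted, groundtruth: binary masks of the same shape.
        tp = np.logical_and(predicted, groundtruth).sum()
        precision = tp / predicted.sum() if predicted.sum() else 0.0
        recall = tp / groundtruth.sum() if groundtruth.sum() else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        return precision, recall, f

    gt = np.zeros((10, 10), dtype=bool); gt[2:5, 2:8] = True
    pred = np.zeros((10, 10), dtype=bool); pred[2:5, 2:6] = True
    print(pixel_scores(pred, gt))  # (1.0, 0.667, 0.8)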
On segmentation of documents in complex scripts
- In International Conf. on Document Analysis and Recognition
"... Document image segmentation algorithms primarily aim at separating text and graphics in presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that successful algorithms fo ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Document image segmentation algorithms primarily aim at separating text and graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes a challenge due to the characteristics of the script. In this paper, we empirically demonstrate that successful algorithms for Latin scripts may not be very effective for Indic and other complex scripts. We explain this based on the differences in the spatial distribution of symbols in the scripts. We argue that the visual information used for segmentation needs to be enhanced with other information, such as script models, for accurate results.