How carefully designed open resource sharing can help and expand document analysis research
- In: Document Recognition and Retrieval XVIII - DRR 2011. vol. 7874. SPIE, San Francisco, United States, 2011
Cited by 9 (7 self)
Making datasets available for peer review of published document analysis methods and distributing large, commonly used document corpora for benchmarking are useful and sound practices. This paper shows that they cover only a small fraction of the uses that shared, commonly available research data may have. We develop a new paradigm for sharing and accessing common data sets, benchmarks and other tools, based on an open and free community-based contribution model. The model is operational and has been implemented so that it can be tested on a broad scale. The new interactions arising from its use may spark innovative ways of conducting document analysis research, but may also create challenging interactions with other research domains.
Document Analysis Research in the Year 2021
- In: Twenty-fourth International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2011), 2011
Cited by 3 (2 self)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Scene Text Segmentation with Multi-level Maximally Stable Extremal Regions
Cited by 1 (1 self)
The segmentation of scene text from the image background is an important step in scene text recognition. In this paper, we propose a multi-level MSER technique that identifies the best-quality text candidates from a set of stable regions extracted from different color channel images. To identify the best-quality text candidates, a segmentation score is defined that uses four measures to evaluate the text probability of each stable region: 1) stroke width, which measures the small stroke width variation of the text; 2) boundary curvature, which measures the smoothness of the stable region boundary; 3) character confidence, which measures the likelihood of a stable region being text based on a pre-trained support vector classifier; 4) color constancy, which measures the global color consistency of each selected text candidate. Finally, the MSERs with the best segmentation score from each channel are combined to form the final segmentation. The proposed method is evaluated on the ICDAR2003 and SVT datasets, and experiments show that it outperforms both popular document image binarization methods and state-of-the-art scene text segmentation methods.
Character Extraction by Integrating Color into Edge-based Methods
"... Abstract Text recognition is difficult in e- ..."
A Performance Evaluation Methodology for Historical Document Image Binarization
Document image binarization is of great importance in the document image analysis and recognition pipeline, since it affects further stages of the recognition process. Evaluating a binarization method helps in studying its algorithmic behaviour and verifying its effectiveness by providing qualitative and quantitative indications of its performance. This work concerns a pixel-based binarization evaluation methodology for historical handwritten and machine-printed document images. In the proposed evaluation scheme, the Recall and Precision evaluation measures are modified using a weighting scheme that diminishes potential evaluation bias. Additional performance metrics of the proposed scheme are the percentage rates of broken and missed text, false alarms, background noise, character enlargement and merging. Several experiments, conducted in comparison with other pixel-based evaluation measures, demonstrate the validity of the proposed evaluation scheme. Index Terms: document image binarization, performance evaluation, ground truth.
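As a rough illustration of what a weighted pixel-based Recall and Precision can look like, the Python sketch below computes both measures over flattened binary images with optional per-pixel weights. The function name `weighted_precision_recall` and the weight arguments are hypothetical, and the weights themselves are placeholders: the abstract does not say how the paper derives its weighting scheme, so this is not the authors' method. With all weights equal to 1 the function reduces to ordinary pixel-based Precision and Recall.

```python
def weighted_precision_recall(result, truth, precision_w=None, recall_w=None):
    """Pixel-based precision and recall with optional per-pixel weights.

    result, truth: flat sequences of 0/1 values, where 1 marks a text pixel.
    precision_w, recall_w: hypothetical per-pixel weights (default 1.0 each).
    """
    n = len(truth)
    if len(result) != n:
        raise ValueError("result and truth must have the same length")
    precision_w = [1.0] * n if precision_w is None else precision_w
    recall_w = [1.0] * n if recall_w is None else recall_w

    # Precision: weighted share of detected text pixels that are truly text.
    detected = sum(w for w, r in zip(precision_w, result) if r)
    detected_ok = sum(w for w, r, g in zip(precision_w, result, truth) if r and g)

    # Recall: weighted share of ground-truth text pixels that were detected.
    wanted = sum(w for w, g in zip(recall_w, truth) if g)
    wanted_ok = sum(w for w, r, g in zip(recall_w, result, truth) if r and g)

    precision = detected_ok / detected if detected else 0.0
    recall = wanted_ok / wanted if wanted else 0.0
    return precision, recall


truth = [1, 1, 1, 0, 0, 0]
result = [1, 1, 0, 1, 0, 0]

# Unweighted: one false alarm and one missed pixel.
plain = weighted_precision_recall(result, truth)

# Assumed weighting: the false alarm at index 3 is penalised half as much,
# for example because it lies right next to a text stroke.
soft = weighted_precision_recall(result, truth, precision_w=[1, 1, 1, 0.5, 1, 1])
```

Lowering the weight of a false alarm raises Precision without changing Recall, which is the kind of bias adjustment a weighting scheme makes possible.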
Robust Text Segmentation using Graph Cut
Text segmentation provides important clues for the accurate identification of character locations and the analysis of character properties such as shape estimation and texture synthesis. In this paper, we propose a robust text segmentation method that employs a Markov Random Field (MRF) and uses graph cut algorithms to solve the energy minimization problem. To select accurate seeds and so boost text segmentation performance, a stroke feature transform is adopted to robustly identify text seeds and text edges. Background seeds are obtained near the text edges in order to preserve the text boundaries. The energy function is defined as an MRF consisting of a data energy and a smoothness energy, which can be minimized efficiently by graph cut algorithms. One distinctive property of the proposed technique is that it identifies more distinctive seeds, so that only one cut is needed to separate the text regions from the background, making it much faster than the existing iterative graph cut approach. Experiments on the ICDAR 2003 and ICDAR 2011 datasets show that the proposed technique obtains superior performance on both pixel-level and atom-level segmentation.
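To make the "data energy plus smoothness energy solved by one cut" idea concrete, here is a minimal Python sketch of binary labelling by minimum s-t cut on a one-dimensional row of pixels. It is a generic textbook construction, not the paper's implementation: the function names, the per-pixel costs and the constant smoothness penalty are all assumptions, and the seed selection by stroke feature transform is not modelled. Max-flow is computed with a plain Edmonds-Karp loop.

```python
from collections import deque


def energy(labels, cost_bg, cost_text, smooth):
    """Data energy plus smoothness energy of a labelling (1 = text, 0 = background)."""
    data = sum(cost_text[i] if l else cost_bg[i] for i, l in enumerate(labels))
    jumps = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth * jumps


def segment_by_min_cut(cost_bg, cost_text, smooth):
    """Minimise the energy above with a single minimum s-t cut (Edmonds-Karp)."""
    n = len(cost_bg)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = cost_bg[i]    # cut when pixel i ends up labelled background
        cap[i][t] = cost_text[i]  # cut when pixel i ends up labelled text
        if i + 1 < n:             # neighbours pay `smooth` when labels differ
            cap[i][i + 1] = smooth
            cap[i + 1][i] = smooth

    def bfs_parents():
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[t] == -1:       # no augmenting path left: flow is maximal
            break
        # Find the bottleneck along the path, then push that much flow.
        flow, v = float("inf"), t
        while v != s:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Pixels still reachable from the source lie on the text side of the cut.
    return [1 if parent[i] != -1 else 0 for i in range(n)]


# Hypothetical integer costs for six pixels; pixel 2 alone slightly prefers
# background, but the smoothness term keeps it with its text neighbours.
cost_bg = [9, 9, 4, 9, 1, 1]
cost_text = [1, 1, 5, 1, 9, 9]
labels = segment_by_min_cut(cost_bg, cost_text, smooth=2)
total = energy(labels, cost_bg, cost_text, smooth=2)
```

A real image would use a 2D neighbourhood and a dedicated max-flow library; the dense capacity matrix here is only practical for toy inputs.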
unknown title
This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods.
ICDAR 2011 Robust Reading Competition
"... Abstract—This paper presents the results of the first Challenge ..."