Localizing and Segmenting Text in Images and Videos
- IEEE Transactions on Circuits and Systems for Video Technology
, 2002
Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2003
Cited by 68 (0 self)
Abstract—The current paper presents a novel texture-based method for detecting texts in images. A support vector machine (SVM) is used to analyze the textural properties of texts. No external texture feature extraction module is used; rather, the intensities of the raw pixels that make up the textural pattern are fed directly to the SVM, which works well even in high-dimensional spaces. Next, text regions are identified by applying a continuously adaptive mean shift algorithm (CAMSHIFT) to the results of the texture analysis. The combination of CAMSHIFT and SVMs produces both robust and efficient text detection, as time-consuming texture analyses for less relevant pixels are restricted, leaving only a small part of the input image to be texture-analyzed. Index Terms—Text detection, image indexing, texture analysis, support vector machine, CAMSHIFT.
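The region-finding step described above can be illustrated with a plain mean shift iteration over a per-pixel text-score map. This is a minimal sketch, not the paper's implementation: the score map stands in for hypothetical SVM outputs, the function name and parameters are illustrative, and the window-size adaptation that distinguishes CAMSHIFT from ordinary mean shift is omitted.

```python
def mean_shift_window(scores, cx, cy, half, max_iter=20):
    """Move a square search window toward the local centroid of a score map.

    scores: 2-D list of non-negative per-pixel text scores (assumed to come
            from some texture classifier; hypothetical input).
    cx, cy: initial window center (column, row).
    half:   half-width of the square window, fixed in this sketch.
    """
    h, w = len(scores), len(scores[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        # Accumulate zeroth and first moments inside the clipped window.
        for y in range(max(0, cy - half), min(h, cy + half + 1)):
            for x in range(max(0, cx - half), min(w, cx + half + 1)):
                s = scores[y][x]
                m00 += s
                m10 += s * x
                m01 += s * y
        if m00 == 0:
            break  # no evidence in the window, so stay put
        nx, ny = round(m10 / m00), round(m01 / m00)
        if (nx, ny) == (cx, cy):
            break  # converged
        cx, cy = nx, ny
    return cx, cy


# Usage: a 9x9 map with a 3x3 blob of high scores centered at (5, 5).
score_map = [[1.0 if 4 <= x <= 6 and 4 <= y <= 6 else 0.0 for x in range(9)]
             for y in range(9)]
center = mean_shift_window(score_map, cx=3, cy=3, half=2)
```

Only the pixels inside the moving window are inspected, which mirrors the abstract's point that texture analysis can be restricted to a small part of the image.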
An Adaptive Algorithm for Text Detection from Natural Scenes
- PROCEEDINGS OF COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
, 2001
Cited by 40 (5 self)
We present a new adaptive algorithm for automatic detection of text from a natural scene. The initial cues of text regions are first detected from the captured image/video. An adaptive color modeling and searching algorithm is then utilized near the initial text cues, to discriminate text/non-text regions. An EM optimization algorithm is used for color modeling, under the constraint of text layout relations for a specific language. The proposed algorithm combines the advantages of several previous approaches for text detection, and utilizes a focus-of-attention approach for text finding. The whole algorithm is applied in a prototype system that can automatically detect and recognize sign input from a video camera, and translate the signs into English text or voice streams. We present evaluation results of our algorithm on this system.
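To make the EM-based color modeling step more concrete, here is a minimal sketch of EM for a two-component Gaussian mixture. It rests on strong simplifying assumptions that are not in the abstract: a single intensity channel instead of color, exactly two components (text and background), and no layout constraints. All names are illustrative.

```python
import math


def em_two_gaussians(values, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    values: list of pixel intensities sampled near a text cue (assumed input).
    Returns (means, variances, weights), each a two-element list.
    """
    lo, hi = min(values), max(values)
    mu = [float(lo), float(hi)]                  # start at the extremes
    var = [max(((hi - lo) / 2.0) ** 2, 1e-6)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in values:
            p = [pi[k] * math.exp(-(v - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2
                             for r, v in zip(resp, values)) / nk, 1e-6)
            pi[k] = nk / len(values)
    return mu, var, pi


# Usage: dark strokes (about 11) on a bright background (about 201).
means, variances, weights = em_two_gaussians([10, 11, 12, 200, 201, 202])
```

A pixel can then be assigned to whichever component gives it the higher responsibility, which is one simple way to separate text from non-text pixels under these assumptions.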
An Automatic Performance Evaluation Protocol for Video Text Detection Algorithms
- IEEE Trans. on CSVT
, 2004
Word Spotting in the Wild
Cited by 31 (0 self)
Abstract. We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs – text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines – one open source and one proprietary – with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.
Extraction and Recognition of Artificial Text in Multimedia Documents
, 2002
Cited by 27 (8 self)
The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge in the indexing process is to use the text included in the images and video sequences. It is rich in information but easy to use, e.g. by keyword-based queries. In this paper we present an algorithm to localize artificial text in images and videos using a measure of accumulated gradients and morphological processing. The quality of the localized text is improved by robust multiple frame integration.
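The localization idea mentioned above (accumulated gradients followed by morphological processing) can be sketched as follows. This is only an illustration under assumptions: the gradient is a simple horizontal forward difference, the accumulation is a horizontal sliding sum, and the morphology is a one-dimensional closing. The abstract does not give the paper's actual measure, window sizes, or structuring elements, so every name and parameter here is hypothetical.

```python
def accumulated_gradients(img, win):
    """Sum absolute horizontal gradients over a sliding window of width win."""
    w = len(img[0])
    half = win // 2
    acc = []
    for row in img:
        # Forward difference; the last column has no right neighbor.
        grad = [abs(row[x + 1] - row[x]) for x in range(w - 1)] + [0]
        acc.append([sum(grad[max(0, x - half):x + half + 1]) for x in range(w)])
    return acc


def threshold(acc, t):
    """Binarize the accumulated-gradient map."""
    return [[1 if v >= t else 0 for v in row] for row in acc]


def dilate_h(mask, radius):
    """Horizontal dilation: a pixel is set if any neighbor within radius is set."""
    w = len(mask[0])
    return [[1 if any(row[max(0, x - radius):x + radius + 1]) else 0
             for x in range(w)] for row in mask]


def erode_h(mask, radius):
    """Horizontal erosion: a pixel stays set only if all in-bounds neighbors are set."""
    w = len(mask[0])
    return [[1 if all(row[max(0, x - radius):x + radius + 1]) else 0
             for x in range(w)] for row in mask]


def close_h(mask, radius):
    """Morphological closing (dilate, then erode) to bridge small gaps."""
    return erode_h(dilate_h(mask, radius), radius)


# Usage: a flat row and a high-contrast, text-like row.
image = [[0] * 8, [0, 255] * 4]
text_mask = close_h(threshold(accumulated_gradients(image, win=3), t=500), radius=1)
```

The closing step merges neighboring high-gradient pixels into a contiguous candidate region, while the flat row produces no response at all.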
Extracting Information From Text and Images for Location Proteomics
, 2003
Cited by 25 (13 self)
There is extensive interest in automating the collection, organization and summarization of biological data. Data in the form of figures and accompanying captions in literature present special challenges for such efforts. Based on our previously developed search engines to find fluorescence microscope images depicting protein subcellular patterns, we introduced text mining and Optical Character Recognition (OCR) techniques for caption understanding and figure-text matching, so as to build a robust, comprehensive toolset for extracting information about protein subcellular localization from the text and images found in online journals. Our current system can generate assertions such as "Figure N depicts a localization of type L for protein P in cell type C".
Detection of Text on Road Signs from Video
- IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (ITS)
, 2005
Cited by 24 (0 self)
A fast and robust framework for incrementally detecting text on road signs from video is presented in this paper. This new framework makes two main contributions. 1) The framework applies a divide-and-conquer strategy to decompose the original task into two subtasks, that is, the localization of road signs and the detection of text on the signs. The algorithms for the two subtasks are naturally incorporated into a unified framework through a feature-based tracking algorithm. 2) The framework provides a novel way to detect text from video by integrating two-dimensional (2-D) image features in each video frame (e.g., color, edges, texture) with the three-dimensional (3-D) geometric structure information of objects extracted from the video sequence (such as the vertical plane property of road signs). The feasibility of the proposed framework has been evaluated using 22 video sequences captured from a moving vehicle. This new framework gives an overall text detection rate of 88.9% and a false hit rate of 9.2%. It can easily be applied to other tasks of text detection from video and potentially be embedded in a driver assistance system.
An Objective Evaluation Methodology for Document Image Binarization Techniques
- THE EIGHTH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS
, 2008
Cited by 19 (8 self)
Evaluation of document image binarization techniques is a tedious task that is mainly performed by a human expert or by involving an OCR engine. This paper presents an objective evaluation methodology for document image binarization techniques that aims to reduce the human involvement in the ground truth construction and subsequent testing. A skeletonized ground truth image is produced by the user following a semi-automatic procedure. The estimated ground truth image can aid in evaluating the binarization result in terms of recall and precision, and in further analyzing the result by calculating broken and missing text, deformations and false alarms. A detailed description of the methodology along with a benchmarking of the six (6) most promising state-of-the-art binarization algorithms based on the proposed methodology is presented.
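As a rough illustration of scoring a binarization result by recall and precision, the sketch below compares two binary masks pixel by pixel. This is an assumption-laden simplification: the methodology above works from a skeletonized ground truth, and the abstract does not specify how the two measures are computed, so this plain pixel-level comparison and its names are illustrative only.

```python
def precision_recall(result, truth):
    """Pixel-level precision and recall of a binary mask against ground truth.

    result, truth: 2-D lists of 0/1 with the same shape, where 1 marks text.
    """
    tp = fp = fn = 0
    for r_row, t_row in zip(result, truth):
        for r, t in zip(r_row, t_row):
            if r and t:
                tp += 1      # text pixel correctly detected
            elif r:
                fp += 1      # false alarm
            elif t:
                fn += 1      # missed text pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Usage: two true positives, one false alarm, two misses.
binarized = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [1, 1, 1]]
p, r = precision_recall(binarized, ground_truth)
```

Low recall here corresponds to missing text and low precision to false alarms, the two error types the abstract mentions.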
Application of information retrieval technologies to presentation slides
- IEEE TRANSACTIONS ON MULTIMEDIA
, 2006
Cited by 15 (3 self)
Presentations are becoming an increasingly common means of communication in working environments, and slides are often the necessary supporting material on which the presentations rely. In this paper, we describe a slide indexing and retrieval system in which the slides are captured as images (through a framegrabber) at the moment they are displayed during a presentation and then transcribed with an OCR system. In this context, we show that such an approach presents several advantages over the use of commercial (API-based) software to obtain the slide transcriptions. We report a set of retrieval experiments conducted on a database of 26 real presentations (570 slides) collected at a workshop. The experiments show that the overall retrieval performance is close to that obtained using either a manual transcription of the slides or the API software. Moreover, the experiments show that the OCR-based approach significantly outperforms the API in extracting the text embedded in images and figures.