| Rainer Lienhart and Frank Stuber, "Automatic text recognition in digital videos", in Image and Video Processing IV 1996, Proc. SPIE 2666-20, 1996. |
....be briefly classified into region based, texture based and edge based methods. Region based methods detect characters as monochrome regions satisfying certain heuristic constraints. Since the grayscale or color of the text pixels in an input image is often not uniform, image segmentation [1][4][5] or color clustering [10] has to be performed as a preprocessing step to reduce the total number of colors or grayscales in the image. The performance of region based methods relies heavily on the monochrome assumption for text characters, and these methods are therefore not robust to complex backgrounds and compressed video. ....
....alarm regions in all the identified regions. The false pixel alarm rate (FPR) is defined as the total area of false alarm regions as a percentage of the whole area of the video frames. Table 1 compares the performance and running time of the proposed algorithm with typical region based [5], texture based [2] and edge based [11] methods. The proposed algorithm is a good tradeoff between a high identification rate and low false alarm rates. The amount of CPU time required by this algorithm is higher than that of the region and edge based methods but much lower than that of the texture based method. Most of the time ....
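The FPR definition quoted above is a plain area ratio. A minimal Python sketch, assuming the false-alarm areas are already measured in pixels and all frames share one resolution; the function name and the example numbers are hypothetical and not taken from the cited paper:

```python
def false_pixel_alarm_rate(false_region_areas, frame_width, frame_height, num_frames):
    """Total false-alarm area as a percentage of the total area of all frames."""
    total_frame_area = frame_width * frame_height * num_frames
    if total_frame_area <= 0:
        raise ValueError("frame size and frame count must be positive")
    return 100.0 * sum(false_region_areas) / total_frame_area


# Hypothetical example: two false alarms (1200 and 800 pixels) over 10 CIF frames.
fpr = false_pixel_alarm_rate([1200, 800], 352, 288, 10)
print(f"FPR = {fpr:.4f}%")
```

Because the denominator is the area of every frame processed, the FPR shrinks as more text-free frames are added, so it is only comparable between methods evaluated on the same frame set.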
R. Lienhart, "Automatic text recognition in digital videos", In Proc. SPIE, Image and Video Processing IV, January 1996.
....added, it is often more structured and closely related to the subject than scene text. In some domains such as sports, however, scene text can be used to uniquely identify objects (participants in the clip). Most related previous work has focused on the extraction of graphic text [4][5][6]. Although scene text is often difficult to detect and extract due to its virtually unlimited range of poses, sizes, shapes and colors, it is important in applications such as navigation, surveillance, video classification, or analysis of sporting events. Text often spans tens or even hundreds of ....
....of text by tracking it is useful not only for reducing processing time by not applying all steps (detection, extraction, enhancement and OCR) to every frame, but also for maintaining integrity by detecting when new information appears and as a means of enhancement. In the literature, Lienhart [6] and Shim [7] make use of multiple frame information to reduce incorrectly identified text regions, but neither addresses the problem of tracking text in the video. In Lienhart's system, the text in each frame is extracted and saved after recognition, resulting in duplicates of the same text ....
R. Lienhart and F. Stuber, "Automatic text recognition in digital videos," in Proceedings of ACM Multimedia, 1996, pp. 11--20.
....and David Doermann, Language and Media Processing Lab, Center for Automation Research, University of Maryland, College Park, MD 20742-3275. email: huiping cfar.umd.edu Abstract In this paper we present a video text detection system based on neural network training. Compared with previous work [3, 5, 7], which detected only graphical text with fixed parameters, our system has the following advantages: 1) We provide a training mechanism so the parameters of the system can be adapted to changing environments. 2) Our system can detect both graphical text and scene text located in complex backgrounds. 3) ....
....than that of backgrounds; and the text can be distorted or blurred by camera or object motion. Other problems, like lighting and background variation, make the task even more difficult. Previous work on text detection has focused on the detection of graphical text superimposed on a video frame [3, 5, 7, 8]. In these works, a character is modeled as a homogeneous region and extracted by color segmentation [3], splitting and merging [5], or generalized region labeling (GRL) [8]. Such factors as the region's area, height, horizontal-vertical aspect ratio, contrast and horizontal alignment are used to ....
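The geometric factors listed above (area, height, aspect ratio, contrast, horizontal alignment) are typically applied as simple threshold tests on each candidate region. A minimal sketch of such a filter, assuming regions have already been extracted with a bounding box, pixel count and a precomputed contrast value; the `Region` type, the function names and every threshold are hypothetical and not the values used in [3], [5] or [8]:

```python
from dataclasses import dataclass


@dataclass
class Region:
    x: int
    y: int
    width: int
    height: int
    area: int        # number of pixels in the region
    contrast: float  # e.g. |mean(region) - mean(surroundings)|


def is_character_candidate(r, min_area=20, min_height=8, max_height=100,
                           max_aspect=2.0, min_contrast=30.0):
    """Apply area, height, width/height and contrast limits (thresholds are hypothetical)."""
    if r.area < min_area or not (min_height <= r.height <= max_height):
        return False
    if r.width / r.height > max_aspect:
        return False
    return r.contrast >= min_contrast


def horizontally_aligned(regions, max_dy=4, min_neighbours=1):
    """Keep regions whose vertical centre is close to that of at least one other region."""
    kept = []
    for r in regions:
        cy = r.y + r.height / 2
        neighbours = sum(1 for o in regions
                         if o is not r and abs(o.y + o.height / 2 - cy) <= max_dy)
        if neighbours >= min_neighbours:
            kept.append(r)
    return kept


regions = [Region(10, 10, 8, 12, 70, 50.0), Region(20, 11, 8, 12, 70, 45.0),
           Region(0, 0, 200, 5, 900, 60.0), Region(100, 80, 9, 12, 80, 5.0)]
candidates = [r for r in regions if is_character_candidate(r)]
print(len(horizontally_aligned(candidates)))
```

In this example the flat 200 x 5 bar fails the height test and the low-contrast region fails the contrast test; the two remaining regions share a baseline and survive the alignment test.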
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proceedings of ACM Multimedia, pages 11--20, 1996.
....with text rendered as small as 10 × 10 pixels, resulting in no output from OCR software, even though the text is clearly human readable. 1.1 Related Work Some work on the extraction of text from road signs [1], license plates [2], library books [3], WWW images [4] and isolated video frames [5] has been reported in the literature. The methods can be broadly classified into two types: connected component (CC) based and texture based. Scene images and video frames are usually recorded in multivalued (gray scale or colored) form. For CC based approaches, color clustering [6, 7] or ....
....from 6 to 38 for documents with varying quality. Liang [14] addresses the problem of document image restoration using morphological filters and achieves nearly 80% OCR accuracy for subtractive and additive noise images. Work on text recognition in scene images and digital video is reported in [5], [11], [15] and [16]. Wu and Manmatha describe a text extraction and recognition system and achieve an 84% correct OCR rate based only on OCRable text [11]. Lienhart describes a text recognition system for digital video and achieves a recognition rate of nearly 80% [5]. Shim and Dorai present a ....
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proceedings of ACM Multimedia, pages 11--20, 1996.
....uses a polygonal line approach [3] to represent contours as a set of connected segments. The end of a segment is detected when the relation between the polygonal area of the current segment and its length exceeds a certain threshold. Caption extraction. Based on an existing caption extraction method [10], a new and more effective procedure was implemented. As captions are usually added artificially to images, the first step of this procedure is extracting high contrast regions. This task is performed by segmenting the edge image, whose contours have previously been dilated by a certain radius. ....
Rainer Lienhart, Frank Stuber. "Automatic text Recognition in Digital Videos". University of Mannheim, Department of Computer Science, technical report TR-95-006, 1995. (http://www.informatik.uni.manhein.de/~lienhart/papers/tr-95-006.gz)
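The segment-ending rule above leaves "polygonal area" and "length" open to interpretation. A minimal sketch, assuming the area is the one enclosed by the contour points of the current segment plus the chord that closes them (shoelace formula) and the length is that chord's length; the function names and the `max_ratio` threshold are hypothetical, and this is one possible reading rather than the method of [3]:

```python
import math


def polygon_area(points):
    """Absolute area of a closed polygon via the shoelace formula."""
    n = len(points)
    s = 0.0
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def polygonal_segments(contour, max_ratio=1.0):
    """Return indices where segments start/end; break when area/chord length > max_ratio."""
    breaks = [0]
    start = 0
    for i in range(2, len(contour)):
        pts = contour[start:i + 1]
        length = math.dist(contour[start], contour[i])
        ratio = polygon_area(pts) / length if length > 0 else float("inf")
        if ratio > max_ratio:
            start = i - 1  # the previous point ends the segment and starts the next one
            breaks.append(start)
    if breaks[-1] != len(contour) - 1:
        breaks.append(len(contour) - 1)
    return breaks


contour = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(polygonal_segments(contour, max_ratio=0.3))
```

For this L-shaped contour the straight runs enclose zero area, so the only break falls at the corner point (3, 0). A smaller threshold makes the approximation follow the contour more closely at the cost of more segments.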
[IEEE International Workshop on Content Based Access of Image and Video Databases (CAIVD 98)] ....character recognition in text based video libraries [1], we have seen few achievements. Automatic character segmentation was performed for titles and credits in motion picture videos in [2] and [3]; however, neither paper gives sufficient consideration to character recognition. There are similar research fields which concern character recognition in videos [4], [5]. In [4], character extraction from car license plates using video images is presented. In [5], characters in scene images are ....
....to the background, making extraction extremely difficult. For the license plate research [4], the scene image research [5] and the document image analysis [7], the difficulty of complex backgrounds is rarely a problem, although examples with plain backgrounds are illustrated. In [3], there is an assumption that characters are drawn in high contrast against the background. Object detection techniques such as matched spatial filtering are alternative approaches for extracting characters from the background. In [8], several template matching methods are compared to determine ....
R. Lienhart and F. Stuber, "Automatic text recognition in digital videos," In Proceedings of SPIE Image and Video Processing IV, vol. 2666, pp. 180-188, September 1996.
....Text can be separated more easily than from the text blocks shown in Figures 2a-d. The main contribution of this paper is that we use multiple frame information to enhance the textual image so that the text can be easily separated from the background, while previous work focuses primarily on single frames [5, 8]. 2 Text Block Registration A key point for the implementation of our scheme is text registration in multiple frames. The first text block (reference text block) is detected by a hybrid wavelet neural network segmenter presented in our previous work [2]. 2.1 Text Tracking The affine model ....
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proceedings of ACM Multimedia, pages 11--20, 1996.
....and text which moves as a result of camera motion. INTRODUCTION Text which appears in digital videos can provide important supplemental information for indexing and retrieval. Although some work has been done on the extraction of text from images ([1, 2, 3, 4, 7, 8]) and isolated video frames ([5, 6]), very little work has considered the temporal aspects of video. In our previous work, we provided a methodology which identifies both scene and graphic text from low resolution video key frames [5]. We have found it necessary, however, to consider temporal changes to detect new information, to ....
....false detections while having multiple instances to provide enhancement for OCR. In [6], Lienhart and Stuber do make use of temporal information to enhance the extraction result, but they do not consider the problem of tracking moving text. This paper is organized as follows: In Section 2 we introduce our tracking scheme. Text element generation is presented in Section 3 and details .... Figure 1: The scheme for text tracking in digital videos (Monitoring Process, Tracking Process). The support of this research by the Department of Defense under contract MDA 90496C 1250 is gratefully acknowledged.
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In University of Mannheim, Department of Computer Science, Technical Report TR-95-036, 1995.
....from road signs [11, 7], license plates [5, 1, 6] and library books [3]. In [10], Manmatha describes a system which uses texture features and relies on the detection of strokes from binarized images. Much less work has been done on the extraction of text from video frames. Lienhart and Stuber [8] provide a split-and-merge algorithm to segment text in video frames by recursively splitting the image into quadrants and merging adjacent homogeneous regions, using assumptions about the text size and line direction. All previous methods rely to some extent on the physical features of characters ....
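The split-and-merge idea summarized above can be illustrated on a grayscale image stored as a list of rows. A minimal sketch, assuming homogeneity means "max minus min intensity within a block is at most `thresh`" and that two touching blocks merge when their mean intensities differ by at most `merge_tol`; all names and thresholds are hypothetical, and the text-size and line-direction assumptions mentioned for [8] are deliberately omitted:

```python
def split(img, x, y, w, h, thresh, min_size, leaves):
    """Recursively split a block into quadrants until it is homogeneous or too small."""
    vals = [img[j][i] for j in range(y, y + h) for i in range(x, x + w)]
    if max(vals) - min(vals) <= thresh or w <= min_size or h <= min_size:
        leaves.append((x, y, w, h, sum(vals) / len(vals)))
        return
    hw, hh = w // 2, h // 2
    for qx, qy, qw, qh in ((x, y, hw, hh), (x + hw, y, w - hw, hh),
                           (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)):
        split(img, qx, qy, qw, qh, thresh, min_size, leaves)


def adjacent(a, b):
    """True if two blocks (x, y, w, h, ...) share part of an edge (diagonals do not count)."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    side = (ax + aw == bx or bx + bw == ax) and ay < by + bh and by < ay + ah
    top_bottom = (ay + ah == by or by + bh == ay) and ax < bx + bw and bx < ax + aw
    return side or top_bottom


def split_and_merge(img, thresh=10, min_size=1, merge_tol=10):
    """Return a label map; touching blocks with similar means end up with the same label."""
    leaves = []
    split(img, 0, 0, len(img[0]), len(img), thresh, min_size, leaves)
    parent = list(range(len(leaves)))  # union-find over leaf blocks

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(leaves)):
        for j in range(i + 1, len(leaves)):
            if adjacent(leaves[i], leaves[j]) and abs(leaves[i][4] - leaves[j][4]) <= merge_tol:
                parent[find(i)] = find(j)

    labels = [[-1] * len(img[0]) for _ in img]
    for k, (x, y, w, h, _) in enumerate(leaves):
        root = find(k)
        for j in range(y, y + h):
            for i in range(x, x + w):
                labels[j][i] = root
    return labels


img = [[0, 0, 200, 200] for _ in range(4)]
labels = split_and_merge(img)
print(labels)
```

Comparing leaf means pairwise is a simplification: a chain of slowly changing blocks can merge into one region. A stricter variant would re-test the homogeneity of the merged region before accepting each merge.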
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In University of Mannheim, Department of Computer Science, Technical Report TR-95036, 1995.
....in this area has focused on the identification of graphic text. 2.2 Related Work In related domains there has been work on the extraction of text from road signs [32, 44], license plates [10, 29, 31], library books [19], WWW images [59, 60], scene images [30, 42, 57, 58] and isolated video frames [33, 36, 53]. The methods can be broadly classified into two types. The first is connected component (CC) based. Unlike binary document images, scene images and video frames are usually multivalued (gray scale or colored), and therefore multivalued image decomposition is required before the connected ....
....text detection in digital video, some authors use interframe analysis to add missing characters or to delete incorrectly identified regions. Shim detects text in five consecutive frames and then examines the similarity of text regions in terms of their positions, intensities and shapes [53]. In [36], Lienhart extracts text in one frame and then checks whether the region corresponds to text in the following frame by using simple region matching. Five frames are used to filter out incorrectly identified text regions. Neither work addresses the problem of tracking text to find temporal correspondence ....
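The five-frame filtering described above can be sketched as a persistence test on bounding boxes. A minimal sketch, assuming each frame's detections are axis-aligned boxes `(x0, y0, x1, y1)` and that "simple region matching" is approximated by intersection over union (IoU); the IoU criterion, the function names and `min_iou` are hypothetical stand-ins, since the excerpt does not specify the matching rule used in [36]:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def confirmed_regions(frames, start, window=5, min_iou=0.5):
    """Keep boxes of frames[start] that match some box in each of the next window-1 frames."""
    if start + window > len(frames):
        raise ValueError("not enough frames after start")
    kept = []
    for box in frames[start]:
        if all(any(iou(box, other) >= min_iou for other in frames[t])
               for t in range(start + 1, start + window)):
            kept.append(box)
    return kept


frames = [[(10, 10, 50, 20), (100, 100, 120, 110)]] + [[(11, 10, 51, 20)] for _ in range(4)]
print(confirmed_regions(frames, start=0))
```

A fixed-position overlap test like this suits static captions; as the excerpt notes, it does not track moving text, which would need a motion model between frames.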
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proceedings of ACM Multimedia, pages 11--20, 1996.
....project (Movie Content Analysis) at the University of Mannheim aims at the automatic analysis of streams of video and audio data. We have developed a workbench to support us in this difficult task [LPE96]. First results have been achieved in automatic genre recognition [FLE95], text recognition [LS96], video abstracting [PLFE96] and audio content analysis. 1 Humans are well able to recognize the contents of anything seen or heard. Our eyes and ears take in visual and audible stimuli, and our nerves process them. Such processing takes place in different regions of the brain whose exact ....
....to know how often a specific commercial is run in a certain time period on all channels. Provided that all our commercials contain the identifying melody, we simply record all commercials from all channels (commercial recognition and segmentation is easily performed on the picture track [LS96]), digitize them and perform the fuf recognition on the audio tracks. Then, we compare the resulting fuf sequences with the fuf sequences stored in the database. One title would have a significantly higher correlation to the queried piece, such that we could automatically decide on the corresponding ....
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Image and Video Processing IV, Proc. SPIE 2666-20, 1996.
....appears to move as a result of camera motion. INTRODUCTION Text which appears in digital videos can provide important supplemental information for indexing and retrieval. Although extensive work has been done on the extraction of text from images ([1, 2, 3, 4, 7, 8]) and isolated video frames ([5, 6]), very little work has considered the temporal aspects of video. In our previous work, we provided a methodology which identifies both scene and graphic text from low resolution video key frames [5]. We have found it necessary, however, to consider temporal changes to detect new information, to ....
....false detections while having multiple instances to provide enhancement for OCR. In [6], Lienhart and Stuber do make use of temporal information to enhance the extraction result, but they do not consider the problem of tracking moving text. This paper is organized as follows: In Section 2 we introduce our tracking scheme. Text element generation is presented in Section 3 and details .... Figure 1: The scheme for text tracking in digital videos (Monitoring Process, Tracking Process). The support of this research by the Department of Defense under contract MDA 90496C 1250 is gratefully acknowledged.
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In University of Mannheim, Department of Computer Science, Technical Report TR-95-036, 1995.
....difficulty in determining important parts of the audio and image track. Research toward this aim is abundant. In our MoCA (Movie Content Analysis) research project, we have presented algorithms to automatically retrieve content information from the video and audio tracks of digitized films [3][4][7][8][9][13]. We have also presented a video abstracting system called VAbstract [14]. In this paper, we restrict video abstracting to trailer production and extend the result with other retrieved content information such as the title or main actors. The demands on a trailer depend on the video genre involved. ....
....should be contained in the video abstract as well as be available as a search index over a set of multimedia abstracts. The text segmentation and text recognition algorithms for text appearances in digital videos presented in [7] and [9] are deployed to extract the bitmaps of the different text appearances and to translate their content into ASCII for retrieval purposes. The text segmentation step results in a list of character regions per frame and a list of their motion paths throughout the sequence. These motion paths ....
R. Lienhart and F. Stuber. Automatic Text Recognition in Digital Videos. In Image and Video Processing IV 1996, Proc. SPIE 2666-20, pp. 180-188, Jan. 1996.
....3. There exist special editing habits which are frequently used and can be recognized automatically. 4. Text often appears within commercials. The text shows the product or company name and other useful semantic information. It can be identified and evaluated [9][11]. In Section 3 we will show how these features can be computed and how relevant they are for detecting commercials. Legal Regulations In Germany the ratio of commercials to other televised material is regulated by law. The regulations differ for private and public TV stations with the ....
Rainer Lienhart and Frank Stuber. Automatic Text Recognition in Digital Videos. In Image and Video Processing IV 1996, Proc. SPIE 2666-20, pp. 180-188, Jan. 1996.
....burden and perceptual uniformity requirement. The demand for operating on color frames and converting them into a more perceptually uniform color system is one deep insight we gained from our experiments, and constitutes one major difference from the segmentation algorithms described in [6]. The segmentation is performed as follows: In a first preprocessing step the number of different colors used in each video frame is reduced. This transformation does not affect the outline of the characters since characters are assumed to be monochrome and contrasting with their background. ....
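The color-reduction preprocessing step mentioned above can be illustrated with uniform quantization. A minimal sketch, assuming 8-bit RGB pixel tuples; note that it quantizes RGB directly and skips the conversion to a more perceptually uniform color system that the excerpt describes, so it is a simpler stand-in rather than the cited procedure. The function name and the `levels` parameter are hypothetical:

```python
def reduce_colors(pixels, levels=4):
    """Map each 8-bit channel to the centre of one of `levels` equal-width bins."""
    if not 1 <= levels <= 256:
        raise ValueError("levels must be between 1 and 256")
    out = []
    for px in pixels:
        # bin index = v * levels // 256; bin centre = (2 * index + 1) * 256 // (2 * levels)
        out.append(tuple((2 * (v * levels // 256) + 1) * 256 // (2 * levels) for v in px))
    return out


pixels = [(0, 0, 0), (10, 20, 30), (250, 250, 250), (130, 5, 200)]
reduced = reduce_colors(pixels)
print(len(set(pixels)), "->", len(set(reduced)))
```

With `levels=4` each channel keeps at most four values, so a frame has at most 64 colors afterwards. Because characters are assumed to be monochrome and contrasting, their outlines survive this coarsening while small color fluctuations inside a character collapse to one value.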
....each remaining candidate character region is checked for contrast with its surroundings. If no such contrast, even only a partial one, is found, we conclude that the region cannot belong to a character. Consequently, the region is discarded (Figure 1(f)). In contrast to our previous proposal in [6], we do not apply any width-to-height ratio constraints to clustered regions. It has turned out that width-to-height ratio constraints are effective in manual tune-ups, but it is impossible to find suitable values that cope with all artificial text appearances. The thresholds are either too ....
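The contrast check described above accepts a region if at least part of its surroundings contrasts with it. A minimal sketch, assuming a grayscale image as a list of rows and a region given as a set of `(row, col)` pixels; "surroundings" is taken here to be the 4-connected outer border of the region, and the function name, `min_diff` and `min_fraction` are hypothetical choices, not values from the cited paper:

```python
def has_contrast(img, region, min_diff=40, min_fraction=0.1):
    """region is a set of (row, col) pixels; True if enough of its outer border contrasts."""
    h, w = len(img), len(img[0])
    mean = sum(img[r][c] for r, c in region) / len(region)
    border = set()
    for r, c in region:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in region:
                border.add((nr, nc))
    if not border:
        return False
    # A partial contrast is enough: only a fraction of border pixels must differ.
    contrasting = sum(1 for r, c in border if abs(img[r][c] - mean) >= min_diff)
    return contrasting / len(border) >= min_fraction


img = [[100] * 5 for _ in range(5)]
region = {(2, 2)}
print(has_contrast(img, region))  # False: flat neighbourhood
img[2][2] = 250
print(has_contrast(img, region))  # True: bright pixel on a darker background
```

Setting `min_fraction` low mirrors the excerpt's "even only a partial one": a character touching another bright stroke on one side still passes as long as some of its border contrasts.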
Rainer Lienhart and Frank Stuber. Automatic Text Recognition in Digital Videos. In Image and Video Processing IV 1996, Proc. SPIE 2666-20, Jan. 1996.
....Project We are currently investigating new ideas on motion segmentation and a refined second segmentation step. Also, we are developing algorithms that will extract the bitmaps of the text appearances such as the title in feature films for inclusion in our automatic video abstracting system. See [9, 3] and http: www.informatik. uni mannheim.de informatik pi4 projects MoCA Project textSegmentationAndRecognition. html for more details. 5 VisualGREP 5.1 Motivation Any kind of video database retrieval task requires a systematic method to compare and retrieve video sequences. Retrieval based on ....
Rainer Lienhart and Frank Stuber. Automatic text recognition in digital videos. In Image and Video Proc. IV 1996, Proc. SPIE 2666-20, pages 180--188, Jan 1996.
....Figure 1: Abstracting Algorithm 6.2 Status of Implementation VAbstract was implemented in about 1500 lines of ANSI C using the Vista library V2.1.3 [PKL95][PL95]. The audio modules are currently still missing. Title extraction must be done by hand, but we are close to an automatic solution [LS96]. Figure 2: User Interface for VAbstract Figure 3: Provider Interface for VAbstract Two example application interfaces were built in about another 2500 lines of Tcl/Tk code on top of VAbstract: a user interface assisting a video librarian in selecting a video (see Figure 2) and a provider ....
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Image and Video Processing IV, Proc. SPIE 2666-20, 1996.
No context found.
Rainer Lienhart and Frank Stuber, "Automatic text recognition in digital videos", in Image and Video Processing IV 1996, Proc. SPIE 2666-20, 1996.
No context found.
Rainer Lienhart and Frank Stuber, "Automatic text recognition in digital videos", in Image and Video Processing IV 1996.
No context found.
R. Lienhart, F. Stuber (1996) Automatic text recognition in digital videos. In: Proc SPIE 2666:180--188
No context found.
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proceedings of ACM Multimedia, pages 11--20, 1996.
No context found.
R. Lienhart, "Automatic text recognition in digital videos", In Proc. SPIE, Image and Video Processing IV, January 1996.
No context found.
Lienhart R, Stuber F (1996) Automatic text recognition in digital videos. Proceedings of SPIE Image and Video Processing IV 2666: 180-188
No context found.
R. Lienhart and F. Stuber. Automatic text recognition in digital videos. In Proc. of ACM Multimedia 96, pages 11--20, 1996.
CiteSeer.IST - Copyright Penn State and NEC