Electronic marking and identification techniques to discourage document copying
- IEEE Journal on Selected Areas in Communications
, 1995
Cited by 136 (11 self)
Modern computer networks make it possible to distribute documents quickly and economically by electronic means rather than by conventional paper means. However, the widespread adoption of electronic distribution of copyrighted material is currently impeded by the ease of unauthorized copying and dissemination. In this paper we propose techniques that discourage unauthorized distribution by embedding each document with a unique codeword. Our encoding techniques are indiscernible by readers, yet enable us to identify the sanctioned recipient of a document by examination of a recovered document. We propose three coding methods, describe one in detail, and present experimental results showing that our identification techniques are highly reliable, even after documents have been photocopied.
Geometric layout analysis techniques for document image understanding: a review
, 1998
Cited by 63 (0 self)
Document Image Understanding (DIU) is an interesting research area with a large variety of challenging applications. Researchers have worked for decades on this topic, as witnessed by the scientific literature. The main purpose of the present report is to describe the current status of DIU with particular attention to two subprocesses: document skew angle estimation and page decomposition. Several algorithms proposed in the literature are synthetically described. They are included in a novel classification scheme. Some methods proposed for the evaluation of page decomposition algorithms are described. Critical discussions are reported about the current status of the field and about the open problems. Some considerations about the logical layout analysis are also reported.
Document Structure Analysis Algorithms: A Literature Survey
, 2003
Cited by 56 (0 self)
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be described by an ordered tree structure and can be modeled by a tree grammar which describes the page at the component level in terms of regions or blocks. This paper provides a detailed survey of past work on document structure analysis algorithms and summarizes the limitations of past approaches. In particular, we survey past work on document physical layout representations and algorithms, document logical structure representations and algorithms, and performance evaluation of document structure analysis algorithms. In the last section, we summarize this work and point out its limitations.
Empirical Performance Evaluation Methodology and Its Application to Page Segmentation Algorithms
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
Cited by 38 (5 self)
In this paper, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) first, we create mutually exclusive training and test data sets with groundtruth, 2) we then select a meaningful and computable performance metric, 3) an optimization procedure is then used to search automatically for the optimal parameter values of the segmentation algorithms on the training data set, 4) the segmentation algorithms are then evaluated on the test data set, and, finally, 5) a statistical and error analysis is performed to give the statistical significance of the experimental results. In particular, instead of the ad hoc and manual approach typically used in the literature for training algorithms, we pose the automatic training of algorithms as an optimization problem and use the Simplex algorithm to search for the optimal parameter values. A paired-model statistical analysis and an error analysis are then conducted to provide confidence intervals for the experimental results of the algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, of which three are representative research algorithms and the other two are well-known commercial products, on 978 images from the University of Washington III data set. It is found that the performance indices (average textline accuracy) of the Voronoi, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which, in turn, is significantly better than that of X-Y cut.
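The five steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: an exhaustive search over a candidate list stands in for the Simplex algorithm mentioned in the abstract, a plain mean and standard error of paired differences stands in for the paired-model analysis, and every name (`split`, `mean_score`, `train`, `paired_difference`, `threshold_algorithm`, `accuracy`) is hypothetical.

```python
import random
import statistics


def split(items, train_fraction=0.5, seed=0):
    """Step 1: mutually exclusive training and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_score(algorithm, param, dataset, metric):
    """Average metric value of algorithm(x, param) over (x, truth) pairs."""
    return statistics.mean(metric(algorithm(x, param), truth) for x, truth in dataset)


def train(algorithm, candidates, train_set, metric):
    """Step 3: pick the candidate parameter with the best training score.

    Assumption: exhaustive search replaces the Simplex search of the abstract.
    """
    return max(candidates, key=lambda p: mean_score(algorithm, p, train_set, metric))


def paired_difference(alg_a, p_a, alg_b, p_b, test_set, metric):
    """Steps 4-5: per-item score differences on the test set, with standard error."""
    diffs = [metric(alg_a(x, p_a), t) - metric(alg_b(x, p_b), t) for x, t in test_set]
    return statistics.mean(diffs), statistics.stdev(diffs) / len(diffs) ** 0.5


# Toy stand-ins (hypothetical): the "algorithm" thresholds a number and
# the metric (step 2) is per-item accuracy.
def threshold_algorithm(x, param):
    return x >= param


def accuracy(prediction, truth):
    return 1.0 if prediction == truth else 0.0


data = [(x, x >= 7) for x in range(20)]
train_set, test_set = split(data)
best = train(threshold_algorithm, range(20), train_set, accuracy)
mean_diff, std_err = paired_difference(
    threshold_algorithm, best, threshold_algorithm, 0, test_set, accuracy
)
```

The point of the structure is that the parameter is chosen only on the training set, while the comparison between two configurations is made item by item on the held-out test set.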
Recognition of Cursive Roman Handwriting - Past, Present and Future
- In Proc. 7th Int. Conf. on Document Analysis and Recognition
, 2003
Cited by 37 (10 self)
This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or, more generally, some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessing, normalization, feature extraction, classification, and postprocessing operations are required. We'll survey the state of the art, analyze recent trends, and try to identify challenges for future research in this field.
Morphological Analysis of Shapes
- CNLS Newsletter
, 1997
Cited by 34 (1 self)
We first describe a multiresolutional discretization scheme for the boundary of a planar connected shape, by means of the Haar wavelet transform. This scheme yields a hierarchy of polygons of increasing complexity that adaptively captures the shape's boundary at increasing resolutions. Next, we present a new morphological transform called the Chordal Axis Transform for planar shapes that leads to a constructive definition of the skeleton of a discretized shape. We describe an efficient, adaptive algorithm based on the Constrained Delaunay Triangulation of polygons for skeletonizing the shape. The resulting skeleton is connected and faithfully captures the structure of the shape. Finally, we obtain a simple quantitative descriptor of the morphology of a discretized shape based on its Delaunay triangulation, and introduce the concept of morphological congruence. This leads to a graph theoretic characterization of the nature of structural differences between shapes that are morphologicall...
Document identification for copyright protection using Centroid detection
- IEEE Trans. Commun
, 1998
Cited by 29 (4 self)
A way to discourage illicit reproduction of copyrighted or sensitive documents is to watermark each copy before distribution. A unique mark is embedded in the text whose recipient is registered. The mark can be extracted from a possibly noisy illicit copy, identifying the registered recipient. Most image marking techniques are vulnerable to binarization attack and, hence, not suitable for text marking. We propose a different approach where a text document is marked by shifting certain text lines slightly up or down or words slightly left or right from their original positions. The shifting pattern constitutes the mark and is different on different copies. In this paper we develop and evaluate a method to detect such minute shifts. We describe a marking and identification prototype that implements the proposed method. We present preliminary experimental results which suggest that centroid detection performs remarkably well on line shifts even in the presence of severe distortions introduced by printing, photocopying, scanning, and facsimile transmission. Index Terms: Centroid detection, centroid noise, document marking, image processing copyright, text watermarking.
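As a rough illustration of how a line shift might be detected from centroids, the sketch below works on a horizontal profile (ink count per scan row, row index increasing downward). It is a minimal sketch under assumptions not stated in the abstract: text lines are separated by fully blank rows, the lines above and below the marked line are unshifted control lines, and the decision rule is a simple sign test rather than the detector developed in the paper. The function names `line_blocks`, `centroid`, and `detect_shift` are hypothetical.

```python
def line_blocks(profile):
    """Split a horizontal profile into (start, end) runs of non-blank rows."""
    blocks, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y
        elif ink == 0 and start is not None:
            blocks.append((start, y))
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def centroid(profile, start, end):
    """Ink-weighted mean row index of profile[start:end]."""
    mass = sum(profile[start:end])
    if mass == 0:
        raise ValueError("block contains no ink")
    return sum(y * profile[y] for y in range(start, end)) / mass


def detect_shift(orig_centroids, copy_centroids):
    """Classify the middle of three adjacent lines as 'up', 'down', or 'none'.

    Each argument is an (upper, middle, lower) tuple of centroids; the upper
    and lower lines are assumed to be unshifted control lines.
    """
    def asymmetry(c):
        upper, middle, lower = c
        return (middle - upper) - (lower - middle)

    delta = asymmetry(copy_centroids) - asymmetry(orig_centroids)
    if delta > 0:
        return "down"
    if delta < 0:
        return "up"
    return "none"


# Toy profiles: three text lines; in the copy the middle line sits one row lower.
original = [1, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0, 0]
marked = [1, 2, 2, 1, 0, 0, 0, 1, 2, 2, 1, 0, 1, 2, 2, 1, 0, 0]

orig_c = tuple(centroid(original, s, e) for s, e in line_blocks(original))
copy_c = tuple(centroid(marked, s, e) for s, e in line_blocks(marked))
result = detect_shift(orig_c, copy_c)
```

Comparing distances between neighboring centroids, rather than absolute positions, means a uniform vertical offset of the whole page cancels out of the decision.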
Performance Comparison of Two Text Marking Methods
- IEEE Journal on Selected Areas in Communications
, 1998
Cited by 25 (2 self)
A text document typically consists of a collection of regular structures such as words, lines and paragraphs, a slight movement of which seems less perceptible than, say, dithering of the document image. In this paper we exploit this property to watermark formatted text documents by shifting slightly certain lines and words, in order to discourage illicit distribution. We analyze two methods for reliable document identification in the presence of severe distortions introduced by photocopying, facsimile transmission and other processing. The correlation method uses document profiles directly for detection. To eliminate the effect of certain distortions, the centroid method bases its decision on the distances between the centroids of adjacent profile blocks. We present the maximum likelihood detectors for both methods and evaluate their relative performance. Our analysis indicates that line-shift generally has a smaller error than word-shift detection, and that the correlation detector o...
Computing with Graphs and Graph Rewriting
- FACHGRUPPE INFORMATIK, RWTH
, 1997
Cited by 20 (0 self)
Graphs are a popular data structure. Programmers are faced with the need to represent, inspect, modify, display and recognize graphs. In this paper we describe a systematic approach to graph modification, using graph rewrite rules. Graph rewrite rules replace one subgraph by another subgraph. In other words, a graph rewrite rule specifies how, and under what conditions, to replace one piece of a graph by another piece. This is an intuitive and useful generalization of string rewriting. We illustrate these ideas using a sampling of applications: precedence-network creation, visual language editing, and recognition of mathematics notation. Graph rewriting examples are written in PROGRES, a programming language with both visual and textual elements.
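To make the idea of a graph rewrite rule concrete, here is a small Python sketch; it is not PROGRES and does not reproduce any example from the paper. The rule format is an assumption chosen for brevity: the left-hand side is a single directed edge whose endpoint labels match a given pair, and the right-hand side is one merged node with a new label. The function name `rewrite_once` and the labels used are hypothetical.

```python
def rewrite_once(labels, edges, lhs, rhs_label):
    """Apply one rewrite step to a labeled directed graph.

    labels: dict mapping node id -> label
    edges: set of (source, target) pairs
    lhs: (source_label, target_label) pattern to match on an edge
    rhs_label: label of the single node that replaces the matched pair

    Returns (new_labels, new_edges, applied).
    """
    for u, v in sorted(edges):
        if u != v and (labels[u], labels[v]) == lhs:
            # Replace the matched subgraph: drop v, relabel u.
            new_labels = {n: lab for n, lab in labels.items() if n != v}
            new_labels[u] = rhs_label
            new_edges = set()
            for a, b in edges:
                if (a, b) == (u, v):
                    continue  # the matched edge disappears
                a2 = u if a == v else a  # redirect edges that touched v
                b2 = u if b == v else b
                if a2 != b2:  # do not create self-loops
                    new_edges.add((a2, b2))
            return new_labels, new_edges, True
    return labels, edges, False


# Toy use, loosely in the spirit of notation recognition:
# two adjacent digit nodes are rewritten into one number node.
labels = {1: "digit", 2: "digit", 3: "op"}
edges = {(1, 2), (2, 3)}
labels2, edges2, applied = rewrite_once(labels, edges, ("digit", "digit"), "number")
labels3, edges3, applied_again = rewrite_once(labels2, edges2, ("digit", "digit"), "number")
```

The label test plays the role of the rule's application condition, and the redirection of edges that touched the removed node is the part that distinguishes graph rewriting from string rewriting.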
Geometric Structure Analysis of Document Images: A Knowledge-Based Approach // IEEE Transactions on Pattern Analysis and Machine Intelligence. - 2000
- Information Processing & Management
, 1999
Cited by 18 (0 self)
Geometric structure analysis is a prerequisite to create electronic documents from logical components extracted from document images. This paper presents a knowledge-based method for sophisticated geometric structure analysis of technical journal pages. The proposed knowledge base encodes geometric characteristics that are not only common in technical journals but also publication-specific in the form of rules. The method takes the hybrid of top-down and bottom-up techniques and consists of two phases: region segmentation and identification. Generally, the result of the segmentation process does not have a one-to-one matching with composite layout components. Therefore, the proposed method identifies nontext objects, such as images, drawings, and tables, as well as text objects, such as text lines and equations, by splitting or grouping segmented regions into composite layout components. Experimental results with 372 images scanned from the IEEE Transactions on Pattern Analysis and Machine Intelligence show that the proposed method has performed geometric structure analysis successfully on more than 99 percent of the test images, resulting in impressive performance compared with previous works. Index Terms: Document image analysis, geometric structure analysis, region segmentation, region identification, knowledge-based approach.