A System for Converting PDF Documents into Structured XML format
Cited by 9 (1 self)
Abstract. We present a system for converting legacy PDF documents into structured XML format. The system first extracts the different streams contained in PDF files (text, bitmap and vector images) and then applies a series of components to express the logical structure of the documents in XML. Some of these components are traditional in document analysis; others are more specific to PDF. We also present a graphical user interface for checking, correcting and validating the output of the components. Finally, we report on two real use cases in which the system was applied.
Machine Learning for Reading Order Detection in Document Image Understanding
Cited by 3 (2 self)
Summary. Document image understanding refers to logical and semantic analysis of document images in order to extract information understandable to humans and codify it into machine-readable form. Most of the studies on document image understanding have targeted the specific problem of associating layout components with logical labels, while less attention has been paid to the problem of extracting relationships between logical components, such as cross-references. In this chapter, we investigate the problem of detecting the reading order relationship between components of a logical structure. The domain-specific knowledge required for this task is automatically acquired from a set of training examples by applying a machine learning method. The input of the learning method is the description of “chains” of layout components defined by the user. The output is a logical theory which defines two predicates, first_to_read/1 and succ_in_reading/2, useful for consistently reconstructing all chains in the training set. Only spatial information on the page layout is exploited for both single and multiple chain reconstruction. The proposed approach has been evaluated on a set of document images processed by the system WISDOM++.
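As an illustration only, here is a minimal sketch of how a single reading chain could be rebuilt once the two predicates named in the summary are known. It assumes the learned facts have already been evaluated into plain Python data: `first_to_read` is the identifier of the starting block and `succ_in_reading` maps each block to its successor. The function name `reconstruct_chain` and the block labels are hypothetical and are not taken from the chapter or from WISDOM++.

```python
def reconstruct_chain(first_to_read, succ_in_reading):
    """Follow successor links from the first block to rebuild one reading chain.

    first_to_read: identifier of the block that starts the chain (assumed given).
    succ_in_reading: dict mapping a block to the block read right after it.
    """
    chain = [first_to_read]
    seen = {first_to_read}
    current = first_to_read
    while current in succ_in_reading:
        current = succ_in_reading[current]
        if current in seen:
            # A consistent theory should never loop back on itself.
            raise ValueError("cycle detected in successor relation")
        seen.add(current)
        chain.append(current)
    return chain


# Hypothetical layout components of one page.
first = "title"
succ = {"title": "author", "author": "abstract", "abstract": "body"}
chain = reconstruct_chain(first, succ)
print(chain)
```

The sketch covers the single-chain case only; handling multiple chains on one page would need one starting block per chain.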
Textual Article Clustering in Newspaper Pages
Cited by 2 (0 self)
In the analysis of a newspaper page, an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and on experiments with actual pages from the Italian newspaper L’Adige, we select one of the algorithms as the preferred choice to solve the textual clustering problem.
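The abstract does not say what the three algorithms are, so the following is only a generic sketch of text-based block clustering, not the authors' method. It assumes each block is represented as a bag of words, that two blocks belong to the same article when their cosine similarity reaches a threshold, and that articles are the connected groups that result. The function names, the threshold value and the sample blocks are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts for one text block."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_blocks(blocks, threshold=0.5):
    """Group block indices whose pairwise similarity reaches the threshold."""
    vectors = [bag_of_words(b) for b in blocks]
    parent = list(range(len(blocks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(blocks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up text blocks standing in for OCR output of one page.
blocks = [
    "The city council approved the new budget for schools",
    "Council members said the budget adds funds for schools",
    "The football team won the championship match on Sunday",
    "Fans celebrated the championship after the match",
]
clusters = cluster_blocks(blocks)
print(clusters)
```

A real system would likely remove stop words and weight terms before comparing blocks; the plain counts here keep the sketch short.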
Bidimensional Relations for Reading Order Detection
2003
Cited by 1 (0 self)
We use a propositional language of qualitative rectangle relations to detect the reading order from document images. Document
The Topo-Approach To Spatial Representation And Reasoning
Commonsense knowledge about the surrounding physical world and quantitative theories of space, such as metric geometry, can be viewed as two extremes of how human beings relate to space. Qualitative spatial representation and reasoning places itself between these two approaches. Qualitative spatial reasoning is a set of high-level theories which abstract from the quantitative details and attempt to mimic human commonsense knowledge about space as closely as possible. Successful approaches to spatial reasoning may impact many application areas of AI, most notably robotics, computer vision in its broader sense, and natural language processing. In this paper, we briefly overview a modal approach to spatial representation and reasoning, called the topo-approach, presented in the PhD thesis "Spatial Reasoning: Theory and Practice," winner of the AI*IA dissertation award for 2003.