20 citations found.
D. Rus and K. Summers, "Using white space for automated document structuring," Technical Report TR 94 - 1452, Department of Computer Science, Cornell University, 1994.

This paper is cited in the following contexts:
Automatic Hypertext Link Typing - Allan (1996)   (17 citations)  (Correct)

....citation to the cited work, are all structural links. We include these with pattern matching links because they are typically recognized by mark up codes that are already embedded in the text. Even when a document is not marked up, structure is usually approximated using pattern analysis. [9] Thistlewaite has shown that when they are applied carefully and efficiently, pattern matching methods can be very flexible and powerful. [16] Manual Links Pattern matching links form a class which is somewhat easy to detect automatically. At the extreme opposite end of the spectrum are manual ....

Daniela Rus and Kristen Summers. Using white space for automated document structuring. In N. Adam, B. Bhargava, and Y. Yesha, editors, Advances in digital libraries. Springer-Verlag Lecture Notes in Computer Science, 1995.


Integrating Geometrical and Linguistic Analysis for E-Mail.. - Chen, Hu, Sproat (1999)   (2 citations)  (Correct)

....[G. Nagy and Stoddard 1985; Wang and Srihari 1989; Mizuno et al. 1991] the approach based on maximal white rectangles [Baird et al. 1990; Baird 1992] and some other methods based on the analysis of background white spaces [Pavlidis 1991; Antonacopoulos and Ritchings 1994; Rahgozar et al. 1994; Rus and Summers 1994]. Each of these techniques relies to a different extent on assumptions about the generic document layout structure, particularly, rectangularity of text blocks and white spacing around each block. Unfortunately such assumptions do not always hold in the case of e mail signature blocks. For ....

....size, location and aspect ratio of the block, indentation attributes of block, etc. to distinguish text blocks from images and graphics, or to assign high level labels to text blocks such as titles, captions, paragraphs, itemized lists, tables, etc. [Wang and Srihari 1989; Etemad et al. 1997; Rus and Summers 1994; Jain and Bhattacharjee 1992] The features used in these approaches do not always translate to e mail documents. Furthermore, finer logical labels are not obtained by such analysis. In [Porter and Rainero 1992] more details of logical layout structure are recovered using labels provided in a ....

Rus, D. and Summers, K. 1994. Using white space for automated document structuring. Technical Report TR-94-1452, Cornell University.


Retrieval Of Passages For Information Reduction - DANIELS (1997)   (4 citations)  (Correct)

....and 3. retrieval of document elements. The first line of research into breaking a document into smaller components primarily focuses on imagery analysis at the pixel level to aid in distinguishing document elements. Partitioning a document into smaller structure based elements is reported in [61, 62]. Rus's work uses agents to search for layout based abstractions such as tables, figures, and paragraphs, and content based abstractions such as theorems, lemmas, and examples. Once located, smaller objects are further examined to see if they meet the information need. As an example, the ....

Rus, Daniela, and Summers, Kristen. Using White Space for Automated Document Structuring. In Digital Libraries: Current Issues. Lecture Notes in Computer Science 916, N. Adam, B. Bhargava, and Y. Yesha, Eds. Springer-Verlag, 1995, pp. 129--162.


The Roles of Video in the Design, Development, and.. - Rebelsky, Makedon.. (1998)   (Correct)

....document, including author, institution, title, length, topic, and keywords (from a designated set of keywords) Unlike keyword based retrieval, this type of retrieval separates the content of the document from knowledge about the document. One might also use some form of layout based retrieval. [Rus, Allan 1995] [Rus, Summer 1995] [Rus, Allan 1998] have suggested that some people retrieve information based on a sense of layout of the document (e.g. the document with a graph on the upper left hand corner on page 3 or 5 ) and that electronic document presentation systems might support such retrieval. At ....

....author, institution, title, length, topic, and keywords (from a designated set of keywords) Unlike keyword based retrieval, this type of retrieval separates the content of the document from knowledge about the document. One might also use some form of layout based retrieval. [Rus, Allan 1995] [Rus, Summer 1995] [Rus, Allan 1998] have suggested that some people retrieve information based on a sense of layout of the document (e.g. the document with a graph on the upper left hand corner on page 3 or 5 ) and that electronic document presentation systems might support such retrieval. At present only a few ....

Rus, D., Summer, K.: "Using Whitespace for Automated Document Structuring"; Advances in Digital Libraries; Springer-Verlag (1995).


Document Layout Structure Extraction Using Bounding.. - Liang, Ha, Haralick.. (1996)   (1 citation)  (Correct)

....aligned vertically in columns. Each column has a list of item which are left aligned, right aligned, or centered. A single item can span a number of columns. The tabular structure can be identified by analyzing the projection profile of word bounding boxes. In this, our method is different from [6], which detects peaks from a white space density graph. A table image is shown in Figure 3. We extract the text line and word boxes using the method described in Section 3.3. From Figure 3(a) and 3(b) it is clear that the table columns and column separators can be detected by finding the distinct ....

D. Rus and K. Summers, "Using White Space for Automated Document Structuring," Proceedings of the Workshop on Principles of Document Processing, Seeheim, 1994.


The Roles of Video in Interactive Conference Proceedings.. - REBELSKY, Makedon, al. (1995)   (Correct)

....form of layout based retrieval. It has been suggested that some people retrieve information based on a sense of layout of the document (e.g. the document with a graph on the upper left hand corner on page 3 or 5 ) and that electronic document presentation systems might support such retrieval [RA95,RS95]. It would, of course, be useful for an electronic proceedings to include such a retrieval system (modified, of course, to handle the potentially varying dimensions of the electronic page) However, at present, none support that form of retrieval for general queries, although some (particularly ....

D. Rus and K. Summers, "Using Whitespace for Automated Document Structuring." In Advances in Digital Libraries. N. Adam, B. Bhargava, and Y. Yesha (editors). Springer-Verlag, 1995.


Retrieval of Passages for Information Reduction - Daniels (1996)   (4 citations)  (Correct)

....than the first. The first line of research into breaking a document into smaller components primarily focuses on imagery analysis and gets down to the pixel level to aid in distinguishing document subelements. Partitioning a document into smaller structure based elements is reported in [RS95b, RS95c] They use agents to search for layout based abstractions such as tables, figures, and paragraphs, and content based abstractions such as theorems, lemmas, and examples. Once located, smaller objects are further examined to see if they meet the information need. An example of their technique ....

Daniela Rus and Kristen Summers. Using White Space for Automated Document Structuring. In N. Adam, B. Bhargava, and Y. Yesha, editors, Digital Libraries: Current Issues. Lecture Notes in Computer Science 916, pages 129--162. Springer-Verlag, 1995.


Document Layout Structure Extraction Using.. - Liang, Ha.. (1996)   (1 citation)  (Correct)

....aligned vertically in columns. Each column has a list of item which are left aligned, right aligned, or centered. A single item can span a number of columns. The tabular structure can be identified by analyzing the projection profile of word bounding boxes. In this, our method is different from [18], which detects peaks from a white space density graph. A table image is shown in Figure 6. We extract the text line and word boxes using the method described in Section 2.3. From Figure 6(a) and 6(b) it is clear that the table columns and column separators can be detected by finding the distinct ....

D. Rus and K. Summers, Using white space for automated document structuring, in: Proceedings of the Workshop on Principles of Document Processing, Seeheim, 1994.


Machine Learning for Information Extraction from Online Documents - Freitag (1996)   (2 citations)  (Correct)

....grammar inference to the analysis of document structure is not novel. Ahonen and Mannila, for example, consider the problem of inducing a regular grammar for document structure from examples, where examples are documents such as dictionaries, encyclopedias, and user manuals [3] Rus and Summers [32] attempt to infer trees that represent document structure based on indentation profiles. In the same work, novel methods for detecting tables and figures from whitespace measurements are described. Hidden Markov Models (HMM) [28] perform a kind of stochastic grammar inference. Model merging ....

Daniela Rus and Kristen Summers. Using white space for automated document structuring. Technical Report TR94-1452, Cornell University, 1994.
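
The Freitag context above mentions inferring trees that represent document structure from indentation profiles. The following is a minimal sketch of that general idea, not the algorithm from the cited report: it assumes plain-text input without a scanned image, treats the count of leading spaces as the indentation profile, and nests each line under the nearest preceding line with a smaller indent. The names Node and indentation_tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line of text plus the lines nested beneath it."""
    text: str
    indent: int
    children: list = field(default_factory=list)


def indentation_tree(lines):
    """Build a tree in which each line hangs under the nearest
    preceding line that is indented less than it is."""
    root = Node("<root>", -1)  # indent -1, so the root is never popped
    stack = [root]
    for raw in lines:
        if not raw.strip():
            continue  # blank lines carry no indentation information
        expanded = raw.expandtabs(4)  # assumption: tab stops every 4 columns
        indent = len(expanded) - len(expanded.lstrip(" "))
        node = Node(raw.strip(), indent)
        # Close every open block indented at least as far as this line.
        while stack[-1].indent >= indent:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

For example, a section heading at column 0 followed by paragraphs at column 2 yields one child of the root with the paragraphs as its children.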


Does Navigation Require More than One Compass? - Rus, Allan (1996)   (3 citations)  Self-citation (Rus)   (Correct)

....vector space comparison of document passages to determine topic and subtopic structures of a document, based upon or independent of its layout structure. [HP93, SA 94, SS94] In our own work, we have introduced robust algorithms with performance guarantees for segmentation as well as classification. [RS95b] Our vision for information access with structure based information agents has been discussed in [RS95] Structure has been identified as being a source of knowledge in other recent work. [FM 93] In that study, they described a system that relies on SGML marked up documents to support structured ....

D. Rus and K. Summers. Using whitespace for automated document structuring. To appear in Advances in digital libraries, N. Adam, B. Bhargava, and Y. Yesha, editors. Springer-Verlag, Lecture Notes in Computer Science, 1995.


Toward a Taxonomy of Logical Document Structures - Summers (1995)   (6 citations)  Self-citation (Summers)   (Correct)

....it is possible to convey logical information through formatting. This information is presented as a grammar, and the document layout is parsed in [3, 9, 11, 12] Other approaches of varying degrees of similarity to parsing and based on varying degrees of knowledge specificity, are presented in [6, 7, 16, 18, 21, 22, 23, 24, 25]. Applications Applications of the solution (which may be represented as a separate hierarchy, with pointers to document locations, or as a marked up version of the document) are discussed in [1, 2, 4, 5, 13, 15, 19, 20, 26] These applications include, but are not limited to, those discussed in ....

.... characteristic shape of a hanging indent, but the shape of a table is recognized by the internal shape of its columnization [14] Since geometry involves the shapes formed by the marks on the paper or screen, its contribution can (inversely) be found by an analysis of the white space in a document [21]. 3.2 Marking Observables Marking observables consist of non linguistic marks on the paper or screen; this includes attributes like font type and weight, as well as non alphanumeric symbols, such as bullet points and rule lines. Bullet points and dashes, for instance, can aid in the ....

[Article contains additional citation context not shown here]

Daniela Rus and Kristen Summers. Using white space for automated document structuring. In Proceedings of the Workshop on Principles of Document Processing, Seeheim, 1994.


Customizing Multimedia Information Access - Daniela Rus (1995)   Self-citation (Rus)   (Correct)

....of a finite library of parametric detectors and segmenters. We have developed modules for automatically discovering layout structures in scanned in images of paper documents. This includes detectors for automatically discovering titles, sections, definitions, figures, figure captions, and tables [RSa, RSc]. We are now developing similar filters for video and audio data. The layout filters for paper documents are geometric in nature and work by identifying patterns in the distribution of white space on a page. For example, the table filter identifies candidate columns by looking for vertical rivers ....

D. Rus and K. Summers, Using whitespace for automated document structuring, in Advances in Digital Libraries, LNCS 916, eds. N. Adam, B. Bhargava, and Y. Yesha, 1995.
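
The context above describes a table filter that identifies candidate columns by looking for vertical rivers of white space. The sketch below illustrates that idea only in a simplified form and is not the authors' filter: it assumes the page is already available as plain-text lines (no tabs) instead of a scanned image, and it reports runs of character columns that are blank in every non-empty line. The function name find_white_rivers and the min_width parameter are hypothetical.

```python
def find_white_rivers(lines, min_width=2):
    """Return (start, end) column ranges, end exclusive, that are blank
    in every non-empty line and at least min_width characters wide."""
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # blank[c] is True when column c holds a space in every row.
    blank = [all(r[c] == " " for r in padded) for c in range(width)]
    rivers, start = [], None
    # The trailing False sentinel closes a run that reaches the last column.
    for c, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = c
        elif not is_blank and start is not None:
            # Skip the left margin (start == 0); it separates no columns.
            if start > 0 and c - start >= min_width:
                rivers.append((start, c))
            start = None
    return rivers
```

Each returned range is a candidate column separator; a real detector would also need to tolerate noise and check that the river spans enough consecutive lines.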


The Self-Organizing Desk - Rus, de Santis (1997)   Self-citation (Rus)   (Correct)

.... to project on the desk top rather than extract information [AM 95] The self organizing desk draws from progress made in several areas: self organizing systems [Koh90, CKP93] information retrieval and organization [Sal91, RA95] robotics and vision [MRR96, HKR93] automated document structuring [TA92, RS95b, NSV92], and user interfaces [CAC93] 5 Discussion We have described a system that implements the self organization metaphor to enhance a physical space with electronic information. This system uses a number of key technologies in robotics, computer vision, OCR, information retrieval, filtering, and ....

D. Rus and K. Summers. Using whitespace for automated document structuring. Advances in digital libraries, eds. N. Adam, B. Bhargava, and Y. Yesha, Springer-Verlag, LNCS 916, 1995.


Does Navigation Require More than One Compass? - Rus, Allan (1996)   (3 citations)  Self-citation (Rus)   (Correct)

....vector space comparison of document passages to determine topic and subtopic structures of a document, based upon or independent of its layout structure. [HP93, SABS94, SS94] In our own work, we have introduced robust algorithms with performance guarantees for segmentation as well as classification. [RS95b] Our vision for information access with structure based information agents has been discussed in [RS95] Structure has been identified as being a source of knowledge in other recent work. [FMSW93] In that study, they described a system that relies on SGML marked up documents to support structured ....

D. Rus and K. Summers. Using whitespace for automated document structuring. Advances in digital libraries, N. Adam, B. Bhargava, and Y. Yesha, editors. Springer-Verlag, Lecture Notes in Computer Science 916, 1995.


Information Retrieval, Information Structure, and Information .. - Rus, Subramanian (1995)   Self-citation (Rus)   (Correct)

....relevant and irrelevant sets. We illustrate the idea of segmenters with a general algorithm that can partition two dimensional documents with arbitrary layout. A special case of the algorithm presented here has been implemented and tested in the context of the Cornell technical report collection [RSb]. The segmenter in [RSb] automatically synthesizes a logical view of a document by analyzing the geometry of the white spaces in the left and right margins. 2.1.1 An Example: Segmenting Documents Given a pixel array of a document, the segmenter's goal is to partition the document into regions ....

....sets. We illustrate the idea of segmenters with a general algorithm that can partition two dimensional documents with arbitrary layout. A special case of the algorithm presented here has been implemented and tested in the context of the Cornell technical report collection [RSb]. The segmenter in [RSb] automatically synthesizes a logical view of a document by analyzing the geometry of the white spaces in the left and right margins. 2.1.1 An Example: Segmenting Documents Given a pixel array of a document, the segmenter's goal is to partition the document into regions that capture its layout. ....

D. Rus and K. Summers, Using whitespace for automated document structuring, in N. Adam, B. Bhargava, and Y. Yesha, eds., Advances in digital libraries, Springer-Verlag, Lecture Notes in Computer Science, to appear, 1995.


Table Structure Recognition Based On Robust Block Segmentation - Kieninger (1998)   (8 citations)  (Correct)

No context found.

D. Rus and K. Summers, "Using white space for automated document structuring," Technical Report TR 94 - 1452, Department of Computer Science, Cornell University, 1994.


Layout and Language: Integrating Spatial and Linguistic.. - Hurst, Nasukawa (2000)   (3 citations)  (Correct)

No context found.

Daniela Rus and Kristen Summers. 1994. Using white space for automated document structuring. Technical Report TR94-1452, Cornell University, Department of Computer Science, July.


Layout and Language: An Efficient Algorithm for Detecting Text.. - Hurst   (Correct)

No context found.

D. Rus and K. Summers, "Using white space for automated document structuring," Tech. Rep. TR94-1452, Cornell University, Department of Computer Science, July 1994.


A Digital Library Model With Rich Semantic Structure - Soergel (1998)   (Correct)

No context found.

Rus, D.; Summers, K. 1995. Using white space for automated document structuring.
