Results 1 - 10 of 963

An Integrated Approach To Document Decomposition And Structural Analysis

by Debashish Niyogi, Sargur N. Srihari , 1996
"... A document image is a visual representation of a paper document, such as a journal article page, a cover page of facsimile transmission, office correspondence, an application form, etc. Document image understanding as a research endeavor consists of developing processes for taking a document through ..."
Abstract - Cited by 4 (1 self)
through various representations: from scanned image to semantic representation. This paper describes document decomposition and structural analysis, which constitutes one of the major processes involved in document image understanding. The current state-of-the-art and future directions in the areas

Document decomposition for XML compression: A heuristic approach

by Byron Choi - In DASFAA , 2006
"... Abstract. Sharing of common subtrees has been reported useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irreg-ularities among subtrees dramatically reduce the performance of comp ..."
Abstract - Cited by 2 (1 self)
in XML documents. The irregularities are then projected out from the original XML document. We refer to this process as document decomposition. We demonstrated that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrated
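
A toy illustration of the subtree-sharing idea behind this line of work (not the paper's decomposition heuristic): structurally identical subtrees are detected by hashing a structure-only serialization, so each distinct shape needs to be stored only once, while subtrees that break the pattern are the kind of "irregularity" that would be projected out.

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def signature(elem):
        """Structure-only signature: tag names and child order, no text."""
        return "(" + elem.tag + "".join(signature(c) for c in elem) + ")"

    doc = ET.fromstring(
        "<lib><book><title/><author/></book>"
        "<book><title/><author/></book>"
        "<book><title/><author/><isbn/></book></lib>"
    )

    groups = defaultdict(list)
    for book in doc:
        groups[signature(book)].append(book)

    for sig, nodes in groups.items():
        print(len(nodes), "subtree(s) with structure", sig)
    # Two of the three <book> subtrees share a structure and can be stored
    # once; the third (with <isbn/>) is the kind of irregularity that the
    # paper's approach would project out and handle separately.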

Using Linear Algebra for Intelligent Information Retrieval

by Michael W. Berry, Susan T. Dumais - SIAM REVIEW , 1995
"... Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical ..."
Abstract - Cited by 676 (18 self)
, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term by document matrices. Terms and documents represented
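
To make the SVD step concrete, a minimal sketch (the toy matrix and rank are invented; a real system would apply a sparse truncated SVD to a much larger term-by-document matrix):

    import numpy as np

    # Toy term-by-document count matrix: rows are terms, columns documents.
    A = np.array([[2., 0., 1., 0.],
                  [1., 1., 0., 0.],
                  [0., 2., 0., 1.],
                  [0., 0., 1., 2.]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2                                  # keep only the k largest factors
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]      # best rank-k approximation to A
    print(np.round(A_k, 2))
    # A_k can assign nonzero weight to term-document pairs that never
    # co-occur directly, which is how the latent structure helps retrieval.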

Indexing by latent semantic analysis

by Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman - JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE , 1990
"... A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The p ..."
Abstract - Cited by 3779 (35 self)
. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries
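
Continuing the previous sketch in miniature (rank 2 instead of ca. 100, toy data invented): documents are represented by their factor weights, a query is folded into the same space, and documents are ranked by cosine similarity. The fold-in convention below is one common choice, not necessarily the exact formulation in the paper.

    import numpy as np

    A = np.array([[2., 0., 1., 0.],        # same toy term-by-document matrix
                  [1., 1., 0., 0.],
                  [0., 2., 0., 1.],
                  [0., 0., 1., 2.]])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    doc_vecs = Vt[:k].T * s[:k]            # documents as k factor-weight vectors

    q = np.array([1., 0., 1., 0.])         # query containing terms 0 and 2
    q_vec = q @ U[:, :k]                   # fold the query into the same space
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    print(np.argsort(-sims))               # documents ranked by cosine similarity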

Probabilistic Latent Semantic Indexing

by Thomas Hofmann , 1999
"... Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized ..."
Abstract - Cited by 1225 (10 self)
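
The latent class model and EM fit mentioned in the abstract can be sketched generically as follows (plain EM on a random toy count matrix; the paper itself uses a generalization of EM, so this is only the baseline procedure):

    import numpy as np

    rng = np.random.default_rng(0)
    N = rng.integers(0, 5, size=(6, 8)).astype(float)   # docs x terms counts
    D, W, Z = N.shape[0], N.shape[1], 2                  # 2 latent classes

    p_w_z = rng.random((Z, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
    p_z_d = rng.random((D, Z)); p_z_d /= p_z_d.sum(1, keepdims=True)  # P(z|d)

    for _ in range(50):
        # E-step: responsibilities P(z|d,w), shape (D, W, Z)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(2, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        exp_counts = N[:, :, None] * resp                # n(d,w) * P(z|d,w)
        p_w_z = exp_counts.sum(0).T
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = exp_counts.sum(1)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12

    print(np.round(p_w_z, 3))   # each row: a latent class as a distribution over terms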

Unsupervised Learning by Probabilistic Latent Semantic Analysis

by Thomas Hofmann - Machine Learning , 2001
"... Abstract. This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of co-occurren ..."
Abstract - Cited by 618 (4 self)

Concept Decompositions for Large Sparse Text Data using Clustering

by Inderjit S. Dhillon, Dharmendra S. Modha - Machine Learning , 2000
"... . Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors--a few thousand dimensions and a sparsity of 95 to 99 ..."
Abstract - Cited by 407 (27 self)
of document vectors; these decompositions are obtained by taking the least-squares approximation onto the linear subspace spanned...
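
A bare-bones sketch of the concept-decomposition idea (the clustering loop below is a simplified spherical-k-means-style stand-in, not the authors' exact algorithm): cluster unit-normalized document vectors, treat the normalized centroids as concept vectors, and take the least-squares approximation of the documents onto the subspace they span, as the excerpt describes.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.random((20, 10))                       # 20 docs x 10 terms
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize documents

    k = 3
    C = X[rng.choice(len(X), k, replace=False)]    # initial concept vectors
    for _ in range(25):
        labels = np.argmax(X @ C.T, axis=1)        # assign by cosine similarity
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
        C /= np.linalg.norm(C, axis=1, keepdims=True)

    # Concept decomposition: least-squares approximation of X in span(C)
    coeffs, *_ = np.linalg.lstsq(C.T, X.T, rcond=None)
    X_approx = (C.T @ coeffs).T
    print("approximation error:", np.linalg.norm(X - X_approx))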

The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

by David Cohn, Thomas Hofmann , 2001
"... We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as autho ..."
Abstract - Cited by 218 (3 self)

Integration analysis of product decompositions

by Thomas U. Pimmler - ASME Conference on Design Theory and Methodology. Minneapolis, MN , 1994
"... This paper describes a methodology for the analysis of product design decompositions. The technique is useful for developing an understanding of the "system engineering " needs which arise because of complex interactions between components of a design. This information can be used to defin ..."
Abstract - Cited by 145 (4 self)
to define the product architecture and to organize the development teams. The method involves three steps: 1) decomposition of the system into elements, 2) documentation of the interactions between the elements, and 3) clustering the elements into architectural and team chunks. By using this approach
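
The three steps can be illustrated with a toy interaction matrix and a deliberately crude grouping rule (the elements, weights, and threshold below are invented; the paper's own clustering procedure is more involved):

    # Step 1: elements of the decomposed system
    elements = ["radiator", "fan", "engine", "seat", "dashboard"]
    # Step 2: documented pairwise interaction strengths (symmetric)
    interaction = {
        ("radiator", "fan"): 2, ("radiator", "engine"): 2,
        ("fan", "engine"): 1, ("seat", "dashboard"): 1,
    }

    # Step 3: group elements whose interaction meets a threshold (union-find)
    parent = {e: e for e in elements}
    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e

    for (a, b), w in interaction.items():
        if w >= 1:
            parent[find(a)] = find(b)

    chunks = {}
    for e in elements:
        chunks.setdefault(find(e), []).append(e)
    print(list(chunks.values()))
    # [['radiator', 'fan', 'engine'], ['seat', 'dashboard']]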

Document Page Decomposition by the Bounding-Box Projection Technique

by Jaekyu Ha - IEEE Transactions on Systems, Man, and Cybernetics , 1995
"... This paper describes a method for extracting words, textlines and text blocks by analyzing the spatial con-figuration of bounding boxes of connected components on a given document image. The basic idea is that connected components of black pixels can be used as computational units in document image ..."
Abstract - Cited by 23 (1 self)
text-block segmentation. In the last step of segmentation, the document decomposition hierarchy is produced from these segmented objects.
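
A minimal sketch of the projection idea applied to one axis (all coordinates invented): bounding boxes of connected components are projected onto the vertical axis, and text lines are cut at empty rows of the resulting profile.

    import numpy as np

    # (x0, y0, x1, y1) boxes of connected components on a 100-row page
    boxes = [(5, 10, 20, 22), (25, 11, 60, 23), (7, 40, 30, 52), (35, 41, 80, 53)]

    profile = np.zeros(100, dtype=int)
    for x0, y0, x1, y1 in boxes:
        profile[y0:y1 + 1] += 1            # how many boxes cover each row

    lines, inside = [], False
    for y, count in enumerate(profile):
        if count and not inside:
            start, inside = y, True
        elif not count and inside:
            lines.append((start, y - 1)); inside = False
    if inside:
        lines.append((start, len(profile) - 1))
    print(lines)                           # [(10, 23), (40, 53)]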