Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Review 41 (1999)
....even O(n) time. We finish the chapter with some numerical experiments that verify the ability of the QLP and some of these other methods to provide good condition number estimates. In Chapter 5 we use the QLP to do latent semantic indexing. Following the exposition of Berry, Drmač, and Jessup in [BDJ99], we first discuss the idea of representing a database as a matrix and performing queries. We then discuss the idea of reducing the rank of the matrix by using the QR factorization or the SVD and point out that the rank-reduced representation of the database reveals its semantic content better than ....
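The rank-reduction step described in this excerpt can be illustrated with a truncated SVD, which gives the rank-k matrix A_k = σ_1 u_1 v_1^T + ... + σ_k u_k v_k^T. This is a minimal pure-Python sketch, assuming a small dense matrix: it finds singular triplets by power iteration on A^T A with deflation, a technique swapped in here only to stay dependency-free. It is not the QLP or the QR-based reduction the thesis discusses, and the function names are hypothetical.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_eigenpair(b, iters=500, seed=0):
    """Dominant eigenpair of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(b))]
    for _ in range(iters):
        w = matvec(b, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
    return lam, v


def rank_k_approx(a, k, tol=1e-12):
    """Hypothetical helper: rank-k truncated-SVD approximation of a t x d matrix."""
    t, d = len(a), len(a[0])
    b = matmul(transpose(a), a)  # A^T A (d x d); its eigenvalues are sigma_i^2
    approx = [[0.0] * d for _ in range(t)]
    sigmas = []
    for _ in range(k):
        lam, v = top_eigenpair(b)
        if lam <= tol:
            break  # remaining singular values are (numerically) zero
        sigma = math.sqrt(lam)
        u = [x / sigma for x in matvec(a, v)]  # left singular vector
        sigmas.append(sigma)
        for i in range(t):
            for j in range(d):
                approx[i][j] += sigma * u[i] * v[j]
        # Deflate so the next power iteration finds the next singular value.
        for i in range(d):
            for j in range(d):
                b[i][j] -= lam * v[i] * v[j]
    return sigmas, approx
```

Keeping only the k largest singular values is what makes the reduced matrix the best rank-k approximation in the Frobenius norm; for real collections a library SVD routine would replace the power iteration.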
....term. More often, instead of ones, weights would be assigned indicating the importance of the term in that document. Having determined the vector for each document in the database, we can represent the database as a t-by-d term-by-document matrix. Much of the discussion of this chapter comes from [BDJ99] and [BB99]. Consider the following example from [BDJ99] with the given t = 6 terms and d = 5 books: T1: bak(e,ing); T2: recipes; T3: bread; T4: cake; T5: pastr(y,ies); T6: pie; D1: How to Bake Bread Without Recipes; D2: The Classic Art of Viennese Pastry; D3: Numerical Recipes: The Art of ....
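A sketch of how the binary term-by-document matrix for this example could be built. Only D1 and D2 appear in full in the excerpt (D3 onward is truncated), so the sketch uses just those two titles; the regular expressions encoding bak(e,ing) and pastr(y,ies), and the function name, are assumptions for illustration.

```python
import re

# The six terms of the example; bak(e,ing) and pastr(y,ies) become regexes.
TERMS = {
    "T1": r"\bbak(e|ing)\b",
    "T2": r"\brecipes\b",
    "T3": r"\bbread\b",
    "T4": r"\bcake\b",
    "T5": r"\bpastr(y|ies)\b",
    "T6": r"\bpie\b",
}

# Only the titles shown in full; the remaining books are truncated in the excerpt.
DOCS = {
    "D1": "How to Bake Bread Without Recipes",
    "D2": "The Classic Art of Viennese Pastry",
}


def term_document_matrix(terms, docs):
    """Binary t x d matrix: entry (i, j) is 1 if term i occurs in document j."""
    return [
        [1 if re.search(pattern, title, re.IGNORECASE) else 0 for title in docs.values()]
        for pattern in terms.values()
    ]


A = term_document_matrix(TERMS, DOCS)
for name, row in zip(TERMS, A):
    print(name, row)
```

Replacing the 1 entries with weights, as the excerpt suggests, only changes the value stored when a term matches.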
[Article contains additional citation context not shown here]
Michael W. Berry, Zlatko Drmac, and Elizabeth R. Jessup. "Matrices, Vector Spaces, and Information Retrieval." SIAM Review, 41(2):335--362, 1999.
....achieves both efficiency and determinism through an elegant combination of index placement and query routing. Given a query, PeerSearch only needs to search a small number of servants to identify matching documents. Leveraging state-of-the-art IR algorithms such as the vector space model (VSM) [1] and latent semantic indexing (LSI) [1], PeerSearch represents documents and queries as vectors and measures the similarity between a query and a document as the cosine of the angle between their vector representations. PeerSearch stores a document index in CAN using its vector representation as ....
....through an elegant combination of index placement and query routing. Given a query, PeerSearch only needs to search a small number of servants to identify matching documents. Leveraging state-of-the-art IR algorithms such as the vector space model (VSM) [1] and latent semantic indexing (LSI) [1], PeerSearch represents documents and queries as vectors and measures the similarity between a query and a document as the cosine of the angle between their vector representations. PeerSearch stores a document index in CAN using its vector representation as the coordinates, resulting in that ....
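The cosine measure used by these systems can be sketched in a few lines: the score is the dot product of the query and document vectors divided by the product of their lengths, and documents are ranked by decreasing score. This is a minimal sketch; the document vectors and function names are hypothetical, and treating a zero vector as a score of 0 is an assumption made to avoid division by zero.

```python
import math


def cosine(q, d):
    """Cosine of the angle between query vector q and document vector d."""
    dot = sum(x * y for x, y in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(x * x for x in d))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector is treated as matching nothing
    return dot / (norm_q * norm_d)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-term vectors, for illustration only.
docs = {"a": [1, 1, 0], "b": [0, 1, 1], "c": [0, 0, 1]}
print(rank([1, 1, 0], docs))
```

Because the cosine ignores vector length, a long document does not outscore a short one merely by repeating terms.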
[Article contains additional citation context not shown here]
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
....similar profiles and documents are co-located. As a result, they can be matched efficiently and accurately without flooding either of them to every computer in the overlay. The document semantics are derived from IR algorithms such as the vector space model (VSM) and latent semantic indexing (LSI) [2]. These algorithms represent the semantics of documents as vectors in a Cartesian space. Vector representation of objects is not specific to text; it is used in virtually all current multimedia retrieval systems [10]. To efficiently deliver popular documents to a large number of interested ....
....in Sections 3 and 4, respectively. Related work is discussed in Section 5. Section 6 concludes the paper. 2. Background pFilter uses eCAN [15], a hierarchical version of CAN [11], to organize a large number of nodes into a P2P overlay network, and relies on extensions to VSM and LSI [2] to match documents and profiles. 2.1. Content Addressable Network (CAN) Distributed hash table (DHT) systems (e.g., CAN, Chord, Pastry, and Tapestry) build an administration-free and fault-tolerant application-level overlay network to provide a hash table interface that maps keys to values. CAN ....
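A toy illustration of the key-to-value mapping in CAN: a key is mapped to a point in a d-dimensional coordinate space, and the node whose zone contains that point stores the entry. This is a minimal sketch under stated assumptions: the zones are a static, global table of half-open boxes, keys are hashed with SHA-256, and the space is 2-D. A real CAN instead routes a request greedily through neighboring zones, and the PeerSearch excerpts in this listing describe using a document's vector representation as the coordinates rather than a hash. The function names are hypothetical.

```python
import hashlib

DIMS = 2  # hypothetical dimensionality of the coordinate space (at most 8 here)


def key_to_point(key, dims=DIMS):
    """Hash a key to a point in the unit cube [0, 1)^dims."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the 32-byte digest per coordinate.
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32 for i in range(dims)
    )


def owner(point, zones):
    """Return the node whose half-open zone [lo, hi) contains the point."""
    for node, (lo, hi) in zones.items():
        if all(l <= x < h for x, l, h in zip(point, lo, hi)):
            return node
    raise ValueError("zones do not cover the point")


# Hypothetical four-node overlay splitting [0, 1)^2 into quadrants.
zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 0.5), (1.0, 1.0)),
}
print(owner(key_to_point("some-document-key"), zones))
```

Because every point falls in exactly one zone, lookups are deterministic: any node that computes the same point reaches the same owner.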
[Article contains additional citation context not shown here]
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
....disseminate information. The advantage of adopting a P2P model is that the capability of the system scales when the user population increases. To support accurate full-text searches, pFilter adopts state-of-the-art IR algorithms such as the vector space model (VSM) and latent semantic indexing (LSI) [2]. These algorithms represent documents and queries as vectors, and measure the similarity between a query and a document as the cosine of the angle between their vector representations. To avoid the flooding of either documents or profiles, profiles in the P2P overlay are organized around their ....
....and document dissemination in Section 3 and Section 4, respectively. A discussion is provided in Section 5. Related work is given in Section 6. Section 7 concludes the paper. 2. BACKGROUND pFilter is built on top of eCAN [15] (a hierarchical version of CAN) and uses extensions to VSM and LSI [2]. The Cartesian space abstraction of CAN makes it particularly attractive when used to store vector representations of documents generated by IR algorithms such as VSM and LSI. We describe these basic components below. 2.1 DHT Systems and eCAN Recent DHT systems, represented by CAN, Chord, and ....
[Article contains additional citation context not shown here]
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
....is that they usually ignore the advanced ranking algorithms devised by the IR community through decades of refinement and evaluation, and thereby rely on naive keyword-based searches. Examples of successful IR algorithms include the vector space model (VSM) and latent semantic indexing (LSI) [2]. These algorithms represent documents and queries as vectors, and measure the similarity between a query and a document as the cosine of the angle between their vector representations. Our goal is to build a scalable P2P IR system, pSearch, that has the efficiency of DHT systems and the accuracy of state-of ....
....background information about DHT and IR. Section 3 describes pVSM and pLSI. We discuss pSearch applications in Section 4, present related work in Section 5, and conclude in Section 6. 2. BACKGROUND pSearch is built on eCAN [12] (a hierarchical version of CAN) and uses extensions to VSM and LSI [2]. The Cartesian space abstraction of CAN makes it particularly attractive when used to store vector representations of documents generated by IR algorithms such as VSM and LSI. We describe these basic components below. DHT systems and eCAN. Recent DHT systems, represented by CAN, Chord, and ....
[Article contains additional citation context not shown here]
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335-362, 1999.
....achieves both efficiency and determinism through an elegant combination of index placement and query routing. Given a query, PeerSearch only needs to search a small number of servants to identify matching documents. Leveraging state-of-the-art IR algorithms such as the vector space model (VSM) [1] and latent semantic indexing (LSI) [1], PeerSearch represents documents and queries as vectors and measures the similarity between a query and a document as the cosine of the angle between their vector representations. PeerSearch stores a document index in CAN using its vector representation as ....
....through an elegant combination of index placement and query routing. Given a query, PeerSearch only needs to search a small number of servants to identify matching documents. Leveraging state-of-the-art IR algorithms such as the vector space model (VSM) [1] and latent semantic indexing (LSI) [1], PeerSearch represents documents and queries as vectors and measures the similarity between a query and a document as the cosine of the angle between their vector representations. PeerSearch stores a document index in CAN using its vector representation as the coordinates, resulting in that ....
[Article contains additional citation context not shown here]
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
....a standard NFS interface. Catalogue: Catalogue in Sedar contains an index of the files based on their semantic vectors (SV) derived from the contents of the files. A semantic vector is a vector of file-type-specific features extracted from file contents. For instance, the vector space model [9] extracts term frequency information from text documents, and latent semantic indexing [10] uses matrix decomposition and truncation to discover the semantics underlying terms and documents. Welsh et al. [11] derive frequency, amplitude, and tempo features from encoded music data. In the ....
Berry, M.W., Z. Drmac, and E.R. Jessup. Matrices, Vector Spaces, and Information Retrieval. in Society for Industrial and Applied Mathematics Review. 1999. San Diego, CA, USA.
No context found.
M.W. Berry, Z. Drmac, and E. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review, 41:335-362, 1999.
No context found.
Berry, Michael W. Matrices, Vector Spaces, and Information Retrieval
No context found.
Berry, Michael W. Matrices, Vector Spaces, and Information Retrieval
.... as $l_{ij} = \log_2(f_{ij} + 1)$ and $g_i = 1 + \sum_j \frac{p_{ij} \log_2 p_{ij}}{\log_2 n}$, with $p_{ij} = f_{ij} / \sum_j f_{ij}$, where $f_{ij}$ is the frequency of the $i$th term in the $j$th document, $p_{ij}$ is the probability of the $i$th term occurring in the $j$th document, and $n$ is the number of documents in the collection [4]. The weighted frequency for each token is then computed by multiplying its local component by its global component. That is, the term-by-document matrix is defined as $M = [m_{ij}]$, where $m_{ij} = l_{ij}\, g_i$. The aim of using the log-entropy weighting scheme is to downweight high frequency ....
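The log-entropy weighting in this excerpt can be sketched directly from its definitions: a local weight log2(f_ij + 1), a global weight g_i = 1 + Σ_j p_ij log2 p_ij / log2 n with p_ij = f_ij / Σ_j f_ij, and m_ij = l_ij g_i. This is a minimal sketch assuming a dense t x n list of raw counts with n > 1, and using the usual convention 0 · log 0 = 0; the function name is hypothetical.

```python
import math


def log_entropy(freq):
    """Log-entropy weighting of a raw t x n term-frequency matrix (n > 1 documents)."""
    n = len(freq[0])
    if n < 2:
        raise ValueError("need at least two documents (log2 n must be nonzero)")
    weighted = []
    for row in freq:
        total = sum(row)  # global frequency of term i across all documents
        entropy = 0.0
        for f in row:
            if f > 0:  # convention: 0 * log 0 = 0
                p = f / total  # p_ij
                entropy += p * math.log2(p)
        g = 1.0 + entropy / math.log2(n)  # global weight g_i, between 0 and 1
        weighted.append([math.log2(f + 1) * g for f in row])  # m_ij = l_ij * g_i
    return weighted
```

A term spread evenly over every document gets g_i = 0 and so contributes nothing, while a term concentrated in a single document keeps g_i = 1; this is the downweighting the excerpt describes.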
M. Berry, Z. Drmac, and E. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review 41:335-362, 1999.
No context found.
Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Review 41 (1999)
No context found.
Michael W. Berry, Zlatko Drmac, and Elizabeth R. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335-362, 1999.
No context found.
Michael W. Berry, Zlatko Drmac, and Elizabeth R. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M.W. Berry, Z. Drmac, and E.R. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335-362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
Berry, M., Z. Drmac and E.R. Jessup. Matrices, Vector Spaces, and Information Retrieval. SIAM Review 41:2. pp. 335-362. 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
Berry, M. W., Drmac, Z., Jessup, E.R., 1999, Matrices, Vector Spaces, and Information Retrieval, SIAM Review 41/2:335-362.
No context found.
Michael W. Berry, Zlatko Drmac, and Elizabeth R. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
No context found.
M. Berry, Z. Drmac, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335--362, 1999.
CiteSeer.IST - Copyright Penn State and NEC