39 citations found.
G. Salton. A vector space model for information retrieval. CACM, 18(11):613--620, 1975.

This paper is cited in the following contexts:

PlanetP: Using Gossiping to Build Content.. - Cuenca-Acuna.. (2003)   (1 citation)

....(IDF) to balance: (a) the fact that terms frequently used in a document are likely important to describe its meaning, and (b) terms that appear in many documents in a collection are not useful for differentiating between these documents. There are several accepted ways of implementing TFxIDF [21]. In our work, we adopt the following system of equations from [26]: $IDF_t = \log(1 + N_C / f_t)$, $w_{D,t} = 1 + \log(f_{D,t})$, $w_{Q,t} = IDF_t$, where $N_C$ is the number of documents in the collection, $f_t$ is the number of times that term $t$ appears in the collection, and $f_{D,t}$ is the number of times term $t$ ....

G. Salton, A. Wang, and C. Yang. A Vector Space Model for Information Retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.


PlanetP: Using Gossiping to Build Content.. - Cuenca-Acuna.. (2002)   (1 citation)

....(IDF) to balance: (a) the fact that terms frequently used in a document are likely important to describe its meaning, and (b) terms that appear in many documents in a collection are not useful for differentiating between these documents. There are several accepted ways of implementing TFxIDF [22]. In our work, we adopt the system of equations from [27], leading to a similarity measure of $Sim(Q, D) = \frac{\sum_{t \in Q} IDF_t \, (1 + \log(f_{D,t}))}{|D|}$ (1), $IDF_t = \log(1 + N_D / f_t)$ (2), where $Q$ is the query, $D$ is a document, $|D|$ is the number of terms in $D$, $N_D$ is the number of documents in the ....

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.
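
The excerpt above quotes a similarity measure built from IDF and logarithmic term-frequency weights. The following is a minimal Python sketch of that ranking, assuming the damaged formula reads Sim(Q, D) = (sum over query terms t of IDF_t (1 + log f_{D,t})) / |D| with IDF_t = log(1 + N / f_t). The plus signs and the division by |D| are a reading of the garbled text, not something the excerpt confirms. The names `idf`, `sim`, `rank` and the toy collection are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """IDF_t = log(1 + N / f_t), with f_t counted over the whole collection."""
    f_t = sum(doc.count(term) for doc in docs)  # occurrences of term in all docs
    return math.log(1 + len(docs) / f_t) if f_t else 0.0


def sim(query, doc, docs):
    """Sum IDF_t * (1 + log f_{D,t}) over query terms found in doc, divided by |D|."""
    if not doc:
        return 0.0
    counts = Counter(doc)
    score = sum(
        idf(t, docs) * (1 + math.log(counts[t]))
        for t in set(query)
        if counts[t] > 0  # terms absent from the document contribute nothing
    )
    return score / len(doc)


def rank(query, docs):
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(docs)), key=lambda i: sim(query, docs[i], docs), reverse=True)


# Hypothetical toy collection: each document is a list of terms.
docs = [["cat", "dog"], ["cat", "cat", "goat"], ["fish"]]
print(rank(["goat"], docs))
```

Only the second toy document contains the query term, so it is ranked first; documents with no matching term score zero.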


Collaborative Management of Global Directories in P2P.. - Peery, Cuenca-Acuna.. (2002)

....the location, rank, and retrieval of published documents whose content matches a particular query. While WayFinder is not critically dependent on PlanetP, it does leverage PlanetP's particular strength: a globally ranked content search capability based on the TFxIDF vector space ranking algorithm [23]. This capability not only allows WayFinder to implement semantic .... [Figure 1: (a, b) Two local directory structures, and (c) the merged view in the global directory structure.]

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, 1975.


IIT at TREC-10 - Aljlayl, Beitzel, Jensen, al. (2001)   (1 citation)

....use of entity tagging for query terms. In the last section, we present our TREC 10 ad hoc results including some of our results from fusion. 2.1 Query Processing Many different strategies are used to improve the overall effectiveness of an IR system. Several examples are automatic term weighting [1, 2] and relevance feedback [3]. Phrases are frequently suggested as a means for improving the precision of an IR system. Prior research with phrases has shown that weighting phrases as importantly as terms can cause query drift [5] and a reduction in precision. To reduce query drift, static weighting ....

Salton G., C. Yang and A. Wong. "A vector-space model for information retrieval", Comm. of the ACM, 18, 1975.


PlanetP: Using Gossiping to Build Content.. - Cuenca-Acuna.. (2002)   (1 citation)

....frequently used in a document are likely important to describe its meaning, and (b) terms that appear in many documents in a collection are not useful for differentiating between these documents for a particular query. Existing literature includes several ways of implementing the TFxIDF rule [28]. In our work, we adopt the following system of equations as suggested by Witten et al. [31]: $IDF_t = \log(1 + N / f_t)$, $w_{D,t} = 1 + \log(f_{D,t})$, $w_{Q,t} = IDF_t$, where $N$ is the number of documents in the collection, $f_t$ is the number of times that term $t$ appears in the collection, and $f_{D,t}$ is the number of ....

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.


Text-Based Content Search and Retrieval in ad hoc P2P.. - Cuenca-Acuna, Nguyen (2002)   (3 citations)

....approach for P2P information sharing in Section 2. Thus, the problem that we focus on is how to perform text-based content search and retrieval using the index summaries that PlanetP uses. We have adopted a vector space ranking model, using the TFxIDF algorithm suggested by Salton et al. [24], because it is one of the currently most successful text-based ranking algorithms [28]. .... is abstractly represented as a vector, where each dimension is associated with a distinct word. The value of each component of the vector represents the importance of that word (typically referred to as the ....

....that we are addressing in this paper is how to search for and retrieve documents relevant to a query posed by some member of a PlanetP community. Given a collection of text documents, the problem of retrieving the subset that is relevant to a particular query has been studied extensively (e.g. [24, 22]). Currently, one of the most successful techniques for addressing this problem is the vector space ranking model [24]. Thus, we decided to adapt this technique for use in PlanetP. In this section, we first briefly provide some background on vector space based document ranking, then we present our ....

[Article contains additional citation context not shown here]

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.


Text-Based Content Search and Retrieval in ad hoc P2P.. - Cuenca-Acuna, Nguyen (2002)   (3 citations)

....approach for P2P information sharing in Section 2. Thus, the problem that we focus on is how to perform text-based content search and retrieval using the index summaries that PlanetP uses. We have adopted a vector space ranking model, using the TFxIDF algorithm suggested by Salton et al. [22], because it is one of the currently most successful text-based ranking algorithms [26]. Under this model, a query is comprised of a set of terms. For each document in the collection, TFxIDF uses the frequency of each query term in that document and the frequency of .... We say currently because we ....

....that we are addressing in this paper is how to search for and retrieve documents relevant to a query posed by some member of a PlanetP community. Given a collection of text documents, the problem of retrieving the subset that is relevant to a particular query has been studied extensively (e.g. [22, 20]). Currently, one of the most successful techniques for addressing this problem is the vector space ranking model [22]. Thus, we decided to adapt this technique for use in PlanetP. In this section, we first briefly provide some background on vector space based document ranking, then we present our ....

[Article contains additional citation context not shown here]

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.


PlanetP: Using Gossiping to Build Content.. - Cuenca-Acuna.. (2002)   (1 citation)

....frequently used in a document are likely important to describe its meaning, and (b) terms that appear in many documents in a collection are not useful for differentiating between these documents for a particular query. Existing literature includes several ways of implementing the TFxIDF rule [28]. In our work, we adopt the following system of equations as suggested by Witten et al. [31]: $IDF_t = \log(1 + N / f_t)$, $w_{D,t} = 1 + \log(f_{D,t})$, $w_{Q,t} = IDF_t$, where $N$ is the number of documents in the collection, $f_t$ is the number of times that term $t$ appears in the collection, and $f_{D,t}$ is the number of ....

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18, pages 613-620, 1975.


Subspace Representations of Unstructured Text - Holt

....and abbreviation expansion, word stemming, spelling normalization, or from a list of synonyms of words found in the document. While terms typically consist of letters, a term may also contain numbers and symbols such as hyphens or slashes, depending on the application. A vector space model [17, 15] of a text data collection begins conceptually with a sparse non-negative raw count matrix D, in which each of the rows corresponds to a term, each of the columns corresponds to a document in the collection, and the ij entry of D is the raw count of the occurrences of the i-th term in the j-th ....

G. Salton, A. Wong, and C.S. Yang. A vector space model for Information Retrieval. Journal for the American Society for Information Retrieval, 18(11):613--620, 1975.
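
The excerpt above describes a sparse raw count matrix D with one row per term and one column per document. A minimal Python sketch of that construction follows, assuming documents are already tokenized into lists of terms. The function name `count_matrix`, the alphabetical row ordering, and the dictionary-of-coordinates storage are illustrative choices, not taken from the cited work.

```python
from collections import Counter


def count_matrix(docs):
    """Build a sparse term-by-document raw count matrix.

    Returns (terms, entries), where terms[i] labels row i and
    entries[(i, j)] is the count of term i in document j.
    Zero entries are simply not stored.
    """
    terms = sorted({t for doc in docs for t in doc})
    row = {t: i for i, t in enumerate(terms)}  # term -> row index
    entries = {}
    for j, doc in enumerate(docs):  # j is the column (document) index
        for t, c in Counter(doc).items():
            entries[(row[t], j)] = c
    return terms, entries


# Hypothetical toy collection of two tokenized documents.
terms, entries = count_matrix([["a", "b", "a"], ["b", "c"]])
print(terms)
print(entries)
```

Storing only the non-zero coordinates keeps the structure sparse, which matters because most terms do not occur in most documents.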


Text-Based Content Search and Retrieval in ad hoc P2P.. - Cuenca-Acuna, Nguyen (2002)   (3 citations)

....approach for P2P information sharing in Section 2. Thus, the problem that we focus on is how to perform text-based content search and retrieval using the index summaries that PlanetP uses. We have adopted a vector space ranking model, using the TFxIDF algorithm suggested by Salton et al. [24], because it is one of the currently most successful text-based ranking algorithms [28]. .... is associated with a distinct word. The value of each component of the vector represents the importance of that word (typically referred to as the weight of the word) to that document or query. Given a query, ....

....that we are addressing in this paper is how to search for and retrieve documents relevant to a query posed by some member of a PlanetP community. Given a collection of text documents, the problem of retrieving the subset that is relevant to a particular query has been studied extensively (e.g. [24, 22]). Currently, one of the most successful techniques for addressing this problem is the vector space ranking model [24]. Thus, we decided to adapt this technique for use in PlanetP. In this section, we first briefly provide some background on vector space based document ranking, then we present our ....

[Article contains additional citation context not shown here]

G. Salton, A. Wang, and C. Yang. A vector space model for information retrieval. In Journal of the American Society for Information Science, volume 18,


Latent Semantic Kernels - Cristianini (2001)   (10 citations)

....rest of the paper we will use P to refer to the matrix defining the VSM. We will describe a number of different models, in each case showing how an appropriate choice of P realises it as a VSM. 6 Basic Vector Space Model The Basic Vector Space Model (BVSM) was introduced in 1975 by Salton et al. [15] (and used as a kernel by Joachims [10]) and uses the vector representation with no further mapping. In other words, the VSM matrix P = I in this case. The performance of retrieval systems based on such a simple representation is surprisingly good. Since the representation of each document as a ....

G. Salton, A. Wang, and C.S. Yang. A vector space model for information retrieval. Journal of the American Society for Information Science, 18:613--620, 1975.


Expressing User Profiles for Data Recharging - Cherniack, Franklin, Zdonik (2001)   (15 citations)

....Boolean operators (e.g. And, Or, Not) and an exact match semantics is used: a document either satisfies the predicate or not. Similarity based models use a fuzzy match semantics in which the profiles and documents are assigned a similarity value (typically based on a vector space model [12]). A document whose similarity to a profile is above a certain threshold is said to match that profile. The Stanford Information Filtering Tool (SIFT) [13] is a well known content based text filtering system for Internet news articles. With the advent of XML, filtering of Web documents based on ....

G. Salton, C. S. Yang, and A. Wong, "A Vector Space Model for Information Retrieval," Commun. ACM, vol. 18, 1975.


On The Design Of Reliable Efficient Information Systems - Chowdhury

....to count overlap is eliminated, reducing the overall runtime. The authors, however, noted that DSC-SS does not work well for short documents, so no runtime results are reported [57]. Approaches that compute document to document similarity measures [63] are similar to document clustering work [64] in that they use similarity computations to group potentially duplicate documents. All pairs of documents are compared. A document to document similarity comparison approach is thus computationally prohibitive given the $O(d^2)$ runtime, where $d$ is the number of documents. In reality, these ....

Salton G., Yang C.S., Wong A.. "A Vector-Space Model for Information Retrieval", Comm. of the ACM, 18, 1975.


An Evaluation of Linguistically-motivated Indexing.. - Arampatzis, van der.. (2000)   (4 citations)

....recall precision and average precision on a dataset from the Reuters-21578 text categorization test collection. Next we discuss in more detail the methods used, the dataset and pre-processing applied to it, and evaluation measures. 4.1 The Vector Space Model In the Vector Space Model [15] documents are represented as vectors of weights: $D_i = \langle d_{i1}, d_{i2}, \ldots, d_{ik}, \ldots, d_{iN} \rangle$ (1) where $d_{ik}$ is the weight of the kth indexing term in the ith document, and $N$ is the number of indexing terms being used. Indexing terms are assumed to be stochastically independent. An indexing ....

Gerard Salton. A Vector Space Model for Information Retrieval. Communications of the ACM, 18(11):613--620, November 1975.


Study on New Term Weighting Method and New Vector Space.. - Takao, Ogata, Ariki (2000)

....errors

Table 5: Dictation result
                      Corr    Acc
19980820 12:00 NHK    77.83   75.57
19980820 23:00 NHK    77.42   75.43
19980824 12:00 NHK    76.46   73.74
19980825 12:00 NHK    77.81   73.94
19980826 12:00 NHK    78.91   76.24
Total                 78.16   75.45

4. VECTOR SPACE MODEL We describe a method based on the vector space model [13] [15] in this section, proposing a new term weighting method. 4.1. Procedure Based on Vector Space Model The procedure based on the vector space model in spoken document retrieval by document is described as follows, as shown in Fig. 1. The documents are Japanese broadcast news. 1. Speech ....

....in both document t k and t l is reserved. If cos nearly equals 1, the similarity between any two spoken documents is regarded as high. 5. LATENT SEMANTIC INDEXING (WORD BY DOCUMENT) Latent Semantic Indexing, proposed in the technical paper [17], is used as widely as the vector space model proposed in [13] [15]. This method is described as follows. Let $X$ denote any rectangular matrix of dimension $w \times d$, whose rows and columns correspond to words and documents respectively. The element $(i, j)$ of this matrix $X$ represents how many times word $w_i$ appears in the document $d_j$. This matrix $X$ is called ....

G.Salton, A.Wong and C.S.Yang, "A Vector Space Model for Information Retrieval", Journal of the ASIS, pp.613-620, November 1975.
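
The excerpts above treat a cosine value near 1 as a sign that two document vectors are highly similar. A minimal Python sketch of that cosine measure follows, assuming each document is given as an equal-length list of term weights. The function name `cosine` and the choice to return 0.0 for an all-zero vector are illustrative assumptions, not details from the cited paper.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical term-weight vectors over a three-term vocabulary.
print(cosine([1, 2, 0], [2, 4, 0]))  # same direction
print(cosine([1, 0, 0], [0, 1, 0]))  # no shared terms
```

Because the measure depends only on direction, a document and a longer document with the same term proportions score as identical.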


On the Design and Evaluation of a.. - McCabe, Lee.. (2000)

....using a subset of 1,827 TREC documents. Future work includes scaling to larger collections. 2 Prior Work OLAP and MDB effectively analyze large collections of structured data [1]. Likewise, Information Retrieval (IR) succeeds in searching unstructured text and returning ranked lists of documents [2, 3, 4]. Finally, work exists integrating searches of structured and unstructured data [5, 6, 7]. However, those efforts do not take advantage of the hierarchical nature of structured data nor of hierarchies in the text. Multidimensional IR makes use of such hierarchies and allows the user a new kind of ....

G. Salton, C.S. Yang and A. Wong. A vector-space model for information retrieval, Comm. of the ACM, 18, 1975.


The Role of Knowledge in Conceptual Retrieval: A Study in.. - Lin, Demner-Fushman (2006)

No context found.

G. Salton. A vector space model for information retrieval. CACM, 18(11):613--620, 1975.


An Agent for Web Information Dissemination - Based On Genetic (2003)

No context found.

Salton, G; Wong, A.; Yang, C. S. A vector space model for information retrieval. Communications of the ACM, 18(11):613-620, November 1975.


Improving Rocchio with weakly supervised clustering - Vinot, Yvon

No context found.

Gerard Salton, A. Wong, and C.S. Yang. A vector space model for information retrieval. Communications of the ACM, 18(11):613-620, November 1975.


On Scaling Latent Semantic Indexing for Large Peer-To-Peer.. - Tang, Dwarkadas, Xu (2004)   (2 citations)

No context found.

G. Salton, A. Wong, and C. Yang. A vector space model for information retrieval. Journal for the American Society for Information Retrieval, 18(11):613--620, 1975.


Hybrid Global-Local Indexing for Efficient Peer-To-Peer.. - Tang, Dwarkadas (2004)   (6 citations)

No context found.

G. Salton, A. Wong, and C. Yang. A Vector Space Model for Information Retrieval. Journal for the American Society for Information Retrieval, 18(11):613--620, 1975.


A Dempster-Shafer Model for Document Retrieval using Noun Phrases

No context found.

G. Salton, A. Wong, and C. S. Yang. A vector space model for information retrieval. Communications of the ACM, 18(11):613--620, Nov. 1975.


PlanetP: Using Gossiping to Build Content.. - Cuenca-Acuna.. (2003)   (1 citation)

No context found.

G. Salton, A. Wang, and C. Yang. A Vector Space Model for Information Retrieval. In Journal of the American Society for Information Science, volume 18, pages 613--620, 1975.


Identifying Topics for Web Documents Through Fuzzy.. - Haruechaiyasak, Shyu.. (2002)

No context found.

G. Salton, A. Wong and C. S. Yang, A Vector-Space Model for Information Retrieval, Comm. of the ACM 18(11) (1975) 613-620.


Dynamic Term Selection in Learning a Query from Examples - Emilia Stoica David (2000)

No context found.

Salton, G., Wong, A. & Yang, C. (1975). A vector space model for information retrieval.

CiteSeer.IST - Copyright Penn State and NEC