G. Salton, "Developments in Automatic Text Retrieval", Science, Vol. 253, pages 974-979, 1991.

This paper is cited in the following contexts:

New Techniques In Intelligent Information Filtering - Macskassy (2003)

....value for an example is the set of tokens that are present in that field for this example. The information retrieval community has similarly spent many years developing robust retrieval methods applicable to many retrieval tasks concerning text containing documents, with vector space methods [Sal91] being the best known examples of techniques in this area. Although for many years the focus has been primarily on retrieval tasks, here, too, the last decade has seen a significant increase in interest in the use of such methods for text classification tasks. The most common techniques use the ....

Gerard Salton. Developments in automatic text retrieval. Science, 253:974--979, 1991.


A New Family of Online Algorithms for Category Ranking - Crammer, Singer (2002)   (7 citations)

....topics ranked according to their relevance. However, as discussed above, the feedback for each document is the set y and thus the topics for each document in the training corpus are not ranked but rather marked as relevant or non-relevant. Each document is represented using the vector space model [9] as vectors in R^n. We denote a document by its vector representation x ∈ R^n. All the topic ranking algorithms we discuss in this paper use the same mechanism: each algorithm maintains a set of k prototypes, w1, w2, ..., wk. Analogous to the representation of documents, each ....

G. Salton. Developments in automatic text retrieval. Science, 253:974-980, 1991.


Improving Text Classification by Shrinkage in a.. - McCallum, Rosenfeld, .. (1998)   (55 citations)

....categorizing this text becomes more important. A variety of recent work has demonstrated the success of statistical approaches for learning to classify text documents [Joachims, 1997; Koller and Sahami, 1997; Yang and Pederson, 1997; Nigam et al. 1998]. These approaches, such as TFIDF [Salton, 1991] and naive Bayes [Lewis and Ringuette, 1994], typically represent documents as vectors of words, and learn by gathering statistics from the observed frequencies of these words within documents belonging to the different classes. Because they rely on these learned word statistics, these ....

G. Salton. Developments in automatic text retrieval. Science, 253:974--979, 1991.


NuggetMine: Intelligent Groupware for Opportunistically.. - Goecks, Cosley (2002)   (2 citations)

....to a server. The server records the submitter and time of submission. It automatically assigns the nugget a headline; the nugget headline is the page title for URLs or the first few words of the nugget for text data. It also selects relevant keywords for the nugget using Salton's TF-IDF algorithm [22] and revisits each URL every few weeks to recompute the keywords. Finally, if the user specified a category or a relationship with another nugget, the server records this information as well. We considered allowing users to manually specify the keywords and headline. We also debated about whether ....

Salton, G. (1991). Developments in Automatic Text Retrieval. Science, Vol 253, pp. 974-97.


A Comparative Study of Topic Identification on Newspaper.. - Brigitte Bigi Armelle

.... [equation (4) not recoverable from the extracted text] This distance is normalized by its value on an empty cache: [equation (5) not recoverable from the extracted text]. The decision for assigning one or two labels is made by taking the one or two lowest distances (eq. 5). 3.3 The TFIDF Classifier. The TFIDF classifier [7] represents topics as vectors. Each one is characterized by a set of distinct words, the number of words of the topic, and the weight of each word [symbols and weight formula not recoverable from the extracted text]. The weight is defined in terms of the term frequency, i.e. the number of times the ....

G. Salton. Developments in automatic text retrieval. Science, 253:974--980, 1991.


Text Categorization with Support Vector Machines: Learning with.. - Joachims (1997)   (357 citations)

....in the respective category. For Pr(wil ) and Pr(wil ) the so-called Laplace estimator is used [Joachims, 1997]. 4.2 Rocchio Algorithm. This type of classifier is based on the relevance feedback algorithm originally proposed by Rocchio [Rocchio, 1971] for the vector space retrieval model [Salton, 1991]. It has been extensively used for text classification. First, both the normalized document vectors of the positive examples as well as those of the negative examples are summed up. The linear component of the decision rule is then computed as [equation (19) not recoverable from the extracted text]. Rocchio requires that negative ....

Salton, G. (1991). Developments in automatic text retrieval. Science, 253:974-979.
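
The Rocchio step described in the excerpt above (summing normalized positive and negative document vectors, then forming a linear decision component) can be sketched as follows. This is a minimal illustration using the common textbook form of the rule, not the excerpt's own equation (19), which did not survive extraction. The function names, the `beta` weighting parameter, and the toy vectors are all assumptions introduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rocchio_weights(positives, negatives, beta=1.0):
    """Mean of normalized positives minus beta times the mean of normalized negatives.

    `beta` is a hypothetical knob for the influence of negative examples.
    """
    pos_sum = [sum(col) for col in zip(*(l2_normalize(v) for v in positives))]
    neg_sum = [sum(col) for col in zip(*(l2_normalize(v) for v in negatives))]
    return [
        p / len(positives) - beta * n / len(negatives)
        for p, n in zip(pos_sum, neg_sum)
    ]


# Toy two-dimensional document vectors (made up for illustration).
positives = [[1.0, 0.0], [0.0, 2.0]]
negatives = [[3.0, 0.0]]
w = rocchio_weights(positives, negatives, beta=0.5)
```

A new document would then be scored by the dot product of its normalized vector with `w`; how the threshold is chosen is outside what the excerpt states.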


Unsupervised Document Classification Using Sequential.. - Slonim, Friedman, Tishby (2002)   (12 citations)

....Nonetheless, it is interesting to ask what are the performance using different representations. Although this is not the focus of this work we performed additional experiments regarding this issue. A well known approach in the context of text classification is the tf-idf representation [12]. Specifically, each word count is multiplied by the inverse document frequency of the word, defined by idf(w) = log(|D| / |D(w)|), where |D(w)| is the number of documents in which w occurred. Rich empirical evidence support the benefits of using this representation for text processing applications. ....

G. Salton. Developments in Automatic Text Retrieval. Science, Vol. 253, pages 974-980, 1990.
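
The tf-idf weighting described in the excerpt above (each word count multiplied by the log of the total number of documents over the number of documents containing the word) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `tf_idf`, the natural logarithm, and the toy documents are introduced here and are not taken from the cited paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each word count by idf(w) = log(|D| / |D(w)|).

    `docs` is a list of token lists. The natural log is an assumption;
    the excerpt does not state the base.
    """
    n_docs = len(docs)
    # |D(w)|: number of documents in which word w occurs at least once.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return weighted


# Toy corpus (made up for illustration).
docs = [
    ["text", "retrieval", "text"],
    ["text", "classification"],
    ["vector", "space"],
]
weights = tf_idf(docs)
```

A word that occurs in every document gets weight zero under this definition, which is the usual motivation for the idf factor.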


Classifying Financial News with Neural Networks - Calvo (2001)   (1 citation)

....application, the data set, and the performance measures. In section 4 we describe the results, and in section 5, the conclusions and comments on future work. 2. Vector models and Classification. Vector models have been successful in information retrieval (IR) and text classification (TC) Salton [7], Salton [8]. Vector models are based on the basic assumption that a document can be represented as a vector, dismissing the order of words and other grammatical issues, and that this representation is able to retain enough useful information. In order to reduce the number of distinct terms in ....

Salton, G. (1991). Developments in automatic text retrieval. Science , 253, 974--979.


Intelligent Document Classification - Calvo, Ceccatto (2000)

....predefined categories to text documents. Most techniques used to tackle this problem are based on the assumption that a document can be represented as a vector, dismissing the order of words and other grammatical issues, and that this representation is able to retain enough useful information[8, 9]. Thus, document classification can be thought of as a problem of mapping the vector space corresponding to the input documents to the space of output classes, which allows the use of standard statistical classification methods[1, 2] and machine learning techniques to solve it. The dimension of ....

....and present the results obtained. Finally, in the last section (Section 5) we draw some conclusions. 2 Vector Models, Weighting and Feature Selection. As stated in the introduction, most statistical techniques used in TC are based on the assumption that a document can be represented as a vector[8, 9]. Figure 2 shows how a vector model can be constructed in a simple toy example.

Term          D1  D2  D3
Instituto      1   0   0
Fisica         1   1   1
Rosario        1   0   0
Laboratorio    0   1   1
queda          0   0   1
piso           0   0   1

Doc 1. Instituto de Fisica Rosario (classes C1, C2)
Doc 2. Laboratorio de Fisica (classes C1, C3)
Doc 3. El ....

G. Salton. Developments in automatic text retrieval. Science, 253:974--979, 1991.
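
The toy vector model quoted in the excerpt above can be reproduced for the two documents whose text survives in full (Doc 1 and Doc 2; Doc 3 is truncated in the excerpt and is left out). The sketch below is only an illustration: the function name, the whitespace tokenization, and the one-word stop list `{"de"}` are assumptions made here, chosen because "de" does not appear among the terms in the excerpt's figure.

```python
def term_document_matrix(docs, stop_words):
    """Build a binary term-by-document matrix, ignoring word order.

    Returns the vocabulary (in order of first appearance) and one row per
    term, with a 1 where the term occurs in the document and 0 otherwise.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok.lower() not in stop_words and tok not in vocab:
                vocab.append(tok)
    matrix = [[1 if term in tokens else 0 for tokens in tokenized] for term in vocab]
    return vocab, matrix


# Doc 1 and Doc 2 as quoted in the excerpt.
docs = ["Instituto de Fisica Rosario", "Laboratorio de Fisica"]
vocab, matrix = term_document_matrix(docs, stop_words={"de"})
```

Each row of `matrix` corresponds to a term and each column to a document, matching the D1 and D2 columns of the excerpt's example.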


Visualization Of Knowledge Structures - Chen (2002)

....section, we include a number of useful techniques for detecting and extracting salient elements from unstructured text. Interrelationships between these salient elements will form the basis for the visualization of knowledge structures. 2.1. Information Retrieval Models. The vector space model [27], originally developed for information retrieval, is a widely used framework for indexing documents based on term frequencies. In this model, each document d is represented by a vector V of terms t s. Terms are weighted to indicate how important they are in representing the document. The distance ....

....their relative prevalence in the collection. Thematic peaks and valleys in ThemeView produce a simplified representation of the complex content of a document corpus. Figure 7. Valleys and peaks in ThemeView. (Pacific Northwest National Laboratory) The SPIRE used the classic vector space model [27]. The greatest advantage of the ThemeView approach over the use of traditional high dimensional vector spaces is that the user is able to establish connections easily between the construction and the final visualization. In particular, their procedure usually results in 300-500 nouns to be ....

Salton, G. Developments in automatic text retrieval. Science, 253. 974-980.


A Probabilistic Analysis of the Rocchio Algorithm with .. - Thorsten Joachims..

No context found.

G. Salton, "Developments in Automatic Text Retrieval", Science, Vol. 253, pages 974-979, 1991.


The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)   (17 citations)

No context found.

Salton, G. (1991). Developments in automatic text retrieval. Science, 253:974-979.


Usable Adaptive Hypermedia Systems - Tsandilas And Schraefel (2004)

No context found.

G. Salton, "Developments in automatic text retrieval", Science, 253, pp. 974-979, 1991.


SEKT: Semantically Enabled Knowledge Technologies - Data-Driven Change Discovery

No context found.

G. Salton. Developments in automatic text retrieval. 253:974--979, 1991.


Learning Interestingness Measures in Terminology.. - Roche, Azé.. (2004)

No context found.

G. Salton, `Developments in automatic text retrieval', Science, 253, 974--979, (1991).


Automatic Classification of News Articles in Spanish - Beresi, Adeva, Calvo, Ceccatto (2004)

No context found.

Gerald Salton. Developments in automatic text retrieval. Science, Number 253, pages 974-- 979, 1991.


Automatic Categorization of Questions for a Mathematics.. - Williams, Calvo, Bell (2003)

No context found.

Gerald Salton. Developments in automatic text retrieval. Science, (253):974--979, 1991.


Artificial Intelligence 143 (2003) 51--77 - Www Elsevier Com (2001)   (1 citation)

No context found.

G. Salton, Developments in automatic text retrieval, Science 253 (1991) 974--979.


Who Can Claim Complete Abstinence from Peeking at Print Jobs? - Grasso, Meunier (2002)

No context found.

Salton G., Development in Automatic Text Retrieval. Science, 253, 974-979.


Multi-label Semantic Scene Classification - Matthew Boutell Xipeng (2003)   (2 citations)

No context found.

Gerard Salton. Developments in automatic text retrieval. Science, 253:974--980, 1991.


Multilingual Semantic Elaboration in the DOSE platform - Dario Bonino Dip (2004)

No context found.

G. Salton. Developments in automatic text retrieval. Science, 253:974--980, 1991.


DOSE: a Distributed Open Semantic Elaboration Platform - Dario Bonino Fulvio (2004)   (2 citations)

No context found.

G. Salton. Developments in automatic text retrieval. Science, 253:974--980, 1991.


Modelling Syntactic Context in Automatic Term Extraction - Basili, Pazienza, Zanzotto (2001)

No context found.

G. Salton. Development in automatic text retrieval. Science, 253:974-980, 1991.


Document Clustering using Word Clusters - Via The Information

No context found.

G. Salton. Developments in Automatic Text Retrieval. Science, Vol. 253, pages 974--980, 1990.


CiteSeer.IST - Copyright Penn State and NEC