A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for .. - Joachims (1996) (123 citations)
used for text categorization. Both are based on the bag-of-words representation developed in information
up the other group of algorithms. Based on the same bag-of-words representation, these algorithms use
classifier and offers an explanation for the TFIDF word weighting heuristic. The Rocchio classifier, its www.cs.jhu.edu/~sheppard/cs.605.754/papers/paper3a.ps.gz
A Comparison of Two Learning Algorithms for Text Categorization - Lewis, Ringuette (1994) (94 citations)
Many text categorization systems treat a text as a "bag of words" ignoring the original ordering of the
categories to future documents based on, say, the words they contain. Such an approach can save
categorization systems treat a text as a "bag of words" ignoring the original ordering of the text and www.cs.cmu.edu/afs/cs.cmu.edu/user/mnr/www/papers/categ.ps
Information Extraction from HTML: Application of a General.. - Freitag (1998) (65 citations)
to treat each fragment as a mini-document and use a bag-of-words technique, as in document classification.
each fragment as a mini-document and use a bag-of-words technique, as in document classification. There
themselves, such as length (e.g. single character word) word punctuationp sentence punctuation p www.cs.cmu.edu/~dayne/ps/webie.ps.Z
Efficient Filtering of XML Documents for Selective.. - Altinel, Franklin (2000) (47 citations)
typically rely on simple keyword matching or "bag of words" information retrieval techniques. The
systems typically use simple keyword matching or "bag of words" Information Retrieval (IR) techniques to
rely on simple keyword matching or "bag of words" information retrieval techniques. The advent of www.cs.umd.edu/users/altinel/./pub/vldb00.pdf
Bag of Words and Word Positions - Cohen (1995) (39 citations)
instead a document is treated as an unordered "bag of words". However, many modern information
De Raedt, editor, Advances in ILP. IOS Press, 1995. Bag of Words and Word Positions William W. Cohe
ILP-95, Leuven properties as the order of words in a document. ILP methods offer a potential for portal.research.bell-labs.com/orgs/ssr/people/wcohen/postscript/ilp.ps
Data mining for hypertext: A tutorial survey - Chakrabarti (2000) (24 citations)
terms. Therefore they are collectively called bag-of-words models. 2.2 Models for hypertext
documents. 3.1.1 Naive Bayes classification The bag-of-words models readily enable naive Bayes
of text search, such as synonymy, polysemy (a word with more than one meaning) and context www.cs.berkeley.edu/~soumen/doc/kddexp2000/survey.ps
Estimating the Generalization Performance of an SVM Efficiently - Joachims (2000) (24 citations)
and the classification task. Here the so called "bag-of-words" representations following the setup of
the classification task. Here the so called "bag-of-words" representations following the setup of Joachims
the setup of Joachims (1998) is used. Each distinct word corresponds to a feature with its TFIDF score www-ai.cs.uni-dortmund.de/DOKUMENTE/joachims_00a.ps.gz
The Role of Unlabeled Data in Supervised Learning - Mitchell (1999) (24 citations)
of times the word occurs in the web page. This "bag of words" representation obviously ignores
a web page can be achieved by considering just the words on the web page. Alternatively, the page can be
the page can be classified using only the words on hyperlinks that point to the web page (e.g. www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/mitchell-pubs/iccs99.ps
Relational Learning with Statistical Predicate Invention.. - Craven, Slattery (2001) (21 citations)
we describe the commonly used set-of-words and bag-of-words representations for learning text classi
for learning text classifiers. We describe the use of bag-of-words with the Naive Bayes algorithm, which is
allows it to characterize text in terms of word frequencies, whereas its relational component is www.cs.cmu.edu/~sean/papers/mlj.ps
Content-Based Book Recommending Using Learning for Text.. - Mooney, Roy (2000) (21 citations)
in each slot is then processed into an unordered bag of words (tokens) and the examples represented as
and the examples represented as a vector of bags of words (one bag for each slot). A book's title
ratings) compared to the thousands of distinct words present in even short descriptive texts (e.g. ftp.cs.utexas.edu/pub/mooney/papers/libra-dl-00.ps.gz
A Machine Learning Approach to Building.. - McCallum, Nigam.. (1999) (20 citations)
Thus our Q-function becomes a mapping from a "bag-of-words" to a scalar. We represent the mapping
the relevant distinctions between actions as the words found in the neighborhood of the corresponding
our Q-function becomes a mapping from a "bag-of-words" to a scalar. We represent the mapping using a www.cs.cmu.edu/People/mccallum/papers/cora-ijcai99.ps.gz
Text Segmentation Using Exponential Models - Beeferman, Berger, Lafferty (1997) (19 citations)
is more subtle than can be recognized by only a "bag of words" tokenizer and morphological filter. Word
disposal is a collection of online data: 38 million words of Wall Street Journal archives and another
Research Laboratories. 150 million words from selected news broadcasts, annotated with the www.informedia.cs.cmu.edu/html/../documents/BEEFERMAN-emnlp.pdf
A Non-Invasive Learning Approach to Building Web User Profiles - Chan (1999) (18 citations)
how phrases can be identified to enrich the common bag-of-words representation for documents (Section
document is usually not retained and hence the name "bag-of-words." However, much research focuses on
can be identified to enrich the common bag-of-words representation for documents (Section 2.2). We cs.fit.edu/~pkc/papers/webkdd99.ps
Text Classification Using WordNet Hypernyms - Scott, Matwin (1998) (17 citations)
Rules are also produced with the commonly used bag-of-words representation, incorporating no
text. This text representation, referred to as the bag-of-words, is used in most typical approaches to
Text Classification Using WordNet Hypernyms Sam Scott Computer Science Dept. www.ai.sri.com/~harabagi/coling-acl98/acl_work/sscot.ps.gz