Results 1 - 10 of 1,183
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
, 1997
"... The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the ..."
Cited by 456 (1 self)
... The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are more well-founded, but also because they achieve better performance.
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
, 2003
"... The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixe ..."
Cited by 453 (0 self)
, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics
A classification of schema-based matching approaches
- JOURNAL ON DATA SEMANTICS
, 2005
"... Schema/ontology matching is a critical problem in many application domains, such as, semantic web, schema/ontology integration, data warehouses, e-commerce, catalog matching, etc. Many diverse solutions to the matching problem have been proposed so far. In this paper we present a taxonomy of schema- ..."
Cited by 386 (21 self)
-based matching techniques that builds on the previous work on classifying schema matching approaches. Some innovations are in introducing new criteria which distinguish between matching techniques relying on diverse semantic clues. In particular, we distinguish between heuristic and formal techniques
Training Algorithms for Linear Text Classifiers
, 1996
"... Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides pe ..."
Cited by 276 (12 self)
Boosting and Rocchio Applied to Text Filtering
- In Proceedings of ACM SIGIR
, 1998
"... We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that associates cost (or gain) for each pair of machine prediction and correct label. We first show that ..."
Cited by 113 (2 self)
that AdaBoost significantly outperforms another highly effective text filtering algorithm. We then compare AdaBoost and Rocchio over three large text filtering tasks. Overall both algorithms are comparable and are quite effective. AdaBoost produces better classifiers than Rocchio when the training
Support vector machines for spam categorization
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 1999
"... We study the use of support vector machines (SVM’s) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features ..."
Cited by 342 (2 self)
Methods and Metrics for Cold-Start Recommendations
- PROCEEDINGS OF THE 25TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
, 2002
"... We have developed a method for recommending items that combines content and collaborative data under a single probabilistic framework. We benchmark our algorithm against a naive Bayes classifier on the cold-start problem, where we wish to recommend items that no one in the community has yet rated. We ..."
Cited by 330 (7 self)
Feature Subset Selection Using A Genetic Algorithm
, 1997
"... Practical pattern classification and knowledge discovery problems require selection of a subset of attributes or features (from a much larger set) to represent the patterns to be classified. This is due to the fact that the performance of the classifier (usually induced by some learning algorithm) ..."
Cited by 279 (7 self)
A comparison of classifiers and document representations for the routing problem
- ANNUAL ACM CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL - ACM SIGIR
, 1995
"... In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant ..."
Cited by 196 (2 self)
discriminant analysis, logistic regression, and neural networks. We demonstrate that the classifiers perform 10-15% better than relevance feedback via Rocchio expansion for the TREC-2 and TREC-3 routing tasks.
Error minimization is difficult in high-dimensional feature spaces because the convergence process