Results 1 - 10 of 767

NewsWeeder: Learning to Filter Netnews

by Ken Lang - in Proceedings of the 12th International Machine Learning Conference (ML95), 1995
"... A significant problem in many information filtering systems is the dependence on the user for the creation and maintenance of a user profile, which describes the user's interests. NewsWeeder is a netnews-filtering system that addresses this problem by letting the user rate his or her interest l ..."
Cited by 561 (0 self)
) principle was able to raise the percentage of interesting articles to be shown to users from 14% to 52% on average. Further, this performance significantly outperformed (by 21%) one of the most successful techniques in Information Retrieval (IR), term-frequency/inverse-document-frequency (tf-idf) weighting

Syskill & Webert: Identifying interesting web sites

by Michael Pazzani, Jack Muramatsu, Daniel Billsus - In Proc. 13th Natl. Conf. on Artificial Intelligence, 1998
"... We describe Syskill & Webert, a software agent that learns to rate pages on the World Wide Web (WWW), deciding what pages might interest a user. The user rates explored pages on a three point scale, and Syskill & Webert learns a user profile by analyzing the information on a page. The user p ..."
Cited by 353 (5 self)
profile can be used in two ways. First, it can be used to suggest which links a user would be interested in exploring. Second, it can be used to construct a LYCOS query to find pages that would interest a user. We compare four different learning algorithms and TF-IDF, a standard information retrieval

Understanding inverse document frequency: On theoretical arguments for IDF

by Stephen Robertson - Journal of Documentation, 2004
"... The term weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon’s Information Theory) seeking to establish some theoretical ba ..."
Cited by 168 (2 self)

Near Duplicate Image Detection: min-Hash and tf-idf Weighting

by James Philbin, Andrew Zisserman
Cited by 130 (1 self)
This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and for retrieval exploits enhanced min-Hash techniques. Standard min-Hash uses an approximate set intersection between document descriptors as a similarity measure. We propose an efficient way of exploiting more sophisticated similarity measures that have proven to be essential in image / particular object retrieval. The proposed similarity measures do not require extra computational effort compared to the original measure. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. The method requires only a small amount of data to be stored for each image. We demonstrate our method on the TrecVid 2006 data set, which contains approximately 146K key frames, and also on the challenging University of Kentucky image retrieval database.

Lost in quantization: Improving particular object retrieval in large scale image databases

by James Philbin, Michael Isard, Josef Sivic, Andrew Zisserman - In CVPR, 2008
"... The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images are characterized using high-dimensional descriptors which are then mapped to “visual words” selected ..."
Cited by 253 (8 self)
space. We describe how this representation may be incorporated into a standard tf-idf architecture, and how spatial verification is modified in the case of this soft-assignment. We evaluate our method on the standard Oxford Buildings dataset, and introduce a new dataset for evaluation. Our results

Probabilistic Models for Information Retrieval based on Divergence from Randomness

by Gianni Amati, Cornelis Joost Van Rijsbergen - ACM Transactions on Information Systems, 2002
"... We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a ra ..."
Cited by 243 (5 self)
Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.

Structure and Content Scoring for XML

by Sihem Amer-Yahia, Divesh Srivastava, Nick Koudas, David Toman, Amélie Marian - in Proc. of the International Conference on Very Large Databases (VLDB), 2005
"... XML repositories are usually queried both on structure and content. Due to structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by scores. Computing answer scores in XML is an active area of research that oscillates between pure content s ..."
Cited by 52 (9 self)
scoring such as the well-known tf*idf and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content

WebMate: A Personal Agent for Browsing and Searching

by Liren Chen, Katia Sycara - In Proceedings of the Second International Conference on Autonomous Agents, 1998
"... The World-Wide Web is developing very fast. Currently, finding useful information on the Web is a time consuming process. In this paper, we present WebMate, an agent that helps users to effectively browse and search the Web. WebMate extends the state of the art in Web-based information retrieval in ..."
Cited by 239 (10 self)
in many ways. First, it uses multiple TF-IDF vectors to keep track of user interests in different domains. These domains are automatically learned by WebMate. Second, WebMate uses the Trigger Pair Model to automatically extract keywords for refining document search. Third, during search, the user can

On a robust document classification approach using TF-IDF scheme with learned, context-sensitive semantics.

by Sushain Pandit
"... Document classification is a well-known task in the information retrieval domain and relies upon various indexing schemes to map documents into a form that can be consumed by a classification system. Term Frequency-Inverse Document Frequency (TF-IDF) is one such class of term-weighting functions used ext ..."

Using TF-IDF to Determine Word Relevance in Document Queries

by Juan Ramos
"... In this paper, we examine the results of applying Term Frequency Inverse Document Frequency (TF-IDF) to determine what words in a corpus of documents might be more favorable to use in a query. As the term implies, TF-IDF calculates values for each word in a document through an inverse proportion of ..."
Cited by 49 (0 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University