19 citations found.
ROCCHIO, J.J., Document Retrieval Systems: Optimization and Evaluation, 1966, National Science Foundation, Harvard Computation Laboratory.


This paper is cited in the following contexts:
Dynamic Term Selection in Learning a Query from Examples - Emilia Stoica David (2000)

....weights. Figures 1-3 show the variation of the average precision with the number of terms in the profile and the curve of term weights for three profiles of the Reuters-21578 collection: oat, dfl, and housing. The profiles were constructed from training examples using Rocchio for term weighting (Rocchio, 1966). After that, they were run as queries on the Reuters collection. Profile oat reaches its maximum average [figure residue: plots of term weight against term rank for the profiles oat and housing] ....

....method behaves well in 18 out of 36 cases, while the Percentage based method behaves well in 23 out of 36 cases. This is another reason to consider the second method more robust than the first one. 6 Related Work. The basic term selection methods proposed in the literature assume a vector space (Rocchio, 1966; Ide, 1971; Salton, 1971) or a probabilistic model (Robertson & Sparck Jones, 1976; Croft & Harper, 1979; Rijsbergen, 1979; Harper, 1980). Regardless of the method, the number of terms in a query is generally fixed; this is either because no benefit has been found in varying the number of terms ....

Rocchio, J. (1966). Document Retrieval Systems--Optimization and Evaluation. Ph.D. thesis, Harvard Computational Laboratory, Cambridge.


Towards Adaptive Web Sites: Conceptual Framework and Case Study - Perkowitz, Etzioni (2000)   (44 citations)

....clustering component which is specialized to cluster mining, but it is possible to adapt traditional clustering algorithms to this problem. In [15] and follow-up work, we compared PageGather's clustering component (both PGCLIQUE and PGCC variants) to two standard algorithms: K-means clustering [18] and hierarchical agglomerative clustering (HAC) [22]. There are literally hundreds of clustering algorithms and variations thereof. We chose K-means as it is particularly fast, and HAC as it is widely used. We found that PageGather's clustering component was faster than both HAC and K-means and ....

J. Rocchio. Document Retrieval Systems --- Optimization and Evaluation. PhD thesis, Harvard University, 1966.


Boosting for Document Routing - Iyer, Lewis, Schapire, Singer.. (2000)   (4 citations)

....judgments are only a coarse reflection. A training algorithm observes these coarse judgments on many training documents and attempts to produce a model estimating the underlying degree of relevance, which is then used to rank documents. Early vector space retrieval relevance feedback algorithms [12, 11] embody this view, emphasizing the construction of a prototypical relevant vector to which similarity can be measured. In recent years, IR researchers working in both frameworks have used increased computing power to search for models that optimize ranking effectiveness on training data [1, 3, ....

....method A wide range of highly tuned learning algorithms for producing ranking functions have been explored in the TREC routing evaluations. Some of the consistently most successful approaches [2, 21] are multipass optimization algorithms initialized using Rocchio's relevance feedback formula [12, 11]. We therefore compared the effectiveness of RankBoost with the latest in this series of algorithms. This version, which we will call Rocchio QZ DFO here, incorporates dynamic feedback optimization [3] and query zoning [18]. The algorithm has 5 parameterized phases and is described in detail ....

J.J. Rocchio. Document Retrieval Systems--Optimization and Evaluation. PhD thesis, Harvard Computational Laboratory, 1966.


Towards adaptive Web sites: Conceptual framework and case study - Perkowitz, Etzioni (2000)   (44 citations)

....from other clusters. Standard clustering algorithms partition the objects into a set of non-overlapping clusters. Clustering algorithms are numerous, but two common techniques are Hierarchical Agglomerative Clustering (HAC) [39], which is popular for its high quality output, and K-means clustering [33], which is particularly fast. A well-known application of clustering algorithms to information browsing is found in Scatter/Gather [8]. The authors present a system which applies hierarchical clustering to the results of database queries; users can interactively browse ....

J. Rocchio, Document retrieval systems---Optimization and evaluation, Ph.D. Thesis, Harvard University, Cambridge, MA, 1966.


Adaptive Web Sites: Concept and Case Study - Perkowitz, Etzioni (2001)   (1 citation)

....we applied COBWEB [3], the leading conceptual clustering algorithm, to the task of index page synthesis. Like IndexFinder, COBWEB produces only clusters that are pure and complete with respect to some concept. Note that, unlike IndexFinder and COBWEB, statistical clustering techniques such as [7, 8] and standard data mining algorithms such as [1] do not yield complete and pure index pages of the sort we want to provide to the Webmaster, so we do not include them in this comparison. In separate experiments, we have shown that PageGather (the statistical cluster mining component of ....

J. Rocchio. Document Retrieval Systems --- Optimization and Evaluation. PhD thesis, Harvard University, 1966.


Towards Adaptive Web Sites: Conceptual Framework and Case Study - Perkowitz, Etzioni (2001)   (44 citations)

....from other clusters. Standard clustering algorithms partition the objects into a set of non-overlapping clusters. Clustering algorithms are numerous, but two common techniques are Hierarchical Agglomerative Clustering [39], which is popular for its high quality output, and K-Means clustering [33], which is particularly fast. A well-known application of clustering algorithms to information browsing is found in Scatter/Gather [8]. The authors present a system which applies hierarchical clustering to the results of database queries; users can interactively browse the cluster hierarchy and ....

J. Rocchio. Document Retrieval Systems --- Optimization and Evaluation. PhD thesis, Harvard University, 1966.


Learning Routing Queries in a Query Zone - Singhal (1997)   (21 citations)

....in predicting relevance of an article [6, 5, 21]. To learn the features and their weights, most routing algorithms usually use the probability of occurrence (or some variation of it) of a feature in the articles marked relevant by a user and the non-relevant articles in the training corpus [22, 14]. The central idea of this scheme is that if a feature occurs with a high probability in the relevant articles but with a low probability in the non-relevant articles, then it is a good indicator of relevance and should be assigned a high weight in the profile. On the other hand, if a feature ....

....indicate the moving away of the query from the non-relevant documents and towards the relevant articles. 2 Rocchio's Algorithm. A feedback query creation algorithm developed by Joe Rocchio in the mid-1960s has, over the years, proven to be one of the most successful profile learning algorithms [22, 23]. Rocchio's algorithm was developed in the framework of the vector space model [27]. The algorithm is based upon the fact that if the relevance for a query is known, an optimal query vector will maximize the average query-document similarity for the relevant articles, and will simultaneously ....

J. Rocchio. Document Retrieval Systems--Optimization and Evaluation. PhD thesis, Harvard Computational Laboratory, Cambridge, MA, 1966.


The Application of Classical Information Retrieval Techniques to.. - James (1995)   (24 citations)

....of the relevant and irrelevant documents in the collection as a whole, the system should now be more able to rank relevant documents highly on its next pass. This process is known as relevance feedback and a number of different strategies for its implementation have been tested. Rocchio [38] proposed a method for relevance feedback in which mean vectors were obtained for each of the sets of retrieved relevant and retrieved non-relevant documents, and a new query obtained by taking a weighted sum of these vectors with the initial query vector. Formally, with initial query Q_0, and ....

J. J. Rocchio Jnr. Document retrieval systems --- optimization and evaluation. PhD thesis, Harvard University, 1966.
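
The feedback step described in the context above (mean vectors for the retrieved relevant and non-relevant documents, combined with the initial query as a weighted sum) can be sketched roughly as follows. This is an illustrative sketch only, not code from any of the cited papers: the function names, the sparse dictionary representation, the default weights alpha, beta and gamma, and the choice to drop terms whose weight is not positive are all assumptions made for the example.

```python
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight mapping (assumed representation)


def mean_vector(docs: List[Vector]) -> Vector:
    """Return the component-wise mean of a list of sparse document vectors."""
    if not docs:
        return {}
    totals: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            totals[term] = totals.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in totals.items()}


def rocchio_update(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,   # weight of the initial query (assumed default)
    beta: float = 0.75,   # weight of the relevant mean (assumed default)
    gamma: float = 0.15,  # weight of the non-relevant mean (assumed default)
) -> Vector:
    """Weighted sum of the initial query and the two mean vectors.

    The non-relevant mean enters with a negative sign, moving the query
    away from non-relevant documents and towards relevant ones.
    """
    rel_mean = mean_vector(relevant)
    non_mean = mean_vector(non_relevant)
    new_query: Vector = {}
    for term in set(query) | set(rel_mean) | set(non_mean):
        weight = (
            alpha * query.get(term, 0.0)
            + beta * rel_mean.get(term, 0.0)
            - gamma * non_mean.get(term, 0.0)
        )
        # Assumption: terms with non-positive weight are dropped from the query.
        if weight > 0:
            new_query[term] = weight
    return new_query
```

Calling `rocchio_update({"oat": 1.0}, relevant_docs, non_relevant_docs)` returns a new sparse query vector; in practice the three weights would be tuned on training data rather than left at these placeholder values.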


Adaptive Web Sites: Automatically Synthesizing Web Pages - Perkowitz, Etzioni (1998)   (56 citations)

....a subsequent ten-day period. There are literally hundreds of clustering algorithms and variations thereof. To compare PageGather with traditional methods, we picked two widely used document clustering algorithms: hierarchical agglomerative clustering (HAC) (Voorhees 1986) and K-Means clustering (Rocchio 1966). HAC is probably the most popular document clustering algorithm, but it proved to be quite slow. Subsequently, we chose K-Means because it is a linear-time algorithm known for its speed. Of course, additional experiments are required to compare PageGather with other clustering algorithms before ....

Rocchio, J. 1966. Document Retrieval Systems --- Optimization and Evaluation. Ph.D. Dissertation, Harvard University.


TextVis: An Integrated Visual Environment for Text Mining - Landau, Feldman.. (1998)   (2 citations)

....document source. It can therefore be subject to further analysis. 3.2. Clustering. Any set of documents (either an original set, or the result of any of the other tools) can be clustered by the system. The user may choose from one of four classical clustering algorithms: Hierarchical, k-means [12], Buckshot, and Fractionation [3]. Following the clustering, the user can select any number of documents, either individual documents or whole clusters, and export them for further analysis. 3.3. Frequent Set Based Tools. Frequent Sets [1] are sets of terms that co-occur frequently in the ....

Rocchio, J. J.: Document retrieval systems -- optimization and evaluation. Ph.D. Thesis, Harvard University, (1966).


Grouper: A Dynamic Clustering Interface to Web Search Results - Zamir, Etzioni (1999)   (55 citations)

....search engines. In earlier work we introduced Suffix Tree Clustering (STC), a fast, incremental, linear-time clustering algorithm that produces coherent clusters [38]. Using traditional IR metrics (e.g. average precision) we compared STC to other fast clustering algorithms (including k-means [25], Buckshot and Fractionation [5]) on the task of clustering results returned by Web search engines, and showed it to be both faster and more precise than previous algorithms. We used the results to argue that post-retrieval clustering of Web search engine results a la STC is feasible. In this ....

....are probably the most commonly used. These algorithms are quadratic in the number of documents and are therefore too slow for our online requirements [34]. Linear-time clustering algorithms are the best candidates to comply with the speed requirement of on-line clustering. These include K-Means [25], Single-Pass [14], Buckshot and Fractionation [5]. In earlier work [38] we have introduced another linear-time algorithm, STC, which will be briefly described in the next section. In contrast to STC, most clustering algorithms treat a document as a set of words and not as an ordered sequence ....

J. J. Rocchio, Document retrieval systems -- optimization and evaluation, Ph.D. Thesis, Harvard University, 1966.


Boosting and Rocchio Applied to Text Filtering - Schapire, Singer, Singhal (1998)   (51 citations)  Self-citation (Rocchio)

....(possibly retrieved by the initial user query) as relevant and some as non-relevant, has been one of the most successful methods in IR. A feedback query creation algorithm developed by Rocchio in the mid-1960s has, over the years, proven to be one of the best relevance feedback algorithms [20, 21]. Rocchio's algorithm was developed in the framework of the vector space model [25]. Rocchio defines an optimal query as the query that maximizes the average query-document similarity for the relevant articles, and simultaneously minimizes the average query-document similarity for the ....

J.J. Rocchio. Document Retrieval Systems--Optimization and Evaluation. PhD thesis, Harvard Computational Laboratory, Cambridge, MA, 1966.


Boosting and Rocchio Applied to Text Filtering - Schapire, Singer, Singhal (1998)   (51 citations)  Self-citation (Rocchio)

....(possibly retrieved by the initial user query) as relevant and some as non-relevant, has been one of the most successful methods in IR [30]. A feedback query creation algorithm developed by Rocchio in the mid-1960s has, over the years, proven to be one of the best relevance feedback algorithms [27, 28]. Rocchio's algorithm was developed in the framework of the vector space model [32]. When documents are to be ranked for a query, an ideal query should rank all the relevant documents above all non-relevant documents. However, such a query might just not exist, or even if it does exist for the ....

J.J. Rocchio. Document Retrieval Systems--Optimization and Evaluation. PhD thesis, Harvard Computational Laboratory, Cambridge, MA, 1966.


E-Business Knowledge Based Information Retrieval - Scarinci, Wives, Loh.. (2002)

No context found.

ROCCHIO, J.J., Document Retrieval Systems: Optimization and Evaluation, 1966, National Science Foundation, Harvard Computation Laboratory.


Using Structured P2P Overlay Networks to Build Content.. - Stacey, Berry, Coyle

No context found.

J. J. Rocchio, Document retrieval systems - optimization and evaluation. Ph.D. Thesis, Harvard University, 1966.


The Role of Semantic Locality in Hierarchical Distributed.. - Bouskila (1999)   (5 citations)

No context found.

J. J. Rocchio, "Document retrieval systems -- optimization and evaluation," PhD thesis, Harvard University, 1966. Report ISR-10 to National Science Foundation, Harvard Computation Laboratory.


Improving Document Transformation Techniques with Collaborative.. - Klink (2004)

No context found.

Joseph J. Rocchio. Document Retrieval Systems - Optimization and Evaluation. Ph.D. Thesis, Harvard Computational Laboratory, Cambridge, MA, March 1966.


A Parallel Relational Database Management System.. - Lundquist.. (1999)   (5 citations)

No context found.

Rocchio, Jr., J. J., "Document Retrieval Systems - Optimization and Evaluation," Ph.D. Thesis, Harvard University, March 1966.


Web Document Clustering: A Feasibility Demonstration - Zamir, Etzioni (1998)   (77 citations)

No context found.

J. J. Rocchio, Document retrieval systems - optimization and evaluation. Ph.D. Thesis, Harvard University, 1966.


CiteSeer.IST - Copyright Penn State and NEC