Brian Amento, Loren G. Terveen and William C. Hill. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of SIGIR 2000.
....link is highly recommended and therefore an authority. The authoritativeness of a document is even higher if the documents that link to it are some sort of authorities themselves. HITS and PageRank (or variants of those) are reported to be successful in numerous studies of retrieval quality [18, 3, 1, 25, 5]; however, when measuring relevance only, without taking quality or authoritativeness into account, HITS and PageRank based methods have not been able to improve retrieval effectiveness for Ad Hoc search tasks [15, 13, 20, 31, 8]. Davison [11] shows that the topic locality assumption holds: when ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Croft et al. [9], pages 296--303.
....containing what the user wants that is named in the query. Topic distillation is new, and is concerned with locating the most useful pages (out of many) that best and comprehensively describe a user's topic, either by content or via links. Previous investigations on topic distillation such as [2,3,4,5] mostly tie the process to quality identification. They employed Kleinberg's HITS [6] algorithm as the primary method, and added content weighting as a secondary improvement. Authority and hub pages found were identified with topic distillation answers. In this experiment, we employ page content ....
Amento, B, Terveen, L & Hill, W (2000). Does "authority" mean quality? Predicting expert quality ratings of web documents. Proc. ACM SIGIR 2000, pp.296-303.
....of the need may be available. Similarity between the short or long description and each crawled page may be used to judge the page's relevance [15, 21]. 3. Similarity to seed pages: The pages corresponding to the seed URLs are used to measure the relevance of each page that is crawled [2]. The seed pages are combined together into a single document and the cosine similarity of this document and a crawled page is used as the page's relevance score. 4. Classifier score: A classifier may be trained to identify the pages that are relevant to the information need or task. The training ....
....score [21]. 6. Link based popularity: One may use algorithms such as PageRank [5] or HITS [16] that provide popularity estimates of each of the crawled pages. A simpler method would be to use just the number of in-links to the crawled page to derive similar information [10, 2]. Many variations of link based methods using topical weights are choices for measuring topical popularity of pages [4, 7]. 4.2 Summary Analysis. Given a particular measure of page importance we can summarize the performance of the crawler with metrics that are analogous to the information ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proc. 23rd Annual Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2000.
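The "similarity to seed pages" measure quoted in the excerpt above (combine the seed pages into a single document, then take the cosine similarity with a crawled page) could be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, and the function names `term_vector`, `cosine`, and `seed_similarity` are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Assumed representation: lowercased whitespace tokens with raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page has no similarity to anything
    return dot / (norm_a * norm_b)


def seed_similarity(seed_pages, crawled_page):
    """Score a crawled page against the seed pages combined into one document."""
    combined = term_vector(" ".join(seed_pages))
    return cosine(combined, term_vector(crawled_page))


if __name__ == "__main__":
    seeds = ["web crawler evaluation", "topical web crawler"]
    print(seed_similarity(seeds, "evaluation of a topical crawler"))
    print(seed_similarity(seeds, "cooking recipes"))
```

A real crawler would more likely use TF-IDF weights and proper tokenization; raw counts are used here only to keep the sketch short.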
....we leave this to future work. In other future work, we would like to test the techniques presented here on many more questions, and further refine the classification described in [4] by training on larger data sets. Further, we would like to investigate whether page ranking algorithms as in [2] are appropriate for task questions. 5. ACKNOWLEDGMENTS We would like to acknowledge the assistance of Morris Tang. This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant #IIS 9907018 and the Advanced Research and Development Activity grant ....
Amento, B., Terveen, L., and Hill, W. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proceedings of SIGIR '00, 296-303.
....entropy [22] is applied on the term document matrix which is generated from the term extraction module to calculate the entropy. By definition, the entropy $E$ can be expressed as $E = -\sum_{i=1}^{n} p_i \log p_i$, where $p_i$ is the probability of event $i$ and $n$ is the number of events. By normalizing the weight of a term to be in [0, 1], the entropy of term $T$ is: $E(T) = -\sum_{e} w_e \log_2 w_e$ (1), in which $w_e$ is the value of normalized term frequency. $w_e$ is an entry in the term document matrix that represents the weight of a term in a page, i.e. a normalized form of the term frequency of the term in page $j$. To normalize the entropy value of a term to the ....
....entropy of term $T$ is: $E(T) = -\sum_{e} w_e \log_2 w_e$ (1), in which $w_e$ is the value of normalized term frequency. $w_e$ is an entry in the term document matrix that represents the weight of a term in a page, i.e. a normalized form of the term frequency of the term in page $j$. To normalize the entropy value of a term to the range [0, 1], the base of the logarithm is chosen to be the number of pages. Equation (1) thus becomes: $E(T) = -\sum_{e} w_e \log_n w_e$, where $n = |D|$ and $D$ is the set of pages (2). We then define the entropy of anchor $AN_i$ as the average entropy of all terms in $AN_i$: $E(AN_i) = \frac{1}{k} \sum_{j=1}^{k} E(T_j)$, where $T_1, T_2, \ldots, T_k$ are the terms in anchor $AN_i$ (3) ....
B. Amento, L. Terveen, and W. Hill. Does "Authority" Mean Quality? Predicting Expert Quality Ratings of Web Documents. Proc. of 23rd ACM SIGIR Conf. on Research and Development in Information Retrieval, 2000.
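The term and anchor entropy described in the excerpt above (entropy of a term with the logarithm base set to the number of pages, then averaged over the terms of an anchor) might be computed along these lines. This is a hedged sketch only: the excerpt's exact definition of the normalized weight is lost, so the code assumes each weight is the term's frequency in one page divided by its total frequency over all pages. The function names `term_entropy` and `anchor_entropy` are hypothetical.

```python
import math


def term_entropy(term_freqs):
    """Entropy of one term, with log base n = number of pages.

    term_freqs holds the term's frequency in each page. The normalization
    w = tf / sum(tf) is an assumption made for this sketch.
    """
    n = len(term_freqs)
    total = sum(term_freqs)
    if n < 2 or total == 0:
        return 0.0  # log base 1 is undefined; an absent term carries no entropy
    entropy = 0.0
    for tf in term_freqs:
        if tf > 0:  # 0 * log(0) is treated as 0
            w = tf / total
            entropy -= w * math.log(w, n)
    return entropy


def anchor_entropy(freqs_per_term):
    """Average entropy over the terms of one anchor."""
    if not freqs_per_term:
        return 0.0
    return sum(term_entropy(f) for f in freqs_per_term) / len(freqs_per_term)


if __name__ == "__main__":
    # One term spread evenly over four pages, one concentrated in a single page.
    print(term_entropy([1, 1, 1, 1]))
    print(term_entropy([5, 0, 0, 0]))
    print(anchor_entropy([[1, 1, 1, 1], [5, 0, 0, 0]]))
```

Under this assumed normalization, a term spread evenly over all pages scores close to 1 and a term confined to one page scores 0, which matches the stated [0, 1] range.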
....often the impact or the significance of a paper is measured by the simple counts of citation. Second, just as the number of citations of a paper does not include all factors in judging the importance of a paper, the in-degree of a webpage does not necessarily truly reflect the quality of a webpage [2]. Indeed, ranking of webpages and of scientific papers can not be decided entirely objectively. One could easily list many other factors. Here we proceed by briefly discussing two exceptions to the generic rule of in-degree ranking in relation to HITS: a) Highly HITS ranked authority webpages, but ....
B. Amento, L. Terveen, and W. Hill. Does authority mean quality? Predicting expert quality ratings of web documents. Proc. ACM Conf. on Research and Develop. in Info. Retrieval (SIGIR'00), pages 296-303, 2000.
....of humans and other Web agents than humans themselves. Thus it is quite reasonable to explore crawlers in a context where the parameters of crawl time and crawl distance may be beyond the limits of human acceptance imposed by user based experimentation. Our analysis of the crawler literature [1, 2, 4, 6, 8, 11, 10, 17, 18, 29, 37] and our own experience [21, 24, 25, 26, 22, 31, 32, 27] indicate that in general, when embarking upon an experiment comparing crawling algorithms, several critical decisions are made. These impact not only the immediate outcome and value of the study but also the ability to make comparisons with ....
....that use link based criteria. Lexical measures show a range of sophistication. Cho et al. [12] explore a rather simple measure: the presence of a single word such as "computer" in the title or above a frequency threshold in the body of the page is enough to indicate a relevant page. Amento et al. [2] compute similarity between a page's vector and the centroid of the seed documents as one of their measures of page quality. Chakrabarti et al. apply classifiers built using positive and negative example pages to determine page importance [11]. Aggarwal et al. adopt a more generic framework that ....
B Amento, L Terveen, and W Hill. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proc. 23rd ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 296--303, 2000.
....ranking scheme that can be used to rank search results. The HITS algorithm identifies, for a given search query, a set of authority pages and a set of hub pages. A complete performance evaluation of these link based techniques is beyond the scope of this paper. The interested reader is referred to [3] for a comparison of the performance of different link based ranking techniques. There are many interesting research directions to be explored. One promising direction is to study how other sources of information, like query logs and click streams, can be used for improving Web searching. Another ....
B. Amento, L. Terveen, and W. Hill. Does authority mean quality? Predicting expert quality ratings of web documents. In Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2000.
....the effectiveness of a link method and a non link method, followed by discussion and conclusions (Sections 5-7). 2. THE SITE FINDING PROBLEM. It is difficult to precisely and uncontroversially define the term "Web site". Rather than introduce a new definition, we adopt one by Amento et al. [1]: "A site (multimedia document) is an organized collection of pages on a specific topic maintained by a single person or group". A site is not the same thing as a domain: for example, thousands of sites are hosted on www.geocities.com. We note that the topic of a site might be quite ....
....is highly recommended, and should be ranked more highly. This can be based on a simple link count [4] or on a calculation of page weights by iterative propagation [3, 8]. A recent study found that link recommendation methods do a good job of picking high quality items, as judged by human experts [1]. Judgments of quality may be necessary to properly evaluate recommendation methods, such as those used by Bharat and Henzinger [2]. However, reliably defining and measuring different aspects of quality, and assessing their importance to end users, is a new endeavor [13]. The topic locality ....
Brian Amento, Loren G. Terveen, and William C. Hill. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of SIGIR 2000.
....gold standard that represents a relevant page. For instance in [6] the authors explore a rather simple version of this measure: the presence of a single word such as "computer" in the title or above a frequency threshold in the body of the page is enough to indicate a relevant page. Amento et al. [2] compute similarity between a page's vector and the centroid of the seed documents as one of their measures of page quality. Chakrabarti et al. apply classifiers built using positive and negative example pages to determine page importance [5]. In our own research we have used as the gold standard ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 296--303, 2000.
....less obvious but larger long term gains. Their solution is to build classifiers that can assign pages to different classes based on the expected link distance between the current page and relevant documents. The area of crawler quality evaluation has also received much attention in recent research [2, 9, 8, 19, 4]. For instance, many alternatives for assessing page importance have been explored, showing a range of sophistication. Cho et al. [9] use the simple presence of a word such as "computer" to indicate relevance. Amento et al. [2] compute the similarity between a page and the centroid of the seeds. ....
....has also received much attention in recent research [2, 9, 8, 19, 4]. For instance, many alternatives for assessing page importance have been explored, showing a range of sophistication. Cho et al. [9] use the simple presence of a word such as "computer" to indicate relevance. Amento et al. [2] compute the similarity between a page and the centroid of the seeds. In fact, content-based similarity assessments form the basis of relevance decisions in several examples of research [8, 19]. Others exploit link information to estimate page relevance with methods based on in-degree, out-degree, ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proc. 23rd ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 296--303, 2000.
....is crucial. In previous work [17] we proposed an evaluation framework and tested a number of methods to evaluate the performance of such crawlers in a fair way and under limited memory resources. There are two limitations in our previous research. First, as in the case of other studies (see e.g. [5, 1]), some of our metrics of page quality (i.e. relevance) relied on the similarity between the crawled pages and the seed pages. This in a sense simplified the crawl problem because a crawler would start from the most relevant pages. Thus our first goal here is to assess crawlers within a more ....
....Henzinger [3] experiment with 28 topics but do not differentiate between these. However, they do present results for the full topic set and for two special subsets: rare and popular topics as determined by the retrieval set size from AltaVista (analogous to our lexical generality). Amento et al. [1] experiment on a set of five topics that are somewhat homogeneous in that they are all representative of popular entertainment topics. Menczer and Belew [16] test two crawlers (InfoSpiders and BestFirst) on topics from a limited Encyclopaedia Britannica (EB) corpus and analyze the dependence of ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proc. 23rd ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 296--303, 2000.
....less obvious but larger long term gains. Their solution is to build classifiers that can assign pages to different classes based on the expected link distance between the current page and relevant documents. The area of crawler quality evaluation has also received much attention in recent research [2, 9, 8, 24, 4]. For instance, many alternatives for assessing page importance have been explored, showing a range of sophistication. Cho et al. [9] use the simple presence of a word such as "computer" to indicate relevance. Amento et al. [2] compute the similarity between a page and the centroid of the seeds. ....
....has also received much attention in recent research [2, 9, 8, 24, 4]. For instance, many alternatives for assessing page importance have been explored, showing a range of sophistication. Cho et al. [9] use the simple presence of a word such as "computer" to indicate relevance. Amento et al. [2] compute the similarity between a page and the centroid of the seeds. In fact, content-based similarity assessments form the basis of relevance decisions in several examples of research [8, 24]. Others exploit link information to estimate page relevance with methods based on in-degree, out-degree, ....
B Amento, L Terveen, and W Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 296--303, 2000.
....similarity. Similarity based measures show a range of sophistication. Cho et al. [7] explore a rather simple similarity measure: the presence of a single word such as "computer" in the title or above a frequency threshold in the body of the page is enough to indicate a relevant page. Amento et al. [1] compute similarity between a page's vector and the centroid of the seed documents as one of their measures of page quality. Similarity as measured using page text [3] or the words surrounding a link [6] may also be used to augment what are primarily link based measures. Chakrabarti et al. take a ....
....built using positive and negative example pages to determine page importance [6]. We also explore classifiers for page evaluation; however, there are differences as described below. In-degree, out-degree, PageRank, hubs and authorities are the more commonly used link based page importance measures [1, 2, 3, 6, 7]. For example, Cho et al. consider pages with PageRank above a specified threshold as being relevant to the query [7]. Kleinberg's recursive notion of hubs and authorities [13] has been extended by several others. For example, edge weights are considered important [6] and so are edges that connect ....
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 296--303, 2000.
Brian Amento, Loren G. Terveen, and William C. Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Research and Development in Information Retrieval, pages 296--303, 2000.
Amento, B., Terveen, L., and Hill, W. Does "Authority" Mean Quality? Predicting Expert Quality Ratings of Web Documents, in Proceedings of SIGIR'2000 (Athens, Greece, July 2000), ACM Press.
....model. The algorithm assumes no knowledge of the model, and is well defined regardless of the model's accuracy. 1. Introduction. Kleinberg's seminal paper [20] on hubs and authorities introduced a natural paradigm for classifying and ranking web pages, setting off an avalanche of subsequent work [7, 8, 10, 15, 22, 9, 3, 12, 19, 2, 5, 27, 1, 13]. Kleinberg's ideas were implemented in HITS as part of the CLEVER project [7, 10]. Around the same time, Brin and Page [6, 25, 18] developed a highly successful search engine, Google [17], which orders search results according to ....
Brian Amento, Loren G. Terveen, and William C. Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Research and Development in Information Retrieval, pages 296--303, 2000.
Brian Amento, Loren Terveen, and Will Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 296--303, Athens, Greece, July 2000.
B. Amento, L. G. Terveen, and W. C. Hill. Does "authority" mean quality? Predicting expert quality ratings of web documents. In Proceedings of SIGIR'00, pages 296--303, 2000.
B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? predicting expert quality ratings of web documents. In Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 296-303, 2000.