Harman, D. (1992). Ranking Algorithms. In Information Retrieval Data Structures & Algorithms. W. B. Frakes and R.
....will also match the content being shared against the concepts (ontological classes) in the community's ontology. Each ontological class is characterized by a set of terms (keywords and phrases) and the shared information is matched against each concept using the vector cosine ranking algorithm [Harman 1992]. The system then suggests to the sharer a set of concepts to which the information could be assigned. The user is then able to accept the system recommendation or to modify it by suggesting alternative or additional concepts to which the document should be assigned. 2.2 Ontological ....
D. Harman. Ranking Algorithms, in W. Frakes & R. Baeza-Yates. Information Retrieval, Prentice-Hall, USA, 1992.
....one in terms of its structural position. The final weight of a term is composed by multiplying the two estimates. The first component evaluates the weight as if the term was from an unstructured document. It has been shown that the most important measure is the Inverse Document Frequency (IDF) [20, 9]. We have adopted the IDF definition in [5], which for a given term i is $\mathrm{IDF}_i = \log_2 (N / n_i)$ (3), where N is the number of XML documents in the collection and $n_i$ is the number of occurrences of the term i in the collection. Based on the above measure, the weight of the term i in the XML document ....
D. K. Harman. Ranking Algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates (Eds), Prentice-Hall, Englewood Cliffs, N.J., pp. 363-392 (1992)
....and fast linguistic analysis tool is represented in [8] (SMES). SMES is a linguistic tool for the German language and consists of lexical, morphological and syntactical analysis. It can extract linguistically annotated word lists and also linguistic relations between words and word phrases. Ranking. [6] gives an overview of ranking algorithms. It describes several ranking aspects in the IR research area, including a guide to selecting ranking techniques. [2] gives a survey of general combining ranking algorithms. A ranking approach for structural data using the probabilistic model is XPRES [9] ....
D. Harman. Ranking Algorithms. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval --- Data Structures & Algorithms, pages 363--392. Prentice Hall PTR, New Jersey, USA, 1992.
....and returned to the user. In our experiments term weights were calculated using the IDF rule $w_{d,t} = f_{d,t} \cdot \log(N / f_t)$, where $f_{d,t}$ is the number of appearances of term t in document d, N is the number of documents in the collection, and $f_t$ is the number of documents that contain term t. Harman [3] gives a summary of ranking techniques and a discussion of the cosine measure and IDF weighting rule. To support the cosine measure in an inverted file text database system, each inverted file entry contains a sequence of $(d, f_{d,t})$ pairs for some term t. The value $f_t$, the number of documents ....
....set of approximately L candidate answers, but in a different permutation, so when the top r documents are extracted from this set and returned, different retrieval effectiveness can be expected, particularly if r L. A similar strategy to that of continue is described by Harman and Candela [3, 4], although their motivation is somewhat different: they desire a small number of accumulators to reduce the sorting time required for a total ranking of the collection, whereas we assume that only a small fraction of the collection is to be presented. In this latter case, to find the top r ....
D. Harman. Ranking algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 14, pages 363--392. Prentice-Hall, 1992.
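The IDF weighting rule quoted in the first context above, read here as w_{d,t} = f_{d,t} * log(N / f_t), can be illustrated with a minimal Python sketch. The function name `idf_weights` and the representation of documents as token lists are assumptions made for illustration; they do not come from any of the cited papers.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return one dict per document mapping term -> f_dt * log(N / f_t).

    docs is a list of token lists (a hypothetical input format).
    N is the number of documents, f_t the number of documents
    containing term t, and f_dt the count of t within document d.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document to obtain f_t.
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)  # f_dt for every term in this document
        weights.append(
            {t: f * math.log(n_docs / doc_freq[t]) for t, f in counts.items()}
        )
    return weights
```

Under this rule a term that occurs in every document receives weight zero, since log(N / f_t) = log(1) = 0.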
....$w_t = \log(N / f_t)$ where N is the number of documents stored the inverse document frequency (IDF) rule. With these assignments, the cosine similarity measure can be calculated as C q,d = log f t (1). There are many alternative mechanisms for assigning term and term document weights [7], but most of them can be calculated using the same framework. Throughout our experiments we assumed the formulation given in equation (1). A simple way to compute the cosine measure is as follows. First, for each document in the database an accumulator is created in which the term dependent ....
D.K. Harman. Ranking algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 14, pages 363--392. Prentice-Hall, 1992.
....of documents in the collection that contain t; take into account the number of times $f_{d,t}$ that term t appears in document d; and use the length $W_d$ (according to some metric) of document d for normalisation. In our experiments we have used the cosine measure with logarithmic in-document frequency [8], which is one of the most effective similarity measures. In this method the similarity C(q,d) of query q and document d in a collection of N documents is given by $C(q,d) = \frac{\sum_{t \in q \cap d} w_{q,t} \, w_{d,t}}{\sqrt{\sum_{t \in q} w_{q,t}^2 \cdot \sum_{t \in d} w_{d,t}^2}}$ where $w_{d,t} = \log(f_{d,t} + 1)$ and $w_{q,t} = \log(f_{q,t} + 1) \cdot \log(N / f_t + 1)$. In ....
D.K. Harman. Ranking algorithms. In Frakes and Baeza-Yates
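A small Python sketch of the cosine measure with logarithmic in-document frequency may make the context above more concrete. It assumes the weights are read as w_{d,t} = log(f_{d,t} + 1) and w_{q,t} = log(f_{q,t} + 1) * log(N / f_t + 1), with the usual cosine normalisation by the two vector lengths. The function name `cosine_log` and its arguments are hypothetical, and the extracted formula is damaged, so this reading is an assumption.

```python
import math
from collections import Counter


def cosine_log(query, doc, doc_freq, n_docs):
    """Cosine similarity of a query and a document (both token lists).

    doc_freq maps term -> f_t (number of documents containing the term);
    n_docs is N. Query terms absent from the collection are ignored.
    """
    f_q = Counter(query)
    f_d = Counter(doc)
    # Query weights combine the query term count with the collection-level factor.
    w_q = {
        t: math.log(f + 1) * math.log(n_docs / doc_freq[t] + 1)
        for t, f in f_q.items()
        if doc_freq.get(t, 0) > 0
    }
    # Document weights use only the logarithmic in-document frequency.
    w_d = {t: math.log(f + 1) for t, f in f_d.items()}
    dot = sum(w_q[t] * w_d[t] for t in w_q if t in w_d)
    norm = math.sqrt(
        sum(v * v for v in w_q.values()) * sum(v * v for v in w_d.values())
    )
    return dot / norm if norm else 0.0
```

A query and document that share no terms score 0.0, and a single-term query matched against a document consisting only of that term scores 1.0.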
....these words are more discriminating than common words; that is, the presence of a rare word in both document and query is assumed to be a good indicator of relevance. The cosine measure is just one method that can be used to perform ranking, and there are many others see, for example, Harman [17] or Salton [29] for descriptions of alternatives. The cosine measure suits our purposes because, if anything, it is one of the more demanding similarity measures, in that the similarity value assigned to each document depends not just upon that document, but also upon all of the other documents in ....
....document, but also upon all of the other documents in the collection. 2.4 Ranked query evaluation The usual method for determining which of the documents in a collection have a high cosine measure with respect to a query is to compute cosine from the inverted file structure and document lengths [3, 17, 18, 24]. In this method, an accumulator variable $A_d$ is created for each document d containing any of the words in the query, in which the result of the expression $\sum_t w_{q,t} \cdot w_{d,t}$ is accrued as inverted lists are processed. A simple form of this query evaluation algorithm is shown in Figure 2. Note ....
D.K. Harman. Ranking algorithms. In Frakes and Baeza-Yates [13], chapter 14, pages 363--392.
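The accumulator-based evaluation described in the context above can be sketched in Python as follows. This is not the algorithm of the citing paper's Figure 2, which is not reproduced here. It is a minimal illustration under assumed data structures: `inverted_file` maps each term to a list of (d, w_dt) pairs, `doc_lengths` maps d to its length W_d, and the name `ranked_query` is hypothetical.

```python
import heapq


def ranked_query(query_weights, inverted_file, doc_lengths, r):
    """Return the top r (score, document) pairs for a query.

    query_weights: term -> w_qt
    inverted_file: term -> list of (d, w_dt) pairs
    doc_lengths:   d -> W_d, used to normalise each accumulator
    """
    accumulators = {}
    for t, w_qt in query_weights.items():
        # Process one inverted list at a time, accruing w_qt * w_dt
        # into the accumulator A_d of every document on the list.
        for d, w_dt in inverted_file.get(t, []):
            accumulators[d] = accumulators.get(d, 0.0) + w_qt * w_dt
    # Normalise by document length and keep only the r best scores.
    scored = ((a / doc_lengths[d], d) for d, a in accumulators.items())
    return heapq.nlargest(r, scored)
```

The query length is left out of the normalisation because it is the same for every document and so does not change the ranking. Using `heapq.nlargest` avoids sorting all accumulators when only a few answers are wanted.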
....model and the properties of structured documents, as illustrated by XML documents, are given. This is to introduce the basic notions for the rest of the paper. 2.1 Retrieval Function It is generally accepted in the information retrieval community that Boolean retrieval is inadequate (see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on ....
D. Harman. Ranking algorithms. In Frakes and Baeza-Yates
....with, IRF 1 was more than sufficiently fast. Only this approach was therefore made completely operational and used for the evaluation. 4.6 Retrieval The retrieval of documents is done by the IRF: VSRetrieval module. This module implements a retrieval strategy based on the vector space model [54, 26] with term weighting. The index is stored in three tables of a relational database, as described above. [Footnote 6: Available from http://www.hughes.com.au. mSQL can be used freely for academic purposes.] [Figure 4.9: IRF 1 Web Interface] The first retrieval systems were based on Boolean logic (they ....
....functions are the inverse document frequency (IDF), the signal-noise ratio, and the term discrimination value. [Footnote 7] Of these I will only describe the IDF measure in detail, because it is used in IRF 1; it has proved to yield very good results in numerous experiments, many of which are referenced in [26]. The inverse document frequency measure was originally devised by Sparck Jones in [59]. It is based on the assumption that the importance of a term t is proportional to its frequency in each document d (without stopwords), that is freq td, the within-document frequency, and inversely ....
Harman, Donna. 1992. "Ranking algorithms." In Information Retrieval, edited by W. B. Frakes, and R. Baeza-Yates. Englewood Cliffs, NJ: Prentice Hall: 363--392.
....connected by boolean operators such as AND, OR, or NOT. Long boolean queries, especially when adjacency operators are allowed, are difficult to formulate. Even a computer-literate user might have trouble, for instance, expressing the query "human factors and/or system performance in medical database" [Harman, 1992] as a boolean expression. Moreover, users might have to reformulate their query several times (changing some ORs into ANDs, or vice versa) if too many or not enough candidates are returned the first time. For user friendliness, free-style queries are clearly preferable. Section 6.1.2 ....
Harman, D. (1992). Ranking algorithms. In Frakes, W. B. and Baeza-Yates, R., editors, Information Retrieval, Data Structure and Algorithms, pages 241--263. Prentice Hall.
....Various search engines such as AltaVista, InfoSeek, Yahoo, etc. have been developed and used nowadays. Normally, a query submitted to the Internet search engine is composed of terms or keywords. The relevance of a document to a given query is calculated according to some formulae such as TFxIDF [4]. In comparison to the queries used in traditional information retrieval systems, the queries posed to the search engines on the Internet consist of only about 2 query terms on average [5]. The Web is believed to contain several hundred million pages, but the coverage of a single search engine is ....
Harman, D. Ranking algorithms. In Information Retrieval: Data Structures and Algorithms (Englewood Cliffs, NJ, 1992), Prentice-Hall, pp. 293--362.
....requires users to write complex logical expressions for query representation. It then presents the search output in a disordered manner. Although there are some search engines that arrange the search output using a vector space model or a Term Frequency Inverted Document Frequency (TF IDF) model (Harman, 1992), there are obvious limitations in retrieval effectiveness because of the difficulty in searching documents by using only query words and their statistical characteristics. On the other hand, a great deal of work has been carried out on solving these problems using natural language processing ....
Harman, D. (1992). Ranking algorithms. In Information Retrieval, chapter 14. Prentice Hall.
....this approach is that the more frequent a term is in a collection, the less discriminating it is. [Footnote 1: Most of this research was done while the author was a Research Fellow at the IBM Haifa Research Laboratory.] The most classical embodiment of this approach is the family of tf × idf scores [Harman, 1992, Salton and McGill, 1983], where tf stands for the term frequency of a term in a document, and idf for the inverse document frequency. The possibility of allowing the user to assign weights to search terms exists already in some IR engines (although apparently not in any of the popular Web ....
Harman, D. (1992). Ranking algorithms. In Frakes, W. B. and Baeza-Yates, R., editors, Information Retrieval, Data Structure and Algorithms, pages 241--263. Prentice Hall.
....subsequences are not necessarily distinct. Retrieval method Instead of the Boolean retrieval operations used in the earlier studies, the augmented tf idf ranking method available in the SMART Information Retrieval System Version 11.0 was selected for use in the retrieval tests. According to Harman [10], ranking retrieval methods have two important features that recommend them over Boolean approaches. First, adjacency operations or field restrictions, necessary in Boolean systems, are not necessary in ranking systems; second, stoplists are neither required nor recommended for ranking systems. The ....
Harman, Donna. Ranking algorithms. In Information retrieval: Data structures & algorithms, ed. William B. Frakes and Ricardo Baeza-Yates, 363-392. Englewood Cliffs: Prentice Hall, 1992.
....their relative importance by considering their distribution in the full document collection. The intuition behind this approach is that the more frequent a term is in a collection, the less discriminating it is. The most classical embodiment of this approach is the family of tf × idf scores [Har92, SM83], where tf stands for the term frequency of a term in a document, and idf for the inverse document frequency. The possibility of allowing the user to assign weights to search terms exists already in some IR engines (although apparently not in any of the popular Web search services) but ....
D. Harman. Ranking algorithms. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval, Data Structure and Algorithms, pages 241--263. Prentice Hall, 1992.
....5.2 Model evaluation We compare our model with the vector space model [5]. We used two variants of the vector space model. The first variant uses the following weighting function [footnote 2]: w i (p j ) log 2 FREQ i (p j 1) log 2 TOTFREQ i Delta IDF(N; p j ). It was reported in [1] that such a formula can be safely used for retrieval. This variant is labelled VSM. The second variant of the vector space model uses the formula (1) with the IDF variant (2). We label this variant as Baseline. This model can be considered as a special case of the proposed model where only ....
D. K. Harman. Ranking algorithms. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 14, pages 363--392. Prentice-Hall, New Jersey, NJ, 1992.
....Instead, it requires users to write complex logical expressions for queries and presents the search output in a disordered manner. There are some search engines that arrange the search output using the vector space model or the Term Frequency Inverted Document Frequency (TF IDF) model [2]. However, there are obvious limitations in retrieval effectiveness because it is difficult to search documents using only query words and their statistical characteristics. On the other hand, much work has been done towards solving these problems using natural language processing methods [3] ....
Donna Harman : "Ranking Algorithms" in Information Retrieval, Chapter 14. Prentice Hall, 1992.
....retrieval model and properties of structured documents, illustrated by XML documents, are given. This is to introduce the basic notions for the rest of the paper. 2.1 Retrieval Function It is generally accepted in the information retrieval community that Boolean retrieval is insufficient (see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on ....
Donna Harman. Ranking algorithms. In Frakes and Baeza-Yates [10], chapter 14, pages 363--392.
....list of search results. The order of the documents is based on a measure of similarity between the document and the query; that similarity measure is used as an approximation of the relevance of the document to the query (van Rijsbergen 1979; Salton 1989; Harman 1992). Yet, an ordered list does not give the user information about the similarities or differences in the content of the documents. For example, the user would not be able to determine that 30 different preventive measures were discussed in the retrieved documents, or that 10 documents discussed the ....
Harman D (1992). Ranking Algorithms. Information Retrieval Data Structures & Algorithms. R. B.-Y. William B. Frakes, Prentice Hall.
....and easily. Other Approaches Automatic approaches to organizing search results include relevance ranking and clustering. These techniques typically represent each document as a vector of all words that appear in the document. Relevance ranking systems create an ordered list of search results (Harman 1992). The order of the documents is based on a measure of how likely it is that the document is relevant to the query. Even if the documents are ranked by relevance criteria, an ordered list does not give the user much information on the similarities or differences in the contents of the documents. ....
Harman, D. 1992. Ranking Algorithms. Information Retrieval Data Structures & Algorithms. R. B.-Y. William B. Frakes, Prentice Hall.
....query term). The latter two are very simple methods to give a rough estimate of the behaviour of the characteristics of a term. Term frequency: Including information about how often a term occurs in a document (term frequency information) has often been shown to increase retrieval performance (Harman, 1992). For this experiment we used the formula tf d (t) ln(occs t ) ln(n unique ) where occs t is the number of occurrences of term t in document d and occs unique is the number of unique term occurrences in d. Theme: Previous work by e.g. (Hearst and Plaunt, 1993) and (Paradis and Berrut, 1996) ....
Harman, D. (1992). Ranking algorithms. In: Information retrieval : data structures & algorithms . (W. B. Frakes and R. Baeza-Yates, ed.). Ch. 14. pp 363 - 392.
....is not considered. It is again surprising to observe that the idf weighting scheme produces the same level of effectiveness as tf × idf. This is in contrast to what generally happens in textual IR, where term within-document frequency is important information for the weighting scheme [11]. Moreover, figure 5 confirms that the use of stemming is detrimental to the effectiveness of an IR system in SQR, as already observed previously in figure 3. Other experiments involving the use of different versions of the tf weighting scheme (the tf 10 , for example) and of different sizes of ....
D. Harman. Ranking algorithms. In W.B. Frakes and R. Baeza-Yates, editors, Information Retrieval: data structures and algorithms, chapter 14. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1992.
No context found.
Harman, D. (1992). Ranking Algorithms. In Information Retrieval Data Structures & Algorithms. W. B. Frakes and R.
No context found.
Harman, D., Ranking Algorithms, in Frakes, W. & Baeza-Yates, R. Information Retrieval, Prentice-Hall, New Jersey, USA (1992)
No context found.
D. K. Harman. Ranking algorithms. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms, chapter 14, pages 363--392. Prentice-Hall, New Jersey, NJ, 1992.