| Luhn, H. P. (1957), `A statistical approach to mechanized encoding and searching of literary information', IBM Journal of Research and Development, 4(4), 600-605. |
....General Terms: Experimentation. Keywords: Information Retrieval, Formal Models, Machine Learning. 1. INTRODUCTION The goal of information retrieval (IR) is to determine which documents are relevant to a user's information need. In early IR work, this determination was based on heuristic judgments [17] (e.g. that documents containing the user's query terms are likely to be relevant) followed by heuristic tweaking of parameters (e.g. term weights) to make the system work. Subsequently, attempts were made to avoid ....
H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309--317, 1957.
....or misspelled. Two different automatic text processing approaches can be used to aid the requirements engineer in the situation described above: the statistical approach and the linguistics approach. In this paper we focus on the statistical approach, which originates from the work by H. P. Luhn [7]. There are several reasons that we choose to explore this approach. Firstly, the statistical approach has been thoroughly tried and examined and has been found fairly successful for automatic text analysis [17]. Secondly, the linguistic approach is still regarded as expensive to implement and no ....
Luhn, H. P., "A Statistical Approach to Mechanized Encoding and Searching of Literary Information", IBM Journal of Research and Development, 1(4), pp. 309-317, 1957.
....inadequate (see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on different query and document representations, it is always the retrieval function which determines the similarity between these representations and constitutes the ranking order. The usual techniques, however, are all based on flat documents. In other words the granularity of ....
H. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM J. of Research and Development, 1(4):309--317, 1957.
....following words, and redisplays the user's query with a blue AND separator between each query term. 3.2 Stop Word Removal Although few web search engines perform stop word removal, it is a standard technique used in bibliographic search systems to decrease the index size and increase precision [14, 15]. Transparent Queries feedback for stop word removal consists of the following textual explanation: Words marked with a red line have been removed from the query because they are very common. Additionally, each query term is listed below the textual explanation, and stop words are marked with red ....
Luhn, HP; A Statistical Approach to Mechanized Encoding and Searching of Literary Information. IBM Journal of Research and Development, 1957. 1(4).
....upon. It is one of the "bibliometric laws" (White & McCain, 1989) concerning regularities in bibliographies, lists of authors, citation lists, etc. For the purpose of finding relevant, content bearing words ("keywords"), common (highest ranking) and rare (lowest ranking) words should be avoided (Luhn, 1957, 1958). Do we have a similar situation where the highest ranking genes may not be interesting for cancer classification? (Lowest ranking genes are obviously not interesting.) Our ranking system is not really the same as for word usage, since a discrimination or classification ability has already ....
Luhn HP (1957), "A statistical approach to mechanized encoding and searching of literary information", IBM Journal of Research and Development, 1:309-317.
....over time is indeed affected by Passage Feedback with IRIS (2) learning, an improved passage feedback system with usability enhancements may prove to be an effective mechanism for interactive information retrieval. 1 Introduction A perspective on information retrieval (IR) was presented by Luhn (1957), who suggested an automatic text retrieval system based on comparison of word representations between the document and the query. In this perspective, the objective of IR is to find information units from an information collection that best match the query put to it by a user. Consequently, the ....
Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1, 309-317.
....(see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on different query and document representations it is always the retrieval function which determines the similarity between them and constitutes the ranking order. The usual techniques, however, are all based on flat documents, i.e. the granularity of division is always on the term ....
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM J. of Research and Development, 1(4):309--317, 1957.
.... such as TiMBL [28] and ILP [29]. 5 Unsupervised template learning We should remember the possibility of an unsupervised notion of template learning: in a Sheffield PhD thesis Collier [30] developed such a notion, one that can be thought of as yet another application of the old technique of Luhn [31] to locate statistically significant words in a corpus and use those to locate the sentences in which they occur as key sentences. This has been the basis of a range of summarisation algorithms and Collier proposed a form of it as a basis for unsupervised template induction, namely that those ....
H.P. Luhn, A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
....training sample size and richness. 4.2 Unsupervised template learning We should remember that there is also a possible unsupervised notion of template learning, developed in a Sheffield PhD thesis by Collier [17], one that can be thought of as yet another application of the old technique of Luhn [42] to locate, in a corpus, statistically significant words and use those to locate the sentences in which they occur as key sentences. This has been the basis of a range of summarisation algorithms and Collier proposed a form of it as a basis for unsupervised template induction, namely that those ....
H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
.... cannot be extracted automatically, but have to be provided manually, i.e. from an external source; examples of this are the author of a photograph (in image retrieval) or the nationality of a non-native speaker (in speech retrieval). Traditional information retrieval research, from Luhn [16] onwards, has assumed that retrieval should be based on endogenous knowledge only. Today, this assumption is increasingly challenged by the emergence of novel applications such as digital libraries and multimedia search engines, and by the increasing convergence of research fields that had ....
Hans P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
....serve as a conceptual criterion for deciding whether the word should be kept for retrieval purposes. While such an analysis provides a conceptual criterion for identifying index terms, it is very difficult to apply. A variety of practical filters for content bearing terms have been devised. Luhn [13], for example, suggested a very simple filter based solely on frequency of occurrence. But the value of a word as a subject indicator is likely to depend on its pattern of occurrence as well as its frequency. Bookstein and Swanson [7] suggested that non content bearing terms would be Poisson ....
Luhn H.P., A Statistical Approach to the Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development 1(4) (1957) 309--17.
....model presented in the next Section allows conceptual modelling of traditional, as well as of advanced IR applications (see [9]), in which we will use different representation layers, one for queries and one for documents. 2.2 Intelligent Information Retrieval It has been a long time since Luhn [17] suggested the use of statistical techniques for the representation of a document's information content. There have been many changes in the field of IR since that time and there have been surprising developments in computer hardware. However some fundamental issues remain unsolved. In particular, ....
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
....methods, and proposes a new method called weighted inverse document frequency (WIDF). 2.1 Term Frequency Term frequency is the simplest measure to weight each term in a text. In this method, each term is assumed to have importance proportional to the number of times it occurs in a text [12]. The weight of a term t in a text d is given by W(d, t) = TF(d, t) (1), where TF(d, t) is the term frequency of the term t in the text d. Term frequency is known to improve recall in information retrieval, but does not always improve precision. Because frequent terms tend to appear in many ....
H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, Vol. 1, No. 4, pp. 307--319, 1957.
....schema is widely used in IR to provide a measure of the discrimination power of a term in a document collection. This weight is based on Luhn's assumption and on the assumption that the discriminating power of a term is inversely proportional to the number of documents in which that term occurs [8]. In particular, the inverse document frequency reflects the intuition that the larger the number of documents that are indexed by the same term, the less important the term becomes as a descriptor of any of them. We can now combine the above two weighting schemas with the retrieval formulas ....
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
....requests advice, Letizia suggests which hyperlinks to follow based on its database of keywords. This strategy could be adapted to assist a prefetching algorithm in deciding which hyperlinks to follow. It has been recognized that many of the most common words in English are worthless in index terms [21]. A list of words used as a filter during indexing because they make poor index terms is called a stoplist. When building a database of keywords, a stoplist should be used to filter out such words. 3.1.2 Prefetching for Wireless Information Access Kaashoek et al. [22] describe dynamic ....
H. Luhn, "A statistical approach to mechanized encoding and searching of literary information," IBM Journal of Research and Development, vol. 1, no. 4, 1957.
....similarities [20], except that we use a new similarity measure that more accurately characterizes copy overlap, while traditional IR systems look for semantic similarity. Several schemes have been proposed to enhance IR schemes, such as use of signature files [8], lexical analysis [1], stoplists [13, 9], stemming algorithms [12, 15], thesaurus [21], and ranking algorithms [19]. Since our approach is based on IR, such schemes are orthogonal to our model, and one or more of these schemes could be used to enhance our document comparison mechanism. Our scheme is based on words, which are easier to ....
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4), 1957.
....to the field of domain-independent summarization. Word frequency based systems The idea that there exists a correlation between, on one hand, the frequency of words and their distribution, and, on the other hand, the significance in texts of the sentences that contain them goes back as far as Luhn [1957, 1958]. In his experiments, Luhn observed that this correlation follows a Bell curve whose minima correspond to words that occur very seldom and very often and whose maximum corresponds to words that occur relatively frequently. The validity of using word frequency as an indicator of ....
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309--317, October 1957.
No context found.
Luhn, H. P. (1957), `A statistical approach to mechanized encoding and searching of literary information', IBM Journal of Research and Development, 4(4), 600-605.
No context found.
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1957.
No context found.
H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1957.
No context found.
H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):390, 1957.
No context found.
H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):390, 1957.
No context found.
H.P. Luhn, "A Statistical Approach to the Mechanized Encoding and Searching of Literary Information", IBM Journal of Research and Development, Vol. 1, No. 4, pp. 309-317, 1957.
No context found.
Hans Peter Luhn. 1957. "A Statistical Approach to Mechanized Encoding and Searching of Literary Information." IBM Journal of Research and Development 1 (4) 309-317.
No context found.
Luhn, H. (1957). A statistical approach to the mechanized encoding and searching of literary information. IBM Journal of Research and Development 1(4), 309--317.
CiteSeer.IST - Copyright Penn State and NEC