25 citations found.
Luhn, H. P. (1957), `A statistical approach to mechanized encoding and searching of literary information', IBM Journal of Research and Development, 1(4), 309-317.


This paper is cited in the following contexts:
Empirical Development of an Exponential Probabilistic Model.. - Teevan, Karger (2003) (1 citation)

....General Terms: Experimentation. Keywords: Information Retrieval, Formal Models, Machine Learning. 1. INTRODUCTION The goal of information retrieval (IR) is to determine which documents are relevant to a user's information need. In early IR work, this determination was based on heuristic judgments [17] (e.g. that documents containing the user's query terms are likely to be relevant) followed by heuristic tweaking of parameters (e.g. term weights) to make the system work. Subsequently, attempts were made to avoid ....

H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309--317, 1957.


Evaluating Automated Support for Requirements.. - Dag, Regnell.. (2001) (2 citations)

....or misspelled. Two different automatic text processing approaches can be used to aid the requirements engineer in the situation described above: the statistical approach and the linguistics approach. In this paper we focus on the statistical approach, which originates from the work by H. P. Luhn [7]. There are several reasons that we choose to explore this approach. Firstly, the statistical approach has been thoroughly tried and examined and has been found fairly successful for automatic text analysis [17]. Secondly, the linguistic approach is still regarded expensive to implement and no ....

Luhn, H. P., "A Statistical Approach to Mechanized Encoding and Searching of Literary Information", IBM Journal of Research and Development, 1(4), pp. 309-317, 1957.


Searching and Browsing Collections of Structural Information - Wolff, Flörke, Cremers (2000) (11 citations)

....inadequate (see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on different query and document representations, it is always the retrieval function which determines the similarity between these representations and constitutes the ranking order. The usual techniques, however, are all based on flat documents. In other words the granularity of ....

H. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM J. of Research and Development, 1(4):309--317, 1957.


Transparent Queries: Investigating Users' Mental Models of.. - Muramatsu, Pratt (2001) (3 citations)

....following words, and redisplays the user's query with a blue AND separator between each query term. 3.2 Stop Word Removal Although few web search engines perform stop word removal, it is a standard technique used in bibliographic search systems to decrease the index size and increase precision [14, 15]. Transparent Queries feedback for stop word removal consists of the following textual explanation: Words marked with a red line have been removed from the query because they are very common. Additionally, each query term is listed below the textual explanation, and stop words are marked with red ....

Luhn, HP; A Statistical Approach to Mechanized Encoding and Searching of Literary Information. IBM Journal of Research and Development, 1957. 1(4).


Zipf's Law in Importance of Genes for Cancer Classification.. - Li, Yang (2002)

....upon. It is one of the "bibliometric laws" (White & McCain, 1989) concerning regularities in bibliographies, lists of authors, citation lists, etc. For the purpose of finding relevant, content bearing words ("keywords"), common (highest ranking) and rare (lowest ranking) words should be avoided (Luhn, 1957, 1958). Do we have a similar situation where the highest ranking genes may not be interesting for cancer classification? (Lowest ranking genes are obviously not interesting.) Our ranking system is not really the same as for word usage, since a discrimination or classification ability has already ....

Luhn HP (1957), "A statistical approach to mechanized encoding and searching of literary information", IBM Journal of Research and Development, 1:309-317.


Passage Feedback with IRIS - Yang, Maglaughlin, Newby

....over time is indeed affected by learning, an improved passage feedback system with usability enhancements may prove to be an effective mechanism for interactive information retrieval. 1 Introduction A perspective on information retrieval (IR) was presented by Luhn (1957), who suggested an automatic text retrieval system based on comparison of word representations between the document and the query. In this perspective, the objective of IR is to find information units from an information collection that best match the query put to it by a user. Consequently, the ....

Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1, 309-317.


XPRES: a Ranking Approach to Retrieval on Structured Documents - Wolff, Flörke, Cremers (1999) (2 citations)

....(see [4, 15]) and yields the worst retrieval quality in comparison with other retrieval models. The ranking approach to retrieval has turned out to be more appropriate for end users. Many models for the ranking technique have been developed over the last 40 years since the work of Luhn [18] in 1957. Based on different query and document representations it is always the retrieval function which determines the similarity between them and constitutes the ranking order. The usual techniques, however, are all based on flat documents, i.e. the granularity of division is always on the term ....

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM J. of Research and Development, 1(4):309--317, 1957.


IR and AI: traditions of representation and anti-representation in .. - Wilks (2000) (1 citation)

....such as TiMBL [28] and ILP [29]. 5 Unsupervised template learning We should remember the possibility of unsupervised notion of template learning: in a Sheffield PhD thesis Collier [30] developed such a notion, one that can be thought of as yet another application of the old technique of Luhn [31] to locate statistically significant words in a corpus and use those to locate the sentences in which they occur as key sentences. This has been the basis of a range of summarisation algorithms and Collier proposed a form of it as a basis for unsupervised template induction, namely that those ....

H.P. Luhn, A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.


Can we make Information Extraction more adaptive? - Wilks, Catizone (1999) (2 citations)

....training sample size and richness. 4.2 Unsupervised template learning We should remember that there is also a possible unsupervised notion of template learning, developed in a Sheffield PhD thesis by Collier [17], one that can be thought of as yet another application of the old technique of Luhn [42] to locate, in a corpus, statistically significant words and use those to locate the sentences in which they occur as key sentences. This has been the basis of a range of summarisation algorithms and Collier proposed a form of it as a basis for unsupervised template induction, namely that those ....

H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.


Towards a Logical Reconstruction of Information Retrieval Theory - Sebastiani (1999) (2 citations)

....cannot be extracted automatically, but have to be provided manually, i.e. from an external source; examples of this are the author of a photograph (in image retrieval) or the nationality of a non native speaker (in speech retrieval). Traditional information retrieval research, from Luhn [16] onwards, has assumed that retrieval should be based on endogenous knowledge only. Today, this assumption is increasingly challenged by the emergence of novel applications such as digital libraries and multimedia search engines, and by the increasing convergence of research fields that had ....

Hans P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.


Clumping Properties of Content-Bearing Words - Bookstein, Klein, Raita (1998) (6 citations)

....serve as a conceptual criterion for deciding whether the word should be kept for retrieval purposes. While such an analysis provides a conceptual criterion for identifying index terms, it is very difficult to apply. A variety of practical filters for content bearing terms have been devised. Luhn [13], for example, suggested a very simple filter based solely on frequency of occurrence. But the value of a word as a subject indicator is likely to depend on its pattern of occurrence as well as its frequency. Bookstein and Swanson [7] suggested that non content bearing terms would be Poisson ....

Luhn H.P., A Statistical Approach to the Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development 1(4) (1957) 309--17.


Modelling Adaptive Information Retrieval - Crestani, van Rijsbergen (1993)

....model presented in the next Section allows conceptual modelling of traditional, as well as of advanced IR applications (see [9]), in which we will use different representation layers, one for queries and one for documents. 2.2 Intelligent Information Retrieval It has been a long time since Luhn [17] suggested the use of statistical techniques for the representation of a document's information content. There have been many changes in the field of IR since that time and there have been surprising developments in computer hardware. However some fundamental issues remain unsolved. In particular, ....

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.


Text Categorization Based on Weighted Inverse Document Frequency - Tokunaga, Iwayama (1994) (5 citations)

....methods, and proposes a new method called weighted inverse document frequency (WIDF). 2.1 Term Frequency Term frequency is the simplest measure to weight each term in a text. In this method, each term is assumed to have importance proportional to the number of times it occurs in a text [12]. The weight of a term t in a text d is given by $W(d, t) = \mathrm{TF}(d, t)$ (1), where TF(d, t) is the term frequency of the term t in the text d. Term frequency is known to improve recall in information retrieval, but does not always improve precision. Because frequent terms tend to appear in many ....

H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, Vol. 1, No. 4, pp. 307--319, 1957.
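
The term-frequency weighting quoted in the Tokunaga and Iwayama context above, W(d, t) = TF(d, t), can be illustrated with a minimal Python sketch. The function name `term_frequency_weights` and the whitespace tokenizer are assumptions made for illustration only; neither comes from the cited papers.

```python
from collections import Counter


def term_frequency_weights(text):
    """Return W(d, t) = TF(d, t): the raw count of each term t in text d."""
    # Assumption: lowercase the text and split on whitespace. Real systems
    # would normally use a proper tokenizer and possibly stemming.
    tokens = text.lower().split()
    return Counter(tokens)


weights = term_frequency_weights("the cat sat on the mat")
print(weights["the"], weights["cat"])
```

Under this weighting a very common word such as "the" receives the highest weight, which is in line with the excerpt's remark that term frequency alone does not always improve precision.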


Retrieval of Spoken Documents: First Experiences - Crestani, Sanderson (1997)

....schema is widely used in IR to provide a measure of the discrimination power of a term in a document collection. This weight is based on Luhn's assumption and on the assumption that the discriminating power of a term is inversely proportional to the number of documents in which that term occurs [8]. In particular, the inverse document frequency reflects the intuition that the larger the number of documents that are indexed by the same term, the less important the term becomes as a descriptor of any of them. We can now combine the above two weighting schemas with the retrieval formulas ....

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1:309--317, 1957.
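
The inverse document frequency intuition in the Crestani and Sanderson context above can be sketched in Python. The formula log(N / n_t), with N documents in the collection and n_t documents containing term t, is one common formulation and is an assumption here; the excerpt does not state which variant the paper uses. The function name and the whitespace tokenizer are likewise hypothetical.

```python
import math


def inverse_document_frequency(documents):
    """Map each term t to log(N / n_t) over a list of document strings."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Count each term at most once per document (assumed tokenizer:
        # lowercase, split on whitespace).
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


docs = ["speech retrieval", "speech recognition",
        "image retrieval", "speech indexing"]
idf = inverse_document_frequency(docs)
print(sorted(idf, key=idf.get))
```

In this toy collection "speech" occurs in three of four documents and so receives a lower weight than "image", which occurs in only one: the more documents a term indexes, the less it distinguishes any of them.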


Efficient Information Access for Wireless Computers - Wachsberg (1996) (2 citations)

....requests advice, Letizia suggests which hyperlinks to follow based on its database of keywords. This strategy could be adapted to assist a prefetching algorithm in deciding which hyperlinks to follow. It has been recognized that many of the most common words in English are worthless as index terms [21]. A list of words used as a filter during indexing because they make poor index terms is called a stoplist. When building a database of keywords, a stoplist should be used to filter out such words. 3.1.2 Prefetching for Wireless Information Access Kaashoek et al. [22] describe dynamic ....

H. Luhn, "A statistical approach to mechanized encoding and searching of literary information," IBM Journal of Research and Development, vol. 1, no. 4, 1957.


SCAM: A Copy Detection Mechanism for Digital Documents - Shivakumar, Garcia-Molina (1995) (15 citations)

....similarities [20] except that we use a new similarity measure that more accurately characterizes copy overlap, while traditional IR systems look for semantic similarity. Several schemes have been proposed to enhance IR schemes, such as use of signature files [8], lexical analysis [1], stoplists [13, 9], stemming algorithms [12, 15], thesaurus [21], and ranking algorithms [19]. Since our approach is based on IR, such schemes are orthogonal to our model, and one or more of these schemes could be used to enhance our document comparison mechanism. Our scheme is based on words, which are easier to ....

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4), 1957.


The Rhetorical Parsing, Summarization, and Generation of Natural.. - Marcu (1997)

....to the field of domain-independent summarization. Word frequency based systems The idea that there exists a correlation between, on one hand, the frequency of words and their distribution, and, on the other hand, the significance in texts of the sentences that contain them goes back as far as Luhn [1957, 1958]. In his experiments, Luhn observed that this correlation follows a Bell curve whose minima correspond to words that occur very seldom and very often and whose maximum corresponds to words that occur relatively frequently. The validity of using word frequency as an indicator of ....

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309--317, October 1957.


Shallow NLP techniques for Internet Search - Penev, Wong (2006)

No context found.

Luhn, H. P. (1957), `A statistical approach to mechanized encoding and searching of literary information', IBM Journal of Research and Development, 1(4), 309-317.


INVISTOR - A Distributed MultiMedia Indexing System - Westmacott

No context found.

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1957.


Augmented Trading: From news articles to stock price.. - van Bunningen (2004)

No context found.

H.P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1957.


Keyword Extraction From A Single Document Using Word.. - Matsuo, Ishizuka (2004)

No context found.

H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):390, 1957.


Keyword Extraction from a Single Document using Word.. - Matsuo, Ishizuka (2003)

No context found.

H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):390, 1957.


PAI: Automatic Indexing for Extracting Asserted Keywords.. - Matsumura, Ohsawa, al. (2002)

No context found.

H.P. Luhn, "A Statistical Approach to the Mechanized Encoding and Searching of Literary Information", IBM Journal of Research and Development, Vol. 1, No. 4, pp. 309-317, 1957.


Unlocking Topicality in Text - Foreground and Background.. - Karlgren

No context found.

Hans Peter Luhn. 1957. "A Statistical Approach to Mechanized Encoding and Searching of Literary Information." IBM Journal of Research and Development 1 (4) 309-317.


An Exploratory Analysis of Phrases in Text Retrieval - Pickens, Croft (2000) (2 citations)

No context found.

Luhn, H. (1957). A statistical approach to the mechanized encoding and searching of literary information. IBM Journal of Research and Development 1(4), 309--317.


CiteSeer.IST - Copyright Penn State and NEC