37 citations found.
N. Fuhr, "Models for Retrieval with Probabilistic Indexing", Information Processing and Management, 25(1), pages 55-72, 1989.

This paper is cited in the following contexts:

Naive (Bayes) at Forty: The Independence Assumption in Information .. - Lewis (1998)   (12 citations)

....have been developed that more or less gracefully integrate term frequency and document length information into the BIM itself. The widely used probabilistic indexing approach assumes there is an ideal binary indexing of the document, for which the observed index term occurrences provide evidence [7, 13]. Retrieval or classification is based on computing (or approximating) the expected value of the posterior log odds. The expectation is taken with respect to the probabilities of various ideal indexings. While this is a plausible approach, in practice the probabilities of the ideal indexings are ....
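The expectation described in this excerpt can be made concrete. Under the binary independence model, the posterior log odds for an ideal binary indexing x are a constant plus the sum of per-term weights c_i = log[p_i(1 - q_i) / (q_i(1 - p_i))] over the terms with x_i = 1, so by linearity the expected log odds equal the constant plus the sum of P(x_i = 1 | d) * c_i. The Python sketch below checks this against a brute-force average over all 2^n ideal indexings. The independence of the ideal index bits, the parameter values and the function names are assumptions made for illustration, not details taken from Fuhr's paper.

```python
import itertools
import math


def bir_weight(p: float, q: float) -> float:
    """BIR term weight c_i = log[p(1-q) / (q(1-p))], where p = P(term | relevant)
    and q = P(term | non-relevant)."""
    return math.log(p * (1 - q) / (q * (1 - p)))


def expected_log_odds_bruteforce(base, weights, index_probs):
    """Average the log odds base + sum_{i: x_i = 1} c_i over every ideal binary
    indexing x, weighting x by its probability (index bits assumed independent)."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(weights)):
        prob = 1.0
        for xi, u in zip(x, index_probs):
            prob *= u if xi else 1 - u
        log_odds = base + sum(c for xi, c in zip(x, weights) if xi)
        total += prob * log_odds
    return total


def expected_log_odds_linear(base, weights, index_probs):
    """Same expectation via linearity: base + sum_i P(x_i = 1 | d) * c_i."""
    return base + sum(u * c for u, c in zip(index_probs, weights))
```

For three query terms with hypothetical indexing probabilities 0.9, 0.2 and 0.5, both functions return the same value; the linear form needs n operations instead of 2^n, which is why only the indexing probabilities (not the full distribution over ideal indexings) are needed at retrieval time under this assumption.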

Norbert Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55-72, 1989.


Document-Query Duality Meets Maximum Likelihood: The Answer is.. - Bodoff (2001)

....data to estimate the probability that a binary document or query term is represented correctly, and if that seemed improbable, then that bit would be flipped from zero to one or vice versa. This is the corollary to VSM document and query modifications in response to relevance feedback data. Fuhr [7] points exactly in this direction, by explicitly modelling $P(x \mid d_m)$, the probability that representation x is correct for document $d_m$, separately from the probability of relevance $P(R \mid x, f_k)$. That model contributed to the development of our approach. We went one step further: instead of ....

Fuhr, N., Models for Retrieval with Probabilistic Indexing. Information Processing and Management, 1989. 25(1): p. 55-72.


EmailValet: Where do you want to read your Email? - Macskassy, Dayanik, Hirsh (2000)

....its document vector has the highest cosine. The Probabilistic TFIDF classifier (Joachims, 1997) is a probabilistic version of the TFIDF classifier, based on estimation of the probability of a category C given document d, $\Pr(C \mid d)$, using the retrieval with probabilistic indexing method proposed in (Fuhr, 1989). To classify a new document d, $\Pr(C_j \mid d)$ is estimated for each class $C_j$, as described in more detail by Joachims (Joachims, 1997), and d is assigned to the class whose probability is the highest. The Maximum Entropy (ME) classifier for text classification estimates the conditional ....

Fuhr, N. (1989). Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1), 55--72.


Using Machine Learning To Improve Information Access - Sahami (1999)   (15 citations)

....notion, Cooper and Maron [33] proposed the idea of probabilistic indexing of documents in which index terms are given probabilistic weights based on the relevance of these index terms to queries likely to be given to the retrieval system. Much more recent work in probabilistic indexing by Fuhr [56, 57] also treats the retrieval problem as one of making probabilistic inferences about the relevance of documents to a query, and examines this problem from the different viewpoints of the query and the document. The work along these lines closest in spirit to our own is that of van Rijsbergen [166] ....

....notions of document overlap, based on equivalence classes of words (e.g. synonyms), phrases, or, in general, any function on groups of words in the corpus. In this way, our score can capture the full generality of probabilistic indexing [56] techniques used in other tasks (such as document retrieval). This extension can be performed by computing the expected document overlap as a sum over multiple multinomial distributions (one for each set of mutually exclusive functional events). For example, say we wished to consider both single ....

Fuhr, N. Models for retrieval with probabilistic indexing. Information Processing and Management 25, 1 (1989), 55--72.


Patent Retrieval System Using Document Filtering Techniques - Naomi Inoue Kazunori (2000)

....Calculation of Similarity between Two Documents. A document filtering system based on probabilistic models calculates a posterior probability $P(c \mid d)$, the probability that a user's profile d is classified into a cluster c. Many methods of calculating the posterior probability have been proposed [2][3][4][5]. Our patent retrieval system adopts Iwayama's formulation because it has the following advantages over other calculation methods: 1) it considers within-document term frequencies; 2) it considers term weighting for incoming documents; 3) it is less affected by having an insufficient ....

N. Fuhr: "Models for Retrieval with Probabilistic Indexing", Information Processing and Management, 25(1), pp.55-72, 1989.


Training Context-Insensitive versus.. - Bachrach.. (1998)

....for data on the Web. 1.1 Relation to Other Work. Our PRT algorithm is similar to the algorithm used by Maron [28]. The main difference is that Maron used a small number of features, manually selected, while we use the full document vocabulary. Other variants of this method were used in [12, 21, 24, 33]. The main SE algorithm we examine was recently introduced by Cohen and Singer [6]. In addition, we introduce a novel context-sensitive variant of the algorithm and a feature reduction mechanism. The datasets we used represent generic text classification problems. Web page filtering is relatively ....

....of these distributions, the PRT algorithm is an optimal classifier [9]. Although this independence assumption is obviously violated in natural language text (see discussions in Cooper [8] and Lewis [23]), variants of this algorithm have been applied successfully in a variety of IR tasks (see e.g. [28, 12, 21, 24, 33]). In addition to the basic algorithm (pure PRT), we considered the following variations. Sequential application. In the classical PRT hypothesis testing procedure [35, 9], one specifies a significance level parameter, which prescribes two logarithmic thresholds, U and L ....
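In the sequential application of a probability ratio test, word-level evidence is accumulated until the running log-likelihood ratio crosses an upper or a lower threshold. The Python sketch below uses Wald's standard approximate thresholds log((1 - beta)/alpha) and log(beta/(1 - alpha)); the excerpt does not give the exact form of U and L used in the cited paper, so these thresholds, the default error rates and the function name `sprt_classify` are assumptions for illustration.

```python
import math


def sprt_classify(word_llrs, alpha=0.05, beta=0.05):
    """Wald-style sequential probability ratio test over a stream of per-word
    log-likelihood ratios log P(w | C1) / P(w | C0).

    Returns (decision, words_used): decision is 1 (accept C1), 0 (accept C0),
    or None if neither threshold was crossed before the input ran out.
    """
    upper = math.log((1 - beta) / alpha)   # accept C1 at or above this value
    lower = math.log(beta / (1 - alpha))   # accept C0 at or below this value
    total = 0.0
    n = 0
    for n, llr in enumerate(word_llrs, start=1):
        total += llr
        if total >= upper:
            return 1, n
        if total <= lower:
            return 0, n
    return None, n
```

With alpha = beta = 0.05 the thresholds are plus and minus log 19 (about 2.94), so a document whose first three words each contribute a ratio of 1.0 is assigned to C1 after three words without reading the rest. Under the independence assumption discussed in the excerpt, each entry of `word_llrs` would be computed from class-conditional word probabilities.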

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing & Management, 25(1):55-72, 1989.


Text Passage Classification Using Supervised Learning - Bi, Murtagh, McClean, Anderson (1999)   (1 citation)

....classification. That is because we somewhat directly adapt the Naïve Bayes Classifier described in [17] into the document classification in this work with little variation on text representation. The evaluation of performance of the variations of this learning algorithm has been described in [4, 13, 17]. In this context, however, it would be apparent that the identifying passage can fail or be misled when the incorrect text category knowledge is given, and this method is suspect or powerless if the knowledge is incorrect or unavailable. Therefore, identifying a passage is based on such a simple ....

N. Fuhr. Models for Retrieval with Probabilistic Indexing. Information Processing and Management. 25(1), pp. 55-72, 1989.


EmailValet: Learning User Preferences for Wireless Email - Macskassy, Dayanik, Hirsh (1999)   (1 citation)

....its document vector has the highest cosine. The Probabilistic TFIDF classifier [Joachims, 1997] is a probabilistic version of the TFIDF classifier, based on estimation of the probability of a category C given document d, $\Pr(C \mid d)$, using the retrieval with probabilistic indexing method proposed in [Fuhr, 1989]. To classify a new document d, $\Pr(C_j \mid d)$ is estimated for each class $C_j$, as described in more detail by Joachims [1997], and d is assigned to the class whose probability is the highest. Ripper [Cohen, 1995; 1996] is a learning method that forms sets of simple rules for data described by ....

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55--72, 1989.


ACIRD: Intelligent Internet Documents Organization and.. - Lin, Chen, Ho, Huang (2002)

....training data instead of non-structured textual data. This motivated many approaches to document classification to use a corpus to characterize documents and to develop new algorithms to learn classification knowledge. These algorithms include the Bayesian independence classifier [21], k-nearest neighbor [22, 32], rule-based induction algorithms [10] and mixed approaches such as INQUERY [33]. Those systems concentrate on document categorization and the learning algorithm, but they omit the diversity of the semantics of terms (or features) in the document. In machine learning, the feature is usually an ....

N. Fuhr, "Models for Retrieval with Probabilistic Indexing", Information Processing and Management, Vol. 25, No. 1, 1989, pp. 55-72.


EmailValet: Learning Email Preferences for Wireless Platforms - Macskassy, Hirsh, Dayanik (1999)

....with similar content have similar vectors. The Probabilistic TFIDF classifier [7] is a probabilistic version of the TFIDF classifier, based on estimation of the probability of a category $C_j$ given document d, $\Pr(C_j \mid d)$, using the retrieval with probabilistic indexing method proposed in [6]. Ripper [4, 5] is a learning method that forms sets of simple rules for data described by sets of attribute-value pairs. Each rule tests a conjunction of conditions on attribute values. Rules are returned as an ordered list, and the first successful rule provides the prediction for the class ....

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55--72, 1989.


Text Categorization Based on Weighted Inverse Document Frequency - Tokunaga, Iwayama (1994)   (5 citations)

....between $C_i$ and $d$ given $t$, that is, $P(C_i \mid t, d) = P(C_i \mid t)$, we obtain Eq. (9):

$$P(C_i \mid d) = \sum_t P(C_i \mid t)\, P(t \mid d) \qquad (9)$$

Using Bayes' rule, we finally obtain Eq. (10):

$$P(C_i \mid d) = P(C_i) \sum_t \frac{P(t \mid C_i)\, P(t \mid d)}{P(t)} \qquad (10)$$

This formulation is different from the one proposed in [3, 15]. The details of this formulation are discussed elsewhere [16]. Here $P(t \mid C_i)$ is the probability that a randomly selected term in the category $C_i$ is the term $t$, $P(t \mid d)$ is the probability that a randomly selected term in the text $d$ is the term $t$, and $P(t)$ and $P(C_i)$ are the prior probabilities of ....
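Equations (9) and (10) translate into a short scoring routine. The Python sketch below is a minimal, hypothetical implementation: it estimates $P(t \mid C)$ and $P(t \mid d)$ as relative term frequencies, $P(C)$ from training-document counts, and $P(t)$ as the mixture $\sum_C P(C)\,P(t \mid C)$. These estimators, the function names and the toy data are assumptions for illustration and are not necessarily the estimates used by Tokunaga and Iwayama.

```python
from collections import Counter


def term_dist(tokens):
    """Relative term frequencies, used here as the estimate of P(t|d) or P(t|C)."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {t: c / n for t, c in counts.items()}


def train(categories):
    """categories: dict mapping a category name to a list of tokenized texts.
    Returns priors P(C), per-category P(t|C) and the mixture P(t)."""
    n_docs = sum(len(texts) for texts in categories.values())
    prior = {c: len(texts) / n_docs for c, texts in categories.items()}
    p_t_given_c = {
        c: term_dist([t for text in texts for t in text])
        for c, texts in categories.items()
    }
    vocab = {t for dist in p_t_given_c.values() for t in dist}
    p_t = {t: sum(prior[c] * p_t_given_c[c].get(t, 0.0) for c in prior) for t in vocab}
    return prior, p_t_given_c, p_t


def posterior(doc_tokens, prior, p_t_given_c, p_t):
    """Eq. (10): P(C|d) = P(C) * sum_t P(t|C) P(t|d) / P(t).
    Terms never seen in training are ignored."""
    p_t_given_d = term_dist([t for t in doc_tokens if t in p_t])
    return {
        c: prior[c] * sum(p_t_given_c[c].get(t, 0.0) * w / p_t[t] for t, w in p_t_given_d.items())
        for c in prior
    }
```

Because $P(t)$ is taken here as the mixture over categories, the scores for a document with at least one known term sum to one over the categories, and evaluating Eq. (9) with $P(C \mid t) = P(t \mid C)P(C)/P(t)$ gives the same values. With a different estimate of $P(t)$ (for example, one taken from a larger corpus), the scores would still rank categories but need not be normalized.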

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing & Management, Vol. 25, No. 1, pp. 55--72, 1989.


A probabilistic model for text categorization: Based on a.. - Iwayama, Tokunaga (1994)   (2 citations)

....consistently and efficiently to large numbers of daily incoming documents. The purpose of this paper is to propose a new probabilistic model for automatic text categorization. While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models [12, 8, 6, 9, 3, 17, 18] because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists their individual problems. In section 3, we propose a new probabilistic model based on a Single random Variable with Multiple Values (SVMV). Our model is very ....

....i has, the more probably it will be categorized into category c. This is called the Probability Ranking Principle (PRP) [11]. Several strategies can be used to assign categories to a document based on the PRP [9]. There are several ways to calculate $P(c \mid d)$; three representatives are [12], [8] and [6]. 2.1 Probabilistic Relevance Weighting (PRW). Robertson and Sparck Jones [12] make use of the well-known logistic (or log-odds) transformation of the probability $P(c \mid d)$:

$$g(c \mid d) = \log \frac{P(c \mid d)}{P(\bar{c} \mid d)} \qquad (2)$$

where $\bar{c}$ means "not c", that is, a document is not categorized into c. Since this is a ....

[Article contains additional citation context not shown here]

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing & Management, Vol. 25, No. 1, pp. 55--72, 1989.


Hierarchical Bayesian Clustering for Automatic Text.. - Iwayama, Tokunaga (1995)   (5 citations)

....is a general form of the well-known Maximum Likelihood estimation, and we call the algorithm Hierarchical Bayesian Clustering (HBC). Probabilistic models are becoming popular in the field of text retrieval/categorization owing to their solid formal grounding in probability theory [Croft, 1981; Fuhr, 1989; Kwok, 1990; Lewis, 1992]. They retrieve those texts that have larger posterior probabilities of being relevant to a request. When these models are extended to cluster-based text retrieval/categorization, however, the algorithm used for text clustering has still been a non-probabilistic one [ ....

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing & Management, 25(1):55--72, 1989.


A First Approach to Speech Retrieval - Glavitsch

....optimizes both the cost and the effectiveness of the retrieval system. The importance of the probability ranking principle comes from the fact that it can be proven mathematically. Two well-known probabilistic retrieval methods are the Binary Independence Retrieval (BIR) model [RSJ76, Fuh89] and the Binary Independence Indexing (BII) model [FB91]. The BIR model assigns probabilistic weights to query features, whereas the BII model assigns probabilistic weights to document features. The probabilistic parameters are computed by means of a test collection. Both the BII and the BIR ....

N. Fuhr. Models for Retrieval with Probabilistic Indexing. Information Processing & Management, 25(1):55--72, 1989.


Learning to Extract Symbolic Knowledge from the World.. - Craven, Freitag.. (1998)   (111 citations)

....of the classifier by examining the structure of the page's URL. Below we describe these two steps in turn. 4.1 Using Word Vectors to Classify Web Pages. Word-vector-based methods represent a document as a vector, with one entry for each word in the vocabulary. The Probabilistic Indexing approach [4] used in this paper classifies a new document d by summing, over all words in d, the probability that the word is representative of both the document and the class. More precisely, it assigns the class C to document d according to the following rule: $C = \operatorname{argmax}_{C} \Pr(C \mid d)$ ....

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25:55--72, 1989.


Natural Language Processing for Information Retrieval - Lewis, Jones (1996)   (41 citations)

....most NLP tasks. TR, even more than DR, is tolerant with respect to errors in document representations. In addition, ambiguities in NLP system output (for instance, alternative decompositions of a sentence into phrases) can be assigned probabilities of correctness in a probabilistic indexing method [11]. On the other hand, NLP applied to documents must cope with vast amounts of variable quality text from broad domains. User requests present smaller amounts of text, but even more variability in form and content. Each of the three main aspects of our strategy forming text descriptions, providing ....

Fuhr, N. Models for retrieval with probabilistic indexing. Inf. Process. Manage., 25, 1 (1989), 55--72.


A Probabilistic Learning Approach for Document Indexing - Fuhr, Buckley (1991)   (23 citations)  Self-citation (Fuhr)

....be assigned those terms that are used by queries to which the document is relevant. With this model, the notion of weighted indexing (instead of binary indexing) that is the weighting of the index terms w.r.t. the document, was given a theoretical justification in terms of probabilities. In [13], this approach is generalized to all models of probabilistic indexing by introducing the concept of correctness as the event to which the probabilities relate. The Maron and Kuhns model assumes that the probabilistic indexing weights for a document can be estimated on the basis of relevance ....

....range: in the case of search term weighting from relevance feedback, the relevance information collected for one query is worthless for any other query. In the same way, the probabilistic indexing approach restricts the use of relevance data to a single document. The Darmstadt Indexing Approach [13] [3] overcomes these deficiencies by introducing the concept of relevance descriptions: a relevance description is an abstraction from specific queries, documents and terms. Like in pattern recognition methods, a relevance description contains values of features of the objects under consideration ....

[Article contains additional citation context not shown here]

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55--72, 1989.


Models in Information Retrieval - Fuhr   Self-citation (Fuhr)

....In the following, we will assume that $P(I \mid d_m)$ is the same for all documents, so we only have to estimate the parameters $P(I \mid t_i, d_m)$. A direct estimation of these parameters would suffer from the same problems as described before. Instead, we apply the so-called description-oriented approach [5]. Here the basic idea is the abstraction from specific terms and documents. Instead, we regard feature vectors $x(t_i, d_m)$ of term-document pairs, and we estimate probabilities $P(I \mid x(t_i, d_m))$ referring to these vectors. The differences between the two strategies are illustrated in figure 9. A ....
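The key step in the description-oriented approach is that probabilities are attached to feature vectors $x(t_i, d_m)$ rather than to individual term-document pairs, so evidence is pooled across all terms and documents that share a description. The Python sketch below illustrates only this pooling idea. It assumes a hypothetical two-feature description (a capped term-frequency bucket and a title flag) and a plain relative-frequency estimate of $P(I \mid x)$; the cited work may use richer descriptions and a fitted estimator instead of raw counts, and the names `describe` and `estimate` are illustrative.

```python
from collections import defaultdict


def describe(term_freq: int, in_title: bool) -> tuple:
    """Hypothetical description x(t, d) of a term-document pair:
    a term-frequency bucket (1, 2, or 3 meaning 3+) plus a title flag."""
    return (min(term_freq, 3), in_title)


def estimate(training_pairs):
    """training_pairs: iterable of (term_freq, in_title, correct) tuples, where
    `correct` says whether the term was judged a correct index term for the document.
    Returns P(I | x) as the relative frequency of correct pairs per description x."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tf, title, correct in training_pairs:
        x = describe(tf, title)
        totals[x] += 1
        hits[x] += int(correct)
    return {x: hits[x] / totals[x] for x in totals}
```

A new pair, say a term occurring five times and appearing in the title, is mapped to the description (3, True) and receives that description's estimate, even if this particular term never occurred in the training data.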

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55--72, 1989.


Probabilistic Models in Information Retrieval - Fuhr (1992)   (39 citations)  Self-citation (Fuhr)

....methods for coping with imprecision in databases [IEEE 89, Motro 90]. As new databases for technical, scientific and office applications are set up, this issue becomes of increasing importance. A first probabilistic model that can handle both vague queries and imprecise data has been presented in [Fuhr 90]. Furthermore, the integration of text and fact retrieval will be a major issue (see e.g. [Rabitti & Savino 90]). Finally, it should be mentioned that the models discussed here scarcely take into account the special requirements of interactive retrieval. Even the feedback methods are more or ....

Fuhr, N. (1989a). Models for Retrieval with Probabilistic Indexing. Information Processing and Management 25(1), pages 55--72.


Optimizing Document Indexing and Search Term Weighting Based.. - Fuhr, Buckley (1993)   (3 citations)  Self-citation (Fuhr)

No context found.

Fuhr, N. (1989). Models for Retrieval with Probabilistic Indexing. Information Processing and Management 25(1), pages 55--72.


Probabilistic Information Retrieval as Combination of.. - Fuhr, Pfeifer (1994)   (9 citations)  Self-citation (Fuhr)

....learning strategy. 3 A new probabilistic model for the Darmstadt Indexing Approach. The Darmstadt Indexing Approach (DIA) is a dictionary-based approach for automatic indexing from document titles and abstracts, with index terms (called descriptors here) from a prescribed indexing vocabulary ([8], [11]). This means that a descriptor may be assigned to a document even when it does not occur in the document text. For the task of mapping text content onto the set of descriptors, the approach needs an indexing dictionary containing term-descriptor rules for as many terms (i.e. words or ....

N. Fuhr. Models for retrieval with probabilistic indexing. Information Processing and Management, 25, 1 (1989), 55--72.


A Probabilistic Analysis of the Rocchio Algorithm with .. - Thorsten Joachims..

No context found.

N. Fuhr, "Models for Retrieval with Probabilistic Indexing", Information Processing and Management, 25(1), pages 55-72, 1989.


Where Should The Person Stop And The Information Search Interface.. - Bates (1990)   (26 citations)

No context found.

Fuhr, Norbert, "Models for Retrieval with Probabilistic Indexing," Information Processing & Management, 25, 1, 1989, pp. 55-72.


Automatic Indexing Based on Bayesian Inference Networks - Tzeras, Hartmann (1993)   (22 citations)

No context found.

Fuhr, N. (1989a). Models for Retrieval with probabilistic Indexing. Information Processing and Management 25(1), pages 55-72.


CiteSeer.IST - Copyright Penn State and NEC