Croft, W. B. (1981). Document representation in probabilistic models of information retrieval. Journal of American Society for Information Science, pages 451--457.
.... indexing approaches [19]. A different model for using probabilistic indexing weights in retrieval is described in [26] as the 2-Poisson Independence model, but also had little success (mainly because of parameter estimation problems). In contrast to these results, the approaches developed in [6], [7], [35] show improvements over binary indexing; however, these models lack an explicit notion of an event to which the probabilistic weights relate. In this paper, we present a radically different approach to probabilistic indexing. We introduce the concept of relevance description as an ....
....indexing approach which is feasible for real applications. The major concepts of our approach are the following: • Definition of a probabilistic indexing model in terms of the BII model. In contrast to nonprobabilistic indexing models (like e.g. [27]) or earlier probabilistic models [6], the indexing weights of the BII model have a clear notion as probabilities in a well defined event space. 19 • Abstraction from specific term document pairs by definition of relevance descriptions. Unlike many other probabilistic IR models, the probabilistic parameters do not relate to a ....
W.B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, 32:451--457, 1981.
....get the same RSVs could be exploited. In the following, we will describe the procedure for estimating the ith addend in equation (2.2) exemplarily for the BIR as well as the retrieval with probabilistic indexing (RPI) model. Other models as for example extensions of the BIR model presented in [Cro81] or [RW94] could have also been applied instead. 2.1.1 Using the BIR model In order to be able to estimate the ith addend in (2.2) with the help of the binary independence retrieval (BIR) model 11 [RSJ76], [Sch97a] retrievable items $D_{i-1} \in D_i$, i.e. documents as well as ....
W. B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, 32(6), 1981.
....consistently and efficiently to large numbers of daily incoming documents. The purpose of this paper is to propose a new probabilistic model for automatic text categorization. While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models [12, 8, 6, 9, 3, 17, 18] because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists their individual problems. In section 3, we propose a new probabilistic model based on a Single random Variable with Multiple Values (SVMV). Our model is very ....
W. B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, Vol. 32, No. 6, pp. 451--457, 1981.
....document, however, does not necessarily mean that the term is indicative of the contents of the document. Rather than make such extreme judgments, we would prefer to use a finer granularity when expressing the degree to which a term should be assigned to a document. This was accomplished by Croft [14, 15] who expressed this degree as the probability of a term being assigned to a document, $P(x_i = 1 \mid d)$, such that documents should now be ranked by the expected value of Equation 1, or $g(x) = \sum_i P(x_i = 1 \mid d) \left( C + \log \frac{N - n_i}{n_i} \right)$. $P(x_i = 1 \mid d)$ is then estimated using the ....
W. B. Croft. Document representation in probabilistic models of information retrieval. J. Amer. Soc. Inf. Sci., 32(6):451--457, Nov. 1981.
.... maximization is a general form of the well known Maximum Likelihood estimation, and we call the algorithm Hierarchical Bayesian Clustering (HBC). Probabilistic models are becoming popular in the field of text retrieval categorization owing to their solid formal grounding in probability theory [Croft, 1981, Fuhr, 1989, Kwok, 1990, Lewis, 1992]. They retrieve those texts that have larger posterior probabilities of being relevant to a request. When these models are extended to cluster based text retrieval categorization, however, the algorithm used for text clustering has still been a non ....
W. B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, 32(6):451--457, 1981.
....estimate the addend. In other words, we are allowed to use individual indexing vocabularies for different subcollections. In the following, we will present the estimation procedure exemplarily for the BIR model. More sophisticated models as for example extensions of the BIR approach described in [Cro81] or [RW94] respectively, or the retrieval with probabilistic indexing (RPI) model [Fuh89] could have been applied instead (and become even indispensable when dealing with documents containing media data as images, videos etc.). In analogy to the non distributed BIR model, we assume the dependency ....
W. B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, 32(6), 1981.
....called Hierarchical Bayesian Clustering (HBC) and use the algorithm to construct a set of clusters for cluster based search. The searching platform we focus on is the probabilistic model of text categorization that searches the most likely clusters to which an unseen document is classified [Croft, 1981, Fuhr, 1989, Iwayama and Tokunaga, 1994, Kwok, 1990, Lewis, 1992]. Since HBC constructs the most likely set of clusters that contains the given training documents, HBC gives exactly the same criterion both in constructing and in searching clusters. For this reason, our framework is expected ....
W. B. Croft. Document representation in probabilistic models of information retrieval. Journal of the American Society for Information Science, 32(6):451--457, 1981.
....document, however, does not necessarily mean that the term is indicative of the contents of the document. Rather than make such extreme judgments, we would prefer to use a finer granularity when expressing the degree to which a term should be assigned to a document. This was accomplished by Croft [19, 20] who expressed this degree as the probability of a term being assigned to a document, $P(x_i = 1 \mid d)$, such that documents should now be ranked by the expected value of Equation 4.1, or $g(x) = \sum_i P(x_i = 1 \mid d) \left( C + \log \frac{N - n_i}{n_i} \right)$. $P(x_i = 1 \mid d)$ is then estimated using the ....
Croft, W. B. Document representation in probabilistic models of information retrieval. J. Amer. Soc. Inf. Sci., 32(6):451--457, Nov. 1981.
....the probability of relevance of a document using Bayesian classification theory. The representations of documents and information needs (requests) that are used for this model are simply sets of unweighted words or index terms. This basic approach can be extended to incorporate weighted terms [CROF81] or requests structured using Boolean operators [CROF86a]. Statistical indexing and retrieval techniques are efficient and are more effective in terms of finding relevant documents than searches based on Boolean queries and exact matching [SALT83b]. The major disadvantage of these techniques is ....
....of its estimation of the potential of the documents remaining to be checked. 4 The Experiment Given a REST representation of a request, it is relatively straightforward to generate information for a statistical retrieval strategy. The strategy developed from the probabilistic model by Croft [CROF81, CROF86a] can make use of information about the relative importance of terms and about dependencies between terms. This information is derived from the relative importance of slots in case frames and the term groupings that represent concepts. In the absence of information about the importance of terms in ....
Croft, W. B. "Document Representation in Probabilistic Models of Information Retrieval". Journal of the American Society of Information Science, 32: 451-457; 1981.
No context found.
Croft, W. B. (1981). Document representation in probabilistic models of information retrieval. Journal of American Society for Information Science, pages 451--457.
No context found.
Croft W.B., "Document Representation in Probabilistic Models of Information Retrieval", J. Amer. Soc. Inf. Sci. 32(6):451-457, Nov, 1981.