@MISC{_ameasure, author = {}, title = {A Measure Theoretic Approach to Information Retrieval}, year = {} }

Abstract

The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterised by a discrepancy between its formal framework and its implementable form. The underlying concepts of the vector space model are mathematical terms: linear space, vector, and inner product. However, in the vector space model, the mathematical meaning of these concepts is not preserved; they are used as mere computational constructs or metaphors. Thus, the vector space model does not actually follow logically from the mathematical concepts on which it has been claimed to rest. This problem has been recognised for more than two decades, but no proper solution has emerged so far. The present paper proposes such a solution. First, the concept of retrieval is defined based on measure theory. Then, retrieval is particularised using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. Thus, the classical and the generalised vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time, it becomes clear that the inner product is not a necessary ingredient of the vector space model. Moreover, this view makes it possible to consistently formulate new retrieval methods: in a linear space with a general basis, entropy-based, and probability-based. Experimental results using standard test collections are also reported.
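
The idea of a retrieval function as the cardinality of the intersection of two fuzzy sets can be illustrated with a minimal Python sketch. The abstract does not say which intersection operator or which cardinality is meant, so the choices here are assumptions made only for illustration: the cardinality is taken to be the sigma-count (sum of membership degrees), and the intersection is computed with either the algebraic product or the minimum. The names `fuzzy_intersection`, `sigma_count`, and `retrieval_score`, and the membership values, are hypothetical.

```python
from typing import Callable, Dict

# A fuzzy set over index terms: term -> membership degree in [0, 1].
FuzzySet = Dict[str, float]


def fuzzy_intersection(
    a: FuzzySet, b: FuzzySet, t_norm: Callable[[float, float], float]
) -> FuzzySet:
    """Intersect two fuzzy sets with the given t-norm (assumed operator).

    Terms missing from either set have membership 0, and any t-norm maps
    (x, 0) to 0, so only the shared terms need to be visited.
    """
    return {term: t_norm(a[term], b[term]) for term in a.keys() & b.keys()}


def sigma_count(s: FuzzySet) -> float:
    """Cardinality of a fuzzy set as the sum of its membership degrees (assumption)."""
    return sum(s.values())


def retrieval_score(
    doc: FuzzySet,
    query: FuzzySet,
    t_norm: Callable[[float, float], float] = lambda x, y: x * y,
) -> float:
    """Score a document against a query as |doc ∩ query|."""
    return sigma_count(fuzzy_intersection(doc, query, t_norm))


# Made-up membership degrees, not taken from the paper.
doc = {"measure": 0.5, "retrieval": 0.5, "vector": 0.25}
query = {"retrieval": 0.5, "vector": 1.0, "entropy": 0.75}

# Algebraic product: the score coincides with the familiar dot product.
product_score = retrieval_score(doc, query)

# Minimum: still a valid intersection cardinality, but no longer a dot product.
min_score = retrieval_score(doc, query, min)

print(product_score, min_score)
```

Under these assumptions, the product intersection reproduces the dot-product score of the classical model, while the minimum gives a different but equally well-defined score, which is one way to read the claim that the inner product is not a necessary ingredient.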