| H. Schütze and C. Silverstein. Projections for efficient document clustering. In Proceedings of the 20th International ACM SIGIR Conference, 1997. |
.... the recent decades, e.g. [20, 8, 30, 17], and much is known about the importance of feature reduction in general, e.g. [14], and, in particular, for clustering [28]. However, little has been done so far to facilitate feature reduction for document clustering of query results, with the notable exception of [22]. In contrast to the latter paper, which uses a tf-idf weighting scheme, we suggest ranking the importance of each such candidate keyword j with a weight w_j = h j d j h j log( H h j ) (1) where H is the total number of hit documents, h_j is the number of documents in H containing word ....
.... Analysis
• Principal component (i.e. eigenvector) analysis in a high-dimensional discrete space: captures the largest variation of words and documents without sacrificing much information; estimates document similarity in a lower-dimensional continuous space. (ref: Jolliffe (1986); Schütze and Silverstein (1997))
• Controversial technique in IR, e.g. lack of rigorous statistical foundation. (ref: Hinton et al. (1997))
• However, for word co-occurrence modelling purposes, an exact probability estimate may not be needed if the captured variance is sufficiently large for discrimination.
8th ELSNET ....
Schütze, H. and C. Silverstein (1997, July). Projections for efficient document clustering. In Proceedings of SIGIR'97, Philadelphia, pp. 74–81.