T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....weights on the edges, we can capture the degree of this association. One possibility is to have edge weights equal term frequencies, i.e., the number of times a word occurs in a document. In fact, most of the term weighting formulae used in information retrieval may be used as edge weights; see [31, 32, 26] for more details. One popular term weighting scheme is to have the edge weight $E_{ij}$ associated with the edge $\{w_i, d_j\}$ be $E_{ij} = t_{ij} \log(|D| / |D_i|)$ (4), where $t_{ij}$ is the number of times word $w_i$ occurs in document $d_j$, $|D| = n$ is the total number of documents and $|D_i|$ is the ....
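The edge-weight formula in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' code: the excerpt is cut off before it defines $|D_i|$, so the sketch assumes it is the number of documents containing word $w_i$ (the usual inverse-document-frequency reading), and the function name `edge_weights` is hypothetical.

```python
import math


def edge_weights(docs):
    """Compute E_ij = t_ij * log(|D| / |D_i|) for every (word, document) edge.

    docs is a list of token lists. ASSUMPTION: |D_i| is taken to be the number
    of documents containing word w_i; the excerpt is truncated before defining it.
    Returns a dict mapping (word, document_index) to the edge weight.
    """
    n = len(docs)  # |D|, the total number of documents

    # t_ij: how often each word occurs in each document
    counts = []
    for doc in docs:
        c = {}
        for word in doc:
            c[word] = c.get(word, 0) + 1
        counts.append(c)

    # |D_i| (assumed): number of documents in which each word appears
    doc_freq = {}
    for c in counts:
        for word in c:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = {}
    for j, c in enumerate(counts):
        for word, t in c.items():
            weights[(word, j)] = t * math.log(n / doc_freq[word])
    return weights
```

Under this reading, a word that occurs in every document gets weight zero on all of its edges, which is the discriminating behaviour such schemes aim for.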
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....the overall importance of a word in the entire set of documents. The objective of such weighting schemes is to enhance discrimination between various document vectors for better retrieval effectiveness [7]. There are many schemes for selecting the term, global, and normalization components; see [6] for various possibilities. One popular scheme is the tfn scheme, known as normalized term frequency-inverse document frequency. This scheme uses $t_{ji} = f_{ji}$, $g_j = \log(d / d_j)$ and $s_i = \left( \sum_{j=1}^{w} (t_{ji} g_j)^2 \right)^{-1/2}$, where $d_j$ is the number of documents that contain word $j$. Note that this ....
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....to normalize the columns in the final matrix. If we do not, short documents may not be recognized as relevant. To normalize, $s_i$ can be set to $\left( \sum_{j=1}^{w} (t_{ji} g_j)^2 \right)^{-1/2}$. Note that this particular normalization causes all the document vectors to lie on the unit sphere. Kolda [Kol97] presents 5, 5, and 2 schemes, respectively, for the term, global, and normalization components. Finally, a given query can be formulated as a query vector $q$, and the documents that contain the query words may be returned by the vector $X^T q$, where $X = [x_1, x_2, \ldots, x_d]$. Note that the ....
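A small sketch may make the normalization and the query product $X^T q$ concrete. It assumes that each matrix entry is the product of the term, global, and normalization components, $x_{ji} = t_{ji} g_j s_i$, with $t_{ji}$ the raw frequency and $g_j = \log(d / d_j)$ as in the tfn scheme quoted elsewhere on this page. The names `tfn_matrix` and `scores` are hypothetical, and the query is left unweighted for simplicity.

```python
import math


def tfn_matrix(freq):
    """Build a tfn-weighted term-document matrix.

    freq[j][i] is the raw count of word j in document i.
    ASSUMPTION: entries are x_ji = t_ji * g_j * s_i with t_ji = freq[j][i],
    g_j = log(d / d_j) and s_i chosen so that every column has unit 2-norm.
    """
    w, d = len(freq), len(freq[0])

    # Global component: g_j = log(d / d_j), d_j = number of documents with word j
    g = []
    for j in range(w):
        d_j = sum(1 for i in range(d) if freq[j][i] > 0)
        g.append(math.log(d / d_j) if d_j > 0 else 0.0)

    X = [[freq[j][i] * g[j] for i in range(d)] for j in range(w)]

    # Normalization component: scale each document (column) to unit length
    for i in range(d):
        norm = math.sqrt(sum(X[j][i] ** 2 for j in range(w)))
        if norm > 0:
            for j in range(w):
                X[j][i] /= norm
    return X


def scores(X, q):
    """Return X^T q: one similarity score per document."""
    w, d = len(X), len(X[0])
    return [sum(X[j][i] * q[j] for j in range(w)) for i in range(d)]
```

Because each nonzero column has unit length, a long document cannot outscore a short one merely by repeating a query word more often.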
....entries may also be scaled by the term, global and normalization components. A weighting scheme can, therefore, be specified by a six letter combination that indicates term, global and normalization components for the document matrix and the term, global and normalization components for the query [Kol97], for example, lfn.txx or lgn.lgx. The SMART system developed at Cornell uses the above vector space model for query retrieval. ....
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....overall importance of a word in the entire set of documents. The objective of such weighting schemes is to enhance discrimination between various document vectors for better retrieval effectiveness [SB88]. There are many schemes for selecting the term, global, and normalization components; see [Kol97] for various possibilities. In this paper we use the popular tfn scheme, known as normalized term frequency-inverse document frequency. This scheme uses $t_{ji} = f_{ji}$, $g_j = \log(d / d_j)$ and $s_i = \left( \sum_{j=1}^{w} (t_{ji} g_j)^2 \right)^{-1/2}$. Note that this normalization implies that $\|x_i\| = 1$, i.e., each ....
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....weights on the edges, we can capture the degree of this association. One possibility is to have edge weights equal term frequencies, i.e., the number of times a word occurs in a document. In fact, most of the term weighting formulae used in information retrieval may be used as edge weights; see [31, 32, 26] for more details. One popular term weighting scheme is to have the edge weight $E_{ij}$ associated with the edge $\{w_i, d_j\}$ be $E_{ij} = t_{ij} \log(|D| / |D_i|)$, (4) where $t_{ij}$ is the number of times word $w_i$ occurs in document $d_j$, $|D| = n$ is the total number of documents and $|D_i|$ is the ....
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997.
....(SVD) or principal component analysis to discover latent relationships between correlated words and documents. Truncated SVD is a popular and well-studied matrix approximation scheme [Golub and Van Loan, 1996]. Based on the earlier work of [O'Leary and Peleg, 1983] for image compression, [Kolda, 1997] has developed a memory-efficient matrix approximation scheme known as semi-discrete decomposition. [Gallant, 1994, Caid and Oing, 1997] have used an implicit matrix approximation scheme based on their context vectors. [Papadimitriou et al., 1998] have proposed computationally efficient matrix ....
....entire set of documents. The objective of such weighting schemes is to enhance discrimination between various document vectors and to enhance retrieval effectiveness [Salton and Buckley, 1988]. There are many schemes for selecting the term, global, and normalization components; for example, [Kolda, 1997] presents 5, 5, and 2 schemes, respectively, for the term, global, and normalization components, a total of $5 \times 5 \times 2 = 50$ choices. From this extensive set, we will use two popular schemes denoted as txn and tfn, and known, respectively, as normalized term frequency and normalized term ....
Kolda, T. G. (1997). Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland.
....(SVD) or principal component analysis to discover latent relationships between correlated words and documents. Truncated SVD is a popular and well-studied matrix approximation scheme (Golub and Van Loan, 1996). Based on the earlier work of (O'Leary and Peleg, 1983) for image compression, (Kolda, 1997) has developed a memory-efficient matrix approximation scheme known as semidiscrete decomposition. (Gallant, 1994; Caid and Oing, 1997) have used an implicit matrix approximation scheme based on their context vectors. (Papadimitriou et al., 1998) have proposed computationally efficient matrix ....
....the entire set of documents. The objective of such weighting schemes is to enhance discrimination between various document vectors and to enhance retrieval effectiveness (Salton and Buckley, 1988). There are many schemes for selecting the term, global, and normalization components; for example, (Kolda, 1997) presents 5, 5, and 2 schemes, respectively, for the term, global, and normalization components, a total of $5 \times 5 \times 2 = 50$ choices. From this extensive set, we will use two popular schemes denoted as txn and tfn, and known, respectively, as normalized term frequency and normalized term ....
Kolda, T. G.: 1997, `Limited-Memory Matrix Methods with Applications'. Ph.D. thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland.
....(LSI) that uses truncated singular value decomposition (SVD) or principal component analysis to discover latent relationships between correlated words and documents. One may interpret LSI as a matrix approximation scheme. Based on earlier work of [O'Leary and Peleg, 1983] for image compression, [Kolda, 1997] has developed a memory-efficient matrix approximation scheme known as semi-discrete decomposition. [Gallant, 1994, Caid and Oing, 1997] have used an implicit matrix approximation scheme based on their context vectors. [Papadimitriou et al., 1998] have proposed computationally efficient matrix ....
....in the entire set of documents. The objective of such weighting schemes is to enhance discrimination between various document vectors and to enhance retrieval effectiveness [Salton and Buckley, 1988]. There are many schemes for selecting the term, global, and normalization components; for example, [Kolda, 1997] presents 5, 5, and 2 schemes, respectively, for the term, global, and normalization components, a total of $5 \times 5 \times 2 = 50$ choices. From this extensive set, we will use two popular schemes denoted as txn and tfn, and known, respectively, as normalized term frequency and normalized term ....
Kolda, T. G. (1997). Limited-Memory Matrix Methods with Applications. PhD thesis, The Applied Mathematics Program, University of Maryland, College Park, Maryland.
....a maximum allowable number of iterations can be specified. This method was derived from the semidiscrete decomposition that was introduced by O'Leary and Peleg [37] for image compression and that was also used for latent semantic indexing in information retrieval by Kolda and O'Leary [32, 35, 34]. 5.2. Kernighan-Lin Fiduccia-Mattheyses. The Kernighan-Lin [31] algorithm is a widely used method for improving a graph partition. As with alternating partitioning, the initial partition can be random, or it can be the output of another algorithm. A reformulation by Fiduccia and Mattheyses ....
T. G. Kolda, Limited-Memory Matrix Methods with Applications, Ph.D. thesis, Applied Mathematics Program, University of Maryland, College Park, MD, 1997. Also available as Department of Computer Science Tech. Rep. CS-TR-3806.
....from (4). As with the SDD algorithm, several different strategies can be used to initialize $y$ in Step (2a) in the weighted SDD algorithm (Figure 1). Here, we only present these schemes briefly. The same convergence results hold, and the proofs are similar to those given for the SDD algorithm in [Kolda 1997], as listed below. (1) MAX: Choose $e_j$ such that $j$ is the index of the column containing the largest magnitude entry in $R_k \circ R_k \circ W$. (2) CYC: Choose $e_i$ where $i = (k \bmod n) + 1$. (3) THR: Accept a given unit vector only if it satisfies $\|R_k e_j\|_W^2 \ge \|R_k\|_W^2 / n$. Theorem 3. [Kolda 1997] ....
....in [Kolda 1997], as listed below. (1) MAX: Choose $e_j$ such that $j$ is the index of the column containing the largest magnitude entry in $R_k \circ R_k \circ W$. (2) CYC: Choose $e_i$ where $i = (k \bmod n) + 1$. (3) THR: Accept a given unit vector only if it satisfies $\|R_k e_j\|_W^2 \ge \|R_k\|_W^2 / n$. Theorem 3. [Kolda 1997] The sequence $\{A_k\}$ generated by the SDD algorithm with MAX initialization converges to $A$ in the Frobenius norm. Furthermore, the rate of convergence is at least linear. Theorem 4. [Kolda 1997] The sequence $\{A_k\}$ generated by the SDD algorithm with CYC initialization converges to $A$ in the ....
Kolda, T. G. 1997. Limited-Memory Matrix Methods with Applications. Ph. D. thesis, University of Maryland, College Park, MD.
.... there is too much information to be indexed manually. Most people have used some type of information retrieval system in the form of Internet search engines. Search engines are based on information retrieval models such as the Boolean system, the probabilistic model, or the vector space model [7]. We focus on the vector space model, described in Sect. 2, which models documents and queries as vectors and computes similarity scores using an inner product. The performance of the vector space model depends on the term weighting scheme, that is, the functions that determine the components of ....
....documents and queries are represented as vectors in term space. The term list for a given document collection is compiled as follows. Words appearing in only one document are removed. Numbers, punctuation, and stop words are also removed. The remaining words form our set of terms. See Kolda [7] for further details on preprocessing. To compare a document and query, we find their similarity score by computing their dot product. For example, Table 1 shows two partial documents from MEDLINE, and Table 2 shows a query from the MEDLINE test collection. (See Appendix A for the ....
T. G. Kolda. Limited-Memory Matrix Methods with Applications. PhD thesis, Applied Mathematics Program, University of Maryland, College Park, Maryland, 1997. Also available as Department of Computer Science Technical Report CS-TR-3806.
....initialization scheme leads to a linearly convergent algorithm (Theorem 3) but is computationally expensive if $R_k$ is stored implicitly as $A - A_k$. 2. Cycling (CYC) initialization sets $y_k = e_i$ where $i = (k \bmod n) + 1$. Unfortunately, the rate of convergence can be as slow as n-step linear [Kolda 1997]. 3. Threshold (THR) initialization also cycles through the unit vectors, but it does not accept a given vector unless it satisfies $\|R_k e_j\|_2^2 \ge \|R_k\|_F^2 / n$. We are guaranteed that at least one unit vector will satisfy this inequality by definition of the F-norm. Even though $R_k$ is stored ....
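The three initialization strategies named in this excerpt (MAX, CYC, THR) can be sketched as index-selection helpers. This is an illustrative sketch under stated assumptions, not the cited implementation: the residual $R_k$ is passed as an explicit list of rows, column indices are 0-based (so the excerpt's $i = (k \bmod n) + 1$ becomes `k % n`), the helper names are hypothetical, and the excerpt does not say where THR starts cycling, so that is left to a `start` parameter.

```python
def column_sq_norms(R):
    """Squared 2-norm of each column of R, i.e. ||R e_j||_2^2 for every j."""
    n = len(R[0])
    return [sum(row[j] ** 2 for row in R) for j in range(n)]


def init_max(R):
    """MAX: index of the column containing the largest-magnitude entry of R."""
    best_j, best_val = 0, -1.0
    for row in R:
        for j, value in enumerate(row):
            if abs(value) > best_val:
                best_val, best_j = abs(value), j
    return best_j


def init_cyc(k, n):
    """CYC: cycle through the unit vectors; 0-based form of (k mod n) + 1."""
    return k % n


def init_thr(R, start):
    """THR: cycle from `start` and accept the first column j with
    ||R e_j||_2^2 >= ||R||_F^2 / n.

    ASSUMPTION: the starting position of the cycle is the caller's choice.
    """
    norms = column_sq_norms(R)
    n = len(norms)
    threshold = sum(norms) / n  # ||R||_F^2 / n
    for offset in range(n):
        j = (start + offset) % n
        if norms[j] >= threshold:
            return j
    # Reachable only through floating-point rounding: the largest column norm
    # is never below the mean in exact arithmetic.
    return max(range(n), key=lambda j: norms[j])
```

The guarantee quoted in the excerpt shows up directly in `init_thr`: the squared Frobenius norm is the sum of the squared column norms, so at least one column is at or above the average and the loop accepts it.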
....to it by finding $y \in S^n$ that is a discrete approximation to $v$; that is, find a $y$ that solves $\min \|\hat{y} - v\|_2$ s.t. $y \in S^n$, $\hat{y} \equiv y / \|y\|_2$. (7) This also yields a linearly convergent algorithm (Theorem 6). We conclude this section with the proof of these convergence results. Theorem 3. [Kolda 1997] The sequence $\{A_k\}$ generated by the SDD algorithm with MAX initialization converges to $A$ in the Frobenius norm. Furthermore, the rate of convergence is at least linear. Proof. Without loss of generality, assume that $R_k \ne 0$ for all $k$; otherwise, the algorithm terminates at the exact solution. ....
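One way to picture the discrete approximation step is the sketch below, which picks a vector $y$ whose normalized form $y / \|y\|_2$ is as close as possible to $v$. Several things here are assumptions rather than statements from the excerpt: $S$ is not defined in the visible text and is taken to be $\{-1, 0, 1\}$, as is standard for the semidiscrete decomposition; the sorting approach is a straightforward derivation, not necessarily the cited authors' procedure; and the name `discrete_approx` is hypothetical. The derivation: minimizing the distance is the same as maximizing the inner product of $v$ with $y / \|y\|_2$, so for a support of size $J$ the best choice puts $\operatorname{sign}(v_i)$ on the $J$ largest-magnitude entries, and it only remains to pick the best $J$.

```python
import math


def discrete_approx(v):
    """Return y in {-1, 0, 1}^n minimizing || y/||y||_2 - v ||_2 over nonzero y.

    ASSUMPTION: S = {-1, 0, 1}. For each support size J the best y places
    sign(v_i) on the J largest |v_i|, giving inner product
    (sum of those |v_i|) / sqrt(J); we keep the J that maximizes it.
    """
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)

    best_J, best_val, partial = 0, -math.inf, 0.0
    for J, i in enumerate(order, start=1):
        partial += abs(v[i])
        val = partial / math.sqrt(J)
        if val > best_val:  # strict: ties keep the smaller support
            best_val, best_J = val, J

    y = [0] * len(v)
    for i in order[:best_J]:
        y[i] = 1 if v[i] >= 0 else -1
    return y
```

For example, with `v = [0.9, -0.1, 0.4]` the candidates score 0.9 for one entry, about 0.919 for two and about 0.808 for three, so the sketch keeps the two largest entries and returns `[1, 0, 1]`.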
Kolda, T. G. 1997. Limited-memory matrix methods with applications. Technical Report CS-TR-3806, Computer Science Department, University of Maryland, College Park, MD.
....initialization scheme leads to a linearly convergent algorithm (Theorem 3) but is computationally expensive if $R_k$ is stored implicitly as $A - A_k$. (2) Cycling (CYC) initialization sets $y_k = e_i$ where $i = (k \bmod n) + 1$. Unfortunately, the rate of convergence can be as slow as n-step linear [Kolda 1997]. (3) Threshold (THR) initialization also cycles through the unit vectors, but it does not accept a given vector unless it satisfies $\|R_k e_j\|_2^2 \ge \|R_k\|_F^2 / n$. We are guaranteed that at least one unit vector will satisfy this inequality by definition of the F-norm. Even though $R_k$ is stored ....
....to it by finding $y \in S^n$ that is a discrete approximation to $v$; that is, find a $y$ that solves $\min \|\hat{y} - v\|_2$ s.t. $y \in S^n$, $\hat{y} \equiv y / \|y\|_2$. (7) This also yields a linearly convergent algorithm (Theorem 6). We conclude this section with the proof of these convergence results. Theorem 3. [Kolda 1997] The sequence $\{A_k\}$ generated by the SDD algorithm with MAX initialization converges to $A$ in the Frobenius norm. Furthermore, the rate of convergence is at least linear. Proof. Without loss of generality, assume that $R_k \ne 0$ for all $k$; otherwise, the algorithm terminates at the exact solution. ....
Kolda, T. G. 1997. Limited-Memory Matrix Methods with Applications. Ph. D. thesis, University of Maryland Applied Mathematics Program. Also available as Department of Computer Science Technical Report CS-TR-3806.