### Citations

3632 | Authoritative sources in a hyperlinked environment
- Kleinberg
- 1998
Citation Context: ...symmetric: whenever (v, w) is an eigenvector for some eigenvalue λ, then (v, −w) is an eigenvector for −λ. Hence, we have exactly the same setting as in the established Hubs and Authorities (HITS) model (Kleinberg, 1999). The first part of any eigenvector is always an eigenvector of the hub matrix G^T G, and the second part is an eigenvector of the authority matrix GG^T...
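The block-eigenvector symmetry described in this context can be checked numerically. The sketch below is my own illustration (not code from the cited works): it builds a matrix of the form [0, G; G^T, 0] and verifies that its spectrum is symmetric and that each half of an eigenvector is an eigenvector of GG^T or G^T G.

```python
import numpy as np

# Illustration (not code from the cited works): matrices of the block form
# M = [[0, G], [G^T, 0]] have a spectrum that is symmetric around zero, and
# the two halves of an eigenvector are eigenvectors of G G^T and G^T G
# (which half pairs with which product depends on the stacking convention).
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

M = np.block([[np.zeros((4, 4)), G],
              [G.T, np.zeros((3, 3))]])

eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order

# Spectrum is symmetric: the eigenvalues are +/- the singular values of G
# (plus zeros from the rank deficiency).
assert np.allclose(np.sort(eigvals), np.sort(-eigvals), atol=1e-10)

# The top half of the leading eigenvector is an eigenvector of G @ G.T,
# with eigenvalue lam**2.
lam = eigvals[-1]
top = eigvecs[:4, -1]
assert np.allclose(G @ G.T @ top, lam**2 * top, atol=1e-8)
```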

593 | Matrix factorization techniques for recommender systems
- Koren, Bell, et al.
- 2009

555 | A singular value thresholding algorithm for matrix completion
- Cai, Candes, et al.
- 2008
Citation Context: ...≤ t}. Recently (Toh & Yun, 2009; Liu et al., 2009) and (Ji & Ye, 2009) independently proposed algorithms that obtain an ε-accurate solution to (1) in O(1/√ε) steps, by improving the algorithm of (Cai et al., 2008). More recently also (Mazumder et al., 2009) and (Ma et al., 2009) proposed algorithms in this line of so-called singular value thresholding methods, but these cannot guarantee a convergence speed. Each st...
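The singular value thresholding methods this context groups together share one core subroutine: soft-thresholding of the singular values, the proximal operator of the nuclear norm. A minimal numpy sketch (the function name `svt` is my own, not from the cited papers):

```python
import numpy as np

def svt(Y, tau):
    """Singular value soft-thresholding: the proximal operator of
    tau * ||.||_*, the core step shared by SVT-type methods."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # shrink and clip at zero
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(1)
Y = rng.standard_normal((5, 4))
Z = svt(Y, tau=0.5)

# The singular values of Z are those of Y shrunk by tau (clipped at 0).
sy = np.linalg.svd(Y, compute_uv=False)
sz = np.linalg.svd(Z, compute_uv=False)
assert np.allclose(sz, np.maximum(sy - 0.5, 0.0), atol=1e-10)
```

Each such step costs a full (or truncated) SVD, which is why, as the surrounding contexts note, per-iteration cost is the main concern for these methods.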

359 | The power of convex relaxation: Near-optimal matrix completion
- Candes, Tao
Citation Context: ...ormulations occur in many machine learning and compressed sensing applications such as dimensionality reduction, matrix classification, multi-task learning and matrix completion (Srebro et al., 2004; Candes & Tao, 2009). Matrix completion by using matrix factorizations of either low rank or low norm has gained a lot of attention in the area of recommender systems (Koren et al., 2009) with the recently ended Netflix...

280 | Projected gradient methods for nonnegative matrix factorization
- Lin
Citation Context: ...sually becomes a non-convex problem (consider for example U, V ∈ R^{1×1} together with the identity function f(x) = x). Therefore many of the popular methods such as for example (Rennie & Srebro, 2005; Lin, 2007) can get stuck in local minima and so are neither theoretically nor practically well justified; see also (DeCoste, 2006). These shortcomings can be overcome as follows: One can equivalently transform...

274 | A rank minimization heuristic with application to minimum order system approximation
- Fazel, Hindi, et al.
- 2001
Citation Context: ...ny non-zero matrix X ∈ R^{n×m} and t ∈ R: ||X||_* ≤ t/2 iff there exist symmetric matrices A ∈ S^{n×n}, B ∈ S^{m×m} s.t. the block matrix [A, X; X^T, B] ⪰ 0 and Tr(A) + Tr(B) = t. Proof. This is a slight variation of the argument of (Fazel et al., 2001; Srebro et al., 2004). ⇒ From the characterization ||X||_* = min_{UV^T = X} (1/2)(||U||^2_Fro + ||V||^2_Fro) we get that there exist U, V with UV^T = X s.t. ||U||^2_Fro + ||V||^2_Fro = Tr(UU^T) + Tr(VV^T) ≤ t,...
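The SDP characterization of the nuclear norm in this context can be sanity-checked numerically: with the SVD X = U S V^T, taking A = U S U^T and B = V S V^T makes the block matrix PSD with Tr(A) + Tr(B) = 2||X||_*. A small sketch (my own illustration of the lemma, not code from the paper):

```python
import numpy as np

# Numerical check of the cited lemma: for X = U S V^T (thin SVD),
# A = U S U^T and B = V S V^T give a PSD block matrix [[A, X], [X^T, B]]
# with Tr(A) + Tr(B) = 2 ||X||_*, i.e. ||X||_* <= t/2 holds with equality
# for t = 2 ||X||_*.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U @ np.diag(s) @ U.T
B = Vt.T @ np.diag(s) @ Vt

M = np.block([[A, X], [X.T, B]])
nuclear = s.sum()  # nuclear norm of X

assert np.isclose(np.trace(A) + np.trace(B), 2 * nuclear)
assert np.linalg.eigvalsh(M).min() >= -1e-8  # M is PSD (up to rounding)
```

The PSD claim follows because M factors as [U; V^T.T] S [U; V^T.T]^T with S ⪰ 0, which is exactly the direction (⇐ of the "⇒" shown) the proof sketch uses.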

246 | Fast maximum margin matrix factorization for collaborative prediction
- Rennie, Srebro
- 2005
Citation Context: ...tor variables U and V usually becomes a non-convex problem (consider for example U, V ∈ R^{1×1} together with the identity function f(x) = x). Therefore many of the popular methods such as for example (Rennie & Srebro, 2005; Lin, 2007) can get stuck in local minima and so are neither theoretically nor practically well justified; see also (DeCoste, 2006). These shortcomings can be overcome as follows: One can equivalently...

196 | Weighted low-rank approximations
- Srebro, Jaakkola
- 2003
Citation Context: ...& Srebro, 2005) report an optimization time of about 5 hours on the 1M dataset, but use the different smoothed hinge loss function so that the results cannot be directly compared. (Ma et al., 2009), (Srebro & Jaakkola, 2003) and (Ji & Ye, 2009) only obtained results on much smaller datasets. [Figure residue omitted: RMSE axis values on MovieLens 10M, comparing the 1/k, best-on-line-segment and gradient-interpolation step-size variants on train and test.]

183 | An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems
- Toh, Yun
- 2009
Citation Context: ...is pursuit de-noising problem in compressed sensing literature. The analogous vector variant of (2) is the Lasso problem (Tibshirani, 1996), which is min_{x∈R^n} { (1/2)||Ax − b||^2 : ||x||_1 ≤ t }. Recently (Toh & Yun, 2009; Liu et al., 2009) and (Ji & Ye, 2009) independently proposed algorithms that obtain an ε-accurate solution to (1) in O(1/√ε) steps, by improving the algorithm of (Cai et al., 2008). More recently...

114 | Improving regularized singular value decomposition for collaborative filtering
- Paterek
- 2007
Citation Context: ...pproximate SVD methods, which were used as a building block by most of the teams participating in the Netflix competition (including the winning team). Those methods have been further investigated by (Paterek, 2007; Takács et al., 2009) and also (Kurucz et al., 2007), which already proposed a heuristic using the HITS formulation. These approaches are algorithmically extremely similar to our method, although the...

111 | An accelerated gradient method for trace norm minimization
- Ji, Ye
- 2009
Citation Context: ...sed sensing literature. The analogous vector variant of (2) is the Lasso problem (Tibshirani, 1996), which is min_{x∈R^n} { (1/2)||Ax − b||^2 : ||x||_1 ≤ t }. Recently (Toh & Yun, 2009; Liu et al., 2009) and (Ji & Ye, 2009) independently proposed algorithms that obtain an ε-accurate solution to (1) in O(1/√ε) steps, by improving the algorithm of (Cai et al., 2008). More recently also (Mazumder et al., 2009) and (Ma e...

106 | A survey on PageRank computing
- Berkhin
- 2005
Citation Context: ...f the Eigenvalue Problem. For the actual computation of the approximate largest eigenvector in ApproxEV(−∇f̂(Z^(k)), C_f̂/k^2), either Lanczos' method or the power method (as in PageRank, see e.g. (Berkhin, 2005)) can be used. Both methods are known to scale well to very large problems and can be parallelized easily, as each iteration consists of just one matrix-vector multiplication. However, we have to be...
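This context names the power method as one way to realize the ApproxEV subroutine. A minimal sketch (the function name and toy matrix are my own; as the context warns, some care is needed in practice, e.g. the plain iteration tracks the eigenvalue of largest magnitude and needs a spectral gap to converge):

```python
import numpy as np

def power_method(S, iters=200, seed=0):
    """Plain power iteration for the dominant eigenvector of a symmetric
    matrix S. Each iteration is one matrix-vector product, which is why
    this scales to very large (sparse) problems."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S.shape[0])
    for _ in range(iters):
        v = S @ v
        v /= np.linalg.norm(v)  # renormalize to avoid overflow
    return v, v @ S @ v         # eigenvector estimate, Rayleigh quotient

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v, lam = power_method(S)
assert np.isclose(lam, np.linalg.eigvalsh(S).max(), atol=1e-8)
```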

103 | Spectral regularization algorithms for learning large incomplete matrices
- Mazumder, Hastie, et al.
Citation Context: ...et al., 2009) and (Ji & Ye, 2009) independently proposed algorithms that obtain an ε-accurate solution to (1) in O(1/√ε) steps, by improving the algorithm of (Cai et al., 2008). More recently also (Mazumder et al., 2009) and (Ma et al., 2009) proposed algorithms in this line of so-called singular value thresholding methods, but these cannot guarantee a convergence speed. Each step of all those algorithms requires the comp...

84 | Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm
- Clarkson
- 2008
Citation Context: ...(f(Z′) − f(Z) − ⟨Z′ − Z, ∇f(Z)⟩)/α^2, which turns out to be small for many applications. The algorithm can be seen as a matrix generalization of the sparse greedy approximation algorithm of (Clarkson, 2008) for vectors in the unit simplex, called the coreset method, which has seen many successful applications in a variety of areas ranging from clustering to support vector machine training, smallest enc...
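Clarkson's sparse greedy (Frank-Wolfe) scheme over the unit simplex, which this context refers to, can be sketched in a few lines. This toy version is my own illustration (with the standard 2/(k+2) step size, not the cited paper's exact algorithm): each step moves toward a single simplex vertex, so iterates stay sparse.

```python
import numpy as np

def frank_wolfe_simplex(grad, n, steps=500):
    """Sparse greedy (Frank-Wolfe) minimization over the unit simplex.
    Linear minimization over the simplex is just picking the vertex e_i
    with the smallest gradient coordinate, so each iterate is a convex
    combination of the starting vertex and the vertices picked so far."""
    x = np.zeros(n)
    x[0] = 1.0                       # start at a vertex
    for k in range(steps):
        g = grad(x)
        i = int(np.argmin(g))        # best vertex e_i
        alpha = 2.0 / (k + 2)        # standard step size
        x = (1 - alpha) * x
        x[i] += alpha
    return x

# Toy objective: f(x) = ||x - c||^2 with c inside the simplex, so the
# optimum over the simplex is c and the optimal value is 0.
c = np.array([0.2, 0.3, 0.5])
x = frank_wolfe_simplex(lambda x: 2 * (x - c), 3)

assert np.isclose(x.sum(), 1.0) and x.min() >= -1e-12  # stays feasible
assert np.sum((x - c) ** 2) < 0.05  # objective gap is O(1/k), small here
```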

74 | Non-linear matrix factorization with Gaussian processes
- Lawrence, Urtasun
- 2009
Citation Context: ...48333. We obtained a test RMSE of 0.8617 after k = 400 steps, in a total running time of 52 minutes (16291 matrix multiplications). Our best test RMSE value was 0.8573, compared to 0.8543 obtained by (Lawrence & Urtasun, 2009) using their non-linear improvement of MMMF. Algorithm Variants. Comparing the proposed algorithm variants from Section 4.3, Figure 2 demonstrates moderate improvements compared to our original Algor...

61 | Scalable collaborative filtering approaches for large recommender systems
- Takács, Pilászy, et al.
- 2009

41 | Fast algorithms for approximate semidefinite programming using the multiplicative weights update method
- Arora, Hazan, et al.
- 2005
Citation Context: ...Hazan, 2008) made use of the fact that Lanczos' method, which is theoretically better understood, provably obtains the required approximation quality in a bounded number of steps if the matrix is PSD (Arora et al., 2005). For an arbitrary loss function f, the gradient −∇f̂(Z), which is the matrix whose largest eigenvector we have to compute in the algorithm, is always a symmetric matrix of the block form [0, G; G^T, 0]...

41 | Collaborative filtering in a non-uniform world: Learning with the weighted trace norm
- Salakhutdinov, Srebro
- 2010
Citation Context: ...proaches are algorithmically extremely similar to our method, although they are aimed at a slightly different optimization problem, and do not directly guarantee bounded nuclear norm. Very recently, (Salakhutdinov & Srebro, 2010) observed that Funk's algorithm can be seen as stochastic gradient descent to optimize (1) when the regularization term is replaced by a weighted variant of the nuclear norm. Simon Funk's method cons...

40 | An implementable proximal point algorithmic framework for nuclear norm minimization
- Liu, Sun, et al.
- 2012

39 | Sparse approximate solutions to semidefinite programs
- Hazan
- 2008
Citation Context: ...regularization, such as e.g. low norm matrix factorizations, have seen many applications recently. We propose a new approximation algorithm building upon the recent sparse approximate SDP solver of (Hazan, 2008). The experimental efficiency of our method is demonstrated on large matrix completion problems such as the Netflix dataset. The algorithm comes with strong convergence guarantees, and can be interpr...

36 | Methods for large scale SVD with missing values
- Kurucz, Benczur, et al.
- 2007
Citation Context: ...uilding block by most of the teams participating in the Netflix competition (including the winning team). Those methods have been further investigated by (Paterek, 2007; Takács et al., 2009) and also (Kurucz et al., 2007), which already proposed a heuristic using the HITS formulation. These approaches are algorithmically extremely similar to our method, although they are aimed at a slightly different optimization pro...

35 | Collaborative prediction using ensembles of maximum margin matrix factorizations
- DeCoste
- 2006
Citation Context: ...Therefore many of the popular methods such as for example (Rennie & Srebro, 2005; Lin, 2007) can get stuck in local minima and so are neither theoretically nor practically well justified; see also (DeCoste, 2006). These shortcomings can be overcome as follows: One can equivalently transform any low-norm matrix factorization problem (which is usually not convex in its two factor variables) into an optimizatio...

30 | PSVM: Parallelizing support vector machines on distributed computers
- Chang, Zhu, et al.
- 2007
Citation Context: ...the coreset setting. Also, it remains to investigate if our algorithm can be applied to other matrix factorization problems such as (potentially only partially observed) kernel matrices as e.g. PSVM (Chang et al., 2007), PCA or [p]LSA, because our method could exploit the even simpler form of ∇f for symmetric matrices.

1 | Netflix update: Try this at home (Simon Funk's personal blog)
- Webb
- 2006