## Bounds for linear multi-task learning (2006)

Venue: Journal of Machine Learning Research

Citations: 39 (9 self)

### Citations

443 | A framework for learning predictive structures from multiple tasks and unlabeled data. - Ando, Zhang - 2005

Citation Context: ...ett (1999), Bartlett and Mendelson (2002), Bartlett et al. (2005) and Ando and Zhang (2005), and we make no claim to originality for any of it. A preliminary result is Theorem 16: Let $\mathcal{F}$ be a $[0,1]^m$-valued function class on a space $\mathcal{X}$, and $\mathbf{X} = (X_i^l)_{(l,i)=(1,1)}^{(m,n)}$ a vector of $\mathcal{X}$-valued independent random variables, where for fixed $l$ and varying $i$ all the $X_i^l$ are identically distribute...

416 | Neural Network Learning - Theoretical Foundations - Anthony, Bartlett - 1999

Citation Context: ...r and $\hat{er}(f)$ the empirical error on a training sample $S$ of size $n$ (drawn i.i.d. from the underlying task distribution), respectively. Combining Hoeffding's inequality with a union bound one shows (see e.g. [1]) that with probability greater than $1-\delta$ we have for every $f \in \mathcal{F}$ the error bound

$$er(f) \le \hat{er}(f) + \sqrt{\frac{\ln|\mathcal{F}| + \ln(1/\delta)}{2n}}. \quad (1)$$

Suppose now that there are a set $\mathcal{Y}$, a rather large set $\mathcal{G}$ of preprocessors $g$ ...
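Bound (1) is easy to evaluate numerically. A minimal sketch (function and variable names are mine, not from the paper):

```python
import math

def hoeffding_bound(emp_error, class_size, n, delta):
    """Bound (1): empirical error plus sqrt((ln|F| + ln(1/delta)) / (2n))."""
    slack = math.sqrt((math.log(class_size) + math.log(1.0 / delta)) / (2 * n))
    return emp_error + slack

# With |F| = 1000 hypotheses, n = 500 samples and confidence 1 - delta = 0.95,
# an empirical error of 0.10 certifies a true error of at most about 0.20.
print(hoeffding_bound(0.10, 1000, 500, 0.05))
```

Note how the class size enters only logarithmically, while the sample size enters as $1/\sqrt{n}$.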

395 | Rademacher and Gaussian complexities: risk bounds and structural results - Bartlett, Mendelson

Citation Context: ...inequality and a union bound imply that with probability greater than $1-\delta$ we have for all $(h_1,\dots,h_m) \in \mathcal{H}^m$ and every $g \in \mathcal{G}$

$$\frac{1}{m}\sum_{l=1}^{m} er_l(h_l \circ g) \le \frac{1}{m}\sum_{l=1}^{m} \hat{er}_l(h_l \circ g) + \frac{1}{\sqrt{2n}}\sqrt{\ln|\mathcal{H}| + \frac{\ln|\mathcal{G}| + \ln(1/\delta)}{m}}. \quad (2)$$

Here $er_l(f)$ and $\hat{er}_l(f)$ denote the expected error in task $l$ and the empirical error on training sample $S_l$, respectively. The left hand side above is an average of the expected errors, so that th...
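The structure of bound (2) shows why sharing the preprocessor $g$ helps: the $\ln|\mathcal{G}|$ and $\ln(1/\delta)$ costs are divided by the number of tasks $m$, while $\ln|\mathcal{H}|$ is paid per task. A small numeric sketch of this amortization (names and parameter values are illustrative):

```python
import math

def multitask_bound_slack(H_size, G_size, n, m, delta):
    """Deviation term of bound (2): the ln|G| and ln(1/delta) costs are
    shared across the m tasks, while ln|H| is paid for each task."""
    return math.sqrt((math.log(H_size)
                      + (math.log(G_size) + math.log(1.0 / delta)) / m)
                     / (2 * n))

# The shared-preprocessor cost ln|G| is amortized: as m grows, the slack
# approaches the cost of learning a single h in H alone.
for m in (1, 10, 100):
    print(m, multitask_bound_slack(H_size=100, G_size=10**6, n=500, m=m, delta=0.05))
```

Even with a huge preprocessor class ($|\mathcal{G}| = 10^6$), a hundred tasks reduce the slack nearly to the single-task, known-preprocessor level.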

277 | Regularized multi-task learning - Evgeniou, Pontil - 2004

193 | A model of inductive bias learning. - Baxter - 2000

Citation Context: ...g, has been tested in practice with good results under a variety of different circumstances (see [5], [8], [17], [18]). The technique has been analyzed theoretically and in some generality (see Baxter [6] and Zhang [18]). The purpose of this paper is to improve some of these theoretical results in a special case of practical importance, when input data is represented in a linear, potentially infinite di...

158 | Empirical margin distributions and bounding the generalization error of combined classifiers. - Koltchinskii, Panchenko - 2002

Citation Context: ...tems and a general PAC bound in terms of Rademacher complexities. For the reader's benefit a proof of this bound is given in an appendix, where we follow the path prepared by Koltchinskii and Panchenko ([11]) and Bartlett and Mendelson ([2]). In section 5 we study the Rademacher complexities of linear multi-task systems. In section 6 we give bounds for non-interacting systems, which are essentially equi...
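The empirical Rademacher complexity that drives such PAC bounds can be estimated by Monte Carlo for a finite function class; a minimal sketch, assuming each function is given by its value vector on the sample (all names are mine):

```python
import random

def empirical_rademacher(function_values, n_rounds=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma sup_f (1/n) sum_i sigma_i f(x_i), where function_values[k]
    is the vector (f_k(x_1), ..., f_k(x_n)) for each f_k in the class."""
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(n_rounds):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * v for s, v in zip(sigma, fv)) / n
                     for fv in function_values)
    return total / n_rounds

# Two toy "functions" evaluated on a sample of size 8.
fvals = [[0, 1, 0, 1, 1, 0, 1, 0], [1, 1, 1, 0, 0, 0, 0, 1]]
print(empirical_rademacher(fvals))
```

A single constant function has complexity zero, while richer classes (more sign patterns realizable on the sample) push the estimate up.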

114 | Exploiting task relatedness for multiple task learning. - Ben-David, Schuller - 2003

Citation Context: ...g sample $S_l$ respectively. The left hand side above is an average of the expected errors, so that the guarantee implied by the bound is a little weaker than the usual PAC guarantees (but see Ben-David [7] for bounds on the individual errors). The first term on the right is the average empirical error, which a multi-task learning algorithm seeks to minimize. We can take it as an operational definition of...

106 | Support Vector Machines - Cristianini, Shawe-Taylor - 2000

22 | Gaussian measures on function spaces - Baxendale - 1976

Citation Context: ...v for both $y$ and $z$ in (ii) shows that the corresponding eigenvalue must be nonnegative, so $E[Q_X] \in HS^{+}$. Property (ii) above is sometimes taken as the defining property of the covariance operator (see [4]). If $X$ is distributed uniformly on $M \cap S_1$, where $M$ is a $k$-dimensional subspace and $S_1$ the unit sphere in $H$, then $E[\langle X, y\rangle^2] = \langle E[Q_X]\,y, y\rangle$ is zero if and only if $y \in M^{\perp}$, so the range of $E[Q_X]$ is $M$,...
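The stated fact — that the range of $E[Q_X]$ is $M$ when $X$ is uniform on $M \cap S_1$ — can be checked numerically in finite dimension; a sketch assuming $M$ is spanned by the first coordinate axes (names and sizes are illustrative):

```python
import random

def covariance_estimate(dim, subspace_dim, n_samples=20000, seed=0):
    """Estimate E[Q_X] = E[x x^T] for X uniform on the unit sphere of the
    subspace M spanned by the first subspace_dim coordinate axes."""
    rng = random.Random(seed)
    cov = [[0.0] * dim for _ in range(dim)]
    for _ in range(n_samples):
        # Normalized Gaussian in M gives the uniform distribution on M ∩ S1.
        x = [rng.gauss(0, 1) for _ in range(subspace_dim)] \
            + [0.0] * (dim - subspace_dim)
        norm = sum(v * v for v in x) ** 0.5
        x = [v / norm for v in x]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += x[i] * x[j] / n_samples
    return cov

cov = covariance_estimate(dim=4, subspace_dim=2)
# The quadratic form <E[Q_X] y, y> is ~1/2 on the first two axes (inside M)
# and exactly 0 on the last two (orthogonal to M), so the range is M.
print([cov[i][i] for i in range(4)])
```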

17 | Lifelong learning algorithms. Learning to learn. - Thrun - 1997

Citation Context: ...neous learning of different tasks under some common constraint, often called multi-task learning, has been tested in practice with good results under a variety of different circumstances (see [5], [8], [17], [18]). The technique has been analyzed theoretically and in some generality (see Baxter [6] and Zhang [18]). The purpose of this paper is to improve some of these theoretical results in a special cas...

8 | Functional Analysis, part I - Reed, Simon - 1980

Citation Context: ...$\sum_i \langle S e_i, T e_i \rangle$ defines an inner product on $HS$, making it into a Hilbert space. We denote the corresponding norm with $\|\cdot\|_{HS}$, in contrast to the usual operator norm $\|\cdot\|_{\infty}$ (see Reed and Simon [16] for background on functional analysis). We use $HS$ to denote the set of symmetric Hilbert-Schmidt operators. For every member of $HS$ there is a complete orthonormal basis of eigenvectors, and for $T \in H$...
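In finite dimensions the Hilbert-Schmidt inner product $\sum_i \langle S e_i, T e_i\rangle$ reduces to the Frobenius inner product $\sum_{ij} S_{ij} T_{ij}$; a minimal check on toy matrices of my own (not from the paper):

```python
def hs_inner(S, T):
    """<S, T>_HS = sum_i <S e_i, T e_i>, computed basis vector by basis vector."""
    n = len(S)
    total = 0.0
    for i in range(n):
        Se_i = [row[i] for row in S]   # image of the basis vector e_i under S
        Te_i = [row[i] for row in T]   # image of e_i under T
        total += sum(a * b for a, b in zip(Se_i, Te_i))
    return total

def hs_norm(T):
    """||T||_HS = sqrt(<T, T>_HS); always at least the operator norm."""
    return hs_inner(T, T) ** 0.5

S = [[1.0, 2.0], [0.0, 1.0]]
T = [[3.0, 0.0], [1.0, 2.0]]
# Agrees entrywise with the Frobenius inner product sum_ij S_ij * T_ij.
print(hs_inner(S, T), hs_norm(S))
```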

8 | Regularized multi-task learning - Evgeniou, Pontil - 2004

Citation Context: ... Also, for $l \neq r$, we get, using Jensen's inequality and independence of the Rademacher variables,

$$\left(E\left[\,\left|\langle w_l, w_r\rangle\right|\,\right]\right)^2 \le E\left[\langle w_l, w_r\rangle^2\right] \quad (11)$$

...

8 | Kernel PCA and De-noising (with M. Scholz and G. Rätsch) - Mika, Smola - 1998

Citation Context: ... Taking the expectation of (12), using Jensen's inequality, $\|X^l\| \le 1$ and independence of $X^l$ and $X^r$ for $l \neq r$, and Jensen's inequality again, we get $E\big[\hat{R}_n^m(\mathcal{F}_{BA})(\mathbf{X})\big] \le \frac{2B\|A\|_{HS}}{nm}\,\cdots$

7 | Theoretical Models of Learning to Learn, in Learning to Learn, S. Thrun, L. Pratt, Eds. - Baxter - 1998

Citation Context: ...on Simultaneous learning of different tasks under some common constraint, often called multi-task learning, has been tested in practice with good results under a variety of different circumstances (see [5], [8], [17], [18]). The technique has been analyzed theoretically and in some generality (see Baxter [6] and Zhang [18]). The purpose of this paper is to improve some of these theoretical results in a ...

5 | Estimating the moments of a random vector - Shawe-Taylor, Cristianini - 2003

Citation Context: ...able with values in $HS$ and $E[\|T\|_{HS}] < \infty$. Also any result valid in $H$ has a corresponding analogue valid in $HS$. We quote a corresponding operator-version of a theorem of Cristianini and Shawe-Taylor [15] on the concentration of independent vector-valued random variables. Theorem 3: Suppose that $T_1, \dots, T_m$ are independent random variables in $H$ with $\|T_i\| \le 1$. Then for all $\delta > 0$, with probability greater tha...
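Theorem 3 is a concentration statement for empirical means of bounded vector-valued variables; its $1/\sqrt{m}$ flavor can be illustrated with a quick simulation (the dimension, trial counts, and distribution are arbitrary choices of mine, not from the theorem):

```python
import random

def mean_deviation(m, dim=10, trials=200, seed=0):
    """Average ||(1/m) sum_i T_i - E[T]|| over repeated draws of m i.i.d.
    vectors T_i uniform on the unit sphere (so ||T_i|| <= 1 and E[T] = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        mean = [0.0] * dim
        for _ in range(m):
            x = [rng.gauss(0, 1) for _ in range(dim)]
            norm = sum(v * v for v in x) ** 0.5
            x = [v / norm for v in x]      # uniform on the unit sphere
            for k in range(dim):
                mean[k] += x[k] / m
        total += sum(v * v for v in mean) ** 0.5
    return total / trials

# Quadrupling the sample size roughly halves the deviation, as the
# 1/sqrt(m) concentration rate predicts.
print(mean_deviation(25), mean_deviation(100))
```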

5 | Local Rademacher complexities. Available online: http://www.stat.berkeley.edu/~bartlett/papers/bbm-lrc-02b.pdf - Bartlett, Bousquet, Mendelson

Citation Context: ... scaling with the dimension $k$ and behave poorly on this example. Appendix: In this section we give a proof of Theorem 4 for the reader's convenience. Most of this material is combined from [1], [2], [3] and [18], and we make no claim to originality for any of it. A preliminary result is Theorem 8: Let $\mathcal{F}$ be a $[0,1]^m$-valued function class on a space $\mathcal{X}$, and $\mathbf{X} = (X_i^l)^{(m,n)}$ a vector of $\mathcal{X}$-valued inde...

4 | Learning, in Learning to Learn, S. Thrun, L. Pratt, Eds. - Caruana - 1998

Citation Context: ...multaneous learning of different tasks under some common constraint, often called multi-task learning, has been tested in practice with good results under a variety of different circumstances (see [5], [8], [17], [18]). The technique has been analyzed theoretically and in some generality (see Baxter [6] and Zhang [18]). The purpose of this paper is to improve some of these theoretical results in a speci...

3 | Kernel PCA and de-noising in feature spaces - Mika, Schölkopf, Scholz, Rätsch, et al.

1 | Local Rademacher complexities. Available online: http://www.stat.berkeley.edu/~bartlett/papers/bbm-lrc-02b.pdf - Bartlett, Bousquet, et al.

1 | Kernels for multi-task learning. Available online - Micchelli, Pontil - 2005

Citation Context: ... Inserting this together with (11) in (13) gives the bound on $\hat{R}_n^m(\mathcal{F}_{BA})(\mathbf{x})$ which, together with $\|x_i^l\| \le 1$, gives (6)...