Results 1 – 7 of 7
Learning from multiple sources
 In Advances in Neural Information Processing Systems 19, 2007
Cited by 53 (4 self)
We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.
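The sample-selection question this abstract raises can be made concrete with a toy sketch: given per-source sample counts and pairwise dissimilarity estimates, greedily pool the nearest sources while a simple bias-plus-variance surrogate keeps improving. The objective below (`bias + c / total_samples`) is a hypothetical stand-in for the paper's bounds, not its actual theory.

```python
import numpy as np

def pool_sources(n, D, c=1.0):
    """For each target source t, greedily pool the nearest sources while a
    surrogate score (accumulated dissimilarity, standing in for bias, plus
    c / total samples, standing in for variance) keeps decreasing.
    Hypothetical objective, for illustration only."""
    k = len(n)
    chosen = []
    for t in range(k):
        order = np.argsort(D[t])              # nearest sources first
        pooled, best, total, bias = [], np.inf, 0, 0.0
        for s in order:
            total += n[s]
            bias += D[t][s]
            score = bias + c / total          # bias + variance surrogate
            if score < best:
                best = score
                pooled.append(int(s))
            else:
                break
        chosen.append(pooled)
    return chosen
```

With two nearby sources and one distant one, the distant source's data is excluded from the nearby sources' pools, matching the intuition that only "nearby" data is worth borrowing.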
Transfer bounds for linear feature learning
 In Machine Learning
Cited by 9 (2 self)
If regression tasks are sampled from a distribution, then the expected error for a future task can be estimated by the average empirical errors on the data of a finite sample of tasks, uniformly over a class of regularizing or preprocessing transformations. The bound is dimension-free, justifies optimization of the preprocessing feature map, and explains the circumstances under which learning-to-learn is preferable to single-task learning.
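One crude way to realize "optimizing a preprocessing feature map over a sample of tasks" is to fit each observed task separately and extract the dominant shared directions as the linear feature map for future tasks. The function below is an illustrative sketch of that idea, not the paper's construction.

```python
import numpy as np

def shared_feature_map(tasks, k, lam=0.1):
    """Learn a k-dimensional linear preprocessing shared by all tasks:
    stack the per-task ridge solutions as columns and take the top-k
    principal directions as the feature map. Rough sketch of learning
    a preprocessing transformation from a sample of tasks."""
    W = np.column_stack([
        np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        for X, y in tasks
    ])
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k].T                    # maps x -> B @ x, B is k x d
```

If the tasks' true weight vectors all lie in a low-dimensional subspace, the learned map recovers that subspace, so each new task only needs to fit k coefficients instead of d.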
Stability of Multi-Task Kernel Regression Algorithms
, 2013
Cited by 2 (2 self)
We study the stability properties of nonlinear multi-task regression in reproducing kernel Hilbert spaces with operator-valued kernels. Such kernels, a.k.a. multi-task kernels, are appropriate for learning problems with non-scalar outputs such as multi-task learning and structured output prediction. We show that multi-task kernel regression algorithms are uniformly stable in the general case of infinite-dimensional output spaces. We then derive, under a mild assumption on the kernel, generalization bounds for such algorithms, and we show their consistency even with non-Hilbert-Schmidt operator-valued kernels. We demonstrate how to apply the results to various multi-task kernel regression methods such as vector-valued SVR and functional ridge regression.
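The simplest concrete instance of multi-task kernel regression uses a separable operator-valued kernel k(x, x')·A, where the matrix A couples the T tasks. A minimal closed-form sketch, assuming a Gaussian scalar kernel and squared loss (illustrative, not the paper's algorithms):

```python
import numpy as np

def multitask_krr(X, Y, A, lam=0.1, gamma=1.0):
    """Multi-task kernel ridge regression with the separable
    operator-valued kernel k(x, x') * A. The Kronecker product of the
    scalar Gram matrix with the task-coupling matrix A gives the
    (n*T x n*T) Gram matrix of the vector-valued problem."""
    n, T = Y.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                      # scalar Gaussian kernel
    G = np.kron(K, A)                            # operator-valued Gram
    alpha = np.linalg.solve(G + lam * np.eye(n * T), Y.reshape(-1))

    def predict(Xs):
        sq_s = ((Xs[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        Ks = np.exp(-gamma * sq_s)
        return (np.kron(Ks, A) @ alpha).reshape(len(Xs), T)

    return predict
```

With A = I the tasks decouple into independent scalar regressions; an A with off-diagonal mass shares information across tasks.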
Efficient Representations for Lifelong Learning and Autoencoding
 In JMLR: Workshop and Conference Proceedings 40:1–20
, 2015
Cited by 2 (0 self)
It has been a long-standing goal in machine learning, as well as in AI more generally, to develop lifelong learning systems that learn many different tasks over time, and reuse insights from tasks learned, “learning to learn” as they do so. In this work we pose and provide efficient algorithms for several natural theoretical formulations of this goal. Specifically, we consider the problem of learning many different target functions over time, that share certain commonalities that are initially unknown to the learning algorithm. Our aim is to learn new internal representations as the algorithm learns new target functions, that capture this commonality and allow subsequent learning tasks to be solved more efficiently and from less data. We develop efficient algorithms for two very different kinds of commonalities that target functions might share: one based on learning common low-dimensional subspaces and unions of low-dimensional subspaces, and one based on learning nonlinear Boolean combinations of features. Our algorithms for learning Boolean feature combinations additionally have a dual interpretation, and can be viewed as giving an efficient procedure for constructing near-optimal sparse Boolean autoencoders under a natural “anchor set” assumption.
Multi-Objective Multi-Task Learning
, 2007
This dissertation presents multi-objective multi-task learning, a new learning framework. Given a fixed sequence of tasks, the learned hypothesis space must minimize multiple objectives. Since these objectives are often in conflict, we cannot find a single best solution, so we analyze a set of solutions. We first propose and analyze a new learning principle, empirically efficient learning. From a sample complexity perspective, following this principle is not much worse than the single-objective multi-task learning case. In the context of empirically efficient learning, algorithms for the new learning frameworks are proposed and evaluated. First, we pose regularization as a multi-objective problem, in which training error must balance the complexity of the hypothesis space. Second, we consider multiple data-dependent loss functions, in which the error rate in one class must balance the error rate in the other class. Finally, we assume that tasks share a clustering structure in which the average loss in one cluster must balance the loss in another cluster. The algorithms are evaluated on synthetic and real datasets. The results motivate the application of multi-objective optimization, indicating that the objectives are in conflict. By controlling the relative performance of the algorithms to generate a trade-off surface, we can effectively explore the multi-objective nature of the learning problem.
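The trade-off surface between training error and hypothesis complexity can be traced concretely by sweeping the relative weight between the two objectives. The ridge-regression sketch below illustrates only the general multi-objective view, not the dissertation's algorithms.

```python
import numpy as np

def tradeoff_surface(X, y, weights):
    """Sweep the relative weight w between training error and hypothesis
    complexity (a ridge penalty) to trace a trade-off front of
    (error, complexity) pairs. Illustrative sketch."""
    front = []
    for w in weights:                    # w in (0, 1): weight on error
        lam = (1 - w) / w                # equivalent ridge strength
        beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                               X.T @ y)
        err = float(((X @ beta - y) ** 2).mean())   # objective 1
        comp = float((beta ** 2).sum())             # objective 2
        front.append((err, comp))
    return front
```

As the weight on error grows, training error falls while complexity rises, which is exactly the conflict that makes a set of solutions (rather than a single optimum) the right object to study.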
Presented to the Faculties of the University of Pennsylvania
, 2009
First and foremost, to my advisor, Michael Kearns. At the risk of sounding cliché, Michael has been a near-ideal advisor to me. He helped me cultivate a taste in research problems, taught me how to recognize and design good models and algorithms, and was a constant source of invaluable advice on navigating the academic world. Perhaps most importantly, he helped me to develop confidence in myself as a researcher — no easy feat, I’m sure. I feel truly lucky to have had the chance to learn so much from him at the start of my career, and I sincerely hope that we will remain both collaborators and friends for many years to come. To my thesis committee, Sanjeev Khanna, Yishay Mansour, Fernando Pereira, and Ben Taskar, for their feedback and advice, and for being so supportive of me and of this work. To Kevin Leyton-Brown, Eugene Nudelman, Yoav Shoham, and the rest of the Multiagent Group at Stanford circa 2003, for giving me my first opportunity to get involved in research. It was because of Kevin’s encouragement and contagious enthusiasm for research that I decided to go on for the Ph.D., so the credit (or blame) for me being here in the first place should go to him. To Eyal Even-Dar and (again) Yishay Mansour, for acting as informal mentors and always giving me valuable career advice, whether I wanted to hear it or not. To my other collaborators, colleagues, and teachers, from whom I have learned so much.
Lifelong Learning with Non-i.i.d. Tasks
In this work we aim at extending the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that learning tasks are sampled i.i.d. from a task environment, or is limited to strongly constrained data distributions. Instead, we study two scenarios in which lifelong learning is possible even though the observed tasks do not form an i.i.d. sample: first, when they are sampled from the same environment, but possibly with dependencies, and second, when the task environment is allowed to change over time in a consistent way. In the first case we prove a PAC-Bayesian theorem that can be seen as a direct generalization of the analogous previous result for the i.i.d. case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.
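The idea of comparing transfer procedures on a sequence of related tasks can be illustrated with a toy experiment: shrink each task's ridge solution toward the previous task's weights (a simple transfer procedure) versus toward zero (no transfer), and compare average held-out error. This is a hypothetical stand-in for the bound-based selection in the paper.

```python
import numpy as np

def biased_ridge(X, y, w_prev, lam):
    """Ridge regression shrunk toward w_prev instead of toward zero:
    minimizes ||Xw - y||^2 + lam * ||w - w_prev||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d),
                           X.T @ y + lam * w_prev)

def avg_error(tasks, lam, transfer=True):
    """Run the transfer procedure over a task sequence, training on the
    first half of each task and scoring on the second half. Returns the
    average held-out squared error. Toy comparison only."""
    w = np.zeros(tasks[0][0].shape[1])
    errs = []
    for X, y in tasks:
        half = len(y) // 2
        prior = w if transfer else np.zeros_like(w)
        w = biased_ridge(X[:half], y[:half], prior, lam)
        errs.append(float(((X[half:] @ w - y[half:]) ** 2).mean()))
    return sum(errs) / len(errs)
```

When consecutive tasks are similar, shrinking toward the previous solution accumulates information across the sequence and beats restarting from zero, which is the kind of benefit a transfer procedure is meant to capture.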