Caruana, R. (1997). Multitask learning. Machine Learning, 28(1):41--75.
....these observations to machine learning: what kind of advantage is there in setting a learner to work on several tasks sequentially or simultaneously? Intuitively, there should certainly be some advantage, especially if the tasks are closely related in some way. And, indeed, much experimental work [Bax95, IE96, Thr96, Hes98, Car97] has validated this intuition. However, thus far, there has been relatively little progress on theoretical justification for these results. Relatedness of tasks is key to the multitask learning (MTL) approach. Obviously, one cannot expect that information gathered through the learning of a set of ....
Rich Caruana. Multitask Learning. Machine Learning, 28(1):41-75, 1997.
....field centers x_i were evenly distributed with a spacing of 3 cm, and we set # = 1.8 cm. These parameters were determined empirically. With these, M = 193 receptive fields served to cover the training region. The output units had linear activation functions, while to acceler[...]multitask training [6], we found no decrease in localization error. Typically, an accurate estimate of the location is much more important than one of the moment's direction or strength. We also found that networks trained without the moment were more robust. This can easily occur in practice, for instance when the head ....
Richard Caruana. Multitask learning. Machine Learning, 28(1): 41--75, July 1997.
....Similarity has been studied extensively in areas such as pattern recognition and adaptive nearest neighbors [10, 6, 1], character recognition, text-based information retrieval [4, 7], and case-based reasoning [5]. Closely related to models of similarity is the problem of multitask learning [3], where we attempt to use information learned from one task to improve performance on a related task. In this talk, I explore the relationship between Bayes optimal classification, similarity measures, and multitask learning. The key to making the connection is the conditional distribution P ....
Rich Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
....is typically smaller, and dependencies between different classes w.r.t. membership can be taken into account and may even be made explicit. Advantages of learning a single model for multiple related prediction tasks have been reported several times in the literature (see e.g. [5] for decision trees, [7, 1] for neural networks, [24] for text classification). While our setting is one of assigning a set of classes to an instance, we look in more detail at the case where the classes are ordered in a hierarchy. This hierarchy concisely conveys relevant information about the similarity and differences ....
R. Caruana. Multitask learning. Machine Learning, 28:41-75, 1997.
....we feel that our work sheds some light on what it means for tasks to be related, a concept whose formal definition has been sufficiently elusive as to all but halt progress on the theoretical side of multitask learning (in spite of the plethora of promising experimental work in this area, e.g. [3, 13]). Perhaps our concrete concept of relatedness and our successful analysis of multitask learning within this framework will inspire further formal development in this area. 6. CONCLUSIONS. In this work, we have attempted to provide a formalism to support the application of the mathematical machinery of ....
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
....use of information from past tasks to manipulate I independently for each state-action pair. Task transfer is the process whereby one task, known as the source task, is used to bias the learning of another task, known as the target task. Task transfer is important in the lifelong learning [7][3] approach to AI. In the lifelong learning approach, one agent encounters a variety of tasks over its lifetime. The agent applies information from each task to speed the learning of each subsequent task. This approach can potentially reduce the training time and adaptability ....
Rich Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
....of a feedforward neural network with part of the weights shared and others specific to each task. When the network is trained on all tasks, the risk of overfitting the shared part is reduced and a common set of features can be obtained. This idea has been studied and tested on practical problems in e.g. [13] and [14]. [Plot: two panels of test error versus #inputs removed, 0 to 30] Fig. 5. The test error (minus the log likelihood of the data in the test set under the ....
R. Caruana, Multitask learning, Machine Learning 28 (1997) 41-75.
....must be transferred across domains or tasks. The process is known as inductive transfer (Pratt and Thrun, 1997). An interesting study in inductive transfer falls in the realm of neural networks. A review of how neural networks can learn from related tasks is provided by Pratt and Jennings (1998). Caruana (1997) shows why multitask learning works well in the context of neural networks using backpropagation. The claim is that training with many domains in parallel on a single neural network induces information that accumulates in the training signals; a new domain can then benefit from such past ....
Caruana, Rich (1997). Multitask Learning. Second Special Issue on Inductive Transfer. Machine Learning, 28, 41-75.
....detection domain does not possess well-defined tasks that can be learned separately, and data re-representation through a neural network transform is inappropriate because of the non-metric nature of the feature space. A related problem was examined by Caruana in his multitask learning work [71, 72, 73]. In this case, the learning system is presented with a goal task on which actual performance is to be measured and a number of known related tasks for which training data is available. In this case, however, the tasks are not presented sequentially but simultaneously. Caruana employs a multiple ....
R. Caruana. Multitask learning. Machine Learning, 28(1):41-75, Jul 1997.
....nature. Now that time-invariant learning tasks are becoming well understood, there is growing research interest directed to dynamic and adaptable systems. Studies have examined issues such as concept drift [55], learning bias from multiple tasks [7], continual learning [53, 57], multitask learning [9], knowledge transfer between tasks [46, 44], and lifelong learning [69]. While progress has been made, this branch of study still contains many unresolved issues. Methods are needed to reliably detect when the underlying concept being modeled has changed since training began and to adapt the ....
....within the learning algorithm; techniques that automatically discard training instances by age, for example, integrate the two phases with an assumption of drift. In general, the process of adapting current models to new information has been studied under the rubrics of multitask learning [9], knowledge transfer between tasks [46, 44], and lifelong learning [69]. Chapter 2: Issues and Related Work. In this chapter we will discuss the goals of the anomaly detection domain, related background work, and the issues raised by the proposed research. Although we make an effort to divide the ....
[Article contains additional citation context not shown here]
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, Jul 1997.
....Of course, if we strive to augment statistical learning with other methods, we have to be aware that the information for the decision of the neural systems, e.g. for a control task, has to be supplied from additional sources. Examples are information from related tasks in multitask learning [2] or from information available in a different representation, as in fuzzy systems [3]. A second approach aims at an appropriate predisposition of the NN to achieve learning of various different, albeit related, tasks. In this way, we can achieve sequential rapid learning of tasks on an appropriate ....
R. Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
....the use of a neural architecture with part of the weights shared and others specific to each task. When the network is trained on all tasks, the risk of overfitting the shared part is reduced and a common set of features can be obtained. This idea has been studied and tested on practical problems in e.g. Caruana (1997) and Pratt and Jennings (1996). Whether it works depends on whether the tasks are indeed sufficiently similar, which is often hard to tell in advance. Baxter (1997) proposed hierarchical Bayesian inference as a model for studying multitask learning. Parameters that are shared between tasks are ....
....only for the most probable solutions A_mp, but also for the maximum likelihood solutions A_ml. The results for these maximum likelihood solutions indicate the performance that can be obtained in a frequentist multitask learning approach without taking into account prior information, following e.g. Caruana (1997) and Pratt and Jennings (1996). On purpose, we considered a data set with a relatively small number of inputs (n_inp = 9). This makes it feasible to go all the way up to n_hid = n_inp, which is equivalent to the case of no bottleneck. Loosely speaking, the maximum likelihood test error for n_hid = ....
[Article contains additional citation context not shown here]
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41--75.
....predicate invention may be viewed as a form of inductive bias learning. Preliminary results with this approach on a chess domain are reported in Khan, Muggleton, and Parson (1998). Improving performance on a fixed reference task. Multi-task learning (Caruana, 1997) trains extra neural network outputs to match related tasks in order to improve generalization performance on a fixed reference task. Although this approach does not explicitly identify the extra bias generated by the related tasks in a way that can be used to learn novel tasks, it is an example ....
....the task relatedness is not in question, but in other cases, such as medical problems, it is not so clear. Grouping too large a subset of tasks together as related tasks could clearly have a detrimental impact on bias learning or multi-task learning, and there is empirical evidence to support this (Caruana, 1997). Thus, algorithms for automatically determining task relatedness are a potentially useful avenue for further research. In this context, see Silver and Mercer (1996) and Thrun and O'Sullivan (1996). Note that the question of task relatedness is clearly only meaningful relative to a particular ....
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41--75.
....predicate invention may be viewed as a form of inductive bias learning. Preliminary results with this approach on a chess domain are reported in Khan, Muggleton, and Parson (1998). Improving performance on a fixed reference task. Multi-task learning (Caruana, 1997) trains extra neural network outputs to match related tasks in order to improve generalization performance on a fixed reference task. Although this approach does not explicitly identify the extra bias generated by the related tasks in a way that can be used to learn novel tasks, it is an example ....
....the task relatedness is not in question, but in other cases, such as medical problems, it is not so clear. Grouping too large a subset of tasks together as related tasks could clearly have a detrimental impact on bias learning or multi-task learning, and there is empirical evidence to support this (Caruana, 1997). Thus, algorithms for automatically determining task relatedness are a potentially useful avenue for further research. In this context, see Silver and Mercer (1996) and Thrun and O'Sullivan (1996). Note that the question of task relatedness is clearly only meaningful relative to a particular ....
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41--75.
....human steering is remarkably robust to the loss of these features. Human drivers can fall back on a number of alternate features as different subsets of road features come in and out of view. Backprop nets can learn to steer better if they learn to recognize other road features such as centerlines (Caruana, 1997). How can we force backprop nets to learn to use a variety of road features when learning to steer? A related problem arises in health care (Cooper et al., 1997). Basic inputs such as age, gender, and blood pressure are available for most patients before they enter the hospital. Other ....
....code, a peaks code, a binary code, a unary code, and the value divided by 100. Gray code is similar to binary code except that only one feature changes at a time when counting from 0 to 100. Peaks code consists of 100 features with values calculated by a Gaussian centered on the example value (Caruana, 1997). Unary code ("thermometer code") outputs as many 1s as the example value, then fills in 0s to reach 100 features. One hundred examples are used for training and 300 for testing. Our experiments investigate how robust each learning algorithm is to random uniform feature corruption of the test data. For ....
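The input codings described in this snippet can be sketched as follows. This is a minimal Python illustration, not the cited authors' code: the 7-bit width of the binary and Gray codes, the Gaussian width of the peaks code, the choice of positions 0 to 99 for its 100 features, and all function names are assumptions.

```python
import math

N_FEATURES = 100  # unary and peaks codes each yield 100 features per value
N_BITS = 7        # assumption: 7 bits cover the values 0..100 (max 127)


def binary_features(value: int, n_bits: int = N_BITS) -> list[int]:
    """Plain binary code, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(n_bits))]


def gray_features(value: int, n_bits: int = N_BITS) -> list[int]:
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return binary_features(value ^ (value >> 1), n_bits)


def unary_features(value: int) -> list[int]:
    """Unary ("thermometer") code: `value` ones, then zeros up to N_FEATURES."""
    return [1] * value + [0] * (N_FEATURES - value)


def peaks_features(value: int, width: float = 5.0) -> list[float]:
    """Peaks code: a Gaussian centred on `value`, sampled at positions 0..99.
    `width` is a hypothetical choice; the source does not state one."""
    return [math.exp(-((i - value) ** 2) / (2 * width ** 2)) for i in range(N_FEATURES)]


def scaled_feature(value: int) -> list[float]:
    """The value divided by 100, as a single feature."""
    return [value / 100]


def hamming(a: list, b: list) -> int:
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Counting from 63 to 64 flips all 7 bits in binary, but only 1 in Gray code.
    print(hamming(binary_features(63), binary_features(64)))  # 7
    print(hamming(gray_features(63), gray_features(64)))      # 1
```

The 63 to 64 step is the worst case for plain binary, which is why a Gray code is the natural contrast: neighbouring values always stay one feature apart.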
Caruana, R. (1997). Multitask learning. Doctoral dissertation, School of Computer Science, Carnegie Mellon University.
....and analysing each observed training example of the target function in terms of the domain theory acquired in previous learning tasks. The transferred information is concerned with the invariant knowledge about robots and their environments, not the inherent common information about tasks. Caruana [1] studied multitask learning (MTL), an inductive transfer mechanism that improves generalization by using domain information contained in the training signals of related tasks. The tasks are learned in parallel by sharing a common representation (a shared hidden layer) in a backpropagation ANN. ....
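A minimal, dependency-free Python sketch of the architecture this snippet describes: one hidden layer shared by all tasks, one output unit per task, trained by ordinary backprop on the summed squared error. It illustrates the idea under stated assumptions and is not Caruana's implementation; the class and function names, layer sizes, learning rate, and the two toy tasks are all hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MTLNet:
    """Backprop net whose single hidden layer is shared by all tasks,
    with one linear output unit (a task-specific head) per task."""

    def __init__(self, n_in: int, n_hidden: int, n_tasks: int, seed: int = 0):
        rng = random.Random(seed)
        # Shared input->hidden weights; the extra column is a bias.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # Task-specific hidden->output weights; the extra column is a bias.
        self.V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_tasks)]

    def forward(self, x: list[float]) -> tuple[list[float], list[float]]:
        xb = x + [1.0]
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.W]
        hb = h + [1.0]
        y = [sum(v * hj for v, hj in zip(row, hb)) for row in self.V]
        return h, y

    def train_step(self, x: list[float], targets: list[float], lr: float = 0.1) -> float:
        """One gradient step on E = 0.5 * sum_k (y_k - t_k)^2 over all tasks."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        errs = [yk - tk for yk, tk in zip(y, targets)]
        # Error reaching each shared hidden unit is summed over every task head;
        # this is where the tasks shape each other's representation.
        delta_h = [
            sum(e * self.V[k][j] for k, e in enumerate(errs)) * h[j] * (1.0 - h[j])
            for j in range(len(h))
        ]
        for k, e in enumerate(errs):
            for j, hj in enumerate(hb):
                self.V[k][j] -= lr * e * hj
        for j, d in enumerate(delta_h):
            for i, xi in enumerate(xb):
                self.W[j][i] -= lr * d * xi
        return 0.5 * sum(e * e for e in errs)


def mean_loss(net: MTLNet, data) -> float:
    total = 0.0
    for x, t in data:
        _, y = net.forward(x)
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
    return total / len(data)


# Two hypothetical related tasks on the same inputs: t1 = x0 + x1, t2 = x0 - x1.
rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, [x[0] + x[1], x[0] - x[1]]))

net = MTLNet(n_in=2, n_hidden=4, n_tasks=2)
before = mean_loss(net, data)
for _ in range(200):  # epochs of per-example (online) backprop
    for x, t in data:
        net.train_step(x, t)
after = mean_loss(net, data)
print(f"mean loss before {before:.3f}, after {after:.3f}")
```

The only MTL-specific line is the sum over task heads in `delta_h`: every task's error shapes the shared hidden units, while each head's weights are updated only from its own task's error.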
....feedforward neural network with one hidden layer. Ten hidden neurons connect the single visual input to the single motion output. An input value is the combination of all the points in the current visual field of the creature, which is normalized to a real number in the interval [0,1]. The output of the connectionist network consists of a single component that controls the movement of the virtual creature. At present, there are eleven possible actions: move one step forward, move one step to the right, move one step to the left, remain stationary, or turn to the other seven ....
[Article contains additional citation context not shown here]
R. Caruana, Multitask Learning, Machine Learning, 1997, 28:41-75
....generalization performance. Certain methods of functional transfer have also been found to reduce training time (measured in number of training iterations). Chief among these methods is the parallel MTL paradigm explored recently by Caruana and Baxter [Baxt95, Caru95]. A recent paper by Caruana [Caru97] expresses plans for research into the use of MTL networks for sequential learning. We encourage these efforts, as this is a large and exciting area of scientific discovery. 2.3 MTL Network Learning. Kehoe points out in [Keho88] that psychological studies of human and animal learning suggest that ....
....system. • Wealth of virtual examples. The source of inductive bias under TRM is the virtual examples chosen for each of the domain knowledge tasks. There is great potential benefit in being able to generate virtual examples beyond those paired with the real training examples for a new task [Caru97]. Virtual examples can be selected by way of random sampling or by an ordering over the input attribute space. One might propose that the generation of virtual examples vary dynamically as some function during the learning process (this could be seen as an extension of Mostafa's idea of adaptive ....
Richard A. Caruana, "Multitask learning", Machine Learning, Vol. 28, pp. 41--75, 1997.
....such as a neural network, is applied to fit data, there is a danger of overfitting, that is, of memorizing the training data and failing to generalize well to new data points. Results of several studies, including our own work (Dietterich, Hild, & Bakiri, 1995) and the very thorough work of Caruana (1997), have shown that the best predictive accuracy for this form of neural network is obtained by using a large number of hidden units combined with the technique known as early stopping (Lang, Hinton, & Waibel, 1990). Early stopping works as follows. Suppose we have a total of N data points available ....
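The snippet breaks off before the procedure itself. Below is a hedged sketch of the common patience-based formulation, assuming the N points have already been split into a training set and a held-out validation set; the function name, the callbacks, and the `patience` parameter are illustrative assumptions, not details taken from the cited text.

```python
from typing import Any, Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    validation_error: Callable[[], float],
    snapshot_weights: Callable[[], Any],
    max_epochs: int = 1000,
    patience: int = 10,
) -> Tuple[int, float, Any]:
    """Train while tracking error on a held-out validation set and return
    (best_epoch, best_error, weights_at_best_epoch).

    Training stops once `patience` epochs pass without a new minimum.
    `snapshot_weights` should return a copy, not a live reference.
    """
    best_epoch, best_err, best_weights = 0, float("inf"), snapshot_weights()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_epoch, best_err, best_weights = epoch, err, snapshot_weights()
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_err, best_weights


# Toy stand-in: a fake "model" whose validation error follows a fixed U-shaped curve.
curve = [1.0, 0.8, 0.6, 0.5, 0.45, 0.47, 0.5, 0.55, 0.6, 0.65, 0.7]
state = {"epoch": 0}


def fake_epoch() -> None:
    state["epoch"] += 1


def fake_val_error() -> float:
    return curve[min(state["epoch"], len(curve)) - 1]


best = train_with_early_stopping(fake_epoch, fake_val_error, lambda: state["epoch"], patience=3)
print(best)  # (5, 0.45, 5)
```

With `patience=3`, training halts at epoch 8 and returns the snapshot from epoch 5, where validation error bottomed out; that returned snapshot, rather than the final weights, is what early stopping keeps.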
Caruana, R. (1997). Multitask Learning. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
....improves performance (see [10] for collections of papers on multi-task learning). In [1] the advantage of combining several tasks is investigated theoretically, under the assumption that a feature matrix B common to all tasks indeed exists. In most approaches to multi-task learning (see e.g. [2] and references therein), all tasks receive the same input information, i.e. all inputs are nonspecific. As in our case, the different tasks are forced to share the same hidden unit representation. Often, but not always, this leads to better generalization performance [2]. The problems ....
....learning (see e.g. [2] and references therein), all tasks receive the same input information, i.e. all inputs are nonspecific. As in our case, the different tasks are forced to share the same hidden unit representation. Often, but not always, this leads to better generalization performance [2]. The problems considered in the literature are mostly artificial and combine on the order of 10 or fewer tasks. An exception is [7], where different tasks concerning stock selection and portfolio management are combined in various ways. This experimental study is probably closest in spirit to our ....
R. Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
....Yet human steering is remarkably robust to the loss of these features because human drivers can fall back on a number of alternate features as different subsets of road features come in and out of view. Backprop nets do learn to steer better if they are forced to learn to recognize centerlines [4]. Why do ALVINN nets not automatically learn to use a variety of road features (such as centerlines) when learning to steer? A related problem arises in pneumonia risk prediction [8]. Here there are a number of basic inputs available for patients before they enter the hospital (e.g. age, gender, ....
....code, a peaks code, a binary code, a unary code, and the value divided by 100. Gray code is similar to binary code except that only one feature changes at a time when counting from 0 to 100. Peaks code consists of 100 features with values calculated by a Gaussian centered on the example value [4]. Unary code ("thermometer code") outputs as many 1s as the example value, then fills in 0s to reach 100 features. Coding the inputs these different ways yields data that satisfies the assumption that there are multiple disjoint subsets of the features which can predict the target. Each trial ....
Caruana, R. (1997) Multitask Learning. Ph.D. Thesis, School of Computer Science, CMU.
....with the main task while using a shared representation. Because the extra tasks share a hidden layer with the main task, internal representations learned for the extra tasks can be used by the main task outputs, often improving performance on the main task. MTL in backprop nets is well documented [13, 1, 2, 4, 8, 9, 7]. Most applications of MTL are to problems where some features available for the training set will not be available for future test cases [5]. We recently demonstrated that there are problems where some features that could be used as inputs would be more useful if used as extra outputs ....
Caruana, R., "Multitask Learning," Ph.D. thesis, Carnegie Mellon University, CMU-CS-97-203, 1997.
....tasks. Each, however, exploits a different relationship between tasks. We have discovered additional mechanisms (some of which are special cases of the ones presented here) and have run tests on carefully contrived problems to verify that each mechanism actually works. More detail can be found in [Caruana 1994, 1997]. 3.2.1. Statistical Data Amplification. Data amplification is an effective increase in sample size due to extra information in the training signals of related tasks. Amplification occurs when there is noise in the training signals. Consider two tasks, T and T′, with independent noise added to ....
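The snippet stops mid-example. A minimal simulation of the simplest case follows, assuming (our assumption, not necessarily the source's exact setup) that T and T′ are noisy copies of the same underlying function with independent Gaussian noise of equal variance σ². Averaging the two signals gives noise variance Var[(ε + ε′)/2] = σ²/2, one way to read "an effective increase in sample size". A backprop net trained on both tasks does not compute this average explicitly; the sketch only shows why the second signal carries extra information. The function name and parameters are hypothetical.

```python
import random
import statistics


def noise_variance(n_samples: int = 100_000, sigma: float = 1.0,
                   seed: int = 0) -> tuple[float, float]:
    """Return the noise variance of one training signal and of the average of
    two signals, T and T', that share the same underlying value."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(n_samples):
        f = rng.uniform(-1.0, 1.0)            # shared underlying value F(x)
        t = f + rng.gauss(0.0, sigma)         # noisy training signal for T
        t_prime = f + rng.gauss(0.0, sigma)   # independent noise for T'
        single.append(t - f)
        averaged.append((t + t_prime) / 2 - f)
    return statistics.pvariance(single), statistics.pvariance(averaged)


single_var, avg_var = noise_variance()
# With independent, equal-variance noise, the average should have about half the variance.
print(f"{single_var:.3f} {avg_var:.3f}")
```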
....performance on F1(A,B). Using the extra feature as an extra output is better than using it as an extra input. F1(A,B) and F2(A,B) were carefully contrived. We have devised less contrived functions that demonstrate similar effects, and have seen evidence of this behavior in real-world problems [Caruana & de Sa 1997]. One particularly interesting class of problems where some features are more useful as outputs than as inputs is when there is noise present in the features; noise in extra outputs is often less harmful than noise in extra inputs. 5. Is MTL Just for Backprop Nets? In MTL with backprop nets, ....
[Article contains additional citation context not shown here]
Caruana, R., "Multitask Learning," Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, 1997.
No context found.
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1):41--75.
No context found.
Caruana, Rich. 1997. Multitask learning. Machine Learning, 28(1):41--75.
No context found.
R. Caruana. Multitask Learning, in Learning to Learn, S. Thrun, L. Pratt, Eds. Springer 1998.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
No context found.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41--75.
No context found.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41-75.
No context found.
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41-75.
No context found.
R. Caruana, "Multitask learning," Machine Learning, vol. 28, no. 1, pp. 41--75, 1997.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
Rich Caruana, "Multitask learning," in Learning to Learn, Sebastian Thrun and Lorien Pratt, Eds., chapter 5, pp. 95--133. Kluwer Academic Publishers, Norwell, Massachusetts, 1998.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
No context found.
Rich Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
Caruana, R. (1997). Multitask Learning. Machine Learning 28(1), 41-75.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28(1):41--75, 1997.
No context found.
R. Caruana. Multitask learning. Machine Learning, 28:41--75, 1997.
No context found.
Caruana, Rich (1998). Multitask Learning.
CiteSeer.IST - Copyright Penn State and NEC