Schaal, S. & Atkeson, C.G., "From isolation to cooperation: An alternative view of a system of experts", In: Touretzky, D.S., Mozer, M.C. & Hasselmo, M.E. (Eds.) Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press (1996)
....used by Moody and Darken [20] uses self-organization [23] to find the expert positions, following the input distribution of the samples. This reduces the chance of having too few samples available for an expert, but does not place more experts at difficult parts of D(x). Schaal and Atkeson [24] place experts in an incremental way, inserting an expert at places where there are samples but no experts nearby. The exact value of "nearby" has to be set beforehand. Other ways of positioning are placing one expert at each sample (usually resulting in too many experts), placing experts ....
....result in a bad approximation (high bias); using too many experts can cause overfitting. Section 4.4.1 discusses the results of experiments where the number of experts is varied. 4.3 The approximated function. For the experiments we used a 2-dimensional function to be approximated, taken from [24] (figure 4.2): $D(x) = \max\{\, e^{-10 x_1^2},\ e^{-50 x_2^2},\ 1.25\, e^{-5 (x_1^2 + x_2^2)} \,\}$ (4.9). [Figure 4.2: Function Dest1.] For the experiments, learning samples and test samples are generated from this function, in the form (x, D(x)); the input vectors x are taken from a uniform random ....
[Article contains additional citation context not shown here]
Stefan Schaal and Christopher G. Atkeson. From isolation to cooperation: an alternative view of a system of experts. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996. In press.
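The incremental placement rule quoted in the context above (insert an expert where there are samples but no expert nearby) can be sketched as follows. This is only an illustration under stated assumptions: the snippet does not say how "nearby" is measured, so Euclidean distance and the hypothetical names `place_experts` and `nearby` are assumptions, not the cited authors' criterion.

```python
import math


def place_experts(samples, nearby):
    """Greedy incremental placement (illustrative sketch).

    A sample becomes the center of a new expert only if no existing
    expert center lies within distance `nearby` of it.
    Euclidean distance is an assumption; the source does not specify it.
    """
    centers = []
    for x in samples:
        if all(math.dist(x, c) > nearby for c in centers):
            centers.append(tuple(x))
    return centers


# Two tight clusters plus one isolated point.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.95, 1.0), (0.0, 1.0)]
centers = place_experts(samples, nearby=0.5)
print(centers)
```

A smaller `nearby` yields more experts; in the limit every sample gets its own expert, which is the "too many experts" case mentioned in the context.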
....the set of spline coverings. The splines are limited in extent by a Gaussian windowing function in a similar fashion to the gating component of a mixtures of experts model. An incremental learning algorithm known as receptive field locally weighted regression was presented by Schaal and Atkeson [210]. This algorithm consists of linear regression experts and a set of Gaussian receptive field units which gate the expert outputs. Each of the experts is trained independently in this algorithm. Pawelzik et al. [171] consider the problem of segmentation of a time series by an ensemble of radial ....
Schaal, S. and Atkeson, C. G. [1996], From isolation to cooperation: an alternative view of a system of experts, in Touretzky et al. [226], pp. 486--492.
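The architecture described in the context above (linear regression experts whose outputs are gated by Gaussian receptive fields) can be illustrated with a minimal prediction sketch. The details are assumptions made for illustration: isotropic Gaussian widths, a normalized weighted average as the combination rule, and the hypothetical names `gaussian_weight` and `predict`. Training of the experts is not shown.

```python
import math


def gaussian_weight(x, center, width):
    """Activation of an isotropic Gaussian receptive field (assumed form)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-0.5 * sq_dist / width ** 2)


def predict(x, experts):
    """Blend local linear models by their normalized receptive-field weights.

    experts: list of (center, width, slope, intercept) tuples, where each
    expert predicts intercept + slope . (x - center).
    """
    num = 0.0
    den = 0.0
    for center, width, slope, intercept in experts:
        w = gaussian_weight(x, center, width)
        y = intercept + sum(b * (xi - ci) for b, xi, ci in zip(slope, x, center))
        num += w * y
        den += w
    return num / den


# Two 1-D experts that both locally model y = 2x.
experts = [
    ((0.0,), 0.5, (2.0,), 0.0),
    ((1.0,), 0.5, (2.0,), 2.0),
]
print(predict((0.5,), experts))
```

Because each expert carries its own center, width, and linear model, an expert can be fitted without reference to the others, which matches the remark that the experts are trained independently.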
.... model, numerous non-linear approximation methods share this same decomposition strategy: Normalized RBF [6], [12], [13]; Specht's General Regression Neural Network (GRNN) [17]; B-splines network [4]; Kernel Regression Estimator [21]; Schaal and Atkeson's Receptive Field Weighted Regression (RFWR) [16]; Jordan and Jacobs' Mixture of Experts Network [8]. When adopting linear functions for the local models, the approach can be perceived as a softened version of piecewise linear regression. The learning of these multimodelling approaches must be performed at two levels: 1) how to decompose the ....
Schaal, S. and C.G. Atkeson - 1996. From Isolation to Cooperation: An Alternative View of a System of Experts - In Proceedings of NIPS 8.
....[22] or multiple attribute decision making [89] but ultimately all of these classifiers are intended for a common goal. Similarly in the mixture of experts framework, the multiple experts involved are again tackling the same task, though they may specialize in different regions of the input space [66, 43, 17, 73]. The supra classifier is a generalization of combining where the support classifiers could be designed for different tasks, and are immutable, having been trained previously. Support classifiers for ensembles combiners try to solve the same classification task (though they may differentiated ....
S. Schaal and C. G. Atkeson. From isolation to cooperation: An alternative view of a system of experts. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 605--611. The MIT Press, 1996.
....quickly, a more rapidly adaptable scheme is required. An alternative way to grow a mixture of experts based on the novelty of the training patterns is described here. This approach of growing a neural network is not new. A similar approach based on input novelty has been used before in [5], [24], [25]. The present approach differs from those cited by incorporating output novelty within the framework of novelty detection. Let the network at any instant of time have m experts. A new expert m+1 is added to the network if, for the current training pattern x_t, min[(x_t - m_j)^T ....
....model environments that are non-stationary. A localized model was employed for the gating network. We note that the computational efficiency of localized approaches as compared to the MLP, often two orders of magnitude for the same performance level, has been well documented (see for example [25], [10]). These advantages carry over to our methods and for this reason we have not further emphasized this point by providing timing comparisons for most of the experiments. Our approach can be compared to other localized connectionist architectures such as the radial basis function (RBF) ....
[Article contains additional citation context not shown here]
S. Schaal and C. G. Atkeson. From isolation to cooperation: An alternative view of a system of experts. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 605--611. The MIT Press, 1996.
....non-linear function approximation. Global function approximation using sigmoid basis functions provides good generalization and works well for off-line learning. However, sigmoid basis function approximation has the problem of catastrophic interference when used for on-line learning [7]. When applied to temporal difference (TD) learning, the parameters of sigmoid networks can, in some cases, diverge [2]. Local function approximation methods such as radial basis function (RBF) networks are suitable for on-line learning. However, RBF has the drawback of limited generalization and ....
S. Schaal and C. G. Atkeson. From isolation to cooperation: An alternative view of a system of experts. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 605--611. MIT Press, Cambridge, MA, USA, 1996.
.... Jacobs 1997, Tresp and Taniguchi 1995). An added advantage of partitioning is that we can use simpler agents: local agents usually turn out to be a lot simpler than a monolithic, global agent; e.g. simpler functional forms can be used when we approximate polynomial functions using multiple agents (Schaal and Atkeson 1996); or far fewer hidden units are needed in backpropagation networks when multiple such networks are used. Consequently, partitioning can potentially improve learning to a significant degree. These methods have not been applied extensively to tasks other than simple regression prediction ....
....does not divide up the input state space into regions for different agents to learn (and thus makes learning tasks easier overall) although they do divide up input state space through using decision trees. Our approach differs from radial basis functions (such as in Blanzieri and Katenkamp 1996, Schaal and Atkeson 1996, Peterson and Sun 1998, and van der Smagt and Groen 1995) in that (1) we use hypercubes or other region forms different from the spherical form used by RBF and (2) more importantly, instead of a Gaussian function as in the RBF approach, we use a more powerful approximator in each region, which ....
S. Schaal and C. Atkeson, (1996). From isolation to cooperation: an alternative view of a system of experts. Advances in Neural Information Processing Systems 8. pp.605-611. MIT Press.
.... Via theoretical analysis and experiments using normalized Gaussian gating networks (but with fixed centers and widths) they demonstrate the regularizing effect of local learning, and its relative robustness against ill-conditioning. Also noteworthy is the work of Schaal and Atkeson [8] who introduce the system of experts architecture wherein local experts are allocated depending on the spatial distribution of input samples. In this architecture each expert has a fixed center point associated with it in the input space and the expert outputs are weighted based on the distance ....
....input samples. In this architecture each expert has a fixed center point associated with it in the input space and the expert outputs are weighted based on the distance of an input sample from the different center points. This architecture is related to our localized mixture of experts network. In [8], the center point for each expert's operation is fixed, and the weighting (gating) factors are a function of only the distance from the input sample to the center points associated with the different experts. Since the gating network outputs in the localized mixture of experts network depend also ....
[Article contains additional citation context not shown here]
S. Schaal and C. G. Atkeson. From isolation to cooperation: An alternative view of a system of experts. In D. Touretzky, M. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 605--611. The MIT Press, 1996.
....neurofuzzy models. In the connectionist community, closely related approximators are, when the local models reduce to constants: RBF [5], [10], GRNN [14], CMAC, B-splines network and, when extending the local models to linear functions: Mixture of Experts [7] and Receptive Field Weighted Regression [12]. In this paper, we will concentrate on multi-expert networks in which each local expert is indeed a linear function and each zone of influence of the expert is defined by a normalized Gaussian. In the next section this type of multi-expert network will be precisely defined. In the statistics and ....
.... fuzzy model [15], numerous nonlinear approximation methods share this same decomposition strategy: Normalized RBF [5], [10]; Specht's General Regression Neural Network (GRNN) [14]; B-splines network; Kernel Regression Estimator [17]; Schaal and Atkeson's Receptive Field Weighted Regression (RFWR) [12]; Jordan and Jacobs' Mixture of Experts Network [7]. When adopting linear functions for the local models, this approach can be perceived as a softened version of piecewise linear regression. The expression of the $G_k(x)$ is as follows: $G_k(x) = e^{-(x - c_k)^T M_k^T M_k (x - c_k)}$ ....
[Article contains additional citation context not shown here]
Schaal, S. & Atkeson, C.G. (1996) From Isolation to Cooperation: An Alternative View of a System of Experts. Advances in Neural Information Processing Systems 8
....used by Moody and Darken [20] uses self-organization [23] to find the expert positions, following the input distribution of the samples. This reduces the chance of having too few samples available for an expert, but does not place more experts at difficult parts of D(x). Schaal and Atkeson [24] place experts in an incremental way, inserting an expert at places where there are samples but no experts nearby. The exact value of "nearby" has to be set beforehand. Other ways of positioning are placing one expert at each sample (usually resulting in too many experts), placing experts ....
....will result in a bad approximation (high bias); using too many experts can cause overfitting. Section 4.4.1 discusses the results of experiments where the number of experts is varied. 4.3 The approximated function. For the experiments we used a 2-dimensional function to be approximated, taken from [24] (figure 4.2): $D(x) = \max\{\, e^{-10 x_1^2},\ e^{-50 x_2^2},\ 1.25\, e^{-5 (x_1^2 + x_2^2)} \,\}$ (4.9). [Figure 4.2: Function Dest1, plotted over axes x and y.] For the experiments, learning samples ....
[Article contains additional citation context not shown here]
Stefan Schaal and Christopher G. Atkeson. From isolation to cooperation: an alternative view of a system of experts. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996. In press.
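As a worked illustration of the test function quoted in the context above, the sketch below reads equation (4.9) as the maximum of three Gaussian bumps and draws input/output samples from it. The sampling range [-1, 1] in each input dimension, the fixed seed, and the names `dest1` and `make_samples` are assumptions for illustration; the snippet only states that inputs are drawn from a uniform random distribution.

```python
import math
import random


def dest1(x1, x2):
    """Test function as read from equation (4.9): max of three Gaussian bumps."""
    return max(
        math.exp(-10.0 * x1 ** 2),
        math.exp(-50.0 * x2 ** 2),
        1.25 * math.exp(-5.0 * (x1 ** 2 + x2 ** 2)),
    )


def make_samples(n, seed=0, low=-1.0, high=1.0):
    """Draw n (input, output) pairs with inputs uniform on [low, high]^2.

    The range and the seed are assumptions, not taken from the source.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x1 = rng.uniform(low, high)
        x2 = rng.uniform(low, high)
        samples.append(((x1, x2), dest1(x1, x2)))
    return samples


train = make_samples(500, seed=0)
test = make_samples(200, seed=1)
print(dest1(0.0, 0.0))
```

The function has two narrow ridges along the coordinate axes and a broader bump at the origin, so a fixed number of evenly spread experts fits some regions much better than others.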
....is required. An alternative way to grow a mixture of experts based on the novelty of the training patterns is described here. This approach of growing a neural network is not new. A similar approach based on input novelty has been used before in [Platt (1991)], [Roberts and Tarassenko (1994)], [Schaal and Atkeson (1996)]. The present approach differs from those cited by incorporating output novelty within .... Figure 1. a) Global data set. b) c) d) overlapping subsets of the ....
Schaal, S., Atkeson, C. G., 1996, "From isolation to cooperation: An alternative view of a system of experts," In Touretzky, D., Mozer, M., Hasselmo, M., editors, Advances in Neural Information Processing Systems 8. The MIT Press.
....in real-time implementation. The grid-based Gaussian softmax basis function network was successfully used in a 4D state space. However, a more flexible algorithm that allocates basis functions only in the relevant parts of the state space may be necessary for dealing with higher-dimensional systems (Schaal and Atkeson, 1996). In the above simulations, we assumed that the local linear model of the system dynamics $\partial f(x, u) / \partial u$ was available. In preliminary experiments, it was verified that the critic, the system model, and the actor can be trained simultaneously. The actor-tutor architecture resembles feedback error ....
Schaal, S. and Atkeson, C. G. (1996). From isolation to cooperation: An alternative view of a system of experts. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems 8, pages 605--611. MIT Press, Cambridge, MA, USA.
....some heuristics to improve efficiency. Their most efficient version only adds instances to memory if the classification is incorrect, so the algorithm only learns when it makes a mistake. Schaal and Atkeson extend this concept to continuous domains with Receptive Field Weighted Regression (RFWR) [55]. They collect data in receptive fields, and each receptive field contains a linear model of its data. RFWR only modifies a receptive field if the field's model is inconsistent with a data point. This algorithm is covered in more detail in Chapter 6. Schaal and Atkeson use yet another ....
....like many parametric functions, are good and efficient at fitting data that matches their assumptions, but can fail dramatically if their assumptions are violated. 6.1.3 Receptive Field Weighted Regression. Receptive Field Weighted Regression is a new local function approximation technique [55]. It derives from research on non-parametric modeling methods. The basic data structure of RFWR is the receptive field. A model consists of a set of receptive fields, each of which has an extent and a linear partial model. RFWR makes predictions with the following algorithm: 1. Compute the ....
[Article contains additional citation context not shown here]
Stefan Schaal and Christopher G. Atkeson. From isolation to cooperation: An alternative view of a system of experts. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 605--611, Cambridge, MA, 1996. MIT Press. ftp://ftp.cc.gatech.edu/pub/people/sschaal/ schaal-NIPS95.html.
No context found.
Schaal, S. & Atkeson, C.G., "From isolation to cooperation: An alternative view of a system of experts", In: Touretzky, D.S., Mozer, M.C. & Hasselmo, M.E. (Eds.) Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press (1996)
....be addressed. In previous work, we developed a learning method that can automatically determine the size and shape of the neighborhood for a nonparametric learning method, Receptive Field Weighted Regression (RFWR) that uses locally weighted linear regression to interpolate the neighboring data [15, 16]. RFWR removed the need to store all the training data in memory by just retaining a sufficient number of locally linear models. It could be shown that this learning system has favorable properties for incremental learning in the spirit of the issues mentioned at the beginning of this section, and ....
Schaal, S. & Atkeson, C.G., "From isolation to cooperation: An alternative view of a system of experts", In: Touretzky, D.S., Mozer, M.C. & Hasselmo, M.E. (Eds.) Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press (1996).
.... was introduced into the domain of machine learning and robot learning by Atkeson (Atkeson and Reinkensmeyer, 1988, 1989; Atkeson, 1990, 1992), who also explored techniques for detecting irrelevant features, and Zografski (Zografski, 1989, 1991, 1992; Zografski and Durrani, 1995). Atkeson and Schaal (1995) explore locally weighted learning from the point of view of neural networks. Dietterich et al. (1994) report on a recent workshop on memory-based learning, including locally weighted learning. 4 Distance Functions. Locally weighted learning is critically dependent on the distance function. There ....
....can be achieved by building a set of local models at fixed locations, using the techniques described in this paper. In addition to computational speedup in the presence of large datasets there may be statistical advantages to compressing data instead of merely storing it all (Fritzke, 1995; Schaal and Atkeson, 1995). 17 Summary. This paper has surveyed locally weighted learning. Local weighting, whether by weighting the data or the error criterion, can turn global function approximation into powerful alternative approaches. By means of local weighting, unnecessary bias of global function fitting is reduced, ....
Schaal, S. and Atkeson, C. G. (1995). From isolation to cooperation: An alternative view of a system of experts. NIPS95 proceedings, in press.
No context found.
Stefan Schaal and Christopher G. Atkeson. From Isolation to Cooperation: An Alternative View of a System of Experts. Submitted to Neural Information Processing Systems 1995, 1995.