Poggio T, Girosi F, A theory of networks for approximation and learning, A.I. Memo 1140, Artificial Intelligence Lab, MIT, 1989.
.... good continuation [7], constant curvature [22], or local measures of confidence [26, 31, 25, 12, 9]. The third methodology also results in such local operations but is primarily motivated by mechanical or physical metaphors, as in the dynamic particles [29]. They often use a variational formulation [18, 23] and explicit or implicit geometric models [30, 33]. However, there is also a more recent interest in methods meant to work in arbitrary dimension. For instance, tensor voting [21] has been introduced as a unified formalism for addressing the issues of grouping noisy sets of points into more ....
Tomaso Poggio and Federico Girosi. A theory of networks for approximation and learning. Technical Report 1140, AIM, 1989.
....tractability, radial basis function networks have recently also attracted considerable attention, especially in applications dealing with prediction and classification [32] [34]. The importance of radial basis function networks has also greatly benefitted from the work of Poggio et al. [35] [37], where the relationship between regularization theory and radial basis function networks is explored. The approximation capabilities of static sigmoidal type networks and of radial basis function networks have been studied by several research groups (see, for example, [38] [40]). In Section ....
....class of functions that can be approximated. b) Radial basis function neural networks: Radial Basis Function (RBF) networks were introduced to the neural network literature by Broomhead et al. [43] and have since gained significance in the field due to several applications and theoretical results [32, 33, 35]. Recently, RBF networks have also been considered in adaptive control of nonlinear dynamical systems [28]. The input output response (x y) of a RBF neural network (shown in Figure 3) with m inputs, n outputs and n hidden, or kernel units, is characterized by = i= 1, 2, y = At where x E Td ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning", Tech. Rep., Artif. Intell. Lab, Memo No. 1140, M.I.T., 1989.
....is cast in a higher dimensional space. It is interesting that this approach is a generalization of the linear combination of views method described earlier [19] and is also equivalent to standard regularization [21, 24] and generalized splines [5, 20]. This method yields a detector function of the form: $F(x) = \sum_{j=1}^{N} w_j G(x - c_j)$ (1). Here $G$ is an appropriate basis function on $(\mathbb{R}^2)^n$, such as the Gaussian function; recall $n$ is the number of points in our object. The centers $c_j$, $j = 1, \ldots, n$, are chosen randomly as points in ....
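A minimal sketch of how a detector of the form quoted in the excerpt above could be evaluated, assuming a Gaussian basis function. The function names, the width `sigma`, and the toy centers and weights are illustrative assumptions and are not taken from the citing paper.

```python
import math

def gaussian(r2, sigma=1.0):
    """Gaussian basis value for a squared distance r2 (sigma is an assumed width)."""
    return math.exp(-r2 / (2.0 * sigma ** 2))

def detector(x, centers, weights, sigma=1.0):
    """Evaluate F(x) = sum_j w_j * G(x - c_j) for one input vector x."""
    total = 0.0
    for c, w in zip(centers, weights):
        # Squared Euclidean distance between the input and the j-th center.
        r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * gaussian(r2, sigma)
    return total

# Hypothetical toy data: two centers in the plane with hand-picked weights.
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [1.0, -0.5]
value_at_first_center = detector((0.0, 0.0), centers, weights)
```

In practice the weights would be fitted to labelled examples rather than chosen by hand; the sketch only shows the forward evaluation.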
Poggio, T. and F. Girosi: 1989, `A Theory of Networks for Approximation and Learning'. Technical Report A. I. Memo 1140, Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological Information Processing, Whitaker College, Massachusetts.
....merely within the corresponding subspace of Q under advantageous utilisation of the network's grid-based architecture. The Neural Network Architecture. The multi-dimensional nonlinear function ACGF(q) can be mapped suitably by the sensory-motor transformation of an RBF-like neural architecture [6]. Because the neural network is utilised within a control loop, a sufficiently exact representation of the whole input space has to be guaranteed. Therefore the neurons are attached to regular positions in a rectangular grid. The architecture proposed in this work takes important advantages from ....
Poggio, T. and Girosi, F. A theory of networks for approximation and learning. AI Memo 1140. MIT, 1989.
....a large variety of results about the approximation by neural networks (cf. [3, 7, 13, 21, 26, 29] and the references quoted there), the stability aspect has been largely neglected so far; only a few authors give a rigorous treatment of regularization methods for the approximation problem (cf. [14, 15, 16, 33, 40]). In Section 2 we will show that network approximation in Sobolev spaces is equivalent to least squares collocation for a corresponding integral equation of the first kind. Based on the well known results about least squares collocation we will derive results about the convergence in the case of ....
....step works better. The limiting case of $k(x, y) = L_s L_s \varphi(x, y) = \delta(x - y)$ (2.15) yields a network whose basis function is the Green's function of the differential operator $L_s L_s$. These so-called regularization networks have been introduced by Girosi and Poggio (cf. [14, 15, 16]) and analyzed with respect to their approximation properties. The class of functions generated by a regularization network is dense in $H^s(\Omega)$. A disadvantage of these networks is that the mollifier (the Dirac delta) is not a continuous operator on $L^2(\Omega)$ and hence ....
F. Girosi, T. Poggio, A theory of networks for approximation and learning, AI Memo 1140 (AI Laboratory, MIT, Cambridge, Massachusetts, 1989).
....in the network is supposed to lead to a global error reduction. Certain types of neural networks, like multilayer perceptrons or radial basis function networks, are universal approximators, i.e. they can approximate any continuous function on a compact domain to any given degree of accuracy [15, 40]. However, the desired solution cannot be found analytically. It has to be obtained by iteratively processing the training data with the learning algorithm. Success is not guaranteed: the algorithm can get stuck in local minima, for example. Even if the learning problem is solved, it is not ....
Poggio T, Girosi F (1989) A theory of networks for approximation and learning, A.I. Memo 1140 Cambridge, MA: MIT
....0.5 quantile were identically transformed if formed by less than 50 elements. The two figures were empirically estimated with the bootstrap procedure described below. 1.3.3 RBF-Linear Networks. Generalized RBF networks were then considered to realize a combination of non-parametric regressors [36, 18, 4]. The model integrates the superposition of a family of local basis functions (or kernels) with the contribution of a linear term. Whenever an offset $b$ is also considered, the resulting regression equation takes the following form: $f(x) = HG(x) + Ax + b$ (1.3), where $H$ is a coefficient matrix and ....
T. Poggio and F. Girosi. A theory of networks for approximation and learning. A.I. Memo No. 1140, MIT, 1989.
....which is able to separate object views from image patterns that are not instances of the object. It is well known that the general problem of learning from examples, and in particular classification, can be interpreted as the problem of approximating a multivariate function from sparse data [3], where the data are in the form of (input, output) pairs, obtained by random sampling the unknown function in the presence of noise. This problem is clearly ill posed, since it has an infinite number of solutions and, in order to choose one particular solution, we need to have some a priori ....
....unknown function in the presence of noise. This problem is clearly ill posed, since it has an infinite number of solutions and, in order to choose one particular solution, we need to have some a priori knowledge of the function that has to be reconstructed (see [4] and the references therein). In [3], [4] the authors approach the problem of multivariate function approximation by using regularization theory, and the a priori knowledge of the function takes the form of a smoothness functional. More recently, Vapnik [5] has introduced a new learning scheme, well founded in the framework of the ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning," Tech. Rep. A.I. Memo No. 1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 1989.
....can then be made to assign input vector to the class verifying (23). Note that we also have , so that the consideration of or leads to the same decision. B. Connectionist Implementation. The classification method just introduced has some similarity with radial basis function (RBF) networks [18]. An RBF network is a neural network composed of an input layer, a hidden layer and an output layer (Fig. 1). The response of hidden unit to an input vector is defined as a decreasing function of the distance between and a weight vector . The output signal from the th output unit with weight vector ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning," M.I.T, Tech. Rep. A.I. Memo no. 1140, 1988.
....data) its contribution to the overall prediction error is often small compared to the contribution of the variance. Different regularization methods exist for finding an optimal tradeoff by means of adding a penalty, usually in the form of some measure of smoothness, to the predictor (Wahba, 1990; Poggio and Girosi, 1994). While smoothness can be considered a universal goal for a predictor, there are other related goals which are based on a measure of the quality of the hidden units representation. Some of these constraints are related to the information content of the hidden unit representation. For example, ....
Poggio, T. and Girosi, F. (1994). A theory of networks for approximation and learning. A.I. Memo No. 1140, C.B.I.P. Paper No. 31, Massachusetts Institute of Technology.
....modify a priori in order to construct its solution. It is this prior that interests us, so we can reformulate the initial question as follows: what prior is assumed about the nature of the solution when multilayer perceptrons are used? Poggio and Girosi [58] were the first to establish the link between artificial neural networks and regularization. More generally, it is this article that proposed using regularization theory to solve learning problems. But the method of ....
T. Poggio and F. Girosi, A theory of networks for approximation and learning, Tech. Rep. 1140, M.I.T. AI Laboratory, Cambridge, MA., 1989.
....f) on $I = [0, 1]$ so that each continuous function $f$ on $I^d$ can be written in the form: $f(x_1, x_2, \ldots, x_d) = \sum_{q=1}^{2d+1} g_q\left(\sum_{p=1}^{d} h_{pq}(x_p)\right)$, where $g_q$ are properly chosen continuous functions of one variable. $h_{pq}(x)$ do not depend on $f$, and $g_q$ totally characterizes $f$. • [Poggio and Girosi, 1989]: this is hopeless • Kurkova: a neural network implementation • the curse of dimensionality (Friedman): the complexity of $f(x)$ has not been reduced. The number of variables is not a satisfactory characteristic of badness (Lorentz 86) ANNIE Tutorial Theories and heuristics for MLP St ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. Technical Report 1140, M.I.T. AI Laboratory, Cambridge, MA.
....an estimate for J T is computed, implementation of the Lyapunov control algorithm is straightforward. [Jordan Rumelhart 92] use a standard sigmoidal network with back propagation based training. Lee uses a gaussian radial basis function network in [Lee Kil 89] similar to that proposed by [Poggio Girosi 89] and a sine cosine network in [Lee Kil 90], [Lee Bekey 91]. Let $f(\cdot)$ be a neural network approximating the forward kinematics function $f(\cdot)$. Let the current configuration, the input vector, be , and the current end effector location be approximated by x = f( and the desired ....
Tomaso Poggio & Federico Girosi, "A Theory of Networks for Approximation and Learning", MIT AI Memo No. 1140, July 1989.
....instead of modeled in software by discrete approximations. These facts suggest examining neural models for control which are amenable to a direct, physical implementation, for example, as the output of the continuous, analog electronic networks suggested by researchers in the field of early vision [12, 28]. Specifically, the goal of the research described in this paper is to develop adaptive architectures capable of exploiting these analog network designs for the control of continuous time, nonlinear dynamic systems. Our approach is to treat the entire problem in the context of adaptive systems ....
....established between the elements of these representations and the components of a class of three layered neural networks. If a function has a Fourier transform with compact support, then it is infinitely differentiable [5]. Poggio's research into neural approximation using regularization theory [28] has demonstrated that, under this latter assumption, a linear superposition of gaussian radial basis functions results in an optimal mean square approximation to an unknown function whose values are specified at a finite set of points in $\mathbb{R}^n$. Further, Girosi and Poggio [10] have proven that ....
[Article contains additional citation context not shown here]
Poggio, T., and Girosi, F., "A theory of networks for approximation and learning", Artificial Intelligence Lab. Memo, No. 1140, MIT, Cambridge, MA, July 1989.
.... circuits of depth 2 can approximate (in various norms) large function classes (including continuous functions) arbitrarily well (Arai, 1989; Carrol and Dickinson, 1989; Cybenko, 1989; Funahashi, 1989; Gallant and White, 1988; Hornik et al. 1989; Irie, 1988; Lapades and Farber, 1987; Nielson, 1989; Poggio and Girosi, 1989; Wei et al. 1991). Various gate functions have been used, among others, the cosine squasher, the standard sigmoid, radial basis functions, generalized radial basis functions, polynomials, trigonometric polynomials and binary thresholds. Still, as we will see, these functions differ greatly in ....
POGGIO, T., and GIROSI, F. (1989), A theory of networks for Approximation and learning, Artificial Intelligence Memorandum, no 1140.
....4) This approach leads naturally to the question of the definition of optimality. Defining an optimal subset of views as the subset that minimizes the nearest neighbor classification error amounts to performing vector quantization (VQ; see appendix B) in the input space (Moody and Darken, 1989; Poggio and Girosi, 1989). By definition, quantizing an input space results in a set of vectors that are the best representation of the entire space. A quantization is said to be optimal if it minimizes an expected distortion. Simple measures of the latter, such as squared Euclidean distance, while widely used in vector ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. A.I. Memo No. 1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
....its contribution to the overall prediction error is often small compared to the contribution of the variance. Different regularization methods exist for finding an optimal tradeoff by means of adding a penalty, usually in the form of some measure of smoothness, to the predictor (Wahba, 1990; Poggio and Girosi, 1994). While smoothness can be considered a universal goal for a predictor, there are other related goals which are based on a measure of the quality of the hidden units representation. Some of these constraints are related to the information content of the hidden unit representation. For example, ....
Poggio, T. and Girosi, F. (1994). A theory of networks for approximation and learning. A.I. Memo No. 1140, C.B.I.P. Paper No. 31, Massachusetts Institute of Technology.
.... Darken, 1989). These networks approximate the function via a superposition of bases, in our case the Gaussian receptive fields, and can be derived by assuming that the function approximator trades off the closeness of the fit to the input output data and the smoothness of the resulting function (Poggio and Girosi, 1989). In other words, such a system is intrinsically biased toward learning smooth mappings. Other generalization studies. Using a setup in which hand movements produced cursor movements on a monitor, Imamizu et al. (1995) examined pointing under a 75° rotatory perturbation. The results of their ....
Poggio T, Girosi F (1989) A theory of networks for approximation and learning. Artificial Intelligence Lab Memo 1140, MIT.
....practical applications of neural networks biological inspirations may not be so important as inspirations from approximation theory, probability theory, statistics or pattern recognition. This understanding led to neural models based on the radial basis functions, popular in approximation theory [42, 43]. Slowly other types of transfer functions were introduced, but systematic research of this aspect of neural models has been missing. In the next section we have tried to systematize our knowledge of the activation functions. 3 ACTIVATION FUNCTIONS. Weighted activation, called also the fan in ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning", Technical Report A. I. Memo 1140, MIT, Massachusetts, July 1989.
....with varying degrees of success ([KGV83]). [Figure 2.2: Typical radial basis function network with activation function G; layers labeled Inputs, Centers, Outputs] 2.2 Radial Basis Function networks. Another type of network that has been studied is the radial basis function (RBF) network ([BL88], [PG89], [PG90b]). This three layer network consists of a layer of input units, a layer of radial basis function units and a layer of output units. Each radial basis function unit has a vector of parameters, $t_i$, called a center. The connection from RBF unit $i$ to output unit $j$ is weighted by the ....
....$c_{ij}$. Finding the coefficients is a simple linear problem which can be solved by a matrix inversion ([BL88]). RBF networks can be generalized by allowing fewer centers than training examples. This scheme was named Generalized Radial Basis Functions, or GRBFs for short, by Poggio and Girosi ([PG89]). The centers can either be fixed or adjustable during learning. If the centers are fixed to some initial values then an overconstrained system of linear equations for the coefficients arises and an exact mapping from input vectors to output vector cannot be found. However, a mapping with the ....
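The overconstrained case described in the excerpt above (fewer fixed centers than training examples) is commonly handled by least squares. The following is a minimal sketch under stated assumptions: one-dimensional inputs, Gaussian units of an assumed width `sigma`, and the normal equations solved by plain Gaussian elimination. All names and data are hypothetical, and this is not claimed to be the procedure of the citing thesis.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a assumed nonsingular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def unit(x, t, sigma):
    """Response of one Gaussian unit with center t to a scalar input x."""
    return math.exp(-((x - t) ** 2) / (2.0 * sigma ** 2))

def fit_grbf(xs, ys, centers, sigma=1.0):
    """Least-squares coefficients for fixed centers via the normal equations G^T G c = G^T y."""
    g = [[unit(x, t, sigma) for t in centers] for x in xs]
    k = len(centers)
    gtg = [[sum(row[i] * row[j] for row in g) for j in range(k)] for i in range(k)]
    gty = [sum(row[i] * y for row, y in zip(g, ys)) for i in range(k)]
    return solve(gtg, gty)

def predict(x, centers, coeffs, sigma=1.0):
    """Evaluate the fitted network at a scalar input x."""
    return sum(c * unit(x, t, sigma) for c, t in zip(coeffs, centers))

# Hypothetical data: 7 samples generated from 3 known coefficients, then refitted.
centers = [0.0, 1.5, 3.0]
true_coeffs = [1.0, -2.0, 0.5]
xs = [0.5 * i for i in range(7)]
ys = [predict(x, centers, true_coeffs) for x in xs]
coeffs = fit_grbf(xs, ys, centers)
```

With noisy targets the fit would no longer reproduce the data exactly, which is the point the excerpt makes about overconstrained systems. A numerically sturdier choice than the normal equations would be a QR or SVD based solver.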
[Article contains additional citation context not shown here]
Tomaso Poggio and Federico Girosi. A theory of networks for approximation and learning. A.I. Memo 1140, MIT, 1989.
....assumption of a good prediction. In our work we used a regression method and principal component analysis (PCA) to reduce the parameters. After this reduction we compared the results of different neural network algorithms to achieve best results. In detail, we tried Radial Basis Functions (RBF) [Pog89], Counterpropagation [HN88] and Feedforward networks like Backpropagation (BackProp) [Rum86] and Resilient Propagation (RProp) [Bra93]. For all networks, we first trained with half of the data samples (2430) and all components (96) and used the rest of the 4860 data samples for the validation ....
T. Poggio and F. Girosi. A theory of networks for approximation and learning. A.I. Memo, (1140), 1989.
....of modeled in software by discrete approximations. These facts suggest examining neural models for control which are amenable to a direct, physical implementation, for example, as the output of the continuous, analog electronic networks suggested by researchers in the field of early vision [14, 32]. Our goal is hence to develop adaptive architectures capable of exploiting continuous analog networks for the control of continuous time, nonlinear dynamic systems. Our approach is to treat the entire problem in the context of adaptive systems theory, avoiding iterative training procedures in ....
....(Pz) 2 (x) r z (x) the structure of this space can be conveniently exploited, producing the necessary conditions for z to be a minimizing function: ff PPz e z ( ffi( Gamma x(t) 0 where P is the Hilbert adjoint of the operator P [21, 19] and ffi ( is the Dirac distribution. Poggio [31, 32] has studied representations arising from this equation in detail, and shows that the required solution can be represented in terms of a superposition integral using the kernel, or Green's function, K, of the self-adjoint operator PP [39, 10, 25]. In the current context, this method suggests: ....
[Article contains additional citation context not shown here]
Poggio, T., and Girosi, F., "A theory of networks for approximation and learning", Artificial Intelligence Lab. Memo, No. 1140, MIT, Cambridge, MA, July 1989.
....account for the limited region of validity of the approximation, and the unavoidable approximation errors even within this region. Any additional prior information about f can be used to select the set of basis functions employed. Under the assumption that f is infinitely differentiable, [13] show that, for a class of least squares learning problems, gaussian radial functions are natural basis functions, and map directly onto a class of three layered, feedforward neural networks [3, 5]. By assuming further that f can be well approximated by a function with compact spectral support, ....
Poggio, T., and Girosi, F., "A theory of networks for approximation and learning", Artificial Intelligence Lab. Memo, No. 1140, MIT, Cambridge, MA, July 1989.
....between various empirical modeling methods has gradually become available. Artificial neural networks have been shown to be universal approximators [20]. A formal framework has been developed for modeling by RBFNs by showing their relationship to regularization methods in approximation theory [21]. This framework has been exploited to develop generalized forms of radial basis function networks such as hyper basis functions and regularization networks [22]. Among statistical methods, linear methods have been subjected to significant theoretical analysis, and the connections between linear ....
....between the actual and approximated functions. The definition of the empirical modeling problem given above incorporates approximation of the input and the output space. This definition is different from the conventional definition of approximation problems [41] and definitions given for ANN modeling [21,42], where the entire emphasis is on minimizing the error of approximation of the outputs only. This broader definition allows inclusion of various neural Table 1 Comparison matrix for empirical modeling methods Method Input transformation Basis function Optimization criteria OLS Linear projection ....
[Article contains additional citation context not shown here]
T. Poggio, F. Girosi, A theory of networks for approximation and learning, A.I. Memo 1140, MIT, MA, 1989.
....in the category. A strict classification function can be derived from a soft one by selecting a threshold where the soft classification gives strict category membership. A model of categorization in accord with the above view of dimensions is that given by the radial basis function (RBF) networks (Poggio and Girosi, 1989). In a RBF network, a set of Gaussian classifiers is connected to a single node which weighs the contribution of each Gaussian classifier into a linear sum. Given a sufficient number of classifiers with appropriate distribution of centers and variances, this sum can approximate an arbitrary function ....
....In a RBF network, a set of Gaussian classifiers is connected to a single node which weighs the contribution of each Gaussian classifier into a linear sum. Given a sufficient number of classifiers with appropriate distribution of centers and variances, this sum can approximate an arbitrary function (Poggio and Girosi, 1989; Girosi et al. 1993). [Figure 3. A RBF Network.] It was suggested by Poggio and Girosi (1989) that Gaussians with different variances could be mixed in an approximation task, but the theory was not developed further. To model learning in the network a simple delta rule can be used (Widrow and ....
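As a hedged illustration of the delta rule mentioned in the excerpt above, the sketch below adjusts only the output weights of a small Gaussian network by per-sample error correction. The centers, width `sigma`, learning rate `rate`, and epoch count are assumptions chosen for the toy example, not values from the citing paper.

```python
import math

def hidden(x, centers, sigma):
    """Responses of the Gaussian units to a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]

def output(x, centers, weights, sigma):
    """Linear sum of the weighted Gaussian responses."""
    return sum(w * h for w, h in zip(weights, hidden(x, centers, sigma)))

def train_delta(samples, centers, sigma=0.5, rate=0.2, epochs=500):
    """Delta (Widrow-Hoff) rule: w_i <- w_i + rate * (target - output) * h_i."""
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in samples:
            h = hidden(x, centers, sigma)
            err = target - sum(wi * hi for wi, hi in zip(w, h))
            w = [wi + rate * err * hi for wi, hi in zip(w, h)]
    return w

# Hypothetical toy task: targets produced by known weights, then relearned from zero.
centers = [0.0, 1.0, 2.0]
true_weights = [1.0, -1.0, 0.5]
samples = [(x, output(x, centers, true_weights, 0.5)) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
learned = train_delta(samples, centers)
```

Because the targets here are exactly representable by the network, the iteration settles on the generating weights; with noisy targets it would hover around a least-squares solution instead.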
[Article contains additional citation context not shown here]
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning, MIT AI Memo No 1140.
....well as the weights # 1 , # n depend on f . But, unfortunately, the function g depends on the function to be represented. Moreover, the functions h j are non-differentiable and hence cannot be used by current learning algorithms. For a further discussion we refer the reader to the survey (Poggio and Girosi 1989). However, if we only allow everywhere differentiable activation functions (such as the standard sigmoid) then we can only represent everywhere differentiable target functions. Thus one has to relax the requirement of exact representation, and demand only that the approximation error (in an ....
T. Poggio and F. Girosi, 1989. A theory of networks for Approximation and learning, Artificial Intelligence Memorandum, no 1140.
.... circuits of depth 2 can approximate (in various norms) large function classes (including continuous functions) arbitrarily well (Arai, 1989; Carrol and Dickinson, 1989; Cybenko, 1989; Funahashi, 1989; Gallant and White, 1988; Hornik et al. 1989; Irie, 1988; Lapades and Farber, 1987; Nielson, 1989; Poggio and Girosi, 1989; Wei et al. 1991). Research on efficient approximations has only started recently (Williamson and Paice, 1991). Various gate functions have been used, among others, the cosine squasher, the standard sigmoid, radial basis functions, generalized radial basis functions, polynomials, trigonometric ....
POGGIO, T., and GIROSI, F. (1989), A theory of networks for Approximation and learning, Artificial Intelligence Memorandum, no 1140.
.... and Lowe, 1988; Moody and Darken, 1989). These models, which approximate the function via a superposition of bases (in our case Gaussians), can be derived by assuming that the function approximator trades off how closely it fits the input output data with how smooth the resulting function is (Poggio and Girosi, 1989). In other words, such a system is intrinsically biased towards learning smooth mappings. Related Generalization Studies. Other than Bedford's (1989, 1993) studies mentioned in the introduction, several recent studies have addressed visuomotor generalization. Using a setup in which hand movements ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. AI Lab Memo 1140, MIT.
....f is equal to the number of datapoints. In the case of motion analysis, this number is prohibitively large and for computational efficiency we prefer a low dimensional representation. A similar approach has been used in function approximation and time series prediction (Broomhead and Lowe, 1988; Poggio and Girosi, 1989) when the number of datapoints is large. We have found that as long as one uses the prior over velocity fields, the exact form of the representation used is not crucial: very similar results are obtained with different representations (Weiss, 1997). ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning.
....r g) max q m(f q g) (23). Note that we also have m 0 (f r g) max q m 0 (f q g) so that the consideration of m or m 0 leads to the same decision. 3.2 Connectionist implementation. The classification method just introduced has some similarity with radial basis function (RBF) networks [23]. An RBF network is a neural network composed of an input layer, a hidden layer and an output layer (Figure 1). The response of hidden unit $i$ to an input vector $x$ is defined as a decreasing function of the distance between $x$ and a weight vector $p_i$. The output signal $o_j$ from the $j$th output ....
T. Poggio and F. Girosi. A theory of networks for approximation and learning. Technical Report A.I. Memo No. 1140, M.I.T, 1988.
....[Figure 3: Activation function for neural networks; panels: Saturating Linear, Hyperbolic Tangent Sigmoid, Log Sigmoid] 3.2 Generalized Radial Basis Function Neural Networks. Generalized radial basis function neural networks (GRBF NNs) [7] are an extension to radial basis function neural networks (RBF NNs), which were mainly used for strict interpolation tasks. GRBF neural networks are equivalent to generalized splines. To approximate a function given at $N$ points $(x, y)$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}$, one can find an approximation function of ....
....a Gaussian) with different centers in the hidden layer. [Figure 4: Two-dimensional Gaussian activation function for GRBF neural network] Since for GRBFs usually a direct learning algorithm is employed, the problem of local minima of backpropagation is avoided and the training itself is very fast [7]. These advantages are paid for with slow recalls when the networks are in use. 3.3 Dimensionally Homogeneous Neural Networks. [Figure: classical neural network with approximation function g and function F] ....
T. Poggio and F. Girosi. A Theory of Networks for Approximation and Learning. A.I. Memo 1140. Mass. Inst. of Techn., Cambridge, MA, July 1989.
....sample size, and computational complexity. In this paper, we restrict P to be generated by some smooth function $f$ and some probability measure $P_X$ over $X$, that is, the sample point is of the form $(x, f(x))$. Further justification for the smoothness assumption is given by Poggio and Girosi [33,34]. Definition 6. A function $f$ from $X$ into $Y$ is called a Lipschitz function if and only if for some $K \geq 1$ we have $d_Y(f(x), f(x')) \leq K\, d_X(x, x')$ for all $x, x' \in X$. Let $\|f\|_L$ denote the smallest such $K$. A class of functions $F$ from $X$ into $Y$ ....
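The Lipschitz condition in the excerpt above can be probed numerically. The sketch below, with hypothetical names and a toy function, computes the largest ratio of output distance to input distance over all pairs in a finite sample. Note the assumption built in: a finite sample can only give a lower bound on the smallest admissible constant, never the constant itself.

```python
from itertools import combinations

def lipschitz_lower_bound(points, f, d_x, d_y):
    """Largest observed d_y(f(a), f(b)) / d_x(a, b) over all pairs of sample points."""
    best = 0.0
    for a, b in combinations(points, 2):
        dx = d_x(a, b)
        if dx > 0:  # skip coincident points to avoid division by zero
            best = max(best, d_y(f(a), f(b)) / dx)
    return best

def abs_dist(u, v):
    """Usual distance on the real line, used here for both input and output spaces."""
    return abs(u - v)

# Hypothetical example: an affine map with slope 3 has Lipschitz constant 3.
sample = [0.0, 0.25, 0.5, 1.0]
bound = lipschitz_lower_bound(sample, lambda x: 3.0 * x + 1.0, abs_dist, abs_dist)
```

For an affine map every pair attains the same ratio, so the bound coincides with the true constant; for a general function the estimate tightens only as the sample becomes denser.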
....using more complicated basis functions. However, this usually makes the training problem harder; most work along this line has been mostly experimental in terms of computational complexity. Interested readers are referred to the work of Friedman [11], Moody and Darken [30], and Poggio and Girosi [33,34]. Our memory based learning algorithms mainly take advantage of the skewness of distributions over the input space and assume the smoothness of functions over the input space. However, the degree of smoothness may vary widely from one region to the other (Dean [7]). In practice, after the initial ....
T. Poggio and F. Girosi, "A Theory of Networks for Approximation and Learning, " MIT. Artificial Intelligence Laboratory, A. I. Memo No. 1140, Boston, MA, 1989.
....function RBF ( Such functions, depending on the goal and the context, can simply be bumps such as the generic function in figure 9, or more specifically certain Green's functions, for example of the Laplace operator plus a constant. The details of the optimality of such choices are given in [PG1, PG2, MM, MB]. There has been justifiably great interest in the representation of functions in this form in the mathematical and neural network communities (see references above). The essential phenomenological rationale for the use of radial basis functions and more generally bumps of the form ....
T. Poggio and F. Girosi, A theory of networks for approximation and learning, A.I. Memo No. 1140, M.I.T. A.I. Lab, 1989.
....the total number of parameters associated with a single discriminant function of this form is N 1 . We refer to the classifier with such a discriminator as a modified Gaussian RBF classifier, in recognition of the fact that it constitutes a Gaussian radial basis function (RBF) classifier (e.g. [3, 15, 17, 14]) with diagonal covariance matrices when the discriminator is formed by cascaded layers of these functions. Figures 10-12 are comparative summaries of differential versus probabilistic learning for 650-parameter classifiers generated from the linear, logistic linear, and modified RBF hypothesis ....
T. Poggio and F. Girosi. A Theory of Networks for Approximation and Learning. AI Memo 1140, MIT, 1989.
....to high-dimensional supervised learning tasks, and often outperform other models such as neural networks and decision trees. Gaussian processes are closely related to a number of well-established techniques, including kriging in geostatistics (Matheron 1963), generalized radial basis functions (Poggio and Girosi 1989), and spline models (Wahba 1990; Wahba, Wang, Gu, Klein, and Klein 1994a). In the context of interpretation of complex models, the attractiveness of Gaussian process models is that it is simple to fit a Gaussian process model that has both additive and general components (the general components ....
Poggio, T. and F. Girosi (1989). A theory of networks for approximation and learning. Technical Report 1140, MIT AI Laboratory.
....usual system of equations becomes overconstrained. Usually, a form of gradient descent is employed to set the parameter values. A cost function is defined in terms of the error in reproducing the data and the parameters are adjusted to reduce this error. Examples of RBF variants can be found in [Poggio and Girosi, 1989; Poggio and Girosi, 1990; Moody and Darken, 1989]. In Chapter 4 we employ an FA based on RBFs. We preset the position of the centers in a grid pattern and do not change them during learning. The radial function G is hyperbolic: $G(x; x_i) = \frac{1}{c_1 + (\|x - x_i\| / c_2)^2}$ (2.5) We ....
T. Poggio and F. Girosi, "A Theory of Networks for Approximation and Learning," Technical Report 1140, MIT AI Lab, 1989.
....variable and c is the constant parameter, is called a Radial Basis Function (RBF) when it depends only on the radial distance $r = \|x - c\|$, where c is its center. The RBF method is one of the possible solutions to the real multivariate interpolation problem, stated as follows [18], [8], [19], [2], [20], [21]. Interpolation Problem: Given N different points $\{x_i \in \mathbb{R}^d \mid i = 1, \ldots, N\}$, where d is the number of dimensions, and N real numbers $\{y_i \in \mathbb{R} \mid i = 1, \ldots, N\}$, find a function F from $\mathbb{R}^d$ to $\mathbb{R}$ satisfying the interpolation conditions: $F(x_i) = y_i$, $i = 1,$ ....
.... y. (5) From (5), a necessary and sufficient condition for the existence of a unique solution to the interpolation problem is the invertibility of the matrix H. The RBF matrix will be invertible if the column vectors of H form a basis in $\mathbb{R}^N$. This condition is satisfied for a number of RBFs [19]. Figure 1 shows a realization of (2) in the form of a network with one layer of hidden units [10]. Since each radial hidden unit defines a $(d + 1)$-dimensional hypersurface, the RBF network interpolates by reconstructing the data with scaled hypersurfaces. The examples in this report employ a ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning," A.I. Memo #1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1989.
....(5) a necessary and sufficient condition for the existence of a unique solution to the interpolation problem is the invertibility of the matrix H. The RBF matrix will be invertible if the column vectors of H form a basis in $\mathbb{R}^N$. This condition is satisfied for a number of RBFs, for example [12]: $h(r) = \exp\{-(r/a)^2\}$ (Gaussian), $h(r) = (a^2 + r^2)^{\beta}$, $\beta < 1$, $h(r) = r$ (linear), $h(r) = r^2 \log r$ (thin plate splines). (6) Figure 1 shows a realization of (2) in the form of a network with one layer of hidden units. Each hidden unit implements the same radial function, but ....
T. Poggio and F. Girosi, "A theory of networks for approximation and learning," A.I. Memo #1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1989.
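The interpolation condition quoted in the contexts above (a unique solution exists exactly when the RBF matrix H is invertible) can be illustrated with a small sketch. This is not code from any of the cited papers: the function names `gaussian`, `fit_rbf`, and `predict_rbf`, the width `a = 1.0`, and the four sample points are all assumptions chosen for illustration, and only the Gaussian basis function from the quoted list is used.

```python
import numpy as np

def gaussian(r, a=1.0):
    # Gaussian radial function h(r) = exp(-(r/a)^2); the width a is an assumed value.
    return np.exp(-(r / a) ** 2)

def fit_rbf(X, y, h=gaussian):
    # Build the N x N matrix H with H[i, j] = h(||x_i - x_j||),
    # then solve H c = y for the expansion coefficients c.
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    H = h(r)
    return np.linalg.solve(H, y)  # raises LinAlgError if H is singular

def predict_rbf(X_train, c, X_new, h=gaussian):
    # Evaluate F(x) = sum_i c_i h(||x - x_i||) at each row of X_new.
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return h(r) @ c

# Hypothetical data: four distinct points in R^2 with XOR-like targets.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
c = fit_rbf(X, y)
```

Evaluating `predict_rbf(X, c, X)` reproduces `y` up to floating-point error, which is the interpolation condition F(x_i) = y_i. With more data points than centers the system becomes overconstrained and a least-squares or gradient-descent fit would replace the exact solve.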
.... the stimulus space covering units (SSCUs) come to learn different categorization task(s). Note that this just describes the basic framework: one could imagine, for instance, the addition of slow time scale top-down feedback to the SSCU layer, analogous to the GRBF networks of Poggio and Girosi [69], that could enhance categorization performance by optimizing the receptive fields of SSCUs. Similarly, the algorithms CP has also been claimed to occur for facial identity [3] but the experimental design appears flawed as stimuli in the middle of the continuum were presented more often than the ....
....MAX operation performs a scanning and selection operation over a range of inputs, which is a central element of many computational algorithms. Neurons performing a weighted sum followed by a Gaussian nonlinearity, the other element of HMAX, can in principle learn any kind of input-output mapping [69]. A hierarchical model consisting of MAX and template match units thus seems to be a powerful general framework for computation in the brain. Is HMAX therefore also a good model of processing for other parts of cortex? Very recent results by Giese [30] who applied a hierarchical model very similar ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. AI Memo 1140, CBIP paper 31, MIT AI Lab and CBIP, Cambridge, MA.
....output are learned by minimizing the functional H over the training set (Girosi, Jones, and Poggio 1995). 2.2 Radial Basis Functions. An interesting special case arises for radial K. Radial Basis Function techniques or Radial Basis Function networks (RBFs) (Powell 1987; Micchelli 1986; Poggio and Girosi 1989; Girosi, Jones, and Poggio 1995) follow from regularization when K(s, t) is shift invariant and radially symmetric: the best example is a Gaussian $K(s, t) = G_{\sigma}(\|s - t\|^2)$: $f(x) = \sum_{i=1}^{l} c_i G_{\sigma}(\|x - x_i\|^2)$ (3) In the Gaussian case, these RBF networks consist of units each ....
....objects (Brunelli and Poggio 1991) and also with images of faces (Beymer 1993; Romano 1993) showed that a view-based scheme of this type can be made to work well. It was not surprising that one of the first questions we asked was whether a similar approach may be used by our brain. As Poggio and Girosi (1989) and Poggio (1990) argued, networks that learn from examples have an obvious appeal from the point of view of neural mechanisms and available neural data. In a certain sense, networks like Gaussian Radial Basis functions are an extension of a very simple device: look-up tables. The idea of ....
Poggio, T. and F. Girosi (1989). A theory of networks for approximation and learning. A.I. Memo No. 1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
.... the stimulus space covering units (SSCUs) come to learn different categorization task(s). Note that this just describes the basic framework: one could imagine, for instance, the addition of slow time scale top-down feedback to the SSCU layer, analogous to the GRBF networks of Poggio and Girosi [10], that could enhance categorization performance by optimizing the receptive fields of SSCUs. Similarly, the algorithms used to learn SSCUs (k-means clustering or simple storage of all training examples) and the categorization units (RBF) should just be taken as examples. For instance, a less ....
Poggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. Technical Report AI Memo 1140, CBIP paper 31, MIT AI Lab and CBIP, Cambridge, MA.
No context found.
Tomaso Poggio and Federico Girosi. A Theory of Networks for Approximation and Learning. A. I. Memo 1140, MIT, 1989.
No context found.
Poggio T, Girosi F, A theory of networks for approximation and learning, A.I. Memo 1140, Artificial Intelligence Lab, MIT, 1989.
No context found.
T. Poggio and F. Girosi, "A Theory of Networks for Approximation and Learning," MIT, Tech. Rep. AI 1140, 1989.
No context found.
T. Poggio and F. Girosi. A theory of networks for approximation and learning. A.I. Memo 1140, MIT, July 1989.
No context found.
T. Poggio and F. Girosi, "A Theory of Networks for Approximation and Learning," A.I. Memo No. 1140, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1989.
No context found.
Poggio, Tomaso and Federico Girosi, A theory of networks for approximation and learning, A.I. Memo No. 1140, M.I.T. A.I. Lab, 1989.
No context found.
T. Poggio and F. Girosi, "A theory of networks for approximation and learning," MIT AI Lab TR-1140, 1989.
No context found.
T. Poggio and F. Girosi, "A Theory of Networks for Approximation and Learning," M.I.T., A.I. Memo No. 1140, July 1989.