301 citations found.
S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation 4, 1-58 (1992)


This paper is cited in the following contexts:


A Learning Algorithm For Neural Network Ensembles - Navone, Verdes, Granitto.. (2000)   (1 citation)

....it to the Ozone and Friedman #1 databases and comparing the results obtained with those of [9]. Finally, in Section 5 we draw some conclusions. 2. THE BIAS-VARIANCE DILEMMA The theoretical framework for ensemble averaging is based on the bias-variance decomposition of the generalization error [10]. Let's consider, in the context of regression, a set of $N$ noisy data pairs $D = \{(t_i, x_i)\}_{i=1}^{N}$ obtained from some distribution $P$ and generated according to $t = f(x) + e(x)$, where $t$ is the observed target value, $f(x)$ is the true regression and $e(x)$ is a random noise with zero mean. If we ....

S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation 4, 1-58 (1992)
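The data model in the Navone et al. snippet ($t = f(x) + e(x)$, with many training sets $D$ drawn from $P$) can be simulated directly. The sketch below is a minimal illustration and does not come from any of the cited papers. It makes several assumptions: the noise is Gaussian, $f(x) = \sin(2\pi x)$, and polynomial least-squares fits stand in for the ANN. Under those assumptions it estimates the squared bias and variance of the fitted $\hat f$ over repeated draws of $D$. The function name `bias_variance` and all default parameters are hypothetical.

```python
import numpy as np

def bias_variance(f, degree, n_points=30, n_datasets=500, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of squared bias, variance and mean squared error
    (against the true regression f) of a polynomial least-squares fit.
    Assumes t = f(x) + e(x) with Gaussian e of standard deviation noise_sd."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)
    f_test = f(x_test)
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        # Draw a fresh training set D = {(t_i, x_i)}, i = 1..N.
        x = rng.uniform(0.0, 1.0, n_points)
        t = f(x) + rng.normal(0.0, noise_sd, n_points)
        preds[d] = np.polyval(np.polyfit(x, t, degree), x_test)
    mean_pred = preds.mean(axis=0)                  # E_D[f_hat(x)] at each test x
    bias_sq = np.mean((mean_pred - f_test) ** 2)    # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))           # variance over D, averaged over x
    error = np.mean((preds - f_test) ** 2)          # E_D[(f_hat(x) - f(x))^2], averaged over x
    return bias_sq, variance, error

f = lambda x: np.sin(2 * np.pi * x)
for degree in (1, 3, 9):
    b, v, e = bias_variance(f, degree)
    print(f"degree {degree}: bias^2={b:.4f} variance={v:.4f} sum={b + v:.4f} error={e:.4f}")
```

Here bias² + variance equals the error exactly, because `var` uses the population (ddof=0) variance over datasets. The error is measured against $f$ rather than $t$, so the irreducible noise variance is not included. The straight-line fit should show the larger bias and the degree-9 fit the larger variance, which is the dilemma in question.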


Appearance-Based Recognition Using Perceptual Components - Liu, Wang (2001)

....including texture classification, face recognition, and 3D object recognition. 1 Introduction As the learning algorithms for neural network models have been the focus in the neural network literature, representation has recently been recognized as the fundamental challenge for neural modeling [4, 16, 2]. Bishop stated that the choice of pre-processing and feature extraction is one of the most significant factors in determining the performance of the final system ([2], p. 295). In this paper, we focus on problems in visual recognition. For visual recognition and classification, the essential ....

S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Computation, vol. 4, pp. 1-58, 1992.


Learning Algorithms for Radial Basis Function Networks.. - Blanzieri

....or suboptimal approximators. In this framework it is common to define different measures for the precision of the approximation. If $f$ has continuous values and $\hat f$ denotes the approximation, it is possible to define the Mean Square Error (MSE), which is the sum of the variance and the squared bias (see [Geman et al. 1992]): $\mathrm{MSE}\{\hat f(x)\} = E[(\hat f(x) - f(x))^2] = \mathrm{Var}\{\hat f(x)\} + \mathrm{Bias}^2\{\hat f(x)\}$, where $\mathrm{Bias}\{\hat f(x)\} = E[\hat f(x) - f(x)]$ and $\mathrm{Var}\{\hat f(x)\} = E[(\hat f(x) - E[\hat f(x)])^2]$. Another very common error measure is the $L_2$ norm, which is called the Integrated Squared Error (ISE): ISE ....

Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4:1--58.
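The identity MSE = Var + Bias² in the Blanzieri snippet holds for any estimator with finite variance. The following self-contained check is hypothetical and not taken from the cited work. It estimates a Gaussian mean $\mu$ with a shrunk sample mean $c\,\bar x$, whose bias is $(c-1)\mu$ and whose variance is $c^2\sigma^2/n$. It then compares a Monte Carlo MSE with the analytic sum. The parameter values are arbitrary assumptions.

```python
import numpy as np

def shrunk_mean_mse(mu=0.5, sigma=1.0, n=10, c=0.8, trials=200_000, seed=1):
    """Compare the Monte Carlo MSE of the estimator c * mean(x) of mu with
    Var + Bias^2 computed in closed form (x_1..x_n i.i.d. N(mu, sigma^2))."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(trials, n))
    estimates = c * samples.mean(axis=1)
    mse_mc = np.mean((estimates - mu) ** 2)
    bias = (c - 1.0) * mu              # E[c * mean(x)] - mu
    variance = c**2 * sigma**2 / n     # Var[c * mean(x)]
    return mse_mc, variance + bias**2

mse_mc, mse_formula = shrunk_mean_mse()
print(f"Monte Carlo MSE {mse_mc:.4f}, Var + Bias^2 {mse_formula:.4f}")
```

With these values the formula gives 0.064 + 0.01 = 0.074. That is below the 0.1 (= σ²/n) of the unbiased sample mean, so accepting a little bias buys a larger reduction in variance.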


Algorithm Selecting Diverse Members Neural Network.. - Navone, Verdes.. (2000)

....series and comparing the results obtained with those of previous works in the literature. Finally, in Section 5 we collect the main conclusions. 2. THE BIAS-VARIANCE DILEMMA The theoretical framework for ensemble averaging is based on the bias-variance decomposition of the generalization error [9]. Let's consider a set of $N$ noisy data pairs $D = \{(t_i, x_i)\}_{i=1}^{N}$ obtained from some distribution $P$ and generated according to $t = f(x) + e(x)$, where $t$ is the observed target value, $f(x)$ is the true regression and $e(x)$ is a random noise with zero mean. If we estimate $f$ using an ANN trained on $D$ ....

S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation 4, 1-58 (1992)


Selecting Diverse Members Of Neural Network Ensembles - Navone, Verdes, Granitto.. (2000)

....5 we consider the sunspot time series and predict the remainder of solar activity cycle 23. Finally, in Section 6 we draw some conclusions. 2. THE BIAS-VARIANCE DILEMMA The theoretical framework for ensemble averaging is based on the bias-variance decomposition of the generalization error [7]. Let's consider a set of $N$ noisy data pairs $D = \{(t_i, x_i)\}_{i=1}^{N}$ obtained from some distribution $P$ and generated according to $t = f(x) + e(x)$, where $t$ is the observed target value, $f(x)$ is the true regression and $e(x)$ is a random noise with zero mean. If we estimate $f$ using an ANN trained on ....

S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation 4, 1-58 (1992)


Modeling of Sonic Logs in Oil Wells with Neural Networks.. - Granitto, Verdes, Navone (2001)

....Figure 1 shows the final available data for one representative well. 3 Neural Network Ensembles Ensemble techniques have been used recently in regression and classification tasks with considerable success. They are theoretically motivated by the bias-variance decomposition of the generalization error [4]. This procedure is based on the intuitive idea that by combining the outputs of several individual predictors one might improve on the performance of a single generic one [5]. However, this idea has been proved to be true only when the combined predictors are simultaneously accurate and diverse ....

S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation 4, 1-58 (1992)
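The requirement that combined predictors be "simultaneously accurate and diverse" can be made concrete with the ensemble ambiguity decomposition of Krogh and Vedelsby (1995), which other contexts on this page also cite. Consider a weighted average $\bar f = \sum_\alpha w_\alpha f_\alpha$ with $\sum_\alpha w_\alpha = 1$. The ensemble's squared error then equals the weighted average member error minus the weighted average spread ("ambiguity") of the members around $\bar f$. The sketch below checks this numerically. The function name and toy data are assumptions for illustration.

```python
import numpy as np

def ambiguity_decomposition(member_preds, target, weights=None):
    """member_preds: (M, n) outputs of M predictors on n inputs; target: (n,).
    Returns (ensemble error, weighted mean member error, ambiguity), each
    averaged over the n inputs. Weights must sum to 1 (uniform by default)."""
    member_preds = np.asarray(member_preds, dtype=float)
    M = member_preds.shape[0]
    w = np.full(M, 1.0 / M) if weights is None else np.asarray(weights, dtype=float)
    ensemble = w @ member_preds                               # weighted average output
    ensemble_err = np.mean((ensemble - target) ** 2)
    member_err = np.mean(w @ (member_preds - target) ** 2)    # accuracy of the members
    ambiguity = np.mean(w @ (member_preds - ensemble) ** 2)   # diversity of the members
    return ensemble_err, member_err, ambiguity

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x)
diverse = y + rng.normal(0.0, 0.3, size=(5, x.size))   # members with independent errors
identical = np.tile(diverse[0], (5, 1))                 # five copies of one member

for name, preds in (("diverse", diverse), ("identical", identical)):
    e, m, a = ambiguity_decomposition(preds, y)
    print(f"{name}: ensemble {e:.4f} = members {m:.4f} - ambiguity {a:.4f}")
```

The ambiguity term is never negative, so the ensemble is never worse than its average member. It is strictly better only when the members disagree, which is why identical copies gain nothing.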


Subspace Information Criterion for Model Selection - Sugiyama, Ogawa (2001)   (1 citation)

....Based on the above setting, we shall first give an estimation method of the generalization error of $\hat f$. The unbiased learning result $\hat f_u$ and the learning operator $X_u$ are used for this purpose. The generalization error of $\hat f$ is decomposed into the bias and variance (see e.g. Takemura, 1991; Geman et al., 1992; Efron & Tibshirani, 1993):
$E_n\|\hat f - f\|^2 = \|E_n \hat f - f\|^2 + E_n\|\hat f - E_n \hat f\|^2. \quad (10)$
It follows from Eqs. (4) and (3) that Eq. (10) yields
$E_n\|\hat f - f\|^2 = \|Xz - f\|^2 + E_n\|Xn\|^2 = \|Xz - f\|^2 + \mathrm{tr}(XQX^*), \quad (11)$
where tr( ....

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1--58.
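Eq. (11) can be sanity-checked numerically in a real, finite-dimensional setting, where the adjoint $X^*$ is the transpose. The sketch below is an assumption-laden illustration, not Sugiyama and Ogawa's code. It takes $\hat f = X(z + n)$ with zero-mean noise $n$ of covariance $Q$, simulated as Gaussian although the identity only needs the mean and covariance. It then compares a Monte Carlo estimate of $E_n\|\hat f - f\|^2$ with $\|Xz - f\|^2 + \mathrm{tr}(XQX^T)$. The dimensions and random matrices are arbitrary.

```python
import numpy as np

def linear_estimator_error(X, z, f, Q, trials=100_000, seed=0):
    """Monte Carlo E_n||X(z + n) - f||^2 versus the two terms of Eq. (11)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Q)                               # Q = L L^T
    noise = rng.standard_normal((trials, z.size)) @ L.T     # each row ~ N(0, Q)
    f_hat = (z + noise) @ X.T                               # each row is X (z + n)
    mc = np.mean(np.sum((f_hat - f) ** 2, axis=1))
    bias_term = np.sum((X @ z - f) ** 2)                    # ||X z - f||^2
    variance_term = np.trace(X @ Q @ X.T)                   # tr(X Q X^T)
    return mc, bias_term, variance_term

rng = np.random.default_rng(42)
X = rng.normal(size=(3, 5))       # hypothetical learning operator: 5 samples -> 3 coefficients
z = rng.normal(size=5)            # hypothetical noiseless sample values
f = rng.normal(size=3)            # hypothetical true target
B = rng.normal(size=(5, 5))
Q = B @ B.T + 0.1 * np.eye(5)     # symmetric positive-definite noise covariance
mc, bias_term, variance_term = linear_estimator_error(X, z, f, Q)
print(f"Monte Carlo {mc:.3f} vs ||Xz - f||^2 + tr(XQX^T) = {bias_term + variance_term:.3f}")
```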


Subspace Information Criterion for Non-Quadratic.. - Tsuda, Sugiyama, Müller (2000)

....for Non-Quadratic Regularizers. [Figure 1: Basic idea for evaluating the generalization error.] 3.2 Generalized Subspace Information Criterion In this section, we will derive an unbiased estimator of $J_G$. $J_G$ can be decomposed into the bias and variance (see also [27, 28]) as
$E\|\hat\theta - \theta\|^2 = E\|\hat\theta - \theta_m + \theta_m - \theta\|^2 = E\|w + \theta_m - \theta\|^2 = \|\theta_m - \theta\|^2 + 2E\langle \theta_m - \theta, w\rangle + E\|w\|^2 = \|\theta_m - \theta\|^2 + E\langle w, w\rangle, \quad (14)$
where $w := \hat\theta - \theta_m$. The bias term can be expressed by using ....

S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Computation, vol. 4, no. 1, pp. 1--58, 1992.


Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (10 citations)

....maximally reduce the error of the classifier trained with that extra example. If one assumes the learner is unbiased, then reducing classification error is equivalent to reducing classification variance over the data distribution. This follows from the decomposition of error into bias and variance (Geman et al., 1992). In some cases the expected variance reduction can be estimated empirically, and data can be iteratively selected for labeling by this approach (Cohn et al., 1996). Frequently, calculating this expected variance reduction in closed form is prohibitively complex and impractical at best. In these ....

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1--58.
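Here is a minimal sketch of selecting data by expected variance reduction. It assumes a linear least-squares model with known noise variance, which is a simplification: the approach the snippet cites (Cohn et al., 1996) is not restricted to this case. For such a model, the predictive variance at input $x$ after adding candidate $c$ is $\sigma^2 x^T(A^TA + cc^T)^{-1}x$. This does not depend on the unknown label of $c$, so it can be computed before labeling. The function name, the small ridge term and the toy data are hypothetical.

```python
import numpy as np

def select_by_variance_reduction(A, candidates, reference, noise_var=1.0, ridge=1e-8):
    """Pick the candidate row whose addition to design matrix A minimizes the
    mean predictive variance over the reference inputs (linear model assumed)."""
    base = A.T @ A + ridge * np.eye(A.shape[1])   # small ridge keeps base invertible
    scores = []
    for c in candidates:
        cov = noise_var * np.linalg.inv(base + np.outer(c, c))
        # Predictive variance x^T cov x for every reference row x.
        scores.append(np.mean(np.einsum("ij,jk,ik->i", reference, cov, reference)))
    return int(np.argmin(scores)), scores

features = lambda x: np.column_stack([np.ones_like(x), x])   # intercept + slope
A = features(np.array([-0.1, 0.0, 0.1]))                     # labeled inputs so far
candidates = features(np.array([0.05, 0.5, 1.0]))            # unlabeled pool
reference = features(np.linspace(-1.0, 1.0, 201))            # where predictions matter
best, scores = select_by_variance_reduction(A, candidates, reference)
print("query candidate", best, "scores", np.round(scores, 3))
```

Repeating the selection after each new label gives the iterative scheme the snippet describes. In this toy case, the distant point x = 1.0 wins because it pins down the slope, which the clustered labeled inputs leave poorly determined.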


On the Importance of Change of Representation in Induction - Pérez, Vilalta, Rendell

.... This difficult goal has taken several forms in different research fields, including change of representation, variable bias, adaptable network structure, and non-parametric statistics (Schlimmer, 1987; Utgoff, 1986; Rendell, Seshu, & Tcheng, 1987; Barron & Barron, 1988; Devroye & Györfi, 1985; Geman, Bienenstock, & Doursat, 1992). Induction involves the discovery of regularities in data and knowledge structures. The complexity of the existing regularities determines the complexity of the inductive process. Whenever structure is intricate, regularities are involved, and it is useful to view induction as a multi-layered ....

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1--58.


Data Mining At The Interface Of Computer Science And Statistics - Smyth (2001)   (1 citation)

.... A broader statistical view of neural networks as highly flexible non-linear regression models gradually began to emerge, both from within the neural network community (e.g., [Bis95]) and from members of the statistical community who had taken an interest in this new form of regression model [GBD92, CT94, Rip94]. For example, links to more established statistical ideas such as generalized linear models and projection pursuit regression led to new models and algorithms that are hybrids of statistical and neural network research (e.g., [JJ94]). Graph-based models for efficient representation of ....

Geman, S., Bienenstock, E., and Doursat, R. (1992) Neural networks and the bias/variance dilemma, Neural Computation, 4, 1--58.


Learning Probabilistic Grammars for Language Modeling - Carroll (1995)   (4 citations)

....is true, it helps the learner by reducing the number of choices when guessing on unseen data. To the extent that it is false, the assumption increases the error by obliterating sharp peaks or valleys in the surface. The two kinds of error described above correspond roughly to what Geman et al. [8] call bias and variance error. They develop a formula for the expected error of a learner which must generalize, and break it into the sum of these two error terms. As the details are lengthy, we present only the intuitions here. Bias error is the expected difference of the expected learner's ....

Stuart Geman, Elie Bienenstock & Rene Doursat, "Neural Networks and the Bias/Variance Dilemma," Neural Computation 4 (1992), 1--58.


On the Use of Advanced Inductive Methods for Knowledge .. - Kandola, Gunn.. (1999)

....decay, allowing greater model flexibility during the early stages of training. However, MacKay (MacKay, 1999) has suggested that this problem can be overcome by using Markov Chain Monte Carlo (MCMC) (Geman et al., 1992; Neal, 1995) methods, which use correct Bayesian sampling of the hyperparameters and parameters. 3.4. Additive Neurofuzzy Networks Neurofuzzy systems (Jang, Sun and Mizutani, 1997; Brown and Harris, 1994; Bossley, 1997) combine the learning ability of neural networks with a fuzzy representation, ....

S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, vol. 4, no. 1, pp. 1-58, 1992.


The "test and Select" Approach to Ensemble Combination - Amanda Sharkey Noel (2000)   (7 citations)  (Correct)

.... performs badly in domains with noisy training data). 3 A test-and-select methodology for ensemble creation. What makes an ensemble combination work? Explanations of the effect of ensemble combining have been couched in terms of the variance and bias components of the ensemble error [26], [27]; and it can be shown that ensembles provide a means of reducing the variance, or dependence on the data, of a set of nets. Another way to think about the behaviour of ensembles is in terms of the number of coincident failures made by the component nets. If the component nets in an ensemble made no ....

Geman, S., Bienenstock, E., Doursat, R. (1992) Neural networks and the bias/variance dilemma. Neural Computation, 4, pp. 1-58.
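The "coincident failures" view can be tabulated directly from a record of which net is correct on which test input. The sketch below is hypothetical, with invented names and data. It counts the inputs on which exactly k of the M members fail and reports majority-vote accuracy. It assumes a two-class task with an odd number of members, so that the vote is right exactly when more than half the members are right.

```python
import numpy as np

def coincident_failure_profile(member_correct):
    """member_correct: (M, n) booleans, True where member m is right on input i.
    Returns counts[k] = number of inputs on which exactly k members fail,
    and the majority-vote accuracy (two-class task, odd M assumed)."""
    member_correct = np.asarray(member_correct, dtype=bool)
    M = member_correct.shape[0]
    n_right = member_correct.sum(axis=0)
    counts = np.bincount(M - n_right, minlength=M + 1)
    majority_accuracy = np.mean(n_right > M / 2)
    return counts, majority_accuracy

# Three nets, each right on 5 of 6 inputs, failing on different inputs.
spread = np.array([[0, 1, 1, 1, 1, 1],
                   [1, 0, 1, 1, 1, 1],
                   [1, 1, 0, 1, 1, 1]], dtype=bool)
# The same individual accuracy, but all three fail on the same input.
shared = np.array([[0, 1, 1, 1, 1, 1]] * 3, dtype=bool)

for name, correct in (("spread", spread), ("shared", shared)):
    counts, acc = coincident_failure_profile(correct)
    print(name, "failure counts by k:", counts.tolist(), "majority-vote accuracy:", acc)
```

Both ensembles have identical members' accuracy. Only the one without coincident failures reaches perfect majority-vote accuracy.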


Diversity, Selection, and Ensembles of Artificial Neural Nets - Sharkey, Sharkey

....in the training set. The problem is that, even though ANNs can exhibit apparently impressive levels of correct performance on novel inputs, they are still likely to make errors. In interesting applications, we are unlikely to know what a sufficiently representative sample is. As discussed by Geman, Bienenstock and Doursat (1992), in order to achieve perfect performance for a complex task such as recognizing all nonambiguous handwritten characters, we need training sets of infinite size. Since ANNs are trained on data sets of a finite (often severely limited) size, it is clear that they will continue to make errors. If ....

.... Explanations of the improved performance that can be obtained through the use of ensembles have recently been couched in terms of bias and variance, or the ensemble equivalent (e.g. Krogh and Vedelsby, 1995; Parmanto et al., 1996; Sharkey, 1996). In the bias-variance decomposition of the error (Geman et al., 1992), the variance can be intuitively characterized as representing the extent to which the output of a net is sensitive to the data on which it was trained, i.e. the extent to which the same set of results would be obtained if a different set of training data was used. The bias on the other hand ....

Geman, S., Bienenstock, E. and Doursat, R. (1992) Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.


Stochastic Model Based Image Analysis - Wang, Adali (2000)

....In the context of stochastic image modeling and regularization, the statistical properties of both pixel and context images are important and have to be considered together. Many investigations have been conducted on image statistics of pixel images in MR or other modalities (e.g. references [10, 22, 25, 29]). Reference [20] has conducted intensive research on the objective assessment of imaging statistics of x-rays and gamma rays, by considering the effects of both quantum noise and object variability. Pioneering work on the speckle statistics in medical ultrasound images is reported in reference [19] ....

....of MR images in which six assumptions are made and partially justified. Furthermore, the statistics of the context image pose another challenging problem for researchers. Using a randomization rule and stochastic regularization, many investigations have been conducted (see references [13, 25, 27, 28]) showing important properties. However, they suffer from some limitations, such as constraints on context representation, which are generally imposed mathematically without objective justification by the context image statistics [13, 29]. Since the true context is unobservable in general, the ....

[Article contains additional citation context not shown here]

S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Computation, 4, pp. 1-58, 1992.


Combining Diverse Neural Nets - Sharkey, Sharkey (1997)   (14 citations)

....there are some inputs which fail on all the members of the ensemble. The concept of diversity employed in the definitions of these levels is related to the ideas expressed by Krogh and Vedelsby (1995) in their discussion of ensemble ambiguity (which in turn is related to the concept of variance, Geman et al. 1992), and by Wolpert (1992) when he suggests that what is required is that the nets should be mutually orthogonal. Levels of diversity however differ from these approaches because they take account of the overall accuracy of the ensemble output. That is, rather than simply requiring that the nets ....

Geman, S., Bienenstock, E. and Doursat, R. (1992) Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1-58.


Overfitting in Neural Nets: Backpropagation, Conjugate.. - Caruana, Lawrence, Giles (2000)

....data from y = sin(x/3). Order 20 overfits. Bottom: Small and large MLPs fit to the same data. The large MLP does not overfit significantly more than the small MLP. 2 Overfitting Much has been written about overfitting and the bias-variance tradeoff in neural nets and other machine learning models [2, 12, 4, 8, 5, 13, 6]. The top of Figure 1 illustrates polynomial overfitting. We created a training dataset by evaluating y = sin(x/3) plus a noise term at x = 0, 1, 2, ..., 20, where the noise is a uniformly distributed random variable between -0.25 and 0.25. We fit polynomial models with orders 2-20 to the data. Underfitting occurs with ....

S. Geman et al. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1--58, 1992.
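The polynomial setup described in the Caruana et al. snippet is easy to approximate. The sketch below is not the authors' code. The random seed, the dense evaluation grid and the use of NumPy's `Polynomial.fit` (which rescales x to [-1, 1] for numerical stability) are assumptions. For several orders it reports the training RMSE and the RMSE against the noise-free curve sin(x/3).

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.arange(21, dtype=float)                          # x = 0, 1, ..., 20
y = np.sin(x / 3) + rng.uniform(-0.25, 0.25, x.size)    # noisy training targets

x_dense = np.linspace(0.0, 20.0, 401)                   # grid for comparing with the true curve
y_true = np.sin(x_dense / 3)

def fit_errors(order):
    """Least-squares polynomial of the given order; returns (train RMSE, RMSE vs sin(x/3))."""
    p = Polynomial.fit(x, y, order)
    train_rmse = np.sqrt(np.mean((p(x) - y) ** 2))
    true_rmse = np.sqrt(np.mean((p(x_dense) - y_true) ** 2))
    return train_rmse, true_rmse

for order in (2, 5, 10, 20):
    tr, te = fit_errors(order)
    print(f"order {order:2d}: train RMSE {tr:.4f}, RMSE vs sin(x/3) {te:.4f}")
```

Training error can only fall as the order grows, because each model family contains the lower-order ones. At order 20 the polynomial interpolates all 21 noisy points, so the training error is essentially zero. The comparison with the noise-free curve shows what that interpolation costs between the training points.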


Hierarchical Learning with Procedural Abstraction Mechanisms - Rosca (1997)   (21 citations)

....for particular structures has been discussed in the area of non-parametric statistical inference. In statistical terms this is the problem of learning with low bias, or tabula rasa learning. However, low bias in the choice of models is paid for by a high variance, an example being neural networks [Geman et al. 1992]. Methods for balancing bias and variance include techniques that rely on a complexity penalty function which is added to the error term in order to promote parsimonious solutions. The basic idea is to trade the complexity of the model for its accuracy. This idea resonates with one of the ....

....model without a priori biasing for particular structures has been tackled in the area of non-parametric statistical inference. In statistical terms this is the problem of learning with low bias, or tabula rasa learning. However, low bias in the choice of models is paid for by a high variance (see [Geman et al. 1992] for an excellent introduction to the bias-variance dilemma). Methods for balancing bias and variance include techniques that rely on a complexity penalty function which is added to the error term in order to promote parsimonious solutions. The basic idea is to trade the complexity of the ....

S. Geman, E. Bienenstock, and R. Doursat, "Neural Networks and the Bias/Variance Dilemma," Neural Computation, 4:1--58, 1992.
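One common instance of the complexity penalty mentioned in the Rosca contexts is weight decay (ridge regression), which adds $\lambda\|w\|^2$ to the squared error. The sketch below is a generic illustration, not the technique used in that work. The polynomial features, data and $\lambda$ values are assumptions. Larger $\lambda$ gives smaller weights and a higher training error: the fit becomes simpler and more biased, and typically less sensitive to the noise in any one training set.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Minimize ||y - Phi w||^2 + lam * ||w||^2 via the regularized normal equations."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
Phi = np.vander(x, 10, increasing=True)        # features 1, x, ..., x^9

lams = [1e-6, 1e-3, 1e-1, 10.0]
norms, train_errs = [], []
for lam in lams:
    w = ridge_fit(Phi, y, lam)
    norms.append(np.linalg.norm(w))
    train_errs.append(np.mean((Phi @ w - y) ** 2))
    print(f"lambda={lam:g}: ||w||={norms[-1]:.3f}, training MSE={train_errs[-1]:.4f}")
```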


An Overview of Radial Basis Function Networks - Ghosh, Nag (2000)   (3 citations)

....The key to valid generalization is to select a generalizer whose inductive bias is suited to the data, and then to match the model complexity with the complexity of the underlying map generating the data. The problem of model selection can be understood as a trade-off between bias and variance [GBD92]. The generalization error can be decomposed into the sum of bias squared and variance. A typical trade-off between these two components of generalization error is a function of the model complexity [GBD92]. Too simple a model will have a high bias in the sense that the model, on the average, will ....

....The problem of model selection can be understood as a trade-off between bias and variance [GBD92]. The generalization error can be decomposed into the sum of bias squared and variance. A typical trade-off between these two components of generalization error is a function of the model complexity [GBD92]. Too simple a model will have a high bias in the sense that the model, on the average, will differ considerably from the desired one, even though specific instances of the model, obtained by changing the training data, initialization conditions, etc., may hardly differ from one another. On the other ....

S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1-58, 1992.


Error-Correcting Classification via ML-Techniques and.. - Utschick, Nossek

....of the classifier is the representation of classes in the decision space. Practitioners often prefer the trivial 1-out-of-K coding, where each output of the classifier corresponds to one of the classes. This method directly corresponds to decision rules based on posterior probabilities of classes [5]. Provided the representation of classes is based on binary reference vectors $t \in \{-1, 1\}^J$ embedded in a real-valued decision space $D$, the estimation of reference vectors (codewords) for classes [6] is equal to the optimal decomposition of polychotomies into dichotomies [7], e.g. class k ....

S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1--58, 1992.
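Here is a minimal sketch of decoding in such a decision space. It assumes nearest-codeword (Euclidean) decoding of the real-valued outputs, which is one common choice and not necessarily the one Utschick and Nossek use. With 1-out-of-K codewords in $\{-1, 1\}^K$, the nearest codeword is simply the one at the largest output. A longer code ($J > K$) whose codewords are far apart can absorb errors in individual outputs. The 7-bit, 4-class codebook below is a hypothetical example in which every pair of codewords differs in 4 bits.

```python
import numpy as np

def one_of_k_codebook(K):
    """1-out-of-K codewords in {-1, +1}^K: +1 at the class index, -1 elsewhere."""
    return 2.0 * np.eye(K) - 1.0

def decode(outputs, codebook):
    """Map each output vector (rows of `outputs`) to the class whose codeword
    (row of `codebook`) is nearest in Euclidean distance."""
    outputs = np.atleast_2d(outputs)
    d2 = ((outputs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical 4-class code of length J = 7 (every pair of rows differs in 4 bits).
bits = np.array([[1, 1, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 1, 1, 0, 0, 1],
                 [0, 1, 0, 1, 0, 1, 0]])
ecoc = 2.0 * bits - 1.0                  # map {0, 1} to {-1, +1}

received = ecoc[2].copy()
received[0] *= -1.0                      # one output lands on the wrong side
print(decode(received, ecoc))            # still decoded as class 2
```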


Boosting with the L_2-Loss: Regression and Classification - Bühlmann, Yu (2001)

No context found.

Geman, S., Bienenstock, E. and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation 4, 1-58.


The Learning Tree, a New Concept in Learning - Tomas Landelius Hans

No context found.

S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4:1--58, 1992.


TD(λ) Converges with Probability 1 - Dayan, Sejnowski (1994)

No context found.

Geman, S, Bienenstock, E & Doursat, R (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, pp 1-58.


Combining Classifiers based on kernel density estimators - Acuna, Rojas

No context found.

Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural networks and the bias/variance dilemma". Neural Computation, 4, 1-58.



CiteSeer.IST - Copyright Penn State and NEC