| J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, 1986. |
....of recursiveness leads to a slightly more computable approach than the more general case. However, while M is enumerable, it is not recursive, and thus practically infeasible. This drawback inspired less general yet practically more feasible principles of minimum description length (MDL) [27, 17] as well as priors derived from time-bounded restrictions [15] of Kolmogorov complexity [12, 24, 4]. No particular instance of these approaches, however, is universally accepted or has a general convincing motivation that carries beyond rather specialized application scenarios. For instance, ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080-1100, 1986.
....Berkeley, CA 94720. L. Li is supported by the NSF grant DMS 9971698, and B. Yu is supported by the grants NSF DMS 9803063 and ARO DAAG55 98 1 0341. 1 Introduction and background The Minimum Description Length (MDL) principle was introduced by Rissanen as a fundamental principle to model data; see [15, 17] and the reference list in [18]. If we encode data from a source by prefix codes, the best code is the one that achieves the minimum description length among all prefix codes, if such a code exists. Because of the equivalence between a prefix codelength and the negative logarithm of the ....
....and $\theta_0$ is an interior point of $\Theta$. Then the codelength of the optimal code is given by Shannon's coding theorem as $L_0 = -\sum_{i=1}^{n} \log p(X_i \mid \theta_0)$. The codelength corresponding to a given distribution $Q(x)$ is $L_Q = -\sum_{i=1}^{n} \log q(X_i)$, and the redundancy is $R_Q = L_Q - L_0$. Rissanen [17] shows that for each positive number $\epsilon$ and for all $\theta_0 \in \Theta$, except in a set whose volume goes to zero as $n \to \infty$, $E_{P_{\theta_0}} R_Q \ge (d/2 - \epsilon) \log n$. Later Barron and Hengartner [2] prove that, except for a set of parameter values with volume zero, $\liminf_{n \to \infty} E_{P_{\theta_0}} R_Q / ((d/2) \log n) \ge 1$. ....
[Article contains additional citation context not shown here]
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, 1986.
....Fastest Way of Describing Objects Unfortunately, while M and the more general priors of Section 4 are computable in the limit, they are not recursive, and thus practically infeasible. This drawback inspired less general yet practically more feasible principles of minimum description length (MDL) [68, 41] as well as priors derived from time-bounded restrictions [31] of Kolmogorov complexity [28, 59, 8]. No particular instance of these approaches, however, is universally accepted or has a general convincing motivation that carries beyond rather specialized application scenarios. For instance, ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080-1100, 1986.
....of recursiveness leads to a slightly more computable approach than the more general case. However, while M is enumerable, it is not recursive, and thus practically infeasible. This drawback inspired less general yet practically more feasible principles of minimum description length (MDL) [27, 17] as well as priors derived from time-bounded restrictions [15] of Kolmogorov complexity [12, 24, 4]. No particular instance of these approaches, however, is universally accepted or has a general convincing motivation that carries beyond rather specialized application scenarios. For instance, ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080-1100, 1986.
.... Moreover, since SQDF is derived from maximum likelihood estimation, it is not only appropriate as a classifier, but can also be used for model complexity identification with an information criterion such as Akaike's Information Criterion (AIC) [13] or Minimum Description Length (MDL) [14]. 2.3 Model Identification In SQDF, the only parameter that is not determined is k, that is, the number of reliable eigenvalues. The other parameters are calculated automatically from samples. Of course, k can be chosen arbitrarily or experimentally. In recognition systems which handle large ....
Rissanen, J.: Stochastic Complexity and Modeling. The Annals of Statistics 14 (1986) 1080--1100
....output consisted of 20 joint angles of a human hand linearly encoded by nine values using Principal Component Analysis (PCA). In this experiment, the number of specialized functions was set to 20. This number was found to be optimal in the sense of the Minimum Description Length (MDL) principle [32]; an exhaustive search is impractical, so we find this number via approximate search. Each mapping function was a one-hidden-layer, feed-forward network (multi-layer perceptron) with seven hidden neurons. To measure the accuracy of the hand pose reconstruction, we randomly selected approximately ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14,1080-1100, 1986.
....this ordering. In [3] an algorithm has been presented for optimizing an ordering for the purpose of removing arcs from a given network structure. 4 A Minimum Description Length Approach Another way to judge the quality of a network structure is by the minimum description length principle [18, 19], which stems from coding theory, where the aim is to create a network structure that describes the database as accurately as possible with as few symbols as possible. 4.1 The MDL Measure The MDL principle results in the following measure. Definition 4.1 Let U , B S , D, N , n, r i , N ijk , and ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14(3):1080--1100, 1986.
....for the number of models and parameters. The log of these prior probabilities adds to the likelihood calculation and essentially creates an a priori model cost. What the likelihood approach requires is a principle for choosing these model costs. The minimum description length principle [22, 21], or MDL, is one of the few general principles for choosing the model cost. This principle states that the cost of the model is related to the number of parameters or bits it takes to encode the model. In general, simpler explanations are preferred; therefore the likelihood should be reduced by ....
Jorma Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080--1100, 1986.
....the tree. In C4.5 a continuous valued feature may be used again down the tree. To find multiple cut points, the method explained above is applied recursively to each of the subsets found. For stopping, Fayyad and Irani use the Minimum Description Length Principle [Quinlan and Rivest, 1989, Rissanen, 1986, Rissanen, 1978]. A new cut point will be created for a subset if: $\mathrm{Gain}(S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(S)}{N}$ (2.24), $\mathrm{Gain}(S) = E(S) - E_p(S)$ (2.25), $\Delta(S) = \log_2(3^k - 2) - [kE(S) - k_1 E(S_{left}) - k_2 E(S_{right})]$ (2.26). 2.2.7 Softening Thresholds for Continuous Valued Features For continuous feature values ....
Rissanen, J. (1986). Stochastic complexity and modeling. Ann. of Statist., 14(3):1080-1100.
.... Many approaches to the segmentation of SAR images have been reported, including those based on statistical methods [24], fractals [10], and neural networks [15]. Information theoretic approaches include simple thresholding [6, 5], Rissanen's minimum description length (MDL) principle [22], and optimal component selection [20, 4]. The performance complexity method of comparing alternate approaches to ATR can accommodate such methods of segmentation. A direct extension of this framework to other complexity measures, such as chip processing rate [19], and to the comparison of other ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080--1100, 1986.
....learning algorithms. The comparison shows that CV and MLPs are capable of performing better than many of the learning algorithms which are frequently employed in the fields of machine learning and neural networks. The other learning methods compared against are c4 [4], [12], c4.5 [2], ib1 [3], [6], mml [4], [12], and cn2 [5], [10]. The results for these algorithms are taken from [13]. The average generalization accuracy for CV is better than any of the other learning algorithms compared against (95% confidence level). c4: 84.57, c45: 84.68, ib1: 84.00, mml: 85.85, cn2: 80.74, CV(2,20): 86.07. Table ....
Rissanen, Jorma (1986), Stochastic Complexity and Modeling. The Annals of Statistics, vol 14, no 3, pp 1080-1100.
.... (n) k 2 log n O(1) 14) Usually (14) holds uniformly for all sequences x 1 ; x 2 ; we sometimes need to restrict ourselves to a compact subset of k in order to make (14) uniformly true) It is also known that (14) is in some sense (up to O(1) the best regret that can be achieved [22, 23]. Therefore, every code that achieves (14) is usually called a universal code , and its corresponding distribution universal model . Until very recently there were four known ways to construct a universal model for a given class M k : the two part code, the Bayesian mixture code , the ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14:1080-1100, 1986.
....and the output F is generally a non-linear activation function g of the weighted sum of the inputs minus a given threshold, that is, $F = F(b_1, \ldots, b_M) = g\left(\sum_{j=1}^{M} w_j b_j - t\right)$ (18), where $w_j$ is the weight of input $b_j$ and $t$ the threshold. This neuron is illustrated in Figure 5. A commonly used activation function is g = tanh .... (Footnote 1: Neural network complexity is an intuitive but still vaguely defined concept. Discussions on using the number of (effective) parameters as a measure of model complexity, for example, can be found in [15, 16] and references therein.)
J. Rissanen, Stochastic Complexity and Modeling. The Annals of Statistics, 1986, 14, No. 3, 1080-1100.
....introduced by Schwarz (1978) is the Bayesian information criterion (BIC). This metric is defined as $M_{BIC}(G, C) = \log M_{ML}(G, C) - \frac{1}{2} \mathrm{Dim}(G) \log N$, where N is the number of cases in C. Theorem 5: $M_{BIC}$ is score equivalent. The proof of Theorem 5 is identical to the proof of Theorem 4. Rissanen (1986) presents two scoring metrics using the principle of minimum description length (MDL). One of these metrics, originally presented in Rissanen (1978), has recently received some attention in the literature. A version of this metric explored by Bouckaert (1993b) is $M_{MDL1}(G, C) = \log p(G^h)$ ....
Rissanen, J. (1986). Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080--1100.
....issue is discussed in more detail in Section 2. In our earlier work [15] we demonstrated empirically that marginal likelihood can be in practice a poor model selection criterion for classification domains, and that model selection criteria based on prequential (predictive sequential) approaches [5, 6, 7, 19] or cross validation [23, 9] lead to more accurate predictive models. In this paper we extend and elaborate our previous work in two ways. First, instead of constraining ourselves to simple variants of the Naive Bayes model, here we change the model family to consist of more complex finite mixture ....
....reported in [15] and demonstrate similar behavior with the mixture models as with the Naive Bayes model: supervised model selection criteria clearly outperform the unsupervised marginal likelihood criterion also in this case. The results also suggest that the greedy heuristic suggested in [19, 20] for handling the ordering problem, or the simple variants considered here, do not yield satisfactory results in practice, but more efficient solutions are needed. In this set of experiments, better results were obtained by averaging the prequential criterion over a number of random orderings. The ....
[Article contains additional citation context not shown here]
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14(3):1080--1100, September 1986.
.... are: (i) signal detection in the presence of noise, where certain parameters of the desired signal (e.g. amplitude, phase, Doppler shift) are unknown [9], [28]; (ii) pattern recognition problems like speech recognition [20] and optical character recognition [25]; (iii) model order selection [1], [21], for instance, estimating the order of a Markov process [16]; and (iv) universal decoding in the presence of channel uncertainty [2, Chap. 2, Sect. 5], [5], [11], [14], [30]. The latter application, which will receive special attention in this paper, is actually the one that motivated our general ....
J. Rissanen, "Stochastic complexity and modeling," Ann. Statist., vol. 14, no. 3, pp. 1080-1100, 1986.
.... 1) F (n) 9) F (n) E n log Z n . 10) Z n = # W exp( n L n (w) n L n (w # ) #(w)dw. 11) This relation eq. 9) which is proven by the direct calculation (see Appendix) claims that V (w # ) in eq. 8) is equal to the increase of the stochastic complexity F (n) in eq. 10) [19] [32] Here F (n) is often called the free energy, ABIC [2] the logarithm of Evidence [14] or the Bayesian factor [21] which plays an important role in information theory, Bayesian statistics, and neural network learning. 4 2.2 Non identifiable Learning Machines In this paper, we consider ....
J. Rissanen, "Stochastic complexity and modeling," Annals of Statistics, Vol. 14, pp. 1080-1100, 1986.
.... ) k 2 log n O(1) 14) Usually (14) holds uniformly for all sequences x 1 ; x 2 ; we sometimes need to restrict ourselves to a compact subset of k in order to make (14) uniformly true) It is also known that (14) is in some sense (up to O(1) the best code length that can be achieved [22, 23]. Therefore, every code that achieves (14) is usually called a universal code , and its corresponding distribution universal model . Until very recently there were four known ways to construct a universal model for a given class M k : the two part code, the Bayesian mixture code , the ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080-1100, 1986.
....function is known as the penalty function and, when summed with the MSE, provides a global minimum which is taken to be the correct model order. The variety of these techniques comes from the variety of penalty functions used [47]. Two of the more popular are Minimum Description Length (MDL) [61, 63, 62] and Akaike Information Criteria (AIC) [2, 1]. The popularity of Rissanen's MDL technique stems from its ties to Kolmogorov complexity, while Akaike's popularity stems from ties with Information Theory and the Kullback-Leibler distance. 8.1.2 Statistical Approaches to Model Order Estimation The ....
Jorma Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080--1100, 1986.
....the optimal code for that source. 3 asymptotic form D# R#=A2 #2 R=k , for large R. This implies that ## R # #= k 2 log n n O#n #1 #: This is the familiar achievable redundancy result of Rissanen and others for universal noiseless coding of finitely parametrized sources [15, 33, 16, 47, 48]. We show the same result for universal quantization. Furthermore, we show that in infinite dimensional cases (e.g. noiseless coding of countably infinite alphabets and variable rate vector quantization) the distortion rate trade off has the asymptotic power law form D# R#=A R #b ....
.... rate of convergence of the nth order OPTA to the OPTA is best left as a separate issue of interest in its own right [40, 41, 36] In the case of noiseless coding, Rissanen and others have provided the optimal rate of convergence of the nth order rate redundancy to zero, when # is a subset of R k [15, 33, 16, 47, 48]. The following theorem is a slight generalization of [47, Theorem 1b] which we shall call Rissanen s achievability theorem. Theorem 2 (Rissanen) Let # be an open subset of R k . Suppose that for each # 2 # the nth order relative entropies D n ##jj ## as functions of # 2 # are twice ....
[Article contains additional citation context not shown here]
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, September 1986.
....attempts to identify the optimal subclass from which to choose the final hypothesis. There have been a variety of methods proposed for choosing the optimal subclass, but most techniques fall into one of two basic categories: complexity penalization (e.g. the minimum description length principle [Ris86] and various statistical selection criteria [FG94]) and hold-out testing (e.g. cross validation and bootstrapping [Efr79]). Regularization is similar to model selection except that one does not impose a discrete decomposition on the base hypothesis class. Instead a penalty criterion is imposed ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080-1100, 1986.
....suggests that the design should take into account some measure of the simplicity, or parsimony, of the solution, in addition to performance on the training set. In one basic approach, penalty terms are added to the training cost, either to directly favor the formation of a small model [1] [85], or to do so indirectly via regularization smoothness constraints or other costs which measure overspecialization. A second common approach is to build a large model, overspecialized to the training set, and then attempt to undo some of the training by retaining only the vital model structure, ....
J. Rissanen, "Stochastic complexity and modeling," Ann. Statist., vol. 14, pp. 1080--1100, 1986.
....model selector (Shibata, 1981) have the same linear approximation. Some other criteria which provide a different asymptotic behavior have been proposed, including RIC (Foster and George, 1994), BIC (Schwarz, 1978), as well as criteria derived from the Minimum Description Length (MDL) principle (Rissanen, 1986; Barron, Rissanen and Yu, 1998). During the same years, a general theory of minimizing the empirical risk (for any set of functions, any loss functions, and any number of samples) has been constructed (Vapnik, 1982). In the framework of this theory, the method of Structural Risk Minimization for ....
Rissanen, J. (1986). Stochastic complexity and modeling. Annals of Statistics, 14:1080-1100.
....5, 15 for cases (1), (2) and (2a) respectively, and (b) for cases (2) and (2a) we chose $\phi_k$ to be multi-layer perceptrons with 16 hidden neurons. Note that several model selection approaches could be used instead to choose the number of parameters of the architecture (e.g. Minimum Description Length [18]). Fig. 2 shows the body pose estimates obtained in several single images coming from two different sequences at specific orientations (due to space limitations case (2) is not included; its performance is comparable with the rest). The agreement between pose estimates and ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14,1080-1100, 1986.
....5, 15 for cases (1), (2) and (2a) respectively, and (b) for cases (2) and (2a) we chose S to be multi-layer perceptrons with 16 hidden neurons. Note that several model selection approaches could be used instead to choose the number of parameters of the architecture (e.g. Minimum Description Length [18]). Fig. 2 shows the body pose estimates obtained in several single images coming from two different sequences at specific orientations (due to space limitations case (2) is not included; its performance is comparable with the rest). The agreement between pose estimates and ....
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14,1080-1100, 1986.
.... well as several theorems relating probabilities to complexities; see also Chaitin's and Gács' independent papers on prefix complexity and m [24, 17]. Solomonoff's work on inductive inference helped to inspire less general yet practically more feasible principles of minimum description length [68, 46] as well as time-bounded restrictions of Kolmogorov complexity, e.g. [28, 69, 39], as well as the concept of logical depth of x, the runtime of the shortest program of x [6]. Equation (14) makes predictions of the entire future, given the past. This seems to be the most general approach. ....
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080--1100, 1986.
....then for all positive numbers e and all 0 2 k, except in a set whose volume goes to zero as n , Eo log f(x ; 0, k e logn. q(x n) 2 The mean is taken relative to f(x ; O, 47 We give the original proof, which can be generalized to the proof of a stronger version of the theorem, [23]. There exists an elegant proof of an extension of the theorem in [16] Proof: Consider a partition of the set fk into k dimensional hypercubes of edge length A n c x , where c is a constant. Let the, say ran, centers of these hypercubes form the set f(An) 0 l, 02, and write ....
Rissanen, J. (1986), 'Stochastic Complexity and Modeling', Annals of Statistics, Vol 14, 1080-1100
....designed with the resulting model. This was done in the special case of classification models by Wallace and Boulton [1] as early as 1968 and for general parametric model classes later by Rissanen [2]. While quite crude, such a construct was shown to be asymptotically optimal in a strong sense [3]. We give below better constructs, which have certain optimum properties even non-asymptotically. These also appear to provide unsurpassed results in practice. The same principle can also be applied to hypothesis testing, for which it brings the important simplification that composite hypotheses ....
....function $f(x^n)$ as a solution to Shtarkov's minimax problem appears to be too weak to qualify it as a universal model, which is supposed to provide the shortest code length for the data obtainable with the model class. However, there is an extension of Shannon's noiseless coding theorem [3], stating that the mean $E_\theta \log f(x^n)$ approaches the entropy of the data-generating process $f(x^n; \theta)$ at the fastest possible rate, no matter which process in the class you pick, except for a set of the parameters of measure zero. Moreover, this length is also asymptotically the ....
[Article contains additional citation context not shown here]
Rissanen, J. (1986) Stochastic complexity and modeling. Ann. Statist., 14, 1080--1100.
No context found.
Rissanen, J. (1986), `Stochastic Complexity and Modeling', Annals of Statistics, Vol 14, 1080-1100
No context found.
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, 1986.
No context found.
Rissanen, J. (1986a). Stochastic complexity and modeling. Ann. Statist., 14, 1080--1100.
No context found.
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, 1986.
No context found.
Rissanen, J. (1986). Stochastic complexity and modeling. Annals of Statistics, 14(3), 1080 -- 1100.
No context found.
J. Rissanen, "Stochastic complexity and modeling," Ann. Statist., vol. 14, pp. 1080--1100, 1986.
No context found.
Rissanen, J. (1986), "Stochastic complexity and modeling", Annals of Statistics, 14(3), 1080-1100.
No context found.
J. Rissanen, "Stochastic complexity and modeling," Ann. Statist., vol. 14, no. 3, pp. 1080--1100, 1986.
No context found.
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14(3):1080--1100, 1986.
No context found.
J. Rissanen, "Stochastic complexity and modeling," Annals of Statistics, vol. 14, pp. 1080--1100, Sept. 1986.
No context found.
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080-1100, 1986.
No context found.
J. Rissanen, Stochastic complexity and modeling. Annals of Statistics, 14:1080--1100, 1986.
No context found.
J. Rissanen. Stochastic Complexity and Modeling. Annals of Statistics, 14(3):1080--1100, 1986.
No context found.
J. Rissanen. Stochastic complexity and modeling. Ann Stat, 14:1080--1100, 1986.
No context found.
J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14(3):1080--1100, 1986.
No context found.
Rissanen, J. (1986). Stochastic Complexity and Modeling, The Annals of Statistics 14(3), 1080-1100.
No context found.
Rissanen, J. (1986) Stochastic complexity and modeling. Annals of Statistics, 14, 1080-1100.
No context found.
J. Rissanen. Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080-1100, 1986.
No context found.
J. Rissanen, "Stochastic complexity and modeling," Annals of Statistics, vol. 14, pp. 1080-1100, Sept. 1986.
No context found.
J. Rissanen, "Stochastic complexity and modeling," Ann. Statist., vol. 14, pp. 1080-1100, 1986.
No context found.
Rissanen, J. (1986). Stochastic complexity and modeling. Annals of Statistics, 14, 3, 1080-1100.
No context found.
Rissanen, J. (1986b), `Stochastic complexity and modeling', Annals of Statistics, 14, 1080--1100.