39 citations found.
Jaakkola, T. and Haussler, D. (1999). Probabilistic kernel regression models. In Conference on AI and Statistics.


This paper is cited in the following contexts:
Regularized Least-Squares Classification - Rifkin, Yeo, Poggio

....easily select the value of $\lambda$ which minimizes the leave-one-out bound. Unfortunately, computing all the eigenvalues of $K$ is in general much too expensive, so again, this technique is not practical for large problems. Jaakkola and Haussler introduced an interesting class of simple leave-one-out bounds [32] for kernel classifiers. In words, their bound says that if a given training point $x$ can be classified correctly by $f_S$ without using the contribution of $x$ to $f_S$, then $x$ cannot be a leave-one-out error ($y_i f_{S^i}(x_i) > 0$). Although the Jaakkola and Haussler bound does not apply directly to RLSC ....

T. Jaakkola and D. Haussler, Probabilistic kernel regression models, In Advances in Neural Information Processing Systems 11 (1998).
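
The check paraphrased in the Rifkin, Yeo, Poggio context can be run directly on a trained kernel classifier $f_S(x) = \sum_j \alpha_j y_j K(x_j, x)$ without a bias term: remove the point's own term from the sum and see whether the rest still classifies it correctly. The sketch below assumes a precomputed kernel matrix `K` and given dual coefficients `alpha`; the function names are hypothetical. Under the conditions of Jaakkola and Haussler's result, certified points cannot be leave-one-out errors, so the fraction of uncertified points bounds the leave-one-out error rate; the code itself only performs the arithmetic check.

```python
from typing import List, Sequence


def output_without_self(i: int, alpha: Sequence[float], y: Sequence[int],
                        K: Sequence[Sequence[float]]) -> float:
    """f_S(x_i) with the contribution alpha_i * y_i * K(x_i, x_i) of x_i removed."""
    return sum(alpha[j] * y[j] * K[j][i] for j in range(len(y)) if j != i)


def uncertified_points(alpha: Sequence[float], y: Sequence[int],
                       K: Sequence[Sequence[float]]) -> List[int]:
    """Indices the check cannot rule out as leave-one-out errors.

    A point is certified when it is still classified correctly by f_S
    after its own term has been dropped.
    """
    return [i for i in range(len(y))
            if y[i] * output_without_self(i, alpha, y, K) <= 0]


def jh_loo_bound(alpha: Sequence[float], y: Sequence[int],
                 K: Sequence[Sequence[float]]) -> float:
    """Fraction of training points left uncertified."""
    return len(uncertified_points(alpha, y, K)) / len(y)


# Hypothetical two-point example.
K = [[1.0, 0.5], [0.5, 1.0]]
print(uncertified_points([1.0, 1.0], [1, -1], K))  # [0, 1]
print(uncertified_points([0.5, 0.5], [1, 1], K))   # []
```

In the first example each point relies on its own term to be classified correctly, so neither can be certified; in the second, the other point's contribution alone already has the right sign.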


Boosted Dyadic Kernel Discriminants - Moghaddam, Shakhnarovich (2002)

.... framework to consider is the theory of potential functions for pattern classification [1], in which potential fields of the form $H(x) = \sum_i \alpha_i y_i K(x, x_i)$ (1) are thresholded to predict classification labels, $y = \mathrm{sign}(H(x))$. In a probabilistic kernel regression framework recently proposed in [5], the coefficients $\alpha$ that minimize the classification error are obtained by maximizing $J(\alpha) = -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \sum_i F(\alpha_i)$ (2), where the potential function $F$ is concave and continuous (corresponding to positive semi-definite kernels). This framework subsumes SVMs, which ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In D. Heckerman and J. Whittaker, editors, Proc. of 7th International Workshop on AI and Statistics. Morgan Kaufmann, 1999.
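
To make the SVM special case mentioned in the Moghaddam, Shakhnarovich context concrete, here is a minimal sketch that maximizes $J(\alpha) = \sum_i F(\alpha_i) - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$ for the assumed choice $F(a) = a$ on the box $0 \le a \le C$, which is the SVM dual without a bias term, and then classifies with $\mathrm{sign}(H(x))$. Coordinate ascent is a choice made here for brevity, not the procedure of either paper, and it assumes $K(x_i, x_i) > 0$.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def linear_kernel(a: Point, b: Point) -> float:
    return sum(p * q for p, q in zip(a, b))


def potential(x: Point, X: Sequence[Point], y: Sequence[int],
              alpha: Sequence[float], kernel: Kernel) -> float:
    """H(x) = sum_i alpha_i y_i K(x, x_i)."""
    return sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X))


def objective(alpha: Sequence[float], y: Sequence[int],
              K: Sequence[Sequence[float]], F: Callable[[float], float]) -> float:
    """J(alpha) = sum_i F(alpha_i) - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(F(a) for a in alpha) - 0.5 * quad


def maximize_j_svm_case(X: Sequence[Point], y: Sequence[int], kernel: Kernel,
                        C: float = 1.0, sweeps: int = 100) -> List[float]:
    """Coordinate ascent on J with F(a) = a and 0 <= alpha_i <= C.

    J is a concave quadratic in each alpha_i, so the unconstrained step
    (1 - y_i H(x_i)) / K_ii followed by clipping to [0, C] is the exact
    coordinate-wise maximizer.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            h_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            step = (1.0 - y[i] * h_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return alpha


def predict(x: Point, X: Sequence[Point], y: Sequence[int],
            alpha: Sequence[float], kernel: Kernel) -> int:
    """Threshold the potential field: y = sign(H(x)), with sign(0) taken as +1."""
    return 1 if potential(x, X, y, alpha, kernel) >= 0 else -1


X = [[1.0], [-1.0]]
y = [1, -1]
alpha = maximize_j_svm_case(X, y, linear_kernel)
print(alpha)                                      # [1.0, 0.0]
print(predict([2.0], X, y, alpha, linear_kernel)) # 1
```

Other concave choices of $F$ change only the per-coordinate update; the decision rule $\mathrm{sign}(H(x))$ stays the same.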


A Leave-one-out Cross Validation Bound for Kernel Methods with.. - Zhang (2001)

....if the training data are independently drawn from a fixed underlying distribution, then the expected leave-one-out error equals the expected test error, which measures the generalization ability of the learning method. Leave-one-out bounds have received much attention recently; for example, see [3, 5, 6, 10] and references therein. Also, in [5, 10] the leave-one-out analysis has already been employed to study the generalization ability of support vector classification. In this paper, we extend their results by deriving a general leave-one-out bound for a class of convex dual kernel learning machines and apply it to classification and ....

[Article contains additional citation context not shown here]

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.


Sparsity of Data Representation of Optimal Kernel Machine and.. - Kowalczyk (2001)   (1 citation)

.... the number of support vectors to the generalization error of SVM via a bound on the leave-one-out estimator [9]. This result was originally shown for the special case of classification with a hard-margin cost function (optimal hyperplane). The papers by Opper and Winther [10], Jaakkola and Haussler [6], and Joachims [7] extend Vapnik's result in the direction of bounds for the classification error of SVMs. The first of those papers deals with the hard-margin case, while the other two derive tighter bounds on the classification error of soft-margin SVMs with $\epsilon$-insensitive linear cost. In this ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. Seventh Work. on AI and Stat., San Francisco, 1999. Morgan Kaufmann.


Learning to Predict the Leave-one-out Error of Kernel.. - Tsuda, Rätsch, Mika..

....to reliably select a good model as demonstrated in simulations on Support Vector and Linear Programming Machines. Comparisons to existing learning-theoretical bounds, e.g., the span bound, are given for various model selection scenarios. I. Introduction: Numerous methods have been proposed [7, 8, 3, 5, 9] for model selection of kernel-based classifiers such as Support Vector Machines (SVMs) [7] and Linear Programming Machines (LPMs) Ill. They all try to find a reasonably good estimate of the generalization error to select the proper hyperparameters. The data-dependent LOO error would in principle ....

....of learning machines, as it is an (almost) unbiased estimator of the true generalization error [6]. Unfortunately, its computation is prohibitively slow in most practical cases. There have been several attempts to approximate the leave-one-out error in closed form for SVM classifiers [8, 3, 5]. For example, a new type of bound was proposed that relies on the span of the SVs and was empirically found to perform best among the learning-theoretical bounds [8]. However, such approximations are limited to a specific learning machine, i.e., the SVM, and it seems difficult to provide a useful ....

[Article contains additional citation context not shown here]

T.S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.


Meta Learning: Learning to Predict the Leave-one-out Error - Tsuda, Rätsch, Mika, Müller

....data with an astonishing degree of accuracy, as demonstrated in simulations. Comparisons to existing learning-theoretical bounds, e.g., the span bound, are given for model selection and LOO error prediction scenarios. 1 Introduction: Numerous methods have been proposed for model selection [13, 12, 2, 6, 14, 5, 4]. They all try to find a reasonably good estimate of the generalization error to select the proper hyperparameters. The data-dependent LOO error would in principle be ideal for selecting hyperparameters of learning machines, as it is an (almost) unbiased estimator of the true generalization error. ....

T.S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999.


Probabilistic Discriminative Kernel Classifiers for Multi-class.. - Roth (2001)   (1 citation)

....efficient way by applying approximate conjugate-gradient inversion techniques; see, cf., [12], p. 83. The availability of efficient approximation techniques from the well-studied field of numerical linear algebra constitutes the main advantage over a related approach to kLOGREG presented in [13]. The latter algorithm computes the optimal coefficients $\alpha_i$ by a sequential approach. The problem with this on-line algorithm, however, is the following: for each new observation $x_t$, $t = 1, 2, \ldots$, it imposes computational costs of order $O(t^2)$. Given a training set of $N$ observations ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In David Heckerman and Joe Whittaker, editors, Procs. 7th International Workshop on AI and Statistics. Morgan Kaufmann, 1999.
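
If every new observation $x_t$ costs on the order of $c\,t^2$ operations, as the Roth context states for the sequential algorithm of [13], then processing a training set of $N$ observations costs on the order of the following sum. This is plain arithmetic under that per-step assumption, not a figure quoted from either paper:

$$\sum_{t=1}^{N} c\,t^{2} \;=\; c\,\frac{N(N+1)(2N+1)}{6} \;=\; O(N^{3})$$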


Optimal Properties and Adaptive Tuning of Standard and.. - Wahba, Lin, Lee, Zhang (2002)   (2 citations)

....GW to the $\xi\alpha$ method of Joachims [12], which turned out to be closely related to the GACV. Code for the $\xi\alpha$ estimate is available in SVM^light (http://ais.gmd.de/~thorsten/svm_light). At about this time there was a lot of activity in the development of tuning methods, and a number of them [26], [11], [22], [12], [2] turned out to be related under various circumstances. We first review optimal classification in the two-category classification problem. We describe the standard case, where the training set is representative of the general population, and the cost of misclassification is the same ....

....derivation of the GCV [4], [9] for Gaussian observations and of the GACV for Bernoulli observations [35]. In [32], [20], [21] it was seen that a direct (non-randomized) version was readily available, easy to compute, and worked well. At about the same time, there were several other tuning results [3], [11], [12], [22], [26] which are closely related to each other and to the GACV in one way or another. We will discuss these later. The arguments below follow [32]. The goal here is to obtain a proxy for the (unobservable) GCKL($\lambda$) of (1.10). Let $f_\lambda^{[-k]}$ be the minimizer of the form $f = b + h$ with $h \in$ ....

[Article contains additional citation context not shown here]

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the


A Statistical Learning Model of Text Classification for Support.. - Joachims (2001)   (16 citations)

....the third step connects to large-margin separation. 4.1 Step 1: Bounding the Expected Error Based on the Margin. The following bound [14, 18] shows that a large margin combined with low training error leads to high generalization accuracy. It uses results limiting the number of leave-one-out errors [10, 13]. The key quantities are the margin as defined in Section 2, the maximum Euclidean length $R$ of the document vectors $x$, and the training loss $\sum_i \xi_i$. Theorem 1 (Bound on Expected Error of SVM): The expected error rate $E(\mathrm{Err}^n(h_{SVM}))$ of an SVM based on $n$ training examples with $0 \le \|x_i\|$ ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Conference on AI and Statistics, 1999.


Choosing Multiple Parameters for Support Vector Machines - Chapelle, Vapnik.. (2001)   (31 citations)

....of errors made by the leave-one-out procedure [18]: $T = N_{SV}/\ell$, where $N_{SV}$ denotes the number of support vectors. 3.2.2 Jaakkola-Haussler bound: For SVMs without threshold, analyzing the optimization performed by the SVM algorithm when computing the leave-one-out error, Jaakkola and Haussler [9] proved the inequality $y_p\,(f^0(x_p) - f^p(x_p)) \le \alpha_p^0 K(x_p, x_p) = U_p$, which leads to the following upper bound: $T = \frac{1}{\ell}\sum_{p=1}^{\ell} \Psi(\alpha_p^0 K(x_p, x_p) - 1)$, where $\Psi$ is the step function. Note that Wahba et al. [21] proposed an estimate of the number of errors made by the leave-one-out procedure, which in ....

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.
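
The two leave-one-out estimates in the Chapelle et al. context, the support-vector count $T = N_{SV}/\ell$ and the Jaakkola-Haussler estimate $T = \frac{1}{\ell}\sum_p \Psi(\alpha_p^0 K(x_p, x_p) - 1)$, need only the trained multipliers and the kernel diagonal. A minimal sketch, assuming $\Psi(z) = 1$ for $z > 0$ and $0$ otherwise, and a small tolerance for deciding which multipliers count as nonzero (both conventions are assumptions here, as are the example multipliers):

```python
from typing import Sequence


def psi(z: float) -> float:
    """Step function: 1 if z > 0, else 0 (the value at z = 0 is a convention)."""
    return 1.0 if z > 0 else 0.0


def sv_count_estimate(alpha: Sequence[float], tol: float = 1e-12) -> float:
    """T = N_SV / l: fraction of training points with a nonzero multiplier."""
    return sum(1 for a in alpha if a > tol) / len(alpha)


def jaakkola_haussler_estimate(alpha: Sequence[float],
                               kernel_diag: Sequence[float]) -> float:
    """T = (1/l) * sum_p psi(alpha_p * K(x_p, x_p) - 1)."""
    return sum(psi(a * k - 1.0) for a, k in zip(alpha, kernel_diag)) / len(alpha)


# Hypothetical multipliers for four training points with K(x_p, x_p) = 1.
alpha = [0.5, 2.0, 0.0, 1.5]
print(sv_count_estimate(alpha))                      # 0.75
print(jaakkola_haussler_estimate(alpha, [1.0] * 4))  # 0.5
```

Points with $\alpha_p^0 = 0$ contribute $\Psi(-1) = 0$, so the Jaakkola-Haussler estimate never exceeds the support-vector count; it additionally discards support vectors whose $\alpha_p^0 K(x_p, x_p)$ does not exceed one.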


Gaussian Processes for Classification: Mean Field Algorithms - Opper, Winther (1999)   (11 citations)

.... (29) for TAP). 7 Support Vector Machine Digression: The relation between variational problems in reproducing kernel Hilbert spaces (such as the learning in support vector machines) and Bayesian Gaussian processes is well known and has been pointed out by various authors (see, e.g., Wahba, 1990; Jaakkola and Haussler, 1999). We give a simple formulation of support vector learning in the realizable case (neglecting the bias) (Boser, Guyon and Vapnik, 1992; Vapnik, 1995; Schölkopf, Burges and Smola, 1998). Based on the decomposition for the activation function $h(x)$ (eq. 5), one tries to find weights $w_\rho$ such that $t_i$ ....

T. S. Jaakkola and D. Haussler, Probabilistic Kernel Regression Models, to appear in: Proceedings of the 1999 conference on AI and Statistics.


Frame, Reproducing Kernel, Regularization and Learning - Rakotomamonjy, Canu

....important as the choice of the learning machine. In fact, prior information on a specific problem can be used for choosing an efficient input representation, or for choosing a good hypothesis space, which allows one to enhance the performance of the learning machine (Schölkopf, Simard, Smola and Vapnik 1998; Jaakkola and Haussler 1999; Niyogi, Girosi and Poggio 1998). The purpose of this paper is to present a method for constructing an RKHS and its associated kernel by means of frame theory (Duffin and Schaeffer 1952; Daubechies 1992). A frame of a Hilbert space allows one to represent any vector of the space by a linear combination of ....

Jaakkola, T. & Haussler, D. (1999). Probabilistic kernel regression models, Proceedings of the 1999 Conference on AI and Statistics.


Estimating the Generalization Performance of an SVM Efficiently - Joachims (2000)   (24 citations)

....[Figure 1: Representing text as a feature vector.] ....new estimators that overcome these problems, extending results of [Vapnik, 1998] (chapter 10) and [Jaakkola and Haussler, 1999] to general SVMs. The new estimators are both accurate and can be computed efficiently. After their theoretical justification, the estimators are experimentally tested on three text classification tasks in Section 6. The experiments show that they very accurately reflect the actual behavior of SVMs on ....

.... for unbiased hyperplanes goes back to Vapnik (cf. [Vapnik, 1998], pages 418-421). Unlike the work presented here, Vapnik's result is limited to the case where the training data is separable, and it is used to derive bounds on the expected error, not estimators. Jaakkola and Haussler [Jaakkola and Haussler, 1999] present a generalized bound for inseparable data that is similar to that of Lemma 1. Similarly, an approximation to the leave-one-out error of SVMs was recently proposed in [Wahba, 1999]. Nevertheless, like Vapnik's bound, both are restricted to unbiased hyperplanes and do not apply to regular ....

Jaakkola, T. and Haussler, D. (1999). Probabilistic kernel regression models. In Conference on AI and Statistics.


Bayesian methods for Support Vector Machines: Evidence and.. - Sollich (2000)   (7 citations)

....a large $B$. The above link between SVMs and GPs has been pointed out by a number of authors (e.g., Seeger, 2000; Opper and Winther, 2000). It can be understood from the common link to reproducing kernel Hilbert spaces (Wahba, 1998) and can be extended from SVMs to more general kernel methods (Jaakkola and Haussler, 1999). For connections to regularization operators, see also (Smola et al., 1998). Before discussing the probabilistic interpretation of the second (loss) term in (1), let us digress briefly to the SVM regression case. There, the training outputs $y_i$ are real numbers, and each training example ....

Jaakkola, T. and D. Haussler: 1999, `Probabilistic kernel regression models'. In: D. Heckerman and J. Whittaker (eds.): Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics. San Francisco, CA.


Model Selection for Support Vector Machines - Chapelle, Vapnik (2000)   (25 citations)

....[1], the larger $\alpha_p$ is, the more important in the decision function the support vector $x_p$ is. Thus, it is not surprising that removing a point $x_p$ causes a change in the decision function proportional to its Lagrange multiplier $\alpha_p$. The same kind of result as Theorem 2 has also been derived in [2], where, for SVMs without threshold, the following inequality has been derived: $y_p\,(f^0(x_p) - f^p(x_p)) \le \alpha_p^0 K(x_p, x_p)$. The span $S_p$ takes into account the geometry of the support vectors in order to get a precise notion of how important a given point is. The previous theorem ....

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.


Choosing Multiple Parameters for Support Vector Machines - Chapelle, Vapnik.. (2000)   (31 citations)

....of errors made by the leave-one-out procedure [17]: $T = N_{SV}/\ell$, where $N_{SV}$ denotes the number of support vectors. 3.2.2 Jaakkola-Haussler bound: For SVMs without threshold, analyzing the optimization performed by the SVM algorithm when computing the leave-one-out error, Jaakkola and Haussler [8] proved the inequality $y_p\,(f^0(x_p) - f^p(x_p)) \le \alpha_p^0 K(x_p, x_p) = U_p$, which leads to the following upper bound: $T = \frac{1}{\ell}\sum_{p=1}^{\ell} \Psi(\alpha_p^0 K(x_p, x_p) - 1)$, where $\Psi$ is the step function. Note that Wahba et al. [20] proposed an estimate of the number of errors made by the leave-one-out procedure, which in ....

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.


Learning with Kernel Machine Architectures - Evgeniou (2000)   (1 citation)

....that to discriminate faces (or people) from non-faces (non-people) once the probabilistic model is decided. Very recently, a general approach to the problem of constructing features for classification starting from probabilistic models describing the training examples has been suggested [Jaakkola and Haussler, 1998b]. The choice of the features was made implicitly through the choice of the kernel to be used for a kernel classifier. In [Jaakkola and Haussler, 1998b], a probabilistic model for both of the classes to be discriminated was assumed, and the results were also used when a model of only one class was ....

....$\partial_i L(x|\theta)\,\partial_j L(x|\theta)$, where $\partial_i$ indicates the derivative with respect to the parameter $\theta_i$. A natural set of features, $\phi_i$, is found by taking the gradient of $L$ with respect to the set of parameters, $\phi(x) = I^{-1/2}\,\frac{\partial L(x|\theta)}{\partial \theta}$ (5.4). These features were theoretically motivated in [Jaakkola and Haussler, 1998b] and shown to lead to kernel classifiers which are at least as discriminative as the Bayes classifier based on the given generative model. We have assumed the generative model (5.3), rewritten it with respect to the average image $x_0$ and the eigenvalues $\lambda_n$, and obtain the set of features ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.
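
To make the feature construction $\phi(x) = I^{-1/2}\,\partial_\theta \log P(x|\theta)$ in the Evgeniou context concrete, here is a sketch for a deliberately simple generative model: a one-dimensional Gaussian with parameters $\theta = (\mu, \sigma)$. This model is an assumption for illustration, not the face model (5.3) of the thesis. For it the score and the Fisher information have closed forms, so the features reduce to two standardized quantities, and the induced kernel is their dot product.

```python
import math
from typing import Tuple


def fisher_features(x: float, mu: float, sigma: float) -> Tuple[float, float]:
    """phi(x) = I^{-1/2} * grad_theta log N(x; mu, sigma^2), theta = (mu, sigma).

    Score components:
        d/dmu    log N = (x - mu) / sigma**2
        d/dsigma log N = -1/sigma + (x - mu)**2 / sigma**3
    The Fisher information is diag(1/sigma**2, 2/sigma**2), so
    I^{-1/2} = diag(sigma, sigma / sqrt(2)).
    """
    z = (x - mu) / sigma
    return z, (z * z - 1.0) / math.sqrt(2.0)


def fisher_kernel(x1: float, x2: float, mu: float, sigma: float) -> float:
    """K(x1, x2) = phi(x1) . phi(x2) under the same fitted model."""
    a = fisher_features(x1, mu, sigma)
    b = fisher_features(x2, mu, sigma)
    return a[0] * b[0] + a[1] * b[1]


print(fisher_kernel(1.0, 2.0, mu=0.0, sigma=1.0))  # 2.0
```

Both features have zero mean under the model, so points that the model finds typical map near the origin while atypical points get large feature values; this is the sense in which the kernel reflects the underlying distribution.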


Machine Learning Strategies for Complex Tasks - Campbell, Evgeniou, Heisele.. (2000)

....that the pixel values of the images are used to train a classifier, the question is which pixels (that is, which parts of the images) are more important. Finding good methods for feature selection is a difficult problem. A heuristic is suggested in [16], and more formal methods are suggested in [26]. Selecting features can be important when many features are available, as in the case of image, speech, or video processing. In fact, many learning methods suffer from the "curse of dimensionality". It turns out that an important characteristic of SVM and kernel machines developed within the ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.


Statistical Learning Theory: a Primer - Evgeniou, Pontil (2000)

....does not provide a general method for finding good data representations, but suggests representations that lead to simple solutions. Although there is no general solution to this problem, a number of recent experimental and theoretical works provide insights for specific applications [4], [8], [10], [15] ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.


Image Representations for Object Detection Using.. - Evgeniou, Pontil.. (2000)   (6 citations)

....that to discriminate faces (or people) from non-faces (non-people) once the probabilistic model is decided. Very recently, a general approach to the problem of constructing features for classification starting from probabilistic models describing the training examples has been suggested [7]. The choice of the features was made implicitly through the choice of the kernel to be used for a kernel classifier. In [7], a probabilistic model for both of the classes to be discriminated was assumed, and the results were also used when a model of only one class was available, which is the case ....

....available, which is the case we have. Let us denote with $L(x|\theta)$ the log of the probability function and define the Fisher information matrix $I = \int dx\, P$ ....

[Article contains additional citation context not shown here]

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.


Probabilistic methods for Support Vector Machines - Sollich (2000)   (14 citations)

....$(x')\rangle + B^2 = \phi(x)\cdot\phi(x') + B^2$. The SVM prior is therefore simply a Gaussian process (GP) over the functions $\theta$, with covariance function $K(x, x') = \phi(x)\cdot\phi(x') + B^2$ (and zero mean). This correspondence between SVMs and GPs has been noted by a number of authors, e.g., [6, 7, 8, 9, 10]. The second term in (1) becomes a (negative) log-likelihood if we define the probability of obtaining output $y$ for a given $x$ (and $\theta$) as $Q(y = \pm 1 \mid x, \theta) = \kappa(C)\exp[-C\,l(y\,\theta(x))]$ (2). We set $\kappa(C) = 1/[1 + \exp(-2C)]$ to ensure that the probabilities for $y = \pm 1$ never add up to a value ....

T S Jaakkola and D Haussler. Probabilistic kernel regression models. In Proceedings of The 7th International Workshop on Artificial Intelligence and Statistics. To appear.
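
The normalization in the Sollich context, $Q(y = \pm 1 \mid x, \theta) = \kappa(C)\exp[-C\,l(y\,\theta(x))]$ with $\kappa(C) = 1/[1 + \exp(-2C)]$, can be checked numerically. This sketch assumes the SVM hinge loss $l(z) = \max(0, 1 - z)$ and evaluates $Q(+1) + Q(-1)$ for a given output $\theta(x)$. Algebraically, for $|\theta| < 1$ the sum is $2\kappa(C)e^{-C}\cosh(C\theta)$, and for $|\theta| \ge 1$ it is $\kappa(C)(1 + e^{-C(1+|\theta|)})$; both reach exactly one at $|\theta| = 1$ and stay below one elsewhere.

```python
import math


def hinge_loss(z: float) -> float:
    """l(z) = max(0, 1 - z): the SVM loss assumed in this sketch."""
    return max(0.0, 1.0 - z)


def kappa(C: float) -> float:
    """Normalization kappa(C) = 1 / (1 + exp(-2C))."""
    return 1.0 / (1.0 + math.exp(-2.0 * C))


def q(label: int, theta: float, C: float) -> float:
    """Q(y = label | x) = kappa(C) * exp(-C * l(label * theta)), label in {+1, -1}."""
    return kappa(C) * math.exp(-C * hinge_loss(label * theta))


def total_probability(theta: float, C: float) -> float:
    """Q(+1) + Q(-1) for a given output theta(x)."""
    return q(+1, theta, C) + q(-1, theta, C)


for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    print(theta, round(total_probability(theta, C=1.0), 6))
```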


Natural Regularization in SVMs - Oliver, Schölkopf, Smola (2000)

....Limited, St. George House, 1 Guildhall Street, Cambridge CB2 3NH, UK, http://www.research.microsoft.com/~bsc; Department of Engineering, Australian National University, Canberra 0200 ACT, Australia, http://spigot.anu.edu.au/~smola. Abstract: Recently the so-called Fisher kernel was proposed by [6] to construct discriminative kernel techniques by using generative models. We provide a regularization-theoretic analysis of this approach and extend the set of kernels to a class of natural kernels, all based on generative models with density $p(x|\theta)$ like the original Fisher kernel. This ....

....[5], etc. have become standard tools of applied machine learning technology, leading to record benchmark results in a variety of domains. However, until recently, these two strands have been largely separated. A promising approach to combining the strengths of both worlds was made in the work of [6]. The main idea is to design kernels inspired by generative models. In particular, they propose to use a so-called Fisher kernel to give a natural similarity measure taking into account an underlying probability distribution. Since defining a kernel function automatically implies assumptions ....

[Article contains additional citation context not shown here]

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.


A unified framework for Regularization Networks and.. - Evgeniou, Pontil, Poggio (1999)   (13 citations)

....the problem we solve in practice, as we described in Sections 4 and 5. Since the value # n # (l) is not known in practice, we can only implement the extended SRM approximately by minimizing (67) with various values of $\lambda$ and then picking the best $\lambda$ using techniques such as cross-validation [1, 98, 99, 47], Generalized Cross-Validation, Finite Prediction Error, and the MDL criteria (see [94] for a review and comparison). Summarizing, both the RN and the SVMR methods discussed in Sections 4 and 5 can be seen as approximations of the extended SRM method using the $V_\gamma$ dimension, with nested hypothesis ....

....second term of the minimized functional enforces the smoothness of $f$. The first term in equation (94) is the empirical error, while the second term is usually called the smoothness functional, since it enforces some sort of smoothness. Various methods for choosing $\lambda$ are proposed in the literature [1, 98, 99, 47, 94]. Under some conditions on the regularization parameter $\lambda$, it can be shown [90] that, as the number of training examples increases, the minimizer of equation (94) converges to the exact solution $f$ in the space D(##1 To summarize, to solve the ill-posed problem of learning from examples using ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.
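
The Evgeniou, Pontil, Poggio context picks $\lambda$ by techniques such as cross-validation. For the square-loss case (a regularization network, the same setting as RLSC) leave-one-out residuals need no retraining: with $G = (K + \lambda I)^{-1}$ and coefficients $c = G y$, the residual on held-out point $i$ is $y_i - f_{S^i}(x_i) = c_i / G_{ii}$. The sketch below assumes that particular scaling of $\lambda$ (conventions such as $\ell\lambda$ differ between papers), uses plain-Python linear algebra to stay self-contained, and includes a brute-force retraining routine to check the closed form; all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def regularized(K: Matrix, lam: float) -> Matrix:
    """K + lam * I."""
    n = len(K)
    return [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]


def loo_residuals(K: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Closed form: y_i - f_{S^i}(x_i) = c_i / G_ii with G = (K + lam I)^{-1}."""
    A = regularized(K, lam)
    c = solve(A, y)
    n = len(y)
    out = []
    for i in range(n):
        g_col = solve(A, [1.0 if k == i else 0.0 for k in range(n)])  # column i of G
        out.append(c[i] / g_col[i])
    return out


def loo_residuals_by_retraining(K: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Reference: refit without point i, then predict it."""
    n = len(y)
    out = []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        c = solve(regularized([[K[a][b] for b in keep] for a in keep], lam),
                  [y[k] for k in keep])
        out.append(y[i] - sum(K[i][k] * ck for k, ck in zip(keep, c)))
    return out


def select_lambda(K: Matrix, y: Sequence[float], grid: Sequence[float]) -> float:
    """Grid value with the smallest mean squared leave-one-out residual."""
    def loo_mse(lam: float) -> float:
        r = loo_residuals(K, y, lam)
        return sum(v * v for v in r) / len(r)
    return min(grid, key=loo_mse)
```

Each candidate $\lambda$ costs one factorization-sized solve per column of $G$ here; a real implementation would factor $K + \lambda I$ once (or use an eigendecomposition of $K$, as the Rifkin, Yeo, Poggio context discusses) rather than re-running elimination.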


Regularization Networks and Support Vector Machines - Evgeniou, Pontil, Poggio (2000)   (46 citations)

....holds: Theorem 6.4 (Vapnik, 1998). The expected misclassification risk of the SVM trained on $m$ data points sampled from $X \times Y$ according to a probability distribution $P(x, y)$ is bounded by: E{ min SV l 1 , R 2 l 1 #(l 1) l 1 }, where the expectation $E$ is taken over $P(x, y)$. This theorem can also be used to justify the current .... (Footnote 21: All these bounds are not tight enough in practice. Footnote 22: Further distribution-dependent results have been derived recently; see [47, 16, 34].)

....problem-specific invariances are known to hold a priori. Niyogi et al. [62] showed how several invariances can be embedded in the stabilizer or, equivalently, in virtual examples (see, for related work, tangent distance [89] and [84]). Generative probabilistic models: Jaakkola and Haussler [47] consider the case in which prior information is available in terms of a parametric probabilistic model $P(x, y)$ of the process generating the data. They argue that good features for classification are the derivatives of $\log P$ with respect to the natural parameters of the distributions at the data ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.


Leave-One-Out Support Vector Machines - Weston (1999)   (2 citations)

....Support Vector Machines. Jason Weston, Department of Computer Science, Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK. Abstract: We present a new learning algorithm for pattern recognition inspired by a recent upper bound on leave-one-out error [Jaakkola and Haussler, 1999] proved for Support Vector Machines (SVMs) [Vapnik, 1995; 1998]. The new approach directly minimizes the expression given by the bound in an attempt to minimize leave-one-out error. This gives a convex optimization problem which constructs a sparse linear classifier in feature space using ....

....1997; Vapnik, 1998]. In this algorithm it turned out to be favourable to formulate the decision functions in terms of a symmetric, positive definite, and square-integrable function $k(\cdot, \cdot)$ referred to as a kernel. The class of decision functions, also known as kernel classifiers [Jaakkola and Haussler, 1999], is then given by $f(x) = \mathrm{sign}\left(\sum_{i=1}^{\ell} \alpha_i y_i k(x_i, x)\right)$, $\alpha_i \ge 0$ (1), where the training data are $x_i \in \mathbb{R}^N$ and the labels are $y_i \in \{\pm 1\}$. For simplicity we ignore classifiers which use an extra threshold term. Recently, utilizing this particular type of decision rule (that each training ....

[Article contains additional citation context not shown here]

Tommi S. Jaakkola and David Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics. Morgan Kaufmann, 1999.


Maximum Entropy Discrimination - Jaakkola, Meila, Jebara (1999)   (33 citations)  Self-citation (Jaakkola)

....error $\epsilon_g$ of the MRE classifier satisfies $\epsilon_g \le E\{\text{fraction of non-zero Lagrange multipliers}\}$ (13), where the expectation is over the choice of the training set. Practical leave-one-out cross-validation estimates of the generalization error can be derived on the basis of this result (cf. [21, 12]). We may also make use of generalization error results derived for convex combinations of classifiers [20] to obtain more informative generalization bounds for MRE classifiers. The details are left for another paper. 3 Practical realization of the MRE solution: We now turn to the question of ....

Jaakkola T. and Haussler D. (1998). Probabilistic kernel regression models. In Proceedings of The Seventh International Workshop on Artificial Intelligence and Statistics.


Convolution Kernels on Discrete Structures - Haussler (1999)   (48 citations)  Self-citation (Haussler)

.... showing that most standard classification, clustering and regression methods can be kernelized, that is, they can be accomplished without ever explicitly representing the feature vector $\{\phi_n(x)\}_{n \geq 1}$, relying instead only on indirect computations of the kernel $K(x, y)$ or the distance $d(x, y)$ [28, 31, 2, 13, 23, 30, 17] (see also the bibliography at http://svm.first.gmd.de). The kernels and corresponding distance functions we construct are suitable for all such methods. In particular, there is a one-to-one correspondence between kernels and Gaussian processes defined on the set $X$ [3, 32, 21]. We do not pursue this ....

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of the Seventh Int. Workshop on AI and Statistics, 1998. To appear.
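The kernelization idea in this context can be illustrated with a small Python sketch. For a kernel whose feature map is known, the feature-space distance computed only from kernel evaluations, d(x, y) = sqrt(K(x, x) - 2K(x, y) + K(y, y)), matches the distance computed from the explicit feature vectors. The quadratic kernel (x · y)^2 and its feature map are assumed examples, reading the snippet's d(x, y) as this kernel-induced distance is an assumption, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def poly2_kernel(x: Vector, y: Vector) -> float:
    """Homogeneous quadratic kernel K(x, y) = (x . y)^2 (assumed example)."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x: Vector) -> List[float]:
    """Explicit feature map for (x . y)^2: all ordered products x_i * x_j."""
    return [a * b for a in x for b in x]


def kernel_distance(kernel: Callable[[Vector, Vector], float], x: Vector, y: Vector) -> float:
    """Feature-space distance from kernel values only:
    d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y)."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def explicit_distance(x: Vector, y: Vector) -> float:
    """Euclidean distance between explicit feature vectors, for comparison."""
    fx, fy = poly2_features(x), poly2_features(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))


u, v = [1.0, 2.0], [3.0, -1.0]
print(kernel_distance(poly2_kernel, u, v))  # sqrt(123)
print(explicit_distance(u, v))              # sqrt(123)
```

Only `explicit_distance` ever builds the feature vectors; `kernel_distance` works for any kernel passed to it, which is the point of kernelizing a distance-based method.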


The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)   (17 citations)

No context found.

Jaakkola, T. and Haussler, D. (1999). Probabilistic kernel regression models. In Conference on AI and Statistics.


A Simple Method For Estimating Conditional Probabilities For.. - Stefan Rüping

No context found.

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999.


A Simple Method for Estimating Conditional Probabilities for SVMs - Rüping (2004)

No context found.

T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999.


Gaussian Processes for Machine Learning - Seeger (2004)

No context found.

Tommi Jaakkola and David Haussler. Probabilistic kernel regression models. In D. Heckerman and J. Whittaker, editors, Workshop on Artificial Intelligence and Statistics 7. Morgan Kaufmann, 1999.


Model Selection for Support Vector Machine Classification - Gold, Sollich (2002)

No context found.

T Jaakkola and D Haussler. Probabilistic kernel regression models. In David Heckerman and Joe Whittaker, editors, Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, San Francisco, CA, 1999. Morgan Kaufmann.


Bayesian Gaussian Process Models: PAC-Bayesian Generalisation.. - Seeger (2003)   (3 citations)

No context found.

Tommi Jaakkola and David Haussler. Probabilistic kernel regression models. In D. Heckerman and J. Whittaker, editors, Workshop on Artificial Intelligence and Statistics 7. Morgan Kaufmann, 1999.


Learning with Kernel Machine Architectures - Evgeniou (2000)   (1 citation)

No context found.

T. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proc. of Neural Information Processing Conference, 1998.

CiteSeer.IST - Copyright Penn State and NEC