| B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. In Technical Report 99-87, Microsoft Research, 1999. |
....Figure 2: The spin image for point P is constructed by accumulating in a 2D histogram the coordinates l and of a set of contributing points (such as Q) on the mesh representing the object. pair of classes [18]. Finally, we use the single class SVM described in [17] to implement the novelty detectors. This SVM estimates the support of a distribution from a set of positive examples only. In this version, the parameter ν is an upper bound on the fraction of the outliers of the distribution. 4. Our Approach The main contribution of our method is the novel way ....
....(novelty detector) to learn its component, using the signatures for in the training set. Let be one of these critical points. The performance of the component detector for point can be quantified by calculating a bound on the expected probability of error on the target set as follows [17]. # X P (4) is the number of support vectors in the component detector for . Using the classifier for point , perform an iterative component growing operation to expand the component about . Initially, the component consists only of point . An iteration of the procedure ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Technical Report 99-87, Microsoft Research, 1999.
....of properties of the SVM; most importantly, ν provides an upper bound on the fraction of false negatives (outliers). In Section 5.1 we rely on standard two class SVM classifiers to build an approximation for P. A more direct approach, however, is to make use of the so called one class SVM [8]. Since the characteristic functions of the performance relations being considered are deterministic, the support regions show sharp boundaries. The SVM approach also holds open the promise of obtaining representations of the performance space at varying levels of abstraction, providing needed ....
.... negative class data provide intuitive, effective general parameters for controlling performance tradeoffs, and are subject to possibly large, ill defined biases. 5.2 One class SVM One class SVMs were originally introduced as a means of estimating quantiles of a probability density function [8]. One class SVMs require samples from only one class of data (the positive class, assuming the negative class lies elsewhere) and compute the optimal hyperplane maximizing the separation of data from the origin. As a consequence, even if minimum support estimation is attempted through the training process, ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Tech. Rep. MSR-TR-99-87, Microsoft Research, 1999.
....SVM extensions For the sake of simplicity, we have presented here only the most basic version of SVM. Many other versions exist that build upon the basics, some meant to deal with the case of non separable data (C-SVM [3], ν-SVM [11]), others adapted for classification from positive examples only [9], and others meant to deal with more than two classes [10]. All SVM versions cited here have in common the use of the Gram matrix as the only vehicle by which the data enter the SVM training. Therefore, the methods for data translation we present here should apply to them as well. 3 A simple ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Technical Report 99-87, Microsoft Research, 1999.
....of noise. In addition to their accuracy, a key characteristic of SVMs is their mathematical tractability and geometric interpretation. While SVMs have been widely adopted as supervised learning methods with labeled data, they have also been used for the exploration of unlabeled data (cf. [1, 8, 9]) Novelty detection and cluster analysis using SVMs are examples for learning unlabeled data. For many real world problems, the task is not to classify but to detect novel or Current address: School of Computing and Information Technology, Griffith University, Brisbane, QLD 4111, Australia. ....
....proximity graphs the expected time complexity is sub quadratic for data of all dimensions. Using SVMs, another boundary based clustering method was proposed in [1, 2] (we call it SVC). This approach employs support vectors to construct cluster boundaries. Its principle is novelty detection [8], which is sometimes also called data domain description [9]. Domain description produces a description of a given set of objects. This description should cover the class of given objects, and ideally reject other possible objects in the object space. Generally, novelty detection can characterize ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443--1472, 2001.
....system. In practice, however, this ideal relationship is only approximately true due to transistor mismatch and offsets. Fine tuning of the chip's input parameters is done automatically by the robotic system, using an unsupervised Support Vector (SV) learning algorithm introduced recently [7]. The learning requires only that the description of the desired output is given. The machine learns from (unlabeled) examples how to set the parameters to the chip in order to obtain a desired motor behavior. 1 Introduction Modern robots still lag far behind animals in their capability for ....
....sequence and explores parameter combinations in the input parameter space of its GC chip, keeping only those leading to a movement that is correct within some tolerance. It then locates the region in input parameter space that contains most of these parameter combinations using an algorithm [7] that extends SV learning to unlabelled data. 2 The robotic system The robotic system consists of (i) a body with one degree of freedom per leg and a potentiometer attached to each motor that serves as a sensor providing information about the angular displacement of the leg, ii) the ....
[Article contains additional citation context not shown here]
B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola and R. C. Williamson, Estimating the Support of a High-Dimensional Distribution. Technical Report MSR-TR-99-87, Microsoft Research, Redmond, WA, 1999. To appear in Neural Computation.
....) (10), where $q = \frac{1}{t^2}\sum_{np} k(z_n, z_p)$ and $q_j = \frac{1}{t}\sum_{n} k(x_j, z_n)$, subject to the constraints $0 \le \alpha_i \le \frac{1}{\nu\ell}$, $\sum_i \alpha_i = 1$. (11) This convex quadratic program can be solved with standard quadratic programming tools. Alternatively, one can employ the SMO algorithm described in [3], which was found to approximately scale quadratically with the training set size. To illustrate the idea presented in this section, figure 1 shows a 2D example of separating the data from the mean of another data set in feature space. Figure 1: Separating one class of data from the mean of a ....
....discrete components. Suppose, moreover, that the kernel is analytic and non constant. With probability 1, asymptotically, ν equals both the fraction of SVs and the fraction of outliers. The proof can be found in [4]. We next state another desirable theoretical result: Proposition 2 (Resistance [3]) Local movements of outliers parallel to w do not change the hyperplane. Essentially, this result is due to the fact that the errors $\xi_i$ enter in the objective function only linearly. To determine the hyperplane, we need to find the (constrained) extremum of the objective function, and in finding ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. TR MSR 99 - 87, Microsoft Research, Redmond, WA, 1999.
....z is declared novel if: $K(z, z) - 2\sum_{i=1}^{m} \alpha_i K(z, x_i) + \sum_{i,j=1}^{m} \alpha_i \alpha_j K(x_i, x_j) - R^2 \geq 0$ (22), where $R^2$ is first computed by finding an example which is non bound and setting this inequality to an equality. An alternative approach has been developed by Scholkopf et al. [53]. Suppose we restrict our attention to RBF kernels: in this case the data lie in a region on the surface of a hypersphere in feature space since $\phi(x) \cdot \phi(x) = K(x, x) = 1$. The objective is therefore to separate off this region from the surface region containing no data. This is achieved by ....
....selects two Lagrange multipliers to optimise at every step and separate heuristics are used to find the two members of the pair. The SMO algorithm has been refined to improve speed [30] and generalised to cover the above three tasks of classification [45], regression [55] and novelty detection [53]. 3 Applications of Support Vector Machines to machine vision. 3.1 Learning to Recognize 3D Objects We show the potential of SVMs in robotics addressing the recognition of 3D objects from video images. We describe an aspect based recognition approach using SVMs. Aspect based recognition ....
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson. Estimating the support of a high-dimensional distribution. Microsoft Research Corporation Technical Report MSR-TR-99-87, 1999.
....novel if: $K(z, z) - 2\sum_{i=1}^{m} \alpha_i K(z, x_i) + \sum_{i,j=1}^{m} \alpha_i \alpha_j K(x_i, x_j) - R^2 \geq 0$ (32), where $R^2$ is first computed by finding an example which is non bound and setting this inequality to an equality. An alternative approach has been developed by Scholkopf et al. [31]. Suppose we restricted our attention to RBF kernels: in this case the data lie in a region on the surface of a hypersphere in feature space since $\phi(x) \cdot \phi(x) = K(x, x) = 1$ from (8). The objective is therefore to separate off this region from the surface region containing no data. This is ....
.... distribution is then modelled by the decision function: $f(z) = \mathrm{sign}\left(\sum_{j=1}^{m} \alpha_j K(x_j, z) + b\right)$ (36). In the above models the parameter ν has a neat interpretation as an upper bound on the fraction of outliers and a lower bound on the fraction of patterns which are support vectors [31]. Scholkopf et al. [31] provide good experimental .... Fig. 4. The solution in input space for the hyperplane minimising $W(\alpha, b)$ in equation (37). A hard margin was used with RBF kernels trained using a σ = ....
[Article contains additional citation context not shown here]
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the support of a high-dimensional distribution. Microsoft Research Corporation Technical Report MSR-TR-99-87, 1999.
....point z is novel if: $K(z, z) - 2\sum_{i=1}^{m} \alpha_i K(z, x_i) + \sum_{i,j=1}^{m} \alpha_i \alpha_j K(x_i, x_j) \geq R^2$ (9). $R^2$ is found by using an equality in (9) for a training example for which $\alpha_i$ is not at a bound, i.e. $0 < \alpha_i < 1/m$. This approach has also been developed by Scholkopf et al. [20], who give a different QP formulation for estimating the support and provide good experimental evidence in favour of this approach by highlighting abnormal digits in the USPS handwritten character dataset. Regression. Several approaches to regression [6, 26] are possible but, as for ....
....j) For the L1 soft margin care must be taken to avoid violation of the constraints (7), leading to bounds on these corrections. The SMO algorithm has been refined to improve speed [13] and generalised to cover the above three tasks of classification [18], regression [22] and estimating densities [20]. Due to its decomposition of the learning task and speed it is probably the method of choice for training SVMs. Model Selection. Apart from the choice of kernel the other indeterminate is the choice of the kernel parameter (e.g. σ in (3)). The kernel parameter can be found using ....
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the support of a high-dimensional distribution. Microsoft Research Corporation Technical Report MSR-TR-99-87, 1999.
No context found.
B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443--1471, 2001.
.... an additional regularization term f ] This sum is known as the regularized risk R reg [f ] Remp [f ] f ] c(x i ; y i ; f(x i ) f ] for 0: 2) Common loss functions are the soft margin loss function [1] or the logistic loss for classification and novelty detection [14], the quadratic loss, absolute loss, Huber's robust loss [9], or the insensitive loss [16] for regression. We discuss these in Section 3. In some cases the loss function depends on an additional parameter such as the width of the margin or the size of the insensitive zone. One may make ....
....(14) Finally, if we choose the hinge loss, c(x; y; g(x) max(0; yg(x) 1 ) i ; 0; b) otherwise. 15) Setting = 0 recovers the kernel perceptron algorithm. For nonzero we obtain the kernel perceptron with regularization. Novelty Detection The results for novelty detection [14] are similar in spirit. The setting is most useful here particularly where the estimator acts as a warning device (e.g. network intrusion detection) and we would like to specify an upper limit on the frequency of alerts f(x) The relevant loss function is c(x; y; f(x) max(0; f(x) ....
[Article contains additional citation context not shown here]
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001.
....can we leave out the non SV examples (i.e. the $x_i$ with $\alpha_i = 0$) from the current chunk, but also some of the SVs, especially those that hit the upper boundary (i.e. $\alpha_i = C$). In fact, one can use chunks which do not even contain all SVs, and maximize over the corresponding sub problems. SMO [15, 25, 20] explores an extreme case, where the sub problems are chosen so small that one can solve them analytically. Several public domain SV packages and optimizers are listed on the web page http://www.kernel-machines.org. For more details on the optimization problem, see [19]. On the theoretical side, ....
B. Scholkopf, J. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. TR MSR 99 - 87, Microsoft Research, Redmond, WA, 1999.
....distribution is not even well defined, e.g. if there are singular components. Part of the motivation for the present work was the paper [1]. It turns out that there is a considerable amount of prior work in the statistical literature; for a discussion, cf. the full version of the present paper [3]. 2 ALGORITHMS We first introduce terminology and notation conventions. We consider training data $x_1, \dots, x_\ell \in X$, where $\ell \in \mathbb{N}$ is the number of observations, and $X$ is some set. For simplicity, we think of it as a compact subset of $\mathbb{R}^N$. Let $\Phi$ be a feature map $X \to F$, i.e. a map into a ....
....to $0 \le \alpha_i \le \frac{1}{\nu\ell}$, $\sum_i \alpha_i = 1$. (6) This problem can be solved with standard QP routines. It does, however, possess features that set it apart from generic QPs, most notably the simplicity of the constraints. This can be exploited by applying a variant of SMO developed for this purpose [3]. The offset $\rho$ can be recovered by exploiting that for any $\alpha_i$ which is not at the upper or lower bound, the corresponding pattern $x_i$ satisfies $\rho = (w \cdot \Phi(x_i)) = \sum_j \alpha_j k(x_j, x_i)$. Note that if ν approaches 0, the upper boundaries on the Lagrange multipliers tend to infinity, ....
[Article contains additional citation context not shown here]
B. Scholkopf, J. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. TR MSR 99 - 87, Microsoft Research, Redmond, WA, 1999.
....over 1; in compact notation: $i, j \in [\ell]$; similarly, $n, p \in [t]$. Bold face Greek letters denote $\ell$-dimensional vectors whose components are labelled using normal face typeset. 2 Algorithms In analogy to an algorithm recently proposed for the estimation of a distribution's support (Scholkopf et al., 1999), we will try to construct a nonlinear decision function on X by mapping the data into some feature space and then seeking to separate X from the centroid of Z with a large margin hyperplane committing few training errors. Projections on the normal vector of the hyperplane then characterize the ....
....is precisely met; therefore the support vectors with $\alpha_i > 0$ will often form but a small subset of X. However, the solution depends on all $z_n$, hence it will not necessarily be particularly sparse. If this is a concern, then postprocessing can be applied to increase sparsity, along the lines of Scholkopf et al., 1999. Substituting (15)-(17) into L (14), we can eliminate the primal variables to get the dual problem. A short calculation shows that it consists of minimizing the quadratic form $W(\alpha) = \frac{1}{2}\sum_{ij} \alpha_i \alpha_j \big((x_i \cdot x_j) + q - q_j - q_i\big) = \frac{1}{2}\sum_{ij} \alpha_i \alpha_j \big(k(x_i$ ....
[Article contains additional citation context not shown here]
Scholkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., & Williamson, R. (1999). Estimating the support of a high-dimensional distribution (TR MSR 99-87). Microsoft Research, Redmond, WA. Submitted to Neural Computation.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. In Technical Report 99-87, Microsoft Research, 1999.
No context found.
B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, "Estimating the Support of a High-dimensional Distribution," Microsoft Research, Redmond, WA, USA, MSR-TR-99-87, 1999.
No context found.
Scholkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13, 1443--1472.
No context found.
B. Scholkopf, "Estimating the support of a high-dimensional distribution," Tech. Rep., Microsoft Research, 1999.
No context found.
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443--1471, 2001.
No context found.
B. Schölkopf, J.C. Platt, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443--1471, 2001.
No context found.
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443--1471, 2001.
No context found.
B. Schölkopf, J.C. Platt, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443--1471, 2001.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," in Neural Computation, pp. 1443--1471, 2001.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001.
No context found.
B. Scholkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443--1471, 2001.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443--1472, 2001.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443--1472, 2001.
No context found.
Scholkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., & Williamson, R. (1999). Estimating the support of a high-dimensional distribution. Technical Report MSR-TR-99-87, Microsoft Research.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443--1472, 2001.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Technical Report 99-87, Microsoft Research, 1999.
No context found.
B. Scholkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation 13, pp. 1443--1471, 2001.