## A survey of kernel and spectral methods for clustering (2008)

Venue: Pattern Recognition

Citations: 88 (5 self)

### Citations

13203 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...‖Φ(xi) − Φ(xj)‖² = (Φ(xi) − Φ(xj)) · (Φ(xi) − Φ(xj)) = Φ(xi)·Φ(xi) + Φ(xj)·Φ(xj) − 2Φ(xi)·Φ(xj) = K(xi, xi) + K(xj, xj) − 2K(xi, xj) (Eq. 26), in which the computation of distances between vectors in feature space is just a function of the input vectors. In fact, every algorithm in which input vectors appear only in dot products with other input vectors can be kernelized [71]. To simplify the notation we introduce the so-called Gram matrix K, where each element kij is the scalar product Φ(xi) · Φ(xj). Thus, Eq. 26 can be rewritten as ‖Φ(xi) − Φ(xj)‖² = kii + kjj − 2kij (Eq. 27). Examples of Mercer kernels are the following [78]: linear, K^(l)(xi, xj) = xi · xj (Eq. 28); polynomial of degree p, K^(p)(xi, xj) = (1 + xi · xj)^p with p ∈ N (Eq. 29); Gaussian, K^(g)(xi, xj) = exp(−‖xi − xj‖² / 2σ²) with σ ∈ R (Eq. 30). It is important to stress that using the linear kernel in Eq. 26 simply yields the Euclidean norm in the input space. Indeed, ‖xi − xj‖² = xi·xi + xj·xj − 2xi·xj = K^(l)(xi, xi) + K^(l)(xj, xj) − 2K^(l)(xi, xj) = ‖Φ(xi) − Φ(xj)‖² (Eq. 31) shows that choosing the kernel K^(l) implies Φ = I (where I is the identity function). Following this consideration we can think that kernels can offer a more general...
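
The distance kernel trick of Eqs. 26–31 can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the paper; the function names are invented, and the example verifies numerically that the linear kernel reproduces the squared Euclidean distance in input space.

```python
import math

# Sketch of the "distance kernel trick" (Eqs. 26-27): feature-space
# distances computed from kernel evaluations alone, without knowing Phi.
# Function names are illustrative, not from the paper.

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def feature_space_sq_distance(K, xi, xj):
    """||Phi(xi) - Phi(xj)||^2 = K(xi,xi) + K(xj,xj) - 2 K(xi,xj)."""
    return K(xi, xi) + K(xj, xj) - 2.0 * K(xi, xj)

xi, xj = [1.0, 2.0], [4.0, 6.0]
# With the linear kernel (Eq. 31) the result is the squared Euclidean
# distance in input space, i.e. Phi reduces to the identity.
d_lin = feature_space_sq_distance(linear_kernel, xi, xj)   # 3^2 + 4^2 = 25
d_gau = feature_space_sq_distance(gaussian_kernel, xi, xj)
```

With the Gaussian kernel the same expression gives a distance in the implicit feature space, bounded above by 2 since K^(g)(x, x) = 1.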

6600 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...ormed by the following steps: (1) choose the number k of clusters; (2) initialize the codebook V with vectors randomly picked from X; (3) compute the Voronoi set πi associated to the codevector vi; (4) move each codevector to the mean of its Voronoi set using Eq. 4; (5) return to step 3 if any codevector has changed otherwise return the codebook. At the end of the algorithm a codebook is found and a Voronoi tessellation of the input space is provided. It is guaranteed that after each iteration the quantization error does not increase. Batch K-Means can be viewed as an Expectation-Maximization [9] algorithm, ensuring the convergence after a finite number of steps. This approach presents many disadvantages [25]. Local minima of E(X) make the method dependent on initialization, and the average is sensitive to outliers. Moreover, the number of clusters to find must be provided, and this can be done only using some a priori information or additional validity criterion. Finally, K-Means can deal only with clusters with spherically symmetrical point distribution, since Euclidean distances of patterns from centroids are computed leading to a spherical invariance. Different distances lead to d... |
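
The five steps of batch K-Means listed in the excerpt can be sketched directly in pure Python. This is a minimal illustration under the stated algorithm (random initialization from X, Voronoi assignment, centroid update of Eq. 4, stop when no codevector changes); the helper names are invented.

```python
import random

# Minimal sketch of batch K-Means as described in the excerpt (steps 1-5).

def batch_kmeans(X, k, seed=0):
    rng = random.Random(seed)
    V = [list(v) for v in rng.sample(X, k)]            # step 2: init codebook
    while True:
        # step 3: Voronoi sets (assign each pattern to its nearest codevector)
        voronoi = [[] for _ in range(k)]
        for x in X:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, V[j])))
            voronoi[i].append(x)
        # step 4: move each codevector to the mean of its Voronoi set (Eq. 4)
        newV = [[sum(c) / len(cell) for c in zip(*cell)] if cell else V[i]
                for i, cell in enumerate(voronoi)]
        if newV == V:                                  # step 5: stop if unchanged
            return V, voronoi
        V = newV

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
V, cells = batch_kmeans(X, 2)
```

Each iteration cannot increase the quantization error, so on a finite data set the loop terminates, as the excerpt notes via the EM view.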

4839 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context: ...clustering are pattern representation and the similarity measure. Each pattern is usually represented by a set of features of the system under study. It is very important to notice that a good choice of pattern representation can lead to improvements in clustering performance. Whether it is possible to choose an appropriate set of features depends on the system under study. Once a representation is fixed, it is possible to choose an appropriate similarity measure among patterns. The most popular dissimilarity measure for metric representations is the distance, for instance the Euclidean one [25]. Clustering techniques can be roughly divided into two categories: hierarchical and partitioning. Hierarchical clustering techniques [39,74,83] are able to find structures which can be further divided into substructures, and so on recursively. The result is a hierarchical structure of groups known as a dendrogram. Partitioning clustering methods try to obtain a single partition of the data without any further sub-partitioning, as hierarchical algorithms do, and are often based on the optimization of an appropriate objective function. The result is the creation of separating hypersurfaces among clusters....

4701 | The Self-Organizing Map
- Kohonen
- 1990
Citation Context: ...at each step the algorithm takes into account the whole data set to update the codevectors. When the cardinality n of the data set X is very high (e.g., several hundreds of thousands), the batch procedure is computationally expensive. For this reason an on-line update has been introduced, leading to the on-line K-Means algorithm [51,54]. At each step, this method simply picks an input pattern at random and updates its nearest codevector, provided the scheduling of the updating coefficient is adequate to allow convergence and consistency. 2.2 Self-Organizing Maps (SOM). A Self-Organizing Map (SOM) [43], also known as a Self-Organizing Feature Map (SOFM), represents data by means of codevectors organized on a grid with fixed topology. Codevectors move to adapt to the input distribution, but the adaptation is also propagated along the grid to neighboring codevectors, according to a given propagation (or neighborhood) function. This effectively constrains the evolution of the codevectors. Grid topologies may differ, but in this paper we consider a two-dimensional, square-mesh topology [44,45]. The distance on the grid is used to determine how strongly a codevector is adapted when the unit aij is the winner...

3782 | Normalized cuts and image segmentation
- Shi, Malik
Citation Context: ...most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods to face recognition using kernel SOM [76], to speech recognition [69], and to the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to the co-clustering of words and documents [22] and of genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The rest of the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in s...

3692 | Support vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...can be updated at each step of the algorithm or kept fixed for all iterations. The former approach can lead to instabilities, since the derivation of the equations assumes ηi fixed. In the latter case a good estimate of ηi can be obtained only from an approximate solution. For this reason the Possibilistic c-Means is often run as a refining step of a Fuzzy c-Means. 3 Kernel Clustering Methods. In machine learning, kernel functions [57] were introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support Vector Machines (SVMs) [17], which perform better than other classification algorithms on several problems. The success of SVMs has led to extending the use of kernels to other learning algorithms (e.g., Kernel PCA [70]). The choice of the kernel is crucial for incorporating a priori knowledge about the application, for which ad hoc kernels can be designed. 3.1 Mercer kernels. We recall the definition of Mercer kernels [2,68], considering, for the sake of simplicity, vectors in R^d instead of C^d. Definition 3.1: Let X = {x1, …, xn} be a nonempty set where xi ∈ R^d. A function K : X × X → R is called a positive definite...

3388 | A tutorial on support vector machines for pattern recognition (Data Mining and Knowledge Discovery)
- Burges
- 1998
Citation Context: ...rule for the centroids becomes an updating rule for the coefficients of such a combination. 3.5 One-Class SVM. This approach provides a support vector description in feature space [36,37,77]. The idea is to use kernels to project data into a feature space and then to find the sphere enclosing almost all data, namely not including outliers. Formally, a radius R and the center v of the smallest enclosing sphere in feature space are defined. The constraint is thus: ‖Φ(xj) − v‖² ≤ R² + ξj for all j (Eq. 47), where the non-negative slack variables ξj have been added. The Lagrangian for this problem is defined [11]: L = R² − Σj (R² + ξj − ‖Φ(xj) − v‖²) βj − Σj ξj μj + C Σj ξj (Eq. 48), where βj ≥ 0 and μj ≥ 0 are Lagrange multipliers, C is a constant, and C Σj ξj is a penalty term. Computing the partial derivatives of L with respect to R, v and ξj and setting them to zero leads to the following equations: Σj βj = 1, v = Σj βj Φ(xj), βj = C − μj (Eq. 49). The Karush-Kuhn-Tucker (KKT) complementarity conditions [11] result in: ξj μj = 0 and (R² + ξj − ‖Φ(xj) − v‖²) βj = 0 (Eq. 50). Following simple considerations regarding all these conditions it is possible to see that: when ξj > 0, the image of xj lies outside the...
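
Since the center v = Σj βj Φ(xj) lives in feature space (Eq. 49), the squared distance of any point from it expands entirely in kernel evaluations. The sketch below illustrates that expansion only; the support points and β coefficients are made up for illustration, not the result of solving the dual problem.

```python
import math

# Sketch: given coefficients beta summing to one (Eq. 49), the squared
# feature-space distance from the sphere center v = sum_j beta_j Phi(x_j)
# needs only kernel evaluations. Data and betas here are illustrative,
# not a trained one-class SVM.

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y))
                    / (2.0 * sigma ** 2))

def sq_distance_to_center(x, support, beta, K):
    """||Phi(x) - v||^2 = K(x,x) - 2 sum_j b_j K(x_j,x)
                          + sum_ij b_i b_j K(x_i,x_j)."""
    cross = sum(b * K(s, x) for s, b in zip(support, beta))
    const = sum(bi * bj * K(si, sj)
                for si, bi in zip(support, beta)
                for sj, bj in zip(support, beta))
    return K(x, x) - 2.0 * cross + const

support = [[0.0, 0.0], [1.0, 0.0]]
beta = [0.5, 0.5]                      # sum_j beta_j = 1
d_in  = sq_distance_to_center([0.5, 0.0], support, beta, gaussian_kernel)
d_out = sq_distance_to_center([5.0, 5.0], support, beta, gaussian_kernel)
```

A point far from the data lies farther from the center in feature space, which is what comparing this distance with R² (Eq. 47) exploits to flag outliers.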

3049 | Some methods for classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context: ...since Euclidean distances of patterns from centroids are computed, leading to a spherical invariance. Different distances lead to different invariance properties, as in the case of the Mahalanobis distance, which produces invariance on ellipsoids [25]. The term batch means that at each step the algorithm takes into account the whole data set to update the codevectors. When the cardinality n of the data set X is very high (e.g., several hundreds of thousands), the batch procedure is computationally expensive. For this reason an on-line update has been introduced, leading to the on-line K-Means algorithm [51,54]. At each step, this method simply picks an input pattern at random and updates its nearest codevector, provided the scheduling of the updating coefficient is adequate to allow convergence and consistency. 2.2 Self-Organizing Maps (SOM). A Self-Organizing Map (SOM) [43], also known as a Self-Organizing Feature Map (SOFM), represents data by means of codevectors organized on a grid with fixed topology. Codevectors move to adapt to the input distribution, but the adaptation is also propagated along the grid to neighboring codevectors, according to a given propagation (or neighborhood) function. Thi...
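
The on-line K-Means update described above (pick a random pattern, move only the winning codevector) can be sketched as follows. The decreasing schedule η_t = 1/(t+1) is one common illustrative choice, not prescribed by the excerpt.

```python
import random

# Sketch of on-line K-Means: at each step pick a random pattern and move
# only its nearest codevector toward it. The learning-rate schedule
# eta_t = 1/(t+1) is an illustrative choice satisfying the usual
# convergence conditions.

def online_kmeans(X, V, steps=2000, seed=0):
    rng = random.Random(seed)
    V = [list(v) for v in V]
    for t in range(steps):
        x = rng.choice(X)
        w = min(range(len(V)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, V[j])))
        eta = 1.0 / (t + 1)               # decreasing update coefficient
        V[w] = [v + eta * (a - v) for v, a in zip(V[w], x)]
    return V

X = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
V = online_kmeans(X, [[1.0, 1.0], [4.0, 4.0]])
```

Unlike the batch variant, each step costs O(k·d) regardless of n, which is the point of the on-line formulation for very large data sets.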

2963 | Robust Statistics
- Huber
- 1981
Citation Context: ...respectively: vi = Σh (uih)^m xh / Σh (uih)^m (Eq. 21) and vi = Σh uih xh / Σh uih (Eq. 22), where the sums run over h = 1, …, n. The parameter ηi regulates the trade-off between the two terms in Eq. 17 and Eq. 18 and is related to the width of the clusters. The authors suggest estimating ηi for PCM-I using the formula: ηi = γ Σh (uih)^m ‖xh − vi‖² / Σh (uih)^m (Eq. 23), which is a weighted mean of the intracluster distances of the i-th cluster; the constant γ is typically set to one. The parameter ηi can also be estimated with scale estimation techniques, as developed in the robust clustering literature for M-estimators [35,59]. The value of ηi can be updated at each step of the algorithm or kept fixed for all iterations. The former approach can lead to instabilities, since the derivation of the equations assumes ηi fixed. In the latter case a good estimate of ηi can be obtained only from an approximate solution. For this reason the Possibilistic c-Means is often run as a refining step of a Fuzzy c-Means. 3 Kernel Clustering Methods. In machine learning, kernel functions [57] were introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support V...
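
The ηi estimate of Eq. 23 is just a membership-weighted mean of squared intracluster distances, as the short sketch below shows; function and variable names are invented for illustration.

```python
# Sketch of the eta_i estimate in Eq. 23: a (u_ih)^m-weighted mean of the
# squared distances to the cluster center, with gamma typically set to one.
# Names are illustrative.

def estimate_eta(X, v, u, m=2.0, gamma=1.0):
    """eta_i = gamma * sum_h u_h^m ||x_h - v||^2 / sum_h u_h^m."""
    num = sum((uh ** m) * sum((a - b) ** 2 for a, b in zip(x, v))
              for x, uh in zip(X, u))
    den = sum(uh ** m for uh in u)
    return gamma * num / den

X = [[0.0], [1.0], [2.0]]
v = [1.0]                     # cluster center
u = [0.5, 1.0, 0.5]           # memberships of the three points
eta = estimate_eta(X, v, u)   # (0.25*1 + 1*0 + 0.25*1) / 1.5 = 1/3
```

In line with the excerpt, such an estimate is only reliable when u and v already come from an approximate solution, e.g. a preceding Fuzzy c-Means run.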

2413 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
- 2000
Citation Context: ...ther similar patterns [10,16,72]. The spectral approach to clustering has a strong connection with Laplacian Eigenmaps [5]. The dimensionality reduction problem aims to find a proper low-dimensional representation of a data set lying in a high-dimensional space. In [5], each node of the graph, which represents a pattern, is connected only with nodes corresponding to neighboring patterns, and the spectral decomposition of the Laplacian of the obtained graph permits finding a low-dimensional representation of X. The authors point out the close connection with spectral clustering and Locally Linear Embedding [67], providing theoretical and experimental validations. 4.1 Shi and Malik algorithm. The algorithm proposed by Shi and Malik [72] applies the concepts of spectral clustering to image segmentation problems. In this framework each node is a pixel, and the definition of adjacency between nodes is tailored to image segmentation purposes. In particular, if xi is the position of the i-th pixel and fi a feature vector which takes into account several of its attributes (e.g., intensity, color and texture information), they define the adjacency as: aij = exp(−‖fi − fj‖² / 2σ₁²) · exp(−‖xi−xj...
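
The Shi-Malik adjacency above multiplies a feature-similarity factor by a spatial-proximity factor. A minimal sketch, assuming (since the excerpt is truncated) a second Gaussian factor in the pixel positions with its own width σ₂, and omitting the radius cutoff usually applied to the spatial term:

```python
import math

# Sketch of a Shi-Malik-style adjacency between pixels: a product of a
# feature-similarity term and an assumed spatial-proximity term. The
# spatial radius cutoff is omitted here for brevity.

def adjacency(fi, fj, xi, xj, sigma1=1.0, sigma2=1.0):
    """a_ij = exp(-||fi-fj||^2 / 2 s1^2) * exp(-||xi-xj||^2 / 2 s2^2)."""
    df = sum((a - b) ** 2 for a, b in zip(fi, fj))
    dx = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return (math.exp(-df / (2 * sigma1 ** 2))
            * math.exp(-dx / (2 * sigma2 ** 2)))

# Two nearby pixels with similar intensity vs. distant, dissimilar ones.
a_close = adjacency([0.9], [0.8], (0, 0), (0, 1))
a_far   = adjacency([0.9], [0.1], (0, 0), (10, 10))
```

The product form makes aij large only when pixels are both visually similar and spatially close, which is what makes the graph cut meaningful for segmentation.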

2129 | Vector quantization and signal compression
- Gersho, Gray
- 1991
Citation Context: ...an example of Voronoi tessellation where each black point is a codevector. 2.1 Batch K-Means. A simple algorithm able to construct a Voronoi tessellation of the input space was proposed in 1957 by Lloyd [52] and is known as batch K-Means. Starting from the finite data set X, this algorithm iteratively moves the k codevectors to the arithmetic mean of their Voronoi sets {πi}, i = 1, …, k. Theoretically speaking, a necessary condition for a codebook V to minimize the Empirical Quantization Error E(X) = (1/2n) Σ_{i=1..k} Σ_{x∈πi} ‖x − vi‖² (Eq. 3) is that each codevector vi fulfills the centroid condition [29]. In the case of a finite data set X and Euclidean distance, the centroid condition reduces to: vi = (1/|πi|) Σ_{x∈πi} x (Eq. 4). Batch K-Means is formed by the following steps: (1) choose the number k of clusters; (2) initialize the codebook V with vectors randomly picked from X; (3) compute the Voronoi set πi associated with each codevector vi; (4) move each codevector to the mean of its Voronoi set using Eq. 4; (5) return to step 3 if any codevector has changed, otherwise return the codebook. At the end of the algorithm a codebook is found and a Voronoi tessellation of the input space is provided...

2066 | Pattern Recognition with Fuzzy Objective Function Algorithms
- Bezdek
- 1981
Citation Context: ...objective function. The result is the creation of separating hypersurfaces among clusters. For instance, consider two nonlinear clusters as in Figure 1. Standard partitioning methods (e.g., K-Means, Fuzzy c-Means, SOM and Neural Gas) using two centroids are not able to separate the two rings in the desired way. Using many centroids could solve this problem, but at the cost of a complex description of a simple data set. For this reason several modifications and new approaches have been introduced to cope with this problem. Among the many modifications we can mention the Fuzzy c-Varieties [8], whose main drawback is that some a priori information on the shape of the clusters must be included. Recently, clustering methods that produce nonlinear separating hypersurfaces among clusters have been proposed. These algorithms can be divided into two main families: kernel and spectral clustering methods. Regarding kernel clustering methods, several clustering algorithms have been modified by incorporating kernels (e.g., K-Means, Fuzzy c-Means, SOM and Neural Gas). (Figure 1. A data set composed of two rings of points.) The use of kernels allows one to map...
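
A two-rings data set like the one in Figure 1 is easy to generate, which makes the failure mode described above simple to reproduce. The radii and noise level below are illustrative choices, not taken from the paper.

```python
import math
import random

# Sketch generating a two-rings data set like Figure 1: two concentric
# noisy circles that no single pair of centroids can separate linearly.
# Radii and noise level are illustrative.

def two_rings(n_per_ring=100, radii=(1.0, 4.0), noise=0.1, seed=0):
    rng = random.Random(seed)
    points, labels = [], []
    for label, r in enumerate(radii):
        for _ in range(n_per_ring):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            rr = r + rng.gauss(0.0, noise)
            points.append([rr * math.cos(theta), rr * math.sin(theta)])
            labels.append(label)
    return points, labels

X, y = two_rings()
```

Running the batch K-Means of section 2.1 on X with k = 2 splits the plane by a straight boundary and mixes the rings, whereas kernel or spectral methods can recover them.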

1938 | Data clustering: A review
- Jain, Murty, et al.
- 1999
Citation Context: ...clustering methods are presented as extensions of the kernel K-Means clustering algorithm. Key words: partitional clustering, Mercer kernels, kernel clustering, kernel fuzzy clustering, spectral clustering. Email addresses: filippone@disi.unige.it (Maurizio Filippone), francesco.camastra@uniparthenope.it (Francesco Camastra), masulli@disi.unige.it (Francesco Masulli), rovetta@disi.unige.it (Stefano Rovetta). Preprint submitted to Elsevier Science, April 30, 2007. 1 Introduction. Unsupervised data analysis using clustering algorithms provides a useful tool to explore data structures. Clustering methods [39,87] have been addressed in many contexts and disciplines, such as data mining, document retrieval, image segmentation and pattern classification. The aim of clustering methods is to group patterns on the basis of a similarity (or dissimilarity) criterion, where groups (or clusters) are sets of similar patterns. Crucial aspects of clustering are pattern representation and the similarity measure. Each pattern is usually represented by a set of features of the system under study. It is very important to notice that a good choice of pattern representation can lead to improvements in clustering perfor...

1703 | On spectral clustering: Analysis and an algorithm
- Ng, Jordan, et al.
- 2002
Citation Context: ...Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods to face recognition using kernel SOM [76], to speech recognition [69], and to the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to the co-clustering of words and documents [22] and of genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The rest of the paper is organized as follows: section 3 shows the kernelized versio...

1652 | An algorithm for vector quantization design
- Linde, Buzo, et al.
- 1980
Citation Context: ...xi ∈ R^d. The codebook (or set of centroids) V is defined as the set V = {v1, …, vc}, typically with c ≪ n. Each element vi ∈ R^d is called a codevector (or centroid, or prototype)². The Voronoi region Ri of the codevector vi is the set of vectors in R^d for which vi is the nearest codevector: Ri = { z ∈ R^d | i = arg min_j ‖z − vj‖² } (Eq. 1). It is possible to prove that each Voronoi region is convex [51] and that the boundaries of the regions are linear segments. The definition of the Voronoi set πi of the codevector vi is straightforward: it is the subset of X for which vi is the nearest codevector: πi = { x ∈ X | i = arg min_j ‖x − vj‖² } (Eq. 2), that is, the set of vectors belonging to Ri. The partition of R^d induced by all Voronoi regions is called a Voronoi tessellation or Dirichlet tessellation. Figure 2 shows an example of Voronoi tessellation where each black point is a codevector. 2.1 Batch K-Means. A simple algorithm able to construct a Voronoi tessellation of the input space was ... (Footnote 1: These data sets can be found at ftp://ftp.ics.uci.edu/pub/machine-learning-databases/. Footnote 2: Among the many terms used to denote such objects, we will use codevector, as in vector quantization theory.)

1569 | Nonlinear component analysis as a kernel eigenvalue problem
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...ηi fixed. In the latter case a good estimate of ηi can be obtained only from an approximate solution. For this reason the Possibilistic c-Means is often run as a refining step of a Fuzzy c-Means. 3 Kernel Clustering Methods. In machine learning, kernel functions [57] were introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support Vector Machines (SVMs) [17], which perform better than other classification algorithms on several problems. The success of SVMs has led to extending the use of kernels to other learning algorithms (e.g., Kernel PCA [70]). The choice of the kernel is crucial for incorporating a priori knowledge about the application, for which ad hoc kernels can be designed. 3.1 Mercer kernels. We recall the definition of Mercer kernels [2,68], considering, for the sake of simplicity, vectors in R^d instead of C^d. Definition 3.1: Let X = {x1, …, xn} be a nonempty set where xi ∈ R^d. A function K : X × X → R is called a positive definite kernel (or Mercer kernel) if and only if K is symmetric (i.e., K(xi, xj) = K(xj, xi)) and Σ_{i=1..n} Σ_{j=1..n} ci cj K(xi, xj) ≥ 0 for all n ≥ 2 (Eq. 24), where cr ∈ R for all r = 1, …...
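
Definition 3.1 can be probed numerically on a finite sample: build the Gram matrix, check symmetry, and evaluate the quadratic form of Eq. 24 for random coefficient vectors. This only samples the condition rather than proving positive definiteness; all names are illustrative.

```python
import math
import random

# Numerical sanity check of Definition 3.1 on a finite sample: the Gram
# matrix of a Gaussian kernel is symmetric, and the quadratic form
# sum_ij c_i c_j K(x_i, x_j) (Eq. 24) is non-negative for random c.

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y))
                    / (2 * sigma ** 2))

rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(6)]
G = [[gaussian_kernel(xi, xj) for xj in X] for xi in X]

symmetric = all(abs(G[i][j] - G[j][i]) < 1e-12
                for i in range(6) for j in range(6))
quad_forms = []
for _ in range(50):
    c = [rng.gauss(0, 1) for _ in range(6)]
    quad_forms.append(sum(c[i] * c[j] * G[i][j]
                          for i in range(6) for j in range(6)))
nonneg = all(q >= -1e-9 for q in quad_forms)    # up to float rounding
```

A kernel failing this check on any sample is certainly not a Mercer kernel; passing it is necessary but not sufficient.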

1568 | The use of multiple measurements in taxonomic problems
- Fisher
- 1936
Citation Context: ...0,80,81,90]. Recently, kernel methods have also been applied to Fuzzy c-Varieties [50], with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels, such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to learn the shape of the kernel automatically from data, as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods to face recognition using kernel SOM [76], to speech recognition [69], and to the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to the co-clustering of words and documents [22] and of genes and conditions [42]. A semi-supervised ...

1539 | Self-organizing formation of topologically correct feature maps
- Kohonen
- 1982

1529 | Spectral Graph Theory
- Chung
- 1997
Citation Context ...epts in spectral graph theory. The basic idea is to construct a weighted graph from the initial data set where each node represents a pattern and each weighted edge simply takes into account the similarity between two patterns. In this framework the clustering problem can be seen as a graph cut problem, which can be tackled by means of the spectral graph theory. The core of this theory is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from data. In fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [16,26]. The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof of the fact that these two approaches have the same mathematical foundation is reported. In particular it has been shown by Dhillon et al. that Kernel K-Means and spectral clustering with the ratio association as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data avoiding to deal with a prefixed shape of clusters. These approaches have a slight similarity with hierarchica... |

1359 | Least squares quantization in PCM
- Lloyd
- 1982
Citation Context: ...regions are linear segments. The definition of the Voronoi set πi of the codevector vi is straightforward: it is the subset of X for which vi is the nearest codevector: πi = { x ∈ X | i = arg min_j ‖x − vj‖² } (Eq. 2), that is, the set of vectors belonging to Ri. The partition of R^d induced by all Voronoi regions is called a Voronoi tessellation or Dirichlet tessellation. Figure 2 shows an example of Voronoi tessellation where each black point is a codevector. 2.1 Batch K-Means. A simple algorithm able to construct a Voronoi tessellation of the input space was proposed in 1957 by Lloyd [52] and is known as batch K-Means. Starting from the finite data set X, this algorithm iteratively moves the k codevectors to the arithmetic mean of their Voronoi sets {πi}, i = 1, …, k. Theoretically speaking, a necessary condition for a codebook V to minimize the Empirical Quantization Error E(X) = (1/2n) Σ_{i=1..k} Σ_{x∈πi} ‖x − vi‖² (Eq. 3) is that each codevector vi fulfills the centroid condition [29]. In the case of a finite data set X and Euclidean distance, the centroid condition reduces to: vi = (1/|πi|) Σ_{x∈πi} x (Eq. 4). Batch K-Means is formed by the following steps: (1) choose the number k o...

1341 | An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context: ...Sc) = Σ_{i=1..c} yiᵀ A yi = tr(Yᵀ A Y) (Eq. 90). 5.3 A unified view of the two approaches. Comparing Eq. 90 and Eq. 85, it is possible to see the perfect equivalence between kernel K-Means and the spectral approach to clustering when one wants to maximize the ratio association. To this end, indeed, it is enough to set the weights in the weighted kernel K-Means equal to one, obtaining the classical kernel K-Means. More general results can be obtained when one wants to optimize other objective functions in the spectral approach, such as the ratio cut [13], the normalized cut, and the Kernighan-Lin [41] objective. For instance, in the case of the minimization of the normalized cut, which is one of the most used objective functions, the functional to minimize is: J(S1, …, Sc) = tr(Yᵀ D^(−1/2) A D^(−1/2) Y) (Eq. 91). Thus the correspondence with the kernel K-Means objective imposes the choices Y = D^(1/2) Z, W = D and K = D^(−1) A D^(−1). It is worth noting that for an arbitrary A it is not guaranteed that D^(−1) A D^(−1) is positive definite; in this case kernel K-Means will not necessarily converge. To cope with this problem, in [20] the authors propose to enforce positive definiteness by means of a diagonal sh...
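
The kernel matrix appearing in the normalized-cut correspondence, K = D⁻¹AD⁻¹, is cheap to form, since D is just the diagonal degree matrix of A. A minimal sketch on a made-up three-node graph:

```python
# Sketch of the kernel matrix used in the normalized-cut correspondence:
# K = D^{-1} A D^{-1}, where D is the diagonal degree matrix of the
# adjacency A. The small graph below is illustrative.

def normalized_cut_kernel(A):
    """Return K with K[i][j] = A[i][j] / (d_i * d_j), d_i = sum_j A[i][j]."""
    d = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]        # a 3-node star graph
K = normalized_cut_kernel(A)
```

K inherits the symmetry of A, but, as the excerpt notes, it need not be positive definite for arbitrary A, which is why a diagonal shift may be required before running kernel K-Means on it.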

1291 | Theory of reproducing kernels
- Aronszajn
- 1950
Citation Context: ...Kernel Clustering Methods. In machine learning, kernel functions [57] were introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support Vector Machines (SVMs) [17], which perform better than other classification algorithms on several problems. The success of SVMs has led to extending the use of kernels to other learning algorithms (e.g., Kernel PCA [70]). The choice of the kernel is crucial for incorporating a priori knowledge about the application, for which ad hoc kernels can be designed. 3.1 Mercer kernels. We recall the definition of Mercer kernels [2,68], considering, for the sake of simplicity, vectors in R^d instead of C^d. Definition 3.1: Let X = {x1, …, xn} be a nonempty set where xi ∈ R^d. A function K : X × X → R is called a positive definite kernel (or Mercer kernel) if and only if K is symmetric (i.e., K(xi, xj) = K(xj, xi)) and Σ_{i=1..n} Σ_{j=1..n} ci cj K(xi, xj) ≥ 0 for all n ≥ 2 (Eq. 24), where cr ∈ R for all r = 1, …, n. Each Mercer kernel can be expressed as K(xi, xj) = Φ(xi) · Φ(xj) (Eq. 25), where Φ : X → F maps the input space X to a high-dimensional feature space F. One of the most relevant aspe...

1226 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context: ...alternative definitions: the Normalized Laplacian L_N = D^(−1/2) L D^(−1/2), the Generalized Laplacian L_G = D^(−1) L, and the Relaxed Laplacian L_ρ = L − ρD. Each definition is justified by special properties desirable in a given context. The spectral decomposition of the Laplacian matrix can give useful information about the properties of the graph. In particular, the second smallest eigenvalue of L is related to the graph cut [26], and the corresponding eigenvector can cluster together similar patterns [10,16,72]. The spectral approach to clustering has a strong connection with Laplacian Eigenmaps [5]. The dimensionality reduction problem aims to find a proper low-dimensional representation of a data set lying in a high-dimensional space. In [5], each node of the graph, which represents a pattern, is connected only with nodes corresponding to neighboring patterns, and the spectral decomposition of the Laplacian of the obtained graph permits finding a low-dimensional representation of X. The authors point out the close connection with spectral clustering and Locally Linear Embedding [67], providing theoretical and experimental validations. 4.1 Shi and Malik algorithm. The algorithm proposed by Shi and...
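
The Laplacian variants listed above are all built from the same adjacency matrix. A minimal construction sketch using plain lists of lists (an illustration, not library code):

```python
# Sketch of the Laplacian variants listed above, built from an adjacency
# matrix A: L = D - A, normalized L_N = D^{-1/2} L D^{-1/2},
# generalized L_G = D^{-1} L, relaxed L_rho = L - rho * D.

def laplacians(A, rho=0.5):
    n = len(A)
    d = [sum(row) for row in A]        # node degrees (diagonal of D)
    L = [[(d[i] if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    LN = [[L[i][j] / ((d[i] * d[j]) ** 0.5) for j in range(n)]
          for i in range(n)]
    LG = [[L[i][j] / d[i] for j in range(n)] for i in range(n)]
    Lr = [[L[i][j] - rho * (d[i] if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    return L, LN, LG, Lr

A = [[0.0, 1.0], [1.0, 0.0]]           # a single edge
L, LN, LG, Lr = laplacians(A)
```

Rows of L sum to zero by construction, which is why the constant vector is always an eigenvector with eigenvalue 0 and the second smallest eigenvalue carries the cut information mentioned above.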

1144 | Hierarchical grouping to optimize an objective function
- Ward
- 1963
Citation Context: ...under study. It is very important to notice that a good choice of pattern representation can lead to improvements in clustering performance. Whether it is possible to choose an appropriate set of features depends on the system under study. Once a representation is fixed, it is possible to choose an appropriate similarity measure among patterns. The most popular dissimilarity measure for metric representations is the distance, for instance the Euclidean one [25]. Clustering techniques can be roughly divided into two categories: hierarchical and partitioning. Hierarchical clustering techniques [39,74,83] are able to find structures which can be further divided into substructures, and so on recursively. The result is a hierarchical structure of groups known as a dendrogram. Partitioning clustering methods try to obtain a single partition of the data without any further sub-partitioning, as hierarchical algorithms do, and are often based on the optimization of an appropriate objective function. The result is the creation of separating hypersurfaces among clusters. For instance, consider two nonlinear clusters as in Figure 1. Standard partitioning methods (e.g., K-Means, Fuzzy c-Means, SOM and Neural G...

1077 | Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond
- Scholkopf, Smola
- 2002
Citation Context: ...dimensional feature space F. One of the most relevant aspects in applications is that it is possible to compute Euclidean distances in F without knowing Φ explicitly. This can be done using the so-called distance kernel trick [58,70]: ‖Φ(xi) − Φ(xj)‖² = (Φ(xi) − Φ(xj)) · (Φ(xi) − Φ(xj)) = Φ(xi)·Φ(xi) + Φ(xj)·Φ(xj) − 2Φ(xi)·Φ(xj) = K(xi, xi) + K(xj, xj) − 2K(xi, xj) (Eq. 26), in which the computation of distances between vectors in feature space is just a function of the input vectors. In fact, every algorithm in which input vectors appear only in dot products with other input vectors can be kernelized [71]. To simplify the notation we introduce the so-called Gram matrix K, where each element kij is the scalar product Φ(xi) · Φ(xj). Thus, Eq. 26 can be rewritten as ‖Φ(xi) − Φ(xj)‖² = kii + kjj − 2kij (Eq. 27). Examples of Mercer kernels are the following [78]: linear, K^(l)(xi, xj) = xi · xj (Eq. 28); polynomial of degree p, K^(p)(xi, xj) = (1 + xi · xj)^p with p ∈ N (Eq. 29); Gaussian, K^(g)(xi, xj) = exp(−‖xi − xj‖² / 2σ²) with σ ∈ R (Eq. 30). It is important to stress that using the linear kernel in Eq. 26 simply yields the Euclidean norm in the input space. Indeed, ‖xi − xj‖² = xi...

685 | Learning with Kernels: Support Vector - Schölkopf, Smola - 2002 |

661 |
Algebraic connectivity of graphs
- Fiedler
- 1973
(Show Context)
Citation Context ...epts in spectral graph theory. The basic idea is to construct a weighted graph from the initial data set, where each node represents a pattern and each weighted edge takes into account the similarity between two patterns. In this framework the clustering problem can be seen as a graph cut problem, which can be tackled by means of spectral graph theory. The core of this theory is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from the data. In fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [16,26]. The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof of the fact that these two approaches have the same mathematical foundation is reported. In particular, it has been shown by Dhillon et al. that Kernel K-Means and spectral clustering with the ratio association as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data points without assuming a prefixed shape for the clusters. These approaches have a slight similarity with hierarchica... |

592 | An introduction to kernel-based learning algorithms
- Müller, Mika, et al.
- 2001
(Show Context)
Citation Context ...is called a positive definite kernel (or Mercer kernel) if and only if K is symmetric (i.e. K(xi,xj) = K(xj,xi)) and the following condition holds: ∑_{i=1}^{n} ∑_{j=1}^{n} cicjK(xi,xj) ≥ 0 ∀n ≥ 2, (24) where cr ∈ R ∀r = 1, . . . , n. Each Mercer kernel can be expressed as follows: K(xi,xj) = Φ(xi) · Φ(xj), (25) where Φ : X → F performs a mapping from the input space X to a high-dimensional feature space F. One of the most relevant aspects in applications is that it is possible to compute Euclidean distances in F without knowing Φ explicitly. This can be done using the so-called distance kernel trick [58,70]: ‖Φ(xi) − Φ(xj)‖² = (Φ(xi) − Φ(xj)) · (Φ(xi) − Φ(xj)) = Φ(xi) · Φ(xi) + Φ(xj) · Φ(xj) − 2Φ(xi) · Φ(xj) = K(xi,xi) + K(xj,xj) − 2K(xi,xj), (26) in which the computation of distances between vectors in feature space is just a function of the input vectors. In fact, every algorithm in which input vectors appear only in dot products with other input vectors can be kernelized [71]. To simplify the notation we introduce the so-called Gram matrix K, where each element kij is the scalar product Φ(xi) · Φ(xj). Thus, Eq. 26 can be rewritten as: ‖Φ(xi) − Φ(xj)‖² = kii + kjj − 2kij. (27) Examples... |
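The Mercer condition of Eq. 24 can be probed empirically on a finite sample: the Gram matrix must be symmetric and positive semidefinite. A small sketch (numpy assumed, names illustrative):

```python
import numpy as np

# Build the Gram matrix k_ij = K(x_i, x_j) for a finite sample.
def gram_matrix(X, K):
    n = len(X)
    return np.array([[K(X[i], X[j]) for j in range(n)] for i in range(n)])

def gaussian(x, y, sigma=1.0):
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

X = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 0.5])]
G = gram_matrix(X, gaussian)

assert np.allclose(G, G.T)                     # symmetry: K(xi,xj) = K(xj,xi)
assert np.linalg.eigvalsh(G).min() > -1e-10    # Eq. 24: c^T G c >= 0 for all c
```

A negative eigenvalue of G on any sample would certify that the candidate function is not a Mercer kernel.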

570 | A tutorial on spectral clustering
- von Luxburg
- 2007
(Show Context)
Citation Context ...on as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data points without assuming a prefixed shape for the clusters. These approaches have a slight similarity with hierarchical methods in the use of an adjacency structure, the main difference being the philosophy of the grouping procedure. A comparison of some spectral clustering methods has been recently proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently, kernel methods have also been applied to Fuzzy c-Varieties [50] with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to learn the shape of kernels automatically from data, as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Ke... |

527 |
Numerical Taxonomy: The Principles and Practice of Numerical Classification
- Sneath, Sokal
- 1973

497 | Survey of clustering algorithms.
- Xu, Wunsch
- 2005
(Show Context)
Citation Context ...lustering methods are presented as extensions of the kernel K-means clustering algorithm. Key words: partitional clustering, Mercer kernels, kernel clustering, kernel fuzzy clustering, spectral clustering Email addresses: filippone@disi.unige.it (Maurizio Filippone), francesco.camastra@uniparthenope.it (Francesco Camastra), masulli@disi.unige.it (Francesco Masulli), rovetta@disi.unige.it (Stefano Rovetta). Preprint submitted to Elsevier Science April 30, 2007 1 Introduction Unsupervised data analysis using clustering algorithms provides a useful tool to explore data structures. Clustering methods [39,87] have been addressed in many contexts and disciplines such as data mining, document retrieval, image segmentation and pattern classification. The aim of clustering methods is to group patterns on the basis of a similarity (or dissimilarity) criterion, where groups (or clusters) are sets of similar patterns. Crucial aspects in clustering are pattern representation and the similarity measure. Each pattern is usually represented by a set of features of the system under study. It is very important to notice that a good choice of representation of patterns can lead to improvements in clustering perfor... |

466 | Co-clustering documents and words using bipartite spectral graph partitioning
- Dhillon
- 2001
(Show Context)
Citation Context ...h as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means proposed in [14,93,94] has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The rest of the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 we discuss spectral clustering, while in section 5 we report... |

397 |
Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control
- Aizerman, Braverman, et al.
- 1964
(Show Context)
Citation Context ...orithm [55], based on a soft adaptation rule. This technique resembles the SOM in that not only the winner codevector is adapted. It is different in that codevectors are not constrained to be on a grid, and the adaptation of the codevectors near the winner is controlled by a criterion based on distance ranks. Each time a pattern x is presented, all the codevectors vj are ranked according to their distance to x (the closest obtains the lowest rank). Denoting with ρj the rank of the distance between x and the codevector vj, the update rule is: ∆vj = ǫ(t)hλ(ρj)(x − vj) (10) with ǫ(t) ∈ [0, 1] gradually lowered as t increases and hλ(ρj) a function decreasing in ρj with characteristic decay λ; usually hλ(ρj) = exp(−ρj/λ). The Neural Gas algorithm is the following: (1) Initialize the codebook V by randomly picking vectors from X (2) Initialize the time parameter t = 0 (3) Randomly pick an input x from X (4) Order all elements vj of V according to their distance to x, obtaining the ρj (5) Adapt the codevectors according to Eq. 10 (6) Increase the time parameter t = t + 1 (7) If t < tmax, go to step 3. 2.4 Fuzzy clustering methods Bezdek [8] introduced the concept of hard and fuzzy partitio... |
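A minimal sketch of the Neural Gas loop above (steps 1-7 with the rank-based rule of Eq. 10); the exponential annealing schedules for ǫ(t) and λ are illustrative choices, not prescribed by the text:

```python
import numpy as np

def neural_gas(X, k=3, t_max=500, eps0=0.5, eps_f=0.01,
               lam0=2.0, lam_f=0.1, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=k, replace=False)].copy()  # (1) init codebook from X
    for t in range(t_max):                                   # (2)-(7) main loop
        frac = t / t_max
        eps = eps0 * (eps_f / eps0) ** frac                  # epsilon(t), gradually lowered
        lam = lam0 * (lam_f / lam0) ** frac                  # characteristic decay lambda
        x = X[rng.integers(len(X))]                          # (3) random input pattern
        d = np.linalg.norm(V - x, axis=1)
        rho = np.argsort(np.argsort(d))                      # (4) distance ranks rho_j
        V += eps * np.exp(-rho / lam)[:, None] * (x - V)     # (5) Eq. 10, soft update
    return V
```

Every codevector moves at each presentation, weighted by h_λ(ρ_j) = exp(−ρ_j/λ); as λ shrinks, the update degenerates toward winner-take-all, as in on-line K-Means.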

393 |
Functions of positive and negative type and their connection with the theory of integral equations
- Mercer
- 1909
(Show Context)
Citation Context ...cale estimation techniques as developed in the robust clustering literature for M-estimators [35,59]. The value of ηi can be updated at each step of the algorithm or can be fixed for all iterations. The former approach can lead to instabilities, since the equations were derived assuming ηi fixed. In the latter case a good estimation of ηi can be done only starting from an approximate solution. For this reason, the Possibilistic c-Means is often run as a refining step after a Fuzzy c-Means. 3 Kernel Clustering Methods In machine learning, the use of kernel functions [57] was introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support Vector Machines (SVMs) [17], which perform better than other classification algorithms in several problems. The success of SVMs has led to the extension of kernels to other learning algorithms (e.g., Kernel PCA [70]). The choice of the kernel is crucial for incorporating a priori knowledge about the application, for which it is possible to design ad hoc kernels. 3.1 Mercer kernels We recall the definition of Mercer kernels [2,68], considering, for the sake of simplicity, vectors in Rd instead of Cd. ... |

345 |
“Neural-gas” network for vector quantization and its application to time-series prediction
- Martinetz, Berkovich, et al.
- 1993
(Show Context)
Citation Context ...es an important purpose, namely, that of limiting the flexibility of the mapping in the first training cycles, and gradually increasing it (while decreasing the magnitude of updates to ensure convergence) as more cycles are performed. The strategy is similar to that of other algorithms, including those described in the following, in the capacity control of the method, which has the effect of avoiding local minima. This accounts for the fast convergence often reported in experimental works. 2.3 Neural Gas Another technique that tries to minimize the distortion error is the Neural Gas algorithm [55], based on a soft adaptation rule. This technique resembles the SOM in that not only the winner codevector is adapted. It is different in that codevectors are not constrained to be on a grid, and the adaptation of the codevectors near the winner is controlled by a criterion based on distance ranks. Each time a pattern x is presented, all the codevectors vj are ranked according to their distance to x (the closest obtains the lowest rank). Denoting with ρj the rank of the distance between x and the codevector vj, the update rule is: ∆vj = ǫ(t)hλ(ρj)(x − vj) (10) with ǫ(t) ∈ [0, 1] grad... |

332 | On clusterings: Good, bad and spectral
- Kannan, Vempala, et al.
- 2000

330 |
Possibilistic approach to clustering,”
- Krishnapuram, Keller
- 1993
(Show Context)
Citation Context ... of the Lagrangian are computed with respect to the uih and vi and are set to zero. This yields the iteration scheme given by the following equations: u_{ih}^{−1} = ∑_{j=1}^{c} (‖xh − vi‖/‖xh − vj‖)^{2/(m−1)}, (15) vi = ∑_{h=1}^{n} (uih)^m xh / ∑_{h=1}^{n} (uih)^m. (16) At each iteration it is possible to evaluate the amount of change of the memberships and codevectors, and the algorithm can be stopped when these quantities fall below a predefined threshold. At the end a soft partitioning of the input space is obtained. 2.5 Possibilistic clustering methods As a further modification of the K-Means algorithm, the possibilistic approach [46,47] relaxes the probabilistic constraint on the membership of a pattern to all clusters. In this way a pattern can have a low membership to all clusters in the case of outliers, whereas, for instance, in the case of overlapping clusters, it can have a high membership to more than one cluster. In this framework the membership represents a degree of typicality not depending on the membership values of the same pattern to other clusters. Again the optimization procedure is the Picard iteration method, since the functional depends both on memberships and codevectors. 2.5.1 Possibilistic c-Means T... |
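One Picard iteration of Fuzzy c-Means (Eqs. 15 and 16) can be sketched as follows (a vectorized numpy illustration, not the authors' code; a small eps guards against zero distances):

```python
import numpy as np

def fcm_step(X, V, m=2.0, eps=1e-12):
    # Squared distances ||x_h - v_i||^2, shape (n, c).
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + eps
    # Eq. 15: u_ih^{-1} = sum_j (||x_h - v_i|| / ||x_h - v_j||)^{2/(m-1)};
    # on squared distances the exponent becomes 1/(m-1).
    U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(-1)
    # Eq. 16: v_i = sum_h u_ih^m x_h / sum_h u_ih^m
    Um = U ** m
    V_new = (Um.T @ X) / Um.sum(0)[:, None]
    return U, V_new

X = np.random.default_rng(2).normal(size=(10, 2))
U, V_new = fcm_step(X, X[:3].copy())
assert np.allclose(U.sum(axis=1), 1.0)   # memberships of each pattern sum to one
```

Iterating `fcm_step` until U and V stabilize reproduces the stopping rule described in the text.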

321 | Deterministic annealing for clustering, compression, classification, regression, and related optimization problems," in
- Rose
- 1998
(Show Context)
Citation Context ...sellation of the feature space. An on-line version of the kernel K-Means algorithm can be found in [70]. A further version of K-Means in feature space has been proposed by Girolami [30]. In his formulation the number of clusters is denoted by c and a fuzzy membership matrix U is introduced. Each element uih denotes the fuzzy membership of the point xh to the Voronoi set πΦi. This algorithm tries to minimize the following functional with respect to U: JΦ(U, VΦ) = ∑_{h=1}^{n} ∑_{i=1}^{c} uih ‖Φ(xh) − vΦi‖². (39) The minimization technique used by Girolami is Deterministic Annealing [65], which is a stochastic method for optimization. A parameter controls the fuzziness of the memberships during the optimization and can be thought of as proportional to the temperature of a physical system. This parameter is gradually lowered during the annealing, and at the end of the procedure the memberships become crisp; therefore a tessellation of the feature space is found. This linear partitioning in F corresponds, back in the input space, to a nonlinear partitioning of the input space. 3.3 Kernel SOM The kernel version of the SOM algorithm [38,53] is based on the distance kernel trick. The method t... |
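In Eq. 39 the centroid vΦi lives in feature space and cannot be computed directly, but when it is expanded as the mean of its cluster's images, the distance ‖Φ(xh) − vΦi‖² reduces to Gram-matrix entries. A hedged sketch of this standard expansion (plain mean, illustrative names):

```python
import numpy as np

# ||Phi(x_h) - mean_{j in pi_i} Phi(x_j)||^2 expanded via the kernel trick:
#   k_hh - (2/|pi_i|) sum_j k_hj + (1/|pi_i|^2) sum_{j,l} k_jl
def dist2_to_feature_centroid(G, h, members):
    m = len(members)
    return (G[h, h]
            - 2.0 * G[h, members].sum() / m
            + G[np.ix_(members, members)].sum() / m ** 2)

# Sanity check with a linear kernel, where the centroid is explicit.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
G = X @ X.T
explicit = np.sum((X[0] - X[[1, 2]].mean(0)) ** 2)   # distance to centroid [1, 1]
assert abs(dist2_to_feature_centroid(G, 0, [1, 2]) - explicit) < 1e-12
```

This expression is what makes kernel K-Means-type assignments possible without ever forming Φ explicitly.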

298 | On kernel-target alignment.
- Cristianini, Shawe-Taylor, et al.
- 2001
(Show Context)
Citation Context ...clustering methods has been recently proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently, kernel methods have also been applied to Fuzzy c-Varieties [50] with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to learn the shape of kernels automatically from data, as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means proposed in [14,93,94] has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [6... |

264 | Multiclass spectral clustering
- Yu, Shi
- 2003
(Show Context)
Citation Context ...deal case of two disconnected subgraphs, D^{−1/2}e2 assumes just two values; this makes it possible to cluster together the components of D^{−1/2}e2 that share the same value. In a real case the splitting point must be chosen to cluster the components of D^{−1/2}e2, and the authors suggest using the median value, zero, or the value for which the clustering gives the minimum Ncut. The successive partitioning can be made recursively on the obtained sub-graphs, or it is possible to use more than one eigenvector. An interesting approach for simultaneously clustering the data set into more than two clusters can be found in [89]. 4.2 Ng, Jordan and Weiss algorithm The algorithm proposed by Ng et al. [60] uses the adjacency matrix A as the Laplacian. This definition makes it possible to consider the eigenvectors associated with the largest eigenvalues as the “good” ones for clustering. This has a computational advantage, since the principal eigenvectors of sparse matrices can be computed efficiently using the power iteration technique. The idea is the same as in other spectral clustering methods, i.e., one finds a new representation of the patterns on the first k eigenvectors of the Laplacian of the graph. The algorithm is c... |
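A sketch of the bipartition step just described: take the second smallest eigenvector e2 of the normalized Laplacian, map it back through D^{−1/2}, and split its components at the median. The affinity matrix A is assumed given; this is an illustration, not the authors' implementation:

```python
import numpy as np

def spectral_bipartition(A):
    d = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_isqrt @ A @ D_isqrt   # normalized Laplacian
    w, V = np.linalg.eigh(L)                     # eigenvalues in ascending order
    indicator = D_isqrt @ V[:, 1]                # components of D^{-1/2} e2
    return indicator > np.median(indicator)      # split at the median value

# Two weakly connected triangles should be separated.
A = np.full((6, 6), 0.01)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
labels = spectral_bipartition(A)
assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```

Recursing on each side of the split, or keeping more than one eigenvector, gives the multiway variants mentioned in the text.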

225 | Kernel k-means: spectral clustering and normalized cuts
- Dhillon, Guan, et al.
- 2004
(Show Context)
Citation Context ...ry is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from data. In fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [16,26]. The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof of the fact that these two approaches have the same mathematical foundation is reported. In particular, it has been shown by Dhillon et al. that Kernel K-Means and spectral clustering with the ratio association as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data points without assuming a prefixed shape for the clusters. These approaches have a slight similarity with hierarchical methods in the use of an adjacency structure, the main difference being the philosophy of the grouping procedure. A comparison of some spectral clustering methods has been recently proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzz... |

222 | The use of multiple measurements in taxonomic problems. Annual Eugenics. 7 - Fisher - 1936 |

215 | Support vector clustering - Ben-Hur, Horn, et al. - 2001 |

196 | Spectral relaxation for K-means clustering
- Zha, Ding, et al.
- 2002

180 |
Lower bounds for the partitioning of graphs
- Donath, Hoffman
- 1973
(Show Context)
Citation Context ... the metric used in [92] involves the minimization of the following functional: JΦ(U, V) = ∑_{h=1}^{n} ∑_{i=1}^{c} (uih)^m ‖Φ(xh) − Φ(vi)‖² + ∑_{i=1}^{c} ηi ∑_{h=1}^{n} (1 − uih)^m (63) Minimization leads to: u_{ih}^{−1} = 1 + (‖Φ(xh) − Φ(vi)‖²/ηi)^{1/(m−1)}, (64) which can be rewritten, considering a Gaussian kernel, as: u_{ih}^{−1} = 1 + (2(1 − K(xh,vi))/ηi)^{1/(m−1)}. (65) The update of the codevectors follows: vi = ∑_{h=1}^{n} (uih)^m K(xh,vi) xh / ∑_{h=1}^{n} (uih)^m K(xh,vi). (66) The computation of the ηi is straightforward. 4 Spectral Clustering Spectral clustering methods [19] have a strong connection with graph theory [16,24]. A comparison of some spectral clustering methods has been recently proposed in [79]. Let X = {x1, . . . ,xn} be the set of patterns to cluster. Starting from X, we can build a complete, weighted undirected graph G(V,A) having a set of nodes V = {v1, . . . , vn} corresponding to the n patterns and edges defined through the n × n adjacency (also called affinity) matrix A. The adjacency matrix of a weighted graph is the matrix whose element aij represents the weight of the edge connecting nodes i and j. Since the graph is undirected, the property aij = aji holds. Adjacency between two patterns c... |
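The graph construction described here can be sketched directly (a Gaussian affinity is one common choice; the weighting is illustrative, not prescribed by the text):

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    # a_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with no self-loops
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A   # unnormalized Laplacian L = D - A

X = np.random.default_rng(1).normal(size=(8, 2))
A = affinity_matrix(X)
L = laplacian(A)
assert np.allclose(A, A.T)                 # undirected graph: a_ij = a_ji
assert np.allclose(L.sum(axis=1), 0.0)     # each row of L sums to zero
assert np.linalg.eigvalsh(L)[0] > -1e-10   # L is positive semidefinite
```

The smallest eigenvalue of L is 0 (with the constant eigenvector), which is why the second smallest eigenvalue carries the cut information.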

178 | Support vector domain description
- Tax, Duin
- 1999
(Show Context)
Citation Context ...he soft rule for the update to the codevectors in feature space. Rewriting Eq. 10 in feature space for the update of the codevectors we have: ∆vΦj = ǫ hλ(ρj)(Φ(x) − vΦj). (46) Here ρj is the rank of the distance ‖Φ(x) − vΦj‖. Again it is possible to write vΦj as a linear combination of the Φ(xi) as in Eq. 40, which allows such distances to be computed by means of the kernel trick. As in the kernel SOM technique, the updating rule for the centroids becomes an updating rule for the coefficients of this combination. 3.5 One Class SVM This approach provides a support vector description in feature space [36,37,77]. The idea is to use kernels to project data into a feature space and then to find the sphere enclosing almost all data, i.e., excluding outliers. Formally, a radius R and the center v of the smallest enclosing sphere in feature space are defined. The constraint is thus: ‖Φ(xj) − v‖² ≤ R² + ξj ∀j, (47) where the non-negative slack variables ξj have been added. The Lagrangian for this problem is defined [11]: L = R² − ∑_j (R² + ξj − ‖Φ(xj) − v‖²)βj − ∑_j ξjµj + C ∑_j ξj, (48) where βj ≥ 0 and µj ≥ 0 are Lagrange multipliers, C is a constant and C ∑_j ξj is a penalty term. Computing the... |
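Stationarity of the Lagrangian in Eq. 48 expands the center as v = ∑j βj Φ(xj), so the distance of any point to the center is again a pure kernel expression. A hedged sketch with the multipliers βj assumed already given (illustrative names):

```python
import numpy as np

# ||Phi(x) - v||^2 with v = sum_j beta_j Phi(x_j):
#   K(x,x) - 2 sum_j beta_j K(x, x_j) + sum_{j,l} beta_j beta_l K(x_j, x_l)
def dist2_to_center(x, X, beta, K):
    cross = sum(b * K(x, xj) for b, xj in zip(beta, X))
    center = sum(bi * bj * K(xi, xj)
                 for bi, xi in zip(beta, X)
                 for bj, xj in zip(beta, X))
    return K(x, x) - 2.0 * cross + center

# Sanity check with the linear kernel, where the center is explicit.
lin = lambda a, b: float(np.dot(a, b))
Xs = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
beta = [0.5, 0.5]                 # illustrative multipliers summing to one
x = np.array([1.0, 2.0])
assert abs(dist2_to_center(x, Xs, beta, lin) - 4.0) < 1e-12   # center is [1, 0]
```

Comparing this distance with R² is what classifies a test point as inside or outside the enclosing sphere.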

174 | Weighted graph cuts without eigenvectors a multilevel approach
- DHILLON, GUAN, et al.
- 2007

173 |
Spectral k-way ratio-cut partitioning and clustering
- Chan, Schlag, et al.
- 1994
(Show Context)
Citation Context ...(zTi zi)^{1/2}, (89) we obtain: J(S1, . . . , Sc) = ∑_{i=1}^{c} yTi Ayi = tr(YT AY) (90) 5.3 A unified view of the two approaches Comparing Eq. 90 and Eq. 85 it is possible to see the perfect equivalence between kernel K-means and the spectral approach to clustering when one wants to maximize the ratio association. To this end, indeed, it is enough to set the weights in the weighted kernel K-means equal to one, obtaining the classical kernel K-means. It is possible to obtain more general results when one wants to optimize other objective functions in the spectral approach, such as the ratio cut [13], the normalized cut and the Kernighan-Lin [41] objective. For instance, in the case of the minimization of the normalized cut, which is one of the most used objective functions, the functional to minimize is: J(S1, . . . , Sc) = tr(YT D^{−1/2}AD^{−1/2}Y) (91) Thus the correspondence with the objective in the kernel K-means requires choosing Y = D^{1/2}Z, W = D and K = D^{−1}AD^{−1}. It is worth noting that for an arbitrary A it is not guaranteed that D^{−1}AD^{−1} is positive definite. In this case the kernel K-means will not necessarily converge. To cope with this problem, in [20] the authors propose to enforce ... |
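The trace form of Eq. 90 can be checked numerically: with crisp indicator vectors zi normalized to yi = zi/(zTi zi)^{1/2}, tr(YT AY) equals the ratio association ∑i assoc(Si)/|Si|. A small numpy sketch on a toy affinity matrix (illustrative data):

```python
import numpy as np

A = np.array([[0.0, 5.0, 1.0, 0.0],
              [5.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 6.0],
              [0.0, 1.0, 6.0, 0.0]])
labels = np.array([0, 0, 1, 1])

Z = np.eye(2)[labels]              # crisp indicator matrix, one column per cluster
Y = Z / np.sqrt(Z.sum(axis=0))     # y_i = z_i / (z_i^T z_i)^{1/2}

trace_form = np.trace(Y.T @ A @ Y)                       # Eq. 90
ratio_assoc = sum(A[np.ix_(labels == i, labels == i)].sum() / (labels == i).sum()
                  for i in range(2))
assert abs(trace_form - ratio_assoc) < 1e-12
```

Relaxing Y from indicator columns to an arbitrary orthonormal matrix is what turns this combinatorial objective into the eigenvalue problem solved by spectral methods.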

168 |
Multisurface method of pattern separation for medical diagnosis applied to breast cytology.
- Wolberg, Mangasarian
- 1990
(Show Context)
Citation Context ... clustering [40,80,81,90]. Recently, kernel methods have also been applied to Fuzzy c-Varieties [50] with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to learn the shape of kernels automatically from data, as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means proposed in [14,93,94] has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A se... |

141 | Learning segmentation by random walks.
- Meila, Shi
- 2001

133 | Spectral biclustering of microarray data: coclustering genes and conditions.
- Kluger, Basri, et al.
- 2003
(Show Context)
Citation Context ...ancer [85] and Iris [28]. Kernel Fuzzy c-Means proposed in [14,93,94] has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The rest of the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 we discuss spectral clustering, while in section 5 we report the equivalence between spect... |

118 | Learning spectral clustering.
- Bach, Jordan
- 2003

94 | Semi-supervised graph clustering: a kernel approach
- Kulis, Basu, et al.
(Show Context)
Citation Context ... while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition have been proposed in [48]. The protein sequence clustering problem has been faced using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 we discuss spectral clustering, while in section 5 we report the equivalence between spectral and kernel clustering methods. In the last section conclusions are drawn. 2 Partitioning Methods In this section we ... |

94 | Theory of reproducing kernels and its applications - Saitoh - 1988 |

91 | Semi-supervised protein classification using cluster kernels
- Weston, Leslie, et al.
- 2004
(Show Context)
Citation Context ...hods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition have been proposed in [48]. The protein sequence clustering problem has been faced using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 we discuss spectral clustering, while in section 5 we report the equivalence between spectral and kernel clustering methods. In the last section conclusions are drawn. 2 Partitioning Methods In this section we briefly recall some basic facts about partitioning clustering methods and we will report the clustering methods for w... |

85 | Learning eigenfunctions links spectral embedding and kernel PCA
- Bengio, Delalleau, et al.
- 2004
(Show Context)
Citation Context ...ted to the data set. 5 A Unified View of Spectral and Kernel Clustering Methods Recently a possible connection between unsupervised kernel algorithms and spectral methods has been studied to find whether these two seemingly different approaches can be described under a more general framework. The hint for this unifying theory lies the adjacency structure constructed by both these approaches. In the spectral approach there is an adjacency between patterns which is the analogous of the kernel functions in kernel methods. A direct connection between Kernel PCA and spectral methods has been shown [6,7]. More recently a unifying view of kernel K-means and spectral clustering methods [20,21,23] has been pointed out. In this section we show explicitly the equivalence between them highlighting that these two approaches have the same foundation and in particular that both can be viewed as a matrix trace maximization problem. 27 5.1 Kernel clustering methods objective To show the direct equivalence between kernel and spectral clustering methods we introduce the weighted version of the kernel K-means [23]. We introduce a weight matrix W having weights wk on the diagonal. Recalling that we denote w... |

84 | A comparison of spectral clustering algorithms.
- Verma, Meila
- 2003
(Show Context)
Citation Context ... reported. In particular it has been shown by Dhillon et al. that Kernel K-Means and spectral clustering with the ratio association as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data avoiding to deal with a prefixed shape of clusters. These approaches have a slight similarity with hierarchical methods in the use of an adjacency structure with the main difference in the philosophy of the grouping procedure. A comparison of some spectral clustering methods has been recently proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties also [50] with the aim of finding varieties in feature space and there are some interesting clustering 3 methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed in order to learn automatically the shape of kernels from data as in [4,18,27,56]. Regarding the applications, most of the... |

78 | The possibilistic c-means algorithm: insights and recommendations.
- Krishnapuram, Keller
- 1996
(Show Context)
Citation Context ... of the Lagrangian are computed with respect to the uih and vi and are set to zero. This yields the iteration scheme of these equations: u−1ih = c ∑ j=1 ( ‖xh − vi‖ ‖xh − vj‖ ) 2 m−1 , (15) vi = ∑n h=1 (uih) m xh ∑n h=1 (uih) m . (16) At each iteration it is possible to evaluate the amount of change of the memberships and codevectors and the algorithm can be stopped when these quantities reach a predefined threshold. At the end a soft partitioning of the input space is obtained. 2.5 Possibilistic clustering methods As a further modification of the K-Means algorithm, the possibilistic approach [46,47] relaxes the probabilistic constraint on the membership of a pattern to all clusters. In this way a pattern can have a low membership to all clusters in the case of outliers, whereas for instance, in the situation of overlapped clusters, it can have high membership to more than one cluster. In this framework the membership represents a degree of typicality not depending on the membership values of the same pattern to other clusters. Again the optimization procedure is the Picard iteration method, since the functional depends both on memberships and codevectors. 10 2.5.1 Possibilistic c-Means T... |
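The Picard iteration excerpted above (Eqs. 15 and 16) can be sketched directly in NumPy. This is a minimal illustration, not the survey's reference implementation; the function name, initialization from random data points, and stopping rule are my own assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Picard iteration for fuzzy c-means in the survey's notation (Eqs. 15-16).

    X: (n, d) data; c: number of clusters; m > 1: fuzzifier.
    Returns memberships U of shape (c, n) and codevectors V of shape (c, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)].copy()   # init codevectors from data
    for _ in range(max_iter):
        # distances ||x_h - v_i||, shape (c, n); tiny eps avoids division by zero
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # Eq. 15: u_ih^{-1} = sum_j (d_ih / d_jh)^(2/(m-1))
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)), axis=1)
        # Eq. 16: v_i = sum_h u_ih^m x_h / sum_h u_ih^m
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) < tol:              # memberships/codevectors settled
            return U, V_new
        V = V_new
    return U, V
```

By construction the memberships of each pattern sum to one over the clusters, which is exactly the probabilistic constraint that the possibilistic approach discussed in the excerpt then relaxes.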

74 | Robust image segmentation using FCM with spatial constraints based on a new kernel-induced distance measure.
- Chen, Zhang
- 2004
(Show Context)
Citation Context ...been applied to Fuzzy c-Varieties also [50] with the aim of finding varieties in feature space and there are some interesting clustering 3 methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed in order to learn automatically the shape of kernels from data as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwrit... |

73 | A unified view of kernel k-means, spectral clustering and graph cuts
- Dhillon, Guan, et al.
- 2005
(Show Context)
Citation Context ...ry is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from data. In fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [16,26]. The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof of the fact that these two approaches have the same mathematical foundation is reported. In particular it has been shown by Dhillon et al. that Kernel K-Means and spectral clustering with the ratio association as the objective function are perfectly equivalent [20,21,23]. The core of both approaches lies in their ability to construct an adjacency structure between data avoiding to deal with a prefixed shape of clusters. These approaches have a slight similarity with hierarchical methods in the use of an adjacency structure with the main difference in the philosophy of the grouping procedure. A comparison of some spectral clustering methods has been recently proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzz... |

72 | Classification of radar returns from the ionosphere using neural networks
- Sigillito, Wing, et al.
- 1989
(Show Context)
Citation Context ...spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties also [50] with the aim of finding varieties in feature space and there are some interesting clustering 3 methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed in order to learn automatically the shape of kernels from data as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and c... |

66 | A unifying theorem for spectral embedding and clustering.
- Brand, Huang
- 2003
(Show Context)
Citation Context ...n be seen as a linear operator on G. In addition to this definition of Laplacian there are alternative definitions: 23 • Normalized Laplacian LN = D − 1 2 LD− 1 2 • Generalized Laplacian LG = D −1L • Relaxed Laplacian Lρ = L − ρD Each definition is justified by special properties desirable in a given context. The spectral decomposition of the Laplacian matrix can give useful information about the properties of the graph. In particular it can be seen that the second smallest eigenvalue of L is related to the graph cut [26] and the corresponding eigenvector can cluster together similar patterns [10,16,72]. Spectral approach to clustering has a strong connection with Laplacian Eigenmaps [5]. The dimensionality reduction problem aims to find a proper low dimensional representation of a data set in a high dimensional space. In [5], each node in the graph, which represents a pattern, is connected just with nodes corresponding to neighboring patterns and the spectral decomposition of the Laplacian of the obtained graph permits to find a low dimensional representation of X. The authors point out the close connection with spectral clustering and Local Linear Embedding [67] providing theoretical and e... |

56 | Mercer kernel based clustering in feature space.
- Girolami
- 2002
(Show Context)
Citation Context ...M to find a minimum enclosing sphere in feature space able to enclose almost all data in feature space excluding outliers. The computed hypersphere corresponds to nonlinear surfaces in input space enclosing groups of patterns. The Support Vector Clustering algorithm allows to assign labels to patterns in input space enclosed by the same surface. In the next subsections we will outline these three approaches. 3.2 Kernel K-Means Given the data set X, we map our data in some feature space F , by means of a nonlinear map Φ and we consider k centers in feature space (vΦi ∈ F with i = 1, . . . , k) [30,70]. We call the set V Φ = (vΦ1 , . . . ,v Φ k ) Feature Space Codebook since in our representation the centers in the feature space play the same role of the codevectors in the input space. In analogy with the codevectors in the input space, we define for each center vΦi its Voronoi Region and Voronoi Set in feature space. The Voronoi Region in feature space (RΦi ) of the center v Φ i is the set of all vectors in F for which vΦi is the closest vector RΦi = { xΦ ∈ F ∣ ∣ ∣ ∣ i = arg min j ∥ ∥ ∥xΦ − vΦj ∥ ∥ ∥ } . (33) The Voronoi Set in Feature Space πΦi of the center v Φ i is the set of all vector... |
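The Kernel K-Means setting excerpted above (feature-space codebook, Voronoi sets in F) can be illustrated with a short NumPy sketch that operates only on a precomputed Gram matrix, since the distance of Φ(x_h) from a feature-space center expands entirely in kernel evaluations. The function name, the random initial assignment, and the empty-cluster re-seeding are my own assumptions for this sketch.

```python
import numpy as np

def kernel_kmeans(K, k, max_iter=100, seed=0):
    """Kernel K-Means on an (n, n) Gram matrix K.

    Each center v_i^Phi is implicitly the mean of its Voronoi set in feature
    space; ||Phi(x_h) - v_i^Phi||^2 = K_hh - 2 mean_j K_hj + mean_{j,l} K_jl,
    with j, l ranging over the current members of cluster i (kernel trick).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)                  # random initial partition
    for _ in range(max_iter):
        dist = np.empty((n, k))
        for i in range(k):
            idx = np.flatnonzero(labels == i)
            if idx.size == 0:                            # re-seed an empty cluster
                idx = rng.integers(0, n, size=1)
            dist[:, i] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)                 # reassign to nearest center
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With the linear kernel K = X X^T this reduces to batch K-Means in input space, consistent with Eq. 31 of the survey.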

54 | Optimal cluster preserving embedding of nonmetric proximity data
- Roth, Laub, et al.
- 2003
(Show Context)
Citation Context ...ective. For instance, in the case of the minimization of the normalized cut which is one of the most used objective functions, the functional to minimize is: J(S1, . . . , Sc) = tr(Y T D−1/2AD−1/2Y ) (91) Thus the correspondence with the objective in the kernel K-means imposes to choose Y = D1/2Z, W = D and K = D−1AD−1. It is worth noting that for an arbitrary A it is not guaranteed that D−1AD−1 is definite positive. In this case the kernel K-means will not necessarily converge. To cope with this problem in [20] the authors propose to enforce positive definiteness by means of a diagonal shift [66]: K = σD−1 + D−1AD−1 (92) where σ is a positive coefficient large enough to guarantee the positive definiteness of K. Since the mathematical foundation of these methods is the same, it is possible to choose which algorithm to use for clustering choosing, for instance, the approach with the less computational complexity for the particular application. 6 Conclusions Clustering is a classical problem in pattern recognition. Recently spectral and kernel methods for clustering have provided new ideas and interpretations to the solution of this problem. In this paper spectral and kernel methods for ... |

49 | A novel kernelized fuzzy C-means algorithm with application in medical image segmentation
- Zhang, Chen
- 2004
(Show Context)
Citation Context ...been applied to Fuzzy c-Varieties also [50] with the aim of finding varieties in feature space and there are some interesting clustering 3 methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed in order to learn automatically the shape of kernels from data as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwrit... |

49 | Neuronale Netze
- Ritter, Schulten, et al.
- 1990
(Show Context)
Citation Context ...) and s = (s1, s2) is: drs = |r1 − s1 |+ |r2 − s2 |. (5) The SOM algorithm is the following: (1) Initialize the codebook V randomly picking from X (2) Initialize the set C of connections to form the rectangular grid of dimension n1 × n2 (3) Initialize t = 0 (4) Randomly pick an input x from X (5) Determine the winner s(x) = arg min vj∈V ‖x − vj‖ (6) (6) Adapt each codevector: ∆vj = ǫ(t)h(drs)(x − vj) (7) where h is a decreasing function of d as for instance: h(drs) = exp ( − d2rs 2σ2(t) ) (8) (7) Increment t (8) if t < tmax go to step 4 σ(t) and ǫ(t) are decreasing functions of t, for example [64]: σ(t) = σi ( σf σi )t/tmax , ǫ(t) = ǫi ( ǫf ǫi )t/tmax , (9) 7 where σi, σf and ǫi, ǫf are the initial and final values for the functions σ(t) and ǫ(t). A final note on the use of SOM for clustering. The method was originally devised as a tool for embedding multidimensional data into typically two dimensional spaces, for data visualization. Since then, it has also been frequently used as a clustering method, which was originally not considered appropriate because of the constraints imposed by the topology. However, the topology itself serves an important purpose, namely, that of limiting the ... |
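The eight SOM steps and the exponential schedules of Eq. 9 excerpted above translate into a compact online loop. A minimal sketch, assuming NumPy and a rectangular grid; the function name, default parameter values, and data-driven codebook initialization are my own choices.

```python
import numpy as np

def som(X, n1=5, n2=5, t_max=2000, sigma_i=2.0, sigma_f=0.1,
        eps_i=0.5, eps_f=0.01, seed=0):
    """Online SOM on a rectangular n1 x n2 grid (steps 1-8 of the excerpt).

    sigma(t) and eps(t) follow Eq. 9: x(t) = x_i * (x_f / x_i)^(t / t_max).
    Returns the trained codebook V and the grid coordinates of each unit.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=n1 * n2, replace=True)].copy()   # step 1: init from X
    # grid coordinates r = (r1, r2), for the Manhattan distance d_rs of Eq. 5
    grid = np.array([(i, j) for i in range(n1) for j in range(n2)])
    for t in range(t_max):
        frac = t / t_max
        sigma = sigma_i * (sigma_f / sigma_i) ** frac
        eps = eps_i * (eps_f / eps_i) ** frac
        x = X[rng.integers(n)]                                # step 4: random input
        s = np.argmin(np.linalg.norm(x - V, axis=1))          # step 5: winner, Eq. 6
        d_rs = np.abs(grid - grid[s]).sum(axis=1)             # Eq. 5 on the grid
        h = np.exp(-d_rs ** 2 / (2.0 * sigma ** 2))           # Eq. 8, decreasing in d
        V += eps * h[:, None] * (x - V)                       # step 6: adapt, Eq. 7
    return V, grid
```

Because each update is a convex combination of the old codevector and a data point, the codebook stays inside the bounding box of the data, one reason the method is usable for visualization as the excerpt recalls.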

47 | A novel kernel method for clustering
- Camastra, Verri
- 2005
(Show Context)
Citation Context ...e theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties also [50] with the aim of finding varieties in feature space and there are some interesting clustering 3 methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed in order to learn automatically the shape of kernels from data as in [4,18,27,56]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in ... |

36 | Spectral clustering of protein sequences
- Paccanaro, Casbon, Saqi
- 2006
(Show Context)
Citation Context ...ns of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition have been proposed in [48]. The protein sequence clustering problem has been faced using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 we discuss spectral clustering, while in section 5 we report the equivalence between spectral and kernel clustering methods. In the last section conclusions are drawn. 2 Partitioning Methods In this section we briefly recall some basic facts about partitioning clustering methods and we will report t... |

33 | Spectral kernel methods for clustering.
- Cristianini, Taylor, et al.
- 2001
(Show Context)
Citation Context ...., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition have been proposed in [48]. The protein sequence clustering problem has been faced using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in section 2, in section 4 w... |

31 | Between min cut and graph bisection
- Wagner, Wagner
- 1993
(Show Context)
Citation Context ...aining isolated nodes. To achieve a better balance in the cardinality of S and S it is suggested to optimize the normalized cut [72]: Ncut(S, S) = cut(S, S) ( 1 assoc(S, V ) + 1 assoc(S, V ) ) , (71) where the association assoc(S, V ) is also known as the volume of S: assoc(S, V ) = ∑ vi∈S,vj∈V aij ≡ vol(S) = ∑ vi∈S dii . (72) There are other definitions of functions to optimize (e.g., the conductance [40], the normalized association [72], ratio cut [21]). The complexity in optimizing these objective functions is very high (e.g., the optimization of the normalized cut is a NP-hard problem [72,82]) and for this reason it has been proposed to relax it by using spectral concepts of graph analysis. This relaxation can be formulated by introducing the Laplacian matrix [16]: L = D − A , (73) which can be seen as a linear operator on G. In addition to this definition of Laplacian there are alternative definitions: 23 • Normalized Laplacian LN = D − 1 2 LD− 1 2 • Generalized Laplacian LG = D −1L • Relaxed Laplacian Lρ = L − ρD Each definition is justified by special properties desirable in a given context. The spectral decomposition of the Laplacian matrix can give useful information about th... |
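The spectral relaxation described in the excerpt above can be shown in its simplest two-way form: build the Laplacian L = D − A of Eq. 73 and split the graph by the sign of the eigenvector of the second smallest eigenvalue (the Fiedler vector). A minimal sketch with a dense affinity matrix; the function name is my own.

```python
import numpy as np

def fiedler_bipartition(A):
    """Two-way spectral cut of a graph with affinity matrix A.

    The eigenvector of the second smallest eigenvalue of L = D - A (Eq. 73)
    clusters similar patterns together; its sign gives the bipartition.
    """
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # eigh returns eigenvalues ascending
    fiedler = eigvecs[:, 1]                  # second smallest eigenvalue's vector
    return fiedler >= 0                      # boolean side of the cut
```

This is the relaxation, not the NP-hard discrete optimization of the normalized cut itself; multiway versions use several eigenvectors followed by a rounding step such as K-Means.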

24 | An improved cluster labeling method for support vector clustering.
- Lee
- 2005
(Show Context)
Citation Context ...x2 (a) (b) Figure 3. One class SVM applied to two data sets with outliers. The gray line shows the projection in input space of the smallest enclosing sphere in feature space. In (a) a linear kernel and in (b) a Gaussian kernel have been used. define an adjacency structure in this form: 1 if R(y) < R ∀y ∈ Y 0 otherwise. (53) Clusters are simply the connected components of the graph with the adjacency matrix just defined. In the implementation in [36] the check is made sampling the line segment Y in 20 equidistant points. There are some modifications on this labeling algorithm (e.g., [49,88]) that improve performances. An improved version of SVC algorithm with application in handwritten digits recognition can be found in [15]. 3.5.2 Camastra and Verri algorithm A technique combining K-Means and One Class SVM can be found in [12]. The algorithm uses a K-Means-like strategy, i.e., moves repeatedly all centers vΦi in the feature space, computing One Class SVM on their Voronoi sets π Φ i , until no center changes anymore. Moreover, in order to introduce robustness against outliers, the authors have proposed to compute One Class SVM on πΦi (ρ) of each center v Φ i . The set π Φ i (ρ) ... |
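The SVC labeling step quoted above (Eq. 53 plus connected components, with the segment sampled in 20 points) can be sketched as follows. The sphere radius function is taken as a caller-supplied black box standing in for the One Class SVM solution; the function and parameter names are my own assumptions.

```python
import numpy as np

def svc_labels(X, radius_fn, R, n_samples=20):
    """Cluster labeling step of Support Vector Clustering (a sketch).

    radius_fn(y) must return the feature-space distance R(y) of point y from
    the sphere centre (obtainable from the One Class SVM solution). Patterns
    i and j are adjacent iff all n_samples points of the segment joining them
    stay inside the sphere (Eq. 53); clusters are the connected components.
    """
    n = len(X)
    adj = np.zeros((n, n), dtype=bool)
    ts = np.linspace(0.0, 1.0, n_samples)
    for i in range(n):
        for j in range(i + 1, n):
            segment = X[i] + ts[:, None] * (X[j] - X[i])
            adj[i, j] = adj[j, i] = all(radius_fn(y) < R for y in segment)
    labels = -np.ones(n, dtype=int)          # connected components by DFS
    c = 0
    for i in range(n):
        if labels[i] < 0:
            stack = [i]
            while stack:
                u = stack.pop()
                if labels[u] < 0:
                    labels[u] = c
                    stack.extend(np.flatnonzero(adj[u] & (labels < 0)))
            c += 1
    return labels
```

Sampling only 20 points per segment is a heuristic: a segment could dip outside the sphere between samples, which is precisely what the improved labeling schemes cited in the excerpt address.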

22 | The kernel self-organising map.
- Macdonald, Fyfe
- 2000
(Show Context)
Citation Context ... K(l)(xi,xi) + K (l)(xj,xj) − 2K (l)(xi,xj) = ‖Φ(xi) − Φ(xj)‖ 2 , (31) shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration we can think that kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In literature there are some applications of kernels in clustering. These methods can be broadly divided in three categories, which are based respectively on: • kernelization of the metric [86,92,93]; 13 • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space and the distances between patterns and centroids is computed by means of kernels: ‖Φ(xh) − Φ(vi)‖ 2 = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is made by mapping each pattern using the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ∥ ∥ ∥Φ(xh) − v Φ i ∥ ∥ ∥ 2 by means of the kernel trick. The description via support ... |
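The kernelization of the metric described above (Eq. 32) is a one-liner once a Mercer kernel is fixed, and with the linear kernel it must reproduce the squared Euclidean distance, as Eq. 31 states. A small sketch; the helper names are mine.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel of Eq. 30."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def feature_space_dist2(x, v, kernel):
    """Eq. 32: ||Phi(x) - Phi(v)||^2 computed via the kernel trick,
    without ever evaluating the map Phi explicitly."""
    return kernel(x, x) + kernel(v, v) - 2.0 * kernel(x, v)
```

Note that with the Gaussian kernel K(x, x) = 1 for every x, so the feature-space squared distance is 2 − 2K(x, v) and is always bounded by 2, a useful sanity check when kernelizing a metric-based algorithm.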

21 | Spectral Clustering and Kernel PCA are Learning Eigenfunctions. Tech. Rep., Département d'informatique et de recherche opérationnelle, Université de Montréal
- Bengio, Vincent, et al.
- 2003
(Show Context)
Citation Context ...ted to the data set. 5 A Unified View of Spectral and Kernel Clustering Methods Recently a possible connection between unsupervised kernel algorithms and spectral methods has been studied to find whether these two seemingly different approaches can be described under a more general framework. The hint for this unifying theory lies the adjacency structure constructed by both these approaches. In the spectral approach there is an adjacency between patterns which is the analogous of the kernel functions in kernel methods. A direct connection between Kernel PCA and spectral methods has been shown [6,7]. More recently a unifying view of kernel K-means and spectral clustering methods [20,21,23] has been pointed out. In this section we show explicitly the equivalence between them highlighting that these two approaches have the same foundation and in particular that both can be viewed as a matrix trace maximization problem. 27 5.1 Kernel clustering methods objective To show the direct equivalence between kernel and spectral clustering methods we introduce the weighted version of the kernel K-means [23]. We introduce a weight matrix W having weights wk on the diagonal. Recalling that we denote w... |

19 | A new kernel-based fuzzy clustering approach: support vector clustering with cell growing.
- Chiang, Hao
- 2003
(Show Context)
Citation Context ... enclosing sphere in feature space. In (a) a linear kernel and in (b) a Gaussian kernel have been used. define an adjacency structure in this form: 1 if R(y) < R ∀y ∈ Y 0 otherwise. (53) Clusters are simply the connected components of the graph with the adjacency matrix just defined. In the implementation in [36] the check is made sampling the line segment Y in 20 equidistant points. There are some modifications on this labeling algorithm (e.g., [49,88]) that improve performances. An improved version of SVC algorithm with application in handwritten digits recognition can be found in [15]. 3.5.2 Camastra and Verri algorithm A technique combining K-Means and One Class SVM can be found in [12]. The algorithm uses a K-Means-like strategy, i.e., moves repeatedly all centers vΦi in the feature space, computing One Class SVM on their Voronoi sets π Φ i , until no center changes anymore. Moreover, in order to introduce robustness against outliers, the authors have proposed to compute One Class SVM on πΦi (ρ) of each center v Φ i . The set π Φ i (ρ) is defined as πΦi (ρ) = {xj ∈ π Φ i and ‖Φ(xj) − v Φ i ‖ < ρ} . (54) πΦi (ρ) is the Voronoi set in the feature space of the center v Φ i ... |

18 | Clustering with normalized cuts is clustering with a hyperplane. Statistical Learning in Computer Vision,
- Rahimi, Recht
- 2004
(Show Context)
Citation Context ...6]. Regarding the applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28] 1 . Kernel Fuzzy c-Means proposed in [14,93,94] has been applied in image segmentation problems while in [32] it has been applied in handwritten digits recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied in clustering of artificial data [60,63], in image segmentation [56,72,75], in bioinformatics [19], and in co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition have been proposed in [48]. The protein sequence clustering problem has been faced using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized versio... |

16 | Fuzzy c-means clustering algorithm based on kernel method. Computational Intelligence and Multimedia Applications
- Wu, Xie, et al.
- 2003
(Show Context)
Citation Context ...: ‖xi − xj‖ 2 =xi · xi + xj · xj − 2xi · xj = K(l)(xi,xi) + K (l)(xj,xj) − 2K (l)(xi,xj) = ‖Φ(xi) − Φ(xj)‖ 2 , (31) shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration we can think that kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In literature there are some applications of kernels in clustering. These methods can be broadly divided in three categories, which are based respectively on: • kernelization of the metric [86,92,93]; 13 • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space and the distances between patterns and centroids is computed by means of kernels: ‖Φ(xh) − Φ(vi)‖ 2 = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is made by mapping each pattern using the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ∥ ∥ ∥Φ(xh) − v Φ i ∥ ∥ ∥ 2 by mean... |

16 | A Support Vector Method for Clustering. - Ben-Hur, Horn, et al. - 2000 |

16 | Support vector clustering through proximity graph modelling.
- Yang, Estivill, et al.
- 2002
(Show Context)
Citation Context ...x2 (a) (b) Figure 3. One class SVM applied to two data sets with outliers. The gray line shows the projection in input space of the smallest enclosing sphere in feature space. In (a) a linear kernel and in (b) a Gaussian kernel have been used. define an adjacency structure in this form: 1 if R(y) < R ∀y ∈ Y 0 otherwise. (53) Clusters are simply the connected components of the graph with the adjacency matrix just defined. In the implementation in [36] the check is made sampling the line segment Y in 20 equidistant points. There are some modifications on this labeling algorithm (e.g., [49,88]) that improve performances. An improved version of SVC algorithm with application in handwritten digits recognition can be found in [15]. 3.5.2 Camastra and Verri algorithm A technique combining K-Means and One Class SVM can be found in [12]. The algorithm uses a K-Means-like strategy, i.e., moves repeatedly all centers vΦi in the feature space, computing One Class SVM on their Voronoi sets π Φ i , until no center changes anymore. Moreover, in order to introduce robustness against outliers, the authors have proposed to compute One Class SVM on πΦi (ρ) of each center v Φ i . The set π Φ i (ρ) ... |

15 | New methods for spectral clustering.
- Fischer, Poland
- 2004
(Show Context)

15 | Fuzzy clustering using kernel method.
- Zhang, Chen
- 2002
(Show Context)
Citation Context ... K(l)(xi,xi) + K(l)(xj,xj) − 2K(l)(xi,xj) = ‖Φ(xi) − Φ(xj)‖² , (31) which shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration, kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In the literature there are several applications of kernels in clustering. These methods can be broadly divided into three categories, based respectively on: • kernelization of the metric [86,92,93]; • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space, and the distances between patterns and centroids are computed by means of kernels: ‖Φ(xh) − Φ(vi)‖² = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is performed by mapping each pattern with the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ‖Φ(xh) − vΦi‖² by means of the kernel trick. The description via support ... |

13 |
Kernel neural gas algorithms with application to cluster analysis.
- Qinand, Suganthan
- 2004
(Show Context)
Citation Context ... K(l)(xi,xi) + K(l)(xj,xj) − 2K(l)(xi,xj) = ‖Φ(xi) − Φ(xj)‖² , (31) which shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration, kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In the literature there are several applications of kernels in clustering. These methods can be broadly divided into three categories, based respectively on: • kernelization of the metric [86,92,93]; • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space, and the distances between patterns and centroids are computed by means of kernels: ‖Φ(xh) − Φ(vi)‖² = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is performed by mapping each pattern with the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ‖Φ(xh) − vΦi‖² by means of the kernel trick. The description via support ... |

12 | Kernel-based fuzzy clustering incorporating spatial constraints for image segmentation.
- Zhang, Chen, et al.
- 2003
(Show Context)
Citation Context ...been applied to Fuzzy c-Varieties as well [50], with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwrit... |

11 |
Fuzzy topographic kernel clustering.
- Graepel, Obermayer
- 1998
(Show Context)
Citation Context ...varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The prot... |

7 | Mixture density Mercer kernels: A method to learn kernels directly from data. In
- Srivastava
- 2004
(Show Context)
Citation Context ... most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. Then the paper is organized as follows: section 3 shows the kernelized version of the algorithms presented in s... |

7 |
Kernel based fuzzy and possibilistic c-means clustering,” in
- Zhang, Chen
- 2003
(Show Context)
Citation Context ...: ‖xi − xj‖² = xi·xi + xj·xj − 2xi·xj = K(l)(xi,xi) + K(l)(xj,xj) − 2K(l)(xi,xj) = ‖Φ(xi) − Φ(xj)‖² , (31) which shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration, kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In the literature there are several applications of kernels in clustering. These methods can be broadly divided into three categories, based respectively on: • kernelization of the metric [86,92,93]; • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space, and the distances between patterns and centroids are computed by means of kernels: ‖Φ(xh) − Φ(vi)‖² = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is performed by mapping each pattern with the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ‖Φ(xh) − vΦi‖² by means... |

7 |
LVQ clustering and SOM using a kernel function.
- Inokuchi, Miyamoto
- 2004
(Show Context)

6 |
Fuzzy c-varieties/elliptotypes clustering in reproducing kernel hilbert space.
- Leski
- 2004
(Show Context)
Citation Context ...approaches lies in their ability to construct an adjacency structure between data, avoiding the need to assume a fixed cluster shape. These approaches bear a slight similarity to hierarchical methods in their use of an adjacency structure, with the main difference in the philosophy of the grouping procedure. A comparison of some spectral clustering methods has recently been proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties as well [50], with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmen... |

6 |
An improved possibilistic C-Means algorithm with finite rejection and robust scale estimation, proceeding
- Nasraoui, R
- 1996
(Show Context)
Citation Context ...respectively: vi = ∑h=1..n (uih)m xh / ∑h=1..n (uih)m , (21) and vi = ∑h=1..n uih xh / ∑h=1..n uih . (22) The parameter ηi regulates the trade-off between the two terms in Eq. 17 and Eq. 18 and is related to the width of the clusters. The authors suggest estimating ηi for PCM-I using this formula: ηi = γ ∑h=1..n (uih)m ‖xh − vi‖² / ∑h=1..n (uih)m , (23) which is a weighted mean of the intracluster distances of the i-th cluster; the constant γ is typically set to one. The parameter ηi can be estimated with scale estimation techniques as developed in the robust clustering literature for M-estimators [35,59]. The value of ηi can be updated at each step of the algorithm or kept fixed for all iterations. The former approach can lead to instabilities, since the derivation of the equations assumes ηi fixed. In the latter case a good estimate of ηi can be obtained only starting from an approximate solution. For this reason the Possibilistic c-Means is often run as a refining step after a Fuzzy c-Means. 3 Kernel Clustering Methods In machine learning, the use of kernel functions [57] was introduced by Aizerman et al. [1] in 1964. In 1995 Cortes and Vapnik introduced Support V... |
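Eq. 23 amounts to a γ-scaled weighted average of the squared intracluster distances, with weights (uih)^m. A minimal vectorized sketch (illustrative names, NumPy assumed; not from the survey):

```python
import numpy as np

def eta(U, X, V, m=2.0, gamma=1.0):
    """Width estimate eta_i of Eq. 23 for each cluster.

    U: (c, n) membership matrix, X: (n, d) patterns, V: (c, d) centroids.
    Returns the (c,)-vector of gamma-weighted mean squared distances."""
    d2 = np.sum((X[None, :, :] - V[:, None, :])**2, axis=2)  # (c, n) squared distances
    W = U**m                                                 # fuzzified weights (u_ih)^m
    return gamma * np.sum(W * d2, axis=1) / np.sum(W, axis=1)
```

For a single cluster with centroid 1 and equally weighted points at 0 and 2, each squared distance is 1, so the estimate is 1 regardless of m, matching the weighted-mean reading of Eq. 23.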

3 |
Clustering via kernel decomposition,”
- Have, Girolami, et al.
- 2006
(Show Context)
Citation Context ...rs. These approaches bear a slight similarity to hierarchical methods in their use of an adjacency structure, with the main difference in the philosophy of the grouping procedure. A comparison of some spectral clustering methods has recently been proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties as well [50], with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering m... |

3 | An Intelligent System based on Kernel Methods for Crop Yield Prediction - Sap - 2006 |

2 | Robust face recognition from a single training image per person with kernel-based som-face.
- Tan, Chen, et al.
- 2004
(Show Context)
Citation Context ...d of the similarity measure is crucial in these methods; many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce... |

2 | Kernel based clustering and vector quantization for speech recognition.
- Satish, Sekhar
- 2004
(Show Context)
Citation Context ...is crucial in these methods; many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear part... |

2 |
A support vector method for clustering.
- Hur, Horn, et al.
- 2000
(Show Context)
Citation Context ...he soft rule for the update of the codevectors in feature space. Rewriting Eq. 10 in feature space for the update of the codevectors we have: ∆vΦj = ε hλ(ρj) (Φ(x) − vΦj) . (46) Here ρj is the rank of the distance ‖Φ(x) − vΦj‖. Again it is possible to write vΦj as a linear combination of the Φ(xi), as in Eq. 40, allowing such distances to be computed by means of the kernel trick. As in the kernel SOM technique, the updating rule for the centroids becomes an updating rule for the coefficients of that combination. 3.5 One Class SVM This approach provides a support vector description in feature space [36,37,77]. The idea is to use kernels to project data into a feature space and then to find the sphere enclosing almost all data, namely not including outliers. Formally, a radius R and the center v of the smallest enclosing sphere in feature space are defined. The constraint is thus: ‖Φ(xj) − v‖² ≤ R² + ξj ∀j , (47) where the non-negative slack variables ξj have been added. The Lagrangian for this problem is defined as [11]: L = R² − ∑j (R² + ξj − ‖Φ(xj) − v‖²)βj − ∑j ξjµj + C ∑j ξj , (48) where βj ≥ 0 and µj ≥ 0 are Lagrange multipliers, C is a constant and C ∑j ξj is a penalty term. Computing the... |
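Since the centre v is a linear combination of mapped support vectors, v = ∑j βj Φ(xj), the feature-space distance ‖Φ(y) − v‖² expands entirely into kernel evaluations. This sketch is illustrative (names are my own; the βj are assumed to come from solving the dual of Eq. 48), not the authors' implementation:

```python
import numpy as np

def gaussian_k(a, b, sigma=1.0):
    # K(g)(a, b) = exp(-||a - b||^2 / (2 sigma^2))  (Eq. 30)
    return np.exp(-np.sum((a - b)**2) / (2 * sigma**2))

def radius2(y, SV, beta, sigma=1.0):
    """Squared feature-space distance of Phi(y) from the sphere centre
    v = sum_j beta_j Phi(x_j):
    ||Phi(y) - v||^2 = K(y,y) - 2 sum_j beta_j K(x_j, y)
                       + sum_{j,k} beta_j beta_k K(x_j, x_k)."""
    Kyy = gaussian_k(y, y, sigma)  # equals 1 for the Gaussian kernel
    Kxy = np.array([gaussian_k(x, y, sigma) for x in SV])
    Kxx = np.array([[gaussian_k(a, b, sigma) for b in SV] for a in SV])
    return Kyy - 2 * beta @ Kxy + beta @ Kxx @ beta
```

With a single support vector and β = 1 the centre coincides with that vector's image, so the distance is exactly zero there and strictly positive elsewhere; this is the R(y) that the labeling rule of Eq. 53 compares against R.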

2 |
Support vector clustering.
- Hur, Horn, et al.
- 2001
(Show Context)
Citation Context ... − Φ(xj)‖² , (31) which shows that choosing the kernel K(l) implies Φ = I (where I is the identity function). Following this consideration, kernels can offer a more general way to represent the elements of a set X and possibly, for some of these representations, the clusters can be easily identified. In the literature there are several applications of kernels in clustering. These methods can be broadly divided into three categories, based respectively on: • kernelization of the metric [86,92,93]; • clustering in feature space [32,38,53,62,91]; • description via support vectors [12,37]. Methods based on kernelization of the metric look for centroids in input space, and the distances between patterns and centroids are computed by means of kernels: ‖Φ(xh) − Φ(vi)‖² = K(xh,xh) + K(vi,vi) − 2K(xh,vi) . (32) Clustering in feature space is performed by mapping each pattern with the function Φ and then computing centroids in feature space. Calling vΦi the centroids in feature space, we will see in the next sections that it is possible to compute the distances ‖Φ(xh) − vΦi‖² by means of the kernel trick. The description via support vectors makes use of One Class SVM to find ... |

1 | Multiclass spectral clustering, in: ICCV ’03 - Yu, Shi - 2003 |

1 |
An intelligent system based on kernel methods for crop yield prediction.
- Awan, Mohd
- 2006
(Show Context)
Citation Context ... to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [76], in speech recognition [69] and in the prediction of crop yield from climate and plantation data [3]. Spectral methods have been applied to the clustering of artificial data [60,63], to image segmentation [56,72,75], to bioinformatics [19], and to co-clustering problems of words and documents [22] and genes and conditions [42]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in [48]. The protein sequence clustering problem has been addressed using spectral techniques in [61] and kernel methods in [84]. In the next section we briefly introduce the concepts of linear partitioning methods by recalling some basic crisp and fuzzy algorithms. ... |

1 |
Clustering via Hilbert space. Physica A Statistical Mechanics and its Applications,
- Horn
- 2001
(Show Context)
Citation Context ... approaches bear a slight similarity to hierarchical methods in their use of an adjacency structure, with the main difference in the philosophy of the grouping procedure. A comparison of some spectral clustering methods has recently been proposed in [79], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [40,80,81,90]. Recently kernel methods have been applied to Fuzzy c-Varieties as well [50], with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels such as [33] and [34]. Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of kernels from data, as in [4,18,27,56]. Regarding applications, most of these algorithms (e.g., [12,18,50]) have been applied to standard benchmarks such as Ionosphere [73], Breast Cancer [85] and Iris [28]. Kernel Fuzzy c-Means, proposed in [14,93,94], has been applied to image segmentation problems, while in [32] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in... |