Results 1–10 of 16
Spectral Active Clustering via Purification of the k-Nearest Neighbor Graph
Abstract

Cited by 6 (1 self)
Spectral clustering is widely used in data mining, machine learning and pattern recognition. There have been some recent developments in adding pairwise constraints as side information to enforce top-down structure in the clustering results. However, most of these algorithms are “passive” in the sense that the side information is provided beforehand. In this paper, we present a spectral active clustering method that actively selects pairwise constraints based on a novel notion of node uncertainty rather than pair uncertainty. In our approach, the constraints are used to drive a purification process on the k-nearest neighbor graph—edges are removed from the graph based on the constraints—that ultimately leads to an improved, constraint-satisfied clustering. We have evaluated our framework on three datasets (UCI, gene and image sets) against baseline and state-of-the-art methods and find the proposed algorithm to be clearly more effective.
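The purification step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the pairs are queried at random rather than by the paper's node-uncertainty criterion, and the oracle, function name and parameters are assumptions for the example.

```python
# Sketch: query pairwise constraints, drop cannot-link edges from the
# k-NN graph ("purification"), then run spectral clustering on the result.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def purify_and_cluster(X, labels_oracle, n_clusters, k=10, n_queries=50, seed=0):
    rng = np.random.default_rng(seed)
    W = kneighbors_graph(X, k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)                      # symmetrize the k-NN graph
    rows, cols = np.nonzero(np.triu(W, 1))      # candidate edges to query
    idx = rng.choice(len(rows), size=min(n_queries, len(rows)), replace=False)
    for e in idx:
        i, j = rows[e], cols[e]
        if labels_oracle(i) != labels_oracle(j):  # cannot-link constraint
            W[i, j] = W[j, i] = 0.0               # purify: remove the edge
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(W)
```

In practice the oracle would be a human annotator answering must-link/cannot-link queries; here it is any callable returning a label per point index.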
Redefining Self-Similarity in Natural Images for Denoising Using Graph Signal Gradient
Abstract

Cited by 3 (3 self)
Image denoising is the most basic inverse imaging problem. As an underdetermined problem, appropriate definition of image priors to regularize the problem is crucial. Among recently proposed priors for image denoising are: i) the graph Laplacian regularizer, where a given pixel patch is assumed to be smooth in the graph-signal domain; and ii) the self-similarity prior, where image patches are assumed to recur throughout a natural image in nonlocal spatial regions. In our first contribution, we demonstrate that the graph Laplacian regularizer converges to a continuous-time functional counterpart, and that careful selection of its features can lead to a discriminant signal prior. In our second contribution, we redefine patch self-similarity in terms of patch gradients and argue that the new definition results in a more accurate estimate of the graph Laplacian matrix, and thus better image denoising performance. Experiments show that our algorithm based on the graph Laplacian regularizer and gradient-based self-similarity can outperform nonlocal means (NLM) denoising by up to 1.4 dB in PSNR.
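The graph Laplacian regularizer in i) has a simple closed form worth seeing concretely: denoising amounts to minimizing ||x − y||² + μ·xᵀLx, whose solution is the linear system (I + μL)x = y. The sketch below assumes a generic patch graph; the paper's gradient-based similarity graph and feature selection are not reproduced here.

```python
# Minimal sketch of graph-Laplacian-regularized denoising of one patch:
# solve (I + mu * L) x = y, where L = D - W is the combinatorial Laplacian
# of a similarity graph over the patch pixels.
import numpy as np

def denoise_patch(y, W, mu=1.0):
    """y: flattened noisy patch; W: symmetric nonnegative adjacency matrix."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)
```

Since L annihilates constant signals, a flat patch passes through unchanged, while high-frequency components (in the graph-spectral sense) are attenuated.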
Large-Scale Machine Learning for Classification and Search
, 2012
Abstract

Cited by 2 (1 self)
With the rapid development of the Internet, tremendous amounts of data, including millions or even billions of images and videos, can now be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques that make classification and nearest neighbor search practical on gigantic databases. Our first approach is to explore data graphs to aid classification and nearest neighbor search. A graph offers an attractive way of representing data and discovering essential information such as the neighborhood structure. However, both the graph construction process and graph-based learning techniques become computationally prohibitive at large scale. To this end, we present an efficient large graph construction approach and subsequently apply it to develop scalable semi-supervised learning and unsupervised hashing algorithms. Our unique contributions on graph-related topics include: 1. Large Graph Construction: conventional neighborhood graphs such as kNN graphs require quadratic time to build, which is inadequate for the large-scale applications mentioned above. To overcome this bottleneck, we present a novel graph construction approach, ...
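One common way to sidestep the quadratic cost of a full kNN graph, which this thesis's abstract alludes to, is to connect each point only to a small set of representative "anchor" points. The sketch below shows that general idea under assumed parameter choices; it is an illustration of anchor-style construction, not the thesis's exact algorithm.

```python
# Sketch: anchor-based graph construction. Each of n points is linked to
# its s nearest anchors (k-means centers), giving a sparse n x m matrix Z;
# an n x n affinity W = Z diag(Z^T 1)^{-1} Z^T is then defined implicitly,
# avoiding any O(n^2) pairwise computation.
import numpy as np
from sklearn.cluster import KMeans

def anchor_graph(X, n_anchors=10, s=3, seed=0):
    anchors = KMeans(n_clusters=n_anchors, n_init=10,
                     random_state=seed).fit(X).cluster_centers_
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.zeros((len(X), n_anchors))
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:s]                 # s closest anchors
        w = np.exp(-row[nn] / (row[nn].mean() + 1e-12))
        Z[i, nn] = w / w.sum()                   # rows sum to one
    return Z, anchors
```

Because Z has only s nonzeros per row, downstream spectral computations can work with the m x m matrix Zᵀ Z instead of the full n x n affinity.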
CONTINUUM LIMIT OF TOTAL VARIATION ON POINT CLOUDS
, 2014
Abstract

Cited by 2 (0 self)
We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. Our goal is to develop the mathematical tools needed to study the consistency, as the number of available data points increases, of graph-based machine learning algorithms for tasks such as clustering. In particular, we study when the cut capacity, and more generally the total variation, on these graphs is a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of Γ-convergence. We obtain almost optimal conditions on how the size of the neighborhood over which points are connected by an edge must scale, as the number of points increases, for the Γ-convergence to hold. Taking the limit is enabled by a new metric which allows one to suitably compare functionals defined on different point clouds.
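The graph total variation studied in this abstract can be computed directly on a small sample. The sketch below uses a simple indicator kernel and a schematic 1/(ε n²) normalization; the exact weights and scaling regimes are the subject of the paper, so treat the constants here as illustrative.

```python
# Illustrative graph total variation on a point cloud:
#   GTV(u) = (1 / (eps * n^2)) * sum_{i,j} w_ij * |u_i - u_j|,
# with w_ij = 1 when |x_i - x_j| < eps (indicator kernel).
import numpy as np

def graph_tv(X, u, eps):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D < eps).astype(float)
    np.fill_diagonal(W, 0.0)
    return (W * np.abs(u[:, None] - u[None, :])).sum() / (eps * n * n)
```

A constant function has zero graph TV, while the indicator of a half-plane picks up contributions only from edges crossing the interface, mirroring the continuum perimeter.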
Artificial Intelligence (ScienceDirect)
"... www.elsevier.com/locate/artint ..."
Intel Labs Berkeley
Abstract
Existing approaches to analyzing the asymptotics of graph Laplacians typically assume a well-behaved kernel function with smoothness assumptions. We remove the smoothness assumption and generalize the analysis of graph Laplacians to include previously unstudied graphs, including kNN graphs. We also introduce a kernel-free framework to analyze graph constructions with shrinking neighborhoods in general, and apply it to analyze locally linear embedding (LLE). We also describe how, for a given limit operator, desirable properties such as a convergent spectrum and sparseness can be achieved by choosing the appropriate graph construction.
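The object under study here, a graph Laplacian built from a kNN graph with no smooth kernel, is easy to construct explicitly. The sketch below builds the unnormalized Laplacian L = D − W from a symmetrized 0/1 kNN adjacency; it shows the kind of construction whose continuum limit the abstract analyzes, not the analysis itself.

```python
# Sketch: unnormalized graph Laplacian of a k-NN graph with 0/1 edge
# weights (no smooth kernel), symmetrized so i ~ j whenever either point
# is among the other's k nearest neighbors.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=8):
    W = kneighbors_graph(X, k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)                 # symmetrize the directed k-NN relation
    return np.diag(W.sum(axis=1)) - W      # L = D - W
```

Note that after symmetrization the degrees are no longer exactly k, which is one reason kNN graphs fall outside the classical kernel-based analyses.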
CONSISTENCY OF CHEEGER AND RATIO GRAPH CUTS
Abstract
This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts, like the Cheeger and ratio cuts. We show that minimizers of these cuts converge, as the sample size increases, to a minimizer of a corresponding continuum cut (which partitions the ground-truth measure). Moreover, we obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for the consistency to hold. We provide results for two-way and for multi-way cuts. Furthermore, we provide numerical experiments that illustrate the results and explore the optimality of scaling in dimension two.
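For a two-way partition, the functionals this paper studies have short closed forms: with Cut(A, Aᶜ) the total weight of edges crossing the partition, the ratio cut is Cut·(1/|A| + 1/|Aᶜ|) and the Cheeger cut is Cut / min(|A|, |Aᶜ|). A minimal evaluator, for a partition given as a boolean mask:

```python
# Sketch: evaluate the ratio cut and Cheeger cut of a two-way partition
# of a weighted proximity graph.
import numpy as np

def two_way_cuts(W, mask):
    """W: symmetric nonnegative weight matrix; mask: boolean array for side A."""
    A, B = mask, ~mask
    cut = W[np.ix_(A, B)].sum()                       # weight crossing the cut
    ratio = cut * (1.0 / A.sum() + 1.0 / B.sum())     # ratio cut
    cheeger = cut / min(A.sum(), B.sum())             # Cheeger cut
    return ratio, cheeger
```

The consistency results concern the minimizers of these quantities over all partitions as the sample grows; this snippet only scores a given partition.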
Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
Abstract
This paper considers the problem of embedding directed graphs in Euclidean space while retaining directional information. We model the observed graph as a sample from a manifold endowed with a vector field, and we design an algorithm that separates and recovers the features of this process: the geometry of the manifold, the data density and the vector field. The algorithm is motivated by our analysis of Laplacian-type operators and their continuous limit as generators of diffusions on a manifold. We illustrate the recovery algorithm on both artificially constructed and real data.
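The starting point of such methods is a Laplacian-type operator built from an asymmetric affinity matrix. The rough sketch below embeds a directed graph via the eigenvectors of its random-walk transition matrix; it conveys only the flavor of the construction, and the paper's separation of geometry, density and vector field requires further operators not shown here.

```python
# Rough sketch: spectral embedding from a directed (asymmetric) affinity
# matrix W via the row-stochastic random-walk matrix P. Eigenvectors of a
# non-symmetric P may be complex; the real parts are kept for plotting.
import numpy as np

def directed_embedding(W, dim=2):
    """W: (n, n) nonnegative, possibly asymmetric affinity matrix."""
    P = W / W.sum(axis=1, keepdims=True)    # random-walk transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)          # sort by decreasing real part
    return vecs.real[:, order[1:dim + 1]]   # skip the trivial top eigenvector
```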
Divergence Based Graph Estimation for Manifold Learning
Abstract
Manifold learning algorithms rely on a neighbourhood graph to provide an estimate of the data’s local topology. Unfortunately, current methods for estimating local topology assume local Euclidean geometry and locally uniform data density, which often leads to poor embeddings of the data. We address these shortcomings by proposing a framework that combines local learning with parametric density estimation for local topology estimation. Given a data set D ⊂ X, we first estimate a new metric space (X, d_X) that characterizes the varying sample density of D in X, and then use (X, d_X) as a new (pilot) input space for manifold learning. The proposed framework results in significantly improved embeddings, which we demonstrate objectively by assessing clustering accuracy. Index Terms—Manifold learning, divergence measures, neighbourhood graphs, graph topology estimation, divergence-based graphs.
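The core idea, building the neighbourhood graph under a density-aware metric rather than raw Euclidean distance, can be caricatured in a few lines. The rescaling rule below (dividing distances by local density estimates from kth-neighbour radii) is an assumption standing in for the paper's divergence-based metric d_X, not the authors' construction.

```python
# Loose sketch: density-rescaled k-NN graph. Pairwise distances are
# divided by local scale estimates (distance to the k-th neighbour, an
# inverse-density proxy) before selecting each point's k neighbours.
import numpy as np

def density_rescaled_knn(X, k=5):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r = np.sort(D, axis=1)[:, k]           # k-th neighbour distance per point
    Ds = D / np.sqrt(np.outer(r, r))       # symmetric density rescaling
    np.fill_diagonal(Ds, np.inf)           # exclude self-loops
    return np.argsort(Ds, axis=1)[:, :k]   # k nearest under the new metric
```

In dense regions neighbourhoods shrink and in sparse regions they grow, which is the qualitative behaviour a density-adapted metric is meant to produce.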
Nonlinear Dimensionality Reduction: Riemannian Metric Estimation and the Problem of Geometric Recovery
Abstract
In recent years, manifold learning has become increasingly popular as a tool for performing nonlinear dimensionality reduction. This has led to the development of numerous algorithms of varying degrees of complexity that aim to recover manifold geometry using either local or global features of the data. Building on the Laplacian Eigenmaps and Diffusion Maps frameworks, we propose a new paradigm that offers a guarantee, under reasonable assumptions, that any manifold learning algorithm will preserve the geometry of a data set. Our approach is based on augmenting the output of embedding algorithms with geometric information embodied in the Riemannian metric of the manifold. We provide an algorithm for estimating the Riemannian metric from data and demonstrate possible applications of our approach in a variety of examples.
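A standard way to estimate such geometric information, which this line of work builds on, uses the identity relating the Laplace-Beltrami operator to inner products of gradients: ⟨∇f, ∇g⟩ = ½(Δ(fg) − fΔg − gΔf). Applying a graph Laplacian L to products of embedding coordinates gives a pointwise estimate of the dual (inverse) metric. Signs and normalization constants depend on the Laplacian convention and are schematic here.

```python
# Sketch: dual-metric estimate at each embedded point,
#   h[x]_{lm} ~ 0.5 * ( L(f_l f_m) - f_l L(f_m) - f_m L(f_l) )(x),
# where L is a graph Laplacian and F[:, l] are embedding coordinates.
import numpy as np

def dual_metric(L, F):
    """L: (n, n) graph Laplacian; F: (n, d) embedding coordinates."""
    n, d = F.shape
    H = np.empty((n, d, d))
    for l in range(d):
        for m in range(d):
            H[:, l, m] = 0.5 * (L @ (F[:, l] * F[:, m])
                                - F[:, l] * (L @ F[:, m])
                                - F[:, m] * (L @ F[:, l]))
    return H  # per-point d x d matrices; (pseudo-)invert to get the metric
```

The per-point matrices H are symmetric by construction; inverting them recovers a Riemannian metric with which distances and volumes in the embedding can be corrected.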