Results 1  10
of
105
Asymptotic normality of the maximumlikelihood estimator for general hidden Markov models
 Ann. Statist
, 1998
"... ar ..."
Clustering partially observed graphs via convex optimization.
 Journal of Machine Learning Research,
, 2014
"... Abstract This paper considers the problem of clustering a partially observed unweighted graphi.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organiz ..."
Abstract

Cited by 47 (13 self)
 Add to MetaCart
(Show Context)
Abstract This paper considers the problem of clustering a partially observed unweighted graphi.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) lowrank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
Stochastic blockmodels with a growing number of classes
, 2012
"... We present asymptotic and finitesample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root ..."
Abstract

Cited by 47 (11 self)
 Add to MetaCart
We present asymptotic and finitesample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least polylogarithmically in this size. We also establish finitesample confidence bounds on maximumlikelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising selfreported school friendships, resulting in block estimates that reveal residual structure.
Spectral clustering of graphs with general degrees in the extended planted partition model
 Journal of Machine Learning Research  Proceedings Track
"... Abstract In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition ( ..."
Abstract

Cited by 42 (0 self)
 Add to MetaCart
(Show Context)
Abstract In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. However a challenge with applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low degree nodes. We resolve this issue by introducing the notion of a degreecorrected graph Laplacian. For graphs with many low degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degreecorrected randomwalk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
Matrix estimation by universal singular value thresholding
, 2012
"... Abstract. Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has ‘a little bit of structure’. Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation, and generalized Bradley–Terry models for pairwise comparison. 1.
The method of moments and degree distributions for network models
 Ann. Statist
, 2011
"... ar ..."
Clustering Sparse Graphs
, 2012
"... We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the incluster and across cluster edge densities are very small, possi ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
(Show Context)
We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the incluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition ” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1
Consistent adjacencyspectral partitioning for the stochastic block model when the model parameters are unknown
, 2014
"... ..."
PSEUDOLIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2013
"... Many algorithms have been proposed for fitting network models with communities but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudolikelihood method for fitting the stochastic block model for networks, as well as a variant that a ..."
Abstract

Cited by 16 (2 self)
 Add to MetaCart
(Show Context)
Many algorithms have been proposed for fitting network models with communities but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudolikelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the example of a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudolikelihood. We prove that pseudolikelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two balanced communities.
Review of statistical network analysis: models, algorithms, and software
 STATISTICAL ANALYSIS AND DATA MINING
, 2012
"... ..."
(Show Context)