Results 1-10 of 77
SPECTRAL CLUSTERING AND THE HIGH-DIMENSIONAL STOCHASTIC BLOCKMODEL
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 98 (7 self)
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Blockmodel (Holland, Laskey and Leinhardt, 1983) is a social network model with well-defined communities; each node is a member of one community. For a network generated from the Stochastic Blockmodel, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
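The setting described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's procedure: it assumes the normalized Laplacian convention L = D^{-1/2} A D^{-1/2}, keeps the k eigenvectors with the largest absolute eigenvalues, and clusters the rows with a simple Lloyd-style k-means. The function names, the farthest-point initialization, and the parameters `p_in` and `p_out` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_sbm(block_sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a Stochastic Blockmodel.

    Assumption for this sketch: a single within-block probability p_in
    and a single between-block probability p_out.
    """
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Draw each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def normalized_laplacian(adj):
    """Return D^{-1/2} A D^{-1/2}; isolated nodes get zero rows and columns."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def spectral_clustering(adj, k):
    """Cluster nodes using the k leading eigenvectors of the normalized Laplacian."""
    vals, vecs = np.linalg.eigh(normalized_laplacian(adj))
    leading = np.argsort(-np.abs(vals))[:k]
    return kmeans(vecs[:, leading], k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, labels = sample_sbm([50, 50], p_in=0.5, p_out=0.05, rng=rng)
    assign = spectral_clustering(adj, k=2)
    errors = int(np.sum(assign != labels))
    # Cluster labels are only defined up to permutation, so take the better match.
    print("misclustered nodes:", min(errors, len(labels) - errors))
```

The misclustered count is computed up to a relabeling of the clusters, which for two blocks reduces to taking the smaller of the two possible error counts.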
Stochastic blockmodels with a growing number of classes
, 2012
Cited by 44 (11 self)
We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least polylogarithmically in this size. We also establish finite-sample confidence bounds on maximum-likelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising self-reported school friendships, resulting in block estimates that reveal residual structure.
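To make the notion of maximum-likelihood block estimates concrete: for a fixed class assignment and independent Bernoulli edges, the likelihood for each pair of classes is maximized by the observed edge density between them. The sketch below computes these densities for an undirected graph without self-loops. It is an illustration under those assumptions only; the function name `block_mle` is hypothetical, and the sketch does not cover the paper's search over assignments or its logit parameterization with covariates.

```python
import numpy as np


def block_mle(adj, labels, k):
    """Edge-density estimates theta_hat[a, b] for a fixed class assignment.

    Assumes an undirected 0/1 adjacency matrix with a zero diagonal.
    Entries for class pairs with no possible node pairs are returned as NaN.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    onehot = np.eye(k)[labels]                  # n x k membership indicators
    # Off-diagonal entries count edges between classes a and b;
    # diagonal entries count each within-class edge twice.
    edges = onehot.T @ adj @ onehot
    sizes = onehot.sum(axis=0)
    # Off-diagonal: n_a * n_b pairs; diagonal: n_a * (n_a - 1) ordered pairs,
    # which matches the double counting of within-class edges above.
    pairs = np.outer(sizes, sizes) - np.diag(sizes)
    theta = np.full_like(edges, np.nan)
    np.divide(edges, pairs, out=theta, where=pairs > 0)
    return theta


if __name__ == "__main__":
    # Toy graph on four nodes with edges 0-1 and 0-2.
    adj = np.zeros((4, 4))
    for i, j in [(0, 1), (0, 2)]:
        adj[i, j] = adj[j, i] = 1.0
    print(block_mle(adj, labels=[0, 0, 1, 1], k=2))
```

In the toy graph, class 0 contains one possible pair and one edge, class 1 contains one possible pair and no edge, and one of the four possible cross-class pairs is connected.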
Coevolution of social and affiliation networks
 In 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD
, 2009
Cited by 38 (2 self)
In our work, we address the problem of modeling social network generation which explains both link and group formation. Recent studies on social network evolution propose generative models which capture the statistical properties of real-world networks related only to node-to-node link formation. We propose a novel model which captures the coevolution of social and affiliation networks. We provide surprising insights into group formation based on observations in several real-world networks, showing that users often join groups for reasons other than their friends. Our experiments show that the model is able to capture both the newly observed and previously studied network properties. This work is the first to propose a generative model which captures the statistical properties of these complex networks. The proposed model facilitates controlled experiments which study the effect of actors’ behavior on the network evolution, and it allows the generation of realistic synthetic datasets.
Multilayer networks
 TOOL FOR MULTILAYER ANALYSIS AND VISUALIZATION OF NETWORKS
, 2014
Cited by 30 (7 self)
In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such “multilayer” features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize “traditional” network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary
Scalable Inference of Overlapping Communities
Cited by 20 (3 self)
We develop a scalable algorithm for posterior inference of overlapping communities in large networks. Our algorithm is based on stochastic variational inference in the mixed-membership stochastic blockmodel (MMSB). It naturally interleaves subsampling the network with estimating its community structure. We apply our algorithm on ten large, real-world networks with up to 60,000 nodes. It converges several orders of magnitude faster than the state-of-the-art algorithm for MMSB, finds hundreds of communities in large real-world networks, and detects the true communities in 280 benchmark networks with equal or better accuracy compared to other scalable algorithms.
Dynamic egocentric models for citation networks
 In Proc. 28th Intl. Conf. on Machine Learning
, 2011
Cited by 13 (4 self)
The analysis of the formation and evolution of networks over time is of fundamental importance to social science, biology, and many other fields. While longitudinal network data sets are increasingly being recorded at the granularity of individual timestamped events, most studies only focus on collapsed cross-sectional snapshots of the network. In this paper, we introduce a dynamic egocentric framework that models continuous-time network data using multivariate counting processes. For inference, an efficient partial likelihood approach is used, allowing our methods to scale to large networks. We apply our techniques to various citation networks and demonstrate the predictive power and interpretability of the learned statistical models.
Review of statistical network analysis: models, algorithms, and software
 STATISTICAL ANALYSIS AND DATA MINING
, 2012
Supplement to “Consistency of community detection in networks under degree-corrected stochastic block models.” DOI:10.1214/12-AOS1036SUPP
 Department of Statistics George Mason University 4400 University Drive, MS 4A7
, 2012
CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH
Cited by 11 (0 self)
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled subnetwork. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the subnetwork. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM’s expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
An introduction to spectral distances in networks
 in Neural Nets WIRN10: Proceedings of the 20th Italian Workshop on Neural Nets
, 2011