Results 1 - 10 of 318
Mixed membership stochastic block models for relational data with application to protein-protein interactions
In Proceedings of the International Biometrics Society Annual Meeting, 2006
Cited by 378 (52 self)
We develop a model for examining data that consists of pairwise measurements, for example, presence or absence of links between pairs of objects. Examples include protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. We introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) and a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
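As a rough illustration of how a blockmodel and mixed membership can be combined, the sketch below samples a small directed network. The abstract does not spell out the generative process, so the Dirichlet membership draw, the per-pair block choice, and the Bernoulli link draw are assumptions here, as are the names `sample_mmsb`, `alpha`, and `block_probs`. It covers sampling only, not the variational inference the paper develops.

```python
import random


def sample_mmsb(n_nodes, alpha, block_probs, seed=0):
    """Sample memberships and a directed 0/1 link matrix (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(alpha)

    # Local model (mixed membership): each node gets its own distribution
    # over the k blocks, drawn as normalized gamma variates (a Dirichlet draw).
    memberships = []
    for _ in range(n_nodes):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        memberships.append([g / total for g in gammas])

    # Global model (blockmodel): block_probs[g][h] is the probability of a
    # link from a node acting in block g to a node acting in block h.
    links = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-links in this sketch
            # Each node picks the block it acts in for this one interaction.
            sender_block = rng.choices(range(k), weights=memberships[p])[0]
            receiver_block = rng.choices(range(k), weights=memberships[q])[0]
            if rng.random() < block_probs[sender_block][receiver_block]:
                links[p][q] = 1
    return memberships, links


if __name__ == "__main__":
    # Two blocks with dense within-block and sparse between-block connectivity.
    memberships, links = sample_mmsb(6, [0.5, 0.5], [[0.9, 0.05], [0.05, 0.9]])
    for row in links:
        print(row)
```

Small values in `alpha` push each node toward a single block, while larger values spread a node's membership across several blocks.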
New specifications for exponential random graph models
2004
Cited by 168 (27 self)
The most promising class of statistical models for expressing structural properties of social networks observed at one moment in time is the class of Exponential Random Graph Models (ERGMs), also known as p* models. The strong point of these models is that they can represent a variety of structural tendencies, such as transitivity, that define complicated dependence patterns not easily modeled by more basic probability models. Recently, MCMC algorithms have been developed which produce approximate Maximum Likelihood estimators. Applying these models in their traditional specification to observed network data often has led to problems, however, which can be traced back to the fact that important parts of the parameter space correspond to nearly degenerate distributions, which may lead to convergence problems of estimation algorithms, and a poor fit to empirical data. This paper proposes new specifications of Exponential Random Graph Models. These specifications represent structural properties such as transitivity and heterogeneity of degrees by more complicated graph statistics than the traditional star and triangle counts. Three kinds of statistics are proposed: geometrically weighted degree distributions, alternating k-triangles, and alternating independent two-paths. Examples are presented both of modeling graphs and digraphs, in which the new specifications lead to much better results than the earlier existing specifications of the ERGM. It is concluded that the new specifications increase the range and applicability of the ERGM as a tool for the statistical analysis of social networks.
Algorithms for estimating relative importance in networks
In Proceedings of KDD 2003, 2003
Cited by 138 (4 self)
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis; examples include social networks, Web graphs, telecommunication networks, and biological networks. In interactive analysis of such data a natural query is “which entities are most important in the network relative to a particular individual or set of individuals?” We investigate the problem of answering such queries in this paper, focusing in particular on defining and computing the importance of nodes in a graph relative to one or more root nodes. We define a general framework and a number of different algorithms, building on ideas from social networks, graph theory, Markov models, and Web graph analysis. We experimentally evaluate the different properties of these algorithms on toy graphs and demonstrate how our approach can be used to study relative importance in real-world networks including a network of interactions among September 11th terrorists, a network of collaborative research in biotechnology among companies and universities, and a network of coauthorship relationships among computer science researchers.
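One way to make "importance relative to root nodes" concrete is a random walk that restarts at the roots. The abstract does not name its algorithms, so this is an assumed stand-in in the spirit of the Markov-model and Web-graph ideas it mentions, not the paper's method. The function name `relative_importance` and the parameters `restart` and `iterations` are hypothetical.

```python
def relative_importance(adjacency, roots, restart=0.15, iterations=100):
    """Score nodes by a random walk that jumps back to the roots (sketch).

    adjacency maps each node to a list of its out-neighbors; every node
    must appear as a key. Scores sum to 1 after every iteration.
    """
    nodes = list(adjacency)
    root_set = set(roots)
    # The restart distribution puts equal mass on each root node.
    restart_mass = {v: (1.0 / len(root_set) if v in root_set else 0.0) for v in nodes}
    score = dict(restart_mass)

    for _ in range(iterations):
        # With probability `restart`, the walk jumps back to a root.
        updated = {v: restart * restart_mass[v] for v in nodes}
        for v in nodes:
            walk_mass = (1.0 - restart) * score[v]
            neighbors = adjacency[v]
            if neighbors:
                # Otherwise it follows an out-link chosen uniformly at random.
                for w in neighbors:
                    updated[w] += walk_mass / len(neighbors)
            else:
                # Nodes without out-links send their mass back to the roots.
                for r in root_set:
                    updated[r] += walk_mass / len(root_set)
        score = updated
    return score


if __name__ == "__main__":
    # Undirected path a - b - c - d, scored relative to root "a".
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    for node, value in sorted(relative_importance(path, ["a"]).items()):
        print(node, round(value, 3))
```

A larger `restart` value concentrates the scores near the roots, and a smaller one lets the walk spread further into the graph.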
Network-based marketing: Identifying likely adopters via consumer networks
Statistical Science
Cited by 114 (12 self)
Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
Relational topic models for document networks
In Proc. of Conf. on AI and Statistics (AISTATS)
Cited by 114 (5 self)
We develop the relational topic model (RTM), a model of documents and the links between them. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and learning algorithms based on variational methods and evaluate the predictive performance of the RTM for large networks of scientific abstracts and web documents.
Nonparametric Latent Feature Models for Link Prediction
Cited by 106 (1 self)
As the availability and importance of relational data, such as the friendships summarized on a social networking website, increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable, latent features, using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets.
Relational learning via latent social dimensions
In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge, 2009
Cited by 86 (28 self)
Social media such as blogs, Facebook, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. However, the connections in social media are often multidimensional. An actor can connect to another actor due to different factors, e.g., alumni, colleagues, living in the same city or sharing similar interest, etc. Collective inference normally does not differentiate these connections. In this work, we propose to extract latent social dimensions based on network information first, and then utilize them as features for discriminative learning. These social dimensions describe different affiliations of social actors hidden in the network, and the subsequent discriminative learning can automatically determine which affiliations are better aligned with the class labels. Such a scheme is preferred when multiple diverse relations are associated with the same network. We conduct extensive experiments on social media data (one from a real-world blog site and the other from a popular content sharing site). Our model outperforms representative relational learning methods based on collective inference, especially when few labeled data are available. The sensitivity of this model and its connection to existing methods are also carefully examined.
Leveraging relational autocorrelation with latent group models
In MRDM '05: Proceedings of the 4th international workshop on Multi-relational mining. ACM
Cited by 80 (23 self)
The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among class labels of related instances, but the models fail to capture a frequent cause of autocorrelation: the presence of underlying groups that influence the attributes on a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the datasets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure when there is little known information with which to seed inference.