Results 1–10 of 13
Mixed membership stochastic block models for relational data with application to protein-protein interactions
 In Proceedings of the International Biometrics Society Annual Meeting
, 2006
Cited by 378 (52 self)
We develop a model for examining data that consists of pairwise measurements, for example, presence or absence of links between pairs of objects. Examples include protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. We introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) and a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.
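As an illustration of how a blockmodel and mixed membership can be combined, the sketch below simulates links from one common formulation of a mixed membership stochastic blockmodel. The abstract does not spell out the generative process, so the Dirichlet prior on memberships, the per-pair role draws, and all names and parameter values here (`sample_mmsb`, `alpha`, `block_matrix`) are assumptions made for the example, not the authors' code.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_mmsb(n_nodes, alpha, block_matrix, seed=0):
    """Simulate node memberships and a directed 0/1 link matrix.

    Assumed process: each node gets a membership vector over blocks;
    for each ordered pair (p, q), both nodes pick a block for this
    interaction, and the link is Bernoulli with the block-pair rate.
    """
    rng = random.Random(seed)
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    links = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-links in this sketch
            sender_block = sample_categorical(memberships[p], rng)
            receiver_block = sample_categorical(memberships[q], rng)
            rate = block_matrix[sender_block][receiver_block]
            links[p][q] = 1 if rng.random() < rate else 0
    return memberships, links


# Hypothetical settings: two blocks, dense within blocks, sparse between.
memberships, links = sample_mmsb(
    n_nodes=6,
    alpha=[0.5, 0.5],
    block_matrix=[[0.9, 0.05], [0.05, 0.8]],
    seed=1,
)
for row in links:
    print(row)
```

The block matrix plays the role of the global model (dense patches of connectivity), while the per-node membership vectors supply the node-specific variability.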
A Latent Space Model for Rank Data.
Cited by 14 (0 self)
Proportional representation by means of a single transferable vote (PR-STV) is the electoral system employed in Irish elections. In this system, voters rank some or all of the candidates in order of preference. A latent space model is proposed for these election data where both candidates and voters are located in the same D-dimensional space. The locations are determined by the ranked preferences which are modeled using the Plackett-Luce model for rank data. Voter positions reflect their preferences while the candidate locations represent the global view of the candidates by the electorate.
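To make the Plackett-Luce model concrete, the sketch below computes the probability of a (possibly partial) ranking as a sequence of choices, each proportional to a candidate's support among those not yet ranked. How the paper maps latent positions to support is not stated in the abstract; the `distance_support` helper, which uses the exponential of negative squared distance, is a hypothetical choice made only for this example.

```python
import math


def plackett_luce_prob(ranking, support):
    """Probability of an ordered (possibly partial) ranking.

    support maps each candidate to a positive weight. At each stage the
    next-ranked candidate is chosen with probability equal to its weight
    divided by the total weight of all candidates not yet ranked.
    """
    remaining = dict(support)
    prob = 1.0
    for candidate in ranking:
        prob *= remaining[candidate] / sum(remaining.values())
        del remaining[candidate]
    return prob


def distance_support(voter, candidates):
    """Hypothetical link from latent positions to support: closer means larger."""
    return {
        name: math.exp(-sum((v - c) ** 2 for v, c in zip(voter, position)))
        for name, position in candidates.items()
    }


# Hypothetical voter and candidate positions in a 2-dimensional space.
candidates = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 2.0)}
support = distance_support((0.2, 0.1), candidates)
print(plackett_luce_prob(["A", "B"], support))  # voter ranks only two candidates
```

Because unranked candidates stay in the denominator at every stage, the same function handles ballots that list only some of the candidates.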
Topic taxonomy adaptation for group profiling
 ACM Trans. Knowl. Discov. Data
, 2008
Cited by 13 (9 self)
A topic taxonomy is an effective representation that describes salient features of virtual groups or online communities. A topic taxonomy consists of topic nodes. Each internal node is defined by its vertical path (i.e., ancestor and child nodes) and its horizontal list of attributes (or terms). In a text-dominant environment, a topic taxonomy can be used to flexibly describe a group’s interests with varying granularity. However, the stagnant nature of a taxonomy may fail to capture dynamic changes in a group’s interests in a timely manner. This paper addresses the problem of how to adapt a topic taxonomy to accumulated data that reflect changes in a group’s interests, in order to achieve dynamic group profiling. We first discuss the issues related to topic taxonomy. We next formulate taxonomy adaptation as an optimization problem to find the taxonomy that best fits the data. We then present a viable algorithm that can efficiently accomplish taxonomy adaptation. We conduct extensive experiments to evaluate our approach’s efficacy for group profiling, compare the approach with some alternatives, and study its performance for dynamic group profiling. While pointing out various applications of taxonomy adaptation, we suggest some future work that can take advantage of burgeoning Web 2.0 services for online targeted marketing, counterterrorism in connecting dots, and community tracking.
Bayesian Models and Methods in Public Policy and Government Settings
Cited by 6 (0 self)
Starting with the neo-Bayesian revival of the 1950s, many statisticians argued that it was inappropriate to use Bayesian methods, and in particular subjective Bayesian methods, in governmental and public policy settings because of their reliance upon prior distributions. But the Bayesian framework often provides the primary way to respond to questions raised in these settings, and the number and diversity of Bayesian applications have grown dramatically in recent years. Through a series of examples, both historical and recent, we argue that Bayesian approaches with formal and informal assessments of priors AND likelihood functions are well-accepted and should become the norm in public settings. Our examples include census-taking and small area estimation, U.S. election night forecasting, studies reported to the U.S. Food and Drug Administration, assessing global climate change, and measuring potential declines in disability among the elderly.
Longitudinal mixed membership models with . . .
, 2010
Cited by 6 (2 self)
When analyzing longitudinal data we need to balance our understanding of individual variability with the production of meaningful and interpretable summaries of overall population tendencies. This is especially true when those in the target population are known to be heterogeneous in their progression over time due to unobserved individual traits. Additional complications arise when the data are discrete and multivariate. I propose a new family of models to analyze such data by combining features from a version of the cross-sectional Grade of Membership Model (Woodbury et al., 1978; Erosheva et al., 2007) and from the longitudinal Multivariate Latent Trajectory Model (Connor, 2006). This new family of models works by considering individuals to be combinations of a small number of “ideal” or “extreme” classes. By describing the ways each of these extreme classes evolves over time we are able to describe distinct general tendencies. At the same time, by considering individuals to be individual-level mixtures of these profiles, we are able to handle complex forms of heterogeneity. I apply my method to data from the National Long Term Care Survey (NLTCS), a
Combining stochastic block models and mixed membership for statistical network analysis.
 Statistical Network Analysis: Models, Issues and New Directions. Lecture Notes in Comput. Sci. 4503 57–74
, 2007
Cited by 6 (4 self)
We consider the statistical analysis of a collection of unipartite graphs, i.e., multiple matrices of relations among objects of a single type. Such data arise, for example, in biological settings, collections of author-recipient email, and social networks. In many applications, clustering the objects of study or situating them in a low dimensional space (e.g., a simplex) is only one of the goals of the analysis. Being able to estimate relational structures among the clusters themselves is often as important. For example, in biological applications we are interested in estimating how stable protein complexes (i.e., clusters of proteins) interact. To support such integrated data analyses, we develop the family of “stochastic block models of mixed membership”. Our models combine features of mixed-membership models (Erosheva & Fienberg, 2005) and block models for relational data (Holland et al., 1983) in a hierarchical Bayesian framework. We develop a novel “nested” variational inference scheme, which is necessary to successfully perform fast approximate posterior inference in our models of relational data. We present evidence to support our claims, using both synthetic data and a biological case study.
Estimating identification disclosure risk using mixed membership models
 Journal of the American Statistical Association
, 2012
Cited by 5 (1 self)
Statistical agencies and other organizations that disseminate data are obligated to protect data subjects’ confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence, as part of their assessments of disclosure risks, many data stewards estimate the probabilities that sample uniques on sets of discrete keys are also population uniques on those keys. This is typically done using log-linear modeling on the keys. However, log-linear models can yield biased estimates of cell probabilities for sparse contingency tables with many zero counts, which often occurs in databases with many keys. This bias can result in unreliable estimates of probabilities of uniqueness and, hence, misrepresentations of disclosure risks. We propose an alternative to log-linear models for datasets with sparse keys based on a Bayesian version of grade of membership (GoM) models. We present a Bayesian GoM model for multinomial variables and offer an MCMC algorithm for fitting the model. We evaluate the approach by treating data from a recent US Census Bureau public
Supplement to “Longitudinal Mixed Membership trajectory models for disability survey data.” DOI:10.1214/14-AOAS769SUPP
, 2014
Cited by 1 (1 self)
We develop methods for analyzing discrete multivariate longitudinal data and apply them to functional disability data on the U.S. elderly population from the National Long Term Care Survey (NLTCS), 1982–2004. Our models build on a Mixed Membership framework, in which individuals are allowed multiple membership on a set of extreme profiles characterized by time-dependent trajectories of progression into disability. We also develop an extension that allows us to incorporate birth-cohort effects, in order to assess intergenerational changes. Applying these methods, we find that most individuals follow trajectories that imply a late onset of disability, and that younger cohorts tend to develop disabilities at a later stage in life compared to their elders.