Results 1 - 10
of
199
Collective classification in network data
, 2008
"... Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification te ..."
Abstract
-
Cited by 178 (32 self)
- Add to MetaCart
(Show Context)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
Network-based marketing: Identifying likely adopters via consumer networks
- Statistical Science
"... Abstract. Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on su ..."
Abstract
-
Cited by 114 (12 self)
- Add to MetaCart
Abstract. Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors”—those consumers linked to a prior customer—adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning. 1.
Relational dependency networks
- Journal of Machine Learning Research
, 2007
"... Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most re ..."
Abstract
-
Cited by 113 (24 self)
- Add to MetaCart
Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational datasets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world datasets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions.
To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles
- In WWW
, 2009
"... In order to address privacy concerns, many social media websites allow users to hide their personal profiles from the public. In this work, we show how an adversary can exploit an online social network with a mixture of public and private user profiles to predict the private attributes of users. We ..."
Abstract
-
Cited by 109 (3 self)
- Add to MetaCart
(Show Context)
In order to address privacy concerns, many social media websites allow users to hide their personal profiles from the public. In this work, we show how an adversary can exploit an online social network with a mixture of public and private user profiles to predict the private attributes of users. We map this problem to a relational classification problem and we propose practical models that use friendship and group membership information (which is often not hidden) to infer sensitive attributes. The key novel idea is that in addition to friendship links, groups can be carriers of significant information. We show that on several well-known social media sites, we can easily and accurately recover the information of private-profile users. To the best of our knowledge, this is the first work that uses link-based and group-based classification to study privacy implications in social networks with mixed public and private user profiles.
Relational learning via latent social dimensions, in 'KDD '09
- Proceedings di of the 15th ACM SIGKDD international ti conference on Knowledge
, 2009
"... Social media such as blogs, Facebook, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. ..."
Abstract
-
Cited by 86 (28 self)
- Add to MetaCart
Social media such as blogs, Facebook, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. However, the connections in social media are often multi-dimensional. An actor can connect to another actor due to different factors, e.g., alumni, colleagues, living in the same city or sharing similar interest, etc. Collective inference normally does not differentiate these connections. In this work, we propose to extract latent social dimensions based on network information first, and then utilize them as features for discriminative learning. These social dimensions describe different affiliations of social actors hidden in the network, and the subsequent discriminative learning can automatically determine which affiliations are better aligned with the class labels. Such a scheme is preferred when multiple diverse relations are associated with the same network. We conduct extensive experiments on social media data (one from a real-world blog site and the other from a popular content sharing site). Our model outperforms representative relational learning methods based on collective inference, especially when few labeled data are available. The sensitivity of this model and its connection to existing methods are also carefully examined.
Leveraging relational autocorrelation with latent group models
- In MRDM '05: Proceedings of the 4th international workshop on Multi-relational mining. ACM
"... Abstract. The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. ..."
Abstract
-
Cited by 80 (23 self)
- Add to MetaCart
(Show Context)
Abstract. The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among class labels of related instances, but the models fail to capture a frequent cause of autocorrelation—the presence of underlying groups that influence the attributes on a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the datasets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure when there is little known information with which to seed inference.
Social Ties and their Relevance to Churn in Mobile Telecom Networks
, 2008
"... Social Network Analysis has emerged as a key paradigm in modern sociology, technology, and information sciences. The paradigm stems from the view that the attributes of an individual in a network are less important than their ties (relationships) with other individuals in the network. Exploring the ..."
Abstract
-
Cited by 52 (1 self)
- Add to MetaCart
(Show Context)
Social Network Analysis has emerged as a key paradigm in modern sociology, technology, and information sciences. The paradigm stems from the view that the attributes of an individual in a network are less important than their ties (relationships) with other individuals in the network. Exploring the nature and strength of these ties can help understand the structure and dynamics of social networks and explain real-world phenomena, ranging from organizational efficiency to the spread of information and disease. In this paper, we examine the communication patterns of millions of mobile phone users, allowing us to study the underlying social network in a large-scale communication network. Our primary goal is to address the role of social ties in the formation and growth of groups, or communities, in a mobile network. In particular, we study the evolution of churners in an operator’s network spanning over a period of four months. Our analysis explores the propensity of a subscriber to churn out of a service provider’s network depending on the number of ties (friends) that have already churned. Based on our findings, we propose a spreading activation-based technique that predicts potential churners by examining the current set of churners and their underlying social network. The efficiency of the prediction is expressed as a lift curve, which indicates the fraction of all churners that can be caught when a certain fraction of subscribers were contacted.
Power Iteration Clustering
"... We show that the power iteration, typically used to approximate the dominant eigenvector of a matrix, can be applied to a normalized affinity matrix to create a one-dimensional embedding of the underlying data. This embedding is then used, as in spectral clustering, to cluster the data via k-means. ..."
Abstract
-
Cited by 38 (5 self)
- Add to MetaCart
(Show Context)
We show that the power iteration, typically used to approximate the dominant eigenvector of a matrix, can be applied to a normalized affinity matrix to create a one-dimensional embedding of the underlying data. This embedding is then used, as in spectral clustering, to cluster the data via k-means. We demonstrate this method’s effectiveness and scalability on several synthetic and real datasets, and conclude that to find a meaningful low-dimensional embedding for clustering, it is not necessary to find any eigenvectors—we just need a linear combination of the top eigenvectors. 1
Effective label acquisition for collective classification
- in International Conference on Knowledge Discovery and Data mining
, 2008
"... Information diffusion, viral marketing, and collective classification all attempt to model and exploit the relationships in a network to make inferences about the labels of nodes. A variety of techniques have been introduced and methods that combine attribute information and neighboring label inform ..."
Abstract
-
Cited by 38 (6 self)
- Add to MetaCart
(Show Context)
Information diffusion, viral marketing, and collective classification all attempt to model and exploit the relationships in a network to make inferences about the labels of nodes. A variety of techniques have been introduced and methods that combine attribute information and neighboring label information have been shown to be effective for collective labeling of the nodes in a network. However, in part because of the correlation between node labels that the techniques exploit, it is easy to find cases in which, once a misclassification is made, incorrect information propagates throughout the network. This problem can be mitigated if the system is allowed to judiciously acquire the labels for a small number of nodes. Unfortunately, under relatively general assumptions, determining the optimal set of labels to acquire is intractable. Here we propose an acquisition method that learns the cases when a given collective classification algorithm makes mistakes, and suggests acquisitions to correct those mistakes. We empirically show on both real and synthetic datasets that this method significantly outperforms a greedy approximate inference approach, a viral marketing approach, and approaches based on network structural measures such as node degree and network clustering. In addition to significantly improving accuracy with just a small amount of labeled data, our method is tractable on large networks.