Results 1 - 10 of 203
Inferring social ties across heterogeneous networks
- In WSDM’12, 2012
Cited by 46 (19 self)
It is well known that different types of social ties have essentially different influence between people. However, users in online social networks rarely categorize their contacts into “family”, “colleagues”, or “classmates”. While a large body of research has focused on inferring particular types of relationships in a specific social network, few publications systematically study the generalization of the problem of inferring social ties over multiple heterogeneous networks. In this work, we develop a framework for classifying the type of social relationships by learning across heterogeneous networks. The framework incorporates social theories into a machine learning model, which effectively improves the accuracy of inferring the type of social relationships in a target network by borrowing knowledge from a different source network. Our empirical study on five different genres of networks validates the effectiveness of the proposed framework. For example, by leveraging information from a coauthor network with labeled advisor-advisee relationships, the proposed framework is able to obtain an F1-score of 90% (an 8-28% improvement over alternative methods) for inferring manager-subordinate relationships in an enterprise email network.
Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks
Cited by 37 (10 self)
The problem of predicting links or interactions between objects in a network is an important task in network analysis. Along this line, link prediction between co-authors in a co-author network is a frequently studied problem. In most of these studies, authors are considered in a homogeneous network, i.e., only one type of object (the author type) and one type of link (co-authorship) exist in the network. However, in a real bibliographic network, there are multiple types of objects (e.g., venues, topics, papers) and multiple types of links among these objects. In this paper, we study the problem of co-author relationship prediction in the heterogeneous bibliographic network, and a new methodology called PathPredict, i.e., a meta path-based relationship prediction model, is proposed to solve this problem. First, meta path-based topological features are systematically extracted from the network. Then, a supervised model is used to learn the best weights associated with different topological features in deciding the co-author relationships. We present experiments on a real bibliographic network, the DBLP network, which show that meta path-based heterogeneous topological features can generate more accurate prediction results than homogeneous topological features. In addition, the level of significance of each topological feature can be learned from the model, which is helpful in understanding the mechanism behind the relationship building.
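As a rough illustration of what a meta path-based topological feature might look like, the sketch below counts path instances along an author-paper-venue-paper-author meta path in a tiny, made-up bibliographic network. The abstract does not say which measures PathPredict actually uses, so the choice of a plain path count, the toy data, and the names `paper_authors`, `paper_venue`, `venue_profile`, and `path_count_apvpa` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy network: which authors wrote each paper, and where it appeared.
paper_authors = {
    "p1": ["alice", "bob"],
    "p2": ["carol"],
    "p3": ["alice"],
    "p4": ["dave"],
}
paper_venue = {"p1": "KDD", "p2": "KDD", "p3": "WSDM", "p4": "WSDM"}


def venue_profile(author):
    """Count how many papers `author` has published at each venue."""
    counts = defaultdict(int)
    for paper, authors in paper_authors.items():
        if author in authors:
            counts[paper_venue[paper]] += 1
    return counts


def path_count_apvpa(a, b):
    """Number of author-paper-venue-paper-author path instances from a to b.

    Each instance pairs one paper by `a` with one paper by `b` at the same
    venue, so the count is a sum of per-venue products.
    """
    pa, pb = venue_profile(a), venue_profile(b)
    return sum(pa[v] * pb[v] for v in pa if v in pb)


print(path_count_apvpa("alice", "carol"))
print(path_count_apvpa("bob", "dave"))
```

In a supervised setting, one such count per meta path would form the feature vector for an author pair, and a classifier would learn a weight for each feature.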
Learning to infer social ties in large networks
- In PKDD, 2011
Cited by 36 (16 self)
In online social networks, most relationships lack meaningful labels (e.g., “colleague” and “intimate friends”), simply because users do not take the time to label them. An interesting question is: can we automatically infer the type of social relationships in a large network? What are the fundamental factors that imply the type of social relationships? In this work, we formalize the problem of social relationship learning into a semi-supervised framework, and propose a Partially-labeled Pairwise Factor Graph Model (PLP-FGM) for learning to infer the type of social ties. We tested the model on three different genres of data sets: Publication, Email and Mobile. Experimental results demonstrate that the proposed PLP-FGM model can accurately infer 92.7% of advisor-advisee relationships from the coauthor network (Publication), 88.0% of manager-subordinate relationships from the email network (Email), and 83.1% of the friendships from the mobile network (Mobile). Finally, we develop a distributed learning algorithm to scale up the model to real large networks.
A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search
Cited by 34 (12 self)
In this paper, we propose a unified topic modeling approach and its integration into the random walk framework for academic search. Specifically, we present a topic model for simultaneously modeling papers, authors, and publication venues. We combine the proposed topic model into the random walk framework. Experimental results show that our proposed approach for academic search significantly outperforms the baseline methods of using BM25 and language model, and those of using the existing topic models (including pLSI, LDA, and the AT model).
Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks
Cited by 25 (8 self)
With the development of Web applications, textual documents are not only getting richer, but also ubiquitously interconnected with users and other objects in various ways, which brings about text-rich heterogeneous information networks. Topic models have been proposed and shown to be useful for document analysis, and the interactions among multi-typed objects play a key role in disclosing the rich semantics of the network. However, most topic models only consider the textual information while ignoring the network structures, or can merely integrate with homogeneous networks. None of them can handle heterogeneous information networks well. In this paper, we propose a novel topic model with biased propagation (TMBP) algorithm to directly incorporate heterogeneous information networks with topic modeling in a unified way. The underlying intuition is that multi-typed objects should be treated differently along with their inherent textual information and the rich semantics of the heterogeneous information network. A simple and unbiased topic propagation across such a heterogeneous network does not make much sense. Consequently, we investigate and develop two biased propagation frameworks, the biased random walk framework and the biased regularization framework, for the TMBP algorithm from different perspectives, which can discover latent topics and identify clusters of multi-typed objects simultaneously. We extensively evaluate the proposed approach and compare it to the state-of-the-art techniques on several datasets. Experimental results demonstrate that the improvement in our proposed approach is consistent and promising.
Mining Structural Hole Spanners Through Information Diffusion in Social Networks
Cited by 24 (13 self)
The theory of structural holes [4] suggests that individuals (called structural hole spanners) would benefit from filling the “holes” between people or groups that are otherwise disconnected. A few empirical studies have verified that structural hole spanners play a key role in information diffusion. However, there is still a lack of a principled methodology to detect structural hole spanners from a given social network. In this work, we precisely define the problem of mining top-k structural hole spanners in large-scale social networks and provide an objective (quality) function to formalize the problem. Two instantiation models have been developed to implement the objective function. For the first model, we present an exact algorithm to solve it and prove its convergence. As for the second model, the optimization is proved to be NP-hard, and we design an efficient algorithm with provable approximation guarantees. We test the proposed models on three different networks: Coauthor, Twitter, and Inventor. Our study provides evidence for the theory of structural holes, e.g., 1% of Twitter users who span structural holes control 25% of the information diffusion on Twitter. We compare the proposed models with several alternative methods and the results show that our models clearly outperform the comparison methods. Our experiments also demonstrate that the detected structural hole spanners can help other social network applications, such as community kernel detection and link prediction. To the best of our knowledge, this is the first attempt to address the problem of mining structural hole spanners in large social networks.
Cross-domain collaboration recommendation
- In KDD’12, 2012
Cited by 23 (8 self)
Interdisciplinary collaborations have generated huge impact on society. However, it is often hard for researchers to establish such cross-domain collaborations. What are the patterns of cross-domain collaborations? How do those collaborations form? Can we predict this type of collaboration? Cross-domain collaborations exhibit very different patterns compared to traditional collaborations in the same domain: 1) sparse connection: cross-domain collaborations are rare; 2) complementary expertise: cross-domain collaborators often have different expertise and interests; 3) topic skewness: cross-domain collaboration topics are focused on a subset of topics. All these patterns violate fundamental assumptions of traditional recommendation systems. In this paper, we analyze the cross-domain collaboration data from research publications and confirm the above patterns. We propose the Cross-domain Topic Learning (CTL) model to address these challenges. For handling sparse connections, CTL consolidates the existing cross-domain collaborations through topic layers instead of at author layers, which alleviates the sparseness issue. For handling complementary expertise, CTL models topic distributions from source and target domains separately, as well as the correlation across domains. For handling topic skewness, CTL only models topics relevant to the cross-domain collaboration. We compare CTL with several baseline approaches on large publication datasets from different domains. CTL outperforms baselines significantly on multiple recommendation metrics. Beyond accurate recommendation performance, CTL is also insensitive to parameter tuning, as confirmed in the sensitivity analysis.
Query preserving graph compression
- In SIGMOD, 2012
Cited by 22 (10 self)
It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users’ choice. We compute a small Gr from a graph G such that (a) for any query Q ∈ Q, Q(G) = Q′(Gr), where Q′ ∈ Q can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q′ on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Q. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques could reduce graphs on average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.
Categories and Subject Descriptors: F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems - graph compression
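The idea of compressing a graph through a reachability equivalence relation can be sketched on a toy example. The abstract does not define the relation, so this sketch assumes one plausible reading: two nodes are equivalent when they have the same set of ancestors and the same set of descendants (via non-empty paths). The function names `descendants`, `compress`, and `reaches`, and the four-node graph, are hypothetical and chosen only for illustration; the quadratic construction here is not meant to reflect the paper's algorithm.

```python
def descendants(adj, start):
    """Return every node reachable from `start` via a non-empty path."""
    seen, stack = set(), list(adj[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node])
    return frozenset(seen)


def build_adj(nodes, edges):
    """Build an adjacency map for a directed graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
    return adj


def compress(nodes, edges):
    """Merge nodes that share both ancestor and descendant sets (assumed relation)."""
    adj = build_adj(nodes, edges)
    desc = {n: descendants(adj, n) for n in nodes}
    anc = {n: frozenset(m for m in nodes if n in desc[m]) for n in nodes}
    groups = {}
    for n in nodes:
        groups.setdefault((anc[n], desc[n]), []).append(n)
    # Each equivalence class is represented by its smallest member.
    rep = {n: min(members) for members in groups.values() for n in members}
    r_nodes = sorted(set(rep.values()))
    r_edges = {(rep[u], rep[v]) for u, v in edges}
    return rep, r_nodes, r_edges


def reaches(nodes, edges, u, v):
    """True if v is reachable from u via a non-empty path."""
    return v in descendants(build_adj(nodes, edges), u)


# Toy graph: b and c both sit between a and d, so they are interchangeable
# for reachability queries.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
rep, r_nodes, r_edges = compress(nodes, edges)

# A query (u, v) on G is rewritten to (rep[u], rep[v]) on the compressed graph.
preserved = all(
    reaches(nodes, edges, u, v) == reaches(r_nodes, r_edges, rep[u], rep[v])
    for u in nodes
    for v in nodes
)
print(r_nodes, sorted(r_edges), preserved)
```

Rewriting a query by mapping its endpoints to class representatives plays the role of computing Q′ from Q, and the same `reaches` routine runs unchanged on the smaller graph.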
A Discriminative Approach to Topic-based Citation Recommendation
Cited by 21 (1 self)
In this paper, we present a study of a novel problem, i.e. topic-based citation recommendation, which involves recommending papers to be referred to. Traditionally, this problem is usually treated as an engineering issue and dealt with using heuristics. This paper gives a formalization of topic-based citation recommendation and proposes a discriminative approach to this problem. Specifically, it proposes a two-layer Restricted Boltzmann Machine model, called RBM-CS, which can discover topic distributions of paper content and citation relationship simultaneously. Experimental results demonstrate that RBM-CS can significantly outperform baseline methods for citation recommendation.