Results 1 - 10 of 369
Cascading behavior in large blog graphs
- In SDM, 2007
Cited by 132 (25 self)
How do blogs cite and influence each other? How do such links evolve? Does the popularity of old blog posts drop exponentially with time? These are some of the questions that we address in this work. Blogs (weblogs) have become an important medium of information because of their timely publication, ease of use, and wide availability. In fact, they often make headlines by discussing and discovering evidence about political events and facts. Blogs often link to one another, creating a publicly available record of how information and influence spread through an underlying social network. Aggregating links from several blog posts creates a directed graph, which we analyze to discover the patterns of information propagation in blogspace and thereby understand the underlying social network. Here we report some surprising findings about the blog linking and information propagation structure, based on our analysis of one of the largest available datasets, with 45,000 blogs and ≈ 2.2 million blog postings. Our analysis also sheds light on how rumors, viruses, and ideas propagate over social and computer networks.
GUESS: a language and interface for graph exploration
- In CHI ’06: Proceedings of the SIGCHI Conference on Human Factors in ..., 2006
Cited by 115 (1 self)
As graph models are applied to more widely varying fields, researchers struggle with tools for exploring and analyzing these structures. We describe GUESS, a novel system for graph exploration that combines an interpreted language with a graphical front end that allows researchers to rapidly prototype and deploy new visualizations. GUESS also contains a novel, interactive interpreter that connects the language and interface in a way that facilitates exploratory visualization tasks. Our language, Gython, is a domain-specific embedded language which provides all the advantages of Python with new, graph-specific operators, primitives, and shortcuts. We highlight key aspects of the system in the context of a large user survey and specific real-world case studies ranging from social and knowledge networks to distributed computer network analysis.
Patterns of influence in a recommendation network
- Proc. 10th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (PAKDD), 2006
Cited by 105 (14 self)
Information cascades are phenomena whereby individuals adopt a new action or idea due to influence by others. As such a process spreads through an underlying social network, it can result in widespread adoption overall. We consider information cascades in the context of recommendations, and in particular study the patterns of cascading recommendations that arise in large social networks. We investigate a large person-to-person recommendation network, consisting of four million people who made sixteen million recommendations on half a million products. Such a dataset allows us to pose a number of fundamental questions: What cascades arise frequently in real life? What features distinguish them? We enumerate and count cascade subgraphs on large directed graphs; as one component of this, we develop a novel efficient heuristic based on graph isomorphism testing that scales to large datasets. We discover novel patterns: the distribution of cascade sizes and depths follows a power law. Generally, cascades tend to be shallow, but occasional large bursts of propagation can occur. Cascade subgraphs are mainly tree-like, but we observe variability in connectivity and branching across recommendations for different types of products.
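To make "cascade size" and "cascade depth" concrete, here is a minimal sketch in Python. It is not the paper's isomorphism-based enumeration: it assumes a cascade can be approximated as a weakly connected component of a directed recommendation graph, and that each component is acyclic. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict

def find_cascades(edges):
    """Group nodes into weakly connected components (one per cascade)."""
    neighbors = defaultdict(set)
    for sender, receiver in edges:
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components

def cascade_depth(edges, nodes):
    """Longest directed path, counted in edges; assumes no cycles."""
    children = defaultdict(list)
    for sender, receiver in edges:
        if sender in nodes:
            children[sender].append(receiver)
    memo = {}

    def longest_from(node):
        if node not in memo:
            memo[node] = max((1 + longest_from(c) for c in children[node]), default=0)
        return memo[node]

    return max(longest_from(n) for n in nodes)

# Hypothetical toy data: (recommender, recipient) pairs.
EDGES = [("a", "b"), ("a", "c"), ("c", "d"), ("x", "y")]
CASCADES = find_cascades(EDGES)
SUMMARY = sorted((len(c), cascade_depth(EDGES, c)) for c in CASCADES)
```

Collecting the `(size, depth)` pairs over a real dataset and plotting their frequencies on log-log axes is one simple way to inspect whether the distributions look heavy-tailed.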
Efficient Aggregation for Graph Summarization
Cited by 83 (5 self)
Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying characteristics of large graphs, graph summarization techniques are critical. However, existing graph summarization methods are mostly statistical (studying statistics such as degree distributions, hop-plots and clustering coefficients). These statistical methods are very useful, but the resolutions of the summaries are hard to control. In this paper, we introduce two database-style operations to summarize graphs. Like the OLAP-style aggregation methods that allow users to drill down or roll up to control the resolution of summarization, our methods provide analogous functionality for large graph datasets. The first operation, called SNAP, produces a summary graph by grouping nodes based on user-selected node attributes and relationships. The second operation, called k-SNAP, further allows users to control the resolutions of summaries and provides the “drill-down” and “roll-up” abilities to navigate through summaries with different resolutions. We propose an efficient algorithm to evaluate the SNAP operation. In addition, we prove that the k-SNAP computation is NP-complete. We propose two heuristic methods to approximate the k-SNAP results. Through extensive experiments on a variety of real and synthetic datasets, we demonstrate the effectiveness and efficiency of the proposed methods.
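The idea of an attribute-based summary graph can be sketched in a few lines of Python. This is a simplified illustration only, not the SNAP algorithm from the paper: it groups nodes by a single user-selected attribute and counts edges between groups, and it omits the relationship-based refinement that SNAP also performs. The function name, the `dept` attribute, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def summarize_by_attribute(node_attrs, edges, key):
    """Group nodes by one attribute and count edges between the groups."""
    groups = defaultdict(set)
    for node, attrs in node_attrs.items():
        groups[attrs[key]].add(node)
    group_of = {node: attrs[key] for node, attrs in node_attrs.items()}
    group_edges = Counter()
    for source, target in edges:
        # Edges are treated as directed (source group, target group) pairs.
        group_edges[(group_of[source], group_of[target])] += 1
    return dict(groups), dict(group_edges)

# Hypothetical toy graph: four people tagged with a department.
NODE_ATTRS = {
    "a": {"dept": "CS"},
    "b": {"dept": "CS"},
    "c": {"dept": "EE"},
    "d": {"dept": "EE"},
}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
GROUPS, GROUP_EDGES = summarize_by_attribute(NODE_ATTRS, EDGES, "dept")
```

Grouping on a coarser or finer attribute gives a rough analogue of rolling up or drilling down, although controlling the number of groups directly, as k-SNAP does, needs more than this sketch.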
Political polarization on Twitter
- In ICWSM, 2011
Cited by 73 (7 self)
In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, comprised of more than 250,000 tweets from the six weeks leading up to the 2010 U.S. congressional midterm elections. Using a combination of network clustering algorithms and manually-annotated data we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically-opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically-opposed users. We conclude with statistical evidence in support of this hypothesis.
Randomizing Social Networks: a Spectrum Preserving Approach
, 2008
Cited by 64 (7 self)
Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. The nodes in the network are the individuals and the links among them denote their relationships. Many applications of networks, such as anonymous Web browsing, require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach to this problem is to randomize the edges in true networks, and only disclose the randomized networks. In this paper, we investigate how various properties of networks may be affected by randomization. Specifically, we focus on the spectrum since the eigenvalues of a network are intimately connected to many important topological features. We also conduct theoretical analysis on the extent to which edge anonymity can be achieved. A spectrum-preserving graph randomization method, which can better preserve network properties while protecting edge anonymity, is then presented and empirically evaluated.
Statistical significance of communities in networks
- Physical Review E
Cited by 58 (2 self)
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software
Combining Link and Content for Community Detection: A Discriminative Approach
- KDD'09, 2009
Cited by 54 (5 self)
In this paper, we consider the problem of combining link and content analysis for community detection from networked data, such as paper citation networks and the World Wide Web. Most existing approaches combine link and content information by a generative model that generates both links and contents via a shared set of community memberships. These generative models have some shortcomings in that they fail to consider additional factors that could affect the community memberships and to isolate the contents that are irrelevant to community memberships. To explicitly address these shortcomings, we propose a discriminative model for combining the link and content analysis for community detection. First, we propose a conditional model for link analysis, and in the model we introduce hidden variables to explicitly model the popularity of nodes. Second, to alleviate the impact of irrelevant content attributes, we develop a discriminative model for content analysis. These two models are unified seamlessly via the community memberships. We present efficient algorithms to solve the related optimization problems based on bound optimization and alternating projection. Extensive experiments with benchmark data sets show that the proposed framework significantly outperforms the state-of-the-art approaches for combining link and content analysis for community detection.
Fast Counting of Triangles in Large Real Networks: Algorithms and Laws
Cited by 50 (10 self)
How can we quickly find the number of triangles in a large graph, without actually counting them? Triangles are important for real-world social networks, lying at the heart of the clustering coefficient and of the transitivity ratio. However, straightforward and even approximate counting algorithms can be slow, trying to execute or approximate the equivalent of a 3-way database join. In this paper, we provide two algorithms: EigenTriangle, for counting the total number of triangles in a graph, and EigenTriangleLocal, which gives the count of triangles that contain a desired node. Additional contributions include the following: (a) we show that both algorithms achieve excellent accuracy, with up to ≈ 1000x faster execution time, on several real graphs, and (b) we discover two new power laws (Degree-Triangle and TriangleParticipation laws) with surprising properties.
Figure 1. Speed-up ratio versus accuracy for the Wikipedia web graph (≈ 3.1M nodes, ≈ 37M edges). The proposed method achieves 1021x faster time, for 97.4% accuracy, compared to a typical competitor, the Node Iterator method.
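Spectral triangle counting rests on a standard identity: for an undirected simple graph with adjacency matrix A, the number of triangles equals trace(A^3)/6, and trace(A^3) is the sum of the cubed eigenvalues of A. The sketch below checks the combinatorial side of that identity in plain Python by counting closed walks of length 3. It does not compute eigenvalues and is not the EigenTriangle algorithm itself; the function names and the toy graph are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_brute_force(adj):
    """Reference count: test every unordered triple of nodes."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

def closed_3_walks(adj, i):
    """Diagonal entry (A^3)[i][i]: closed walks i -> j -> k -> i."""
    return sum(1 for j in adj[i] for k in adj[j] if i in adj[k])

def triangles_total(adj):
    """trace(A^3) / 6: each triangle yields 3 start nodes x 2 directions."""
    return sum(closed_3_walks(adj, i) for i in adj) // 6

def triangles_at(adj, i):
    """(A^3)[i][i] / 2: triangles that contain node i."""
    return closed_3_walks(adj, i) // 2

# Hypothetical toy graph: a 4-clique on nodes 0..3 plus a pendant node 4.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
ADJ = build_adjacency(EDGES)
```

Because the sum of cubed eigenvalues is often dominated by the few eigenvalues of largest magnitude, a spectral method can approximate the same quantity from a small number of them, which is the source of the speed-up the abstract reports.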
Predicting the political alignment of twitter users
- In Proceedings of the IEEE Third International Conference on Social Computing (SocialCom
, 2011
Cited by 49 (4 self)
The widespread adoption of social media for political communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting signals at the individual level may, in the aggregate, obscure partisan differences in opinion that are important to political strategy. In this article we describe several methods for predicting the political alignment of Twitter users based on the content and structure of their political communication in the run-up to the 2010 U.S. midterm elections. Using a data set of 1,000 manually-annotated individuals, we find that a support vector machine (SVM) trained on hashtag metadata outperforms an SVM trained on the full text of users’ tweets, yielding predictions of political affiliations with 91% accuracy. Applying latent semantic analysis to the content of users’ tweets we identify hidden structure in the data strongly associated with political affiliation, but do not find that topic detection improves prediction performance. All of these content-based methods are outperformed by a classifier based on the segregated community structure of political information diffusion networks (95% accuracy). We conclude with a practical application of this machinery to web-based political advertising, and outline several approaches to public opinion monitoring based on the techniques developed herein.