Results 1 - 10
of
151
What is Twitter, a Social Network or a News Media?
"... Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological charac ..."
Abstract
-
Cited by 991 (12 self)
- Add to MetaCart
(Show Context)
Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4, 262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar.
Characterizing User Behavior in Online Social Networks
"... Understanding how users behave when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In this paper, we present a first of a kind analysis of user workloads in online ..."
Abstract
-
Cited by 151 (4 self)
- Add to MetaCart
(Show Context)
Understanding how users behave when they connect to social networking sites creates opportunities for better interface design, richer studies of social interactions, and improved design of content distribution systems. In this paper, we present a first of a kind analysis of user workloads in online social networks. Our study is based on detailed clickstream data, collected over a 12-day period, summarizing HTTP sessions of 37,024 users who accessed four popular social networks: Orkut, MySpace, Hi5, and LinkedIn. The data were collected from a social network aggregator website in Brazil, which enables users to connect to multiple social networks with a single authentication. Our analysis of the clickstream data reveals key features of the social network workloads, such as how frequently people connect to social networks and for how long, as well as the types and sequences of activities that users conduct on these sites. Additionally, we crawled the social network topology of Orkut, so that we could analyze user interaction data in light of the social graph. Our data analysis suggests insights into how users interact with friends in Orkut, such as how frequently users visit their friends ’ or non-immediate friends ’ pages. In summary, our analysis demonstrates the power of using clickstream data in identifying patterns in social network workloads and social interactions. Our analysis shows that browsing, which cannot be inferred from crawling publicly available data, accounts for 92 % of all user activities. Consequently, compared to using only crawled data, considering silent interactions like browsing friends ’ pages increases the measured level of interaction among users.
Learning Influence Probabilities In Social Networks
"... Recently, there has been tremendous interest in the phenomenon of influence propagation in social networks. The studies in this area assume they have as input to their problems a social graph with edges labeled with probabilities of influence between users. However, the question of where these proba ..."
Abstract
-
Cited by 148 (18 self)
- Add to MetaCart
Recently, there has been tremendous interest in the phenomenon of influence propagation in social networks. The studies in this area assume they have as input to their problems a social graph with edges labeled with probabilities of influence between users. However, the question of where these probabilities come from or how they can be computed from real social network data has been largely ignored until now. Thus it is interesting to ask whether from a social graph and a log of actions by its users, one can build models of influence. This is the main problem attacked in this paper. In addition to proposing models and algorithms for learning the model parameters and for testing the learned models to make predictions, we also develop techniques for predicting the time by which a user may be expected to perform an action. We validate our ideas and techniques using the Flickr data set consisting of a social graph with 1.3M nodes, 40M edges, and an action log consisting of 35M tuples referring to 300K distinct actions. Beyond showing that there is genuine influence happening in a real social network, we show that our techniques have excellent prediction performance.
The Role of Social Networks in Information Diffusion
, 2012
"... Online social networking technologies enable individuals to simultaneously share information with any number of peers. Quantifying the causal effect of these mediums on the dissemination of information requires not only identification of who influences whom, but also of whether individuals would sti ..."
Abstract
-
Cited by 87 (3 self)
- Add to MetaCart
Online social networking technologies enable individuals to simultaneously share information with any number of peers. Quantifying the causal effect of these mediums on the dissemination of information requires not only identification of who influences whom, but also of whether individuals would still propagate information in the absence of social signals about that information. We examine the role of social networks in online information diffusion with a large-scale field experiment that randomizes exposure to signals about friends ’ information sharing among 253 million subjects in situ. Those who are exposed are significantly more likely to spread information, and do so sooner than those who are not exposed. We further examine the relative role of strong and weak ties in information propagation. We show that, although stronger ties are individually more influential, it is the more abundant weak ties who are responsible for the propagation of novel information. This suggests that weak ties may play a more dominant role in the dissemination of information online than currently believed.
Modeling Information Diffusion in Implicit Networks
"... Abstract—Social media forms a central domain for the production and dissemination of real-time information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions a ..."
Abstract
-
Cited by 83 (2 self)
- Add to MetaCart
(Show Context)
Abstract—Social media forms a central domain for the production and dissemination of real-time information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions among numerous participants. Here we develop a Linear Influence Model where rather than requiring the knowledge of the social network and then modeling the diffusion by predicting which node will influence which other nodes in the network, we focus on modeling the global influence of a node on the rate of diffusion through the (implicit) network. We model the number of newly infected nodes as a function of which other nodes got infected in the past. For each node we estimate an influence function that quantifies how many subsequent infections can be attributed to the influence of that node over time. A nonparametric formulation of the model leads to a simple least squares problem that can be solved on large datasets. We validate our model on a set of 500 million tweets and a set of 170 million news articles and blog posts. We show that the Linear Influence Model accurately models influences of nodes and reliably predicts the temporal dynamics of information diffusion. We find that patterns of influence of individual participants differ significantly depending on the type of the node and the topic of the information. I.
Understanding Latent Interactions in Online Social Networks
"... Popular online social networks (OSNs) like Facebook and Twitter are changing the way users communicate and interact with the Internet. A deep understanding of user interactions in OSNs canprovide important insights into questions of human social behavior, and into the design of social platforms and ..."
Abstract
-
Cited by 53 (16 self)
- Add to MetaCart
(Show Context)
Popular online social networks (OSNs) like Facebook and Twitter are changing the way users communicate and interact with the Internet. A deep understanding of user interactions in OSNs canprovide important insights into questions of human social behavior, and into the design of social platforms and applications. However, recent studies have shown that a majority of user interactions on OSNs are latent interactions,passiveactionssuchasprofilebrowsing that cannot be observed by traditional measurement techniques. In this paper, we seek a deeper understanding of both visible and latent user interactions in OSNs. For quantifiable data on latent user interactions, we perform a detailed measurement study on Renren, the largest OSN in China with more than 150 million users to date. All friendship links in Renren are public, allowing us to exhaustively crawl a connected graph component of 42 million users and 1.66 billion social links in 2009. Renren also keeps detailed visitor logs for each user profile, and counters for each photo and diary/blog entry. We capture detailed histories of profile visits over a period of 90 days for more than 61,000 users in the Peking University Renren network, and use statistics of profile visits to study issues of user profile popularity, reciprocity of profile visits, and the impact of content updates on user popularity. We find that latent interactions are much more prevalent and frequent than visible events, non-reciprocal in nature, and that profile popularity are uncorrelated with the frequency of content updates. Finally, we construct latent interaction graphs as models of user browsing behavior, and compare their structural properties against those of both visible interaction graphs and social graphs.
Correcting for Missing Data in Information Cascades
, 2010
"... Transmission of infectious diseases, propagation of information, and spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network ..."
Abstract
-
Cited by 33 (2 self)
- Add to MetaCart
(Show Context)
Transmission of infectious diseases, propagation of information, and spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging due to missing data. Even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C ′ of the complete cascade C, our goal is to estimate the properties of the complete cascade C, such as its size or depth. To estimate the properties of C, we first formulate k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that given a cascade model and observed cascade C ′ can estimate properties ofthecomplete cascade C. Weevaluate our methodology usinginformation propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool to study the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90 % of the data is missing. 1
Cold Start Link Prediction
"... Inthetraditionallinkpredictionproblem, asnapshotofasocial network is used as a starting point to predict, by means of graph-theoretic measures, the links that are likely to appear in the future. In this paper, we introduce cold start link prediction as the problem of predicting the structure of a so ..."
Abstract
-
Cited by 33 (0 self)
- Add to MetaCart
(Show Context)
Inthetraditionallinkpredictionproblem, asnapshotofasocial network is used as a starting point to predict, by means of graph-theoretic measures, the links that are likely to appear in the future. In this paper, we introduce cold start link prediction as the problem of predicting the structure of a social network when the network itself is totally missing while some other information regarding the nodes is available. Weproposeatwo-phasemethodbasedonthebootstrap probabilistic graph. The first phase generates an implicit social network under the form of a probabilistic graph. The second phase applies probabilistic graph-based measures to produce the final prediction. We assess our method empirically over a large data collection obtained from Flickr, using interest groups as the initial information. The experiments confirm the effectiveness of our approach.
Who is Tweeting on Twitter: Human, Bot, or Cyborg?
"... Twitter is a new web application playing dual roles of online social networking and micro-blogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which appear to be a ..."
Abstract
-
Cited by 32 (1 self)
- Add to MetaCart
(Show Context)
Twitter is a new web application playing dual roles of online social networking and micro-blogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which appear to be a double-edged sword to Twitter. Legitimate bots generate a large amount of benign tweets delivering news and updating feeds, while malicious bots spread spam or malicious contents. More interestingly, in the middle between human and bot, there has emerged cyborg referred to either bot-assisted human or human-assisted bot. To assist human users in identifying who they are interacting with, this paper focuses on the classification of human, bot and cyborg accounts on Twitter. We first conduct a set of large-scale measurements with a collection of over 500,000 accounts. We observe the difference among human, bot and cyborg in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system that includes the following four parts: (1) an entropy-based component, (2) a machine-learning-based component, (3) an account properties component, and (4) a decision maker. It uses the combination of features extracted from an unknown user to determine the likelihood of being a human, bot or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
Track Globally, Deliver Locally: Improving Content Delivery Networks by Tracking Geographic Social Cascades ABSTRACT
, 2011
"... Providers such as YouTube offer easy access to multimedia content to millions, generating high bandwidth and storage demand on the Content Delivery Networks they rely upon. More and more, the diffusion of this content happens on online social networks such as Facebook and Twitter, where social casca ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
(Show Context)
Providers such as YouTube offer easy access to multimedia content to millions, generating high bandwidth and storage demand on the Content Delivery Networks they rely upon. More and more, the diffusion of this content happens on online social networks such as Facebook and Twitter, where social cascades can be observed when users increasingly repost links they have received from others. In this paper we describe how geographic information extracted from social cascades can be exploited to improve caching of multimedia files in a Content Delivery Network. We take advantage of the fact that social cascades can propagate in a geographically limited area to discern whether an item is spreading locally or globally. This informs cache replacement policies, which utilize this information to ensure that content relevant to a cascade is kept close to the users who may be interested in it. We validate our approach by using a novel dataset which combines social interaction data with geographic information: we track social cascades of YouTube links over Twitter and build a proof-of-concept geographic model of a realistic distributed Content Delivery Network. Our performance evaluation shows that we are able to improve cache hits with respect to cache policies without geographic and social information.