Results 1 - 10
of
991
spam: The Underground on 140 Characters or Less
- In CCS
, 2010
"... In this work we present a characterization of spam on Twitter. We find that 8 % of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists. We analyze the accounts that send spam and find evidence that it originates from previously legitimate accounts th ..."
Abstract
-
Cited by 147 (11 self)
- Add to MetaCart
(Show Context)
In this work we present a characterization of spam on Twitter. We find that 8 % of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists. We analyze the accounts that send spam and find evidence that it originates from previously legitimate accounts that have been compromised and are now being puppeteered by spammers. Using clickthrough data, we analyze spammers ’ use of features unique to Twitter and the degree that they affect the success of spam. We find that Twitter is a highly successful platform for coercing users to visit spam pages, with a clickthrough rate of 0.13%, compared to much lower rates previously reported for email spam. We group spam URLs into campaigns and identify trends that uniquely distinguish phishing, malware, and spam, to gain an insight into the underlying techniques used to attract users. Given the absence of spam filtering on Twitter, we examine whether the use of URL blacklists would help to significantly stem the spread of Twitter spam. Our results indicate that blacklists are too slow at identifying new threats, allowing more than 90 % of visitors to view a page before it becomes blacklisted. We also find that even if blacklist delays were reduced, the use by spammers of URL shortening services for obfuscation negates the potential gains unless tools that use blacklists develop more sophisticated spam filtering.
Who Says What to Whom on Twitter
, 2011
"... We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as “lists ” to distinguish between elite ..."
Abstract
-
Cited by 136 (7 self)
- Add to MetaCart
(Show Context)
We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as “lists ” to distinguish between elite users—by which we mean celebrities, bloggers, and representatives of media outlets and other formal organizations—and ordinary users. Based on this classification, we find a striking concentration of attention on Twitter, in that roughly 50 % of URLs consumed are generated by just 20K elite users, where the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers etc; however, bloggers in general rebroadcast more information than the other categories. Next we re-examine the classical “two-step flow ” theory of communications, finding considerable support for it on Twitter. Third, we find that URLs broadcast by different categories of users or containing different types of content exhibit systematically different lifespans. And finally, we examine the attention paid by the different user categories to different news topics.
Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network
- in Proceedings of the IEEE Second International Conference on Social Computing (SocialCom
, 2010
"... Abstract — Retweeting is the key mechanism for information diffusion in Twitter. It emerged as a simple yet powerful way of disseminating information in the Twitter social network. Even though a lot of information is shared in Twitter, little is known yet about how and why certain information spread ..."
Abstract
-
Cited by 133 (2 self)
- Add to MetaCart
(Show Context)
Abstract — Retweeting is the key mechanism for information diffusion in Twitter. It emerged as a simple yet powerful way of disseminating information in the Twitter social network. Even though a lot of information is shared in Twitter, little is known yet about how and why certain information spreads more widely than others. In this paper, we examine a number of features that might affect retweetability of tweets. We gathered content and contextual features from 74M tweets and used this data set to identify factors that are significantly associated with retweet rate. We also built a predictive retweet model. We found that, amongst content features, URLs and hashtags have strong relationships with retweetability. Amongst contextual features, the number of followers and followees as well as the age of the account seem to affect retweetability, while, interestingly, the number of past tweets does not predict retweetability of a user’s tweet. We believe that this research would inform the design of sensemaking and analytics tools for social media streams. Keywords-Twitter; retweet; tweet; follower; social network; social media; factor analysis I.
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
"... Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real-world have highly ..."
Abstract
-
Cited by 128 (4 self)
- Add to MetaCart
(Show Context)
Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real-world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graphparallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order of magnitude gains. 1
A Latent Variable Model for Geographic Lexical Variation
"... The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports ” or “ent ..."
Abstract
-
Cited by 127 (18 self)
- Add to MetaCart
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports ” or “entertainment ” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models. 1
Information Credibility on Twitter
"... We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. On this pap ..."
Abstract
-
Cited by 125 (5 self)
- Add to MetaCart
(Show Context)
We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. On this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to “trending ” topics, and classify them as credible or not credible, based on features extracted from them. We use features from users ’ posting and re-posting (“re-tweeting”) behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results shows that there are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
GraphChi: Large-scale Graph Computation On just a PC
- In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
"... Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains c ..."
Abstract
-
Cited by 115 (6 self)
- Add to MetaCart
(Show Context)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC. 1
Beyond Trending Topics: Real-World Event Identification on Twitter
"... User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this ..."
Abstract
-
Cited by 97 (3 self)
- Add to MetaCart
(Show Context)
User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter. 1
TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration
"... Microblogs are a tremendous repository of user-generated content about world events. However, for people trying to understand events by querying services like Twitter, a chronological log of posts makes it very difficult to get a detailed understanding of an event. In this paper, we present TwitInfo ..."
Abstract
-
Cited by 96 (5 self)
- Add to MetaCart
(Show Context)
Microblogs are a tremendous repository of user-generated content about world events. However, for people trying to understand events by querying services like Twitter, a chronological log of posts makes it very difficult to get a detailed understanding of an event. In this paper, we present TwitInfo, a system for visualizing and summarizing events on Twitter. TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity. A novel streaming algorithm automatically discovers these peaks and labels them meaningfully using text from the tweets. Users can drill down to subevents, and explore further via geolocation, sentiment, and popular URLs. We contribute a recall-normalized aggregate sentiment visualization to produce more honest sentiment overviews. An evaluation of the system revealed that users were able to reconstruct meaningful summaries of events in a small amount of time. An interview with a Pulitzer Prize-winning journalist suggested that the system would be especially useful for understanding a long-running event and for identifying eyewitnesses. Quantitatively, our system can identify 80-100 % of manually labeled peaks, facilitating a relatively complete view of each event studied.
Twitter Under Crisis: Can we trust what we RT?
"... In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social ..."
Abstract
-
Cited by 87 (2 self)
- Add to MetaCart
(Show Context)
In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social phenomenons, such as the dissemination of false rumors and confirmed news. We analyze how this information propagated through the Twitter network, with the purpose of assessing the reliability of Twitter as an information source under extreme circumstances. Our analysis shows that the propagation of tweets that correspond to rumors differs from tweets that spread news because rumors tend to be questioned more than news by the Twitter community. This result shows that it is posible to detect rumors by using aggregate analysis on tweets.