NIFTY: A System for Large Scale Information Flow Tracking and Clustering
Cited by 8 (2 self)
The real-time information on news sites, blogs and social networking sites changes dynamically and spreads rapidly through the Web. Developing methods for handling such information at a massive scale requires that we think about how information content varies over time, how it is transmitted, and how it mutates as it spreads. We describe the News Information Flow Tracking, Yay! (NIFTY) system for large scale real-time tracking of “memes”, short textual phrases that travel and mutate through the Web. NIFTY is based on a novel highly-scalable incremental meme-clustering algorithm that efficiently extracts and identifies mutational variants of a single meme. NIFTY runs orders of magnitude faster than our previous MEMETRACKER system, while also maintaining better consistency and quality of extracted memes. We demonstrate the effectiveness of our approach by processing a 20 terabyte dataset of 6.1 billion blog posts and news articles that we have been continuously collecting for the last four years. NIFTY extracted 2.9 billion unique textual phrases and identified more than 9 million memes. Our meme-tracking algorithm was able to process the entire dataset in less than five days using a single machine. Furthermore, we also provide a live deployment of the NIFTY system that allows users to explore the dynamics of online news in near real-time.
Traveling trends: Social butterflies or frequent fliers
- In Proc. of 2013 ACM Conf. on Online Social Networks (COSN’13), 2013
Cited by 7 (5 self)
Trending topics are the online conversations that grab collective attention on social media. They are continually changing and often reflect exogenous events that happen in the real world. Trends are localized in space and time as they are driven by activity in specific geographic areas that act as sources of traffic and information flow. Taken independently, trends and geography have been discussed in recent literature on online social media; although, so far, little has been done to characterize the relation between trends and geography. Here we investigate more than eleven thousand topics that trended on Twitter in 63 main US locations during a period of 50 days in 2013. This data allows us to study the origins and pathways of trends, how they compete for popularity at the local level to emerge as winners at the country level, and what dynamics underlie their production and consumption in different geographic areas. We identify two main classes of trending topics: those that surface locally, coinciding with three different geographic clusters (East coast, Midwest and Southwest); and those that emerge globally from several metropolitan areas, coinciding with the major air traffic hubs of the country. These hubs act as trendsetters, generating topics that eventually trend at the country level, and driving the conversation across the country. This poses an intriguing conjecture, drawing a parallel between the spread of information and diseases: Do trends travel faster by airplane than over the Internet?
Clustering memes in social media
- In IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), 2013
Cited by 6 (2 self)
The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
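The "pairwise maximization of similarity" mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the individual measures, so the Jaccard similarity, the feature names `words` and `users`, and the sample messages below are all assumptions chosen for illustration, not the authors' implementation.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(x, y, features=("words", "users")):
    """Combine per-feature similarities by taking their maximum.

    Hypothetical helper: each message is a dict mapping a feature
    name to a collection of items for that feature.
    """
    return max(jaccard(x[f], y[f]) for f in features)


# Two made-up messages with no shared words but overlapping users.
m1 = {"words": ["vote", "election", "today"], "users": ["alice", "bob"]}
m2 = {"words": ["goal", "match"], "users": ["alice", "bob", "carol"]}

# The content similarity is 0, so the user overlap (2/3) determines the score.
score = combined_similarity(m1, m2)
```

Taking the maximum means two messages count as similar if any single feature links them strongly, which requires no weights to tune.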
Multimedia semantics: Interactions between content and community
- Proceedings of the IEEE, 2012
Cited by 3 (0 self)
This paper reviews the state of the art and some emerging issues in research areas related to pattern analysis and monitoring of web-based social communities. This research area is important for several reasons. The presence of near ubiquitous low-cost computing and communication technologies has enabled people to access and share information at unprecedented scale, which necessitates new research for making sense of such content. Furthermore, popular websites with sophisticated media sharing and notification features allow users to stay in touch with friends and loved ones, and also to help form explicit and implicit groups. These social structures are an important source of information for better organizing and managing multimedia. In this article, we study how media-rich social networks provide additional insight into familiar multimedia research
Topicality and social impact: Diverse messages but focused messengers
- Under review, arXiv 1402.5443, 2014
Cited by 2 (2 self)
Are users who comment on a variety of matters more likely to achieve high influence than those who delve into one focused field? Do general Twitter hashtags, such as #lol, tend to be more popular than novel ones, such as #instantlyinlove? Questions like these demand a way to detect topics hidden behind messages associated with an individual or a hashtag, and a gauge of similarity among these topics. Here we develop such an approach to identify clusters of similar hashtags by detecting communities in the hashtag co-occurrence network. Then the topical diversity of a user’s interests is quantified by the entropy of her hashtags across different topic clusters. A similar measure is applied to hashtags, based on co-occurring tags. We find that high topical diversity of early adopters or co-occurring tags implies high future popularity of hashtags. In contrast, low diversity helps an individual accumulate social influence. In short, diverse messages and focused messengers are more likely to gain impact.
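The entropy-based diversity measure described in this abstract can be sketched as follows. The sketch assumes a hashtag-to-cluster mapping is already available (the abstract obtains clusters by community detection, which is not reproduced here); the function name `topical_diversity`, the choice of base-2 logarithm, and the sample data are illustrative assumptions.

```python
import math
from collections import Counter


def topical_diversity(hashtags, cluster_of):
    """Shannon entropy (in bits) of hashtag usage across topic clusters.

    hashtags:   list of hashtags used by one user (repeats allowed)
    cluster_of: hypothetical mapping from hashtag to topic-cluster label
    """
    counts = Counter(cluster_of[tag] for tag in hashtags if tag in cluster_of)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum_i p_i * log2(1 / p_i), with p_i the share of cluster i
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Made-up cluster assignment and two made-up users.
cluster_of = {
    "#lol": "humor",
    "#funny": "humor",
    "#nba": "sports",
    "#election": "politics",
}
focused_user = ["#lol", "#funny", "#lol", "#funny"]
diverse_user = ["#lol", "#nba", "#election", "#nba"]

focused_score = topical_diversity(focused_user, cluster_of)
diverse_score = topical_diversity(diverse_user, cluster_of)
```

A user whose hashtags all fall in one cluster gets an entropy of zero, while usage spread over several clusters gives a higher value.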
Saving, Reusing, and Remixing Web Video: Using Attitudes and Practices to Reveal Social Norms
The growth of online videos has spurred a concomitant increase in the storage, reuse, and remix of this content. As we gain more experience with video content, social norms about ownership have evolved accordingly, spelling out what people think is appropriate use of content that is not necessarily their own. We use a series of three studies, each centering on a different genre of recordings, to probe 634 participants’ attitudes toward video storage, reuse, and remix; we also question participants about their own experiences with online video. The results allow us to characterize current practice and emerging social norms and to establish the relationship between the two. Hypotheticals borrowed from legal research are used as the primary vehicle for testing attitudes, and for identifying boundaries between socially acceptable and unacceptable behavior.
Multimedia Semantics: Interactions Between Content and Community
- 2012
Research on pattern analysis and monitoring of web-based social communities is reviewed in this paper; the authors study how media-rich social networks provide insight into multimedia research problems.
Clustering memes in social media streams
The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to “forget” old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.
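The idea of an online K-means variant that forgets old clusters can be sketched roughly as below. The abstract does not give the actual memory mechanism or similarity measures, so this sketch makes its own assumptions: points are plain numeric vectors, distance is Euclidean, a point farther than `threshold` from every centroid starts a new cluster, and when all `k` slots are taken the least recently updated cluster is the one replaced. The class name `ForgetfulOnlineKMeans` is hypothetical.

```python
import math


class ForgetfulOnlineKMeans:
    """Online k-means that evicts the stalest cluster (illustrative only)."""

    def __init__(self, k, threshold):
        self.k = k                  # maximum number of live clusters
        self.threshold = threshold  # max distance to join an existing cluster
        self.centroids = []         # one vector per cluster
        self.counts = []            # points absorbed by each cluster
        self.last_seen = []         # step at which each cluster was last updated
        self.clock = 0

    def update(self, point):
        """Assign one streamed point and return the index of its cluster."""
        self.clock += 1
        if self.centroids:
            dists = [math.dist(point, c) for c in self.centroids]
            i = min(range(len(dists)), key=dists.__getitem__)
            if dists[i] <= self.threshold:
                # Running-mean update of the nearest centroid.
                self.counts[i] += 1
                eta = 1.0 / self.counts[i]
                self.centroids[i] = [
                    c + eta * (p - c) for c, p in zip(self.centroids[i], point)
                ]
                self.last_seen[i] = self.clock
                return i
        if len(self.centroids) < self.k:
            # Free slot: open a new cluster at this point.
            self.centroids.append(list(point))
            self.counts.append(1)
            self.last_seen.append(self.clock)
            return len(self.centroids) - 1
        # No free slot: forget the least recently updated cluster.
        j = min(range(self.k), key=self.last_seen.__getitem__)
        self.centroids[j] = list(point)
        self.counts[j] = 1
        self.last_seen[j] = self.clock
        return j


model = ForgetfulOnlineKMeans(k=2, threshold=1.0)
labels = [model.update(p) for p in [(0, 0), (0.2, 0), (5, 5), (10, 10)]]
```

In this run the fourth point is far from both clusters, so it replaces the first cluster, which has gone longest without an update.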