Results 1-10 of 27
Open Domain Event Extraction from Twitter
Cited by 52 (3 self)
Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter's unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal, the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach for discovering important event categories and classifying extracted events based on latent variable models. By leveraging large volumes of unlabeled data, our approach achieves a 14% increase in maximum F1 over a supervised baseline. A continuously updating demonstration of our system can be viewed at
Spatio-Temporal Dynamics of Online Memes: A Study of Geo-Tagged Tweets
Cited by 7 (1 self)
We conduct a study of the spatio-temporal dynamics of Twitter hashtags through a sample of 2 billion geo-tagged tweets. In our analysis, we (i) examine the impact of location, time, and distance on the adoption of hashtags, which is important for understanding meme diffusion and information propagation; (ii) examine the spatial propagation of hashtags through their focus, entropy, and spread; and (iii) present two methods that leverage the spatio-temporal propagation of hashtags to characterize locations. Based on this study, we find that although hashtags are a global phenomenon, the physical distance between locations is a strong constraint on the adoption of hashtags, both in terms of the hashtags shared between locations and in the timing of when these hashtags are adopted. We find both spatial and temporal locality, as most hashtags spread over small geographical areas but at high speeds. We also find that hashtags are mostly a local phenomenon with long-tailed life spans. These (and other) findings have important implications for a variety of systems and applications, including targeted advertising, location-based services, social media search, and content delivery networks.
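The abstract names three spatial measures of a hashtag (focus, entropy, spread) without defining them. The Python sketch below illustrates one plausible reading under stated assumptions, not necessarily the paper's exact formulas. It assumes that focus is the share of a hashtag's uses at its single most frequent location, that entropy is the Shannon entropy (in bits) of its distribution over locations, and that spread is the mean great-circle distance of its uses from their centroid. It also assumes locations are already bucketed into (lat, lon) points. The function name `hashtag_spatial_stats` is hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def hashtag_spatial_stats(points):
    """Hypothetical helper: (focus, entropy, spread) for one hashtag.

    points: non-empty list of (lat, lon) tuples, one per geo-tagged use,
    assumed to be pre-bucketed so that identical locations compare equal.
    """
    if not points:
        raise ValueError("need at least one geo-tagged use")
    n = len(points)
    counts = Counter(points)                      # uses per distinct location
    probs = [c / n for c in counts.values()]
    focus = max(probs)                            # share at the most common location
    entropy = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
    # Assumed centroid: plain mean of coordinates (reasonable only for small regions,
    # and ignores the +/-180 degree longitude wrap-around).
    centroid = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    spread = sum(haversine_km(p, centroid) for p in points) / n  # mean distance, km
    return focus, entropy, spread
```

For example, a hashtag used three times in one city and once in another has focus 0.75 and entropy of about 0.81 bits. A hashtag used only in one location has focus 1.0, entropy 0, and spread 0. Under this reading, high focus with low entropy and small spread corresponds to the "mostly local" behavior the study reports.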
Towards automatic image understanding and mining via social curation
In ICDM, 2012
Cited by 6 (3 self)
The amount and variety of multimedia data, such as images, movies and music, available over social networks are increasing rapidly. However, the ability to analyze and exploit these unorganized multimedia data remains inadequate, even with state-of-the-art media processing techniques. Our finding in this paper is that the emerging social curation service is a promising information source for the automatic understanding and mining of images distributed and exchanged via social media. One remarkable virtue of social curation service datasets is that they are weakly supervised: the content in the service is manually collected, selected and maintained by users. This is very different from other social information sources, and we can utilize this characteristic for media content mining without expensive media processing techniques. In this paper we present a machine learning system for predicting view counts of images in social curation data as the first step toward automatic image content evaluation. Our experiments confirm that simple features extracted from a social curation corpus are far superior for view-count prediction to the gold-standard image features of computer vision research.
Keywords: social curation, automatic image understanding and evaluation, feature extraction, regression
On Sampling the Wisdom of Crowds: Random vs. Expert Sampling of the Twitter Stream
Cited by 6 (3 self)
Several applications today rely upon content streams crowdsourced from online social networks. Since real-time processing of the large amounts of data generated on these sites is difficult, analytics companies and researchers are increasingly resorting to sampling. In this paper, we investigate the crucial question of how to sample the data generated by users in social networks. The traditional method is to randomly sample all the data. We analyze a different sampling methodology, where content is gathered only from a relatively small subset (< 1%) of the user population, namely the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users who are identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the diversity, timeliness, and trustworthiness of the information contained within them, and found important differences between the datasets. Our observations have major implications for applications such as topical search, trustworthy content recommendations, and breaking news detection.
Spatial influence vs. community influence: modeling the global spread of social media
In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), 2012
Cited by 5 (2 self)
In this paper we seek to understand and model the global spread of social media. How does social media spread from location to location across the globe? Can we model this spread and predict where social media will be popular in the future? Toward answering these questions, we develop a probabilistic model that synthesizes two conflicting hypotheses about the nature of online information spread: (i) the spatial influence model, which asserts that social media spreads to locations that are close by; and (ii) the community affinity influence model, which asserts that social media spreads between locations that are culturally connected, even if they are distant. Based on the geospatial footprint of 755 million geo-tagged hashtags spread through Twitter, we evaluate these models at predicting locations that will adopt hashtags in the future. We find that distance is the single most important explanation of future hashtag adoption since hashtags are fundamentally local. We also find that community affinities (like culture, language, and common interests) enhance the quality of purely spatial models, indicating the necessity of incorporating non-spatial features into models of global social media spread.
Time-aware Topic Recommendation based on Micro-blogs
In Proc. of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), 2012
Cited by 4 (1 self)
This is the author's version of a work that was submitted/accepted for publication in the following source:
Unexpected relevance: An empirical study of serendipity in retweets.
In Proceedings of ICWSM, 2013
Cited by 4 (0 self)
Serendipity is a beneficial discovery that happens in an unexpected way. It has been found to be spectacularly valuable in various contexts, including scientific discoveries, the acquisition of business, and recommender systems. Although this has never been formally proved with large-scale behavioral analysis, scientists and practitioners believe that serendipity is an important factor in positive user experience and increased user engagement. In this paper, we take the initiative to study the ubiquitous occurrence of serendipitous information diffusion and its effect in the context of microblogging communities. We define serendipity as unexpected relevance, then propose a principled statistical method to test the unexpectedness and the relevance of information received by a microblogging user, which identifies serendipitous diffusion of information to that user. Our findings, based on large-scale behavioral analysis, reveal a surprisingly strong presence of serendipitous information diffusion in retweeting, which accounts for more than 25% of retweets on both Twitter and Weibo. Having identified serendipity, we conduct an observational analysis that reveals its benefit to microblogging users. Results show that both the discovery and the provision of serendipity increase the level of user activity and social interaction, while the provision of serendipitous information also increases the influence of Twitter users.
Adaptive Method for Following Dynamic Topics on Twitter
Cited by 3 (3 self)
Many social research studies of public response on social media require following (i.e., tracking) topics on Twitter for long periods of time. Current approaches rely on streaming tweets that match certain hashtags or keywords, or on following certain Twitter accounts. Such approaches lead to limited coverage of on-topic tweets. In this paper, we introduce a novel technique for following such topics more effectively. A topic is defined as a set of well-prepared queries that cover its static side. We propose an automatic approach that adapts to emerging aspects of a tracked broad topic over time. We tested our tracking approach on three broad dynamic topics that were popular in different categories: Egyptian politics, the Syrian conflict, and international sports. We measured the effectiveness of our approach over four full days spanning a period of four months to ensure consistency in effectiveness. Experimental results showed that, on average, our approach achieved over a 100% increase in recall relative to the baseline Boolean approach, while maintaining an acceptable precision of 83%.
Inferring user preferences by probabilistic logical reasoning over social networks
arXiv preprint arXiv:1411.2679, 2014
Cited by 3 (0 self)
We propose a framework for inferring the latent attitudes or preferences of users by performing probabilistic first-order logical reasoning over the social network graph. Our method answers questions about Twitter users like "Does this user like sushi?" or "Is this user a New York Knicks fan?" by building a probabilistic model that reasons over user attributes (the user's location or gender) and the social network (the user's friends and spouse), via inferences like homophily (I am more likely to like sushi if my spouse or friends like sushi; I am more likely to like the Knicks if I live in New York). The algorithm uses distant supervision, semi-supervised data harvesting and vector space models to extract user attributes (e.g. spouse, education, location) and preferences (likes and dislikes) from text. The extracted propositions are then fed into a probabilistic reasoner (we investigate both Markov Logic and Probabilistic Soft Logic). Our experiments show that probabilistic logical reasoning significantly improves the performance on attribute and relation extraction, and also achieves an F-score of 0.791 at predicting a user's likes or dislikes, significantly better than two strong baselines.
Large-scale high-precision topic modeling on Twitter
In KDD. ACM
Cited by 3 (0 self)
We are interested in organizing a continuous stream of sparse and noisy texts, known as "tweets", in real time into an ontology of hundreds of topics with measurable and stringently high precision. This inference is performed over a full-scale stream of Twitter data, whose statistical distribution evolves rapidly over time. Implementing the system in an industrial setting, where it could affect and be visible to real users, required overcoming a host of practical challenges. We present a spectrum of topic modeling techniques that contribute to a deployed system. These include nontopical tweet detection, automatic labeled data acquisition, evaluation with human computation, diagnostic and corrective learning and, most importantly, high-precision topic inference. The latter consists of a novel two-stage training algorithm for tweet text classification and a closed-loop inference mechanism for combining texts with additional sources of information. The resulting system achieves 93% precision at substantial overall coverage.