From Tweets to Polls : Linking Text Sentiment to Public Opinion Time Series
, 2010
Cited by 297 (11 self)
We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling.
Sentiment strength detection in short informal text
- J AM SOC INF SCI TECHNOL. 2010 DECEMBER;61:2544–2558
, 2010
Cited by 92 (7 self)
A huge number of informal messages are posted every day in social network sites, blogs and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behaviour to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially-oriented, designed to identify opinions about products rather than user behaviours. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de-facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimised by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1-5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.
Designing incentives for inexpert human raters
- Proceedings of the 2011 ACM Conference on Computer Supported Cooperative Work (CSCW ’11)
, 2011
Cited by 63 (3 self)
The emergence of online labor markets makes it far easier to use individual human raters to evaluate materials for data collection and analysis in the social sciences. In this paper, we report the results of an experiment, conducted in an on-line labor market, that measured the effectiveness of a collection of social and financial incentive schemes for motivating workers to conduct a qualitative content analysis task. Overall, workers performed better than chance, but results varied considerably depending on task difficulty. We find that treatment conditions which asked workers to prospectively think about the responses of their peers, when combined with financial incentives, produced more accurate performance. Other treatments generally had weak effects on quality. Workers in India performed significantly worse than US workers, regardless of treatment group.
Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams
Cited by 27 (0 self)
We are interested in the problem of tracking broad topics such as “baseball” and “fashion” in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating “foreground” models (to capture recency) and “background” models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed using a normalized extension of stupid backoff, combined with a simple queue for history retention, perform well on the task.
A meta-analysis of state-of-the-art electoral prediction from Twitter data
, 2012
Cited by 12 (1 self)
NOTICE: This is the author’s version of a work accepted for publication by SAGE Publications. Changes resulting from the publishing process, including peer review, editing, corrections, structural formatting and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was published
Coder Reliability and Misclassification in the Human Coding of Party Manifestos
, 2012
Cited by 10 (1 self)
The Comparative Manifesto Project (CMP) provides the only time series of estimated party policy positions in political science and has been extensively used in a wide variety of applications. Recent work (e.g., Benoit, Laver, and Mikhaylov 2009; Klingemann et al. 2006) focuses on nonsystematic sources of error in these estimates that arise from the text generation process. Our concern here, by contrast, is with error that arises during the text coding process since nearly all manifestos are coded only once by a single coder. First, we discuss reliability and misclassification in the context of hand-coded content analysis methods. Second, we report results of a coding experiment that used trained human coders to code sample manifestos provided by the CMP, allowing us to estimate the reliability of both coders and coding categories. Third, we compare our test codings to the published CMP “gold standard” codings of the test documents to assess accuracy and produce empirical estimates of a misclassification matrix for each coding category. Finally, we demonstrate the effect of coding misclassification on the CMP’s most widely used index, its left–right scale. Our findings indicate that misclassification is a serious and systemic problem with the current CMP data set and coding process, suggesting the CMP scheme should be significantly simplified to address reliability issues.
A Demographic Analysis of Online Sentiment during Hurricane Irene
Cited by 9 (1 self)
We examine the response to the recent natural disaster Hurricane Irene on Twitter.com. We collect over 65,000 Twitter messages relating to Hurricane Irene from August 18th to August 31st, 2011, and group them by location and gender. We train a sentiment classifier to categorize messages based on level of concern, and then use this classifier to investigate demographic differences. We report three principal findings: (1) the number of Twitter messages related to Hurricane Irene in directly affected regions peaks around the time the hurricane hits that region; (2) the level of concern in the days leading up to the hurricane’s arrival is dependent on region; and (3) the level of concern is dependent on gender, with females being more likely to express concern than males. Qualitative linguistic variations further support these differences. We conclude that social media analysis provides a viable, real-time complement to traditional survey methods for understanding public perception towards an impending disaster.
Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus
- In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC)
, 2010
Cited by 8 (2 self)
The Live Memories corpus is an Italian corpus annotated for anaphoric relations. The corpus includes manually annotated information about morphosyntactic agreement, anaphoricity, and semantic class of the NPs. For the annotation of the anaphoric links, the corpus takes into account specific phenomena of the Italian language such as incorporated clitics and phonetically non-realized pronouns. The Live Memories Corpus contains texts from the Italian Wikipedia about the region Trentino/Süd Tirol and from blog sites with users’ comments. It is planned to add a set of articles from local newspapers.
Restructuring the Social Sciences: Reflections from Harvard’s Institute for Quantitative Social Science
, 2013
Cited by 7 (1 self)
The social sciences are undergoing a dramatic transformation from studying problems to solving them; from making do with a small number of sparse data sets to analyzing increasing quantities of diverse, highly informative data; from isolated scholars toiling away on their own to larger scale, collaborative, interdisciplinary, lab-style research teams; and from a purely academic pursuit focused inward to having a major impact on public policy, commerce and industry, other academic fields, and some of the major problems that affect people and groups. In the midst of all this productive chaos, we have been building the Institute for Quantitative Social Science at Harvard, a new type of center intended to respond to and help foster these broader developments. We offer here some suggestions from our experiences for the increasing number of other universities that have begun to build similar institutions and for how we might work together to advance social science more generally.
Scalable Crisis Relief: Crowdsourced SMS Translation and categorization with Mission 4636
, 2010
Cited by 6 (0 self)
Crowdsourced crisis response harnesses distributed human networks in combination with information and communication technology (ICT) to create scalable, flexible and rapid communication systems that promote well-being, survival, and recovery during the acute phase of an emergency. In this paper, we analyze a recent experience in which CrowdFlower conducted crowdsourced translation, categorization and geo-tagging for SMS-based reporting as part of Mission 4636 after a 7.0 magnitude earthquake struck Haiti on January 12, 2010. We discuss CrowdFlower’s approach to this task, lessons learned from the experience, and opportunities to generalize the techniques and technologies involved for other ICT for development (ICTD) applications. We find that CrowdFlower’s most significant contribution to Mission 4636 and to the broader field of crowdsourced crisis relief lies in the flexible, scalable nature of the pool of earthquake survivors, volunteers, workers, and machines that the organization engaged during the emergency response efforts.