Results 1 - 10
of
172
Thumbs up? Sentiment Classification using Machine Learning Techniques
- IN PROCEEDINGS OF EMNLP
, 2002
"... We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three mac ..."
Abstract
-
Cited by 377 (4 self)
- Add to MetaCart
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging. 1
Opinion Mining and Sentiment Analysis
"... An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, active ..."
Abstract
-
Cited by 149 (3 self)
- Add to MetaCart
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include materialon summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided. 1
Automatic Detection of Text Genre
, 1997
"... As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a th ..."
Abstract
-
Cited by 112 (0 self)
- Add to MetaCart
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a th
Stylistic Experiments For Information Retrieval
, 2000
"... Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topi ..."
Abstract
-
Cited by 47 (8 self)
- Add to MetaCart
Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing -- and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate if stylistic information is distinguishable using simple language engineering methods, and if in that case this type of information can be used to improve information retrieval systems.
Integrating Automatic Genre Analysis into Digital Libraries
- IN FIRST ACM-IEEE JOINT CONF ON DIGITAL LIBRARIES
, 2001
"... With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and repr ..."
Abstract
-
Cited by 39 (2 self)
- Add to MetaCart
With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet, genre information (unconsciously) forms one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics, yet representing different genres, are depicted as books in diering colors. This representation supports users intuitively in locating relevant information presented in a relevant form.
Assessing information quality of a community-based encyclopedia
- In Proceedings of the International Conference on Information Quality
, 2005
"... Effective information quality analysis needs powerful yet easy ways to obtain metrics. The English version of Wikipedia provides an extremely interesting yet challenging case for the study of Information Quality dynamics at both macro and micro levels. We propose seven IQ metrics which can be evalua ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
Effective information quality analysis needs powerful yet easy ways to obtain metrics. The English version of Wikipedia provides an extremely interesting yet challenging case for the study of Information Quality dynamics at both macro and micro levels. We propose seven IQ metrics which can be evaluated automatically and test the set on a representative sample of Wikipedia content. The methodology of the metrics construction and the results of tests, along with a number of statistical characterizations of Wikipedia articles, their content construction, process metadata and social context are reported. 1.
Variation in the Contextuality of Language: An Empirical Measure
- Context in Context, Special issue of Foundations of Science
, 1998
"... : The concept of formality/contextuality is proposed as the most important dimension of variation between linguistic expressions. Formal communication conveys information explicitly, through the linguistic expression itself, whereas contextual communication conveys information implicitly, through th ..."
Abstract
-
Cited by 22 (2 self)
- Add to MetaCart
: The concept of formality/contextuality is proposed as the most important dimension of variation between linguistic expressions. Formal communication conveys information explicitly, through the linguistic expression itself, whereas contextual communication conveys information implicitly, through the context of the expression. An empirical measure of formality, the F-score, is proposed, based on the frequencies of different word classes. Nouns, adjectives, articles and prepositions are more frequent in formal styles; pronouns, adverbs, verbs and interjections are more frequent in contextual styles. This measure adequately distinguishes different genres of language production using data for Dutch, French, Italian, and English. Factor analyses applied to data in 7 different languages produce a similar factor as the most important one. Both the data and the theoretical model suggest that contextuality decreases when unambiguous understanding becomes more important or more difficult to ach...
The form is the substance: Classification of genres in text
- In ACL Workshop on Human Language Technology and Knowledge Management
, 2001
"... Categorization of text in IR has traditionally focused on topic. As use of the Internet and e−mail increases, categorization has become a key area of research as users demand methods of prioritizing documents. This work investigates text classification by format style, i.e. "genre", and demonstrates ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
Categorization of text in IR has traditionally focused on topic. As use of the Internet and e−mail increases, categorization has become a key area of research as users demand methods of prioritizing documents. This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information. The paper compares use of presentation features to word features, and the combination thereof, using Naïve Bayes, C4.5 and SVM classifiers. Results show use of combined feature sets with SVM yields 92% classification accuracy in sorting seven genres. 1

