Results 1 - 10
of
381
Opinion Mining and Sentiment Analysis
, 2008
"... An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, active ..."
Abstract
-
Cited by 749 (3 self)
- Add to MetaCart
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include materialon summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
Time-Based Language Models
, 2003
"... We explore the relationship between time and relevance using TREC ad-hoc queries. A type of query is identified that favors very recent documents. We propose a time-based language model approach to retrieval for these queries. We show how time can be incorporated into both query-likelihood models an ..."
Abstract
-
Cited by 440 (36 self)
- Add to MetaCart
We explore the relationship between time and relevance using TREC ad-hoc queries. A type of query is identified that favors very recent documents. We propose a time-based language model approach to retrieval for these queries. We show how time can be incorporated into both query-likelihood models and relevance models. We carried out experiments to compare time-based language models to heuristic techniques for incorporating document recency in the ranking. Our results show that time-based models perform as well as or better than the best of the heuristic techniques.
Parsimonious Language Models for Information Retrieval
- In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2004
"... We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, ..."
Abstract
-
Cited by 322 (41 self)
- Add to MetaCart
(Show Context)
We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process:1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.
Two-Stage language models for information retrieval
- In: Proc. of the 25th ACM SIGIR Conf
, 2002
"... The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the ..."
Abstract
-
Cited by 265 (20 self)
- Add to MetaCart
The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the query and document collection on the optimal settings of retrieval parameters. As a special case, we present a two-stage smoothing method that allows us to estimate the smoothing parameters completely automatically. In the first stage, the document language model is smoothed using a Dirichlet prior with the collection language model as the reference model. In the second stage, the smoothed document language model is further in-terpolated with a query background language model. We propose a leave-one-out method for estimating the Dirichlet parameter of the first stage, and the use of document mixture models for estimating the interpolation parameter of the second stage. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to— or better than—the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
Model-based Feedback in the Language Modeling Approach to Information Retrieval
- In Proceedings of Tenth International Conference on Information and Knowledge Management
, 2001
"... The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the o ..."
Abstract
-
Cited by 251 (24 self)
- Add to MetaCart
The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding ditional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra evidence carried by the feedback documents. Such a model-based feedback strategy easily fits into an extension of the language modeling approach. We propose and evaluate two different approaches to updating a query language model based on feedback documents, one based on a generarive probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents. Experiment resuits show that both approaches are effective and outperform the Rocchio feedback approach.
Probabilistic Models for Information Retrieval based on Divergence from Randomness
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2002
"... We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a ra ..."
Abstract
-
Cited by 243 (5 self)
- Add to MetaCart
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
- In Proceedings of SIGIR
, 2003
"... We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, v ..."
Abstract
-
Cited by 217 (6 self)
- Add to MetaCart
We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, violating the assumption of independent relevance which is assumed in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
A support vector method for optimizing average precision
- In SIGIR ’07
, 2007
"... Machine learning is commonly used to improve ranked re-trieval systems. Due to computational difficulties, few learn-ing techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimiz-ing MAP ei ..."
Abstract
-
Cited by 195 (7 self)
- Add to MetaCart
Machine learning is commonly used to improve ranked re-trieval systems. Due to computational difficulties, few learn-ing techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimiz-ing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant im-provements in MAP scores.