Constructing a test collection with multi-intent queries
- In Sakai et al
Cited by 3 (3 self)
Users often issue vague queries; when we cannot predict their intents precisely, a natural solution is to diversify the search results, hoping that some of the results match the user's intent. This is usually called “result diversification”. Only a few studies have systematically evaluated approaches to result diversity. Some questions remain unanswered: 1) As we cannot exhaustively list all intents in an evaluation, how does an incomplete intent set influence evaluation results? 2) Intents are not equally popular, so how can we estimate the probability of each intent? In this paper, we address these questions by building a test collection for multi-intent queries. The labeling tool that we have developed allows assessors to add new intents while performing relevance assessments. Thus, we can investigate the influence of an incomplete intent set through experiments. Moreover, we propose two simple methods to estimate the probabilities of the underlying intents. Experimental results indicate that evaluation results change when these probabilities are taken into account.
Graph-based marginal ranking for update summarization
- In Proceedings of the Eleventh SIAM International Conference on Data Mining. SIAM / Omnipress, 2011
Cited by 1 (0 self)
Update summarization is the task of summarizing a document collection B given that the users have already read another document collection A, which has a time stamp prior to that of B. An important and challenging issue in update summarization is that content in B already covered by A should be excluded from the update summary. In this paper, we propose a graph-based regularization framework, MarginRank, for update summarization. MarginRank extends the cost function of Zhou’s Manifold Ranking with suppression terms (suppression of A on B) to fulfil the assumption that users have read A. MarginRank ranks sentences in B so that the top-ranked sentences are the most important and at the same time cover content different from A. Experiments on the benchmark data sets TAC 2008 and 2009 show the effectiveness of the proposed method.
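The manifold ranking baseline that MarginRank builds on can be sketched as follows. This is a minimal illustration of plain manifold ranking only, assuming a small symmetric affinity matrix and the commonly given update f <- alpha * S f + y with S = D^(-1/2) W D^(-1/2). The suppression terms of MarginRank are not specified in this abstract and are not reproduced here. The function name `manifold_rank`, the toy chain graph, and the value of `alpha` are illustrative assumptions.

```python
import math

def manifold_rank(W, y, alpha=0.85, iters=200):
    """Plain manifold ranking (not MarginRank): iterate f <- alpha * S f + y.

    W: symmetric, non-negative affinity matrix as a list of lists.
    y: prior scores, e.g. 1.0 for the topic/query node and 0.0 elsewhere.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}; isolated nodes keep zero rows.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)]
         for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + y[i] for i in range(n)]
    return f

# Toy example: a chain 0 - 1 - 2 - 3 with the prior placed on node 0.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
scores = manifold_rank(W, [1.0, 0.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

In this toy graph the scores decay with distance from the node carrying the prior, which is the behaviour a suppression term would then have to counteract for content already covered elsewhere.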
Recommendations using Absorbing Random Walks
Cited by 1 (0 self)
Collaborative filtering attempts to find items of interest for a user by utilizing the preferences of other users. In this paper we describe an approach to filtering that explicitly uses social relationships, such as friendship, to find items of interest to a user. Modeling user-item relations as a bipartite graph, we augment it with user-user (social) links and propose an absorbing random walk that induces a set of stationary distributions, one per user, over all items. These distributions can be interpreted as personalized rankings for each user. We exploit sparsity of both user-item and user-user relationships to improve the efficiency of our algorithm.
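One possible reading of the absorbing random walk described above is sketched below. It assumes that users are transient states, items are absorbing states, and the walker moves to a uniformly chosen neighbour (friend or liked item) at each step. These modelling choices, the function name `absorption_distributions`, and the toy data are assumptions made for illustration; the abstract does not give the paper's exact transition model.

```python
def absorption_distributions(friends, likes, iters=500):
    """Per-user distribution over items for a walk absorbed at items.

    friends: dict user -> list of users (every user must also be a key of likes).
    likes:   dict user -> list of items the user is linked to.
    Returns dict user -> dict item -> absorption probability.
    """
    users = list(likes)
    items = sorted({i for u in users for i in likes[u]})
    dist = {u: {i: 0.0 for i in items} for u in users}
    for _ in range(iters):
        new = {}
        for u in users:
            n_nbrs = len(friends.get(u, [])) + len(likes[u])
            row = {i: 0.0 for i in items}
            if n_nbrs:
                step = 1.0 / n_nbrs
                for i in likes[u]:
                    row[i] += step                   # absorbed directly at item i
                for v in friends.get(u, []):
                    for i in items:
                        row[i] += step * dist[v][i]  # continue the walk from friend v
            new[u] = row
        dist = new
    return dist

# Toy example: c has no item links of their own and inherits a ranking from friends.
friends = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
likes = {"a": ["x"], "b": ["y"], "c": []}
dist = absorption_distributions(friends, likes)
```

Under these assumptions, user `c` has no direct item links yet still receives a distribution over items through the social links, which is the effect the abstract attributes to augmenting the bipartite graph.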
Linear Submodular Bandits and their Application to Diversified Retrieval
Cited by 1 (0 self)
Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, which is an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval. We present an algorithm, called LSBGREEDY, and prove that it efficiently converges to a near-optimal model. As a case study, we applied our approach to the setting of personalized news recommendation, where the system must recommend small sets of news articles selected from tens of thousands of available articles each day. In a live user study, we found that LSBGREEDY significantly outperforms existing online learning approaches.
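To make the idea of a submodular utility for diversified retrieval concrete, the sketch below shows greedy selection under a weighted probabilistic coverage function with known topic weights. It is not LSBGREEDY: the online learning of the weights is omitted, and the coverage function, the names `coverage` and `greedy_select`, and the toy articles are assumptions chosen for illustration.

```python
def coverage(selected, topics, weights):
    """Weighted probabilistic coverage: sum_t w_t * (1 - prod_a (1 - p_a[t]))."""
    total = 0.0
    for t, w in enumerate(weights):
        miss = 1.0
        for a in selected:
            miss *= 1.0 - topics[a][t]  # probability that topic t is still uncovered
        total += w * (1.0 - miss)
    return total

def greedy_select(topics, weights, k):
    """Pick up to k items, each time adding the item with the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(topics))):
        base = coverage(chosen, topics, weights)
        best = max((a for a in topics if a not in chosen),
                   key=lambda a: coverage(chosen + [a], topics, weights) - base)
        chosen.append(best)
    return chosen

# Toy example: two near-duplicate articles on topic 0 and one article on topic 1.
topics = {"a1": [0.9, 0.0], "a2": [0.8, 0.1], "a3": [0.0, 0.7]}
weights = [0.6, 0.4]
picked = greedy_select(topics, weights, k=2)  # -> ['a1', 'a3']
```

Because the marginal gain of `a2` shrinks once `a1` covers topic 0, the greedy step prefers `a3`, which illustrates the diminishing-returns property that makes such utilities favour diverse result sets.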
NEW LEARNING FRAMEWORKS FOR INFORMATION RETRIEVAL
- 2011
Recent advances in machine learning have enabled the training of increasingly complex information retrieval models. This dissertation proposes principled approaches to formalize the learning problems for information retrieval, with an eye towards developing a unified learning framework. This will conceptually simplify the overall development process, making it easier to reason about higher level goals and properties of the retrieval system. This dissertation advocates two complementary approaches, structured prediction and interactive learning, to learn feature-rich retrieval models that can perform well in practice.
Search and Retrieval—Query formulation
Query recommendation is considered an effective way to help search users in their information seeking activities. Traditional approaches mainly focused on recommending alternative queries with close search intent to the original query. However, taking only relevance into account may produce redundant recommendations. It is better to provide diverse as well as relevant query recommendations, so that we can cover multiple potential search intents of users and minimize the risk that users will not be satisfied. Besides, previous query recommendation approaches mostly relied on measuring the relevance or similarity between queries in Euclidean space. However, there is no convincing evidence that the query space is Euclidean. It is more natural and reasonable to assume that the query space is a manifold. In this paper, therefore, we aim to recommend diverse and relevant queries based on the intrinsic query manifold. We propose a unified model, named manifold ranking with stop points, for query recommendation. By turning ranked queries into stop points on the query manifold, our approach can generate query recommendations that account for diversity and relevance simultaneously. Experimental results on a large-scale query log of a commercial search engine show that our approach can effectively generate highly diverse as well as closely related query recommendations.
Diversified Ranking on Large Graphs: An Optimization Viewpoint
Diversified ranking on graphs is a fundamental mining task and has a variety of high-impact applications. There are two important open questions here. The first challenge is the measure: how to quantify the goodness of a given top-k ranking list in a way that captures both relevance and diversity? The second challenge lies in the algorithmic aspect: how to find, in a scalable way, an optimal or near-optimal top-k ranking list that maximizes the measure we defined? In this paper, we address these challenges from an optimization point of view. First, we propose a goodness measure for a given top-k ranking list. The proposed goodness measure intuitively captures both (a) the relevance between each individual node in the ranking list and the query; and (b) the diversity among different nodes in the ranking list. Moreover, we propose a scalable algorithm (linear with respect to the size of the graph) that generates a provably near-optimal solution. The experimental evaluations on real graphs demonstrate its effectiveness and efficiency.
MMI Diversity Based Text Summarization
Searching for interesting information in a huge data collection is a tough job that frustrates those seeking it. Automatic text summarization has emerged to facilitate this search process. Selecting distinct ideas (“diversity”) from the original document can produce an appropriate summary. Incorporating multiple means can help to find the diversity in the text. In this paper, we propose an approach to text summarization in which three kinds of evidence are employed (clustering, binary tree and diversity-based method) to help find the document's distinct ideas. The emphasis of our approach is on controlling redundancy in the summarized text. The role of clustering is very important, as some clustering algorithms perform better than others. We therefore conducted an experiment comparing two clustering algorithms (k-means and complete linkage) based on the performance of our method; the results showed that k-means performs better than complete linkage. In general, the experimental results showed that our method performs well for text summarization compared with the benchmark methods used in this study.
TMSP: Topic Guided Manifold Ranking with Sink Points for Guided Summarization
Guided summarization is an extension of query-focused multi-document summarization. We propose a novel ranking algorithm, Topic Guided Manifold Ranking with Sink Points (TMSP), for the guided summarization tasks of TAC2010. TMSP is a topic-extended version of Manifold Ranking with Sink Points (MRSP), which handles the Update Summarization tasks of TAC2009 well. We apply both the TMSP and MRSP methods to guided summarization this year. The evaluation results show that both approaches are competitive in practice.

