Personalizing search via automated analysis of interests and activities
, 2005
Cited by 303 (29 self)
We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user’s interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages, and other information about the user such as documents and email the user has read and created. Our research suggests that rich representations of the user and the corpus are important for personalization, but that it is possible to approximate these representations and provide efficient client-side algorithms for personalizing search. We show that such personalization algorithms can significantly improve on current Web search.
Cluster-based retrieval using language models
- In Proceedings of SIGIR
, 2004
Cited by 170 (13 self)
Previous research on cluster-based retrieval has been inconclusive as to whether it brings improved retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner and without relevance information provided by humans.
Understanding inverse document frequency: On theoretical arguments for IDF
- Journal of Documentation
, 2004
Cited by 168 (2 self)
The term weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon’s Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
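For readers unfamiliar with the weighting this abstract discusses, the following is a minimal sketch of IDF and TF*IDF. It assumes the common log(N / n_t) form of IDF and raw term counts for TF; the toy corpus, the function names `idf` and `tf_idf`, and the zero weight for unseen terms are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def idf(term, docs):
    """IDF as log(N / n_t): N documents, n_t of which contain the term."""
    n_t = sum(1 for doc in docs if term in doc)
    if n_t == 0:
        return 0.0  # assumption: terms absent from the corpus get zero weight
    return math.log(len(docs) / n_t)

def tf_idf(term, doc, docs):
    """Raw term frequency in one document multiplied by corpus-level IDF."""
    return Counter(doc)[term] * idf(term, docs)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["language", "model", "retrieval"],
    ["cluster", "retrieval"],
    ["query", "expansion", "retrieval"],
]
```

In this toy corpus "retrieval" occurs in every document, so its IDF is log(3/3) = 0, while a term confined to one document, such as "cluster", receives the larger weight log(3).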
Inferring query performance using pre-retrieval predictors
- In Proc. Symposium on String Processing and Information Retrieval
, 2004
Cited by 104 (6 self)
The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study a set of predictors of query performance, which can be generated prior to the retrieval process. The linear and non-parametric correlations of the predictors with query performance are thoroughly assessed on the TREC disk4 and disk5 (minus CR) collections. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications.
Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software
- Proc. of 1st Workshop on Architectural and System Support for Improving Software Dependability
, 2006
Cited by 76 (14 self)
Software errors are a major cause of system failures. Effectively designing tools and support for detecting and recovering from software failures requires a deep understanding of bug characteristics. Recently, software and its development process have changed significantly in many ways, including more help from bug detection tools, a shift towards multi-threaded architectures, the open-source development paradigm, and increasing concerns about security and user-friendly interfaces. Therefore, results from previous studies may not be applicable to present software. Furthermore, many new aspects such as security, concurrency, and open-source-related characteristics have not been well studied. Additionally, previous studies were based on a small number of bugs, which may lead to non-representative results. To investigate the impacts of the new factors on software errors,
A Risk Minimization Framework for Information Retrieval
- In Proceedings of the ACM SIGIR 2003 Workshop on Mathematical/Formal Methods in IR. ACM
, 2003
Cited by 66 (2 self)
This paper presents a novel probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate the systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive new retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they go beyond independent topical relevance.
Linear discriminant model for information retrieval
- In the Proceedings of SIGIR’2005
, 2005
Cited by 64 (17 self)
This paper presents a new discriminative model for information retrieval (IR), referred to as linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features. LDM is different from most existing models in that it takes into account a variety of linguistic features that are derived from the component models of HMM that is widely used in language modeling approaches to IR. Therefore, LDM is a means of melding discriminative and generative models for IR. We present two algorithms of parameter learning for LDM. One is to optimize the average precision (AP) directly using an iterative procedure. The other is a perceptron-based algorithm that minimizes the number of discordant document-pairs in a rank list. The effectiveness of our approach has been evaluated on the task of ad hoc retrieval using six English and Chinese TREC test sets. Results show that (1) in most test sets, LDM significantly outperforms the state-of-the-art language modeling approaches and the classical probabilistic retrieval model; (2) it is more appropriate to train LDM using a measure of AP rather than likelihood if the IR system is graded on AP; and (3) linguistic features (e.g. phrases and dependences) are effective for IR if they are incorporated properly.
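The idea of a perceptron that reduces the number of discordant document pairs can be illustrated with a generic pairwise perceptron over a linear scoring function. This is a sketch under stated assumptions, not the paper's algorithm: the feature vectors, relevance labels, learning rate, and the names `score`, `discordant_pairs`, and `train` are all hypothetical.

```python
def score(w, x):
    """Linear score: dot product of weights and a document's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def discordant_pairs(w, docs):
    """Count pairs where a more relevant document does not outscore a less relevant one."""
    count = 0
    for xi, ri in docs:
        for xj, rj in docs:
            if ri > rj and score(w, xi) <= score(w, xj):
                count += 1
    return count

def train(docs, n_features, epochs=10, lr=1.0):
    """Generic pairwise perceptron: on each discordant pair, move the weights
    toward the more relevant document's features and away from the other's."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for xi, ri in docs:
            for xj, rj in docs:
                if ri > rj and score(w, xi) <= score(w, xj):
                    w = [wk + lr * (a - b) for wk, a, b in zip(w, xi, xj)]
    return w

# Hypothetical data: (feature vector, relevance label) for one query.
docs = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.5, 0.5], 0)]
w = train(docs, n_features=2)
```

With zero initial weights every score ties, so both pairs involving the relevant document count as discordant; after training on this toy data no discordant pairs remain.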
Query Expansion using Associated Queries
- In Proc. Int. Conf. on Information and Knowledge Management
, 2003
Cited by 45 (6 self)
Hundreds of millions of users each day use web search engines to meet their information needs. Advances in web search effectiveness are therefore perhaps the most significant public outcomes of IR research. Query expansion is one such method for improving the effectiveness of ranked retrieval by adding additional terms to a query. In previous approaches to query expansion, the additional terms are selected from highly ranked documents returned from an initial retrieval run. We propose a new method of obtaining expansion terms, based on selecting terms from past user queries that are associated with documents in the collection. Our