Optimizing Search Engines using Clickthrough Data
, 2002
Cited by 1314 (23 self)
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
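As a rough illustration of how a click log might be turned into training data for a ranking learner, the sketch below derives pairwise preferences from one presented ranking and the set of clicked links. It assumes the heuristic that a clicked result is preferred over every unclicked result ranked above it; the abstract does not spell out this rule, and the function name `click_preferences` is hypothetical.

```python
def click_preferences(ranking, clicked):
    """Derive (preferred, less_preferred) document pairs from one query.

    Assumption (not stated in the abstract): a clicked document is taken
    to be preferred over every unclicked document shown above it.
    """
    clicked_set = set(clicked)
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc not in clicked_set:
            continue
        # Compare the clicked document with everything ranked above it.
        for above in ranking[:pos]:
            if above not in clicked_set:
                pairs.append((doc, above))
    return pairs


if __name__ == "__main__":
    presented = ["a", "b", "c", "d", "e"]
    print(click_preferences(presented, {"a", "c", "e"}))
```

Pairs of this kind could then serve as the constraints for a pairwise ranking learner such as the SVM approach the abstract mentions.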
The political blogosphere and the 2004 U.S. election: Divided they blog
, 2005
Cited by 369 (11 self)
In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 “A-list” blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.
Item-Based Top-N Recommendation Algorithms
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2004
Cited by 306 (2 self)
... In this paper we present one such class of model-based recommendation algorithms that first determines the similarities between the various items and then uses them to identify the set of items to be recommended. The key steps in this class of algorithms are (i) the method used to compute the similarity between the items, and (ii) the method used to combine these similarities in order to compute the similarity between a basket of items and a candidate recommender item. Our experimental evaluation on eight real datasets shows that these item-based algorithms are up to two orders of magnitude faster than the traditional user-neighborhood based recommender systems and provide recommendations with comparable or better quality.
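The two key steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: step (i) uses cosine similarity over binary purchase data, step (ii) simply sums the similarities between each basket item and each candidate, and the names `item_similarities` and `recommend` are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(baskets):
    """Step (i): cosine similarity between items over binary baskets."""
    item_users = defaultdict(set)
    for user, items in enumerate(baskets):
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    for i in item_users:
        for j in item_users:
            if i == j:
                continue
            common = len(item_users[i] & item_users[j])
            if common:
                norm = math.sqrt(len(item_users[i]) * len(item_users[j]))
                sims[i][j] = common / norm
    return sims


def recommend(basket, sims, n):
    """Step (ii): score each candidate by summing its similarity to the basket."""
    scores = defaultdict(float)
    for item in basket:
        for other, s in sims.get(item, {}).items():
            if other not in basket:
                scores[other] += s
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda it: (-scores[it], it))[:n]


if __name__ == "__main__":
    history = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "d"}]
    sims = item_similarities(history)
    print(recommend({"a"}, sims, 2))
```

Because the item-item similarities are computed once, ahead of time, answering a recommendation request only touches the neighbors of the basket items, which is one plausible reading of why item-based methods scale better than user-neighborhood ones.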
Query chains: Learning to rank from implicit feedback
- In ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (KDD
, 2005
Cited by 240 (10 self)
This paper presents a novel approach for using clickthrough data to learn ranked retrieval functions for web search results. We observe that users searching the web often perform a sequence, or chain, of queries with a similar information need. Using query chains, we generate new types of preference judgments from search engine logs, thus taking advantage of user intelligence in reformulating queries. To validate our method we perform a controlled user study comparing generated preference judgments to explicit relevance judgments. We also implemented a real-world search engine to test our approach, using a modified ranking SVM to learn an improved ranking function from preference data. Our results demonstrate significant improvements in the ranking given by the search engine. The learned rankings outperform both a static ranking function, as well as one trained without considering query chains.
Evaluation of Item-Based Top-N Recommendation Algorithms
, 2000
Cited by 178 (3 self)
The explosive growth of the world-wide-web and the emergence of e-commerce have led to the development of recommender systems, a personalized information filtering technology used to identify a set of N items that will be of interest to a certain user. User-based collaborative filtering is the most successful technology for building recommender systems to date and is extensively used in many commercial recommender systems. Unfortunately, the computational complexity of these methods grows linearly with the number of customers, which in typical commercial applications can reach several million. To address these scalability concerns, item-based recommendation techniques have been developed that analyze the user-item matrix to identify relations between the different items and use these relations to compute the list of recommendations. In this paper we present one such class of item-based recommendation algorithms that first determine the similarities between the various ite...
Automatic identification of user goals in web search
, 2004
Cited by 149 (3 self)
There has been recent interest in studying the “goal” behind a user’s Web query, so that this goal can be used to improve the quality of a search engine’s results. Previous studies have mainly focused on using manual query-log investigation to identify Web query goals. In this paper we study whether and how we can automate this goal-identification process. We first present our results from a human subject study that strongly indicate the feasibility of automatic query-goal identification. We then propose two types of features for the goal-identification task: user-click behavior and anchor-link distribution. Our experimental evaluation shows that by combining these features we can correctly identify the goals for 90% of the queries studied.
Query recommendation using query logs in search engines
- IN INTERNATIONAL WORKSHOP ON CLUSTERING INFORMATION OVER THE WEB (CLUSTWEB, IN CONJUNCTION WITH EDBT), CRETE
, 2004
Cited by 134 (8 self)
In this paper we propose a method that, given a query submitted to a search engine, suggests a list of related queries. The related queries are based on previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process. The proposed method is based on a query clustering process in which groups of semantically similar queries are identified. The clustering process uses the content of historical preferences of users registered in the query log of the search engine. The method not only discovers the related queries, but also ranks them according to a relevance criterion. Finally, we show the effectiveness of the method with experiments over the query log of a search engine.
SearchTogether: An Interface for Collaborative Web Search
- UIST
, 2007
Cited by 133 (15 self)
Studies of search habits reveal that people engage in many search tasks involving collaboration with others, such as travel planning, organizing social events, or working on a homework assignment. However, current Web search tools are designed for a single user, working alone. We introduce SearchTogether, a prototype that enables groups of remote users to synchronously or asynchronously collaborate when searching the Web. We describe an example usage scenario, and discuss the ways SearchTogether facilitates collaboration by supporting awareness, division of labor, and persistence. We then discuss the findings of our evaluation of SearchTogether, analyzing which aspects of its design enabled successful collaboration among study participants. ACM Classification: H5.3 [Information interfaces and
Extracting semantic relations from query logs
- In SIGKDD
, 2007
Cited by 127 (8 self)
In this paper we study a large query log of more than twenty million queries with the goal of extracting the semantic relations that are implicitly captured in the actions of users submitting queries and clicking answers. Previous query log analyses were mostly done with just the queries and not the actions that followed them. We first propose a novel way to represent queries in a vector space based on a graph derived from the query-click bipartite graph. We then analyze the graph produced by our query log, showing that it is less sparse than previous results suggested, and that almost all the measures of these graphs follow power laws, shedding some light on the searching user behavior as well as on the distribution of topics that people want in the Web. The representation we introduce allows us to infer interesting semantic relationships between queries. Second, we provide an experimental analysis on the quality of these relations, showing that most of them are relevant. Finally, we sketch an application that detects multitopical URLs.
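One simple way to picture a query representation built from the query-click bipartite graph is to describe each query by the URLs clicked for it and compare queries by cosine similarity. The sketch below assumes raw click counts as weights and a hypothetical log of `(query, clicked_url)` pairs; the abstract does not specify the weighting or the derived graph, so this should be read as an illustration only.

```python
import math
from collections import Counter, defaultdict


def query_vectors(click_log):
    """Map each query to a sparse vector of click counts per URL."""
    vectors = defaultdict(Counter)
    for query, url in click_log:
        vectors[query][url] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[url] for url, count in u.items() if url in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical log entries, invented for illustration.
    log = [
        ("jaguar car", "jaguar.example"),
        ("jaguar price", "jaguar.example"),
        ("jaguar cat", "zoo.example"),
    ]
    vecs = query_vectors(log)
    print(cosine(vecs["jaguar car"], vecs["jaguar price"]))
    print(cosine(vecs["jaguar car"], vecs["jaguar cat"]))
```

Queries that share clicked URLs end up close together in this space even when they share no terms, which is the kind of implicit relation the abstract describes.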
Empirical and theoretical comparisons of selected criterion functions for document clustering
- Machine Learning
Cited by 116 (6 self)
This paper evaluates the performance of different criterion functions in the context of partitional clustering algorithms for document datasets. Our study involves a total of seven different criterion functions, three of which are introduced in this paper and four that have been proposed in the past. We present a comprehensive experimental evaluation involving 15 different datasets, as well as an analysis of the characteristics of the various criterion functions and their effect on the clusters they produce. Our experimental results show that there is a set of criterion functions that consistently outperform the rest, and that some of the newly proposed criterion functions lead to the best overall results. Our theoretical analysis shows that the relative performance of the criterion functions depends on (i) the degree to which they can correctly operate when the clusters are of different tightness, and (ii) the degree to which they can lead to reasonably balanced clusters.