A characterization of online browsing behavior.
WWW, 2010
Cited by 33 (0 self)
In this paper, we undertake a large-scale study of online user behavior based on search and toolbar logs. We propose a new CCS taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (Web search, item search, multimedia search). We show that roughly half of all pageviews online are content, one-third are communications, and the remaining one-sixth are search. We then give further breakdowns to characterize the pageviews within each high-level category. We then study the extent to which pages of certain types are revisited by the same user over time, and the mechanisms by which users move from page to page, within and across hosts, and within and across page types. We consider robust schemes for assigning responsibility for a pageview to ancestors along the chain of referrals. We show that mail, news, and social networking pageviews are insular in nature, appearing primarily in homogeneous sessions of one type. Search pageviews, on the other hand, appear on the path to a disproportionate number of pageviews, but cannot be viewed as the principal mechanism by which those pageviews were reached. Finally, we study the burstiness of pageviews associated with a URL, and show that by and large, online browsing behavior is not significantly affected by "breaking" material with non-uniform visit frequency.
Modeling and analysis of cross-session search tasks
2011
Cited by 30 (11 self)
The information needs of search engine users vary in complexity, depending on the task they are trying to accomplish. Some simple needs can be satisfied with a single query, whereas others require a series of queries over a longer period of time. While search engines effectively satisfy many simple needs, searchers receive little support when their information needs span sessions. In this work, we propose methods for modeling and analyzing user search behavior that extends over multiple search sessions. We focus on two problems: (i) given a user query, identify all related queries from previous sessions that the user has issued, and (ii) given a multi-query task for a user, predict whether the user will return to this task in the future. We model both problems within a classification framework that uses features of individual queries and long-term user search behavior at different granularity. Experimental evaluation of the proposed models for both tasks indicates that it is possible to effectively model and analyze cross-session search behavior. Our findings have implications for improving search for complex information needs and designing search engine features to support cross-session search tasks.
Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion
Cited by 7 (3 self)
Query suggestion plays an important role in improving usability of search engines. Although some recently proposed methods provide query suggestions by mining query patterns from search logs, none of them models the immediately preceding queries as context systematically, and uses context information effectively in query suggestions. Context-aware query suggestion is challenging in both modeling context and scaling up query suggestion using context. In this article, we propose a novel context-aware query suggestion approach. To tackle the challenges, our approach consists of two stages. In the first, offline model-learning stage, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. A concept sequence suffix tree is then constructed from session data as a context-aware query suggestion model. In the second, online query suggestion stage, a user’s search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up the context in the concept sequence suffix tree, we suggest to the user context-aware queries. We test our approach on large-scale search logs of a commercial search engine containing 4.0 billion Web queries, 5.9 billion clicks, and 1.87 billion search sessions. The experimental results clearly show that our approach outperforms three baseline methods in both coverage and quality of suggestions.
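The two-stage idea in this abstract (map queries to concepts offline, then look up the user's recent concept sequence online) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `QUERY_TO_CONCEPT` table stands in for the click-through clustering step, a plain dictionary keyed by context suffixes stands in for the concept sequence suffix tree, and the toy sessions are invented. It is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical query-to-concept table. The paper derives concepts by
# clustering a click-through bipartite; that step is not reproduced here.
QUERY_TO_CONCEPT = {
    "cheap flights": "flights",
    "airline tickets": "flights",
    "paris hotels": "hotels",
    "hotel deals paris": "hotels",
    "louvre tickets": "attractions",
    "eiffel tower hours": "attractions",
}


def to_concepts(queries):
    """Map queries to concepts, dropping unknown queries and adjacent repeats."""
    concepts = []
    for query in queries:
        concept = QUERY_TO_CONCEPT.get(query)
        if concept is not None and (not concepts or concepts[-1] != concept):
            concepts.append(concept)
    return concepts


def build_model(sessions, max_context=3):
    """Offline stage: for each context suffix, count the concept that followed."""
    model = defaultdict(Counter)
    for session in sessions:
        seq = to_concepts(session)
        for i in range(1, len(seq)):
            for k in range(1, min(max_context, i) + 1):
                model[tuple(seq[i - k:i])][seq[i]] += 1
    return model


def suggest(model, queries, max_context=3):
    """Online stage: match the longest known suffix of the user's context."""
    context = to_concepts(queries)[-max_context:]
    for start in range(len(context)):
        key = tuple(context[start:])
        if key in model:
            return [concept for concept, _ in model[key].most_common()]
    return []


if __name__ == "__main__":
    toy_sessions = [
        ["cheap flights", "paris hotels", "louvre tickets"],
        ["airline tickets", "hotel deals paris", "eiffel tower hours"],
        ["paris hotels", "cheap flights"],
    ]
    toy_model = build_model(toy_sessions)
    print(suggest(toy_model, ["cheap flights", "hotel deals paris"]))
```

The lookup backs off from the longest context to shorter ones, so a user whose full history was never observed still receives suggestions based on the most recent concept. A real system would return representative queries for each suggested concept rather than the concept labels themselves.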
Search, Interrupted: Understanding and Predicting Search Task Continuation
Cited by 5 (0 self)
Many important search tasks require multiple search sessions to complete. Tasks such as travel planning, large purchases, or job searches can span hours, days, or even weeks. Inevitably, life interferes, requiring the searcher either to recover the “state” of the search manually (most common), or plan for interruption in advance (unlikely). The goal of this work is to better understand, characterize, and automatically detect search tasks that will be continued in the near future. To this end, we analyze a query log from the Bing Web search engine to identify the types of intents, topics, and search behavior patterns associated with long-running tasks that are likely to be continued. Using our insights, we develop an effective prediction algorithm that significantly outperforms both the previous state-of-the-art method, and even the ability of human judges, to predict future task continuation. Potential applications of our techniques would allow a search engine to preemptively “save state” for a searcher (e.g., by caching search results), perform more targeted personalization, and otherwise better support the searcher experience for interrupted search tasks.
Modeling and Predicting the Task-by-Task Behavior of Search Engine Users
Cited by 3 (0 self)
Web search engines answer user needs on a query-by-query fashion, namely they retrieve the set of the most relevant results to each issued query, independently. However, users often submit queries to perform multiple, related tasks. In this paper, we first discuss a methodology to discover from query logs the latent tasks performed by users. Furthermore, we introduce the Task Relation Graph (TRG) as a representation of users’ search behaviors on a task-by-task perspective. The task-by-task behavior is captured by weighting the edges of TRG with a relatedness score computed between pairs of tasks, as mined from the query log. We validate our approach on a concrete application, namely a task recommender system, which suggests related tasks to users on the basis of the task predictions derived from the TRG. Finally, we show that the task recommendations generated by our solution are beyond the reach of existing query suggestion schemes, and that our method recommends tasks that user will likely perform in the near future.
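A weighted task graph of the kind this abstract describes can be sketched in a few lines. The abstract does not define the relatedness score, so the sketch below assumes a simple one (the fraction of transitions out of a task that lead to each other task) and assumes that tasks have already been discovered and labeled. The function names `build_trg` and `recommend` and the toy task labels are hypothetical.

```python
from collections import defaultdict


def build_trg(task_sequences):
    """Build {source: {target: weight}} from per-user task sequences.

    Assumed relatedness score: the share of transitions leaving `source`
    that go to `target`. The paper's actual score may differ.
    """
    pair_counts = defaultdict(lambda: defaultdict(int))
    source_counts = defaultdict(int)
    for seq in task_sequences:
        for source, target in zip(seq, seq[1:]):
            if source != target:
                pair_counts[source][target] += 1
                source_counts[source] += 1
    return {
        source: {t: n / source_counts[source] for t, n in targets.items()}
        for source, targets in pair_counts.items()
    }


def recommend(trg, task, top_k=2):
    """Return up to top_k tasks with the heaviest outgoing edges from `task`."""
    edges = trg.get(task, {})
    # Sort by descending weight, breaking ties alphabetically for stability.
    return sorted(edges, key=lambda t: (-edges[t], t))[:top_k]


if __name__ == "__main__":
    toy_sequences = [
        ["book flight", "book hotel", "rent car"],
        ["book flight", "book hotel"],
        ["book flight", "rent car"],
    ]
    toy_trg = build_trg(toy_sequences)
    print(recommend(toy_trg, "book flight"))
```

Because the edges are directed, the graph can recommend "book hotel" after "book flight" without implying the reverse, which matches the abstract's emphasis on tasks the user is likely to perform next.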
Generating reformulation trees for complex queries
In Proc. of the 35th ACM SIGIR, 2012
Cited by 2 (1 self)
Search queries have evolved beyond keyword queries. Many complex queries such as verbose queries, natural language question queries and document-based queries are widely used in a variety of applications. Processing these complex queries usually requires a series of query operations, which results in multiple sequences of reformulated queries. However, previous query representations, either the “bag of words” method or the recently proposed “query distribution” method, cannot effectively model these query sequences, since they ignore the relationships between two queries. In this paper, a reformulation tree framework is proposed to organize multiple sequences of reformulated queries as a tree structure, where each path of the tree corresponds to a sequence of reformulated queries. Specifically, a two-level reformulation tree is implemented for verbose queries. This tree effectively combines two query operations, i.e., subset selection and query substitution, within the same framework. Furthermore, a weight estimation approach is proposed to assign weights to each node of the reformulation tree by considering the relationships between different nodes and directly optimizing retrieval performance. Experiments on TREC collections show that this reformulation tree based representation significantly outperforms the state-of-the-art techniques.
A vlHMM Approach to Context-Aware Search
Capturing the context of a user’s query from the previous queries and clicks in the same session leads to a better understanding of the user’s information need. A context-aware approach to document reranking, URL recommendation, and query suggestion may substantially improve users’ search experience. In this article, we propose a general approach to context-aware search by learning a variable length hidden Markov model (vlHMM) from search sessions extracted from log data. While the mathematical model is powerful, the huge amounts of log data present great challenges. We develop several distributed learning techniques to learn a very large vlHMM under the map-reduce framework. Moreover, we construct feature vectors for each state of the vlHMM model to handle users’ novel queries not covered by the training data. We test our approach on a raw dataset consisting of 1.9 billion queries, 2.9 billion clicks, and 1.2 billion search sessions before filtering, and evaluate the effectiveness of the vlHMM learned from the real data on three search applications: document reranking, query suggestion, and URL recommendation. The experiment results validate the effectiveness of vlHMM in the applications of document reranking, URL recommendation, and query suggestion.
Enabling Direct Interest-Aware Audience Selection
Advertisers typically have a fairly accurate idea of the interests of their target audience. However, today’s online advertising systems are unable to leverage this information. The reasons are two-fold. First, there is no agreed upon vocabulary of interests for advertisers and advertising systems to communicate. More importantly, advertising systems lack a mechanism for mapping users to the interest vocabulary. In this paper, we tackle both problems. We present a system for direct interest-aware audience selection. This system takes the query histories of search engine users as input, extracts their interests, and describes them with interpretable labels. The labels are not drawn from a predefined taxonomy, but rather dynamically generated from the query histories, and are thus easy for the advertisers to interpret and use. In addition, the system enables seamless addition of interest labels provided by the advertiser. The proposed system runs at scale on millions of users and hundreds of millions of queries. Our experimental evaluation shows that our approach leads to a significant increase of over 50% in the probability that a user will click on an ad related to a given interest.