Results 1 - 10
of
86
Discovering key concepts in verbose queries
- In Proc. of SIGIR
, 2008
"... Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, ..."
Abstract
-
Cited by 85 (19 self)
- Add to MetaCart
(Show Context)
Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency. Finally, we propose a probabilistic model for integrating the weighted key concepts identified by our method into a query, and demonstrate that this integration significantly improves retrieval effectiveness for a large set of natural language description queries derived from TREC topics on several newswire and web collections.
Using the wisdom of the crowds for keyword generation
- In WWW
, 2008
"... In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords, and their ad is displayed when the keyword is queried to the search engine. An important problem in this process is key ..."
Abstract
-
Cited by 53 (3 self)
- Add to MetaCart
(Show Context)
In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords, and their ad is displayed when the keyword is queried to the search engine. An important problem in this process is keyword generation: given a business that is interested in launching a campaign, suggest keywords that are related to that campaign. We address this problem by making use of the query logs of the search engine. We identify queries related to a campaign by exploiting the associations between queries and URLs as they are captured by the user’s clicks. These queries form good keyword suggestions since they capture the “wisdom of the crowd ” as to what is related to a site. We formulate the problem as a semi-supervised learning problem, and propose algorithms within the Markov Random Field model. We perform experiments with real query logs, and we demonstrate that our algorithms scale to large query logs and produce meaningful results.
Search advertising using web relevance feedback
- In Proc 17th. Intl. Conf. on Information and Knowledge Management
, 2008
"... The business of Web search, a $10 billion industry, relies heavily on sponsored search, whereas a few carefully-selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Iden ..."
Abstract
-
Cited by 43 (15 self)
- Add to MetaCart
(Show Context)
The business of Web search, a $10 billion industry, relies heavily on sponsored search, whereas a few carefully-selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Identifying relevant ads is challenging because queries are usually very short, and because users, consciously or not, choose terms intended to lead to optimal Web search results and not to optimal ads. Furthermore, the ads themselves are short and usually formulated to capture the reader’s attention rather than to facilitate query matching. Traditionally, matching of ads to queries employed standard information retrieval techniques using the bag of words approach. Here we propose to go beyond the bag of words, and augment both queries and ads with additional knowledgerich features. We use Web search results initially returned for the query to create a pool of relevant documents. Classifying these documents with respect to an external taxonomy and identifying salient named entities give rise to two new feature types. Empirical evaluation based on over 9,000 query-ad pairwise judgments confirms that using augmented queries produces highly relevant ads. Our methodology also relaxes the requirement for each ad to explicitly specify the exhaustive list of queries (“bid phrases”) that can trigger it.
Contextual Advertising by Combining Relevance with Click Feedback
, 2008
"... Contextual advertising supports much of the Web’s ecosystem today. User experience and revenue (shared by the site publisher ad the ad network) depend on the relevance of the displayed ads to the page content. As with other document retrieval systems, relevance is provided by scoring the match betwe ..."
Abstract
-
Cited by 38 (7 self)
- Add to MetaCart
(Show Context)
Contextual advertising supports much of the Web’s ecosystem today. User experience and revenue (shared by the site publisher ad the ad network) depend on the relevance of the displayed ads to the page content. As with other document retrieval systems, relevance is provided by scoring the match between individual ads (documents) and the content of the page where the ads are shown (query). In this paper we show how this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching and is suitable for efficient and scalable implementation over inverted indexes. The model parameter values are learnt from logs containing ad impressions and clicks, with shrinkage estimators being used to combat sparsity. To scale our computations to train on an extremely large training corpus consisting of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop [10] framework. Experimental evaluation is provided showing improved click prediction over a holdout set of impression and click events from a large scale real-world ad placement engine. Our best model achieves a 25 % lift in precision relative to a traditional information retrieval model which is based on cosine similarity, for recalling 10 % of the clicks in our test data.
Extracting key terms from noisy and multi-theme documents
, 2009
"... We present a novel method for key term extraction from text documents. In our method, document is modeled as a graph of semantic relationships between terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
We present a novel method for key term extraction from text documents. In our method, document is modeled as a graph of semantic relationships between terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs or communities, while non-important terms fall into weakly interconnected communities, or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms. We introduce a criterion function to select groups that contain key terms discarding groups with unimportant terms. To weight terms and determine semantic relatedness between them we exploit information extracted from Wikipedia. Using such an approach gives us the following two advantages. First, it allows effectively processing multi-theme documents. Second, it is good at filtering out noise information in the document, such as, for example, navigational bars or headers in web pages. Evaluations of the method show that it outperforms existing methods producing key terms with higher precision and recall. Additional experiments on web pages prove that our method appears to be substantially more effective on noisy and multi-theme documents than existing methods.
Online learning from click data for sponsored search
- In Proceedings of the 17th international conference on World Wide Web
"... Sponsored search is one of the enabling technologies for today’s Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. ..."
Abstract
-
Cited by 33 (1 self)
- Add to MetaCart
(Show Context)
Sponsored search is one of the enabling technologies for today’s Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict if an ad is likely to be clicked, and maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. We perform a large scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences following simple and conservative assumptions. Furthermore, we find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
AdROSA – Adaptive personalization of web advertising
- Information Sciences
, 2007
"... Abstract. One of the greatest and most recent challenges for online advertising is the use of adaptive personalization at the same time that the Internet continues to grow as a global market. Most existing solutions to online advertising placement are based on demographic targeting or on information ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
(Show Context)
Abstract. One of the greatest and most recent challenges for online advertising is the use of adaptive personalization at the same time that the Internet continues to grow as a global market. Most existing solutions to online advertising placement are based on demographic targeting or on information gained directly from the user. The AdROSA system for automatic web banner personalization, which integrates web usage and content mining techniques to reduce user input and to respect users ' privacy, is presented in the paper. Furthermore, certain advertising policies, important factors for both publishers and advertisers, are taken into consideration. The integration of all the relevant information is accomplished in one vector space to enable online and fully personalized advertising.
Improving Ad Relevance in Sponsored Search
"... We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click propensity from sparse click logs. Our relevance predictions are then applied to multiple sponsored search applications in both offline editorial evaluations and live online user tests. The predicted relevance score is used to improve the quality of the search page in three areas: filtering low quality ads, more accurate ranking for ads, and optimized page placement of ads to reduce prominent placement of low relevance ads. We show significant gains across all three tasks.
To Swing or not to Swing: Learning when (not) to Advertise
"... Web textual advertising can be interpreted as a search problem over the corpus of ads available for display in a particular context. In contrast to conventional information retrieval systems, which always return results if the corpus contains any documents lexically related to the query, in Web adve ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
(Show Context)
Web textual advertising can be interpreted as a search problem over the corpus of ads available for display in a particular context. In contrast to conventional information retrieval systems, which always return results if the corpus contains any documents lexically related to the query, in Web advertising it is acceptable, and occasionally even desirable, not to show any results. When no ads are relevant to the user’s interests, then showing irrelevant ads should be avoided since they annoy the user and produce no economic benefit. In this paper we pose a decision problem “whether to swing”, that is, whether or not to show any of the ads for the incoming request. We propose two methods for addressing this problem, a simple thresholding approach and a machine learning approach, which collectively analyzes the set of candidate ads augmented with external knowledge. Our experimental evaluation, based on over 28,000 editorial judgments, shows that we are able to predict, with high accuracy, when to“swing”for both content match and sponsored search advertising.
Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages
"... Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on maki ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
(Show Context)
Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predicttherelevantsubsetofqueriesfromalargesetofmonetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label. We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs withoutanyhumanannotationorintervention. Wetrainour classifier on tens of millions of labels, features and training points in less than two days on a thousand node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking and NLP based techniques on a large test set of 5 million ads using multiple metrics.