Efficient query rewrite for structured web queries
 In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 2011
Consensus-based Ranking of Multi-valued Objects: A Generalized Borda Count Approach
 IEEE Transactions on Knowledge and Data Engineering
Abstract—In this paper, we tackle a novel problem of ranking multi-valued objects, where an object has multiple instances in a multidimensional space, and the number of instances per object is not fixed. Given an ad hoc scoring function that assigns a score to a multidimensional instance, we want to rank a set of multi-valued objects. Unlike existing models for ranking uncertain and probabilistic data, which model an object as a random variable whose instances are assumed to be mutually exclusive, we have to capture the coexistence of instances here. To tackle the problem, we advocate the semantics of favoring widely preferred objects instead of majority votes, which are widely used in many elections and competitions. Technically, we borrow the idea from Borda Count, a well-recognized method in consensus-based voting systems. However, Borda Count cannot handle multi-valued objects of inconsistent cardinality, and it is costly to evaluate top-k queries on large multidimensional data sets. To address these challenges, we extend and generalize Borda Count to quantile-based Borda Count, and develop efficient computational methods with comprehensive cost analysis. We present case studies on real data sets to demonstrate the effectiveness of the generalized Borda Count ranking, and use synthetic and real data sets to verify the efficiency of our computational method.
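The abstract builds on classical Borda Count. As a minimal sketch of that baseline only (not the paper's quantile-based generalization, which is not specified here), each voter ranks all n candidates, a candidate at 0-based position i earns n - 1 - i points, and points are summed across voters. The function name `borda_count` and the ballot data are illustrative assumptions.

```python
from collections import defaultdict


def borda_count(rankings):
    """Classical Borda Count over complete rankings (best candidate first).

    A candidate at 0-based position i in a ranking of n candidates earns
    n - 1 - i points. Returns (candidate, score) pairs sorted by score,
    highest first, with ties broken by candidate name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical ballots: A is ranked first by a majority (3 of 5 voters).
    ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
    print(borda_count(ballots))  # [('B', 7), ('A', 6), ('C', 2)]
```

In this toy run, A wins the majority vote but B, ranked high by every voter, wins under Borda Count, which is the "widely preferred rather than majority" semantics the abstract refers to. Note that this baseline assumes every ranking has the same length, the consistent-cardinality assumption the abstract says breaks down for multi-valued objects.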
A Unified Framework for Answering k Closest Pairs Queries and Variants
 IEEE Transactions on Knowledge and Data Engineering
Given a scoring function that computes the score of a pair of objects, a top-k pairs query returns the k pairs with the smallest scores. In this paper, we present a unified framework for answering generic top-k pairs queries, including k-closest pairs queries, k-furthest pairs queries and their variants. Note that a k-closest pairs query is a special case of a top-k pairs query in which the scoring function is the distance between the two objects in a pair. We are the first to present a unified framework that efficiently answers a broad class of top-k queries, including the queries mentioned above. We present efficient algorithms and provide a detailed theoretical analysis showing that the expected performance of our proposed algorithms is optimal for two-dimensional data sets. Furthermore, our framework does not require pre-built indexes, uses limited main memory and is easy to implement. We also extend our techniques to support top-k pairs queries on multi-valued (or uncertain) objects, and demonstrate that our framework can handle exclusive top-k pairs queries. Our extensive experimental study demonstrates the effectiveness and efficiency of the proposed techniques.
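To pin down the query semantics described above, here is a naive O(n² log k) baseline that scores every unordered pair and keeps the k smallest with a heap. It is a sketch of the problem definition only, not the paper's framework (which is designed to avoid exhaustive enumeration); the name `top_k_pairs` and the sample points are assumptions. A k-closest pairs query uses Euclidean distance as the score, and a k-furthest pairs query can be expressed by negating it.

```python
import heapq
import math
from itertools import combinations


def top_k_pairs(objects, k, score):
    """Return the k pairs with the smallest score as (score, i, j), ascending.

    Naive baseline: enumerates all unordered pairs of distinct objects.
    """
    candidates = (
        (score(objects[i], objects[j]), i, j)
        for i, j in combinations(range(len(objects)), 2)
    )
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (5, 5), (6, 5)]
    # k-closest pairs: the score is the Euclidean distance.
    print(top_k_pairs(points, 2, math.dist))
    # k-furthest pairs: negate the distance so "smallest score" means "furthest".
    print(top_k_pairs(points, 1, lambda p, q: -math.dist(p, q)))
```

Because the scoring function is a parameter, the same routine covers both query types, which mirrors (in a brute-force way) the "one framework, many scoring functions" idea in the abstract.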
Document Selection for Tiered Indexing in Commerce Search
ABSTRACT A search engine aims to return a set of relevant documents in response to a query while minimizing the response time. This has led to the use of a tiered index, where the search engine maintains a small cache of documents that can serve a large fraction of queries. We give a novel algorithm for selecting the documents in a tiered index for commerce search (i.e., users searching for products on the web) that effectively exploits the structured nature of commerce search queries. This is in sharp contrast to previous approaches to tiered indexing, which were aimed at general web search, where queries are typically unstructured. We theoretically analyze our algorithms and give performance guarantees even in worst-case scenarios. We then complement and strengthen our theoretical claims by performing exhaustive experiments on real-world commerce search data, and show that our algorithm outperforms state-of-the-art tiered indexing techniques that were developed for general web search.
Top-k Similarity Join over Multi-valued Objects
Abstract. Top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects U and V, a top-k similarity join returns the k most similar pairs of objects from U × V. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multidimensional space, and the similarity between two objects is usually measured by a distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable since it does not address the distribution of object instances. In this paper, we study top-k similarity join queries over multi-valued objects. We apply a quantile-based distance to capture the relative distribution of the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
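The abstract does not define its quantile-based distance, so the sketch below assumes one common reading: with equal instance weights, the φ-quantile distance between two multi-valued objects is the smallest pairwise instance distance d such that at least a φ fraction of instance pairs lie within d. The join itself is a naive nested loop over U × V with no filtering or pruning, so it illustrates the query, not the paper's filtering-refinement techniques. All names (`quantile_distance`, `top_k_similarity_join`, `phi`) are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi):
    """phi-quantile of all pairwise instance distances between objects u and v.

    Assumption: every instance has equal weight; returns the smallest distance d
    such that at least a phi fraction of instance pairs are within d.
    """
    dists = sorted(math.dist(a, b) for a, b in product(u, v))
    idx = max(math.ceil(phi * len(dists)) - 1, 0)
    return dists[idx]


def top_k_similarity_join(U, V, k, phi=0.5):
    """Naive top-k join: the k (distance, name_u, name_v) triples from U x V
    with the smallest phi-quantile distance, ascending."""
    scored = (
        (quantile_distance(u, v, phi), nu, nv)
        for (nu, u), (nv, v) in product(U.items(), V.items())
    )
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Hypothetical multi-valued objects: each maps a name to its instances.
    U = {"u1": [(0, 0), (0, 1)], "u2": [(10, 10)]}
    V = {"v1": [(0, 0), (100, 100)], "v2": [(10, 10), (10, 11)]}
    print(top_k_similarity_join(U, V, 2))  # [(0.0, 'u2', 'v2'), (1.0, 'u1', 'v1')]
```

Under this assumed definition, φ = 0 reduces to the minimum instance distance and φ = 1 to the maximum, so φ controls how much of each object's instance distribution must agree before two objects count as close.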
Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates
In many domains, such as computational geometry and database management, an object may be described by multiple instances (points). The distance (or similarity) between two objects is then captured by the pairwise distances among their instances. In the past, numerous nearest neighbor (NN) functions have been proposed to define the distance between objects with multiple instances and to identify the NN object. Nevertheless, since a user may not have a specific NN function in mind, it is desirable to provide her with a set of NN candidates. Ideally, the set of NN candidates must include every object that is the NN for at least one of the NN functions and must exclude every non-promising object. However, no one has studied the problem of computing NN candidates from this perspective. Al ...
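To show why different NN functions can disagree, the sketch below aggregates pairwise instance distances with three illustrative choices (min, max, average); these are assumptions, not necessarily the functions the paper considers. Taking the union of their NN answers gives one simple candidate set, but it only covers these three functions, whereas the abstract asks for every object that is the NN under at least one function of a whole class. The names `NN_FUNCTIONS`, `nearest_neighbor` and `nn_candidates` are hypothetical.

```python
import math
from itertools import product
from statistics import mean

# Illustrative NN functions: each aggregates the pairwise instance distances
# between a query object and a data object into a single distance.
NN_FUNCTIONS = {"min": min, "max": max, "avg": mean}


def object_distance(q, o, agg):
    return agg([math.dist(a, b) for a, b in product(q, o)])


def nearest_neighbor(q, objects, agg):
    """Name of the object closest to q under agg (ties: first in dict order)."""
    return min(objects, key=lambda name: object_distance(q, objects[name], agg))


def nn_candidates(q, objects):
    """Union of the NN answers over NN_FUNCTIONS only (not a full candidate set)."""
    return {nearest_neighbor(q, objects, agg) for agg in NN_FUNCTIONS.values()}


if __name__ == "__main__":
    q = [(0, 0)]
    objects = {
        "A": [(1, 0), (100, 0)],  # one very close instance, one very far
        "B": [(5, 0), (6, 0)],    # consistently fairly close
        "C": [(50, 0)],           # never the NN under these functions
    }
    for name, agg in NN_FUNCTIONS.items():
        print(name, nearest_neighbor(q, objects, agg))
    print(sorted(nn_candidates(q, objects)))  # ['A', 'B']
```

Here min-distance picks A while max- and average-distance pick B, so a user without a fixed NN function would want both A and B returned, while C can be excluded as non-promising.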
ARC DP120104168, and NSFC61021004.
"... Efficient top-k similarity join processing ..."