Results 1-10 of 17
Diversified top-k graph pattern matching
 PVLDB
Abstract

Cited by 6 (0 self)
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q;G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q;G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node uo. Given G and Q, it is to find those nodes in M(Q;G) that match uo, instead of the large set M(Q;G). (2) We study two classes of functions for ranking the matches: relevance functions δr() based on, e.g., social impact, and distance functions δd() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of uo based on δr(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q;G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δr() and δd(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
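To make the idea of matching a designated output node concrete, the following is a minimal Python sketch of the baseline the abstract argues against: it computes the full simulation relation by naive fixpoint refinement and then reads off the matches of the output node. It is not one of the paper's early-terminating algorithms. The function names, the edge-list input format, and the `relevance` callable are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def simulation_matches(q_labels, q_edges, g_labels, g_edges, output_node):
    """Return the data-graph nodes that match `output_node` under graph simulation.

    q_labels / g_labels map node -> label; q_edges / g_edges are (source, target) pairs.
    """
    g_succ = defaultdict(set)
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start from all label-compatible candidates for every pattern node.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    # Refine until stable: v stays a match of u only if, for every pattern
    # edge (u, u_child), v has a successor that still matches u_child.
    changed = True
    while changed:
        changed = False
        for u, u_child in q_edges:
            keep = {v for v in sim[u] if g_succ[v] & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # If any pattern node has no match, the pattern does not match at all.
    if any(not matches for matches in sim.values()):
        return set()
    return sim[output_node]


def top_k(matches, relevance, k):
    """Pick the k matches with the highest score under a hypothetical relevance function."""
    return heapq.nlargest(k, matches, key=relevance)
```

Because this sketch materializes every match set before ranking, its cost grows with the whole relation, which is exactly the overhead that an early-termination strategy is meant to avoid.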
Learning Mixtures of Submodular Functions for Image Collection Summarization
Abstract

Cited by 4 (1 self)
We address the problem of image collection summarization by learning mixtures of submodular functions. Submodularity is useful for this problem since it naturally represents characteristics such as fidelity and diversity, desirable for any summary. Several previously proposed image summarization scoring methodologies, in fact, instinctively arrived at submodularity. We provide classes of submodular component functions (including some which are instantiated via a deep neural network) over which mixtures may be learnt. We formulate the learning of such mixtures as a supervised problem via large-margin structured prediction. As a loss function, and for automatic summary scoring, we introduce a novel summary evaluation method called V-ROUGE, and test both submodular and non-submodular optimization (using the submodular-supermodular procedure) to learn a mixture of submodular functions. Interestingly, using non-submodular optimization to learn submodular functions provides the best results. We also provide a new data set consisting of 14 real-world image collections along with many human-generated ground truth summaries collected using Amazon Mechanical Turk. We compare our method with previous work on this problem and show that our learning approach outperforms all competitors on this new data set. This paper provides, to our knowledge, the first systematic approach for quantifying the problem of image collection summarization, along with a new data set of image collections and human summaries.
The DisC Diversity Model
Abstract

Cited by 1 (0 self)
In this paper, we summarize our work on diversification based on dissimilarity and coverage (DisC diversity) by presenting our main theoretical results and contributions.

1. DISC DIVERSITY

Diversification has attracted considerable attention, often as a means of enhancing the quality of the query results presented to users [3]. Most diversification approaches rely on assigning a diversity score to each data item and then selecting as diverse either the k items with the largest score for a given k (e.g., [1]), or the items with score larger than some predefined threshold (e.g., [9]). In our work [4, 5], we address diversity through a different perspective and aim at selecting a representative subset that contains items that are both dissimilar with each other and cover the whole result set.

Let P be a set of items. We define similarity between two items using a distance metric d. For a real number r, $r \geq 0$, we use $N_r(p_i)$ to denote the set of neighbors (or, the neighborhood) of an item $p_i \in P$, i.e., the items lying at distance at most r from $p_i$:

$$N_r(p_i) = \{\, p_j \mid p_i \neq p_j \wedge d(p_i, p_j) \leq r \,\}$$

We use $N_r^+(p_i)$ to denote the set $N_r(p_i) \cup \{p_i\}$. Items in the neighborhood of $p_i$ are considered similar to $p_i$, while items outside its neighborhood are considered dissimilar to $p_i$. We define an r-DisC diverse subset as follows:

Definition 1. (r-DisC Diverse Subset) Let P be a set of items and r, $r \geq 0$, a real number. A subset S of P is an r-Dissimilar-and-Covering diverse subset, or r-DisC diverse subset, of P, if the following two conditions hold: (i) (coverage condition) $\forall p_i \in P$, $\exists p_j \in N_r^+(p_i)$, such that $p_j \in S$ and (ii) (dissimilarity condition) $\forall p_i, p_j \in S$ with $p_i \neq p_j$, it holds that $d(p_i, p_j) > r$.

This work was supported by “Epirus on Android”, a research project co-financed by the European Union (European Regional Development Fund - ERDF) and Greek national funds through the
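The two conditions of the r-DisC definition above translate directly into a checker, and a simple greedy pass is enough to build some subset that satisfies both. The Python sketch below is illustrative only: the function names are hypothetical, items are assumed to be distinct hashable values, `dist` is any metric supplied by the caller, and the greedy routine makes no attempt to minimize the size of the subset it returns.

```python
def neighbors(items, dist, r, p):
    """N_r(p): items other than p lying at distance at most r from p."""
    return [q for q in items if q != p and dist(p, q) <= r]


def is_r_disc_diverse(items, subset, dist, r):
    """Check the coverage and dissimilarity conditions for `subset`."""
    chosen = set(subset)
    # Coverage: every item is selected itself or has a selected neighbor.
    covered = all(
        p in chosen or any(q in chosen for q in neighbors(items, dist, r, p))
        for p in items
    )
    # Dissimilarity: any two distinct selected items are more than r apart.
    dissimilar = all(dist(p, q) > r for p in chosen for q in chosen if p != q)
    return covered and dissimilar


def greedy_r_disc(items, dist, r):
    """Build one r-DisC diverse subset by repeatedly selecting an uncovered item."""
    selected = []
    uncovered = list(items)
    while uncovered:
        p = uncovered[0]
        selected.append(p)
        # Everything within distance r of p (including p) is now covered.
        uncovered = [q for q in uncovered if q != p and dist(p, q) > r]
    return selected
```

Each newly selected item is, by construction, more than r away from all earlier selections, and every discarded item lies within r of a selected one, so the result meets both conditions.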
On the Complexity of Query Result Diversification
Abstract

Cited by 1 (0 self)
Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q and a positive integer k, it is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query, and at the same time, as diverse as possible to each other. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval and operations research. This paper studies the complexity of result diversification for relational queries. We identify three problems in connection with query result diversification: to determine whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, to assess the rank of a given k-element set, and to count how many k-element sets are ranked above a given bound. We study these problems for a variety of query languages and for three objective functions. We establish the upper and lower bounds of these problems, all matching, for both combined complexity and data complexity. We also investigate several special settings of these problems, identifying tractable cases.
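The three problems named in the abstract can be illustrated by brute-force enumeration over all k-element subsets of an already-computed query result. The Python sketch below rests on several assumptions: the abstract does not name its three objective functions, so a max-sum style objective (a weighted sum of relevance and pairwise distance) is used as one common choice; "ranked above a bound" is read as "score at least the bound"; and the rank of a set is taken to be one plus the number of sets scoring strictly higher. All function names are hypothetical.

```python
from itertools import combinations


def max_sum_score(subset, relevance, distance, lam=0.5):
    """Assumed objective: (1 - lam) * total relevance + lam * total pairwise distance."""
    rel = sum(relevance[t] for t in subset)
    div = sum(distance(a, b) for a, b in combinations(subset, 2))
    return (1 - lam) * rel + lam * div


def exists_set_at_least(tuples, k, bound, score):
    """Decision problem: is there a k-element set whose score reaches the bound?"""
    return any(score(s) >= bound for s in combinations(tuples, k))


def rank_of(tuples, subset, score):
    """Ranking problem: 1 + number of same-size sets scoring strictly higher."""
    target = score(tuple(subset))
    return 1 + sum(score(s) > target for s in combinations(tuples, len(subset)))


def count_sets_at_least(tuples, k, bound, score):
    """Counting problem: how many k-element sets reach the bound?"""
    return sum(score(s) >= bound for s in combinations(tuples, k))
```

Enumeration takes time exponential in k, so this only serves to pin down what each problem asks; it says nothing about the complexity bounds the paper establishes.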
Ravi Kumar
Abstract

Cited by 1 (0 self)
We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a non-submodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora, DUC 2004 and user comments on news articles, and show that our algorithm outperforms those that rely only on submodularity.
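As a rough illustration of an objective that adds a dispersion term to a submodular one, the Python sketch below greedily builds a summary that maximizes word coverage plus a weighted sum of pairwise sentence dissimilarities. Every specific here is an assumption: the abstract does not state which submodular function, which three dispersion functions, or which greedy rule the authors use, so word coverage, sum-of-pairs dispersion, and plain marginal-gain greedy are stand-ins, and no approximation guarantee is claimed for this code.

```python
from itertools import combinations


def coverage(summary, sentences):
    """Submodular part: number of distinct words covered by the chosen sentences."""
    covered = set()
    for i in summary:
        covered |= sentences[i]
    return len(covered)


def sum_dispersion(summary, dissimilarity):
    """Dispersion part: sum of pairwise dissimilarities among chosen sentences."""
    return sum(dissimilarity(i, j) for i, j in combinations(summary, 2))


def greedy_summary(sentences, dissimilarity, k, lam=1.0):
    """Pick up to k sentence indices, adding the best-scoring candidate each round.

    `sentences` is a list of word sets; `dissimilarity` takes two indices.
    """
    def objective(summary):
        return coverage(summary, sentences) + lam * sum_dispersion(summary, dissimilarity)

    chosen = []
    while len(chosen) < min(k, len(sentences)):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        best = max(candidates, key=lambda i: objective(chosen + [i]))
        chosen.append(best)
    return chosen
```

The weight `lam` controls how strongly redundancy is penalized relative to coverage; with `lam=0` the procedure reduces to ordinary greedy coverage maximization.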
unknown title
Abstract
Graph pattern matching has been widely used in, e.g., social data analysis. A number of matching algorithms have been developed that, given a graph pattern Q and a graph G, compute the set M(Q,G) of matches of Q in G. However, these algorithms often return an excessive number of matches, and are expensive on large real-life social graphs. Moreover, in practice many social queries are to find matches of a specific pattern node, rather than the entire M(Q,G). This paper studies top-k graph pattern matching. (1) We revise graph pattern matching defined in terms of simulation, by supporting a designated output node uo. Given G and Q, it is to find those nodes in M(Q,G) that match uo, instead of the large set M(Q,G). (2) We study two classes of functions for ranking the matches: relevance functions δr() based on, e.g., social impact, and distance functions δd() to cover diverse elements. (3) We develop two algorithms for computing top-k matches of uo based on δr(), with the early termination property, i.e., they find top-k matches without computing the entire M(Q,G). (4) We also study diversified top-k matching, a bi-criteria optimization problem based on both δr() and δd(). We show that its decision problem is NP-complete. Nonetheless, we provide an approximation algorithm with performance guarantees and a heuristic one with the early termination property. (5) Using real-life and synthetic data, we experimentally verify that our (diversified) top-k matching algorithms are effective, and outperform traditional matching algorithms in efficiency.
Graph Query Reformulation with Diversity
Abstract
We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. Given a query issued by the user, the system, apart from returning the result patterns, also proposes a number of specializations (i.e., supergraphs) of the original query to facilitate the exploration of the results. We formalize the problem of finding a set of reformulations of the input query by maximizing a linear combination of coverage (of the original query’s answer set) and diversity among the specializations. We prove that our problem is hard, but also that a simple greedy algorithm achieves a 1/2-approximation guarantee. The most challenging step of the greedy algorithm is the computation of the specialization that brings the maximum increment to the objective function. To efficiently solve this step, we show how to compute the objective-function increment of a specialization linearly in the number of its results and derive an upper bound that we exploit to devise an efficient search-space visiting strategy. An extensive evaluation on real and synthetic databases attests to the high efficiency and accuracy of our proposal.
Composable Coresets for Diversity and Coverage
Abstract
In this paper we consider efficient construction of “composable coresets” for basic diversity and coverage maximization problems. A coreset for a point-set in a metric space is a subset of the point-set with the property that an approximate solution to the whole point-set can be obtained given the coreset alone. A composable coreset has the property that for a collection of sets, the approximate solution to the union of the sets in the collection can be obtained given the union of the composable coresets for the point sets in the collection. Using composable coresets one can obtain efficient solutions to a wide variety of massive data processing applications, including nearest neighbor search, streaming algorithms and map-reduce computation. Our main results are algorithms for constructing com