Results 21-30 of 68
Path query processing on very large RDF graphs
In WebDB, 2011
Cited by 4 (2 self)
Finding the shortest path between two nodes in an RDF graph is a fundamental operation that allows to discover complex relationships between entities. In this paper we consider the path queries over graphs from a database perspective. We provide the full-fledged database solution to execute path queries over very large RDF graphs. We present low-level techniques to speed up shortest paths algorithms, and a robust method to estimate selectivities of path queries. We perform extended experiments on several large RDF collections, including the UniProt collection, demonstrating that our approach outperforms the path query capabilities of modern systems by a large margin.
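As a point of reference for the operation this abstract describes, the following is a minimal sketch of an unweighted shortest-path search over RDF triples using plain breadth-first search. It is not the paper's method (the low-level speedups and selectivity estimation are not reproduced). The function name `shortest_path`, the in-memory triple list, and the choice to ignore edge direction are all assumptions made for illustration.

```python
from collections import deque


def shortest_path(triples, source, target):
    """Return one shortest node path from source to target, or None.

    triples: iterable of (subject, predicate, object) tuples.
    Assumption: edges are traversed in both directions and predicates
    are ignored, so path length is simply the number of hops.
    """
    if source == target:
        return [source]

    # Build an undirected adjacency list from the triples.
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    if source not in adjacency or target not in adjacency:
        return None

    # Standard BFS; parent pointers let us rebuild the path at the end.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None
```

A call such as `shortest_path(triples, "a", "c")` returns the hop-minimal chain of entities linking the two nodes, which is the kind of relationship the abstract refers to.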
On Text Clustering with Side Information
Cited by 4 (0 self)
Text clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. In most cases, the data is not purely available in text form. A lot of side information is available along with the text documents. Such side information may be of different kinds, such as the links in the document, user-access behavior from web logs, or other non-textual attributes which are embedded into the text document. Such attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side information may be difficult to estimate, especially when some of the information is noisy. In such cases, it can be risky to incorporate side information into the clustering process, because it can either improve the quality of the representation for clustering, or can add noise to the process. Therefore, we need a principled way to perform the clustering process, so as to maximize the advantages from using this side information. In this paper, we design an algorithm which combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We present experimental results on a number of real data sets in order to illustrate the advantages of using such an approach.
Discovery of Top-k Dense Subgraphs in Dynamic Graph Collections
Cited by 4 (0 self)
Dense subgraph discovery is a key issue in graph mining, due to its importance in several applications, such as correlation analysis, community discovery in the Web, gene co-expression and protein-protein interactions in bioinformatics. In this work, we study the discovery of the top-k dense subgraphs in a set of graphs. After the investigation of the problem in its static case, we extend the methodology to work with dynamic graph collections, where the graph collection changes over time. Our methodology is based on lower and upper bounds of the density, resulting in a reduction of the number of exact density computations. Our algorithms do not rely on user-defined threshold values and the only input required is the number of dense subgraphs in the result (k). In addition to the exact algorithms, an approximation algorithm is provided for top-k dense subgraph discovery, which trades result accuracy for speed. We show that a significant number of exact density computations is avoided, resulting in efficient monitoring of the top-k dense subgraphs.
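The general idea of using an upper bound to skip exact density computations can be sketched as follows. This is a generic bound-based top-k scan, not the algorithm from the paper: the abstract does not define the density measure or the bounds, so `upper_bound` and `exact_density` are hypothetical caller-supplied functions, and `top_k_by_density` is an illustrative name.

```python
import heapq


def top_k_by_density(graphs, k, upper_bound, exact_density):
    """Return (top-k list of (density, graph_id), number of exact computations).

    graphs: dict mapping graph_id -> graph object (any representation).
    upper_bound(graph): cheap value assumed to be >= exact_density(graph).
    exact_density(graph): expensive exact density (definition is an assumption).
    """
    if k <= 0:
        return [], 0

    # Compute the cheap bounds once and visit candidates from the
    # most promising to the least promising.
    bounds = {gid: upper_bound(g) for gid, g in graphs.items()}
    order = sorted(graphs, key=lambda gid: bounds[gid], reverse=True)

    heap = []  # min-heap holding the k best (density, graph_id) pairs so far
    exact_calls = 0
    for gid in order:
        # If even the bound cannot beat the current k-th best, no later
        # candidate can either, because bounds are visited in decreasing order.
        if len(heap) == k and bounds[gid] <= heap[0][0]:
            break
        density = exact_density(graphs[gid])
        exact_calls += 1
        if len(heap) < k:
            heapq.heappush(heap, (density, gid))
        elif density > heap[0][0]:
            heapq.heapreplace(heap, (density, gid))

    return sorted(heap, reverse=True), exact_calls
```

The second return value makes the saving visible: it counts how many exact computations were actually needed, and note that only k is supplied by the caller, with no density threshold.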
On Flow Authority Discovery in Social Networks
Cited by 4 (0 self)
A central characteristic of social networks is that they facilitate rapid dissemination of information between large groups of individuals. This paper will examine the problem of determination of information flow representatives, a small group of authoritative representatives to whom the dissemination of a piece of information leads to the maximum spread. Clearly, information flow is affected by a number of different structural factors such as the node degree, connectivity, intensity of information flow interaction and the global structural behavior of the underlying network. We will propose a stochastic information flow model, and use it to determine the authoritative representatives in the underlying social network. We will first design an accurate RankedReplace algorithm, and then use a Bayes probabilistic model in order to approximate the effectiveness of this algorithm with the use of a fast algorithm. We will examine the results on a number of real social network data sets, and show that the method is more effective than state-of-the-art methods.
Parallel Processing of Multiple Graph Queries Using MapReduce
Cited by 4 (0 self)
Recently the volume of the graph data set is often too large to be processed with a single machine in a timely manner. A multi-user environment deteriorates this situation with many graph queries given by multiple users. In this paper, we address the problem of processing multiple graph queries over a large set of graphs. We devise several methods that support efficient processing of multiple graph queries based on MapReduce. Particularly, we focus on processing multiple queries for graph data in parallel with a single input scan. We show that our methods improve the performance of multiple graph query processing with various experiments. Keywords: parallel processing; MapReduce; graph query; big data
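The "single input scan" idea can be illustrated with a small in-memory simulation of a map and a reduce step. This is a sketch under strong assumptions, not the paper's method: it uses no real MapReduce framework, and a query "matches" a graph when all of the query's labeled edges appear in the graph, which is a simplification standing in for true subgraph matching. All function names are hypothetical.

```python
from collections import defaultdict


def map_graph(graph_id, edges, queries):
    """Map step: read one data graph once and test it against every query."""
    edge_set = set(edges)
    for query_id, query_edges in queries.items():
        # Simplified matching (assumption): labeled edge-set containment.
        if set(query_edges) <= edge_set:
            yield query_id, graph_id


def reduce_matches(pairs):
    """Reduce step: group matching graph ids by query id."""
    results = defaultdict(list)
    for query_id, graph_id in pairs:
        results[query_id].append(graph_id)
    return dict(results)


def run_queries(graphs, queries):
    """Answer all queries while iterating over the graph collection once."""
    pairs = []
    for graph_id, edges in graphs.items():  # the single scan of the input
        pairs.extend(map_graph(graph_id, edges, queries))
    return reduce_matches(pairs)
```

The alternative of running one job per query would read the graph collection once per query; batching the queries inside the map step keeps the number of passes over the input at one regardless of how many queries are pending.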
On Influential Node Discovery in Dynamic Social Networks
Cited by 3 (1 self)
The problem of maximizing influence spread has been widely studied in social networks, because of its tremendous number of applications in determining critical points in a social network for information dissemination. All the techniques proposed in the literature are inherently static in nature, which are designed for social networks with a fixed set of links. However, many forms of social interactions are transient in nature, with relatively short periods of interaction. Any influence spread may happen only during the period of interaction, and the probability of spread is a function of the corresponding interaction time. Furthermore, such interactions are quite fluid and evolving, as a result of which the topology of the underlying network may change rapidly, as new interactions form and others terminate. In such cases, it may be desirable to determine the influential nodes based on the dynamic interaction patterns. Alternatively, one may wish to discover the most likely starting points for a given infection pattern. We will propose methods which can be used both for optimization of information spread, as well as the backward tracing of the source of influence spread. We will present experimental results illustrating the effectiveness of our approach on a number of real data sets.
Evaluating Multi-Way Joins over Discounted Hitting Time
Cited by 3 (2 self)
The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, is useful in various applications, including link prediction, collaborative recommendation, and reputation ranking. We examine a novel query, called the multi-way join (or n-way join), on DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a set of n-tuples with the k highest scores, according to some aggregation function of DHT values. This query enables analysis and prediction of complex relationship among n sets of nodes. Since an n-way join is expensive to compute, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which allows the top-(m + 1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. We further examine fast processing and pruning algorithms for 2-way joins. An extensive evaluation on three real datasets shows that PJ accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.
Discovering Descriptive Rules in Relational Dynamic Graphs
Cited by 3 (3 self)
Graph mining methods have become quite popular and a timely challenge is to discover dynamic properties in evolving graphs or networks. We consider the so-called relational dynamic oriented graphs that can be encoded as n-ary relations with n ≥ 3 and thus represented by Boolean tensors. Two dimensions are used to encode the graph adjacency matrices and at least one other denotes time. We design the pattern domain of multidimensional association rules, i.e., non-trivial extensions of the popular association rules that may involve subsets of any dimensions in their antecedents and their consequents. First, we design new objective interestingness measures for such rules and it leads to different approaches for measuring the rule confidence. Second, we must compute collections of a priori interesting rules. It is considered here as a post-processing of the closed patterns that can be extracted efficiently from Boolean tensors. We propose optimizations to support both rule extraction scalability and non-redundancy. We illustrate the added value of this new data mining task to discover patterns from a real-life relational dynamic graph.
Weighted MUSE for Frequent Subgraph Pattern Finding in Uncertain DBLP Data
In Internet Technology and Applications (iTAP), 2011 International Conference on, 2011
Cited by 2 (1 self)
Studies show that finding frequent subgraphs in an uncertain graph database is an NP-complete problem. Finding the frequency at which these subgraphs occur in an uncertain graph database is also computationally expensive. This paper focuses on mining frequent subgraph patterns in DBLP uncertain graph data using an approximation-based method. The frequent subgraph pattern mining problem is formalized using the expected support measure. Here an approximate mining algorithm based on Weighted MUSE is proposed to discover possible frequent subgraph patterns from uncertain graph data. With the explosive growth of digital data in every field of life, the amount of data is increasing at a very high rate. To extract or mine knowledge from these large amounts of data, data mining comes forward. The main reason that
Node classification in uncertain graphs
In SSDBM, 2014
Cited by 2 (2 self)
In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model, and show the benefits of incorporating uncertainty in the classification process as a first-class citizen. The experimental results demonstrate the effectiveness of our approaches.