Results 1-10 of 233
The WebGraph Framework I: Compression Techniques
In Proc. of the Thirteenth International World Wide Web Conference, 2003
Cited by 268 (31 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow one to store a web graph in memory in a limited space, exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
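The two techniques named in this abstract can be illustrated with a minimal sketch. It assumes that intervalisation means splitting a sorted successor list into runs of consecutive node ids plus leftover residuals, and that referentiation means describing one list as a copy mask over another node's list plus extra nodes. The function names and the `min_len` threshold are hypothetical, and the bit-level codes a real implementation would use are not shown.

```python
def intervalise(successors, min_len=3):
    """Split a sorted successor list into (start, length) intervals and residuals.

    min_len is an assumed threshold: shorter runs are left as residuals.
    """
    intervals, residuals = [], []
    i = 0
    while i < len(successors):
        j = i
        # Extend the run while the next id is exactly one larger.
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((successors[i], run))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def referentiate(successors, reference):
    """Describe a list relative to a reference list: a copy mask plus extra nodes."""
    wanted = set(successors)
    copy_mask = [node in wanted for node in reference]
    copied = {node for node, keep in zip(reference, copy_mask) if keep}
    extras = [node for node in successors if node not in copied]
    return copy_mask, extras


def dereferentiate(copy_mask, extras, reference):
    """Rebuild the original sorted list from the copy mask and the extra nodes."""
    copied = [node for node, keep in zip(reference, copy_mask) if keep]
    return sorted(copied + extras)
```

In this sketch the two ideas complement each other: the copy mask captures what two lists share, while intervals and residuals describe what remains.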
Approximate Graph Coloring by Semidefinite Programming.
In Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science, 1994
Cited by 210 (7 self)
Abstract. We consider the problem of coloring k-colorable graphs with the fewest possible colors. We present a randomized polynomial time algorithm that colors a 3-colorable graph on n vertices with $\min\{O(\Delta^{1/3} \log^{1/2} \Delta \log n),\ O(n^{1/4} \log^{1/2} n)\}$ colors, where $\Delta$ is the maximum degree of any vertex. Besides giving the best known approximation ratio in terms of n, this marks the first nontrivial approximation result as a function of the maximum degree $\Delta$. This result can be generalized to k-colorable graphs to obtain a coloring using $\min\{O(\Delta^{1-2/k} \log^{1/2} \Delta \log n),\ O(n^{1-3/(k+1)} \log^{1/2} n)\}$ colors. Our results are inspired by the recent work of Goemans and Williamson, who used an algorithm for semidefinite optimization problems, which generalize linear programs, to obtain improved approximations for the MAX CUT and MAX 2-SAT problems. An intriguing outcome of our work is a duality relationship established between the value of the optimum solution to our semidefinite program and the Lovász $\vartheta$-function. We show lower bounds on the gap between the optimum solution of our semidefinite program and the actual chromatic number; by duality this also demonstrates interesting new facts about the $\vartheta$-function.
Journal of the ACM, Vol. 45, No. 2, March 1998, pp. 246-265.
Deeper Inside PageRank
INTERNET MATHEMATICS, 2004
Cited by 208 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
Efficient Mining of Frequent Subgraph in the Presence of Isomorphism
Cited by 194 (23 self)
Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph-related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees, which have been studied extensively. In this paper, we propose a novel frequent subgraph mining algorithm, FFSM, which employs a vertical search scheme within an algebraic graphical framework we have developed to reduce the number of redundant candidates proposed. Our empirical study on synthetic and real datasets demonstrates that FFSM achieves a substantial performance gain over the current state-of-the-art subgraph mining algorithm gSpan.
Trust networks on the semantic Web
In Proceedings of Cooperative Information Agents, 2003
Cited by 174 (2 self)
Abstract. The so-called "Web of Trust" is one of the ultimate goals of the Semantic Web. Research on the topic of trust in this domain has focused largely on digital signatures, certificates, and authentication. At the same time, there is a wealth of research into trust and social networks in the physical world. In this paper, we describe an approach for integrating the two to build a web of trust in a more social respect. This paper describes the applicability of social network analysis to the semantic web, particularly discussing the multidimensional networks that evolve from ontological trust specifications. As a demonstration of algorithms used to infer trust relationships, we present several tools that allow users to take advantage of trust metrics that use the network.
Exploiting the Block Structure of the Web for Computing PageRank
2003
Cited by 158 (4 self)
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not nonetheless link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
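The three stages described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pagerank`, `blockrank`, `links` and `host_of` are hypothetical, every page is assumed to appear as a key of `host_of`, and host "importance" is taken to be plain PageRank on a host graph whose edges are counted inter-host links, which simplifies whatever weighting the paper uses.

```python
def pagerank(links, nodes, damping=0.85, start=None, iters=100):
    """Power iteration on the subgraph induced by `nodes`; links leaving it are ignored."""
    nodes = list(nodes)
    n = len(nodes)
    inside = set(nodes)
    rank = dict(start) if start else {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = [w for w in links.get(v, []) if w in inside]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for w in nodes:
                    nxt[w] += share
        rank = nxt
    return rank


def blockrank(links, host_of, damping=0.85):
    """Three-stage sketch: local ranks, host weighting, then a warm-started global run."""
    hosts = {}
    for page, host in host_of.items():
        hosts.setdefault(host, []).append(page)

    # Stage 1: local PageRank of each host, using only that host's own links.
    local = {}
    for pages in hosts.values():
        local.update(pagerank(links, pages, damping))

    # Stage 2: host importance (assumption: PageRank on the inter-host link graph).
    host_links = {}
    for page, targets in links.items():
        for target in targets:
            if host_of[page] != host_of[target]:
                host_links.setdefault(host_of[page], []).append(host_of[target])
    importance = pagerank(host_links, hosts, damping)

    # Stage 3: standard PageRank started from the weighted concatenation.
    start = {p: local[p] * importance[host_of[p]] for p in host_of}
    return pagerank(links, list(host_of), damping, start=start)
```

The starting vector only affects how quickly the final run converges, not its fixed point, so the sketch should agree with plain PageRank on the same graph. The fixed iteration count here does not show the speed-up; a real implementation would stop on a convergence test.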
ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs
INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2002
Cited by 121 (18 self)
Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk-resident graphs. Finally, we present some of our results from mining large graphs using ANF.
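The abstract does not describe ANF's approximation itself, so the sketch below shows only the quantity being approximated: an exact computation of the neighbourhood function. It assumes that N(h) counts ordered pairs (u, v) with distance at most h, including each node paired with itself. The function name and the adjacency-dict input are hypothetical, and every node must appear as a key.

```python
def neighbourhood_function(adj, max_h):
    """Return [N(0), ..., N(max_h)], where N(h) counts ordered pairs within distance h."""
    # reach[u] holds every node within the current distance of u (u itself at h = 0).
    reach = {u: {u} for u in adj}
    counts = [sum(len(s) for s in reach.values())]
    for _ in range(max_h):
        nxt = {}
        for u in adj:
            grown = set(reach[u])
            # Nodes within h of u are u plus the nodes within h - 1 of its neighbours.
            for w in adj[u]:
                grown |= reach[w]
            nxt[u] = grown
        reach = nxt
        counts.append(sum(len(s) for s in reach.values()))
    return counts
```

Keeping one explicit set per node is what makes the exact computation expensive on large graphs; an approximate method would replace these sets with compact summaries.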
A General Model of Web Graphs
2003
Cited by 118 (6 self)
We describe a very general model of a random graph process whose proportional degree sequence obeys a power law. Such laws have recently been observed in graphs associated with the world wide web.
The Index-based XXL Search Engine for Querying XML Data with Relevance Ranking
In EDBT, 2002
Cited by 117 (12 self)
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness.
Survey of graph database models
2001
Cited by 112 (8 self)
Graph database models can be characterized as those where data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models flourished in the eighties and early nineties in parallel to object-oriented models, and their influence gradually faded with the emergence of other database models, particularly the geographical, spatial, semistructured and XML models. Recently, the need to manage information with an inherent graph-like nature has brought back the relevance of the area. In fact, a whole new wave of applications for graph databases emerged with the development of huge networks (e.g. Web, geographical systems, transportation, telephones), and families of networks generated due to the automation of the process of data gathering (e.g. social and biological networks). The main objective of this survey is to present in a single place the work that has been done in the area of graph database modeling, concentrating on data structures, query languages and integrity constraints.