Results 1 - 10
of
50
Graph nodes clustering based on the commute-time kernel
- In Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007). Lecture notes in Computer Science, LNCS
, 2007
"... This work presents a kernel method for clustering the nodes of a weighted, undirected, graph. The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a similarity measure between any couple of nodes by taking the indirect links into account, is compute ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
This work presents a kernel method for clustering the nodes of a weighted, undirected, graph. The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a similarity measure between any couple of nodes by taking the indirect links into account, is computed from the adjacency matrix of the graph. Then, the nodes of the graph are clustered by performing a kernel k-means or fuzzy k-means on this CT kernel matrix. For this purpose, a new, simple, version of the kernel k-means and the kernel fuzzy k-means is introduced. The joint use of the CT kernel matrix and kernel clustering appears to be quite successful. Indeed, this methodology provides good results, outperforming the spherical k-means, on a document clustering problem involving the newsgroups database. 1
An experimental investigation of graph kernels on a collaborative recommendation task
- Proceedings of the 6th International Conference on Data Mining (ICDM 2006
, 2006
"... This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regul ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semisupervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel. 1
Audience selection for on-line brand advertising: privacy-friendly social network targeting
- In KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
, 2009
"... This paper describes and evaluates privacy-friendly methods for extracting quasi-social networks from browser behavior on user-generated content sites, for the purpose of finding good audiences for brand advertising (as opposed to click maximizing, for example). Targeting social-network neighbors re ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
This paper describes and evaluates privacy-friendly methods for extracting quasi-social networks from browser behavior on user-generated content sites, for the purpose of finding good audiences for brand advertising (as opposed to click maximizing, for example). Targeting social-network neighbors resonates well with advertisers, and on-line browsing behavior data counterintuitively can allow the identification of good audiences anonymously. Besides being one of the first papers to our knowledge on data mining for on-line brand advertising, this paper makes several important contributions. We introduce a framework for evaluating brand audiences, in analogy to predictive-modeling holdout evaluation. We introduce methods for extracting quasi-social networks from data on visitations to social networking pages, without collecting any information on the identities of the browsers or the content of the social-network pages. We introduce measures of brand proximity in the network, and show that audiences with high brand proximity indeed show substantially higher brand affinity. Finally, we provide evidence that the quasi-social network embeds a true social network, which along with results from social theory offers one explanation for the increases in audience brand affinity.
Tuning continual exploration in reinforcement learning
- Proceedings of the 16th International Conference on Artificial Neural Networks (ICANN 06). Lecture notes in Computer Science, LNCS 4131:734–749
, 2006
"... This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies the rate of exploration by defining the degree of exploration of a state as the probability-distribution entropy for choosing an ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies the rate of exploration by defining the degree of exploration of a state as the probability-distribution entropy for choosing an admissible action. Then, the exploration/exploitation tradeoff is stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost, while maintaining fixed degrees of exploration at same nodes. In other words, “exploitation”is maximized for constant “exploration”. This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm. Convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path while, when it is maximum, a full “blind”exploration is performed. We also show that, if the graph of states is directed and acyclic, the nonlinear equations can easily be solved by performing a single backward pass from the destination state. Stochastic shortest path problems as well as discounted problems are also examined. The theoretical results are confirmed by simple simulations showing that this exploration strategy outperforms the naive ǫ-greedy and Boltzmann strategies. 1.
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
- in Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining
"... This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path d ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ is therefore biasing gradually the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every couple of nodes is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc, as shown in the experimental section.
Learning Spectral Graph Transformations for Link Prediction
"... We present a unified framework for learning link prediction and edge weight prediction functions in large networks, based on the transformation of a graph’s algebraic spectrum. Our approach generalizes several graph kernels and dimensionality reduction methods and provides a method to estimate their ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We present a unified framework for learning link prediction and edge weight prediction functions in large networks, based on the transformation of a graph’s algebraic spectrum. Our approach generalizes several graph kernels and dimensionality reduction methods and provides a method to estimate their parameters efficiently. We show how the parameters of these prediction functions can be learned by reducing the problem to a one-dimensional regression problem whose runtime only depends on the method’s reduced rank and that can be inspected visually. We derive variants that apply to undirected, weighted, unweighted, unipartite and bipartite graphs. We evaluate our method experimentally using examples from social networks, collaborative filtering, trust networks, citation networks, authorship graphs and hyperlink networks. 1.
On Social Networks and Collaborative Recommendation
"... Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and implicit aggregated feedback describing the personal preferences of each user. It is also a common tendency for these systems to encourage the creation of virtual networks among their users by allowing them to establish bonds of friendship and thus provide a novel and direct medium for the exchange of data. We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotation and friendships inherent in the social graph established among users, items and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts in order to provide with a more natural and efficient way to represent social networks. In this work we collected a representative enough portion of the music social network last.fm, capturing explicitly expressed bonds of friendship of the user as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.
A Sketch-Based Distance Oracle for Web-Scale Graphs
"... We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time. Since the standard shortest path algorithms are expensive, our approach moves t ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time. Since the standard shortest path algorithms are expensive, our approach moves the time-consuming shortest-path computation offline, and at query time only looks up precomputed values and performs simple and fast computations on these precomputed values. More specifically, during the offline phase we compute and store a small “sketch ” for each node in the graph, and at query-time we look up the sketches of the source and destination nodes and perform a simple computation using these two sketches to estimate the distance. Categories and Subject Descriptors G.2.2 [Graph Theory]: Graph algorithms, path and circuit problems
Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization
"... We study the application of spectral clustering, prediction and visualization methods to graphs with negatively weighted edges. We show that several characteristic matrices of graphs can be extended to graphs with positively and negatively weighted edges, giving signed spectral clustering methods, s ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We study the application of spectral clustering, prediction and visualization methods to graphs with negatively weighted edges. We show that several characteristic matrices of graphs can be extended to graphs with positively and negatively weighted edges, giving signed spectral clustering methods, signed graph kernels and network visualization methods that apply to signed graphs. In particular, we review a signed variant of the graph Laplacian. We derive our results by considering random walks, graph clustering, graph drawing and electrical networks, showing that they all result in the same formalism for handling negatively weighted edges. We illustrate our methods using examples from social networks with negative edges and bipartite rating graphs. 1
Bypass rates: reducing query abandonment using negative inferences
- in Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining
, 2008
"... We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed—documents returned higher in the ordering of the search results but skipped by the user. This approach complements the popular click-through rate analysis, and helps to dr ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed—documents returned higher in the ordering of the search results but skipped by the user. This approach complements the popular click-through rate analysis, and helps to draw negative inferences in the click logs. We formulate a natural objective that finds sets of results that are unlikely to be collectively bypassed by a typical user. This is closely related to the problem of reducing query abandonment. We analyze a greedy approach to optimizing this objective, and establish theoretical guarantees of its performance. We evaluate our approach on a large set of queries, and demonstrate that it compares favorably to the maximal marginal relevance approach on a number of metrics including mean average precision and mean reciprocal rank.

