Higher-Order Web Link Analysis Using Multilinear Algebra
 IEEE INTERNATIONAL CONFERENCE ON DATA MINING
, 2005
Cited by 69 (18 self)
Linear algebra is a powerful and proven tool in web search. Techniques, such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg, score web pages based on the principal eigenvector (or singular vector) of a particular nonnegative matrix that captures the hyperlink structure of the web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of the SVD, to automatically identify topics in the collection along with the associated authoritative web pages.
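As a rough illustration of the tensor representation described in this abstract, the sketch below stores a tiny page x page x anchor-term tensor as a sparse dictionary and extracts a single rank-1 factor by alternating updates (a higher-order power iteration). This is not the authors' PARAFAC implementation: the function name `rank1_factor`, the toy link data, and the choice to compute only one component are all assumptions made for the example.

```python
from math import sqrt


def normalize(vec):
    """Scale vec to unit Euclidean length; return (unit vector, original norm)."""
    norm = sqrt(sum(v * v for v in vec))
    if norm == 0:
        return vec, 0.0
    return [v / norm for v in vec], norm


def rank1_factor(entries, shape, iters=50):
    """Approximate a sparse three-way tensor by weight * (a outer b outer c).

    entries maps (source_page, target_page, anchor_term) -> value for the
    nonzero cells only. Hypothetical helper, written for illustration.
    """
    n_src, n_dst, n_term = shape
    a, _ = normalize([1.0] * n_src)
    b, _ = normalize([1.0] * n_dst)
    c, _ = normalize([1.0] * n_term)
    weight = 0.0
    for _ in range(iters):
        # Update each mode in turn while holding the other two fixed.
        new_a = [0.0] * n_src
        for (i, j, k), x in entries.items():
            new_a[i] += x * b[j] * c[k]
        a, _ = normalize(new_a)

        new_b = [0.0] * n_dst
        for (i, j, k), x in entries.items():
            new_b[j] += x * a[i] * c[k]
        b, _ = normalize(new_b)

        new_c = [0.0] * n_term
        for (i, j, k), x in entries.items():
            new_c[k] += x * a[i] * b[j]
        c, weight = normalize(new_c)
    return weight, a, b, c


# Toy data (assumed): pages 0, 2 and 3 link to page 1 with anchor term 0;
# page 0 also links to page 2 with anchor term 1.
links = {(0, 1, 0): 1.0, (2, 1, 0): 1.0, (3, 1, 0): 1.0, (0, 2, 1): 1.0}
weight, hubs, authorities, terms = rank1_factor(links, (4, 4, 2))

top_authority = max(range(4), key=lambda j: authorities[j])
top_term = max(range(2), key=lambda k: terms[k])
print(top_authority, top_term, round(weight, 3))
```

In this toy case the dominant factor groups anchor term 0 with page 1 as its authority, which loosely mirrors how a rank-1 factor pairs a topic with its authoritative pages. A full decomposition would extract several such factors jointly.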
Predicting a Scientific Community’s Response to an Article
Cited by 14 (7 self)
We consider the problem of predicting measurable responses to scientific articles based primarily on their text content. Specifically, we consider papers in two fields (economics and computational linguistics) and make predictions about downloads and within-community citations. Our approach is based on generalized linear models, allowing interpretability; a novel extension that captures first-order temporal effects is also presented. We demonstrate that text features significantly improve accuracy of predictions over metadata features like authors, topical categories, and publication venues.
Exploiting time-varying relationships in statistical relational models
 In Proceedings of the 1st SNAKDD Workshop, 13th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
, 2007
Cited by 12 (1 self)
In a growing number of relational domains, the data record temporal sequences of interactions among entities. For example, in citation domains authors publish scientific papers together each year and in telephone fraud detection domains people make calls to each other each day. The temporal dynamics of these interactions contain information that can improve predictive models (e.g., people publishing together frequently are likely to be publishing on the same topic) but to date there has been little effort to incorporate time-varying dependencies into relational models. Past work in relational learning has focused primarily on static “snapshots” of relational data. In this paper, we present an initial approach to modeling dynamic relational data graphs in predictive models of attributes. More specifically, we use a two-step process that first summarizes the dynamic graph with a weighted static graph and then incorporates the link weights in a relational Bayes classifier. We evaluate our approach on the Cora dataset (where coauthor and citation links vary over time) showing that our approach results in significant performance gains over a baseline snapshot approach that ignores the temporal component of the data.
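The first step of the two-step process described in this abstract, collapsing a dynamic graph into a weighted static graph, might be sketched as follows. The abstract does not say how the weights are computed, so the exponential decay, the `decay` parameter, the function name `summarize_dynamic_graph`, and the toy coauthorship records are all assumptions. The second step (the relational Bayes classifier) is not shown.

```python
from collections import defaultdict
from math import exp


def summarize_dynamic_graph(timed_edges, now, decay=0.5):
    """Collapse (u, v, t) interaction records into one weighted static graph.

    Each interaction contributes exp(-decay * (now - t)), so recent and
    repeated interactions yield heavier links. The weighting scheme is an
    assumption for illustration, not taken from the abstract.
    """
    weights = defaultdict(float)
    for u, v, t in timed_edges:
        edge = tuple(sorted((u, v)))  # treat links as undirected
        weights[edge] += exp(-decay * (now - t))
    return dict(weights)


# Toy coauthorship records (assumed): (author, author, year).
records = [
    ("a", "b", 2005),
    ("a", "b", 2007),
    ("a", "c", 2001),
]
static_graph = summarize_dynamic_graph(records, now=2007)
for edge, w in sorted(static_graph.items()):
    print(edge, round(w, 4))
```

Here the pair that published together twice, and recently, ends up with a much heavier link than the pair with a single old collaboration. A downstream classifier could use that difference to weigh neighbors.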
Multilinear Algebra For Analyzing Data With Multiple Linkages
, 2006
Cited by 12 (2 self)
Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multilink analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD decomposition of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit coauthorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published.
Transforming Graph Data for Statistical Relational Learning
, 2012
Cited by 10 (4 self)
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
Vertex Collocation Profiles: Subgraph Counting for Link Analysis and Prediction
Cited by 10 (2 self)
We introduce the concept of a vertex collocation profile (VCP) for the purpose of topological link analysis and prediction. VCPs provide nearly complete information about the surrounding local structure of embedded vertex pairs. The VCP approach offers a new tool for domain experts to understand the underlying growth mechanisms in their networks and to analyze link formation mechanisms in the appropriate sociological, biological, physical, or other context. The same resolution that gives VCP its analytical power also enables it to perform well when used in supervised models to discriminate potential new links. We first develop the theory, mathematics, and algorithms underlying VCPs. Then we demonstrate VCP methods performing link prediction competitively with unsupervised and supervised methods across several different network families. We conclude with timing results that introduce the comparative performance of several existing algorithms and the practicability of VCP computations on large networks.
Finding cohesive clusters for analyzing knowledge communities
Cited by 8 (6 self)
Documents and authors can be clustered into “knowledge communities” based on the overlap in the papers they cite. We introduce a new clustering algorithm, Streemer, which finds cohesive foreground clusters embedded in a diffuse background, and use it to identify knowledge communities as foreground clusters of papers which share common citations. To analyze the evolution of these communities over time, we build predictive models with features based on the citation structure, the vocabulary of the papers, and the affiliations and prestige of the authors. Findings include that scientific knowledge communities tend to grow more rapidly if their publications build on diverse information and if they use a narrow vocabulary.
Utile distinctions for relational reinforcement learning
 Proceedings of the 20th International Joint Conference on Artificial Intelligence
, 2007
Cited by 4 (1 self)
We introduce an approach to autonomously creating state space abstractions for an online reinforcement learning agent using a relational representation. Our approach uses a tree-based function approximation derived from McCallum’s [1995] UTree algorithm. We have extended this approach to use a relational representation where relational observations are represented by attributed graphs [McGovern et al., 2003]. We address the challenges introduced by a relational representation by using stochastic sampling to manage the search space [Srinivasan, 1999] and temporal sampling to manage autocorrelation [Jensen and Neville, 2002]. Relational UTree incorporates Iterative Tree Induction [Utgoff et al., 1997] to allow it to adapt to changing environments. We empirically demonstrate that Relational UTree performs better than similar relational learning methods [Finney et al., 2002; Driessens et al., 2001] in a blocks world domain. We also demonstrate that Relational UTree can learn to play a subtask of the game of Go called Tsume-Go [Ramon et al., 2001].
Relational Classification Through Three–State Epidemic Dynamics
Cited by 4 (2 self)
Relational classification in networked data plays an important role in many problems such as text categorization, classification of web pages, group finding in peer networks, etc. We have previously demonstrated that for a class of label propagating algorithms the underlying dynamics can be modeled as a two-state epidemic process on heterogeneous networks, where infected nodes correspond to classified data instances. We have also suggested a binary classification algorithm that utilizes non–trivial characteristics of epidemic dynamics. In this paper we extend our previous work by considering a three–state epidemic model for label propagation. Specifically, we introduce a new, intermediate state that corresponds to “susceptible” data instances. The utility of the added state is that it allows the rates of epidemic spreading to be controlled, hence making the algorithm more flexible. We show empirically that this extension significantly improves the performance of the algorithm. In particular, we demonstrate that the new algorithm achieves good classification accuracy even for relatively large overlap across the classes.