Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators
, 2005
Eigenvalues, latent roots, proper values, characteristic values: four synonyms for a set of numbers that provide much useful information about a matrix or operator. A huge amount of research has been directed at the theory of eigenvalues (localization, perturbation, canonical forms, ...), at applications (ubiquitous), and at numerical computation. I would like to begin with a very selective description of some historical aspects of these topics, before moving on to pseudoeigenvalues, the subject of the book under review. Back in the 1930s, Frazer, Duncan, and Collar of the Aerodynamics Department of the National Physical Laboratory (NPL), England, were developing matrix methods for analyzing flutter (unwanted vibrations) in aircraft. This was the beginning of what became known as matrix structural analysis [9], and led to the authors' book Elementary Matrices and Some Applications to Dynamics and Differential Equations, published in 1938 [10], which was “the first to employ matrices as an engineering tool” [2]. Olga Taussky worked in Frazer’s group at NPL during the Second World War, analyzing 6 × 6 quadratic eigenvalue problems (QEPs)
Top 10 algorithms in data mining
, 2007
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification,
Naïve Learning in Social Networks and the Wisdom of Crowds
, 2010
We study learning in a setting where agents receive independent noisy signals about the true value of a variable and then communicate in a network. They naïvely update beliefs by repeatedly taking weighted averages of neighbors’ opinions. We show that all opinions in a large society converge to the truth if and only if the influence of the most influential agent vanishes as the society grows. We also identify obstructions to this, including prominent groups, and provide structural conditions on the network ensuring efficient learning. Whether agents converge to the truth is unrelated to how quickly consensus is approached. (JEL D83, D85, Z13)
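The updating rule described in this abstract (each agent repeatedly replaces its belief with a weighted average of its neighbors' opinions) can be sketched as below. This is a minimal illustration, not the authors' code: the weight matrix `T`, the initial `signals`, and the function names are hypothetical, and `T` is assumed to be row-stochastic, strongly connected, and aperiodic so that a consensus exists.

```python
def degroot_step(T, beliefs):
    """One round of naive updating: agent i takes a weighted average
    of all beliefs, using row i of the row-stochastic matrix T."""
    n = len(beliefs)
    return [sum(T[i][j] * beliefs[j] for j in range(n)) for i in range(n)]


def degroot_run(T, initial_beliefs, steps=200):
    """Apply the averaging step repeatedly and return the final beliefs."""
    beliefs = list(initial_beliefs)
    for _ in range(steps):
        beliefs = degroot_step(T, beliefs)
    return beliefs


# Hypothetical 3-agent network; each row sums to 1.
T = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
signals = [0.9, 1.2, 0.6]  # noisy initial signals (made-up values)

beliefs = degroot_run(T, signals)
# The consensus is the influence-weighted average of the signals, where the
# influence vector is the left eigenvector of T for eigenvalue 1.
# For this T that vector is (0.25, 0.5, 0.25).
influence = [0.25, 0.5, 0.25]
consensus = sum(w * s for w, s in zip(influence, signals))
```

In this toy network the middle agent carries half of the total influence, so the consensus leans toward that agent's signal. The abstract's condition for learning the truth is that the largest such influence weight vanishes as the society grows.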
Querying and creating visualizations by analogy
 IEEE Transactions on Visualization and Computer Graphics
While there have been advances in visualization systems, particularly in multi-view visualizations and visual exploration, the process of building visualizations remains a major bottleneck in data exploration. We show that provenance metadata collected during the creation of pipelines can be reused to suggest similar content in related visualizations and guide semi-automated changes. We introduce the idea of query-by-example in the context of an ensemble of visualizations, and the use of analogies as first-class operations in a system to guide scalable interactions. We describe an implementation of these techniques in VisTrails, a publicly available, open-source system. Index Terms: visualization systems, query-by-example, analogy
Detecting spammers and content promoters in online video social networks
 In Proc. of SIGIR
, 2009
A number of online video social networks, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood of the response being viewed by a larger number of users. Moreover, opportunistic users (promoters) may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize the trust of users in the system, thus compromising its success in promoting social interactions. In spite of that, the available literature is very limited in providing a
Network Properties Revealed Through Matrix Functions
, 2008
The newly emerging field of Network Science deals with the tasks of modelling, comparing and summarizing large data sets that describe complex interactions. Because pairwise affinity data can be stored in a two-dimensional array, graph theory and applied linear algebra provide extremely useful tools. Here, we focus on the general concepts of centrality, communicability and betweenness, each of which quantifies important features in a network. Some recent work in the mathematical physics literature has shown that the exponential of a network’s adjacency matrix can be used as the basis for defining and computing specific versions of these measures. We introduce here a general class of measures based on matrix functions, and show that a particular case involving a matrix resolvent arises naturally from graph-theoretic arguments. We also point out connections between these measures and the quantities typically computed when spectral methods are used for data mining tasks such as clustering and ordering. We finish with computational examples showing the new matrix resolvent version applied to real networks.
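As a rough illustration of the two matrix functions named in this abstract, the sketch below takes the diagonal of exp(A) and of a resolvent (I - alpha*A)^{-1} as node centrality scores for a small graph. Both are evaluated by truncated power series. The helper names, the star-graph example, and the value of `alpha` are assumptions made for illustration; the abstract does not fix a particular scaling for the resolvent, and the series form only converges when `alpha` times the spectral radius of A is below 1.

```python
import math


def matmul(X, Y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_series(A, coeff, terms):
    """Return sum_{k=0}^{terms-1} coeff(k) * A^k."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(terms):
        c = coeff(k)
        for i in range(n):
            for j in range(n):
                total[i][j] += c * power[i][j]
        power = matmul(power, A)
    return total


def exp_matrix(A, terms=60):
    """Truncated Taylor series for the matrix exponential."""
    return power_series(A, lambda k: 1.0 / math.factorial(k), terms)


def resolvent_matrix(A, alpha, terms=200):
    """Truncated Neumann series for (I - alpha*A)^{-1}.
    Assumes alpha * spectral_radius(A) < 1."""
    return power_series(A, lambda k: alpha ** k, terms)


# Hypothetical star graph: node 0 is linked to nodes 1, 2 and 3.
A = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

E = exp_matrix(A)
R = resolvent_matrix(A, alpha=0.4)  # spectral radius is sqrt(3), so 0.4 is safe

exp_centrality = [E[i][i] for i in range(len(A))]
resolvent_centrality = [R[i][i] for i in range(len(A))]
```

Both diagonals count closed walks starting and ending at a node, with longer walks downweighted (by 1/k! for the exponential, by alpha^k for the resolvent), so the hub of the star scores highest under either measure.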
THE SINKHORN-KNOPP ALGORITHM: CONVERGENCE AND APPLICATIONS
, 2007
As long as a square nonnegative matrix A contains sufficient nonzero elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of A that is doubly stochastic. It is known that the convergence is linear and an upper bound has been given for the rate of convergence for positive matrices. In this paper we give an explicit expression for the rate of convergence for fully indecomposable matrices. We describe how balancing algorithms can be used to give a measure of web page significance. We compare the measure with some well-known alternatives, including PageRank. We show that with an appropriate modification, the Sinkhorn-Knopp algorithm is a natural candidate for computing the measure on enormous data sets.
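The balancing step this abstract refers to can be sketched as the classical alternating iteration: rescale the columns to sum to 1, then the rows, and repeat. The code below is a minimal version under the assumption of a strictly positive input matrix (for which convergence is guaranteed). The function name, stopping rule, and example matrix are illustrative choices, and the paper's modification for very large data sets is not reproduced here.

```python
def sinkhorn_knopp(A, iterations=500, tol=1e-12):
    """Find positive vectors r, c such that P = diag(r) * A * diag(c)
    is (approximately) doubly stochastic. Assumes A is square with
    strictly positive entries."""
    n = len(A)
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(iterations):
        # Column scaling: make every column of diag(r) * A * diag(c) sum to 1.
        c = [1.0 / sum(A[i][j] * r[i] for i in range(n)) for j in range(n)]
        # Row scaling: make every row of diag(r) * A * diag(c) sum to 1.
        r_new = [1.0 / sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(r, r_new)) < tol
        r = r_new
        if converged:
            break
    P = [[r[i] * A[i][j] * c[j] for j in range(n)] for i in range(n)]
    return r, c, P


# Hypothetical positive matrix used only to exercise the iteration.
A = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 10.0],
]
r, c, P = sinkhorn_knopp(A)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

Each pass costs two matrix-vector products, which is why the method is attractive for large sparse matrices. For matrices with zeros, convergence depends on the zero pattern, as the abstract notes.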
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
 in Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ therefore gradually biases the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every pair of nodes is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc., as shown in the experimental section.
Eigenvector centrality mapping for analyzing connectivity patterns in fMRI data of the human brain
 PLoS ONE 5, e10232. doi:10.1371/journal.pone.0010232
, 2010
Functional magnetic resonance data acquired in a task-absent condition (“resting state”) require new data analysis techniques that do not depend on an activation model. In this work, we introduce an alternative assumption- and parameter-free method based on a particular form of node centrality called eigenvector centrality. Eigenvector centrality attributes a value to each voxel in the brain such that a voxel receives a large value if it is strongly correlated with many other nodes that are themselves central within the network. Google’s PageRank algorithm is a variant of eigenvector centrality. Thus far, other centrality measures, in particular “betweenness centrality”, have been applied to fMRI data using a preselected set of nodes consisting of several hundred elements. Eigenvector centrality is computationally much more efficient than betweenness centrality and does not require thresholding of similarity values, so that it can be applied to thousands of voxels in a region of interest covering the entire cerebrum, which would have been infeasible using betweenness centrality. Eigenvector centrality can be used on a variety of different similarity metrics. Here, we present applications based on linear correlations and on spectral coherences between fMRI time series. This latter approach allows us to draw conclusions about connectivity patterns in different spectral bands. We apply this method to fMRI data in task-absent conditions where subjects were in states of hunger or satiety. We show that eigenvector centrality is modulated by the state that the subjects were in. Our analyses demonstrate that eigenvector centrality is a computationally efficient tool