Results 1–10 of 40
Estimation of simultaneously sparse and low rank matrices
In Proc. ICML, 2012
Abstract

Cited by 27 (5 self)
The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves the ℓ1-norm and the trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.
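The mixed ℓ1-plus-trace-norm penalty suits proximal methods because each term has a closed-form proximal map: entrywise soft-thresholding for the ℓ1-norm and singular-value thresholding for the trace norm. A minimal sketch, assuming NumPy; note that applying the two maps in sequence is a common heuristic for the combined penalty, not necessarily the exact joint prox used in the paper:

```python
import numpy as np

def soft_threshold(X, t):
    # Entrywise soft-thresholding: the proximal operator of t * ||X||_1.
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svd_threshold(X, t):
    # Singular-value thresholding: the proximal operator of t * ||X||_* (trace norm).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def prox_grad_step(X, grad, step, lam1, lam2):
    # One approximate proximal-gradient step for loss + lam1*||X||_1 + lam2*||X||_*.
    # Composing the two proximal maps is a heuristic, not the exact joint prox.
    Y = X - step * grad
    return svd_threshold(soft_threshold(Y, step * lam1), step * lam2)
```

Iterating `prox_grad_step` with the gradient of a smooth data-fit term gives a basic proximal descent scheme of the kind the abstract refers to.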
You Are What You Like! Information Leakage Through Users’ Interests
In NDSS, 2012
Abstract

Cited by 20 (3 self)
Suppose that a Facebook user, whose age is hidden or missing, likes Britney Spears. Can you guess his/her age? Knowing that most Britney fans are teenagers, it is fairly easy for humans to answer this question. Interests (or “likes”) of users are among the most highly available pieces of online information. In this paper, we show how these seemingly harmless interests (e.g., music interests) can leak privacy-sensitive information about users. In particular, we infer their undisclosed (private) attributes using the public attributes of other users sharing similar interests. In order to compare user-defined interest names, we extract their semantics using an ontologized version of Wikipedia and measure their similarity by applying a statistical learning method. Apart from self-declared interests in music, our technique does not rely on any further information about users such as friend relationships or group memberships. Our experiments, based on more than 104K public profiles collected from Facebook and more than 2,000 private profiles provided by volunteers, show that our inference technique efficiently predicts attributes that are very often hidden by users. To the best of our knowledge, this is the first time that user interests are used for profiling and, more generally, that semantics-driven inference of private data is addressed.
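The core inference idea, predicting a hidden attribute from the public attributes of users with similar interests, can be illustrated with a plain nearest-neighbour vote. This sketch uses Jaccard overlap of raw interest names as the similarity, a stand-in for the paper's ontology-based semantic matching:

```python
from collections import Counter

def jaccard(a, b):
    # Set overlap between two users' interest sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def infer_attribute(target_interests, public_profiles, k=3):
    # Predict a hidden attribute by majority vote over the k public
    # profiles whose interests are most similar to the target's.
    # A minimal baseline, not the paper's statistical-learning method.
    ranked = sorted(public_profiles,
                    key=lambda p: jaccard(target_interests, p["interests"]),
                    reverse=True)
    votes = Counter(p["attribute"] for p in ranked[:k])
    return votes.most_common(1)[0][0]
```

For the Britney Spears example: if most sampled fans with public ages are teenagers, the vote predicts "teen" for a fan whose age is hidden.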
Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
Abstract

Cited by 20 (1 self)
The communities of a social network are sets of vertices with more connections inside the set than outside. We theoretically demonstrate that two commonly observed properties of social networks, heavy-tailed degree distributions and large clustering coefficients, imply the existence of vertex neighborhoods (also known as egonets) that are themselves good communities. We evaluate these neighborhood communities on a range of graphs. What we find is that the neighborhood communities can exhibit conductance scores that are as good as the Fiedler cut. Also, the conductance of neighborhood communities shows similar behavior to the network community profile computed with a personalized PageRank community detection method. Neighborhood communities give us a simple and powerful heuristic for speeding up local partitioning methods. Since finding good seeds for the PageRank clustering method is difficult, most approaches involve an expensive sweep over a great many starting vertices. We show how to use neighborhood communities to quickly generate a small set of seeds.
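The quantities involved are easy to state concretely: a neighborhood community is a vertex together with its neighbours, and its quality is measured by conductance, the fraction of edge endpoints leaving the set relative to the smaller side's volume. A small sketch (undirected graph as a dict of neighbour sets):

```python
def neighborhood_community(adj, v):
    # The "egonet" seed set: a vertex together with its neighbours.
    return {v} | adj[v]

def conductance(adj, S):
    # conductance(S) = cut(S, complement) / min(vol(S), vol(complement)),
    # where vol is the sum of degrees. Lower is a better community.
    S = set(S)
    vol_S = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(adj[v]) for v in adj) - vol_S
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 1.0
```

For example, in two triangles joined by a single edge, the egonet of a vertex in one triangle is that triangle, with conductance 1/7.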
Coarse-Grained Topology Estimation via Graph Sampling
2012
Abstract

Cited by 11 (5 self)
In many online networks, nodes are partitioned into categories (e.g., countries or universities in OSNs), which naturally defines a weighted category graph, i.e., a coarse-grained version of the underlying network. In this paper, we show how to efficiently estimate the category graph from a probability sample of nodes. We prove consistency of our estimators and evaluate their efficiency via simulation. We also apply our methodology to a sample of Facebook users to obtain a number of category graphs, such as the college friendship graph and the country friendship graph. We share and visualize the resulting data at www.geosocialmap.com.
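The estimation idea can be sketched with a simplified Horvitz-Thompson-style estimator: when each node is sampled uniformly with probability p and a sampled node's full neighbour list is observed (as in typical OSN crawling), each edge is seen once per sampled endpoint, so counts scaled by 1/(2p) estimate the number of edges between category pairs. This is an illustrative simplification, not the paper's exact design:

```python
def estimate_category_graph(sampled_nodes, category, p):
    # sampled_nodes: dict node -> set of neighbours, for sampled nodes only.
    # category: dict node -> category label (assumed known for all nodes).
    # p: uniform inclusion probability of each node in the sample.
    # An edge (u, v) is observed once per sampled endpoint, so each
    # observation contributes 1/(2p) to the category-pair weight.
    weights = {}
    for u, nbrs in sampled_nodes.items():
        for v in nbrs:
            key = tuple(sorted((category[u], category[v])))
            weights[key] = weights.get(key, 0.0) + 1.0 / (2.0 * p)
    return weights
```

With p = 1 (a census) the estimator returns the exact edge counts between categories.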
Learning a Distance Metric from a Network
Abstract

Cited by 9 (2 self)
Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges.
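The "structure preserving" criterion is checkable: given a candidate Mahalanobis matrix M, build the k-nearest-neighbour graph under d_M and compare it to the observed network. This sketch shows the check only, with M given rather than learned (the learning step is the SPML optimization itself); assumes NumPy:

```python
import numpy as np

def mahalanobis(x, y, M):
    # Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y),
    # with M positive semidefinite.
    d = x - y
    return float(d @ M @ d)

def knn_adjacency(X, M, k):
    # Connectivity implied by the metric: each node links to its k
    # nearest neighbours under d_M. SPML learns M so that this graph
    # matches the observed network; here M is simply supplied.
    n = len(X)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        dists = [mahalanobis(X[i], X[j], M) for j in range(n)]
        dists[i] = np.inf  # exclude self
        for j in np.argsort(dists)[:k]:
            A[i, j] = 1
    return A
```

Comparing `knn_adjacency(X, M, k)` against the true adjacency matrix quantifies how structure preserving a learned M is.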
Tree-like structure in large social and information networks
2013
Abstract

Cited by 6 (2 self)
Although large social and information networks are often thought of as having hierarchical or tree-like structure, this assumption is rarely tested. We have performed a detailed empirical analysis of the tree-like properties of realistic informatics graphs using two very different notions of tree-likeness: Gromov’s δ-hyperbolicity, which is a notion from geometric group theory that measures how tree-like a graph is in terms of its metric structure; and tree decompositions, tools from structural graph theory which measure how tree-like a graph is in terms of its cut structure. Although realistic informatics graphs often do not have meaningful tree-like structure when viewed with respect to the simplest and most popular metrics, e.g., the value of δ or the treewidth, we conclude that many such graphs do have meaningful tree-like structure when viewed with respect to more refined metrics, e.g., a size-resolved notion of δ or a closer analysis of the tree decompositions. We also show that, although these two rigorous notions of tree-likeness capture very different tree-like structures in the worst case, for realistic informatics graphs they empirically identify surprisingly similar structure. We interpret this tree-like structure in terms of the recently characterized “nested core-periphery” property of large informatics graphs; and we show that the fast and scalable k-core heuristic can be used to identify this tree-like structure.
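Gromov's δ-hyperbolicity has a concrete four-point formulation: for every quadruple of vertices, form the three pairwise distance sums and take half the gap between the two largest; δ is the maximum over all quadruples, and trees have δ = 0. A direct (brute-force) sketch over a shortest-path distance table:

```python
from itertools import combinations

def four_point_delta(d, x, y, z, w):
    # Gromov's four-point condition: half the gap between the two
    # largest of the three pairwise distance sums.
    s = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]])
    return (s[2] - s[1]) / 2.0

def hyperbolicity(d, nodes):
    # Exact delta over all quadruples -- O(n^4), feasible only for small graphs.
    return max((four_point_delta(d, *q) for q in combinations(nodes, 4)),
               default=0.0)
```

A path (a tree) gives δ = 0, while a 4-cycle already gives δ = 1, which is why the worst-case value of δ alone says little and the size-resolved analysis mentioned above is needed.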
Graph Sample and Hold: A Framework for Big-Graph Analytics
Abstract

Cited by 5 (1 self)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics,
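The sample-and-hold idea from packet sampling carries over to edge streams: an arriving edge with an already-held endpoint is kept with one probability, otherwise it is sampled with another and its endpoints become held, which biases the sample toward the neighbourhoods already in it. A minimal sketch; the parameter names p and q follow the generic scheme, not necessarily the paper's notation:

```python
import random

def sample_and_hold(edge_stream, p, q, rng=random.random):
    # Graph sample-and-hold: an edge with a held endpoint is kept with
    # probability q; otherwise it is sampled with probability p and its
    # endpoints become held. Known keep-probabilities allow unbiased
    # (Horvitz-Thompson) estimation of graph properties downstream.
    held = set()
    sample = []
    for u, v in edge_stream:
        prob = q if (u in held or v in held) else p
        if rng() < prob:
            sample.append((u, v))
            held.add(u)
            held.add(v)
    return sample
```

Setting p = q recovers independent edge sampling; q > p concentrates the sample around held nodes, which helps for properties like triangle counts.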
Mechanism Design for Finding Experts Using Locally Constructed Social Referral Web
Abstract

Cited by 4 (3 self)
Abstract—In this work, we address the problem of distributed expert finding using chains of social referrals and profile matching with only local information in online social networks. By assuming that users are selfish, rational, and have privately known costs of participating in the referrals, we design a novel truthful, efficient mechanism in which an expert-finding query is relayed by intermediate users. When receiving a referral request, a participant locally chooses among her neighbors a user to relay the request to. In our mechanism, several closely coupled methods are carefully designed to improve the performance of distributed search, including profile matching, social acquaintance prediction, a score function for locally choosing relay neighbors, and budget estimation. We conduct extensive experiments on several datasets of online social networks. The extensive study of our mechanism shows that its success rate is about 90% in finding closely matched experts using only local search and a limited budget, which significantly improves on the previously best rate of 20%. The overall cost of finding an expert with our truthful mechanism is about 20% of that of untruthful methods, e.g., the method that always selects high-degree neighbors. The median length of social referral chains is 6 using our localized search decision, which surprisingly matches the well-known small-world phenomenon of global social structures.
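The local relay decision can be illustrated with a toy score function: forward the query to the neighbour with the best profile match per unit of declared cost. Both the keyword-coverage score and the cost discounting here are illustrative stand-ins; the abstract does not specify the paper's actual score function:

```python
def match_score(profile, query):
    # Fraction of query keywords covered by a neighbour's profile --
    # a minimal stand-in for the paper's profile-matching component.
    q = set(query)
    return len(q & set(profile)) / len(q) if q else 0.0

def choose_relay(neighbors, profiles, bids, query):
    # Greedy local choice: best match per unit of declared participation
    # cost ('bids'). Illustrative only; the real mechanism also uses
    # acquaintance prediction and budget estimation.
    return max(neighbors,
               key=lambda v: match_score(profiles[v], query) / (1.0 + bids[v]))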
A METHOD BASED ON TOTAL VARIATION FOR NETWORK MODULARITY OPTIMIZATION USING THE MBO SCHEME
Abstract

Cited by 3 (2 self)
Abstract. The study of network structure is pervasive in sociology, biology, computer science, and many other disciplines. One of the most important areas of network science is the algorithmic detection of cohesive groups of nodes called “communities.” One popular approach to finding communities is to maximize a quality function known as modularity to achieve some sort of optimal clustering of nodes. In this paper, we interpret the modularity function from a novel perspective: we reformulate modularity optimization as a minimization problem of an energy functional that consists of a total variation term and an ℓ2 balance term. By employing numerical techniques from image processing and ℓ1 compressive sensing—such as convex splitting and the Merriman–Bence–Osher (MBO) scheme—we develop a variational algorithm for the minimization problem. We present our computational results using both synthetic benchmark networks and real data.
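The objective being reformulated is the Newman-Girvan modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). The sketch below evaluates Q for a given partition only; the paper's contribution, the TV/MBO minimization scheme itself, is not shown:

```python
def modularity(adj, communities):
    # Newman-Girvan modularity. adj maps vertex -> set of neighbours
    # (undirected, no self-loops); communities is a list of vertex lists.
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m = sum of degrees
    label = {v: c for c, comm in enumerate(communities) for v in comm}
    Q = 0.0
    for i in adj:
        for j in adj:
            if label[i] == label[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                Q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return Q / two_m
```

For two disjoint edges split into their two natural communities, Q = 0.5, the maximum for that graph.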
Simmelian Backbones: Amplifying Hidden Homophily in Facebook Networks
Abstract

Cited by 2 (0 self)
Abstract—Network data are more difficult to analyze when they originate from different types of relationships. In online social networks like Facebook, for example, interactions related to friendship, kinship, business, interests, and other relationships may all be represented as catch-all “friendships.” Because several relations are mingled into one, the resulting networks exhibit relatively high and uniform density. As a consequence, the variation in positional differences and local cohesion may be too small for reliable analysis. We introduce a method to identify the essential relationships in networks representing social interactions. Our method is based on a novel concept of triadic cohesion that is motivated by Simmel’s concept of membership in social groups. We demonstrate that our Simmelian backbones are capable of extracting structure from Facebook interaction networks that makes them easy to visualize and analyze. Since all computations are local, the method can be restricted to partial networks such as ego networks, and scales to big data.
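A simplified version of the backbone idea: rank each vertex's ties by triadic embeddedness (number of common neighbours) and keep an edge only if both endpoints rank it among their top k. This sketch uses the raw common-neighbour count, a simplification of the paper's Simmelian redundancy measure:

```python
def simmelian_backbone(adj, k):
    # adj: dict vertex -> set of neighbours (undirected, comparable keys).
    # Keep an edge only if BOTH endpoints rank it among their k most
    # embedded ties; purely local, so it also works on partial networks.
    def top_ties(v):
        ranked = sorted(adj[v], key=lambda u: len(adj[v] & adj[u]),
                        reverse=True)
        return set(ranked[:k])
    keep = {v: top_ties(v) for v in adj}
    return {(u, v) for u in adj for v in keep[u] if u < v and u in keep[v]}
```

In a triangle with a pendant vertex attached, the three triangle edges survive while the pendant tie (zero common neighbours) is dropped, illustrating how the backbone filters out non-cohesive links.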