Results 1 - 8 of 8
Geotagging one hundred million twitter accounts with total variation minimization
In Big Data (Big Data), 2014 IEEE International Conference on, 2014
Abstract

Cited by 8 (3 self)
Abstract—Geographically annotated social media is extremely valuable for modern information retrieval. However, when researchers can only access publicly visible data, one quickly finds that social media users rarely publish location information. In this work, we provide a method that can geolocate the overwhelming majority of active Twitter users, independent of their location-sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user’s location by examining their friends’ locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user’s ego network can be used as a per-user accuracy measure, which is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets.
Keywords: Social and Information Networks; Data mining; Optimization
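A total variation objective of the form sum over edges of w_ij * d(f_i, f_j) has a useful local structure: if every friend's location is held fixed, the best location for one user is the weighted geometric median of those friends' locations. The sketch below computes that single-user update with Weiszfeld iterations and reports a simple dispersion score (median distance from the estimate to the friends). It is a minimal illustration under stated assumptions, not the authors' distributed algorithm: coordinates are assumed planar (for example projected kilometres) rather than points on a sphere, and the function names `geometric_median` and `dispersion` are hypothetical.

```python
import math
import statistics


def geometric_median(points, weights=None, iters=200, tol=1e-9):
    """Weighted geometric median via Weiszfeld iterations (planar coordinates assumed).

    Minimizes sum_j w_j * ||x - p_j||, i.e. the TV-style objective restricted
    to one user whose friends' locations are held fixed.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    # Start from the weighted mean, which an outlier can drag far away.
    x = sum(w * p[0] for w, p in zip(weights, points)) / total
    y = sum(w * p[1] for w, p in zip(weights, points)) / total
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for w, (px, py) in zip(weights, points):
            d = math.hypot(x - px, y - py)
            if d < 1e-12:
                # Iterate landed on a data point; return it (a simplification).
                return (px, py)
            num_x += w * px / d
            num_y += w * py / d
            den += w / d
        nx, ny = num_x / den, num_y / den
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def dispersion(estimate, points):
    """Median distance from the estimate to the friends' locations."""
    return statistics.median(
        math.hypot(estimate[0] - px, estimate[1] - py) for px, py in points
    )
```

For friends at (1, 0), (-1, 0), (0, 1), (0, -1) and one distant friend at (50, 50), the mean is (10, 10), while the geometric median stays near the origin cluster. This robustness to a single far-away friend is the reason an L1-type (TV) objective is attractive here.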
CONTINUUM LIMIT OF TOTAL VARIATION ON POINT CLOUDS
2014
Abstract

Cited by 2 (0 self)
We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. Our goal is to develop the mathematical tools needed to study the consistency, as the number of available data points increases, of graph-based machine learning algorithms for tasks such as clustering. In particular, we study when the cut capacity, and more generally the total variation, on these graphs is a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of Γ-convergence. We obtain almost optimal conditions on the scaling, as the number of points increases, of the size of the neighborhood over which points are connected by an edge for Γ-convergence to hold. Taking the limit is enabled by a new metric that allows one to suitably compare functionals defined on different point clouds.
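To make "graph total variation approximates perimeter" concrete, the sketch below evaluates a scaled graph TV, GTV(u) = 1/(eps * n^2) * sum over ordered pairs (i, j) of W_ij * |u_i - u_j|, with W_ij = eps^(-d) * eta(|x_i - x_j| / eps) and eta the indicator of [0, 1]. This normalization is an assumption matching one common convention. Under it, for points sampled uniformly on the unit square and u the indicator of the left half, the continuum value is sigma_eta times the interface length, with sigma_eta = integral over the unit disk of |z_1| dz = 4/3, minus an O(eps) boundary effect (eps/2 for this geometry). Function names are hypothetical.

```python
import math
import random


def graph_tv(points, u, eps):
    """Scaled graph total variation on an eps-neighborhood graph.

    W_ij = eps**(-d) if |x_i - x_j| <= eps (eta = indicator of [0, 1]), else 0.
    """
    n = len(points)
    d = len(points[0])
    w_edge = eps ** (-d)  # weight of every edge inside the eps-ball
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Only pairs with different labels contribute to the cut.
            if u[i] != u[j] and math.dist(points[i], points[j]) <= eps:
                total += 2.0 * w_edge * abs(u[i] - u[j])  # counts (i, j) and (j, i)
    return total / (eps * n * n)


def half_square_example(n=800, eps=0.15, seed=0):
    """Uniform samples on [0, 1]^2, u = 1 on the left half (interface length 1)."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    u = [1 if x < 0.5 else 0 for x, _ in pts]
    return graph_tv(pts, u, eps)
```

With these defaults the value should land roughly near 4/3 - 0.15/2, up to sampling noise; shrinking eps while growing n fast enough is exactly the scaling question the abstract addresses.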
Minimal Dirichlet energy partitions for graphs
2014
Abstract

Cited by 2 (0 self)
Motivated by a geometric problem, we introduce a new nonconvex graph partitioning objective in which the optimality criterion is the sum of the Dirichlet eigenvalues of the partition components. A relaxed formulation is identified and a novel rearrangement algorithm is proposed, which we show is strictly decreasing and converges in a finite number of iterations to a local minimum of the relaxed objective function. Our method is applied to several clustering problems on graphs constructed from synthetic data, MNIST handwritten digits, and manifold discretizations. The model has a semi-supervised extension and provides a natural representative for the clusters as well.
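The objective above needs the Dirichlet eigenvalue of each component: the smallest eigenvalue of the graph Laplacian L = D - W restricted to a vertex set S, with the function forced to zero outside S (the principal submatrix L[S, S]). The sketch below evaluates that objective for a fixed partition using power iteration on c*I - L[S, S], with c = 2 * max degree as a Gershgorin bound. It does not implement the authors' relaxation or rearrangement algorithm; `dirichlet_eigenvalue` and `partition_energy` are hypothetical names, and the graph is assumed to have no isolated vertices.

```python
import math


def dirichlet_eigenvalue(adj, S, iters=2000):
    """Smallest eigenvalue of L[S, S] for L = D - W; adj[v] maps neighbor -> weight."""
    nodes = sorted(S)
    idx = {v: k for k, v in enumerate(nodes)}
    # Degrees are taken in the FULL graph: edges leaving S still count (Dirichlet).
    deg = {v: sum(adj[v].values()) for v in nodes}
    c = 2.0 * max(deg.values())  # Gershgorin bound on the largest eigenvalue of L[S, S]

    def apply_shifted(x):
        # Returns (c*I - L[S, S]) x; neighbors outside S are dropped (u = 0 there).
        y = []
        for v in nodes:
            s = (c - deg[v]) * x[idx[v]]
            for u, w in adj[v].items():
                if u in idx:
                    s += w * x[idx[u]]
            y.append(s)
        return y

    x = [1.0 / math.sqrt(len(nodes))] * len(nodes)
    for _ in range(iters):
        y = apply_shifted(x)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    mu = sum(a * b for a, b in zip(x, apply_shifted(x)))  # Rayleigh quotient
    return c - mu


def partition_energy(adj, parts):
    """Sum of Dirichlet eigenvalues over the components of a partition."""
    return sum(dirichlet_eigenvalue(adj, S) for S in parts)
```

On a 12-cycle split into two arcs of 6 vertices, each restricted Laplacian is tridiag(-1, 2, -1), so each eigenvalue is 2 - 2cos(pi/7) and the balanced split has lower energy than a 2/10 split, which is the kind of balance the objective rewards.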
Local barycentric coordinates
ACM Trans. Graph., 2014
Abstract

Cited by 1 (1 self)
Barycentric coordinates yield a powerful yet simple paradigm for interpolating data values on polyhedral domains. They represent interior points of the domain as an affine combination of a set of control points, defining an interpolation scheme for any function defined on that set of control points. Numerous barycentric coordinate schemes have been proposed, satisfying a large variety of properties. However, they typically define interpolation as a combination of all control points. Thus a local change in the value at a single control point creates a global change that propagates into the whole domain. In this context, we present a family of local barycentric coordinates (LBC), which select for each interior point a small set of control points and satisfy common requirements on barycentric coordinates, such as linearity, nonnegativity, and smoothness. LBC are obtained through a convex optimization based on total variation, and provide a compact representation that reduces memory footprint and allows for fast deformations. Our experiments show that LBC provide more local and finer control over shape deformation than previous approaches, and lead to more intuitive deformation results.
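The locality argument rests on the fact that barycentric coordinates are not unique: any nonnegative weights that sum to one and reproduce the point (sum of w_i * c_i = x) qualify, so a scheme is free to prefer sparse weights. The sketch below only checks these constraints and measures support size; it does not solve the TV-based convex optimization that produces LBC. The helper names are hypothetical.

```python
def is_barycentric(controls, point, weights, tol=1e-9):
    """Partition of unity, linear reproduction, and nonnegativity."""
    if any(w < -tol for w in weights):
        return False
    if abs(sum(weights) - 1.0) > tol:
        return False
    for k in range(len(point)):
        reproduced = sum(w * c[k] for w, c in zip(weights, controls))
        if abs(reproduced - point[k]) > tol:
            return False
    return True


def support_size(weights, tol=1e-9):
    """Number of control points that actually influence the point."""
    return sum(1 for w in weights if abs(w) > tol)


def interpolate(values, weights):
    """Interpolated value: sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, values))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
center = (0.5, 0.5)
dense = [0.25, 0.25, 0.25, 0.25]  # every control point contributes
sparse = [0.5, 0.0, 0.5, 0.0]     # only the diagonal (0,0)-(1,1) contributes
```

Both weight vectors are valid coordinates for the center of the unit square, but raising the value at control point (0, 1) from 0 to 8 changes the dense interpolant at the center to 2.0 while the sparse one stays at 0.0: the sparse choice keeps that edit local.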
CONSISTENCY OF CHEEGER AND RATIO GRAPH CUTS
2014
Abstract
This paper establishes the consistency of a family of graph-cut-based algorithms for clustering of data clouds. We consider point clouds obtained as samples of a ground-truth measure. We investigate approaches to clustering based on minimizing objective functionals defined on proximity graphs of the given sample. Our focus is on functionals based on graph cuts, such as the Cheeger and ratio cuts. We show that minimizers of these cuts converge, as the sample size increases, to a minimizer of a corresponding continuum cut (which partitions the ground-truth measure). Moreover, we obtain sharp conditions on how the connectivity radius can be scaled with respect to the number of sample points for consistency to hold. We provide results for two-way and for multiway cuts. Furthermore, we provide numerical experiments that illustrate the results and explore the optimality of the scaling in dimension two.
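For reference, the two functionals can be evaluated on a small weighted graph as shown below, using common textbook definitions: Cheeger cut = Cut(S, S^c) / min(|S|, |S^c|) and ratio cut = Cut(S, S^c) * (1/|S| + 1/|S^c|). The paper's exact normalization (for example Cut / (|S| * |S^c|), which differs only by a factor of n) is an assumption here, and the function names are hypothetical.

```python
def cut_value(edges, S):
    """Total weight of edges (u, v, w) with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))


def cheeger_cut(edges, n, S):
    """Cut(S, S^c) / min(|S|, |S^c|) for a graph on vertices 0..n-1."""
    return cut_value(edges, S) / min(len(S), n - len(S))


def ratio_cut(edges, n, S):
    """Cut(S, S^c) * (1/|S| + 1/|S^c|)."""
    c = cut_value(edges, S)
    return c / len(S) + c / (n - len(S))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
two_triangles = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 1.0),
]
```

Splitting along the bridge gives Cheeger cut 1/3 and ratio cut 2/3, while isolating vertex 0 costs a Cheeger cut of 2: both functionals penalize small, weakly separated pieces, which is what makes their minimizers meaningful clusters.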
Tight Continuous Relaxation of the Balanced k-Cut Problem
Abstract
Spectral clustering, as a relaxation of the normalized/ratio cut, has become one of the standard graph-based clustering methods. Existing methods for computing multiple clusters, corresponding to a balanced k-cut of the graph, are based either on greedy techniques or on heuristics that have only a weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced k-cut problem and show that a related, recently proposed relaxation is in most cases loose, leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the difficult sum-of-ratios minimization problem, which achieves monotonic descent. Extensive comparisons show that our method outperforms all existing approaches for ratio cut and other balanced k-cut criteria.
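A balanced k-cut has the sum-of-ratios form sum over clusters C_l of cut(C_l, complement) / S_hat(C_l), where the balancing function S_hat decides the criterion: |C_l| gives the ratio cut and vol(C_l) (sum of degrees) gives the normalized cut. The sketch below only evaluates this discrete objective for a given labeling; it is not the paper's continuous relaxation or its descent algorithm, and `k_cut` is a hypothetical name.

```python
def k_cut(edges, labels, k, balance="ratio"):
    """Balanced k-cut objective: sum_l cut(C_l, complement) / S_hat(C_l).

    balance="ratio"      -> S_hat(C) = |C|     (ratio cut)
    balance="normalized" -> S_hat(C) = vol(C)  (normalized cut)
    """
    cut = [0.0] * k
    vol = [0.0] * k
    size = [0] * k
    for lab in labels:
        size[lab] += 1
    for u, v, w in edges:
        vol[labels[u]] += w
        vol[labels[v]] += w
        if labels[u] != labels[v]:
            # An edge between two clusters counts toward both clusters' cuts.
            cut[labels[u]] += w
            cut[labels[v]] += w
    denom = size if balance == "ratio" else vol
    if any(d == 0 for d in denom):
        raise ValueError("every cluster must be nonempty")
    return sum(cut[l] / denom[l] for l in range(k))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
bridge_graph = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 1.0),
]
```

For the labeling [0, 0, 0, 1, 1, 1] the ratio cut is 1/3 + 1/3 = 2/3, and since each triangle has volume 7 the normalized cut is 2/7; the ratio of these discrete quantities is what makes direct minimization hard and motivates a tight relaxation.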
An Incremental Reseeding Strategy for Clustering
Abstract
In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable accuracy [1]. We also describe a coarsen, cluster, and refine approach similar to [2, 3] that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.
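The three ingredients named above (diffuse, threshold, reseed) can be sketched as a short loop. The choices below are assumptions made for illustration, not the authors' settings: diffusion is a few steps of a lazy random walk, thresholding assigns each vertex to the cluster with the most diffused mass, and the number of seeds drawn from each cluster grows by one per iteration. The function name and parameters are hypothetical, and the graph is assumed to have no isolated vertices.

```python
import random


def incres_sketch(adj, k, n_iter=20, walk_steps=5, seed=0):
    """Hypothetical diffuse -> threshold -> reseed loop; adj is a list of neighbor lists."""
    rng = random.Random(seed)
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]  # assumes no isolated vertices
    seeds = [[v] for v in rng.sample(range(n), k)]  # one random seed per cluster
    labels = [0] * n
    for it in range(n_iter):
        # Diffuse: each cluster's seed mass takes lazy random-walk steps.
        mass = []
        for c in range(k):
            f = [0.0] * n
            for v in seeds[c]:
                f[v] += 1.0
            for _ in range(walk_steps):
                g = [0.5 * x for x in f]  # half the mass stays put
                for u in range(n):
                    share = 0.5 * f[u] / deg[u]
                    for v in adj[u]:
                        g[v] += share
                f = g
            mass.append(f)
        # Threshold: each vertex joins the cluster whose mass dominates there.
        labels = [max(range(k), key=lambda c: mass[c][v]) for v in range(n)]
        # Reseed: draw a growing number of random seeds from each cluster.
        s = it + 1
        seeds = []
        for c in range(k):
            members = [v for v in range(n) if labels[v] == c]
            if not members:
                members = list(range(n))  # revive an empty cluster anywhere
            seeds.append([rng.choice(members) for _ in range(s)])
    return labels
```

Each step is a sparse matrix-vector style sweep per cluster, which is why this family of methods parallelizes easily; the growing seed count is one way to let later iterations settle while early ones explore.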