Results 1-10 of 73
Memetracking and the Dynamics of the News Cycle
2009
Cited by 357 (14 self)
Tracking new topics, ideas, and “memes” across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through online text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a memetracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months, a total of 90 million articles, and find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs, respectively, with divergent behavior around the overall peak and a “heartbeat”-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
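As a rough illustration of what a "lag between peaks" means for a single phrase, the sketch below assumes hypothetical mention-count series for the news media and for blogs, binned at a fixed width, and reports how far the blog peak trails the news peak. The function name, the binning, and the toy numbers are illustrative assumptions; the paper's own measurement may be computed differently (for instance, over many phrases rather than one).

```python
from typing import Sequence


def peak_lag_hours(news: Sequence[float], blogs: Sequence[float],
                   bin_hours: float = 1.0) -> float:
    """Hours by which the blog peak trails the news peak (negative if blogs lead).

    Both series are mention counts for one phrase in consecutive time bins of
    width `bin_hours`. Ties are broken by the earliest bin, as max() does.
    """
    if not news or len(news) != len(blogs):
        raise ValueError("series must be non-empty and of equal length")
    news_peak = max(range(len(news)), key=lambda i: news[i])
    blog_peak = max(range(len(blogs)), key=lambda i: blogs[i])
    return (blog_peak - news_peak) * bin_hours


# Toy half-hour bins (made-up numbers): news peaks in bin 1, blogs in bin 6.
news = [3, 9, 7, 5, 4, 2, 1, 0]
blogs = [0, 1, 2, 3, 5, 6, 8, 4]
print(peak_lag_hours(news, blogs, bin_hours=0.5))  # 2.5
```

Taking the argmax of each curve is the simplest possible peak definition; smoothing the series first would make it less sensitive to single noisy bins.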
Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields
In IEEE Symposium on Foundations of Computer Science, 1999
Cited by 195 (2 self)
In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects, in a way that is consistent with some observed data that we have about the problem. An active line of research in this area is concerned with classification when one has information about pairwise relationships among the objects to be classified; this issue is one of the principal motivations for the framework of Markov random fields, and it arises in areas such as image processing, biometry, and document analysis. In its most basic form, this style of analysis seeks a classification that optimizes a combinatorial function consisting of assignment costs (based on the individual choice of label we make for each object) and separation costs (based on the pair of choices we make for two "related" objects). We formulate a general classification problem of this type, the metric labeling problem; we show that it contains as special cases a number of standard classification f...
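The objective described above can be written as Q(f) = Σ_p c(p, f(p)) + Σ_(p,q) w(p,q)·d(f(p), f(q)), where f assigns a label to each object, c is the assignment cost, w is the weight of a related pair, and d is a metric on labels (the symbols here are chosen for illustration). The sketch below simply evaluates Q and finds the optimum by exhaustive search on a toy instance; the names and the brute-force search are illustrative assumptions, exponential in n, and not the paper's approximation algorithm.

```python
import itertools
from typing import Callable, Dict, Hashable, List, Tuple

Label = int
Dist = Callable[[Label, Label], float]


def labeling_cost(f: Dict[Hashable, Label],
                  assign: Dict[Tuple[Hashable, Label], float],
                  pairs: Dict[Tuple[Hashable, Hashable], float],
                  d: Dist) -> float:
    """Assignment costs c(p, f(p)) plus separation costs w(p, q) * d(f(p), f(q))."""
    assignment = sum(assign[(p, f[p])] for p in f)
    separation = sum(w * d(f[p], f[q]) for (p, q), w in pairs.items())
    return assignment + separation


def best_labeling(objects: List[Hashable], labels: List[Label], assign, pairs, d: Dist):
    """Exhaustive search over all k^n labelings; only usable on tiny instances."""
    best = None
    for combo in itertools.product(labels, repeat=len(objects)):
        f = dict(zip(objects, combo))
        cost = labeling_cost(f, assign, pairs, d)
        if best is None or cost < best[0]:
            best = (cost, f)
    return best


def uniform(i: Label, j: Label) -> float:
    """The uniform metric: every pair of distinct labels is at distance 1."""
    return 0 if i == j else 1


assign = {("a", 0): 0, ("a", 1): 2, ("b", 0): 3, ("b", 1): 0}
pairs = {("a", "b"): 1.5}
print(best_labeling(["a", "b"], [0, 1], assign, pairs, uniform))
# (1.5, {'a': 0, 'b': 1})
```

Here paying the separation cost 1.5 is cheaper than forcing both objects onto the same label, which is exactly the trade-off between the two kinds of cost.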
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
2011
Cited by 143 (7 self)
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other’s work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
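The core idea (several workers applying sparse SGD updates to one shared vector with no locks) can be sketched with Python threads. This is a toy under stated assumptions: a noiseless sparse least-squares problem, with all names and hyperparameters chosen for illustration. In CPython the global interpreter lock prevents real parallel speedup, so the sketch shows only the unsynchronized update pattern, not the performance the abstract reports.

```python
import random
import threading
from typing import List, Tuple

# One sparse example: (coordinate indices, values at those indices, target).
Example = Tuple[List[int], List[float], float]


def residual(x: List[float], ex: Example) -> float:
    idx, vals, y = ex
    return sum(x[j] * v for j, v in zip(idx, vals)) - y


def hogwild_sgd(data: List[Example], dim: int, n_threads: int = 4,
                epochs: int = 20, step: float = 0.05, seed: int = 0) -> List[float]:
    x = [0.0] * dim  # shared parameters, deliberately NOT protected by a lock

    def worker(shard: List[Example], rng: random.Random) -> None:
        for _ in range(epochs):
            rng.shuffle(shard)
            for ex in shard:
                r = residual(x, ex)  # may read values other threads are updating
                for j, v in zip(ex[0], ex[1]):
                    # Gradient of 0.5 * r**2 w.r.t. x[j] is r * v; only the
                    # coordinates this example touches are written.
                    x[j] -= step * r * v

    shards = [data[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(s, random.Random(seed + i)))
               for i, s in enumerate(shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x


def mse(x: List[float], data: List[Example]) -> float:
    return sum(residual(x, ex) ** 2 for ex in data) / len(data)


# Toy data: each example touches 3 of 20 coordinates of a hidden vector.
rng = random.Random(42)
dim = 20
x_true = [rng.uniform(-1, 1) for _ in range(dim)]
data = []
for _ in range(400):
    idx = rng.sample(range(dim), 3)
    vals = [rng.uniform(-1, 1) for _ in idx]
    data.append((idx, vals, sum(x_true[j] * v for j, v in zip(idx, vals))))

x = hogwild_sgd(data, dim)
print(mse([0.0] * dim, data), mse(x, data))  # the second number should be far smaller
```

Because each example writes only 3 of 20 coordinates, two workers rarely touch the same entry at once, which is the sparsity condition under which the abstract says lock-free updates remain safe.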
Clustering with qualitative information
In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003
Cited by 123 (9 self)
We consider the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. Bansal, Blum and Chawla [1] cast the problem thus: given a graph G whose edges are labeled “+” (similar) or “−” (dissimilar), partition the vertices into clusters so that the number of pairs correctly (resp. incorrectly) classified with respect to the input labeling is maximized (resp. minimized). Complete graphs, where the classifier labels every edge, and general graphs, where some edges are not labeled, are both worth studying. We answer several questions left open in [1] and provide a sound overview of clustering with qualitative information. We give a factor 4 approximation for minimization on complete graphs, and a factor O(log n) approximation for general graphs. For the maximization version, a PTAS for complete graphs is shown in [1]; we give a factor 0.7664 approximation for general graphs, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete graphs.
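For a fixed partition, both objectives above are easy to evaluate: count the labeled pairs whose sign agrees with the partition (a “+” pair placed together or a “−” pair split apart) and those that disagree. The sketch below, with hypothetical names, does exactly that; the triangle example shows why the minimization objective can be forced above zero. It only evaluates a given clustering and is not any of the approximation algorithms described in the abstract.

```python
from typing import Dict, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]


def agreements_disagreements(signs: Dict[Pair, str],
                             cluster: Dict[Hashable, int]) -> Tuple[int, int]:
    """Return (#correctly classified pairs, #incorrectly classified pairs).

    Unlabeled pairs (as in general graphs) are simply absent from `signs`.
    """
    agree = disagree = 0
    for (u, v), sign in signs.items():
        together = cluster[u] == cluster[v]
        if (sign == "+") == together:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


# An inconsistent triangle: a~b and b~c are similar, but a and c are dissimilar.
signs = {("a", "b"): "+", ("b", "c"): "+", ("a", "c"): "-"}
print(agreements_disagreements(signs, {"a": 0, "b": 0, "c": 0}))  # (2, 1)
print(agreements_disagreements(signs, {"a": 0, "b": 1, "c": 2}))  # (1, 2)
```

No partition of this triangle satisfies all three labels, so every clustering incurs at least one disagreement.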
Approximation Algorithms for the Metric Labeling Problem via a New Linear Programming Formulation
2000
Cited by 77 (1 self)
We consider approximation algorithms for the metric labeling problem. Informally speaking, we are given a weighted graph that specifies relations between pairs of objects drawn from a given set of objects. The goal is to find a minimum-cost labeling of these objects, where the cost of a labeling is determined by the pairwise relations between the objects and a distance function on labels; the distance function is assumed to be a metric. Each object also incurs an assignment cost that is label- and vertex-dependent. The problem was introduced in a recent paper by Kleinberg and Tardos [19] and captures many classification problems that arise in computer vision and related fields. They gave an O(log k log log k) approximation for the general case, where k is the number of labels, and a 2-approximation for the uniform metric case. More recently, Gupta and Tardos [14] gave a 4-approximation for the truncated linear metric, a natural non-uniform metric motivated by practical applications to image restoration and visual correspondence. In this paper we introduce a new natural integer programming formulation and show that the integrality gap of its linear relaxation either matches or improves the ratios known for several cases of the metric labeling problem studied so far, providing a unified approach to solving them. Specifically, we show that the integrality gap of our LP is bounded by O(log k log log k) for general metrics and by 2 for the uniform metric, thus matching the ratios in [19]. We also develop an algorithm based on our LP that achieves a ratio of 2 + √2 ≈ 3.414 for the truncated linear metric, improving the ratio provided by [14]. Our algorithm uses the fact that the integrality gap of our LP is 1 on a linear metric. We believe that our formulation h...
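The three label metrics named in the abstract (uniform, linear, and truncated linear) are simple to write down. The sketch below, with illustrative function names, defines them and brute-force checks the metric axioms on a small label set. As a contrast, it also shows that truncating a quadratic distance instead of a linear one breaks the triangle inequality. It does not implement the paper's LP.

```python
import math
from typing import Callable, Iterable

Dist = Callable[[int, int], float]


def uniform(i: int, j: int) -> float:
    return 0 if i == j else 1


def linear(i: int, j: int) -> float:
    return abs(i - j)


def truncated_linear(m: float) -> Dist:
    """d(i, j) = min(|i - j|, m): linear up to a cap m > 0."""
    return lambda i, j: min(abs(i - j), m)


def truncated_quadratic(m: float) -> Dist:
    """d(i, j) = min((i - j)^2, m): used here only as a non-metric contrast."""
    return lambda i, j: min((i - j) ** 2, m)


def is_metric(d: Dist, labels: Iterable[int]) -> bool:
    """Brute-force check of identity, symmetry, and the triangle inequality."""
    ls = list(labels)
    for i in ls:
        for j in ls:
            if (d(i, j) == 0) != (i == j) or d(i, j) != d(j, i):
                return False
            if any(d(i, k) > d(i, j) + d(j, k) for k in ls):
                return False
    return True


labels = range(6)
print([is_metric(d, labels) for d in (uniform, linear, truncated_linear(2))])  # [True, True, True]
print(is_metric(truncated_quadratic(4), labels))  # False: d(0,2)=4 > d(0,1)+d(1,2)=2
print(round(2 + math.sqrt(2), 3))  # 3.414, the ratio quoted for the truncated linear metric
```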
Approximation algorithms for the 0-extension problem
In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, 2001
Cited by 70 (3 self)
In the 0-extension problem, we are given a weighted graph with some nodes marked as terminals and a semimetric on the set of terminals. Our goal is to assign the rest of the nodes to terminals so as to minimize the sum, over all edges, of the product of the edge’s weight and the distance between the terminals to which its endpoints are assigned. This problem generalizes the multiway cut problem of Dahlhaus, Johnson, Papadimitriou, Seymour, and Yannakakis and is closely related to the metric labeling problem introduced by Kleinberg and Tardos. We present approximation algorithms for 0-extension. In arbitrary graphs, we present an O(log k)-approximation algorithm, k being the number of terminals. We also give O(1)-approximation guarantees for weighted planar graphs. Our results are based on a natural metric relaxation of the problem, previously considered by Karzanov. It is similar in flavor to the linear programming relaxation of Garg, Vazirani, and Yannakakis for the multicut problem and to relaxations for other graph partitioning problems. We prove that the integrality ratio of the metric relaxation is at least c√(lg k) for a positive constant c, for infinitely many k. Our results improve some of the results of Kleinberg and Tardos and further our understanding of how to use metric relaxations.
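The 0-extension objective is easy to state in code: terminals stay fixed, every other node is mapped to some terminal, and each edge pays its weight times the (semi)metric distance between its endpoints' terminals. The sketch below, with hypothetical names, evaluates that cost and solves tiny instances by exhaustive search. With a uniform metric (distance 1 between any two distinct terminals), the cost is exactly the weight of the edges cut between terminal groups, which is the multiway cut special case the abstract mentions.

```python
import itertools
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def zero_extension_cost(f: Dict[Node, Node],
                        edges: Dict[Tuple[Node, Node], float],
                        D: Dict[Node, Dict[Node, float]]) -> float:
    """Sum over edges (u, v) of w(u, v) * D(f(u), f(v))."""
    return sum(w * D[f[u]][f[v]] for (u, v), w in edges.items())


def brute_force_zero_extension(nodes: List[Node], terminals: List[Node], edges, D):
    """Try every map of non-terminals to terminals (k^(n-k) maps; tiny inputs only)."""
    free = [v for v in nodes if v not in terminals]
    best = None
    for combo in itertools.product(terminals, repeat=len(free)):
        f = {t: t for t in terminals}  # terminals are fixed to themselves
        f.update(zip(free, combo))
        cost = zero_extension_cost(f, edges, D)
        if best is None or cost < best[0]:
            best = (cost, f)
    return best


# Two terminals at distance 1: the problem becomes a minimum s-t cut.
D = {"s": {"s": 0, "t": 1}, "t": {"s": 1, "t": 0}}
edges = {("s", "x"): 3, ("x", "y"): 2, ("y", "t"): 1}
print(brute_force_zero_extension(["s", "t", "x", "y"], ["s", "t"], edges, D))
# (1, {'s': 's', 't': 't', 'x': 's', 'y': 's'})
```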
Rounding algorithms for a geometric embedding of minimum multiway cut
In STOC ’99: Proceedings of the 31st Annual ACM Symposium on Theory of Computing, 1999
Cited by 52 (2 self)
Given an undirected graph with edge costs and a subset of k ≥ 3 nodes called terminals, a multiway, or k-way, cut is a subset of the edges whose removal disconnects each terminal from the others. The multiway cut problem is to find a minimum-cost multiway cut. This problem is MAX SNP-hard. Recently, Calinescu, Karloff, and Rabani (STOC ’98) gave a novel geometric relaxation of the problem and a rounding scheme that produced a (3/2 − 1/k)-approximation algorithm. In this paper, we study their geometric relaxation. In particular, we study the worst-case ratio between the value of the relaxation and the value of the minimum multiway cut (the so-called integrality gap of the relaxation). For k = 3, we show the integrality gap is 12/11, giving tight upper and lower bounds. That is, we exhibit a graph with integrality gap 12/11 and give an algorithm that finds a cut of value 12/11 times the relaxation value. This is the best possible performance guarantee for any algorithm based purely on the value of the relaxation and improves on Calinescu et al.’s factor of 7/6. We also improve the upper bounds for all larger values of k. For k = 4, 5, our best upper bounds are based on computer-constructed and computer-analyzed rounding schemes, while for k > 6 we give an algorithm with performance ratio 1.3438 − ε_k. Our results were discovered with the help of computational experiments that we also describe here.
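For intuition on tiny instances, a multiway cut can be checked directly: delete the chosen edges and verify, here with a union-find structure, that no two terminals remain connected. The sketch below, with hypothetical names, does this and finds a minimum-cost cut by trying every edge subset. Because the problem is MAX SNP-hard, this exponential search is only a reference checker for toy graphs; it is unrelated to the geometric relaxation and rounding schemes the paper studies.

```python
import itertools
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_multiway_cut(nodes: Iterable[Hashable], edges: Dict[Edge, float],
                    terminals: List[Hashable], removed: Set[Edge]) -> bool:
    """True if deleting `removed` leaves every terminal in its own component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v) in edges:
        if (u, v) not in removed:
            parent[find(u)] = find(v)  # union the endpoints of surviving edges
    roots = {find(t) for t in terminals}
    return len(roots) == len(terminals)


def min_multiway_cut(nodes, edges: Dict[Edge, float], terminals):
    """Exhaustive search over all edge subsets; toy graphs only."""
    best = None
    for r in range(len(edges) + 1):
        for subset in itertools.combinations(edges, r):
            if is_multiway_cut(nodes, edges, terminals, set(subset)):
                cost = sum(edges[e] for e in subset)
                if best is None or cost < best[0]:
                    best = (cost, set(subset))
    return best


# A star: terminals a, b, d all attached to a hub c with weights 1, 2, 3.
edges = {("a", "c"): 1, ("b", "c"): 2, ("d", "c"): 3}
print(min_multiway_cut("abcd", edges, ["a", "b", "d"]))  # best cost is 3 (cut a-c and b-c)
```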
A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problem
Cited by 46 (11 self)
Seven years ago, Szeliski et al. published an influential study on energy minimization methods for Markov random fields (MRFs). This study provided valuable insights into choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the phenomenal success of random field models means that the kinds of inference problems we solve have changed significantly. Specifically, today's models often include higher-order interactions, flexible connectivity structures, large label spaces of different cardinalities, or learned energy tables. To reflect these changes, we provide a modernized and enlarged study. We present an empirical comparison of 24 state-of-the-art techniques on a corpus of 2,300 energy minimization instances from 20 diverse computer vision applications. To ensure reproducibility, we evaluate all methods in the OpenGM2 framework and report extensive results regarding runtime and solution quality. Key insights from our study agree with the results of Szeliski et al. for the types of models they studied. However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.
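To make "discrete energy minimization" concrete, the sketch below minimizes a pairwise energy made of unary costs plus a Potts smoothness term, using iterated conditional modes (ICM): a classic greedy method that changes one variable at a time and never increases the energy. The toy chain, the Potts weight, and all names are illustrative assumptions; this is only a simple baseline, not a reproduction of the study's methods or of the OpenGM2 API.

```python
from typing import List, Tuple


def energy(labels: List[int], unary: List[List[float]],
           edges: List[Tuple[int, int]], w: float) -> float:
    """Sum of unary costs plus a Potts penalty w for each edge with unequal labels."""
    e = sum(unary[p][labels[p]] for p in range(len(labels)))
    return e + sum(w for p, q in edges if labels[p] != labels[q])


def icm(unary: List[List[float]], edges: List[Tuple[int, int]], w: float,
        max_sweeps: int = 100) -> List[int]:
    n, k = len(unary), len(unary[0])
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for p, q in edges:
        nbrs[p].append(q)
        nbrs[q].append(p)
    # Start from the per-node best label, ignoring the pairwise terms.
    labels = [min(range(k), key=lambda lab: unary[p][lab]) for p in range(n)]

    def local(p: int, lab: int) -> float:
        # Energy terms that involve node p when it takes label `lab`.
        return unary[p][lab] + sum(w for q in nbrs[p] if labels[q] != lab)

    for _ in range(max_sweeps):
        changed = False
        for p in range(n):
            best = min(range(k), key=lambda lab: local(p, lab))
            if local(p, best) < local(p, labels[p]):  # strict decrease only
                labels[p] = best
                changed = True
        if not changed:
            break  # a local minimum with respect to single-variable changes
    return labels


# A 5-node chain where node 2 weakly prefers label 1 (a "noisy" observation).
unary = [[0, 1], [0, 1], [0.6, 0], [0, 1], [0, 1]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
result = icm(unary, edges, w=1.0)
print(result, energy(result, unary, edges, 1.0))  # [0, 0, 0, 0, 0] 0.6
```

ICM only guarantees a local minimum; on harder models it can stall far from the optimum, which is why stronger methods such as those compared in the study matter.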
A linear programming formulation and approximation algorithms for the metric labeling problem
SIAM J. Discrete Math.
Cited by 43 (1 self)
We consider approximation algorithms for the metric labeling problem. This problem was introduced in a paper by Kleinberg and Tardos [J. ACM, 49 (2002), pp. 616–630] and captures many classification problems that arise in computer vision and related fields. They gave an O(log k log log k) approximation for the general case, where k is the number of labels, and a 2-approximation for the uniform metric case. (In fact, the bound for general metrics can be improved to O(log k) by the work of Fakcharoenphol, Rao, and Talwar [Proceedings
An improved approximation algorithm for the 0-extension problem
In 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2003
Cited by 36 (5 self)
Given a graph G = (V, E), a set of terminals T ⊆ V, and a metric D on T, the 0-extension problem is to assign vertices in V to terminals so that the sum, over all edges e, of the distance (under D) between the terminals to which the endpoints of e are assigned is minimized. This problem was first studied by Karzanov. Calinescu, Karloff, and Rabani gave an O(log k) approximation algorithm based on a linear programming relaxation for the problem, where k is the number of terminals. We improve on this bound and give an O(log k / log log k)-approximation algorithm for the problem. 1 Introduction: In the 0-extension problem, we are given an undirected graph G = (V, E) with costs c(u, v) on edges, a set of terminals...