Meme-tracking and the Dynamics of the News Cycle
, 2009
Cited by 357 (14 self)
Tracking new topics, ideas, and “memes” across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days, the time scale at which we perceive news and events. We develop a framework for tracking short, distinctive phrases that travel relatively intact through online text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread and rich variation on a daily basis. As our principal domain of study, we show how such a meme-tracking approach can provide a coherent representation of the news cycle: the daily rhythms in the news media that have long been the subject of qualitative interpretation but have never been captured accurately enough to permit actual quantitative analysis. We tracked 1.6 million mainstream media sites and blogs over a period of three months, for a total of 90 million articles, and we find a set of novel and persistent temporal patterns in the news cycle. In particular, we observe a typical lag of 2.5 hours between the peaks of attention to a phrase in the news media and in blogs respectively, with divergent behavior around the overall peak and a “heartbeat”-like pattern in the handoff between news and blogs. We also develop and analyze a mathematical model for the kinds of temporal variation that the system exhibits.
Approximation Algorithms for Classification Problems with Pairwise Relationships: Metric Labeling and Markov Random Fields
 IN IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE
, 1999
Cited by 195 (2 self)
In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects, in a way that is consistent with some observed data that we have about the problem. An active line of research in this area is concerned with classification when one has information about pairwise relationships among the objects to be classified; this issue is one of the principal motivations for the framework of Markov random fields, and it arises in areas such as image processing, biometry, and document analysis. In its most basic form, this style of analysis seeks a classification that optimizes a combinatorial function consisting of assignment costs (based on the individual choice of label we make for each object) and separation costs (based on the pair of choices we make for two "related" objects). We formulate a general classification problem of this type, the metric labeling problem; we show that it contains as special cases a number of standard classification f...
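The objective described in this abstract (assignment costs plus separation costs) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy instance, the function names, and the choice of the uniform metric on labels are not taken from the paper, and the exhaustive search only evaluates the objective on a tiny input; it is not the paper's approximation algorithm.

```python
from itertools import product


def uniform_metric(a, b):
    """Uniform metric on labels: 0 if the labels agree, 1 otherwise."""
    return 0 if a == b else 1


def labeling_cost(labeling, assign_cost, edges, dist=uniform_metric):
    """Assignment costs plus weighted separation costs of one labeling."""
    assignment = sum(assign_cost[obj][lab] for obj, lab in labeling.items())
    separation = sum(
        weight * dist(labeling[u], labeling[v])
        for (u, v), weight in edges.items()
    )
    return assignment + separation


def brute_force_min_cost(assign_cost, edges, label_set, dist=uniform_metric):
    """Minimum cost over all labelings; exponential, for toy inputs only."""
    objects = sorted(assign_cost)
    return min(
        labeling_cost(dict(zip(objects, choice)), assign_cost, edges, dist)
        for choice in product(label_set, repeat=len(objects))
    )


# Hypothetical instance: three objects, two labels, two weighted relations.
ASSIGN_COST = {
    "x": {"A": 1, "B": 4},
    "y": {"A": 3, "B": 1},
    "z": {"A": 2, "B": 2},
}
EDGES = {("x", "y"): 2, ("y", "z"): 1}
```

Calling `labeling_cost({"x": "A", "y": "B", "z": "B"}, ASSIGN_COST, EDGES)` adds the assignment costs 1 + 1 + 2 to the separation cost 2 for the one edge whose endpoints disagree.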
The Coign Automatic Distributed Partitioning System
, 1999
Cited by 150 (9 self)
Although successive generations of middleware (such as RPC, CORBA, and DCOM) have made it easier to connect distributed programs, the process of distributed application decomposition has changed little: programmers manually divide applications into subprograms and manually assign those subprograms to machines. Often the techniques used to choose a distribution are ad hoc and create one-time solutions biased to a specific combination of users, machines, and networks. We assert that system software, not the programmer, should manage the task of distributed decomposition. To validate our assertion we present Coign, an automatic distributed partitioning system that significantly eases the development of distributed applications. Given an application (in binary form) built from distributable COM components, Coign constructs a graph model of the application’s inter-component communication through scenario-based profiling. Later, Coign applies a graph-cutting algorithm to partition the application across a network and minimize execution delay due to network communication. Using Coign, even an end user (without access to source code) can transform a non-distributed application into an optimized, distributed application. Coign has automatically distributed binaries from over 2 million lines of application code, including Microsoft’s PhotoDraw 2000 image processor. To our knowledge, Coign is the first system to automatically partition and distribute binary applications.
A Framework For Community Identification in Dynamic Social Networks
, 2007
Cited by 113 (6 self)
We propose frameworks and algorithms for identifying communities in social networks that change over time. Communities are intuitively characterized as “unusually densely knit” subsets of a social network. This notion becomes more problematic if the social interactions change over time. Aggregating social networks over time can radically misrepresent the existing and changing community structure. Instead, we propose an optimization-based approach for modeling dynamic community structure. We prove that finding the most explanatory community structure is NP-hard and APX-hard, and propose algorithms based on dynamic programming, exhaustive search, maximum matching, and greedy heuristics. We demonstrate empirically that the heuristics trace developments of community structure accurately for several synthetic and real-world examples.
Fast approximate energy minimization with label costs
, 2010
Cited by 108 (9 self)
The α-expansion algorithm [7] has had a significant impact in computer vision due to its generality, effectiveness, and speed. Thus far it can only minimize energies that involve unary, pairwise, and specialized higher-order terms. Our main contribution is to extend α-expansion so that it can simultaneously optimize “label costs” as well. An energy with label costs can penalize a solution based on the set of labels that appear in it. The simplest special case is to penalize the number of labels in the solution. Our energy is quite general, and we prove optimality bounds for our algorithm. A natural application of label costs is multi-model fitting, and we demonstrate several such applications in vision: homography detection, motion segmentation, and unsupervised image segmentation. Our C++/MATLAB implementation is publicly available.
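As a rough illustration of the simplest special case mentioned in this abstract (penalizing the number of labels used), the sketch below evaluates an energy with unary terms, a Potts-style pairwise term, and a flat penalty per distinct label. The function name, the parameter names, the Potts choice, and the toy numbers are assumptions for illustration only; this only evaluates an energy and does not implement α-expansion or the authors' released code.

```python
def energy_with_label_count(labels, unary, edges, smooth_weight, per_label_penalty):
    """Evaluate data + smoothness + label-count terms for one labeling.

    labels: list where labels[i] is the label index chosen for site i
    unary:  unary[i][l] is the cost of giving site i the label l
    edges:  iterable of (i, j) neighbor pairs
    """
    # Data term: cost of each site's own label choice.
    data = sum(unary[i][lab] for i, lab in enumerate(labels))
    # Pairwise term (assumed Potts): pay a fixed weight when neighbors disagree.
    smooth = sum(smooth_weight for i, j in edges if labels[i] != labels[j])
    # Label-cost term: pay once for every distinct label that appears.
    label_term = per_label_penalty * len(set(labels))
    return data + smooth + label_term


# Hypothetical instance: three sites on a chain, two candidate labels.
UNARY = [[0, 5], [1, 4], [3, 0]]
CHAIN = [(0, 1), (1, 2)]
```

With a smoothness weight of 2 and a per-label penalty of 3, the labeling `[0, 0, 1]` has lower data cost than `[0, 0, 0]` but pays for a second label and one disagreeing edge, so the single-label solution ends up cheaper overall.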
On the Hardness of Approximating Multicut and Sparsest-Cut
 In Proceedings of the 20th Annual IEEE Conference on Computational Complexity
, 2005
Cited by 102 (5 self)
We show that the MULTICUT, SPARSEST-CUT, and MIN-2CNF ≡ DELETION problems are NP-hard to approximate within every constant factor, assuming the Unique Games Conjecture of Khot [STOC, 2002]. A quantitatively stronger version of the conjecture implies an inapproximability factor of Ω(log log n).
Approximation Algorithms for the Metric Labeling Problem via a New Linear Programming Formulation
, 2000
Cited by 77 (1 self)
We consider approximation algorithms for the metric labeling problem. Informally speaking, we are given a weighted graph that specifies relations between pairs of objects drawn from a given set of objects. The goal is to find a minimum cost labeling of these objects where the cost of a labeling is determined by the pairwise relations between the objects and a distance function on labels; the distance function is assumed to be a metric. Each object also incurs an assignment cost that is label and vertex dependent. The problem was introduced in a recent paper by Kleinberg and Tardos [19], and captures many classification problems that arise in computer vision and related fields. They gave an O(log k log log k) approximation for the general case where k is the number of labels and a 2-approximation for the uniform metric case. More recently, Gupta and Tardos [14] gave a 4-approximation for the truncated linear metric, a natural non-uniform metric motivated by practical applications to image restoration and visual correspondence. In this paper we introduce a new natural integer programming formulation and show that the integrality gap of its linear relaxation either matches or improves the ratios known for several cases of the metric labeling problem studied until now, providing a unified approach to solving them. Specifically, we show that the integrality gap of our LP is bounded by O(log k log log k) for general metric and 2 for the uniform metric thus matching the ratios in [19]. We also develop an algorithm based on our LP that achieves a ratio of 2 + √2 ≈ 3.414 for the truncated linear metric improving the ratio provided by [14]. Our algorithm uses the fact that the integrality gap of our LP is 1 on a linear metric. We believe that our formulation h...
An improved approximation algorithm for multiway cut
 Journal of Computer and System Sciences
, 1998
Cited by 74 (5 self)
Given an undirected graph with edge costs and a subset of k nodes called terminals, a multiway cut is a subset of edges whose removal disconnects each terminal from the rest. Multiway Cut is the problem of finding a multiway cut of minimum cost. Previously, a very simple combinatorial algorithm due to Dahlhaus, Johnson, Papadimitriou, Seymour, and Yannakakis gave a performance guarantee of 2(1 − 1/k). In this paper, we present a new linear programming relaxation for Multiway Cut and a new approximation algorithm based on it. The algorithm breaks the threshold of 2 for approximating Multiway Cut, achieving a performance ratio of at most 1.5 − 1/k. This improves the previous result for every value of k. In particular, for k = 3 we get a ratio of 7/6.
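The two guarantees in this abstract can be compared with exact arithmetic. The sketch below assumes the expressions read as 2(1 − 1/k) for the earlier combinatorial algorithm and 1.5 − 1/k for the LP-based algorithm; the function names are hypothetical and the code only evaluates the formulas, not either algorithm.

```python
from fractions import Fraction


def combinatorial_ratio(k):
    """Earlier guarantee, read here as 2 * (1 - 1/k)."""
    return 2 * (1 - Fraction(1, k))


def lp_based_ratio(k):
    """LP-based guarantee, read here as 3/2 - 1/k."""
    return Fraction(3, 2) - Fraction(1, k)


if __name__ == "__main__":
    # Print both ratios side by side for a few terminal counts.
    for k in (3, 4, 5, 10):
        print(k, combinatorial_ratio(k), lp_based_ratio(k))
```

The gap between the two expressions is 1/2 − 1/k, which is positive for every k ≥ 3, so under this reading the LP-based bound is strictly smaller whenever there are at least three terminals.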
Approximation algorithms for the 0-extension problem
 IN PROCEEDINGS OF THE TWELFTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 2001
Cited by 70 (3 self)
In the 0-extension problem, we are given a weighted graph with some nodes marked as terminals and a semimetric on the set of terminals. Our goal is to assign the rest of the nodes to terminals so as to minimize the sum, over all edges, of the product of the edge’s weight and the distance between the terminals to which its endpoints are assigned. This problem generalizes the multiway cut problem of Dahlhaus, Johnson, Papadimitriou, Seymour, and Yannakakis and is closely related to the metric labeling problem introduced by Kleinberg and Tardos. We present approximation algorithms for 0-Extension. In arbitrary graphs, we present an O(log k)-approximation algorithm, k being the number of terminals. We also give O(1)-approximation guarantees for weighted planar graphs. Our results are based on a natural metric relaxation of the problem, previously considered by Karzanov. It is similar in flavor to the linear programming relaxation of Garg, Vazirani, and Yannakakis for the multicut problem and similar to relaxations for other graph partitioning problems. We prove that the integrality ratio of the metric relaxation is at least c√(lg k) for a positive c for infinitely many k. Our results improve some of the results of Kleinberg and Tardos and they further our understanding on how to use metric relaxations.
Rounding algorithms for a geometric embedding of minimum multiway cut
 In STOC ’99: Proceedings of the 31st Annual ACM Symposium on Theory of Computing
, 1999
Cited by 52 (2 self)
Given an undirected graph with edge costs and a subset of k ≥ 3 nodes called terminals, a multiway, or k-way, cut is a subset of the edges whose removal disconnects each terminal from the others. The multiway cut problem is to find a minimum-cost multiway cut. This problem is MaxSNP-hard. Recently Calinescu, Karloff, and Rabani (STOC’98) gave a novel geometric relaxation of the problem and a rounding scheme that produced a (3/2 − 1/k)-approximation algorithm. In this paper, we study their geometric relaxation. In particular, we study the worst-case ratio between the value of the relaxation and the value of the minimum multicut (the so-called integrality gap of the relaxation). For k = 3, we show the integrality gap is 12/11, giving tight upper and lower bounds. That is, we exhibit a graph with integrality gap 12/11 and give an algorithm that finds a cut of value 12/11 times the relaxation value. This is the best possible performance guarantee for any algorithm based purely on the value of the relaxation and improves on Calinescu et al.’s factor of 7/6. We also improve the upper bounds for all larger values of k. For k = 4, 5, our best upper bounds are based on computer-constructed and analyzed rounding schemes, while for k > 6 we give an algorithm with performance ratio 1.3438 − ε_k. Our results were discovered with the help of computational experiments that we also describe here.