Deterministic pivoting algorithms for constrained ranking and Clustering Problems
, 2007
Cited by 34 (4 self)
We consider ranking and clustering problems related to the aggregation of inconsistent information, in particular, rank aggregation, (weighted) feedback arc set in tournaments, consensus and correlation clustering, and hierarchical clustering. Ailon, Charikar, and Newman [4], Ailon and Charikar [3], and Ailon [2] proposed randomized constant-factor approximation algorithms for these problems, which recursively generate a solution by choosing a random vertex as “pivot” and dividing the remaining vertices into two groups based on the pivot vertex. In this paper, we answer an open question in these works by giving deterministic approximation algorithms for these problems. The analysis of our algorithms is simpler than the analysis of the randomized algorithms in [4], [3] and [2]. In addition, we consider the problem of finding minimum-cost rankings and clusterings which must obey certain constraints (e.g., an input partial order in the case of ranking problems), which were introduced by Hegde and Jain [25] (see also [34]). We show that the first type of algorithms we propose can also handle these constrained problems. In addition, we show that in the case of a rank aggregation or consensus clustering problem, if the input rankings or clusterings obey the constraints, then we can always ensure that the output of
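The randomized pivoting scheme this abstract attributes to the earlier work can be illustrated with a minimal Python sketch for the tournament case. The function name `pivot_rank` and the callback `prefers(u, v)` are hypothetical, and the items are assumed to be distinct. The pivot here is chosen at random; the deterministic pivot selection that the paper contributes is not described in the abstract and is not reproduced.

```python
import random


def pivot_rank(items, prefers, rng=random):
    """Recursively order `items` around a randomly chosen pivot.

    `prefers(u, v)` is a hypothetical callback: True means u should
    precede v in the tournament. Items are assumed to be distinct.
    """
    items = list(items)
    if len(items) <= 1:
        return items
    pivot = rng.choice(items)
    # Split the remaining vertices into two groups based on the pivot.
    before = [v for v in items if v != pivot and prefers(v, pivot)]
    after = [v for v in items if v != pivot and not prefers(v, pivot)]
    return pivot_rank(before, prefers, rng) + [pivot] + pivot_rank(after, prefers, rng)
```

On a transitive tournament (for example `prefers = lambda u, v: u < v`) every pivot choice yields the same sorted order; on inconsistent input the result depends on the pivots drawn.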
Uncovering Groups via Heterogeneous Interaction Analysis
Cited by 26 (6 self)
With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover cross-dimension group structures. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at large scale.
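The abstract names "modularity analysis" without giving details. As background only, the standard Newman modularity score of a partition of one network dimension can be computed as in the sketch below. The function name and the list-of-lists adjacency format are assumptions made for illustration; the paper's feature extraction and cross-dimension integration steps are not shown.

```python
def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists (no self-loops).
    labels: labels[i] is the community assigned to node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the number of edges
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed edge minus the expected value under random wiring.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

A higher Q means more edges fall inside communities than a degree-preserving random graph would predict; placing all nodes in one community always gives Q = 0.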
Rank aggregation: Together we’re strong
 In Proc. of 11th ALENEX
, 1998
Cited by 16 (0 self)
We consider the problem of finding a ranking of a set of elements that is “closest to” a given set of input rankings of the elements; more precisely, we want to find a permutation that minimizes the Kendall-tau distance to the input rankings, where the Kendall-tau distance is defined as the sum over all input rankings of the number of pairs of elements that are in a different order in the input ranking than in the output ranking. If the input rankings are permutations, this problem is known as the Kemeny rank aggregation problem. This problem arises for example in building meta-search engines for Web search, aggregating viewers’ rankings of movies, or giving recommendations to a user based on several different criteria, where we can think of having one ranking of the
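The objective defined in this abstract can be made concrete with a small Python sketch. All function names are hypothetical, and the exhaustive search is only a reference implementation for tiny inputs, not any of the methods studied in the paper. It assumes every input ranking is a permutation of the same elements.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of element pairs ordered differently by rankings r1 and r2."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
    )


def total_distance(candidate, rankings):
    """Sum of Kendall-tau distances from the candidate to all input rankings."""
    return sum(kendall_tau(candidate, r) for r in rankings)


def kemeny_brute_force(rankings):
    """Exhaustive search over all permutations; feasible only for a few elements."""
    return min(
        permutations(rankings[0]),
        key=lambda p: total_distance(p, rankings),
    )
```

Because the search enumerates all n! permutations, it is useful mainly for checking the output of faster heuristics on small instances.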
Optimal Meta Search Results Clustering
 Proc. 33rd Int’l ACM SIGIR Conf. Research and Development in Information Retrieval
, 2010
Cited by 14 (3 self)
By analogy with merging document rankings, the outputs from multiple search results clustering algorithms can be combined into a single output. In this paper we study the feasibility of meta search results clustering, which has unique features compared to the general meta clustering problem. After showing that the combination of multiple search results clusterings is empirically justified, we cast meta clustering as an optimization problem of an objective function measuring the probabilistic concordance between the clustering combination and the single clusterings. We then show, using an easily computable upper bound on such a function, that a simple stochastic optimization algorithm delivers reasonable approximations of the optimal value very efficiently, and we also provide a method for labeling the generated clusters with the most agreed-upon cluster labels. Optimal meta clustering with meta labeling is applied to three description-centric, state-of-the-art search results clustering algorithms. The performance improvement is demonstrated through a range of evaluation techniques (i.e., internal, classification-oriented, and information retrieval-oriented), using suitable test collections of search results with document-level relevance judgments per subtopic.
Average Parameterization and Partial Kernelization for Computing Medians
 PROC. 9TH LATIN
, 2010
Cited by 13 (9 self)
We propose an effective polynomial-time preprocessing strategy for intractable median problems. Developing a new methodological framework, we show that if the input instances of generally intractable problems exhibit a sufficiently high degree of similarity between each other on average, then there are efficient exact solving algorithms. In other words, we show that the median problems Swap Median Permutation, Consensus Clustering, Kemeny Score, and Kemeny Tie Score all are fixed-parameter tractable with respect to the parameter “average distance between input objects”. To this end, we develop the new concept of “partial kernelization” and identify interesting polynomial-time solvable special cases for the considered problems.
Bounding and comparing methods for correlation clustering beyond ILP
 In NAACL-HLT Workshop on Integer Linear Programming for Natural Language Processing (ILP-NLP 2009)
, 2009
Cited by 11 (1 self)
We evaluate several heuristic solvers for correlation clustering, the NP-hard problem of partitioning a dataset given pairwise affinities between all points. We experiment on two practical tasks, document clustering and chat disentanglement, to which ILP does not scale. On these datasets, we show that the clustering objective often, but not always, correlates with external metrics, and that local search always improves over greedy solutions. We use semidefinite programming (SDP) to provide a tighter bound, showing that simple algorithms are already close to optimality.
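To make the correlation clustering objective and the idea of local search concrete, here is a minimal Python sketch. It is a generic one-point-move local search written for illustration, not one of the solvers evaluated in the paper. The signed affinity matrix `w`, the integer cluster labels, and both function names are assumptions.

```python
def disagreement(w, labels):
    """Total weight of violated pairwise preferences.

    w[i][j] > 0 asks for i and j together; w[i][j] < 0 asks for them apart.
    """
    n = len(labels)
    cost = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            if same and w[i][j] < 0:
                cost -= w[i][j]
            elif not same and w[i][j] > 0:
                cost += w[i][j]
    return cost


def local_search(w, labels):
    """Move one point at a time to its best cluster until no move helps."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            fresh = max(labels) + 1  # label for a new singleton cluster
            best_label, best_cost = labels[i], disagreement(w, labels)
            for cand in set(labels) | {fresh}:
                trial = labels[:i] + [cand] + labels[i + 1:]
                cost = disagreement(w, trial)
                if cost < best_cost:
                    best_label, best_cost = cand, cost
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
    return labels
```

Each accepted move strictly lowers the objective, so the loop terminates at a local optimum. Recomputing the full objective for every trial keeps the sketch short but costs O(n^2) per evaluation; a practical solver would update the cost incrementally.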
Uncovering Cross-Dimension Group Structures in Multi-Dimensional Networks
Cited by 8 (5 self)
With the proliferation of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to friends, and subscribe to or become a fan of other users. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members focusing on similar topics. It is challenging to effectively integrate the network information of multiple dimensions to find the cross-dimension group structure. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at large scale.
Exploring Biological Network Dynamics with Ensembles of Graph Partitions
 In Proceedings of the PSB Pacific Symposium on Biocomputing
, 2010
Cited by 5 (2 self)
Unveiling the modular structure of biological networks can reveal important organizational patterns in the cell. Many graph partitioning algorithms have been proposed towards this end. However, most approaches only consider a single, optimal decomposition of the network. In this work, we make use of the multitude of near-optimal clusterings in order to explore the dynamics of network clusterings and how those dynamics relate to the structure of the underlying network. We recast the modularity optimization problem as an integer linear program with diversity constraints. These constraints produce an ensemble of dissimilar but still highly modular clusterings. We apply our approach to four social and biological networks and show how optimal and near-optimal solutions can be used in conjunction to identify deeper community structure in the network, including inter-community dynamics, communities that are especially resilient to change, and core-and-peripheral community members.
STOCHASTIC DATA CLUSTERING
, 2012
Cited by 3 (2 self)
... published the theory behind the long-term behavior of a dynamical system that can be described by a nearly uncoupled matrix. Over the past fifty years this theory has been used in a variety of contexts, including queueing theory, brain organization, and ecology. In all of these applications, the structure of the system is known and the point of interest is the various stages the system passes through on its way to some long-term equilibrium. This paper looks at this problem from the other direction. That is, we develop a technique for using the evolution of the system to tell us about its initial structure, and then use this technique to develop an algorithm that takes the varied solutions from multiple data clustering algorithms to arrive at a single data clustering solution.
On the Parameterized Complexity of Consensus Clustering
, 2011
Cited by 2 (2 self)
Given a collection C of partitions of a base set S, the NP-hard Consensus Clustering problem asks for a partition of S which has a total Mirkin distance of at most t to the partitions in C, where t is a nonnegative integer. We present a parameterized algorithm for Consensus Clustering with running time $O(4.24^k \cdot k^3 + |C| \cdot |S|^2)$, where $k := t/|C|$ is the average Mirkin distance of the solution partition to the partitions of C. Furthermore, we strengthen previous hardness results for Consensus Clustering, showing that Consensus Clustering remains NP-hard even when all input partitions contain at most two subsets. Finally, we study a local search variant of Consensus Clustering, showing W[1]-hardness for the parameter “radius of the Mirkin-distance neighborhood”. In the process, we also consider a local search variant of the related Cluster Editing problem, showing W[1]-hardness for the parameter “radius of the edge modification neighborhood”.
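The quantities t and k in this abstract can be illustrated with a short Python sketch. It assumes the Mirkin distance counts unordered pairs of elements that are grouped together in exactly one of the two partitions; some authors count ordered pairs, which doubles the value. The dict-based partition format and both function names are hypothetical.

```python
from itertools import combinations


def mirkin_distance(p1, p2):
    """Count unordered pairs grouped together in exactly one partition.

    p1, p2: dicts mapping each element of the base set S to a cluster label.
    """
    return sum(
        1
        for a, b in combinations(sorted(p1), 2)
        if (p1[a] == p1[b]) != (p2[a] == p2[b])
    )


def average_distance(solution, partitions):
    """k = t / |C|, where t is the total distance to all input partitions."""
    t = sum(mirkin_distance(solution, p) for p in partitions)
    return t / len(partitions)
```

For example, with `p1 = {"a": 0, "b": 0, "c": 1}` and `p2 = {"a": 0, "b": 1, "c": 1}`, the pairs (a, b) and (b, c) are treated differently by the two partitions, so the distance is 2.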