Discovering Statistically Significant Biclusters in Gene Expression Data
In Proceedings of ISMB 2002, 2002
Cited by 302 (4 self)
In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach is graph theoretic coupled with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant biclusters. We tested our method on a collection of yeast expression profiles and on a human cancer dataset. Cross validation results show high specificity in assigning function to genes based on their biclusters, and we are able to annotate in this way 196 uncharacterized yeast genes. We also demonstrate how the biclusters lead to detecting new concrete biological associations. In cancer data we are able to detect and relate finer tissue types than was previously possible. We also show that the method outperforms the biclustering algorithm of Cheng and Church (2000).
The Maximum Edge Biclique Problem is NP-complete.
Discr. Appl. Math., 2003
Cited by 116 (0 self)
We prove that the maximum edge biclique problem in bipartite graphs is NP-complete.
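To make the problem statement concrete, here is a minimal sketch of an exhaustive search for a maximum edge biclique in a small bipartite graph. It is not taken from the paper: the function name `max_edge_biclique` and the toy graph are assumptions made for illustration, and the search is exponential in the size of the left side, which is consistent with the hardness result above.

```python
from itertools import combinations

def max_edge_biclique(left, right, edges):
    """Exhaustive search (illustrative only) for a biclique (A, B)
    that maximizes the number of edges |A| * |B|."""
    adj = {u: {v for (x, v) in edges if x == u} for u in left}
    best = (0, (), ())
    for r in range(1, len(left) + 1):
        for A in combinations(sorted(left), r):
            # The largest B compatible with A is the common neighborhood of A.
            B = set(right)
            for u in A:
                B &= adj[u]
            if len(A) * len(B) > best[0]:
                best = (len(A) * len(B), A, tuple(sorted(B)))
    return best

# Hypothetical toy graph: left vertices are letters, right vertices are numbers.
edges = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 2), ("c", 3)}
size, A, B = max_edge_biclique({"a", "b", "c"}, {1, 2, 3}, edges)
```

On this toy graph the search selects the left vertices a and b with the right vertices 1 and 2, a biclique with four edges.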
The Role Mining Problem: Finding a Minimal Descriptive Set of Roles
In Symposium on Access Control Models and Technologies (SACMAT), 2007
Cited by 63 (5 self)
Devising a complete and correct set of roles has been recognized as one of the most important and challenging tasks in implementing role based access control. A key problem related to this is the notion of goodness/interestingness – when is a role good/interesting? In this paper, we define the role mining problem (RMP) as the problem of discovering an optimal set of roles from existing user permissions. The main contribution of this paper is to formally define RMP, and analyze its theoretical bounds. In addition to the above basic RMP, we introduce two different variations of the RMP, called the δ-approx RMP and the Minimal Noise RMP that have pragmatic implications. We reduce the known “set basis problem” to RMP to show that RMP is an NP-complete problem. An important contribution of this paper is also to show the relation of the role mining problem to several problems already identified in the data mining and data analysis literature. By showing that the RMP is in essence reducible to these known problems, we can directly borrow the existing implementation solutions and guide further research in this direction.
Consensus Algorithms for the Generation of All Maximal Bicliques
2002
Cited by 48 (5 self)
We describe a new algorithm for generating all maximal bicliques (i.e. complete bipartite, not necessarily induced subgraphs) of a graph. The algorithm is inspired by, and is quite similar to, the consensus method used in propositional logic. We show that some variants of the algorithm are totally polynomial, and even incrementally polynomial. The total complexity of the most efficient variant of the algorithms presented here is polynomial in the input size, and only linear in the output size. Computational experiments demonstrate its high efficiency on randomly generated graphs with up to 2,000 vertices and 20,000 edges.
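As a rough illustration of what "generating all maximal bicliques" means, the sketch below enumerates them naively for a small bipartite graph by closing every subset of the left side. This is not the consensus algorithm of the paper, it is restricted to bipartite inputs, and it takes exponential time; the function name `maximal_bicliques` and the toy graph are assumptions.

```python
from itertools import combinations

def maximal_bicliques(left, right, edges):
    """Naive closure-based enumeration (illustrative only, not the
    consensus method): returns a set of (A, B) frozenset pairs."""
    nbr_l = {u: frozenset(v for (x, v) in edges if x == u) for u in left}
    nbr_r = {v: frozenset(u for (u, y) in edges if y == v) for v in right}
    found = set()
    for r in range(1, len(left) + 1):
        for A in combinations(sorted(left), r):
            # B: right vertices adjacent to every vertex of A.
            B = frozenset(right).intersection(*(nbr_l[u] for u in A))
            if not B:
                continue
            # Close A: all left vertices adjacent to every vertex of B.
            A_closed = frozenset(left).intersection(*(nbr_r[v] for v in B))
            found.add((A_closed, B))
    return found

# Hypothetical toy graph.
edges = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 2), ("c", 3)}
bicliques = maximal_bicliques({"a", "b", "c"}, {1, 2, 3}, edges)
```

Each pair returned is closed on both sides, so neither side can be extended without losing completeness. The output-sensitive bounds described in the abstract are what distinguish the paper's method from this kind of brute force.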
The discrete basis problem
2005
Cited by 41 (13 self)
We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem is NP-complete and that the optimization version cannot be approximated within any finite ratio. We also study two variations of this problem, where the Boolean basis vectors must be mutually orthogonal. We show that the other variation is closely related to the well-known Metric k-median Problem in Boolean space. To solve these problems, two algorithms will be presented. One is designed for the variations mentioned above, and it is solely based on solving the k-median problem, while another is a heuristic intended to solve the general Discrete Basis Problem. We will also study the results of extensive experiments made with these two algorithms with both synthetic and real-world data. The results are twofold: with the synthetic data, the algorithms did rather well, but with the real-world data the results were not as good.
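The representation described above can be sketched as a Boolean matrix product: each original vector is approximated by the disjunction (OR) of the basis vectors it uses. The code below is a minimal sketch under assumptions. The names `boolean_product` and `reconstruction_error`, the usage matrix, and the choice of counting mismatched entries as the error are illustrative and not taken from the paper.

```python
def boolean_product(usage, basis):
    """Row i of the result is the OR of the basis vectors selected by usage[i]."""
    n_cols = len(basis[0])
    out = []
    for row in usage:
        vec = [0] * n_cols
        for use, b in zip(row, basis):
            if use:
                vec = [x | y for x, y in zip(vec, b)]
        out.append(vec)
    return out

def reconstruction_error(data, usage, basis):
    """Number of entries where the Boolean product differs from the data
    (an assumed error measure for this illustration)."""
    approx = boolean_product(usage, basis)
    return sum(x != y for r1, r2 in zip(data, approx) for x, y in zip(r1, r2))

# Hypothetical example with k = 2 basis vectors.
data = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
basis = [[1, 1, 0], [0, 1, 1]]
exact_usage = [[1, 0], [0, 1], [1, 1]]
lossy_usage = [[1, 0], [0, 1], [1, 0]]
exact_error = reconstruction_error(data, exact_usage, basis)
lossy_error = reconstruction_error(data, lossy_usage, basis)
```

With `exact_usage` the third vector is the OR of both basis vectors and the data is reproduced exactly. With `lossy_usage` one entry of the third vector is lost.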
Satisfiability planning with constraints on the number of actions
Proc. of the 15th International Conference on Automated Planning and Scheduling (ICAPS), 2005
Cited by 20 (0 self)
We investigate satisfiability planning with restrictions on the number of actions in a plan. Earlier work has considered encodings of sequential plans for which a plan with the minimal number of time steps also has the minimum number of actions, and parallel (partially ordered) plans in which the number of actions may be much higher than the number of time steps. For a given problem instance finding a parallel plan may be much faster than finding a corresponding sequential plan but there is also the possibility that the parallel plan contains unnecessary actions. Our work attempts to combine the advantages of parallel and sequential plans by efficiently finding parallel plans with as few actions as possible. We propose techniques for encoding parallel plans with constraints on the number of actions. Then we give algorithms for finding a plan that is optimal with respect to a given number of steps and an anytime algorithm for successively finding better and better plans. We show that as long as guaranteed optimality is not required, our encodings for parallel plans are often much more efficient in finding plans of good quality than a sequential encoding.
A New Conceptual Clustering Framework
Machine Learning, 2004
Cited by 14 (2 self)
We propose a new formulation of the conceptual clustering problem where the goal is to explicitly output a collection of simple and meaningful conjunctions of attributes that define the clusters. The formulation differs from previous approaches since the clusters discovered may overlap and also may not cover all the points. In addition, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction. Connections between this conceptual clustering problem and the maximum edge biclique problem are made. Simple, randomized algorithms are given that discover a collection of approximate conjunctive cluster descriptions in sublinear time.
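The relaxed membership rule mentioned above, where a point belongs to a cluster description if it satisfies most of the attributes in the conjunction, can be sketched as follows. This is only an illustration of that rule, not the paper's randomized algorithm; the names `approximate_cluster` and `min_satisfied` and the sample points are assumptions.

```python
def satisfied_count(point_attrs, conjunction):
    """Number of attributes of the conjunction that the point has."""
    return len(conjunction & point_attrs)

def approximate_cluster(points, conjunction, min_satisfied):
    """Names of the points that satisfy at least min_satisfied attributes
    of the conjunction (assumed threshold rule for this illustration)."""
    return sorted(
        name
        for name, attrs in points.items()
        if satisfied_count(attrs, conjunction) >= min_satisfied
    )

# Hypothetical points described by sets of attributes.
points = {
    "p1": {"red", "round", "small"},
    "p2": {"red", "round"},
    "p3": {"blue"},
}
conjunction = {"red", "round", "small"}
cluster = approximate_cluster(points, conjunction, min_satisfied=2)
```

Here p2 is assigned to the description even though it lacks one attribute, while p3 is left uncovered, which matches the remark that clusters need not cover all points.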
Finding biclusters by random projections
In Proc. 15th Annual Combinatorial Pattern Matching Symp. (CPM’04), 2004
Cited by 11 (2 self)
Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient randomized algorithm that discovers the largest bicluster by random projections. A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.
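The bicluster definition above can be illustrated with a small exhaustive search: for every subset of columns, group the rows by the string they read on those columns and keep the largest group. This is a brute-force sketch for tiny matrices, not the random-projection algorithm of the paper; the function name `largest_bicluster` and the sample matrix are assumptions.

```python
from itertools import combinations

def largest_bicluster(X):
    """Exhaustive search (illustrative only) for the bicluster of largest
    area, returned as (area, row indices, column indices)."""
    n_cols = len(X[0])
    best = (0, [], ())
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Group rows by the string they read on the chosen columns.
            groups = {}
            for i, row in enumerate(X):
                key = tuple(row[j] for j in cols)
                groups.setdefault(key, []).append(i)
            rows = max(groups.values(), key=len)
            area = len(rows) * k
            if area > best[0]:
                best = (area, rows, cols)
    return best

# Hypothetical matrix of symbols, one string per row.
X = ["abca", "abcb", "xbca", "abcc"]
area, rows, cols = largest_bicluster(X)
```

On this matrix, rows 0, 1 and 3 all read "abc" on the first three columns, giving area 9. Dropping the first column admits all four rows but yields only area 8.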
On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
Cited by 9 (2 self)
Binary matrices, and their associated submatrices of 1s, play a central role in the study of random bipartite graphs and in core data mining problems such as frequent itemset mining (FIM). Motivated by these connections, this paper addresses several statistical questions regarding submatrices of 1s in a random binary matrix with independent Bernoulli entries. We establish a three-point concentration result, and a related probability bound, for the size of the largest square submatrix of 1s in a square Bernoulli matrix, and extend these results to nonsquare matrices and submatrices with fixed aspect ratios. We then consider the noise sensitivity of frequent itemset mining under a simple binary additive noise model, and show that, even at small noise levels, large blocks of 1s leave behind fragments of only logarithmic size. As a result, standard FIM algorithms, which search only for submatrices of 1s, cannot directly recover such blocks when noise is present. On the positive side, we show that an error-tolerant frequent itemset criterion can recover a submatrix of 1s against a background of 0s plus noise, even when the size of the submatrix of 1s is very small.
Instant Recognition of Half Integrality and 2-Approximations
In Proceedings of the 3rd International Workshop on Approximation Algorithms for Combinatorial Optimization, 1998
Cited by 8 (0 self)
We define a class of integer programs with constraints that involve up to three variables each. A generic constraint in such integer program is of the form ax + by ≤ z + c, where the variable z appears only in that constraint. For such binary integer programs it is possible to derive half integral superoptimal solutions in polynomial time. The scheme is also applicable with few modifications to nonbinary integer problems. For some of these problems it is possible to round the half integral solution to a 2-approximate solution. This extends the class of integer programs with at most two variables per constraint that were analyzed in [HMNT93]. The approximation algorithms here provide an improvement in running time and range of applicability compared to existing 2-approximations. Furthermore, we conclude that problems in the framework are MAX SNP-hard and at least as hard to approximate as vertex cover. Problems that are amenable to the analysis provided here are easily recognized. The ...