Protein complex prediction via cost-based clustering
Bioinformatics, 2004. Supplementary information: http://www.cs.utoronto.ca/∼juris/data/pp104/; King, A.D. (2004) Graph clustering with restricted neighbourhood search.
Cited by 78 (1 self)
Abstract
Motivation: Understanding principles of cellular organization and function can be enhanced if we detect known and predict still undiscovered protein complexes within the cell’s protein–protein interaction (PPI) network. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates an accurate and scalable approach to protein complex identification. Results: We have developed the Restricted Neighborhood Search Clustering Algorithm (RNSC) to efficiently partition networks into clusters using a cost function. We applied this cost-based clustering algorithm to PPI networks of Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans to identify and predict protein complexes. We have determined functional and graph-theoretic properties of true protein complexes from the MIPS database. Based on these properties, we defined filters to distinguish between identified network clusters and true protein complexes. Conclusions: Our application of the cost-based clustering algorithm provides an accurate and scalable method of detecting and predicting protein complexes within a PPI network. Availability: The RNSC algorithm and data processing code are available upon request from the authors. Contact:
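To make the idea of partitioning by a cost function concrete, here is a minimal sketch. It assumes a "naive" cost that counts every edge running between two clusters plus every non-adjacent pair inside a cluster. It also uses a plain greedy local search in which a vertex may only move into a cluster that already contains one of its neighbours. The function names are hypothetical. This is not the authors' RNSC, which is described as using additional machinery (for example tabu moves and diversification) and a refined cost function.

```python
from __future__ import annotations

from itertools import combinations


def naive_cost(adj: dict[str, set[str]], labels: dict[str, int]) -> int:
    """Edges running between clusters plus non-adjacent pairs inside a cluster."""
    cost = 0
    for u, v in combinations(sorted(adj), 2):
        same_cluster = labels[u] == labels[v]
        linked = v in adj[u]
        if linked != same_cluster:  # inter-cluster edge, or missing intra-cluster edge
            cost += 1
    return cost


def restricted_greedy_clustering(adj: dict[str, set[str]], max_rounds: int = 100) -> dict[str, int]:
    """Greedy moves; a vertex may only join a cluster that holds one of its neighbours."""
    labels = {v: i for i, v in enumerate(sorted(adj))}  # start from singleton clusters
    for _ in range(max_rounds):
        improved = False
        for v in sorted(adj):
            current = labels[v]
            best_label, best_cost = current, naive_cost(adj, labels)
            # Restricted neighbourhood: only the clusters of v's neighbours are tried.
            for target in sorted({labels[u] for u in adj[v]} - {current}):
                labels[v] = target
                trial_cost = naive_cost(adj, labels)
                labels[v] = current
                if trial_cost < best_cost:
                    best_label, best_cost = target, trial_cost
            if best_label != current:
                labels[v] = best_label
                improved = True
        if not improved:  # no single move lowers the cost: local optimum
            break
    return labels


# Toy network: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
graph = {}
for x, y in edges:
    graph.setdefault(x, set()).add(y)
    graph.setdefault(y, set()).add(x)

clusters = restricted_greedy_clustering(graph)
print(clusters, naive_cost(graph, clusters))
```

On this toy network the search separates the two triangles, and only the bridging edge c-d contributes to the final cost of 1. Recomputing the whole cost for every trial move keeps the sketch short. A practical implementation would update the cost incrementally instead.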
Predicting protein complex membership using probabilistic network reliability
Genome Res, 2004
Iterative cluster analysis of protein interaction data
Bioinformatics, 2005
Cited by 62 (1 self)
Abstract
Motivation: Fast hierarchical clustering tools are needed when distances among the elements of a set are constrained, causing frequent distance ties, as happens in protein interaction data. Results: We present UVCLUSTER, a program that iteratively explores distance datasets using hierarchical clustering. Once the user selects a group of proteins, UVCLUSTER converts the set of primary distances among them (i.e. the minimum number of steps, or interactions, required to connect two proteins) into secondary distances that measure the strength of the connection between each pair of proteins when the interactions for all the proteins in the group are considered. We show that this novel strategy has advantages over conventional clustering methods for exploring protein–protein interaction data. UVCLUSTER easily incorporates information from the largest available interaction datasets to generate comprehensive primary distance tables. The versatility, ease of use and high speed of UVCLUSTER on standard personal computers suggest that it can be a benchmark analytical tool for interactome data analysis.
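UVCLUSTER starts from primary distances, defined as the minimum number of interactions needed to connect two proteins. These are shortest-path lengths, so they can be computed with a breadth-first search from each selected protein over the whole network. The sketch below assumes the network is stored as an adjacency dictionary. The function names are hypothetical, and the secondary-distance step is not reproduced.

```python
from collections import deque


def bfs_steps(adj, source):
    """Minimum number of interactions from `source` to every reachable protein."""
    steps = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in steps:
                steps[w] = steps[u] + 1
                queue.append(w)
    return steps


def primary_distance_table(adj, group):
    """Primary distances among the user-selected `group`; None marks unconnected pairs."""
    table = {}
    for p in group:
        steps = bfs_steps(adj, p)  # the search uses the whole network, not just the group
        for q in group:
            table[(p, q)] = steps.get(q)
    return table


network = {
    "A": {"B"}, "B": {"A", "C", "E"}, "C": {"B", "D"},
    "D": {"C"}, "E": {"B"}, "F": set(),
}
table = primary_distance_table(network, ["A", "C", "D", "E", "F"])
for (p, q), d in sorted(table.items()):
    if p < q:
        print(p, q, d)
```

Even in this toy network, A-C, A-E and C-E all have distance 2, and A-D and D-E both have distance 3. This shows how integer primary distances produce the frequent ties that the abstract mentions.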
On mining cross-graph quasi-cliques
In KDD, 2005
Cited by 60 (5 self)
Abstract
Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.
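The core definition can be written as a membership test. Assume the usual quasi-clique definition: a vertex set Q is a γ-quasi-clique in a graph if its induced subgraph is connected and every member is adjacent to at least ⌈γ(|Q|−1)⌉ other members. A cross-graph quasi-cliques must then satisfy this in every graph, each with its own γ. The sketch below only checks a given set; maximality and any minimum-size constraint are not checked. It is not the Crochet search algorithm, the function names are hypothetical, and the two example graphs are made up.

```python
from __future__ import annotations

import math
from collections import deque


def is_quasi_clique(adj: dict[str, set[str]], members: set[str], gamma: float) -> bool:
    """Induced subgraph on `members` is connected and every vertex has at least
    ceil(gamma * (|members| - 1)) neighbours inside it."""
    if not members:
        return False
    # The tiny tolerance stops floating-point error from pushing an exact integer up by one.
    required = math.ceil(gamma * (len(members) - 1) - 1e-9)
    if any(len(adj.get(v, set()) & members) < required for v in members):
        return False
    # Breadth-first search that never leaves `members`, to test connectivity.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, set()) & members:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == members


def is_cross_graph_quasi_clique(graphs, gammas, members) -> bool:
    """True if `members` is a gamma_i-quasi-clique in every graph G_i."""
    if len(graphs) != len(gammas):
        raise ValueError("need exactly one gamma per graph")
    members = set(members)
    return all(is_quasi_clique(g, members, gamma) for g, gamma in zip(graphs, gammas))


# Hypothetical co-expression graph and protein interaction graph on the same genes.
coexpression = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
interaction = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(is_cross_graph_quasi_clique([coexpression, interaction], [1.0, 0.5], {"a", "b", "c"}))
```

Here {a, b, c} is a full clique in the co-expression graph but only a path in the interaction graph. It therefore qualifies only when the interaction graph's γ is relaxed, which illustrates why each graph gets its own threshold.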
Conserved network motifs allow protein–protein interaction prediction
2004
Cited by 55 (2 self)
Abstract
Motivation: High-throughput protein interaction detection methods are strongly affected by false positive and false negative results. Focused experiments are needed to complement the large-scale methods by validating previously detected interactions, but it is often difficult to decide which proteins to probe as interaction partners. Developing reliable computational methods assisting this decision process is a pressing need in bioinformatics. Results: We show that we can use the conserved properties of the protein network to identify and validate interaction candidates. We apply a number of machine learning algorithms to the protein connectivity information and achieve a surprisingly good overall performance in predicting interacting proteins. Using a ‘leave-one-out’ approach we find average success rates between 20 and 40% for predicting the correct interaction partner of a protein. We demonstrate that the success of these methods is based on the presence of conserved interaction motifs within the network. Availability: A reference implementation and a table with candidate interacting partners for each yeast protein are available
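The leave-one-out protocol can be sketched as follows. Hide one known interaction of a protein, score every other protein using the remaining network, and count a success when the hidden partner is ranked first. The paper uses machine-learning predictors; as an assumed stand-in, the sketch scores candidates by counting paths p-x-y-q of length three. Each such path would close a four-protein square if p and q interacted. The function names are hypothetical.

```python
from collections import Counter


def square_closing_scores(adj, p):
    """Score each non-neighbour q of p by the number of paths p-x-y-q (length three)."""
    scores = Counter()
    for x in adj[p]:
        for y in adj[x]:
            if y == p:
                continue
            for q in adj[y]:
                if q != p and q not in adj[p]:
                    scores[q] += 1
    return scores


def leave_one_out_success(adj):
    """Fraction of hidden interactions whose partner is the top-ranked candidate."""
    work = {v: set(nbrs) for v, nbrs in adj.items()}  # copy; the input stays unchanged
    hits = trials = 0
    for p in sorted(work):
        for q in sorted(adj[p]):
            work[p].discard(q)  # hide the interaction p-q
            work[q].discard(p)
            scores = square_closing_scores(work, p)
            if scores:
                # Highest score wins; ties go to the alphabetically first candidate.
                top = max(sorted(scores), key=scores.__getitem__)
                if top == q:
                    hits += 1
            trials += 1
            work[p].add(q)  # restore it before the next trial
            work[q].add(p)
    return hits / trials if trials else 0.0


square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(leave_one_out_success(square), leave_one_out_success(path))
```

In the four-cycle every hidden edge is recovered, because the remaining three edges still form the length-three path that closes the square. In the simple path nothing is recovered, since no such motif exists.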
Graph theory and networks in biology
IET Systems Biology, 1:89–119, 2007
Cited by 44 (0 self)
Abstract
In this paper, we present a survey of the use of graph-theoretical techniques in biology. In particular, we discuss recent work on identifying and modelling the structure of biomolecular networks, as well as the application of centrality measures to interaction networks and research on the hierarchical structure of such networks and network motifs. Work on the link between structural network properties and dynamics is also described, with emphasis on synchronization and disease propagation.
Network Properties Revealed Through Matrix Functions
2008
Cited by 30 (3 self)
Abstract
The newly emerging field of Network Science deals with the tasks of modelling, comparing and summarizing large data sets that describe complex interactions. Because pairwise affinity data can be stored in a two-dimensional array, graph theory and applied linear algebra provide extremely useful tools. Here, we focus on the general concepts of centrality, communicability and betweenness, each of which quantifies important features in a network. Some recent work in the mathematical physics literature has shown that the exponential of a network’s adjacency matrix can be used as the basis for defining and computing specific versions of these measures. We introduce here a general class of measures based on matrix functions, and show that a particular case involving a matrix resolvent arises naturally from graph-theoretic arguments. We also point out connections between these measures and the quantities typically computed when spectral methods are used for data mining tasks such as clustering and ordering. We finish with computational examples showing the new matrix resolvent version applied to real networks.
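For an undirected network with symmetric adjacency matrix A, both kinds of measure can be computed directly. The sketch below assumes NumPy is available. It evaluates exp(A) through the eigendecomposition A = VΛVᵀ, then reads subgraph centralities off the diagonal and communicabilities off the off-diagonal entries. It also forms the resolvent (I − αA)⁻¹ = Σₖ αᵏAᵏ, which converges only when 0 < α < 1/ρ(A). Here α is left to the caller; no particular value from the paper is assumed. The function names are hypothetical.

```python
import numpy as np


def matrix_exponential_sym(A: np.ndarray) -> np.ndarray:
    """exp(A) for a symmetric matrix, using A = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(A)
    return V @ np.diag(np.exp(lam)) @ V.T


def resolvent(A: np.ndarray, alpha: float) -> np.ndarray:
    """(I - alpha*A)^(-1), the sum of alpha^k A^k; requires 0 < alpha < 1/rho(A)."""
    rho = float(np.max(np.abs(np.linalg.eigvalsh(A))))  # spectral radius (A symmetric)
    if alpha <= 0 or (rho > 0 and alpha * rho >= 1):
        raise ValueError("alpha must satisfy 0 < alpha < 1/rho(A) for the series to converge")
    return np.linalg.inv(np.eye(A.shape[0]) - alpha * A)


# Path graph 0 - 1 - 2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
E = matrix_exponential_sym(A)
subgraph_centrality = np.diag(E)  # closed walks at each node, weighted by 1/k!
communicability_0_2 = E[0, 2]  # walks between the two end nodes, weighted by 1/k!
R = resolvent(A, alpha=0.5)  # 0.5 < 1/sqrt(2), so the series converges
print(subgraph_centrality.round(3), round(communicability_0_2, 3))
print(R.round(3))
```

Using `eigh` exploits the symmetry of undirected adjacency matrices. For directed networks, a general routine such as `scipy.linalg.expm` would be needed instead.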
Unraveling protein networks with power graph analysis
PLoS Computational Biology
Cited by 29 (4 self)
Abstract
Networks play a crucial role in computational biology, yet their analysis and representation remain an open problem. Power Graph Analysis is a lossless transformation of biological networks into a compact, less redundant representation, exploiting the abundance of cliques and bicliques as elementary topological motifs. We demonstrate with five examples the advantages of Power Graph Analysis. Investigating protein–protein interaction networks, we show how the catalytic subunits of the casein kinase II complex are distinguishable from the regulatory subunits, how interaction profiles and sequence phylogeny of SH3 domains correlate, and how false positive interactions among high-throughput interactions are spotted. Additionally, we demonstrate the generality of Power Graph Analysis by applying it to two other types of networks. We show how power graphs induce a clustering of both transcription factors and target genes in bipartite transcription networks, and how the erosion of a phosphatase domain in type 22 non-receptor tyrosine phosphatases is detected. We apply Power Graph Analysis to high-throughput protein interaction networks and show that up to 85% (56% on average) of the information is redundant. Experimental networks are more compressible than rewired networks with the same degree distribution, indicating that experimental networks are rich in cliques and bicliques. Power Graphs are a novel representation of networks, which reduces network complexity by explicitly representing recurring network motifs. Power Graphs compress up to 85% of the edges in protein interaction networks and are applicable to all types of networks such as protein
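Part of the compression comes from bicliques. If every vertex of a set S is linked to every vertex of a set T, the |S|·|T| edges can be drawn as a single power edge between two power nodes. The sketch below finds only the simplest such modules: groups of vertices with identical neighbour sets. For each, it reports how many edges would be removed. It assumes an adjacency dictionary without self-loops and uses hypothetical names. It is not the Power Graph Analysis algorithm itself, which also exploits cliques and builds nested power nodes.

```python
from __future__ import annotations

from collections import defaultdict


def twin_bicliques(adj: dict[str, set[str]]) -> list[tuple[frozenset, frozenset]]:
    """Complete bicliques S x T where every vertex in S has exactly the neighbour set T."""
    by_neighbours = defaultdict(set)
    for v, nbrs in adj.items():
        by_neighbours[frozenset(nbrs)].add(v)
    found = set()
    for nbrs, members in by_neighbours.items():
        if len(members) >= 2 and nbrs:
            # Store the pair unordered so (S, T) and (T, S) count as one biclique.
            found.add(frozenset({frozenset(members), nbrs}))
    return [tuple(sorted(pair, key=sorted)) for pair in found]


def edges_removed(s: frozenset, t: frozenset) -> int:
    """|S|*|T| ordinary edges replaced by one power edge."""
    return len(s) * len(t) - 1


# Complete bipartite graph K(2,3).
bipartite = {
    "s1": {"t1", "t2", "t3"}, "s2": {"t1", "t2", "t3"},
    "t1": {"s1", "s2"}, "t2": {"s1", "s2"}, "t3": {"s1", "s2"},
}
modules = twin_bicliques(bipartite)
for s, t in modules:
    print(sorted(s), sorted(t), edges_removed(s, t))
```

In a real network these modules can share edges with one another, so their individual savings should not simply be added up.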
Revealing Biological Modules via Graph Summarization
2008
Cited by 27 (9 self)
Abstract
The division of a protein interaction network into biologically meaningful modules can aid with automated complex detection and prediction of biological processes and can uncover the global organization of the cell. We propose a novel graph summarization (GS) technique, based on graph compression, to cluster protein interaction graphs into biologically relevant modules. The method is motivated by defining a biological module as a set of proteins that have similar sets of interaction partners. We show that this definition, put into practice by a GS algorithm, reveals modules that are more biologically enriched than those found by other methods. We also apply GS to predict complex memberships, biological processes, and co-complexed pairs and show that in most settings GS is preferable to existing methods of protein interaction graph clustering.
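The sketch below uses one common way to score a compression-based summary. This cost model is assumed here rather than taken from the paper. Vertices are grouped into supernodes, and for every pair of supernodes (A, B) the summary pays the cheaper of two options: one unit per actual edge between them, or one unit for a superedge plus one correction per missing pair. Grouping proteins that share interaction partners lowers this cost, which matches the intuition behind the module definition above. The function name is hypothetical.

```python
from __future__ import annotations

from itertools import combinations, combinations_with_replacement


def summary_cost(adj: dict[str, set[str]], supernodes: list[set[str]]) -> int:
    """Cost of describing the graph by superedges plus corrections (assumed cost model)."""
    cost = 0
    for i, j in combinations_with_replacement(range(len(supernodes)), 2):
        a, b = supernodes[i], supernodes[j]
        if i == j:
            possible = len(a) * (len(a) - 1) // 2
            actual = sum(1 for u, v in combinations(sorted(a), 2) if v in adj[u])
        else:
            possible = len(a) * len(b)
            actual = sum(1 for u in a for v in b if v in adj[u])
        # Either list the edges one by one, or add a superedge and list the missing pairs.
        cost += min(actual, 1 + possible - actual)
    return cost


# s1 and s2 have identical interaction partners t1, t2, t3.
bipartite = {
    "s1": {"t1", "t2", "t3"}, "s2": {"t1", "t2", "t3"},
    "t1": {"s1", "s2"}, "t2": {"s1", "s2"}, "t3": {"s1", "s2"},
}
singletons = [{v} for v in sorted(bipartite)]
by_partners = [{"s1", "s2"}, {"t1", "t2", "t3"}]
print(summary_cost(bipartite, singletons), summary_cost(bipartite, by_partners))
```

With one supernode per vertex, the summary simply lists all six edges. Grouping the vertices by their partners reduces the description to a single superedge.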
Maximal Biclique Subgraphs and Closed Pattern Pairs of the Adjacency Matrix: A One-to-One Correspondence and Mining Algorithms
2007
Cited by 25 (8 self)
Abstract
Maximal biclique (also known as complete bipartite) subgraphs can model many applications in web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can grow exponentially with the number of vertices. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that: (i) the number of closed patterns in the adjacency matrix of G is even; (ii) the number of closed patterns is precisely double the number of maximal biclique subgraphs of G; and (iii) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which have been extensively studied in the data mining field. However, applying existing algorithms directly enumerates each maximal biclique twice. To achieve high efficiency, we propose an algorithm with O(mn) time delay for non-duplicated enumeration, in particular for enumerating maximal bicliques of large size, where m and n are the numbers of edges and vertices of the graph, respectively. We evaluate the efficiency of our algorithm by comparing it to state-of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even, but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.
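The correspondence is easy to check by brute force on a toy graph. Treat row v of the adjacency matrix as the set N(v). The support of a vertex set X is then its set of common neighbours C(X). Assuming closed patterns are taken to be non-empty with non-empty support, X is closed exactly when C(X) is non-empty and C(C(X)) = X. Each maximal biclique is then the pair (X, C(X)). The sketch below enumerates every subset, so it is exponential and only illustrates the theorem; it is not the paper's O(mn)-delay algorithm. It assumes a graph without self-loops and uses hypothetical names.

```python
from __future__ import annotations

from itertools import combinations


def common_neighbours(adj: dict[str, set[str]], xs: frozenset) -> frozenset:
    """C(X): vertices adjacent to every vertex in the non-empty set X."""
    members = iter(xs)
    result = set(adj[next(members)])
    for x in members:
        result &= adj[x]
    return frozenset(result)


def closed_patterns(adj: dict[str, set[str]]) -> set[frozenset]:
    """Brute force: X is closed when C(X) is non-empty and C(C(X)) == X."""
    vertices = sorted(adj)
    closed = set()
    for k in range(1, len(vertices) + 1):
        for combo in combinations(vertices, k):
            xs = frozenset(combo)
            support = common_neighbours(adj, xs)
            if support and common_neighbours(adj, support) == xs:
                closed.add(xs)
    return closed


def maximal_bicliques(adj: dict[str, set[str]]) -> set[frozenset]:
    """Each closed X pairs with C(X); the pair is stored unordered."""
    return {frozenset({xs, common_neighbours(adj, xs)}) for xs in closed_patterns(adj)}


# "Paw" graph: triangle a-b-c with a pendant vertex d attached to c.
paw = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
closed = closed_patterns(paw)
bicliques = maximal_bicliques(paw)
print(len(closed), len(bicliques))
```

On this graph the six closed patterns pair up into three maximal bicliques: ({a}, {b, c}), ({b}, {a, c}) and ({c}, {a, b, d}). This is consistent with properties (i) and (ii).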