
## Inferring Meaningful Communities from Topology-Constrained Correlation Networks (2014)

### Citations

5136 |
DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res
- SF, TL, et al.
- 1997
Citation Context ...to the α-Amylase catalytic domain were gathered from the Homstrad database [46] and these seeded a BLAST search restricted to the protein data bank. Here, the search is broadened by seeding a PSI-BLAST =-=[47]-=- search with a PFAM [48] seed alignment of α-Amylase structures (PFAM code PF00128). The PSI-BLAST search was restricted to structures available at the protein data bank (http://www.rcsb.org/pdb/). T... |

4202 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context ... assessed. Other methods to re-construct graphs and assess their structure exist, particularly dealing with high-dimensional data. Methods such as sparse graphical models [13] and LASSO-type problems =-=[14]-=- can be applied in graph reconstruction, and sometimes in community structure detection [15]. However, most of these methods rely on the assumption of independence of the variables [14] (or at least t... |

1483 |
Finding and evaluating community structure in networks, Phys
- Newman, Girvan
- 2004
Citation Context ...elves than to the rest of the graph. The most commonly used algorithm (and the one of focus in this paper) to detect communities in graphs is the modularity optimization proposed by Newman and Girvan =-=[8]-=-. In this algorithm, the modularity score Q is optimized to obtain a partition scheme. Intuitively, Q evaluates the excess of the number of edges inside a group against the expected connectivity of a ... |

630 |
The Pfam protein families database. Nucleic Acids Res 38:D211–D222.
- RD, Mistry, et al.
- 2010
Citation Context ...ic domain were gathered from the Homstrad database [46] and these seeded a BLAST search restricted to the protein data bank. Here, the search is broadened by seeding a PSI-BLAST [47] search with a PFAM =-=[48]-=- seed alignment of α-Amylase structures (PFAM code PF00128). The PSI-BLAST search was restricted to structures available at the protein data bank (http://www.rcsb.org/pdb/). There were in total 135 s... |

475 | On model selection consistency of lasso
- Zhao, Yu
- 2007
Citation Context ...sometimes in community structure detection [15]. However, most of these methods rely on the assumption of independence of the variables [14] (or at least that the covariates are not highly correlated =-=[16]-=-), on the a priori determination of the number and size of the communities [15], and on full sparsity of the covariation among traits in the data. These kinds of limitations make these particular metho... |

269 |
Graph Theory, volume 173 of Graduate Texts in Mathematics
- Diestel
- 2005
Citation Context ...ny such properties, but one of special interest here is the community structure which represents how the vertices are arranged in groups densely connected internally and sparsely connected externally =-=[6]-=-. Many networks have heterogeneous edge densities, which may imply a community structure. Communities are groups of nodes whose associations imply new insights in the understanding of a system [7]. A ... |

201 | Sparse graphical models for exploring gene expression data,
- Dobra, Hans, et al.
- 2004
Citation Context ...stness of the solution can be assessed. Other methods to re-construct graphs and assess their structure exist, particularly dealing with high-dimensional data. Methods such as sparse graphical models =-=[13]-=- and LASSO-type problems [14] can be applied in graph reconstruction, and sometimes in community structure detection [15]. However, most of these methods rely on the assumption of independence of the ... |

130 |
2002.Modern applied statistics with
- Venables, Ripley
Citation Context ...scriminants (LD) prefiltering of the modularity membership vector. Linear Discriminant Analysis (LDA). The LDA for the present paper was performed using the lda function available in the package MASS =-=[31]-=- in R [32]. Here the fit will be done between the correlation magnitude matrix (as performed in [1]), where each entry row/column corresponds to each variable, and each entry is the magnitude of the c... |

121 |
Structures and mechanisms of glycosyl hydrolases.
- Davies, Henrissat
- 1995
Citation Context ...party affiliation. α-Amylase structures homologs. The α-Amylase-like family catalyzes the hydrolysis of α-(1,4) glycosidic bonds of polysaccharides, therefore being classified as glycoside hydrolases =-=[36]-=- in the family 13 [37]. It is a multi-reaction catalytic family since its members can catalyze different reactions (hydrolysis, transglycosylation, condensation and cyclization) [38]. All members of t... |

67 |
Statistical pattern recognition: A review
- AK, RPW, et al.
- 2000
Citation Context ...or classes are optimally separated by maximizing the variance between groups while minimizing the intraclass variance. It has been commonly used as a preprocessing step in pattern recognition systems =-=[25]-=- and is commonly used in other sciences to explore the variate space to find shared properties of samples and variables [26]. It is based on a linear model where a given dependent variable can be expl... |

61 |
Modularity and community structure in networks.
- ME
- 2006
Citation Context ...tex. Cv and Cw are the communities to which the vectors v and w belong, and δ is a binary function where δ(Cv,Cw) is 1 if Cv = Cw and 0 otherwise. This approach has been applied to numerous problems =-=[7,9,10]-=-. Despite its wide use, exact algorithms for modularity optimization are computationally expensive. Some caveats also exist [7]: One example is the fact that high Q can be found in random graphs ... |
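
The excerpt above defines the membership function δ but is cut off before the full score appears. As a rough illustration only, the sketch below assumes the standard Newman-Girvan form, Q = (1/2m) Σ_vw [A_vw − k_v k_w / 2m] δ(Cv, Cw), which is not quoted in the excerpt; the function name `modularity` and its arguments are hypothetical and are not taken from the paper.

```python
def modularity(adj, membership):
    """Modularity Q of a partition of an undirected graph.

    Assumes the standard Newman-Girvan definition (an assumption, not
    quoted from the excerpt). `adj` is a symmetric adjacency matrix given
    as a list of lists; `membership[v]` is the community label of vertex v.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]   # k_v for every vertex
    two_m = sum(degrees)                  # 2m: twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for v in range(n):
        for w in range(n):
            # delta(Cv, Cw): only pairs in the same community contribute
            if membership[v] == membership[w]:
                q += adj[v][w] - degrees[v] * degrees[w] / two_m
    return q / two_m


# Two disconnected edges: 0-1 and 2-3
example_adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
q_split = modularity(example_adj, [0, 0, 1, 1])   # each edge is its own community
q_merged = modularity(example_adj, [0, 0, 0, 0])  # everything in one community
```

The double loop is quadratic in the number of vertices, which is acceptable for a toy graph; the fast-greedy heuristic cited elsewhere in this list exists precisely because exhaustive optimization of Q is expensive.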

59 |
C (2004) Finding community structure in very large networks
- Clauset, NewmanMEJ
Citation Context ...on developed by [8] is a membership vector. Here as in [1], the optimization is performed using a fast-greedy algorithm, which has been shown to be a good and fast heuristic for the optimization of Q =-=[30]-=-. After such a membership vector is obtained, the refinement proposed by [1] can be performed. However, some over-fragmentation may occur when a topology-constrained graph is used. To deal with this i... |

43 |
Barthélemy M (2007) Resolution limit in community detection
- Fortunato
Citation Context ...[11]. This issue might create either an over-fragmentation of the graph into smaller communities, or a failure to detect a small community whose size is below a preset resolution limit =-=[12]-=-. Despite these caveats, modularity optimization (and in general community structure detection) is still an important tool in science if the confidence in the robustness of the solution can be assesse... |

30 |
X-ray structures along the reaction pathway of cyclodextrin glycosyltransferase elucidate catalysis in the α-amylase family.
- Uitdehaag, Mosi, et al.
- 1999
Citation Context ...oops that extend from these strands [39]. The catalytic site includes aspartate as a catalytic nucleophile, glutamate as an acid/base, and a second aspartate for stabilization of the transition state =-=[45]-=-. The catalytic triad plus an arginine residue are totally conserved in this family across all catalysis-active members [37]. In [1], the protein structures belonging to the α-Amylase catalytic domain... |

26 |
A (2005) Comparing community structure identification
- Danon, Díaz-Guilera, et al.
Citation Context ...tex. Cv and Cw are the communities to which the vectors v and w belong, and δ is a binary function where δ(Cv,Cw) is 1 if Cv = Cw and 0 otherwise. This approach has been applied to numerous problems =-=[7,9,10]-=-. Despite its wide use, exact algorithms for modularity optimization are computationally expensive. Some caveats also exist [7]: One example is the fact that high Q can be found in random graphs ... |

26 | N (2011) Graph-theoretical analysis reveals disrupted small-world organization of cortical thickness correlation networks in temporal lobe epilepsy. Cereb Cortex 21 - Bernhardt, Chen, et al. |

23 | The TIM-barrel fold: a versatile framework for efficient enzymes - RK |

13 |
Sales-Pardo M, Amaral LAN (2004) Modularity from fluctuations in random graphs and complex networks
- Guimerà
Citation Context ...re computationally expensive. Some caveats also exist [7]: One example is the fact that high Q can be found in random graphs =-=[11]-=-. This issue might create either an over-fragmentation of the graph into smaller communities, or a failure to detect a small community whose size is below a preset resolution limit [12]. Despite these... |

13 | Alm EJ (2012) Inferring correlation networks from genomic survey data. PLoS Comput Biol 8: e1002687. PubMed: 23028285 - Friedman |

12 |
R: A Language and Environment for Statistical Computing
- DCT
Citation Context ...s (LD) prefiltering of the modularity membership vector. Linear Discriminant Analysis (LDA). The LDA for the present paper was performed using the lda function available in the package MASS [31] in R =-=[32]-=-. Here the fit will be done between the correlation magnitude matrix (as performed in [1]), where each entry row/column corresponds to each variable, and each entry is the magnitude of the correlation... |

7 | W.: Systematic construction of kinetic models from genome-scale metabolic networks. - Stanford, Lubitz, et al. - 2013 |

7 |
Cowen L (2008) Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol 4: e10.
- Menke, Berger
Citation Context ...ide Hydrolase Family 13, GH13) was guaranteed (Available in File S1). Those 135 structures were aligned using the algorithm proposed by [49] that modifies the pairwise MATT flexible structure aligner =-=[50]-=- to complete the multiple structure alignment. After the alignment, the procedure explained in [1] was used, where the coordinates of the centroid of homologous residues are recorded in a data matrix.... |

6 |
Gur-Gershgoren G, Mantegna RN, et al. (2010) Dominating Clasp of the Financial Sector Revealed by Partial Correlation Analysis of the Stock Market. Plos One 5
- DY, Tumminello, et al.
Citation Context ...d data. Let us consider the case of correlation networks, where the edges are defined as the correlation between two nodes. These networks are important in biological sciences [1,17–19] and economics =-=[20,21]-=- since they constitute an intermediate between topology and the dynamics of the system [22]. Analyzing the community structures of these networks can help identify clusters of co-expressed genes causi... |

6 |
Bornholdt S (2007) Partitioning and modularity of graphs with arbitrary degree distribution
- Reichardt
Citation Context ...he modularity is inferred on the assumption that the community structure is dictated by correlation. It has also been shown that sparser graphs tend to cluster into more modules than predicted before =-=[24]-=-. Let’s define this effect as over-fragmentation. In some cases the sparsity caused by the constraint is not complete; that is, not the majority of entries in the adjacency matrix are zero. Given this... |

6 |
Thermostability enhancement and change in starch hydrolysis profile of the maltohexaose-forming amylase of Bacillus stearothermophilus US100 strain.
- M, Khemakhem, et al.
- 2006
Citation Context ...coside hydrolases [36] in the family 13 [37]. It is a multi-reaction catalytic family since its members can catalyze different reactions (hydrolysis, transglycosylation, condensation and cyclization) =-=[38]-=-. All members of this family share a symmetrical TIM-barrel ((β/α)8) catalytic domain [39], including those without any catalytic activity [40]. This fold is highly versatile and widespread among the ... |

6 |
Protein engineering in the α-amylase family: catalytic mechanism, substrate specificity
- Svensson
- 1994
Citation Context ...e its members can catalyze different reactions (hydrolysis, transglycosylation, condensation and cyclization) [38]. All members of this family share a symmetrical TIM-barrel ((β/α)8) catalytic domain =-=[39]-=-, including those without any catalytic activity [40]. This fold is highly versatile and widespread among the structurally characterized enzymes, being present in almost 10% of them [41–44]. There has... |

5 |
CM, Blundell TL, Overington JP
- Mizuguchi, Deane
- 1998
Citation Context ...idue are totally conserved in this family across all catalysis-active members [37]. In [1], the protein structures belonging to the α-Amylase catalytic domain were gathered from the Homstrad database =-=[46]-=- and these seeded a BLAST search restricted to the protein data bank. Here, the search is broadened by seeding a PSI-BLAST [47] search with a PFAM [48] seed alignment of α-Amylase structures (PFAM code ... |

4 |
Hutt M-T (2007) Consistency analysis of metabolic correlation networks
- Muller-Linow, Weckwerth
Citation Context ...orrelation between two nodes. These networks are important in biological sciences [1,17–19] and economics [20,21] since they constitute an intermediate between topology and the dynamics of the system =-=[22]-=-. Analyzing the community structures of these networks can help identify clusters of co-expressed genes causing a disease, or groups of stocks that are co-varying in the market. It is important to kno... |

4 |
Taxonomies of networks from community structure
- JP, DJ, et al.
- 2012
Citation Context ...hown). However, as shown in Table 2, LDA dramatically increases the performance when there is a topology constraint in the graph. Case studies. Now some case studies that have been analyzed previously =-=[1,27]-=- will be considered. In this section it will be shown that there are some real cases in which a topology-constrained correlation network community structure is over-fragmented. It is also shown how LDA ca... |

2 |
Defining structural and evolutionary modules in proteins: a community detection approach to explore sub-domain architecture. BMC Struct Biol. 2013 Oct 16;13:20. doi
- JS, Susko, et al.
- 2013
Citation Context ... * Email: jshleap@dal.ca Introduction Many problems in science can be abstracted as networks. For example, in biological sciences, protein structures can be abstracted as graphs of connected residues =-=[1]-=-, metabolic networks can be created by connecting enzymes by their interactions in a given pathway [2], or food webs can be created by joining species with their trophic interactions [3]. Networks are... |

2 |
Tasselli S (2013) Social network analysis: Foundations and frontiers on advantage. Annual Rev
- RS, Kilduff
Citation Context ...s by their interactions in a given pathway [2], or food webs can be created by joining species with their trophic interactions [3]. Networks are common models for the Internet [4] and social networks =-=[5]-=-. Any kind of data that can be summarized into vertices (nodes) and connections (edges) can be abstracted as a graph. A special case of graphs can be constructed when one is interested in the correl... |

2 |
Castellano C (2012) Community structure in graphs
- Fortunato
Citation Context ...lly [6]. Many networks have heterogeneous edge densities, which may imply a community structure. Communities are groups of nodes whose associations imply new insights in the understanding of a system =-=[7]-=-. A community can be loosely defined as groups of nodes that share more among themselves than to the rest of the graph. The most commonly used algorithm (and the one of focus in this paper) to detect ... |

2 |
SM (2011) Network clustering: probing biological heterogeneity by sparse graphical models
- Mukherjee, Hill
Citation Context ...ly dealing with high-dimensional data. Methods such as sparse graphical models [13] and LASSO-type problems [14] can be applied in graph reconstruction, and sometimes in community structure detection =-=[15]-=-. However, most of these methods rely on the assumption of independence of the variables [14] (or at least that the covariates are not highly correlated [16]), on the a priori determination of the num... |

2 | Dejaegere A, et al. (2010) Dynamic correlation networks in human peroxisome proliferator-activated receptorgamma nuclear receptor protein - Fidelak, Ferrer, et al. |

2 |
JH (2010) Legislative success in a small world: Social network analysis and the dynamics of congressional legislation
- WK, Fowler
Citation Context ...110th Senate. A great effort has been placed into analyzing the political partisanship in the US congress, particularly on how polarized Legislatures can influence the voting on non-particular issues =-=[28]-=-. In the 110th Legislature of the United States, in the second government of G.W. Bush, the polarization was evident. It has been suggested that in highly polarized Legislatures the representatives te... |

2 |
Jp (2009) Group Lasso with Overlap and Graph Lasso
- Jacob, Obozinski, et al.
Citation Context ...clusters that were found to be meaningful in the first place. It can be argued that other methods, such as sparse graphical models and LASSO-based methods =-=[15,29]-=-, exist to cope with the over-fragmentation in sparser graphs. However, correlation graphs normally do not fulfill the assumptions of such methods like independence of the variables, a priori knowledg... |

2 |
Wilmanns M, Sterner R (2001) Stability, catalytic versatility and evolution of the (beta alpha)(8)-barrel fold. Curr Opin Biotechnol 12: 376–381
- Hocker, Jurgens
Citation Context ...evolution that this fold has been through: convergent, divergent or a mixture of both mechanisms [41]. However, there is some evidence suggesting the divergent evolution hypothesis is the most likely =-=[42]-=-. The catalytic activity and substrate binding residues occur at the C-termini of β-strands and in loops that extend from these strands [39]. The catalytic site includes aspartate as a catalytic nucl... |

1 |
Cortés E, Mejı́a-Falla PA (2010) Topological analysis of the ecological importance of elasmobranch fishes: A food web study on the gulf of tortugas, colombia. Ecological modelling 221
- AF
Citation Context ...cted residues [1], metabolic networks can be created by connecting enzymes by their interactions in a given pathway [2], or food webs can be created by joining species with their trophic interactions =-=[3]-=-. Networks are common models for the Internet [4] and social networks [5]. Any kind of data that can be summarized into vertices (nodes) and connections (edges), can be abstracted as a graph. An speci... |

1 |
EJ (2000) The networks of the internet: an analysis of provider networks
- SP, Malecki
Citation Context ...ted by connecting enzymes by their interactions in a given pathway [2], or food webs can be created by joining species with their trophic interactions [3]. Networks are common models for the Internet =-=[4]-=- and social networks [5]. Any kind of data that can be summarized into vertices (nodes) and connections (edges) can be abstracted as a graph. A special case of graphs can be constructed when one is ... |

1 |
Kocakaplan Y (2011) Topology of the correlation networks among major currencies using hierarchical structure methods. Physica A: Statistical Mechanics and its Applications 390: 719–730
- Keskin, Deviren
Citation Context ...d data. Let us consider the case of correlation networks, where the edges are defined as the correlation between two nodes. These networks are important in biological sciences [1,17–19] and economics =-=[20,21]-=- since they constitute an intermediate between topology and the dynamics of the system [22]. Analyzing the community structures of these networks can help identify clusters of co-expressed genes causi... |

1 |
Bornholdt S (2009) Innovation Networks: New Approaches in Modelling and Analyzing, Springer, chapter Tools from Statistical Physics for the Analysis of Social Networks
- Reichardt
Citation Context ...ident vertices are connected by another meaningful property. The extra constraint in topology will create a sparser graph. Sparser graphs show an intrinsic level of modularity due to their topologies =-=[23]-=-. This is a problem if the modularity is inferred on the assumption that the community structure is dictated by correlation. It has also been shown that sparser graphs tend to cluster into more module... |

1 |
Linear statistical inference and its applications, volume 22
- CR
- 2009
Citation Context ...s been commonly used as a preprocessing step in pattern recognition systems [25] and is commonly used in other sciences to explore the variate space to find shared properties of samples and variables =-=[26]-=-. It is based on a linear model where a given dependent variable can be explained by a linear combination of factors given by the independent variables. Such factors can be a clustering scheme itself.... |

1 |
ED (2013) ellipse: Functions for drawing ellipses and ellipselike confidence regions. URL http://CRAN.R-project.org/package=ellipse. R package version
- Murdoch, Chow
Citation Context ...and the actual value represents the weight of that edge. Collision test and membership refinement. After the first two LD are obtained, a 95% confidence ellipse is computed. Here, the package ellipse =-=[33]-=- implemented in R [32] is used to compute the ellipses. After the ellipses have been estimated, a collision test is made. A point will be inside or at the edge of any given ellipse if the following ine... |

1 |
Pansu P, Berry JP, Saint-Raymond X
- Berger
- 1984
Citation Context ...nted in R [32] is used to compute the ellipses. After the ellipses have been estimated, a collision test is made. A point will be inside or at the edge of any given ellipse if the following inequality =-=[34]-=- is satisfied: $\frac{(x-h)^2}{r_x^2} + \frac{(y-k)^2}{r_y^2} \le 1$ (3), where x and y are the coordinates of a given point, h and k are the coordinates of the center of the ellipse, and rx and ry are the semiminor and semi-ma... |
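
The collision test described in the excerpt above can be sketched in a few lines. This is a minimal illustration, assuming an axis-aligned ellipse with center (h, k) and semi-axes rx and ry along the x and y directions; the function name `point_in_ellipse` is hypothetical, and the paper's own implementation (built on the R package ellipse) may handle rotated confidence ellipses differently.

```python
def point_in_ellipse(x, y, h, k, rx, ry):
    """Return True if (x, y) lies inside or on the edge of the ellipse.

    Assumes an axis-aligned ellipse centered at (h, k) with semi-axes
    rx (along x) and ry (along y); this is an illustrative assumption.
    """
    if rx <= 0 or ry <= 0:
        raise ValueError("semi-axes must be positive")
    # Left-hand side of the inequality: values <= 1 mean inside or on the edge
    return (x - h) ** 2 / rx ** 2 + (y - k) ** 2 / ry ** 2 <= 1.0


inside = point_in_ellipse(0.0, 0.0, 0.0, 0.0, 2.0, 1.0)    # the center
on_edge = point_in_ellipse(2.0, 0.0, 0.0, 0.0, 2.0, 1.0)   # end of the x semi-axis
outside = point_in_ellipse(0.0, 1.5, 0.0, 0.0, 2.0, 1.0)   # beyond the y semi-axis
```

A rotated ellipse would first need the point transformed into the ellipse's own coordinate frame before applying the same inequality.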

1 |
data available at http://voteview.com
- KT
- 2013
Citation Context ...of edges in a topology-constrained network might be dealt with. Case studies datasets Voting in the United States 110th Senate. The Roll-Call voting of 110th United States Senate (available online at =-=[35]-=- or in Supporting Information File S1) was used to construct the network. First a data matrix is created where each row represents each senator and each column represents a vote for a given motion or ... |

1 |
S (2013) Glycoside hydrolase family 13. available at URL http://www.cazypedia.org
- Svensson, Janecek
Citation Context ...mylase structures homologs. The α-Amylase-like family catalyzes the hydrolysis of α-(1,4) glycosidic bonds of polysaccharides, therefore being classified as glycoside hydrolases [36] in the family 13 =-=[37]-=-. It is a multi-reaction catalytic family since its members can catalyze different reactions (hydrolysis, transglycosylation, condensation and cyclization) [38]. All members of this family share a sym... |

1 |
Ferrer-Costa C, Turnay J, et al. (2007) The structure of human 4f2hc ectodomain provides a model for homodimerization and electrostatic interaction with plasma membrane
- Fort, Laura, et al.
Citation Context ...lysis, transglycosylation, condensation and cyclization) [38]. All members of this family share a symmetrical TIM-barrel ((β/α)8) catalytic domain [39], including those without any catalytic activity =-=[40]-=-. This fold is highly versatile and widespread among the structurally characterized enzymes, being present in almost 10% of them [41–44]. There has been a debate about the type of evolution that this ... |

1 |
An α/β-barrel full of evolutionary trouble. Current opinion in structural biology 3
- GK
- 1993
Citation Context ...erized enzymes, being present in almost 10% of them [41–44]. There has been a debate about the type of evolution that this fold has been through: convergent, divergent or a mixture of both mechanisms =-=[41]-=-. However, there is some evidence suggesting the divergent evolution hypothesis is the most likely [42]. The catalytic activity and substrate binding residues occur at the C-termini of β-strands and ... |

1 | Raushel FM (2003) Evolution of function in (β/α)8-barrel enzymes. Current opinion in chemical biology 7 - JA |