## Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency

Citations: | 1 - 0 self |

### Citations

2142 |
On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ... trimer histogram of a sequence hence can be used to characterize a protein $x_i$, which is denoted as $P_i$. Because a histogram is indeed a probability distribution, we use the Kullback-Leibler (KL) divergence [35], a standard way to assess the difference between two probability distributions, to measure the distance between two proteins, which is defined as: $D_{KL}(P_i \| P_j) = \sum_k P_i(k) \log \frac{P_i(k)}{P_j(k)}$, (10) ... |
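
The distance in Eq. (10) can be illustrated with a minimal Python sketch. It assumes trimers are counted with a sliding window over the 20 standard amino acids and that a pseudocount keeps every bin nonzero so the logarithm stays finite; the quoted context does not say how zero counts are handled, so the `pseudocount` parameter and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIMERS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 8000 bins


def trimer_histogram(sequence, pseudocount=1.0):
    """Normalized trimer histogram P_i of one protein sequence.

    The pseudocount is an assumption (not from the quoted text); it
    avoids zero bins, which would make the KL divergence infinite.
    Trimers containing non-standard residues are ignored.
    """
    counts = Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))
    total = sum(counts[t] for t in TRIMERS) + pseudocount * len(TRIMERS)
    return {t: (counts[t] + pseudocount) / total for t in TRIMERS}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_k P(k) * log(P(k) / Q(k))."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


p_i = trimer_histogram("MKVLAAGIVGL")
p_j = trimer_histogram("MKTAYIAKQRQ")
print(kl_divergence(p_i, p_j), kl_divergence(p_j, p_i))
```

KL divergence is not symmetric, so the two printed values generally differ; the quoted context is truncated before it says whether the distance is symmetrized.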

411 | BioGRID: a general repository for interaction datasets
- Stark, Breitkreutz, et al.
- 2006
Citation Context ...predictive performance using the integrated data from both PPI networks and protein sequences. Data Preparation. We download PPI data for the Saccharomyces cerevisiae species from BioGRID (version 2.0.56) [36]. By removing the proteins connected by only one PPI, we end up with 4403 annotated proteins with 86167 PPIs. We represent the protein interaction network as a graph, with vertices corresponding to th... |

182 |
Protein interactions: two methods for assessment of the reliability The Scientific World
- Deane, Salwinski, et al.
- 2002
Citation Context ...ogy is simply ignored. Second, PPI data suffer from high noise due to the nature of high-throughput technologies, e.g., the false positive rate in yeast two-hybrid experiments is estimated to be as high as 50% [37]. Therefore, we use the Topological Measurement (TM) method [38] to compute the data similarity matrix $W_{PPI}$, which takes into consideration paths with all possible lengths on a network and weights the... |

157 |
Network-based prediction of protein function
- Sharan
- 2007
Citation Context ...and 2014 312 H. Wang, H. Huang, and C. Ding Since many biological experimental data can be readily represented as networks, graph-based approaches are the most natural way to predict protein function [1]. Neighborhood-based methods [2–5] assign functions to a protein based on the most frequent functions within a neighborhood of the protein, and they mainly differ in how the “neighborhood” of a protei... |

130 |
Global protein function prediction from protein-protein interaction networks
- Vázquez, Flammini, et al.
- 2003
Citation Context ...rk, on which protein functions are diffused from annotated proteins to their neighbors in various ways. Other function prediction approaches via biological networks include graph cut based approaches [8, 9], and those derived from kernel methods [10]. More recently, the authors developed a graph-based protein function prediction method [11] using PPI graph to take advantage of the function-function corr... |

128 |
Kernel-based data fusion and its application to protein function prediction in yeast
- Lanckriet, Deng, et al.
- 2004
Citation Context .... Experimental data from one single source is often incomplete and sometimes even misleading [12], therefore predicting protein function using multiple biological data has attracted increased attention. [13] proposed a kernel-based data fusion approach to integrate multiple experimental data via a hybrid kernel and use support vector machine (SVM) for classification. [14] presented a locally constrained ... |

108 | Convex and semi-nonnegative matrix factorizations - Ding, Li, et al. - 2010 |

100 |
MIPS: a database for genomes and protein sequences. Nucleic Acids Res
- Mewes, Frishman, et al.
- 2002
Citation Context ...ortunity to improve classification accuracy through label correlations, which are absent in single-label data. For example, when applying the Functional Catalogue (FunCat) annotation scheme (version 2.1) [23] to the yeast genome, we observe that there is a big overlap between the proteins annotated to function “Cell Fate” (ID: 40) and those annotated to “Cell Type Differentiation” (ID: 43). As shown in the le... |

95 | Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps
- Nabieva, Jim, et al.
- 2005
Citation Context ...tions to a protein based on the most frequent functions within a neighborhood of the protein, and they mainly differ in how the “neighborhood” of a protein is defined. Network diffusion based methods [6, 7] view the interaction network as a flow network, on which protein functions are diffused from annotated proteins to their neighbors in various ways. Other function prediction approaches via biological... |

86 |
A network of protein-protein interactions
- Schwikowski, Uetz, et al.
- 2000
Citation Context ...ctive accuracy of our approach against the functional similarity weight (FS) approach [4] and the kernel-based data fusion (KDF) approach [13]. We also report the performance of the majority voting (MV) approach [2] as a baseline. We employ the broadly used average precision and average F1 score [4] as performance metrics. Adaptive Decision Boundary for Prediction. To predict specific putative functions for unannota... |

85 | Graph regularized nonnegative matrix factorization for data representation - Cai, He, et al. |

80 |
Orthogonal nonnegative matrix t-factorizations for clustering
- Ding, Li, et al.
- 2006
Citation Context ...the factor matrix F, which inevitably complicates the problem (More detailed analyses can be found in our earlier works [31, 32]). Traditional solutions to symmetric NMF typically rely on heuristics [27, 33], thus we introduce Algorithm 1 to solve Eq. (7) in a principled way. Due to space limit, the proofs of its correctness and convergence will be provided in the extended journal version. Algorithm 1. A... |

76 |
Whole-genome annotation by using evidence integration in functional-linkage networks
- Karaoz, Murali, et al.
Citation Context ...rk, on which protein functions are diffused from annotated proteins to their neighbors in various ways. Other function prediction approaches via biological networks include graph cut based approaches [8, 9], and those derived from kernel methods [10]. More recently, the authors developed a graph-based protein function prediction method [11] using PPI graph to take advantage of the function-function corr... |

74 |
Assessment of prediction accuracy of protein function from protein–protein interaction data
- Hishigaki, Nakai, et al.
- 2001
Citation Context ...ctions. Therefore, in order to capture correlations among different functions, instead of the simple dot product, we compute the knowledge similarity as follows: $S_K(f_i, f_j) = f_i^T C^{-1} f_j = f_i^T A f_j$, (3) where, for notation simplicity, we denote $A = C^{-1}$ in the sequel. Note that, compared to the dot product similarity defined by $f_i^T f_j$ based on the Euclidean distance, the knowledge similarity computed... |
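
As a small worked example of Eq. (3), the sketch below evaluates the bilinear form $f_i^T A f_j$ in plain Python. The matrix `A` is passed in directly rather than computed as $C^{-1}$, and the numeric values of `coupled` are hypothetical, chosen only to show how an off-diagonal entry changes the result relative to the plain dot product.

```python
def knowledge_similarity(f_i, f_j, A):
    """Bilinear form f_i^T A f_j, with A given as a list of rows."""
    return sum(
        f_i[k] * A[k][l] * f_j[l]
        for k in range(len(f_i))
        for l in range(len(f_j))
    )


# With A equal to the identity, the form reduces to the dot product.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical A with a nonzero off-diagonal entry (illustrative values only).
coupled = [[1.0, 0.5], [0.5, 1.0]]

f_1 = [1, 0]  # protein annotated only with function 1
f_2 = [0, 1]  # protein annotated only with function 2

print(knowledge_similarity(f_1, f_2, identity))  # 0.0
print(knowledge_similarity(f_1, f_2, coupled))   # 0.5
```

Two proteins with disjoint annotations have zero dot-product similarity, but a nonzero similarity once the off-diagonal entries of `A` link their functions.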

72 |
Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions
- Chua
- 2006
Citation Context ...ic and genetic information, we first evaluate the proposed MDKC approach using protein sequences. We compare the predictive accuracy of our approach against the functional similarity weight (FS) approach [4] and the kernel-based data fusion (KDF) approach [13]. We also report the performance of the majority voting (MV) approach [2] as a baseline. We employ the broadly used average precision and average F1 score [4] ... |

66 |
Prediction of protein function from protein sequence and structure
- Whisstock, Lesk
- 2003
Citation Context ...ein function prediction as a multi-label classification problem, which takes the same perspective as this work. Experimental data from one single source is often incomplete and sometimes even misleading [12], therefore predicting protein function using multiple biological data has attracted increased attention. [13] proposed a kernel-based data fusion approach to integrate multiple experimental data via ... |

55 |
Learning kernels from biological networks by maximizing entropy
- Tsuda, Noble
- 2004
Citation Context ...s attracted increased attention. [13] proposed a kernel-based data fusion approach to integrate multiple experimental data via a hybrid kernel and use a support vector machine (SVM) for classification. [14] presented a locally constrained diffusion kernel approach to combine multiple types of biological networks. An artificial neural network is employed in [15] for the integration of different protein inte... |

52 |
Protein ranking: from local to global structure in protein similarity network
- Weston, Elisseeff, et al.
Citation Context ...tions to a protein based on the most frequent functions within a neighborhood of the protein, and they mainly differ in how the “neighborhood” of a protein is defined. Network diffusion based methods [6, 7] view the interaction network as a flow network, on which protein functions are diffused from annotated proteins to their neighbors in various ways. Other function prediction approaches via biological... |

48 |
Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization
- Li, Ding, et al.
- 2007
Citation Context ...the factor matrix F, which inevitably complicates the problem (More detailed analyses can be found in our earlier works [31, 32]). Traditional solutions to symmetric NMF typically rely on heuristics [27, 33], thus we introduce Algorithm 1 to solve Eq. (7) in a principled way. Due to space limit, the proofs of its correctness and convergence will be provided in the extended journal version. Algorithm 1. A... |

44 | Non-negative matrix factorization on manifold - Cai, He, et al. - 2008 |

30 | Co-clustering on manifolds - Gu, Zhou - 2009 |

24 | Image annotation using multi-label correlated green’s function
- Wang, Huang, et al.
- 2009
Citation Context ...e and negative samples for the k-th class, and $e_+(b_k)$ and $e_-(b_k)$ as the numbers of misclassified positive and negative training samples. The adaptive (optimal) decision boundary is given as follows [19, 11]: $b_k^{opt} = \arg\min_{b_k} \left[ \frac{e_+(b_k)}{|S_+|} + \frac{e_-(b_k)}{|S_-|} \right]$. (8) And the decision rule to assign a function to protein $x_i$ is given by: { $x_i$ is annotated with the k-th functio... |
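
A minimal sketch of the boundary search in Eq. (8) for a single class is given below. It assumes that a sample is predicted positive when its score is strictly above the boundary (the quoted decision rule is truncated, so this direction is an assumption) and that candidate boundaries are the midpoints between consecutive distinct scores plus one value beyond each extreme; the function name and the example scores are hypothetical.

```python
def adaptive_boundary(scores, labels):
    """Return b minimizing e+(b)/|S+| + e-(b)/|S-| for one class.

    scores: real-valued predictions for the training samples.
    labels: truthy for positive samples, falsy for negative ones.
    Assumption: a sample is predicted positive when score > b.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    ordered = sorted(set(scores))
    # Candidate boundaries: below all scores, between neighbours, above all.
    candidates = (
        [ordered[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
        + [ordered[-1] + 1.0]
    )

    def cost(b):
        missed_pos = sum(1 for s in pos if s <= b)  # e+(b)
        false_pos = sum(1 for s in neg if s > b)    # e-(b)
        return missed_pos / len(pos) + false_pos / len(neg)

    return min(candidates, key=cost)


b = adaptive_boundary([0.9, 0.8, 0.35, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
print(b)  # a value between 0.3 and 0.35, which separates the two groups
```

Normalizing each error count by its class size keeps a rare function from being drowned out by the much larger set of negative samples, which is the apparent motivation for the two separate terms.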

19 | Predicting protein-protein interactions from protein domains using a set cover approach
- Huang, Morcos, et al.
- 2007
Citation Context ... because the latter involves a fourth-order term due to the symmetric usage of the factor matrix F, which inevitably complicates the problem (More detailed analyses can be found in our earlier works [31, 32]). Traditional solutions to symmetric NMF typically rely on heuristics [27, 33], thus we introduce Algorithm 1 to solve Eq. (7) in a principled way. Due to space limit, the proofs of its correctness a... |

16 |
Using indirect protein interactions for the prediction of Gene Ontology functions
- Chua, Sung, et al.
- 2007
Citation Context ...e can formalize the data-knowledge consistency assumption in Eq. (2) by the following optimization problem: $\arg\min_F \sum_{i,j=1}^{n} \left( W_{ij} - \sum_{k,l=1}^{K} F_{ik} A_{kl} F_{jl} \right)^2$, (4) s.t. $F_{ik} = Y_{ik}, \ \forall\, 1 \le i \le l, \ 1 \le k \le K$. (5) In standard classification problems in machine learning, $F_{ik}$ ($1 \le i \le l$) are fixed for labeled data points. Specifically, a big $F_{ik}$ indicates that data point $x_i$ belongs to the k-th class, while a sma... |
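
To make the objective in Eq. (4) and the constraint in Eq. (5) concrete, the following sketch evaluates them for small dense matrices stored as lists of rows. It only computes the loss and checks the constraint; it is not the paper's Algorithm 1, and the function names and toy matrices are assumptions for illustration.

```python
def predicted_similarity(F, A, i, j):
    """Entry (i, j) of F A F^T: sum over k, l of F[i][k] * A[k][l] * F[j][l]."""
    K = len(A)
    return sum(F[i][k] * A[k][l] * F[j][l] for k in range(K) for l in range(K))


def consistency_loss(W, F, A):
    """Sum over all protein pairs of (W_ij - (F A F^T)_ij)^2, as in Eq. (4)."""
    n = len(W)
    return sum(
        (W[i][j] - predicted_similarity(F, A, i, j)) ** 2
        for i in range(n)
        for j in range(n)
    )


def satisfies_label_constraint(F, Y, num_labeled):
    """Check Eq. (5): the first num_labeled rows of F equal those of Y."""
    return all(
        F[i][k] == Y[i][k]
        for i in range(num_labeled)
        for k in range(len(Y[i]))
    )


# Toy example: two proteins, two functions (values are illustrative only).
A = [[1.0, 0.0], [0.0, 1.0]]
F = [[1.0, 0.0], [0.0, 1.0]]
W_consistent = [[1.0, 0.0], [0.0, 1.0]]
W_mismatch = [[1.0, 0.5], [0.5, 1.0]]

print(consistency_loss(W_consistent, F, A))  # 0.0
print(consistency_loss(W_mismatch, F, A))    # 0.5
```

The loss is zero exactly when the data similarity `W` matches the knowledge similarity implied by the annotations `F`, which is the consistency the quoted passage refers to.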

16 | Multi-label Linear Discriminant Analysis - Wang, Ding, et al. - 2010 |

16 | Image annotation using bi-relational graph of images and semantic labels - Wang, Huang, et al. - 2011 |

16 | A topological measurement for weighted protein interaction network
- Pei, Zhang
- 2005
Citation Context ...ue to the nature of high-throughput technologies, e.g., the false positive rate in yeast two-hybrid experiments is estimated to be as high as 50% [37]. Therefore, we use the Topological Measurement (TM) method [38] to compute the data similarity matrix $W_{PPI}$, which takes into consideration paths with all possible lengths on a network and weights the influence of every path by its length. Specifically, $(W_{PPI})_{ij}$ b... |

10 | Function-function correlated multi-label protein function prediction over interaction networks
- Wang, Huang, et al.
- 2013
Citation Context ...es via biological networks include graph cut based approaches [8, 9], and those derived from kernel methods [10]. More recently, the authors developed a graph-based protein function prediction method [11] using PPI graph to take advantage of the function-function correlations by considering protein function prediction as a multi-label classification problem, which takes the same perspective as this wo... |

9 | Multi-label feature transform for image classifications
- Wang, Huang, et al.
- 2010
Citation Context ...ated by the observation that label indications in a multi-label classification task (i.e., protein function annotations in protein function prediction problems) convey important attribute information [21], we use the function annotations of a protein as its description, and assess pairwise protein similarities upon such descriptions. The key assumption of our work is that two proteins are likely to ha... |

7 | Fast nonnegative matrix tri-factorization for large-scale data co-clustering - Wang, Nie, et al. - 2011 |

7 | Dyadic transfer learning for cross-domain image classification - Wang, Nie, et al. - 2011 |

6 |
Adaptive diffusion kernel learning from biological networks for protein function prediction
- Sun, Ji, et al.
Citation Context ...is employed in [15] for the integration of different protein interaction data. Most existing computational approaches usually consider protein function prediction as a standard classification problem [13, 16, 17]. Typically, these approaches fundamentally make predictions one function at a time, i.e., the classification for each functional category is conducted independently. However, in reality most biologic... |

3 |
Graph sharpening plus graph integration: a synergy that improves protein functional classification
- Shin, Lisewski, et al.
- 2007
Citation Context ...is employed in [15] for the integration of different protein interaction data. Most existing computational approaches usually consider protein function prediction as a standard classification problem [13, 16, 17]. Typically, these approaches fundamentally make predictions one function at a time, i.e., the classification for each functional category is conducted independently. However, in reality most biologic... |

3 | Nonnegative matrix tri-factorization based high-order co-clustering and its fast implementation
- Wang, Nie, et al.
- 2011
Citation Context ... because the latter involves a fourth-order term due to the symmetric usage of the factor matrix F, which inevitably complicates the problem (More detailed analyses can be found in our earlier works [31, 32]). Traditional solutions to symmetric NMF typically rely on heuristics [27, 33], thus we introduce Algorithm 1 to solve Eq. (7) in a principled way. Due to space limit, the proofs of its correctness a... |

1 |
Adaptive diffusion kernel learning from biological networks for protein function prediction
- Liang, Shuiwang, et al.
- 2008
Citation Context ...om annotated proteins to their neighbors in various ways. Other function prediction approaches via biological networks include graph cut based approaches [8, 9], and those derived from kernel methods [10]. More recently, the authors developed a graph-based protein function prediction method [11] using PPI graph to take advantage of the function-function correlations by considering protein function pre... |

1 |
ANN Based Protein Function Prediction Using Integrated Protein-Protein Interaction Data
- Shi, Cho, et al.
- 2009
Citation Context ...port vector machine (SVM) for classification. [14] presented a locally constrained diffusion kernel approach to combine multiple types of biological networks. An artificial neural network is employed in [15] for the integration of different protein interaction data. Most existing computational approaches usually consider protein function prediction as a standard classification problem [13, 16, 17]. Typic... |

1 | Protein function prediction via laplacian network partitioning incorporating function category correlations
- Wang, Huang, et al.
Citation Context ...ctional category is conducted independently. However, in reality most biological functions are highly correlated, and protein functions can be inferred from one another through their interrelatedness [11, 18]. These function category correlations, albeit useful, are seldom utilized in predicting protein function. In this study, we explore this special characteristic of the protein functional categories an... |