
## Thesis Committee (2012)

### Citations

7393 | Convex Optimization
- Boyd
- 2001
Citation Context: ...unction is differentiable, the whole L1-regularized term can be optimized using projected gradients. We note that methods based on projected gradients are guaranteed to converge to a stationary point [11]. We use this method to learn the structure and parameters of the VGM. We define the loss function L as the negative log of the full pseudo-likelihood, as defined in Section 3.1.3: L(Θ|⃗µ, ⃗κ, Λ) = −l...
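The snippet above optimizes an L1-regularized loss with gradient-projection methods. As an illustration only (not the thesis's VGM code), here is a minimal proximal-gradient (ISTA) sketch for an L1-regularized least-squares loss; the names `ista` and `soft_threshold` and the synthetic data are invented for this example:

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: the proximal operator of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, step, iters=500):
    # Minimize 0.5*||A x - b||^2 + lam*||x||_1 by proximal gradient (ISTA).
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]               # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=50)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = Lipschitz constant of grad
x_hat = ista(A, b, lam=0.5, step=step)
print(np.count_nonzero(np.abs(x_hat) > 1e-6))
```

The soft-thresholding step is what produces exact zeros, which is why such methods recover sparse structure.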

4166 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...f the full likelihood. Meinshausen et al. show that each optimization term in the pseudo-likelihood becomes equivalent to a regression problem, and can be solved efficiently with Lasso regression [82]. They prove that this method is consistent, and in the limit recovers the true structure. We use this method to perform the structure learning in RKH space, and compared it with the nonparametric tree structu...

1217 | On estimation of a probability density function and mode
- Parzen
- 1962
Citation Context: ...ation is a fundamental nonparametric technique used for estimating smooth density functions from finite data, which has been used by the community since the 1960s, when Parzen provided formulations for it in [59]. A kernel is a positive semidefinite function that defines a measure of similarity between any two data points, based on the linear similarity (i.e., dot product) of the two points in some feature space,...
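The Parzen-window idea described here fits in a few lines. A hedged sketch with a Gaussian kernel; the toy six-point sample and the bandwidth are illustrative choices, and `kde` is an invented name:

```python
import numpy as np

def kde(x_query, data, bandwidth):
    # Parzen-window estimate with a Gaussian kernel:
    # p(x) = (1/m) * sum_i N(x; x_i, bandwidth^2)
    diffs = (x_query[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.9])   # toy sample
xs = np.linspace(-6, 10, 200)
density = kde(xs, data, bandwidth=1.5)
# A Riemann sum over a wide enough grid should be close to 1.
print((density * (xs[1] - xs[0])).sum())
```

Each data point contributes one smooth bump; the sum of bumps is the density estimate.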

848 | Scalable molecular dynamics with NAMD
- Phillips, Braun, et al.
- 2005
Citation Context: ...ons were limited to timescales on the order of several tens of nanoseconds. Today, however, the field is in the midst of a revolution, due to a number of technological advances in software (e.g., NAMD [61] and Desmond [10]), distributed computing (e.g., Folding@Home [58]), and specialized hardware (e.g., the use of GPUs [79] and Anton [70]). Collectively, these advances are enabling MD simulations into the...

771 | Shortest connection networks and some generalizations
- Prim
- 1957
Citation Context: ...the nonparanormal. This model is based on nonparametric mutual information, calculated using kernel density estimation. The forest structure is then learned using a maximum spanning tree algorithm [34][63]. In this graphical model, if the number of variables is larger than the number of samples, a fully connected graph leads to high variance and over-fits the training data. To solve this issue, Laff...

729 | High-Dimensional Graphs and Variable Selection with the Lasso
- Meinshausen, Bühlmann
Citation Context: ...of the graph is previously known. This assumption is impractical for our purposes, so in our proposal we will focus on sparse structure learning methods, such as neighborhood selection [52], that have been successful in other contexts, to learn the structure in the RKH space and perform nonparametric inference. Belief Propagation in RKHS: Belief propagation in RKHS requires the beliefs ...

673 | On the shortest spanning subtree of a graph and the traveling salesman problem
- Kruskal
- 1956
Citation Context: ...e to the nonparanormal. This model is based on nonparametric mutual information, calculated using kernel density estimation. The forest structure is then learned using a maximum spanning tree algorithm [34][63]. In this graphical model, if the number of variables is larger than the number of samples, a fully connected graph leads to high variance and over-fits the training data. To solve this issue, ...
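The recipe in the snippet (pairwise mutual information plus a maximum spanning tree) can be sketched as follows. The plug-in histogram MI estimator, Kruskal's algorithm via union-find, and all names are illustrative assumptions, not the cited method's implementation:

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y, bins=8):
    # Plug-in MI estimate from a 2-D histogram.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def max_spanning_tree(n_vars, weights):
    # Kruskal's algorithm on edges sorted by decreasing weight.
    parent = list(range(n_vars))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = []
    for i, j, w in sorted(weights, key=lambda e: -e[2]):
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding the edge creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(1)
z = rng.normal(size=500)
# Variables 0 and 1 are strongly dependent; variable 2 is independent.
data = np.column_stack([z, z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
weights = [(i, j, mutual_information(data[:, i], data[:, j]))
           for i, j in combinations(range(3), 2)]
tree = max_spanning_tree(3, weights)
print(tree)
```

The strongly dependent pair gets the highest MI weight, so its edge enters the tree first.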

670 | Loopy belief propagation for approximate inference: An empirical study
- Murphy, Weiss, et al.
- 1999
Citation Context: ...ee-graphical model, BP is guaranteed to find the correct partition function, as well as any marginal and conditional distribution, very efficiently. A variant of this algorithm, loopy belief propagation [55], is used on loopy UGMs. While loopy belief propagation lacks the convergence guarantees associated with BP, Yedidia et al. [87] showed that loopy belief propagation converges to the Bethe approximat...
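A minimal sum-product example on a three-node chain shows why BP is exact on trees: the BP marginal matches brute-force enumeration. All potentials below are made up for illustration:

```python
import numpy as np

# Sum-product (belief propagation) on a chain x1 - x2 - x3 of binary
# variables, with agreement-favouring pairwise potentials and a biased
# node potential on x3.
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])       # psi[xi, xj], used for both edges
phi3 = np.array([3.0, 1.0])        # node potential on x3

# Messages into x2 from each neighbour.
m12 = psi.sum(axis=0)              # x1 -> x2 (x1's node potential uniform)
m32 = psi @ phi3                   # x3 -> x2, absorbing phi3
belief2 = m12 * m32
belief2 /= belief2.sum()           # exact marginal p(x2) on a tree

# Brute-force check over all joint states.
p = np.zeros(2)
for x1 in range(2):
    for x2 in range(2):
        for x3 in range(2):
            p[x2] += psi[x1, x2] * psi[x2, x3] * phi3[x3]
p /= p.sum()
assert np.allclose(belief2, p)
print(belief2)   # [0.5833..., 0.4166...]
```

On graphs with cycles the same message updates are iterated (loopy BP), which is where the convergence guarantees are lost.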

586 | Sparse inverse covariance estimation with the graphical lasso
- Friedman, Hastie, Tibshirani
Citation Context: ...ning and inference more efficient, by using parametric forms which have fewer parameters to estimate, and sometimes have closed-form analytical solutions for certain queries. Gaussian graphical models [20],[5] are an example of such models. A problem with Gaussian graphical models is that the data is not usually Gaussian, so, as we will see in this thesis, we propose models to learn another parametr...

581 | Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms
- Yedidia, Freeman, et al.
- 2005
Citation Context: ...efficiently. A variant of this algorithm, loopy belief propagation [55], is used on loopy UGMs. While loopy belief propagation lacks the convergence guarantees associated with BP, Yedidia et al. [87] showed that loopy belief propagation converges to the Bethe approximation of the free energy of the UGM, and presented Generalized BP, which performs message passing between clusters of nodes instead of ...

473 | Just relax: Convex programming methods for identifying sparse signals in noise
- Tropp
- 2006
Citation Context: ...ention recently (e.g., [37],[26],[68],[85]). Structure learning algorithms based on L1 regularization are particularly interesting because they exhibit consistency and high statistical efficiency (see [83] for a review). We use an algorithm introduced by Schmidt et al. [68] that solves the L1-regularized maximum likelihood estimation problem using gradient projection. Their algorithm can b...

446 | Expectation Propagation for approximate Bayesian inference
- Minka
- 2001

431 | Statistics of Directional Data
- Mardia
- 1972
Citation Context: ...mplies low variance. Unlike the wrapped normal distribution, the von Mises distribution belongs to the exponential family and can be extended to higher dimensions. The bivariate von Mises distribution [46] over θ1 and θ2, for example, can be defined as: f(θ1, θ2) = exp{[Σ_{i=1}^{2} κ_i cos(θ_i − µ_i)] + K⃗_1 M K⃗_2ᵀ} / Z_c(µ1, µ2, κ1, κ2, M), where µ1 and µ2 are the means of θ1 and θ2, respectively, κ1 and κ...
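A hedged numerical sketch of the five-parameter sine variant of the bivariate von Mises density (a common simplification of the general form above, with a single interaction term λ sin(θ1 − µ1) sin(θ2 − µ2)); the parameter values are arbitrary, and the normalizer Z_c is approximated on a grid:

```python
import numpy as np

def bvm_sine_unnorm(t1, t2, mu1, mu2, k1, k2, lam):
    # Unnormalized sine-model bivariate von Mises density.
    return np.exp(k1 * np.cos(t1 - mu1) + k2 * np.cos(t2 - mu2)
                  + lam * np.sin(t1 - mu1) * np.sin(t2 - mu2))

# Numerical normalization over [-pi, pi)^2.
grid = np.linspace(-np.pi, np.pi, 200, endpoint=False)
T1, T2 = np.meshgrid(grid, grid)        # T1 varies along columns
vals = bvm_sine_unnorm(T1, T2, mu1=0.0, mu2=1.0, k1=2.0, k2=3.0, lam=1.0)
cell = (2 * np.pi / 200) ** 2
Z = vals.sum() * cell                    # grid estimate of Z_c
density = vals / Z
print(density.sum() * cell)              # ~1.0 by construction
```

With κ1 κ2 > λ² the density is unimodal with its mode at (µ1, µ2), which the grid argmax confirms.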

414 | Directional Statistics
- Mardia, Jupp
- 2000
Citation Context: ...statistics to model angles and other circularly distributed variables [19]. It closely approximates the wrapped normal distribution [12], but has the advantage of being more mathematically tractable [49]. Additionally, the von Mises distribution can be generalized to distributions over the (p − 1)-dimensional sphere in ℝ^p, where it is known as the von Mises–Fisher distribution. Wrapped normal distribut...

387 | Statistical Analysis of Circular Data
- Fisher
- 1995
Citation Context: ...Gaussian graphical models is that the data is not usually Gaussian, so, as we will see in this thesis, we propose models to learn another parametric graphical model, based on the von Mises distribution [19], which, as we will see, is a more accurate model of protein structure when the structure is specified via its torsion angles. These parametric models have the shortcoming that they lead to uni...

330 | Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data
- Banerjee, Ghaoui, et al.
- 2008
Citation Context: ...for Parameter Estimation of the Regularized Time-Varying GGM. We use a block coordinate descent algorithm to solve the stationary and time-varying problems. This method was proposed by Banerjee et al. [4], and proceeds by forming the dual of the optimization problem and applying block coordinate descent to the dual form. Recall that the primal form of both the stationary and time-varying case is as fol...

278 | Nonparametric Belief Propagation
- Sudderth, Ihler, et al.
- 2002
Citation Context: ...es, and approximates the free energy more accurately. Other variants of BP exist, such as Expectation Propagation [53], Gaussian mixture belief propagation [80], and particle belief propagation [28]. In this thesis we specifically focus on methods available for learning and inference in continuous graphical models, since the application that we focus on (computational structural biology) uses co...

258 | Random features for large-scale kernel machines
- Rahimi, Recht
Citation Context: ...dded graphical models. 3.2.5 Proposed Work: Scaling Reproducing Kernel Hilbert Space Models to Large Datasets. Several classic methods, such as Incomplete Cholesky Decomposition [? ] or random features [64], can provide a solution to the scalability issue of nonparametric graphical models. The Incomplete Cholesky Decomposition method uses the Gram–Schmidt orthogonalization procedure to factor the kernel mat...

248 | The infinite Gaussian mixture model
- Rasmussen
- 2000
Citation Context: ...a non-parametrically, based on Dirichlet process (DP) models [14], so as to solve the issue of model selection. Dirichlet processes are probabilistic processes that allow potentially infinitely many clusters [65] to be defined in the model, but for a given dataset will converge to a finite set of clusters. We have previously investigated Dirichlet processes and hierarchical Dirichlet processes [69], and prop...
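The "potentially infinite but effectively finite clusters" behaviour can be illustrated with a draw from the Chinese restaurant process, one standard constructive view of the DP; all names and the concentration value here are illustrative:

```python
import random

def crp_partition(n, alpha, seed=0):
    # Chinese restaurant process: customer i joins an existing table with
    # probability proportional to its size, or opens a new table with
    # probability proportional to alpha.
    rng = random.Random(seed)
    tables = []                      # occupied-table sizes
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for t, size in enumerate(tables):
            acc += size
            if r < acc:
                tables[t] += 1       # join an existing table
                break
        else:
            tables.append(1)         # open a new table
    return tables

tables = crp_partition(1000, alpha=2.0)
print(len(tables), sum(tables))
```

The number of occupied tables grows only logarithmically in n (roughly α log n in expectation), which is the finite-cluster behaviour the snippet describes.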

222 | Determinant maximization with linear matrix inequality constraints
- Vandenberghe, Boyd, et al.
- 1998
Citation Context: ...timated to be O(n^4.5/ε) [4] when converging to an ε-suboptimal solution. This complexity is better than the O(n^6/log(1/ε)) that would have been achieved using the interior point method on the dual form [84]. We used this algorithm in our experiments to estimate an L1-regularized time-varying Gaussian graphical model on the MD simulation data. The experimental conditions, model selection, and the result ...

209 | On the influence of the kernel on the consistency of support vector machines
- Steinwart
- 2002
Citation Context: ...hen write the expectation and the empirical mean of any arbitrary function f as: E_x[f(x)] = ⟨µ[P_x], f⟩, ⟨µ[X], f⟩ = (1/m) Σ_{i=1}^{m} f(x_i). The authors prove that if the kernel is from a universal kernel family [78] then these mappings are injective, and the empirical estimate of the expectation converges to the expectation under the true probability, with error going down at a rate of O(m^{−1/2}), where ...

166 | Learning dynamic bayesian networks.
- Ghahramani
- 1998
Citation Context: ...ch block, and as small changes between blocks as possible. The group regularization technique is also presented in [32]. Another model to describe time-varying data is Dynamic Bayesian networks (DBNs) [23], [54]. DBNs are dynamic directed graphical models which describe sequential observations. The dynamics of the data are described through transition probabilities and states in the hidden variables of ...

142 | Molecular dynamics simulations of biomolecules
- Karplus, A
- 2002
Citation Context: ...putational structural biology. We focus on modeling protein structure from Molecular Dynamics (MD) simulation data. MD simulations are often used to characterize the conformational dynamics of proteins [31]. These simulations are performed by numerically integrating Newton's laws of motion for a set of atoms. Conformational frames are written to disk as a trajectory for subsequent analysis. Current ha...

142 | Efficient structure learning of Markov networks using L1-regularization
- Lee, Ganapathi, et al.
- 2006
Citation Context: ...s not given or known, we must also learn the structure of the model, as well as the parameters. The study of the so-called structure learning problem has received considerable attention recently (e.g., [37],[26],[68],[85]). Structure learning algorithms based on L1 regularization are particularly interesting because they exhibit consistency and high statistical efficiency (see [83] for a review). We us...

113 | A Hilbert space embedding for distributions.
- Smola, Gretton, et al.
- 2007
Citation Context: ...Hilbert Space embedding of a probability distribution, propagation is performed non-parametrically in this space. In the rest of this section we will briefly describe each of these steps. Smola et al. [72] provided the formulation for non-parametrically embedding probability distributions into RKH spaces. Given an i.i.d. dataset X = {x1, ..., xm}, they define two main mappings: µ[P_x] = E_x[k(x, ·)], µ[X] = (1/m) ...
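These mean maps can be exercised directly: the squared RKHS distance between two empirical mean embeddings, ⟨µ[X] − µ[Y], µ[X] − µ[Y]⟩, expands into kernel averages (the maximum mean discrepancy). A sketch under the assumption of a scalar Gaussian RBF kernel; names and parameters are invented:

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    # Gaussian kernel k(x, y) = exp(-gamma * (x - y)^2) on scalar samples.
    return np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)

def mmd2(x, y, gamma=0.5):
    # ||mu[X] - mu[Y]||^2 = <mu[X],mu[X]> - 2<mu[X],mu[Y]> + <mu[Y],mu[Y]>,
    # each inner product estimated by an average of kernel evaluations.
    return (rbf(x, x, gamma).mean()
            - 2 * rbf(x, y, gamma).mean()
            + rbf(y, y, gamma).mean())

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(0.0, 1.0, size=500)   # same distribution as a
c = rng.normal(3.0, 1.0, size=500)   # shifted distribution
print(mmd2(a, b), mmd2(a, c))        # near 0 vs clearly positive
```

Injectivity of the mean map for universal kernels is exactly what makes this distance a valid test statistic for distinguishing distributions.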

101 | High dimensional graphical model selection using l1-regularized logistic regression
- Wainwright, Ravikumar, et al.
- 2006
Citation Context: ...nown, we must also learn the structure of the model, as well as the parameters. The study of the so-called structure learning problem has received considerable attention recently (e.g., [37],[26],[68],[85]). Structure learning algorithms based on L1 regularization are particularly interesting because they exhibit consistency and high statistical efficiency (see [83] for a review). We use an algorithm i...

99 | Accelerating molecular modeling applications with graphics processors
- Stone, Phillips, et al.
- 2007
Citation Context: ...et of atoms. Conformational frames are written to disk as a trajectory for subsequent analysis. Current hardware allows MD simulations to continue up to milliseconds for many proteins [10],[58],[70],[79],[62]. This time scale is often enough for identifying and studying the conformational sub-states relevant to biological function. We have access to several MD simulations of important proteins and en...

92 | The nonparanormal: Semiparametric estimation of high dimensional undirected graphs.
- Liu, Lafferty, et al.
- 2009
Citation Context: ...the protein structure is specified via its torsion angles. These parametric models have the shortcoming that they lead to unimodal marginals for each variable, which is not realistic. Semi-parametric [43],[36] and non-parametric [74],[76] graphical models can model complex distributions with multi-modal and arbitrary marginals. In this thesis we will also focus on using these models for discovering...

83 | Fast optimization methods for l1 regularization: a comparative study and two new approaches
- Schmidt, Fung, et al.
- 2007
Citation Context: ...ed 1000 fully observed samples. Next, we used our structure learning algorithm (Sec. 3.1.3) to learn a VGM from each data set. For comparison, we also used the structure learning algorithm presented in [67] to learn a sparse GGM from the same data. Cross Validation and Evaluation Metric: In each experiment, we performed leave-one-out cross validation, where for each test data point we assumed 50% of the varia...

75 | Component selection and smoothing in multivariate nonparametric regression
- Lin, Zhang
- 2006
Citation Context: ...f variables according to a threshold. Alternatively, Huang et al. [27] assume the regression is via additive components, and estimate sparse component selection based on the group Lasso. Lin et al. [41], on the other hand, use smoothing splines to perform variable selection. In this thesis we propose to investigate and apply these kernel-based sparse regression methods to improve the quality of our...

66 | Scalable algorithms for molecular dynamics simulations on commodity clusters
- Bowers, Chow, et al.
Citation Context: ...motion for a set of atoms. Conformational frames are written to disk as a trajectory for subsequent analysis. Current hardware allows MD simulations to continue up to milliseconds for many proteins [10],[58],[70],[79],[62]. This time scale is often enough for identifying and studying the conformational sub-states relevant to biological function. We have access to several MD simulations of important ...

64 | Method for Estimating the Configurational Entropy of Macromolecules
- Karplus, Kushick
- 1981
Citation Context: ...al statistics (e.g., the radius of gyration, root-mean-squared difference from the initial conformation, total energy, etc.), and identifying sub-states using techniques such as quasi-harmonic analysis [30],[39] and other Principal Components Analysis (PCA) based techniques [6]. Quasi-harmonic analysis, like all PCA-based methods, implicitly assumes that the frames are drawn from a multivariate Gaussian...
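Quasi-harmonic and other PCA-style analyses diagonalize the covariance of coordinate fluctuations across trajectory frames. A generic PCA sketch on synthetic "frames" (not the cited method's pipeline; all data and names are invented):

```python
import numpy as np

def pca_modes(frames):
    # PCA on trajectory frames: rows are frames, columns are coordinates.
    # Quasi-harmonic-style analyses diagonalize this fluctuation covariance.
    centered = frames - frames.mean(axis=0)
    cov = centered.T @ centered / (len(frames) - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]        # largest-variance modes first
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)
# Synthetic frames: one dominant collective motion plus isotropic noise.
direction = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
frames = (rng.normal(size=(200, 1)) * 3.0 * direction
          + 0.1 * rng.normal(size=(200, 3)))
evals, evecs = pca_modes(frames)
print(evals[0] / evals.sum())              # top mode dominates the variance
```

The implicit Gaussian assumption the snippet mentions is visible here: PCA summarizes the data purely through its second moments.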

63 | Variable selection in nonparametric additive models
- Huang, Horowitz, et al.
- 2010
Citation Context: ...regression, called Rodeo, which is based on a series of hypothesis tests, via nonparametric kernel regression, and selects a subset of variables according to a threshold. Alternatively, Huang et al. [27] assume the regression is via additive components, and estimate sparse component selection based on the group Lasso. Lin et al. [41], on the other hand, use smoothing splines to perform variable se...

62 | Homeodomain proteins
- Gehring, Affolter, et al.
- 1994
Citation Context: ...tions data, which is a 54-residue DNA-binding domain (Figure 3.7). The DNA-binding domains of the homeotic proteins, called homeodomains (HDs), play an important role in the development of all metazoans [21], and certain mutations to HDs are known to cause disease in humans [16]. Homeodomains fold into a highly conserved structure consisting of three alpha-helices wherein the C-terminal helix makes sequen...

62 | Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing
- Pande, Baker, et al.
Citation Context: ...on for a set of atoms. Conformational frames are written to disk as a trajectory for subsequent analysis. Current hardware allows MD simulations to continue up to milliseconds for many proteins [10],[58],[70],[79],[62]. This time scale is often enough for identifying and studying the conformational sub-states relevant to biological function. We have access to several MD simulations of important prote...

56 | Collective protein dynamics in relation to function. Curr Opin Struct Biol 2000;10:165–169
- Berendsen, Hayward
Citation Context: ...from the initial conformation, total energy, etc.), and identifying sub-states using techniques such as quasi-harmonic analysis [30],[39] and other Principal Components Analysis (PCA) based techniques [6]. Quasi-harmonic analysis, like all PCA-based methods, implicitly assumes that the frames are drawn from a multivariate Gaussian distribution. Our method makes the same assumption but differs from qua...

55 | Estimating Time-Varying Networks
- Kolar, Song, et al.
Citation Context: ...lecular dynamics simulation datasets, it is usually the case that the data contains many locally optimal sub-states, with several energy barriers and transitional sub-states between them. Kolar et al. [33] presented two methods to estimate the hidden structure and parameters of a time-varying network, both based on temporally smoothed L1-regularized logistic regression. In one model the assumption is that...

50 | Homeodomain-DNA recognition
- Gehring, Qian, et al.
- 1994
Citation Context: ...disease in humans [16]. Homeodomains fold into a highly conserved structure consisting of three alpha-helices wherein the C-terminal helix makes sequence-specific contacts in the major groove of DNA [22]. The Engrailed Homeodomain is an ultra-fast-folding protein that is predicted to exhibit significant amounts of helical structure in the denatured state ensemble [51]. Moreover, the experimentally de...

50 | Hilbert space embeddings of conditional distributions with applications to dynamical systems.
- Song, Smola, et al.
- 2009
Citation Context: ...e variable space into the feature space defined by the reproducing kernel, and the RKHS mappings defined for empirical and true expectations. To embed conditional distributions in RKH space, Song et al. [73] define a covariance operator on pairs of variables (i.e., D_XY = {(x1, y1), (x2, y2), ..., (xm, ym)}) as: C_{X,Y} = E_{X,Y}[φ(X) ⊗ φ(Y)] − µ_X ⊗ µ_Y, where ⊗ is the tensor product, the generalization o...

48 | The dynamic energy landscape of dihydrofolate reductase
- Boehr, Dyson, et al.
Citation Context: ...ited by the simulation, as well as the transitions between them. 4.1.1 Introduction: A system's ability to visit different sub-states is closely linked to important phenomena, including enzyme catalysis [7] and energy transduction [38]. For example, the primary sub-states associated with an enzyme might correspond to the unbound form, the enzyme-substrate complex, and the enzyme-product complex. The e...

44 | Learning latent tree graphical models.
- Choi, Tan, et al.
- 2011
Citation Context: ...Recently in [77], Song et al. proposed a method to perform structure learning for tree graphical models in RKH space. Their method is based on the structure learning method proposed by Choi et al. [13], where they use a tree metric to estimate a distance measure between node pairs, and use that value to select a tree via a minimum spanning tree algorithm [34][63]. According to [13], if there exists...

44 | On estimating regression. Theory Prob.
- Nadaraya
- 1964
Citation Context: ...ty estimation for a sample dataset X = {−2.1, −1.3, −0.4, 1.9, 5.1, 6.9}, using a Gaussian kernel with λ = 2.25. In addition to density estimation, kernel methods have been used for nonlinear regression [56],[86] as well. Roth [66] proposed sparse kernel regression, which uses the support vector method to solve the regression problem. (Figure 2.5: Kernel Density Estimation Example.) Given a data set D = {(...
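The kernel regression cited here ([56] is Nadaraya's estimator) has a very short form: the prediction at a query point is a kernel-weighted average of the training targets. A sketch with invented data and names:

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth):
    # Nadaraya-Watson kernel regression:
    # f(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w * y_train[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.normal(size=200)          # noisy sine curve
xq = np.array([np.pi / 2, np.pi, 3 * np.pi / 2])
preds = nadaraya_watson(xq, x, y, bandwidth=0.3)
print(preds)   # approximately [1, 0, -1]
```

The bandwidth plays the same bias-variance role as in kernel density estimation: small h tracks the noise, large h over-smooths.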

43 | Time Varying Undirected Graphs
- Zhou, Lafferty, et al.
Citation Context: ...els, their model is very slow to converge, and can only handle small discrete tables, not continuous variables. A continuous and parametric time-varying model was presented in 2008 by Zhou et al. [89]. They introduced a time-varying Gaussian graphical model, providing a smoothly varying structure and parameter estimation model, where a smoothing kernel was used to calculate weights in the weig...

35 | Dynamic Bayesian networks
- Murphy
- 2002
Citation Context: ...ck, and as small changes between blocks as possible. The group regularization technique is also presented in [32]. Another model to describe time-varying data is Dynamic Bayesian networks (DBNs) [23], [54]. DBNs are dynamic directed graphical models which describe sequential observations. The dynamics of the data are described through transition probabilities and states in the hidden variables of a DBN....

33 | A generative, probabilistic model of local protein structure
- Boomsma, Mardia, et al.
- 2008
Citation Context: ...and use Gibbs sampling for inference. All these models assume a fully connected graph structure. They also perform Gibbs sampling for inference, which becomes slow for larger models. Boomsma et al. [8] model a protein sequence as a dynamic Bayesian network, in which hidden states generate backbone angle pairs (φ and ψ, for each residue) from a bivariate von Mises distribution. Figure 2.3 shows the g...

33 | Hash kernels
- Shi, Petterson, et al.
- 2009
Citation Context: ... e^{−‖w‖₂²/2}, for the Gaussian kernel). Given the samples, the projected feature map is defined as z(x) = (1/√D)[cos(w_1′x), ..., cos(w_D′x), sin(w_1′x), ..., sin(w_D′x)]′. In other research, Shi et al. [71] introduce hash kernels. Hash kernels are built for high-dimensional feature spaces, such as string kernel features, where the features are usually words in the whole corpus. In hash kernels, a hash f...
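The z(x) feature map in the snippet can be checked numerically: inner products of random Fourier features approximate the kernel. A sketch assuming the Gaussian kernel k(x, y) = exp(−γ‖x − y‖²) with γ = 0.5, whose Fourier transform gives w ∼ N(0, 2γI); names and sizes are illustrative:

```python
import numpy as np

def random_fourier_features(X, D, gamma=0.5, seed=0):
    # Draw w ~ N(0, 2*gamma*I) and build
    # z(x) = (1/sqrt(D)) [cos(w_1'x) ... cos(w_D'x) sin(w_1'x) ... sin(w_D'x)],
    # so that z(x)'z(y) = (1/D) sum_d cos(w_d'(x - y)) ~ k(x, y).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, X.shape[1]))
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, D=5000)
approx = Z @ Z.T                                   # feature-space Gram matrix
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-0.5 * sq)                          # exact Gaussian kernel
print(np.abs(approx - exact).max())                # small approximation error
```

The approximation error shrinks like O(D^{-1/2}), which is what makes this a practical answer to the kernel-matrix scalability issue discussed above.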

30 | A multivariate regression approach to association analysis of a quantitative trait network
- Kim, Sohn, et al.
Citation Context: ...two regularization penalties, Lasso and group Lasso, to enforce sparse structure in each block and as small changes between blocks as possible. The group regularization technique is also presented in [32]. Another model to describe time-varying data is Dynamic Bayesian networks (DBNs) [23],[54]. DBNs are dynamic directed graphical models which describe sequential observations. The dynamics of the dat...

30 | Kernel belief propagation
- Song, Bickson, et al.
- 2011
Citation Context: ...via its torsion angles. These parametric models have the shortcoming that they lead to unimodal marginals for each variable, which is not realistic. Semi-parametric [43],[36] and non-parametric [74],[76] graphical models can model complex distributions with multi-modal and arbitrary marginals. In this thesis we will also focus on using these models for discovering the generative model of protein st...

29 | Fusion, propagation, and structuring in belief networks
- Pearl
- 1986
Citation Context: ...hical model. Inference in UGMs corresponds to evaluating the probability of any query term, given some observations. There are several algorithms for performing the inference. Belief Propagation (BP) [60] is currently the most common approach to calculating the partition function. Belief Propagation, also known as the sum-product algorithm, is a message-passing algorithm in which variables send messages to...

25 | Rodeo: sparse, greedy nonparametric regression.
- Lafferty, Wasserman
- 2008
Citation Context: ...tween the variables are always acyclic, especially in applications such as computational molecular biology. They propose alternative structure learning methods based on nonparametric sparse greedy regression [35], which they have not yet tested in this context. 2.4.3 Nonparametric Kernel Space Embedded Graphical Models: Kernel-based methods have a long history in statistics and machine learning. Kernel density...

25 | Time-Varying Dynamic Bayesian Networks
- Song, Kolar, et al.
Citation Context: ...een the hidden variables is fixed, and it is their states that control the dynamic model. For large time-varying networks such as protein structural data, these systems become infeasible. Song et al. [75] presented time-varying dynamic Bayesian networks, which are similar to the discrete time-varying graphical model, based on a smoothly varying structure designed using a kernel-reweighted loss ...

24 | Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data
- Mardia, Taylor, et al.
- 2007
Citation Context: ...variant, is generally preferred because it only requires five parameters and is easily expandable to more than two variables, as will be demonstrated in the next section. Mardia et al. provide bivariate [47] and also multivariate [48] von Mises models applied to protein angles, and provide an algorithm based on the full pseudo-likelihood to perform parameter estimation [25]. Their formulation of pseudo-likelih...

22 | Protein folding and unfolding in microsecond to nanoseconds by experiment and simulation
- Mayor, Johnson, et al.
- 2000
Citation Context: ...folding protein that is predicted to exhibit significant amounts of helical structure in the denatured state ensemble [51]. Moreover, the experimentally determined unfolding rate is 1.1×10³/sec [50], which is also fast. Taken together, these observations suggest that the protein may exhibit substantial conformational fluctuations. We performed three 50-microsecond simulations of the protein at 3...

21 | Protein dynamics and enzymatic catalysis: Investigating the peptidyl-prolyl cis/trans isomerization activity of cyclophilin A
- Agarwal, Geist, et al.
Citation Context: ...; (ii) the dipeptide Ala-Pro (PDB ID: 2CYH); and (iii) the tetra-peptide Ala-Ala-Pro-Phe (PDB ID: 1RMH). Previous studies have identified a set of 25 highly conserved residues in the cyclophilin family [3]. In particular, residues P30, T32, N35, F36, Y48, F53, H54, R55, I57, F60, M61, Q63, G65, F83, E86, L98, M100, T107, Q111, F112, F113, I114, L122, H126, F129 are all highly conserved. Experimental...

21 | Quasi-harmonic method for studying very low frequency modes in proteins
- Levy, Srinivasan, et al.
- 1984
Citation Context: ...atistics (e.g., the radius of gyration, root-mean-squared difference from the initial conformation, total energy, etc.), and identifying sub-states using techniques such as quasi-harmonic analysis [30],[39] and other Principal Components Analysis (PCA) based techniques [6]. Quasi-harmonic analysis, like all PCA-based methods, implicitly assumes that the frames are drawn from a multivariate Gaussian dist...

20 | Anton, a special-purpose machine for molecular dynamics simulation
- Shaw, et al.
- 2007
Citation Context: ...r a set of atoms. Conformational frames are written to disk as a trajectory for subsequent analysis. Current hardware allows MD simulations to continue up to milliseconds for many proteins [10],[58],[70],[79],[62]. This time scale is often enough for identifying and studying the conformational sub-states relevant to biological function. We have access to several MD simulations of important proteins a...

17 | Catalysis of cis/trans isomerization in native HIV-1 capsid by human cyclophilin
- Bosco, Eisenmesser, et al.
- 2002
Citation Context: ...particular, residues P30, T32, N35, F36, Y48, F53, H54, R55, I57, F60, M61, Q63, G65, F83, E86, L98, M100, T107, Q111, F112, F113, I114, L122, H126, F129 are all highly conserved. Experimental work [9] and MD simulations [2, 3] have also implicated these residues as forming a network that influences the substrate isomerization process. Significantly, this network extends from the flexible surface re...

17 | A multivariate von Mises distribution with applications to bioinformatics.
- Mardia, Hughes, et al.
- 2008
Citation Context: ...rred because it only requires five parameters and is easily expandable to more than two variables, as will be demonstrated in the next section. Mardia et al. provide bivariate [47] and also multivariate [48] von Mises models applied to protein angles, and provide an algorithm based on the full pseudo-likelihood to perform parameter estimation [25]. Their formulation of pseudo-likelihood is based on the fact t...

15 | Cis/trans isomerization in HIV-1 capsid protein catalyzed by cyclophilin A: Insights from computational and theoretical studies
- Agarwal
Citation Context: ...P30, T32, N35, F36, Y48, F53, H54, R55, I57, F60, M61, Q63, G65, F83, E86, L98, M100, T107, Q111, F112, F113, I114, L122, H126, F129 are all highly conserved. Experimental work [9] and MD simulations [2, 3] have also implicated these residues as forming a network that influences the substrate isomerization process. Significantly, this network extends from the flexible surface regions of the protein to t...

15 | Kernel embeddings of latent tree graphical models.
- Song, Xing
- 2011
Citation Context: ...e we sometimes have a few thousand samples, there is a scalability issue, which we discuss in the next chapter and propose to solve. Tree Structure Learning for RKHS Tree Graphical Models: Recently, in [77], Song et al. proposed a method to perform structure learning for tree graphical models in RKH space. Their method is based on the structure learning method proposed by Choi et al. [13], where they...

14 |
Nonparametric tree graphical models
- Song, Gretton, et al.
(Show Context)
Citation Context ...ified via its torsion angles. These parametric models have the shortcoming that they lead to unimodal marginals for each variable, which is not realistic. Semi-parametric[43],[36], and non-parametric =-=[74]-=-,[76] graphical models can model complex distributions with multi-modal and arbitrary marginals. In this thesis we will also focus on using these models for discovering the generative model of prote...

13 |
Beyond rotamers: a generative, probabilistic model of side chains in proteins.
- Harder, Boomsma, et al.
- 2010
(Show Context)
Citation Context ... methods of structure prediction in discrete domain to the data. In protein structure modeling, side chain conformations are usually categorized into some specific discrete states called Rotamers [29]=-=[24]-=-. In practice, this approach has several shortcomings: discretization of a continuous value into very large bins introduces inaccuracies into the data. One can avoid this by increasing the granularity...

13 |
Energy flow in proteins
- Leitner
(Show Context)
Citation Context ...well as the transition between them. 4.1.1 Introduction A system’s ability to visit different sub-states is closely linked to important phenomena, including enzyme catalysis[7] and energy transduction=-=[38]-=-. For example, the primary sub-states associated with an enzyme might correspond to the unbound form, the enzyme-substrate complex, and the enzyme-product complex. The enzyme moves between these su...

12 | Statistical and computational tradeoffs in stochastic composite likelihood
- Dillon, Lebanon
- 2009
(Show Context)
Citation Context ...estimator is consistent provided that the mapping between conditional probabilities and joint probability is injective, i.e. the joint probability can be uniquely specified by the set of conditionals =-=[17]-=-. This property does hold true for the von Mises distribution. Proof: Consider two conditionals with different parameters (⃗κ*_1 and ⃗κ*_2, and ⃗µ*_1 and ⃗µ*_2), which have the same conditional distributions. [I0...

11 |
distributions via polya urn schemes
- Ferguson
- 1973
(Show Context)
Citation Context ...trajectory is proportional to the number of samples drawn from the model. As the next step, we propose to perform the clustering of the data non-parametrically, based on Dirichlet process (DP) models =-=[14]-=-, so as to solve the issue of model selection. Dirichlet processes are probabilistic processes that allow a potentially infinite number of clusters[65] to be defined in the model, but for a given dataset, will co...
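The finite-clusters-in-practice behavior can be seen from the Chinese restaurant process, a standard constructive view of the Dirichlet process. A sketch (the function name and parameter values are our own, purely illustrative):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample table sizes for n customers from the Chinese restaurant
    process with concentration alpha (illustrative sketch)."""
    random.seed(seed)
    counts = []  # customers seated at each table
    for i in range(n):
        # new table with prob alpha/(i+alpha), else join an existing
        # table with prob proportional to its current size
        r = random.uniform(0, i + alpha)
        if r < alpha or not counts:
            counts.append(1)            # open a new table
        else:
            r -= alpha
            for t, c in enumerate(counts):
                r -= c
                if r < 0:
                    counts[t] += 1      # join existing table t
                    break
            else:
                counts[-1] += 1         # guard against float rounding

    return counts

tables = crp_partition(1000, alpha=2.0)
```

Although a new table is always possible, the number of occupied tables grows only roughly like alpha * log(n), so any finite dataset ends up in a finite set of clusters.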

11 |
Sparse kernel regressors
- Roth
(Show Context)
Citation Context ... dataset X = {−2.1, −1.3, −0.4, 1.9, 5.1, 6.9}, using a Gaussian kernel with λ = 2.25. In addition to density estimation, kernel methods have been used for nonlinear regression[56], [86], as well. Roth =-=[66]-=- proposed sparse kernel regression, which uses the support vector method to solve the regression problem. Figure 2.5: Kernel Density Estimation Example. Given a data set D = {(x1, y1), (x2, y2), ..., (xN...
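The kernel density estimate on this example dataset can be sketched as follows, treating λ as the Gaussian bandwidth (an assumption on our part; the function names are ours, not from the cited work):

```python
import math

def gaussian_kde(x, data, bandwidth):
    """Kernel density estimate at x: the average of Gaussian bumps
    centered on the data points (illustrative helper)."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) / norm
               for xi in data) / len(data)

X = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.9]
density_at_zero = gaussian_kde(0.0, X, bandwidth=2.25)
```

Because each bump integrates to one and the estimate is their average, the resulting density also integrates to one.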

10 | Infinite dynamic Bayesian networks.
- Doshi, Wingate, et al.
- 2011
(Show Context)
Citation Context ...us variables before modeling. As we saw in previous sections, discretization creates several problems, including the large number of parameters to estimate, which leads to over-fitting. Doshi et al.=-=[18]-=- present the infinite dynamic Bayesian network model as one solution, where hidden variables, their hidden states, and the transitions and emissions are generated nonparametrically from the data, via ...

7 |
A bayesian statistics approach to multiscale coarse graining
- Liu, Shi, et al.
(Show Context)
Citation Context ...ctions, which is also used to produce low-complexity models. The use of regularization is common in Statistics and in Machine Learning, but it has only recently been applied to Molecular Dynamics data=-=[44]-=- [45]. Previous applications focus on the problem of learning the parameters of force-fields for coarse-grained models, and rely on a Bayesian prior, in the form of inverse-Wishart distribution[44], o... |

7 |
Efficient scalable algorithms for multiscale coarse-graining
(Show Context)
Citation Context ...s, which is also used to produce low-complexity models. The use of regularization is common in Statistics and in Machine Learning, but it has only recently been applied to Molecular Dynamics data[44] =-=[45]-=-. Previous applications focus on the problem of learning the parameters of force-fields for coarse-grained models, and rely on a Bayesian prior, in the form of inverse-Wishart distribution[44], or a G... |

5 |
Computational studies of the mechanism of cis/trans isomerization in HIV-1 catalyzed by cyclophilin A. Proteins: Struct
- Agarwal
(Show Context)
Citation Context ...c and substrate-specific couplings, all of which are automatically discovered by the method. We have discovered that over the course of the reaction, the network regions as identified by previous work=-=[1]-=- couple directly to the active site residues (see Fig. 4.4). The method is also able to pick out subtle changes in the dynamics as seen by the edges that appear in substrate-specific couplings (see Fi... |

5 |
Sparse Nonparametric Graphical Models. ArXiv e-prints
- Lafferty, Liu, et al.
- 2012
(Show Context)
Citation Context ...protein structure is specified via its torsion angles. These parametric models have the shortcoming that they lead to unimodal marginals for each variable, which is not realistic. Semi-parametric[43],=-=[36]-=-, and non-parametric [74],[76] graphical models can model complex distributions with multi-modal and arbitrary marginals. In this thesis we will also focus on using these models for discovering the ...

5 |
The denatured state of engrailed homeodomain under denaturing and native conditions.
- Mayor, Grossmann, et al.
- 2003
(Show Context)
Citation Context ...tacts in the major groove of DNA [22]. The Engrailed Homeodomain is an ultra-fast folding protein that is predicted to exhibit significant amounts of helical structure in the denatured state ensemble =-=[51]-=-. Moreover, the experimentally determined unfolding rate is 1.1×10³/sec [50], which is also fast. Taken together, these observations suggest that the protein may exhibit substantial conformationa...

5 |
Smooth regression analysis. Sankhyā: The Indian Journal of Statistics
- Watson
- 1964
(Show Context)
Citation Context ...imation for a sample dataset X = {−2.1, −1.3, −0.4, 1.9, 5.1, 6.9}, using a Gaussian kernel with λ = 2.25. In addition to density estimation, kernel methods have been used for nonlinear regression[56], =-=[86]-=-, as well. Roth [66] proposed sparse kernel regression, which uses the support vector method to solve the regression problem. Figure 2.5: Kernel Density Estimation Example. Given a data set D = {(x1, y1...

4 |
From zero to reproducing kernel Hilbert spaces in twelve pages or less
- Daumé III, Hal
- 2004
(Show Context)
Citation Context ...space, F, a Hilbert space requires that the result of the dot product be in F as well. For example, the space of vectors in ℜⁿ is a Hilbert space, since the dot product of any two elements is in ℜ =-=[15]-=-. Reproducing kernel Hilbert space is a Hilbert space defined over a reproducing kernel function. Reproducing kernels are the family of kernels that define a dot product function space, which allows a...

4 | Learning sparse markov network structure via ensembleof-trees models
- Lin, Zhu, et al.
- 2009
(Show Context)
Citation Context ... arbitrary distributions, we also propose to perform sparse structure learning in the RKHS space in section 3.2. We will use existing methods such as neighborhood selection[52], tree-ensemble learning=-=[42]-=-, as well as non-parametric regression models[35], to perform sparse structure learning. We present our preliminary results using the neighborhood selection method over the Engrailed protein dataset. Current R...
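Neighborhood selection[52] can be sketched with a small coordinate-descent Lasso: regress each variable on all the others and connect a pair whenever a coefficient is nonzero. Everything below (the names, the OR rule for combining the two regression directions, and the toy data) is our own illustration, not the thesis code:

```python
import random

def soft_threshold(rho, alpha):
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso(rows, y, alpha, sweeps=100):
    """Tiny coordinate-descent Lasso for the objective
    (1/2n)||y - X b||^2 + alpha * ||b||_1 (no intercept)."""
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho, zj = 0.0, 0.0
            for i in range(n):
                partial = sum(beta[k] * rows[i][k] for k in range(p) if k != j)
                rho += rows[i][j] * (y[i] - partial)
                zj += rows[i][j] ** 2
            beta[j] = soft_threshold(rho / n, alpha) / (zj / n)
    return beta

def neighborhood_selection(data, alpha):
    """Edge (i, j) whenever either Lasso regression gives a nonzero
    coefficient (an OR rule; one of several conventions)."""
    p = len(data[0])
    edges = set()
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        coef = lasso([[row[k] for k in rest] for row in data],
                     [row[j] for row in data], alpha)
        for idx, k in enumerate(rest):
            if abs(coef[idx]) > 1e-8:
                edges.add(tuple(sorted((j, k))))
    return edges

random.seed(1)
z = [random.gauss(0, 1) for _ in range(200)]
# toy data: variables 0 and 1 strongly coupled, variable 2 independent
data = [[z[i], z[i] + 0.1 * random.gauss(0, 1), random.gauss(0, 1)]
        for i in range(200)]
edges = neighborhood_selection(data, alpha=0.2)
```

The L1 penalty zeroes out the weak, spurious coefficients, so only the genuinely coupled pair is connected.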

3 |
Multivariate and time series models for circular data with applications to protein conformational angles
- Hughes
(Show Context)
Citation Context ...on. Mardia et al. provide bivariate[47] and also multivariate[48] von Mises models applied to protein angles, and provide an algorithm based on full pseudo-likelihood to perform parameter estimation =-=[25]-=-. Their formulation of pseudo-likelihood is based on the fact that the univariate conditionals of the multivariate von Mises distribution have closed form, and thus can be optimized using gradient descent. Th...

2 |
Missense mutations of human homeoboxes: A review
- D'Elia, Tell, Lonigro, Damante, et al.
- 2001
(Show Context)
Citation Context ...A-binding domains of the homeotic proteins, called homeodomains (HD), play an important role in the development of all metazoans [21] and certain mutations to HDs are known to cause disease in humans =-=[16]-=-. Homeodomains fold into a highly conserved structure consisting of three alpha-helices wherein the C-terminal helix makes sequence-specific contacts in the major groove of DNA [22]. The Engrailed Hom... |

2 |
Backbone-dependent rotamer library for proteins: Application to side-chain prediction
- Dunbrack Jr, Karplus
- 1993
(Show Context)
Citation Context ...ting methods of structure prediction in discrete domain to the data. In protein structure modeling, side chain conformations are usually categorized into some specific discrete states called Rotamers =-=[29]-=-[24]. In practice, this approach has several shortcomings: discretization of a continuous value into very large bins introduces inaccuracies into the data. One can avoid this by increasing the granula...

2 |
Support Vector Machines. Talk at
- Lin
- 2006
(Show Context)
Citation Context ...re space into which the data is projected is an infinite dimensional space based on the Taylor expansion of the RBF kernel function, φ(x) = e^{−λx²} [1, √(2λ/1!) x, √((2λ)²/2!) x², √((2λ)³/3!) x³, ...] =-=[40]-=-, and kRBF(x, y) is equal to the dot product of φ(x) and φ(y). Usually we use such feature spaces in algorithms which only use the dot product of the two feature vectors, and never use one feature ve...
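The claim that the RBF kernel equals the dot product of its Taylor-expansion feature maps can be checked numerically with a truncated φ (a sketch; the truncation depth and the names are our choices):

```python
import math

def rbf_kernel(x, y, lam):
    # kRBF(x, y) = exp(-lam * (x - y)^2)
    return math.exp(-lam * (x - y) ** 2)

def phi(x, lam, depth):
    """Truncated Taylor feature map:
    phi_j(x) = exp(-lam x^2) * sqrt((2 lam)^j / j!) * x^j"""
    return [math.exp(-lam * x * x)
            * math.sqrt((2 * lam) ** j / math.factorial(j)) * x ** j
            for j in range(depth)]

x, y, lam = 0.7, -0.3, 0.5
approx = sum(a * b for a, b in zip(phi(x, lam, 30), phi(y, lam, 30)))
exact = rbf_kernel(x, y, lam)
```

The identity follows from exp(-lam (x-y)^2) = exp(-lam x^2) exp(-lam y^2) exp(2 lam x y) and the Taylor series of the last factor; 30 terms already match to high precision for these values.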

1 |
Time-varying Gaussian graphical models of molecular dynamics data
- Razavian, Moitra, Kamisetty, Ramanathan, Langmead
(Show Context)
Citation Context ...List of Figures: 2.1 Markov Random Field Example; 2.2 Backbone and side-chain dihedral angles of a di-peptide Lys-Ala protein. Picture from =-=[57]-=-; 2.3 Generative model for Dynamic Bayesian Network with von Mises emissions; 2.4 Histogram and Gaussian distributi...

1 | An overview of nonparametric bayesian models and applications to natural language processing
- Sharif-razavian, Zollmann
- 2009
(Show Context)
Citation Context ...e clusters[65] to be defined in the model, but for a given dataset, will converge to a finite set of clusters. We have previously investigated Dirichlet processes and hierarchical Dirichlet processes =-=[69]-=-, and propose to create local optima sub-state and transition sub-state models, based on the initial clustering of the data, and weighted log likelihood. Describing transition states in the folding is... |