Backbone dynamics alignment of proteins: Harnessing biophysical fingerprints in pairwise sequence alignment
Bhawna Dixit 1,2,3,* and Wim Vranken 2,3,4,*
1 IBiTech–BioMMeda Group, Universiteit Gent, Belgium
2 Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Belgium
3 Structural Biology Brussels, Vrije Universiteit Brussel, Belgium
4 VIB Structural Biology Research Centre, Belgium
E-mail: bhawna.dixit@vub.be, wim.vranken@vub.be
Abstract
Background & Motivation
Proposed Alignment Method
Results
Conclusions
Proteins must conserve key biophysical characteristics in order to function correctly. Previous work has shown that patterns of coarse per-residue biophysical features, such as backbone and sidechain dynamics, disordered regions, and early folding regions, can be conserved in sequences (1). We propose a method for aligning two protein sequences based on these biophysical properties, in our case predicted by the Bio2Byte tools (https://www.bio2byte.be/b2btools/): DynaMine for backbone rigidity, EFoldMine for early folding propensity, and DisoMine for disorder tendency, each giving a predicted per-residue value between 0 and 1.
References
1. Orlando, Gabriele, et al. "Rigapollo, a HMM-SVM based approach to sequence alignment." Proceedings of the European Conference on Computational Biology (2016).
2. Orlando, Gabriele, et al. "Prediction of disordered regions in proteins with recurrent Neural Networks and protein dynamics." bioRxiv (2020).
3. Cilia, Elisa, et al. "The DynaMine webserver: predicting protein dynamics from sequence." Nucleic Acids Research 42.W1 (2014): W264-W270.
4. Cilia, Elisa, et al. "From protein sequence to dynamics and disorder with DynaMine." Nature Communications 4.1 (2013): 1-10.
5. Raimondi, Daniele, et al. "Exploring the sequence-based prediction of folding initiation sites in proteins." Scientific Reports 7.1 (2017): 1-11.
6. Xiong, Huiling, et al. "Using generalized procrustes analysis (GPA) for normalization of cDNA microarray data." BMC Bioinformatics 9.1 (2008): 1-13.
7. Cline, Melissa, Richard Hughey, and Kevin Karplus. "Predicting reliable regions in protein sequence alignments." Bioinformatics 18.2 (2002): 306-314.
Conclusions
• Leverages the information in amino acid residues by aligning their biophysical features.
• Incorporates the biophysical effect of neighbouring residues (e.g. APSRK).
• A multidimensional method in which other biophysical features (e.g. helix/coil propensity) can be added or removed.
• The sliding window size can be increased or decreased.
• A promising substitute for sequence alignment based on general amino acid similarity.
• Gap-penalty optimization and alignment quality assessment are still required.
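The sliding-window idea in the bullets above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `feature_windows`, the feature labels, the example values, and the choice to repeat the terminal residue at the sequence ends are all assumptions made for the example. Adding or removing a feature only changes the number of columns, and `half_width` controls the window size.

```python
from typing import Dict, List, Sequence


def feature_windows(
    profiles: Dict[str, Sequence[float]], half_width: int = 2
) -> List[List[List[float]]]:
    """Return one (2*half_width + 1) x n_features window per residue.

    `profiles` maps a feature name to its per-residue predictions (0 to 1).
    Columns follow the alphabetical order of the feature names.
    """
    names = sorted(profiles)
    lengths = {len(profiles[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all feature profiles must have the same length")
    n_res = lengths.pop()

    # rows[i] is the feature vector of residue i.
    rows = [[float(profiles[name][i]) for name in names] for i in range(n_res)]

    windows = []
    for i in range(n_res):
        window = []
        for j in range(i - half_width, i + half_width + 1):
            # Assumption: clamp at the termini by repeating the end residue.
            k = min(max(j, 0), n_res - 1)
            window.append(rows[k])
        windows.append(window)
    return windows


# Hypothetical predictions for a five-residue stretch such as APSRK.
example_profiles = {
    "backbone": [0.9, 0.8, 0.7, 0.6, 0.5],
    "disorder": [0.1, 0.2, 0.3, 0.4, 0.5],
    "early_folding": [0.3, 0.3, 0.4, 0.2, 0.1],
}
example_windows = feature_windows(example_profiles, half_width=1)
```

Each residue is then represented by a small matrix that also carries the values of its neighbours, which is what allows two residues to be compared by the shape of their local biophysical profile rather than by amino acid identity.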
Background & Motivation
Low sequence similarity
High structure similarity
• Proteins can contain intrinsically disordered regions, and divergent protein families may share the same fold yet have very different functions.
• It is difficult to assess the relationship between proteins when sequence identity is low, especially when structure information is lacking.
Proposed Alignment Method
Generalized Procrustes analysis
• A multivariate statistical method, widely applied in shape analysis, that finds the optimal superimposition of two or more configurations (6).
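For the two-configuration case, the superimposition described above can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the poster's implementation: the function name `procrustes_disparity` and the example windows are hypothetical, and each configuration is assumed to be a window of residues (rows) by biophysical features (columns). Both windows are centred and scaled to unit norm, and the disparity is the residual sum of squares after the best orthogonal transformation and scaling, which should correspond to the standardised disparity reported by SciPy's `scipy.spatial.procrustes`.

```python
import numpy as np


def procrustes_disparity(x, y) -> float:
    """Procrustes disparity between two equally shaped 2-D configurations.

    Returns a value in [0, 1]; 0 means the shapes superimpose exactly
    after translation, uniform scaling, and rotation/reflection.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("both inputs must be 2-D arrays of the same shape")

    # Remove translation.
    x0 = x - x.mean(axis=0)
    y0 = y - y.mean(axis=0)

    # Remove scale (unit Frobenius norm).
    nx, ny = np.linalg.norm(x0), np.linalg.norm(y0)
    if nx == 0.0 or ny == 0.0:
        raise ValueError("a window with identical rows has no shape to compare")
    x0 /= nx
    y0 /= ny

    # With unit-norm inputs, the optimal residual is 1 - (sum of singular values)^2.
    singular_values = np.linalg.svd(x0.T @ y0, compute_uv=False)
    return float(min(1.0, max(0.0, 1.0 - singular_values.sum() ** 2)))


# Hypothetical 5-residue windows with three features each.
window_a = np.array([[0.9, 0.1, 0.8], [0.8, 0.2, 0.7], [0.6, 0.5, 0.4],
                     [0.7, 0.3, 0.6], [0.9, 0.1, 0.9]])
window_b = 0.5 * window_a + 0.2  # same shape, different scale and offset
disparity_ab = procrustes_disparity(window_a, window_b)  # close to 0
```

Because translation and scale are removed, two windows with the same pattern of variation score as similar even when their absolute predicted values differ. A completely flat window has no shape, so this sketch raises an error for it; how such windows should be handled in practice is left open here.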
Results
• The triosephosphate isomerase (TIM)-barrel fold contains 33 superfamilies and 101 families in the SCOP database and is the most common ancestral fold.
• Ten protein sequences with the TIM-barrel fold were selected for constructing biophysical test alignments. For these sequences, a structure-based alignment is available as the 'ground truth' or reference alignment.
• The Procrustes disparity matrices show high-similarity patterns for the majority of proteins. However, after alignment construction, only 15% of the proteins show high biophysical sequence similarity in agreement with the structure alignment.
• The reason is that the test alignments are shifted by a few residues, which can be assessed with the shift score. The shift score gives a measure of alignment error (including misaligned, under-aligned and over-aligned residues) (7).
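The notion of a test alignment being "shifted by a few residues" relative to the reference can be illustrated with a simplified per-residue shift. This is not the full shift score of (7), which additionally penalises under-aligned and over-aligned residues and normalises the result. The function names, the use of `-` as the gap character, and the toy sequences are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def aligned_partners(row_a: str, row_b: str, gap: str = "-") -> Dict[int, Optional[int]]:
    """Map each residue index of sequence A to its aligned residue index
    in sequence B, or None when the residue is aligned to a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    partners: Dict[int, Optional[int]] = {}
    i = j = 0  # running residue indices in A and B
    for char_a, char_b in zip(row_a, row_b):
        if char_a != gap:
            partners[i] = j if char_b != gap else None
            i += 1
        if char_b != gap:
            j += 1
    return partners


def residue_shifts(test: Tuple[str, str], reference: Tuple[str, str]) -> Dict[int, int]:
    """Per-residue shift (test partner minus reference partner) for every
    residue of A that is aligned to a residue of B in both alignments."""
    in_test = aligned_partners(*test)
    in_ref = aligned_partners(*reference)
    if in_test.keys() != in_ref.keys():
        raise ValueError("both alignments must contain the same sequence A")
    return {
        i: in_test[i] - in_ref[i]
        for i in in_ref
        if in_test[i] is not None and in_ref[i] is not None
    }


# Toy example: the test alignment is offset by one position.
reference_alignment = ("ACDEFG", "ACDEFG")
test_alignment = ("ACDEFG-", "-ACDEFG")
shifts = residue_shifts(test_alignment, reference_alignment)
# Residues 1 to 5 of A each have shift -1; residue 0 is unaligned in the test.
```

A test alignment that is consistently off by one or two positions gives small non-zero shifts for most residues, even though a strict column-by-column comparison with the reference would count every one of them as wrong.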