Computational methods for comparing and integrating multiple probing assays to predict RNA secondary structure

(1)

HAL Id: hal-01558227

https://hal.archives-ouvertes.fr/hal-01558227

Submitted on 7 Jul 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Computational methods for comparing and integrating multiple probing assays to predict RNA secondary

structure

Afaf Saaidi, Delphine Allouche, Yann Ponty, Bruno Sargueil, Mireille Regnier

To cite this version:

Afaf Saaidi, Delphine Allouche, Yann Ponty, Bruno Sargueil, Mireille Regnier. Computational methods for comparing and integrating multiple probing assays to predict RNA secondary structure . Doctorial journey Interface, Ecole polytechnique, Palaiseau, Nov 2016, Palaiseau, France. 10. �hal- 01558227�

(2)

Computational methods for comparing

and integrating multiple probing assays to predict RNA secondary structure

Afaf Saaidi

^1,2,3

, Delphine Allouche

^3,4

Supervisors: Yann Ponty

^1,2,3

, Bruno sargueil

^3,4

, Mireille Regnier

^1,2

1-AMIB team, Inria Saclay 2-LIX ,3-CNRS ,4-Laboratoire de cristallographie et RMN Biologiques, Paris Descartes

1-Introduction

• RNA is key to understand many biological processes.

• RNA maintains a stable tertiary structure.

• The determination of the structure allows understanding its operating mechanism.

• We study the 444nt long VIH1 Gag-IRES.

RNA Structure determination

• 3D structure can be resolved experimentally [remains expensive and time-consuming].

• Computational methods allows to have accurate

secondary structure predictions (PPV ≈ 75%). Less accurate predictions for long RNA.

• + Experimental Data [Chemical(SHAPE) \ Enzymatic]

improve predictions.

State of the art

Evolution of computational approaches to predict the RNA secondary structure:

1981

Thermo

dynamic

model

[4]

1990

Partition

function

[McCaskill]

1999

Structures

sampling

app roach

2003

Sto

chastic samp

ling [2]

2004

Exp

erimental data

probing

2009

SHAPE data

as

pseudo-free

energy [1]

2016

Ongoing

work

>

Objectives

• Apply sampling approach with SHAPE data to predict the RNA secondary structure.

2-Material & Methods 2-1 Experimental data

SHAPE-Map experiments High Throughput Sequencing

SHAPE reactivity calculation

Reactivity(n) = mut_Shape(n) − mut_Control(n)

mut_Denatured(n) [3] V-Enzymatic cleavage targets paired nucleotides.

T-Enzymatic cleavage reveals unpaired nucleotides.

Boltzmann probability to observe a structure S: P (S) = e^−E^kT^S

Z .

with Z the partition function:Z = ^X

S⁰∈S e

−ES0 kT .

For a cluster C

• Coherence ≡ Base pairs Mean Distance in C.

• Diversity ≡ Number of present conditions.

• Stability ≡ Sum of Boltzmann probabilities.

2-2 Sampling/Clustering workflow

experimental data from different conditions

A B C D

Data processing

A B C D

Structure sampling

***** ***** ***** *****

Set of ensemble structures

** ** ** ~~** ** ~~**

** ** ~~** ** ** **

Clustering[Affinity propagation]

** ****** **** *****

Optimal clusters?

Diversity

Maximization –> Pareto Frontier

Optimal Centroid Structures

Stability Coherence

3-Results

Optimal centroid structures from 140. [8000 structures]

A UG GG U GC G A G A G C G U C G G U A U U A A G CGGG G GA G A A U U A G A U A A A U GG GA A A A A A U U C G GUUAA G G C CA GG GG GA A A G A A A C A A U A U A A A C U A A A ACA U A U A G U A U G G GCA A GCA G GGA G C U A G A A C G A U U C GCA G U U A A U C C U GG C CU U U U A G A G A C A U C A G A A G G C UGU A G A C A A A U A C U G G G A C A GCU A C A A C C AUC C C U U C A G A C A G G A U C A G A A G A A CUU A GAUCA U U AUA U A A U A C A A U A G C A G U C C U C U A U U G UGU G CA UC A A A G G A U A G A U G UAA A A G ACA C C A A G G A A G C C U U A G A U A A G A U AG A G GA A G A G C A A AAC A A A A G U A A G A A A A A G G C ACA GCA A GC AGCA GCU G A C A C A G G A A A C A ACA GCC A G G UCA GC CA A A A U U A C C C U A U A G U G C A G A A C C U C C A G G G G C A A AU G G UA C A U C A G G CCA U A UCA

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 444

-5.02.4060000000000006

HIV

4-Conclusion & perspectives

• We have obtained a set of models supported by our integrative approach, those models are subject to validation.

• Some of centroid structures have shown high

compatibility with existing proposed structural models.

• We will extend the approach to the simultaneous analysis of probing data for a set of RNA variants.

References

[1] Katherine E. Deigan, Tian W. Li, David H. Mathews, and Kevin M. Weeks.

Accurate SHAPE-directed RNA structure determination.

106(1):97–102, jan 2009.

[2] Lawrence CE. Ding.

A statistical sampling algorithm for RNA secondary structure prediction.

Nucleic Acids Research, 31(24):7280–7301, 2003.

[3] Matthew J Smola, Greggory M Rice, Steven Busan, Nathan A Siegfried, and Kevin M Weeks.

Selective 2’-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

Nature protocols, 10(11):1643–1669, 2015.

[4] Michael Zuker and Patrick Stiegler.

Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information.

Nucleic Acids Research, 9(1):133–148, jan 1981.

Acknowledgments

PhD funded by the "Fondation pour la Recherche Médicale".

Contact address

saaidi.afaf@polytechnique.edu