
Computational methods for comparing and integrating multiple probing assays to predict RNA secondary structure


Academic year: 2021


HAL Id: hal-01558227

https://hal.archives-ouvertes.fr/hal-01558227

Submitted on 7 Jul 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Computational methods for comparing and integrating multiple probing assays to predict RNA secondary structure

Afaf Saaidi, Delphine Allouche, Yann Ponty, Bruno Sargueil, Mireille Regnier

To cite this version:

Afaf Saaidi, Delphine Allouche, Yann Ponty, Bruno Sargueil, Mireille Regnier. Computational methods for comparing and integrating multiple probing assays to predict RNA secondary structure. Doctorial journey Interface, Ecole polytechnique, Palaiseau, Nov 2016, Palaiseau, France. 10. ⟨hal-01558227⟩


Afaf Saaidi (1,2,3), Delphine Allouche (3,4)

Supervisors: Yann Ponty (1,2,3), Bruno Sargueil (3,4), Mireille Regnier (1,2)

1-AMIB team, Inria Saclay, 2-LIX, 3-CNRS, 4-Laboratoire de cristallographie et RMN Biologiques, Paris Descartes

1-Introduction

RNA is key to understanding many biological processes.

RNA maintains a stable tertiary structure.

Determining this structure helps to understand how the RNA operates.

We study the 444-nt-long HIV-1 Gag-IRES.

RNA Structure determination

The 3D structure can be resolved experimentally, but this remains expensive and time-consuming.

Computational methods give accurate secondary structure predictions (PPV ≈ 75%), but predictions are less accurate for long RNAs.

Adding experimental data (chemical probing such as SHAPE, or enzymatic probing) improves predictions.
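The PPV quoted above is the fraction of predicted base pairs that also occur in a reference structure. A minimal sketch of that computation follows, assuming structures are given as pseudoknot-free dot-bracket strings; the function names are hypothetical and are not taken from the poster.

```python
def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), 0-based, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def ppv_and_sensitivity(predicted, reference):
    """PPV = TP / predicted pairs; sensitivity = TP / reference pairs."""
    pred = parse_dot_bracket(predicted)
    ref = parse_dot_bracket(reference)
    true_positives = len(pred & ref)
    ppv = true_positives / len(pred) if pred else 0.0
    sensitivity = true_positives / len(ref) if ref else 0.0
    return ppv, sensitivity
```

For instance, a prediction with two pairs of which one is in the reference has a PPV of 0.5.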

State of the art

Evolution of computational approaches to predict the RNA secondary structure:

1981: Thermodynamic model [4]

1990: Partition function [McCaskill]

1999: Structures sampling approach

2003: Stochastic sampling [2]

2004: Experimental data probing

2009: SHAPE data as pseudo-free energy [1]

2016: Ongoing work

Objectives

Apply a sampling approach guided by SHAPE data to predict the RNA secondary structure.
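One common way to combine SHAPE data with a thermodynamic model is the pseudo-free energy term of reference [1], ΔG_SHAPE(i) = m ln(reactivity(i) + 1) + b. The sketch below uses the default parameters published in [1] (m = 2.6, b = -0.8 kcal/mol); the poster does not state which parameters were used, so these values, the function name, and the handling of missing or negative reactivities are assumptions.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy (kcal/mol) for one nucleotide, after Deigan et al. [1].

    Assumptions of this sketch: a missing reactivity (None) contributes
    nothing, and negative reactivities are clamped to zero.
    """
    if reactivity is None:
        return 0.0
    return m * math.log(max(reactivity, 0.0) + 1.0) + b
```

With these parameters an unreactive nucleotide gets a favorable bonus of -0.8 kcal/mol for pairing, while highly reactive nucleotides are penalized.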

2-Material & Methods

2-1 Experimental data

SHAPE-MaP experiments, high-throughput sequencing

SHAPE reactivity calculation

$$\mathrm{Reactivity}(n) = \frac{\mathrm{mutShape}(n) - \mathrm{mutControl}(n)}{\mathrm{mutDenatured}(n)}$$ [3]

V-enzymatic cleavage targets paired nucleotides.

T-enzymatic cleavage reveals unpaired nucleotides.
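The SHAPE reactivity formula above can be applied position by position to the three mutation-rate profiles. This is a minimal sketch, assuming the rates are available as equal-length lists of floats; returning None where the denatured rate is zero is a choice made here, not something the poster specifies.

```python
def shape_reactivity(mut_shape, mut_control, mut_denatured):
    """Per-nucleotide reactivity: (mutShape - mutControl) / mutDenatured.

    Positions whose denatured mutation rate is not positive yield None
    (an assumption of this sketch, to avoid dividing by zero).
    """
    if not (len(mut_shape) == len(mut_control) == len(mut_denatured)):
        raise ValueError("the three profiles must have the same length")
    reactivities = []
    for shape, control, denatured in zip(mut_shape, mut_control, mut_denatured):
        if denatured > 0:
            reactivities.append((shape - control) / denatured)
        else:
            reactivities.append(None)
    return reactivities
```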

Boltzmann probability to observe a structure S: $P(S) = \dfrac{e^{-E_S/kT}}{Z}$,

with Z the partition function: $Z = \sum_{S' \in \mathcal{S}} e^{-E_{S'}/kT}$.
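The Boltzmann probability above can be illustrated on a small, explicit list of structures. This is a sketch only: the sum runs over the energies passed in rather than over the full structure space, the energies are assumed to be in kcal/mol (so the gas constant R plays the role of k), and the default temperature of 310.15 K is an assumption.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K), used as k for molar energies


def boltzmann_probabilities(energies, temperature=310.15):
    """Return P(S) = exp(-E_S / kT) / Z for each energy in the list.

    Z is computed over the given list only. Energies are shifted by their
    minimum before exponentiation; the shift cancels in the ratio and
    avoids overflow for very negative energies.
    """
    kt = GAS_CONSTANT * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]
```

Lower-energy structures receive higher probability, and the probabilities of the listed structures sum to 1.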

For a cluster C:

Coherence ≡ mean base-pair distance between the structures in C.

Diversity ≡ number of experimental conditions represented in C.

Stability ≡ sum of the Boltzmann probabilities of the structures in C.
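The three cluster criteria can be sketched as follows. The representation of a cluster as a list of (dot-bracket structure, condition label, Boltzmann probability) tuples is hypothetical, and the base-pair distance is taken here as the number of pairs present in exactly one of two structures, which is one standard definition and may differ from the one used by the authors.

```python
def base_pairs(structure):
    """Set of base pairs (i, j) encoded by a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def bp_distance(a, b):
    """Number of base pairs found in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def cluster_scores(cluster):
    """Return (coherence, diversity, stability) for a list of
    (structure, condition, probability) tuples."""
    structures = [s for s, _, _ in cluster]
    n = len(structures)
    distances = [bp_distance(structures[i], structures[j])
                 for i in range(n) for j in range(i + 1, n)]
    coherence = sum(distances) / len(distances) if distances else 0.0
    diversity = len({condition for _, condition, _ in cluster})
    stability = sum(p for _, _, p in cluster)
    return coherence, diversity, stability
```

With this definition a small coherence value means that the structures of the cluster are close to one another.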

2-2 Sampling/Clustering workflow

[Figure: sampling/clustering workflow]

Experimental data from different conditions (A, B, C, D)

Data processing, per condition

Structure sampling, per condition

Set of ensemble structures (samples from all conditions pooled)

Clustering [Affinity propagation]

Optimal clusters? Maximization of Diversity, Stability and Coherence –> Pareto Frontier

Optimal Centroid Structures
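The last step of the workflow keeps the clusters on the Pareto frontier of the three criteria. A minimal sketch of such a filter is given below, assuming each cluster comes with a tuple of scores in which larger is always better (for example, coherence supplied as a negated mean distance); the cluster names and score values are invented for illustration.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere
    and strictly better on at least one criterion."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores):
    """Return the sorted names of the clusters not dominated by any other.

    scores maps a cluster name to a tuple of criteria, all to be maximized.
    """
    front = []
    for name, score in scores.items():
        dominated = any(dominates(other, score)
                        for other_name, other in scores.items()
                        if other_name != name)
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (diversity, stability, coherence score) values.
example = {
    "c1": (3, 0.5, 0.9),
    "c2": (2, 0.4, 0.8),  # worse than c1 on every criterion
    "c3": (1, 0.7, 0.2),  # best stability, so not dominated
}
```

Here `pareto_front(example)` keeps c1 and c3 and discards c2.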

3-Results

Optimal centroid structures from 140. [8000 structures]

[Figure: six secondary-structure drawings of the 444-nt HIV sequence (positions 1 to 444), each labeled "HIV", with scale labels −5.0 and 2.406. The drawings are not recoverable from the text extraction.]

4-Conclusion & perspectives

We have obtained a set of models supported by our integrative approach; these models remain subject to validation.

Some of the centroid structures are highly compatible with previously proposed structural models.

We will extend the approach to the simultaneous analysis of probing data for a set of RNA variants.

References

[1] Katherine E. Deigan, Tian W. Li, David H. Mathews, and Kevin M. Weeks. Accurate SHAPE-directed RNA structure determination. Proceedings of the National Academy of Sciences, 106(1):97–102, 2009.

[2] Ye Ding and Charles E. Lawrence. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Research, 31(24):7280–7301, 2003.

[3] Matthew J. Smola, Greggory M. Rice, Steven Busan, Nathan A. Siegfried, and Kevin M. Weeks. Selective 2’-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nature Protocols, 10(11):1643–1669, 2015.

[4] Michael Zuker and Patrick Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9(1):133–148, 1981.

Acknowledgments

PhD funded by the "Fondation pour la Recherche Médicale".

Contact address

saaidi.afaf@polytechnique.edu
