• Aucun résultat trouvé

Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach

N/A
N/A
Protected

Academic year: 2021

Partager "Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01216782

https://hal.inria.fr/hal-01216782

Submitted on 17 Oct 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach

Cedric Chauve, Yann Ponty, João Paulo Pereira Zanetti

To cite this version:

Cedric Chauve, Yann Ponty, João Paulo Pereira Zanetti. Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach. RECOMB-CG’14, Oct 2014, Cold Spring HArbour, United States. �hal-01216782�

(2)

Evolution of genes neighborhood within reconciled phylogenies:

an ensemble approach

Cedric Chauve (SFU), Yann Ponty (CNRS/LIX/PIMS), Jo˜ ao Paulo Pereira Zanetti (SFU/U. Campinas)

Background: DeCo

Reconstructing ancestral genome features is a classical comparative genomics problem, often addressed with dynamic programming (DP) algorithms. A DP algorithm, called DeCo , was recently introduced for computing parsimonious evolutionary scenarios for gene adjacencies in a duplication-aware framework, motivated by the reconstruction of ancestral gene orders [B´erard et al., 2012 ].

Results: DeClone

We describe DeClone , an extension of DeCo, using Advanced Dynamic Program- ming techniques, that allows the sampling of adjacency forests under the Boltzmann distribution , as well as the computation of probabilities of presence/absence of an- cestral gene adjacencies, again under the Boltzmann distribution.

DeCo

Input:

• Reconciled gene trees G 1 and G 2 ;

• Set of extant adjacencies.

Output:

• Max-parsimony adjacency forest

Parsimony: #adjacencies gains/breaks

A B C

A

1

L

B

L

A

B

1

A

2

B

2

S

C

1

C

2

C

3

C

4

C

5

A

3

B

3

A

4

B

4

G 1

C

6

C

7

C

8

A

1

A

3

A

2

A

4

B

2

B

4

G 2

C

1

C

6

C

3

C

7

C

4

C

7

C

5

C

8

B

1

B

3

F

(Score=1)

Min. Parsimony

Dynamic programming algorithm (excerpt):

. . .

c 1 (v 1 , v 2 ) = min

 

 

 

 

 

 

 

 

 

 

 

 

c 1 (ca (v 1 ), cb (v 2 )) + c 1 (cb (v 1 ), ca (v 2 ))

c 1 (ca (v 1 ), cb (v 2 )) + c 0 (cb (v 1 ), ca (v 2 )) + Break c 0 (ca (v 1 ), cb (v 2 )) + c 1 (cb (v 1 ), ca (v 2 )) + Break c 0 (ca (v 1 ), cb (v 2 )) + c 0 (cb (v 1 ), ca (v 2 )) + 2Break c 1 (ca (v 1 ), ca (v 2 )) + c 1 (cb (v 1 ), cb (v 2 ))

c 1 (ca (v 1 ), ca (v 2 )) + c 0 (cb (v 1 ), cb (v 2 )) + Break c 0 (ca (v 1 ), ca (v 2 )) + c 1 (cb (v 1 ), cb (v 2 )) + Break c 0 (ca (v 1 ), ca (v 2 )) + c 0 (cb (v 1 ), cb (v 2 )) + 2Break

. . .

DeCo+Advanced Dynamic Programming=DeClone

The whole (sub)optimal space is modelled as a Boltzmann Ensemble

A

1

L

B

L

A

B

1

A

2

B

2

G 1

C

1

C

2

C

3

C

4

C

5

A

3

B

3

A

4

B

4

G 2

C

6

C

7

C

8

Score: 1

Score: 2

Score: 2

Score: 3

···

Boltzmann Distribution P (F ) ∼ e

Score(F)kT

Analysis

Average properties

Ancestral adjacencies Synthenic inconsistencies

Recurrent patterns . . .

Max. expected accuracy predictions

Robustness assessment . . .

Suboptimality vs combinatorics

0 2000 4000 6000 8000 10000 12000 14000 16000 18000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

#Adjacency forests

Score

Score vs #Adjacency Forests

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

1 6 11 16

Probability

Score

Score vs Boltzmann probability (Density of states)

kT=0.1 kT=1 kT=10

Impact of pseudo-temperature kT on predicted adjacencies

G1 (Horiz.)

1: B2 2: C3 3: A2 4: C4 5: B1 6: C2 7: LA 8: C5 9: LB 10: C1 11: A1

G2 (Vert.)

1: B4 2: C7 3: A4 4: C8 5: B3 6: C6 7: A3

kT=1

kT=0.1 kT=10

Adjacency probability010.250.50.75

Temperature

Co-optimals Uniform

(kT → 0) (kT →∞ )

1 2

3 4

5 6

7 8

9 10

11 13 2 4

5 67

1 2

3 4

5 6

7 8

9 10

11 13 2 4

5 67

1 2

3 4

5 6

7 8

9 10

11 13 2 4

5 67

distribution

Algorithmic aspects

Algorithmic framework [Ponty/Saule, 2011]

Dyn. Prog. equations → Weighted Hypergraphs

c 1 (v 1 , v 2 ) = min

 

 

 

 

 

c 1 (ca(v 1 ), cb(v 2 )) + c 1 (cb(v 1 ), ca(v 2 ))

c 1 (ca(v 1 ), cb(v 2 )) + c 0 (cb(v 1 ), ca(v 2 )) + Break ...

c 1 (v 1 , v 2 )

c 1 (cb(v 1 ), ca(v 2 )) c 1 (ca(v 1 ), cb(v 2 )) c 0 (cb(v 1 ), ca(v 2 ))

Break

Deco: MaxPars(q ) = min

e=q→(q 1 ,q 2 ...)

w(e) + X

i

MaxPars(q i )

!

Claim: The DeCo DP scheme is unambiguous and complete

Algorithmic corollaries

• Computing the partition function Z in O (|G 1 ||G 2 |) Simple change of algebra (min, +, C ) → ( P

, ×, e kT C )

• Boltzmann sampling of adjacency forests

Each hyperedge chosen with probability proportional to its overall contribution to Z .

• Adjacency dot-plot: O(|G 1 ||G 2 |) inside/outside algo- rithm computes the probability of ancestral adjacencies.

Experiments

Data: 6, 074 DeCo instances, with genes taken from 36 mammalian genomes from the Ensembl database in 2012. Syntenic inconsistencies with parsimonious scenarios concern 5, 817 , genes over 112, 188 ancestral and extant adjacencies.

Methods: for each instance, we sampled 1, 000 ad- jacency forests under a Boltzmann distribution that favours highly co-optimal scenarios.

Results:

Adj. freq. Anc. genes Anc. adj. Synt. Inc.

≥ 0.3 118,687 110,180 10,351

≥ 0.4 117,639 106,973 8,330

≥ 0.5 116,231 103,479 6,677

≥ 0.6 114,538 99,720 5,113

≥ 0.7 112,564 96,039 4,092

≥ 0.8 110,086 91,821 3,276

≥ 0.9 107,564 87,790 2,710

= 1 100,443 79,078 1,348

Discussion

DeClone allows a controlled exploration of the space of all gene adjacency evolution scenarios: exhaustive enumeration, sampling, probabilities computation.

Experiments show that exploring the whole solution space under a Boltzmann distribution biased toward co-optimals allows to reduce significantly the number of syntenic inconsistencies observed when a single ar- bitrary optimal scenario is considered.

References

S. B´ erard, et al.. 2012. Evolution of gene neighbor- hoods within reconciled phylogenies. Bioinformatics, 28(18):382–388.

Y. Ponty and C. Saule. 2011. A Combinatorial Frame- work for Designing (Pseudoknotted) RNA Algorithms.

WABI 2011, pp. 259–269.

1

Références

Documents relatifs

Next, to evaluate the stability of the total number of evolutionary events inferred by parsimonious adjacency forests, we recorded two counts of evolutionary events for each

First of all, reduction of the band-gap (E,) will favour the thermal excitation of charge camers to the conduction band in the neutral state of the conducting polymer, and thus

The rwne oCthe principal and school and school boards have been changed (to Dr M. Gregory's School, Roman Catholic SChool Board, and the Interdc:nominarional School Board) as a means

The Best-Performing Genes Identified for Fungal Phylogenies The phylogeny of fungal species inferred here using a genome-wide sampling of orthologous proteins had the same topology

apparaisse une infinité de fois dans l'écriture décimale de racine de 2, nous sommes très loin de pouvoir le démontrer... w

Cela dit, pour une large classe d’équations d’évolution non-linéaires en profitant de la faiblesse du concept de solutions au sens des distributions, on peut construire

Existing health and environment monitoring Monitoring of micro-level environmental conditions Despite the fact that over 40% of the population are known to be living in

Ne recevant pas de répons», après avoir frappé, il entra at recula épouvanté, quand il aperçut auprès delà porte ie corps de i’iifur- tuné bourrelier qui se