HAL Id: hal-01216782
https://hal.inria.fr/hal-01216782
Submitted on 17 Oct 2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach
Cedric Chauve, Yann Ponty, João Paulo Pereira Zanetti
To cite this version:
Cedric Chauve, Yann Ponty, João Paulo Pereira Zanetti. Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach. RECOMB-CG’14, Oct 2014, Cold Spring HArbour, United States. �hal-01216782�
Evolution of genes neighborhood within reconciled phylogenies:
an ensemble approach
Cedric Chauve (SFU), Yann Ponty (CNRS/LIX/PIMS), Jo˜ ao Paulo Pereira Zanetti (SFU/U. Campinas)
Background: DeCo
Reconstructing ancestral genome features is a classical comparative genomics problem, often addressed with dynamic programming (DP) algorithms. A DP algorithm, called DeCo , was recently introduced for computing parsimonious evolutionary scenarios for gene adjacencies in a duplication-aware framework, motivated by the reconstruction of ancestral gene orders [B´erard et al., 2012 ].
Results: DeClone
We describe DeClone , an extension of DeCo, using Advanced Dynamic Program- ming techniques, that allows the sampling of adjacency forests under the Boltzmann distribution , as well as the computation of probabilities of presence/absence of an- cestral gene adjacencies, again under the Boltzmann distribution.
DeCo
Input:
• Reconciled gene trees G 1 and G 2 ;
• Set of extant adjacencies.
Output:
• Max-parsimony adjacency forest
Parsimony: #adjacencies gains/breaks
A B C
A
1L
BL
AB
1A
2B
2S
C
1C
2C
3C
4C
5A
3B
3A
4B
4G 1
C
6C
7C
8A
1A
3A
2A
4B
2B
4G 2
C
1C
6C
3C
7C
4C
7C
5C
8B
1B
3F
(Score=1)
Min. Parsimony
Dynamic programming algorithm (excerpt):
. . .
c 1 (v 1 , v 2 ) = min
c 1 (ca (v 1 ), cb (v 2 )) + c 1 (cb (v 1 ), ca (v 2 ))
c 1 (ca (v 1 ), cb (v 2 )) + c 0 (cb (v 1 ), ca (v 2 )) + Break c 0 (ca (v 1 ), cb (v 2 )) + c 1 (cb (v 1 ), ca (v 2 )) + Break c 0 (ca (v 1 ), cb (v 2 )) + c 0 (cb (v 1 ), ca (v 2 )) + 2Break c 1 (ca (v 1 ), ca (v 2 )) + c 1 (cb (v 1 ), cb (v 2 ))
c 1 (ca (v 1 ), ca (v 2 )) + c 0 (cb (v 1 ), cb (v 2 )) + Break c 0 (ca (v 1 ), ca (v 2 )) + c 1 (cb (v 1 ), cb (v 2 )) + Break c 0 (ca (v 1 ), ca (v 2 )) + c 0 (cb (v 1 ), cb (v 2 )) + 2Break
. . .
DeCo+Advanced Dynamic Programming=DeClone
The whole (sub)optimal space is modelled as a Boltzmann Ensemble
A
1L
BL
AB
1A
2B
2G 1
C
1C
2C
3C
4C
5A
3B
3A
4B
4G 2
C
6C
7C
8Score: 1
Score: 2
Score: 2
Score: 3
···
Boltzmann Distribution P (F ) ∼ e
−Score(F)kTAnalysis
Average properties
Ancestral adjacencies Synthenic inconsistencies
Recurrent patterns . . .
Max. expected accuracy predictions
Robustness assessment . . .
Suboptimality vs combinatorics
0 2000 4000 6000 8000 10000 12000 14000 16000 18000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#Adjacency forests
Score
Score vs #Adjacency Forests
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
1 6 11 16
Probability
Score
Score vs Boltzmann probability (Density of states)
kT=0.1 kT=1 kT=10
Impact of pseudo-temperature kT on predicted adjacencies
G1 (Horiz.)
1: B2 2: C3 3: A2 4: C4 5: B1 6: C2 7: LA 8: C5 9: LB 10: C1 11: A1
G2 (Vert.)
1: B4 2: C7 3: A4 4: C8 5: B3 6: C6 7: A3
kT=1
kT=0.1 kT=10
Adjacency probability010.250.50.75
Temperature
Co-optimals Uniform
(kT → 0) (kT →∞ )
1 2
3 4
5 6
7 8
9 10
11 13 2 4
5 67
1 2
3 4
5 6
7 8
9 10
11 13 2 4
5 67
1 2
3 4
5 6
7 8
9 10
11 13 2 4
5 67