• Aucun résultat trouvé

Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer

N/A
N/A
Protected

Academic year: 2022

Partager "Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer"

Copied!
24
0
0

Texte intégral

(1)

Integrating Sequence and Topology for Efficient and Accurate

Detection of Horizontal Gene Transfer

Cuong Than, Guohua Jin, and Luay Nakhleh Department of Computer Science

Rice University

(2)

Gene Trees Within the

Branches of a Species Tree

(3)

Horizontal Gene Transfer

and Tree Incongruence

(4)

Horizontal Gene Transfer Detection

A B C D A B C D

species tree gene tree

(5)

Horizontal Gene Transfer Detection

A B C D A B C D

species tree gene tree

A B C D

X Y

(6)

Multiple Solutions for HGT Detection

C

B D E F

A A B C D E F

species tree gene tree

(7)

Multiple Solutions for HGT Detection

C

B D E F

A A B C D E F A B C D E F

scenario 1 scenario 2 scenario 3

(8)

How can we improve the

accuracy of HGT detection?

(9)

Our Method: Assessing the Support of HGT Edges

Assign a support value to each HGT edge

Branches of the gene tree often have bootstrap value

Use those bootstrap values to evaluate HGT edge support

(10)

HGT Edge Support: Example

A B C D E

X Y

Z

A C B D E

Y X Z

gene tree species tree + HGT edge

70 90

60

Support for HGT edge (x, y): max{60, 90, 70} = 90

(11)

HGT Edge Support: Algorithm

Create a network N from the species tree by adding HGT edges

From N, create two trees:

ST’: Keep HGT edges, while removing the other edges

ST’’: Remove HGT edges, while keeping the other edges

(12)

HGT Edge Support: Algorithm

For HGT edge X →Y, find the moving clade P and its sister clade Q

Finding P: let P = LST’(Y)

Finding Q:

Let Y’ = Y

Let p = parent of Y’ in ST’’

If LST’(p) ≠∅, then Q = LST’(p)

If not, let Y’ = p, and repeat

Support = max of bootstrap values of edges from P to Q

(13)

HGT Edge Support:

Illustration

h1

h3 h2

C

A B D E F

u v

x w

y z

r

A B C

u v y x

D E F

r w z

(14)

A G G A A

A

G

Network Parsimony

A G G A

A A

A

A G G A

A A

A

Network parsimony = 1

(15)

Parsimony-based HGT Detection

Input: Sequences for a group of species

Output: A network (tree + HGT edges)

Optimization criterion: Smallest network parsimony score

(16)

Parsimony-based HGT Detection

Accurate, as shown in both simulated and biological data

Slow, because it has to examine all possible networks

(17)

Topology-based HGT Detection

Fast in computing HGT edges

Return many false positives

(18)

Our Method: Combining Topology and Parsimony-

based Methods

RIATA-HGT

Species tree

networks

NEPAL

Sequences

Network, MP Gene trees

Significance of HGTs Selected

(19)

TopSeq Algorithm

(20)

Experimental Data

The biological data by Bergthorsson

HGT transfer to Amborella

20 genes

Donors: Bryophytes, Moss, Eudicots, and Angiosperms

(21)

Experimental Results

(22)

Experimental Results

NEPAL: 12 out of 13 HGTs were found

RIATA-HGT: 12 out of 13 HGTs were found (the missing HGT is different)

The support for 12 HGT edges were high, over 95%

(23)

Conclusions

A method for assessing the support of HGT edges

Integration of sequence-based and

topology-based methods gives promising performance

(24)

Thank You!

http://bioinfo.cs.rice.edu

Références

Documents relatifs

2 L'état des résultats de recherche et les réponses seront acheminés à un jury composé d’acteurs de l’éducation qui les analyseront et poseront à leur tour des questions

The coding region of the PRNP gene from 209 Iberian red deer samples collected in North-East Spain showed three single-nucleotide polymorphisms (SNPs): a silent SNP at position

Il manque actuellement de données sur la répartition des stades de la maladie chez les cas diagnostiqués de syphilis au Canada. En effet, la plupart des études

Nous pouvons généraliser cette conclusion à notre contexte en matière de prévention et d’usage de bonne pratique, en s’appuyant sur les dires de l’auteur qui

In this issue of Molecular Autism, Soorya and colleagues evaluated 32 patients with Phelan-McDermid syndrome, caused by either deletion of 22q13.33 or SHANK3 mutations,

In this pilot study, we integrated virtual labs developed by Labster into four biology-related undergraduate courses at a Canadian university and investigated students’ perceptions

[r]

From the coded data on the learning experiences and the accompanying situations, five themes emerged: (a) how the coaches began to work together and learn from one