HAL Id: hal-02817901
https://hal.inrae.fr/hal-02817901
Submitted on 6 Jun 2020
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth
disease virus.
Gael Thébaud, E.M. Cottam, - Wadsworth, J., J. Gloster, L. Mansley, D.J. Paton, D.P. King, D.T. Haydon
To cite this version:
Gael Thébaud, E.M. Cottam, - Wadsworth, J., J. Gloster, L. Mansley, et al.. Integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth disease virus.. Mathe-matics and InforMathe-matics in Evolution and Phylogeny, Jun 2008, Montpellier, France. �hal-02817901�
Integrating genetic and
epidemiological data to determine
virus transmission pathways
1 UMR BGPI (INRA-Montpellier)
2 Division of Environmental and
Evolutionary Biology (University of Glasgow)
3 Institute for Animal Health
(Pirbright)
4 Animal Health Divisional Office
(Perth)
Eleanor COTTAM 2,3, Gaël THÉBAUD 1,2, Jemma WADSWORTH 3,
John GLOSTER 3, Leonard MANSLEY4, David PATON 3,
Molecular epidemiology and directionality
Introduction
• Direction of transmission :
– reference (more or less implicit)
to additional information
• Genetic sequences:
– phylogeny
– clades / groups / types
• Comparison between genetic
similarity and
– geographic proximity – ecological zone
– host species – ...
• type of source and target individuals • transmission distances
• important or missing sources • likely transmission modes
• evolution during 1 transmission cycle
Accessible information
epidemiology
evolution
Why is directionality interesting?
Introduction
• Implications: - logical: - legal: source “target” cause≈ consequences≈ responsible≈ victim≈In theory, complete description of the epidemic In practice, data sets concerning few individuals
• parameterise
epidemiological models (e.g., network models)
• limit virus propagation
• multiscale models
Questions on FMDV
Introduction
• At which scale is there some viral genetic
polymorphism?
– animal, farm, disease focus?
• Can we use the observed polymorphism
to identify transmission chains? How?
• What is the reliability of veterinary
Biological system
• Foot-and-mouth disease virus outbreak (2001)
• 20 complete genomes (~10 kb each)
– 5 initial infections with a known history
– 15 farms from the same focus (Durham County)
10 km A K B D E C G I J M N O L P F 10 km A K B D E C G I J M N O L P F
• Positive-strand RNA virus:
– High mutation rate (~10-4 errors/nucleotide/replication) – Limited recombination
• Known root • 2 independent introductions • 4 groups 0 10 20 30 40 50 60 70 80 90 100 110 Day of outbreak A B C D M E N G I J F K L 1 2 3 5 4 O P 0 10 20 30 40 50 60 70 80 90 100 110 Day of outbreak A B C D M E N G I J F K L 1 2 3 5 4 O P 24 n uc leot ide s ubs ti tu ti ons S A R/ 19/ 200 0 24 n uc leot ide s ubs ti tu ti ons S A R/ 19/ 200 0 TCS 10 km A K B D E C G I J M N O L P F 10 km A K B D E C G I J M N O L P F
Genetic data
Genetic data
0 10 20 30 40 50 60 70 80 90 100 110 Day of outbreak A B C D M E N G I J F K L 1 2 3 5 4 O P 0 10 20 30 40 50 60 70 80 90 100 110 Day of outbreak A B C D M E N G I J F K L 1 2 3 5 4 O P 24 n uc leot ide s ubs t ti ons itu S A R/ 19/ 0 200 24 n uc leot ide s ubs t ti ons itu S A R/ 19/ 0 200 TCS • Known root • 2 independent introductions • 4 groups How to identify transmission history?• 1 known chain of transmissions 0 10 20 30 40 50 60 70 80 90 100 110 A B C D M E N G I J F K L 1 2 3 5 4 O P 0 10 20 30 40 50 60 70 80 90 100 110 A B C D M E N G I J F K L 1 2 3 5 4 O P 24 nucl eoti de s ubs tituti ons S A R/ 19/2000 24 nucl eoti de s ubs tituti ons S A R/ 19/2000 • 3 obvious transmissions
Which is the most likely transmission tree?
• What about the
other ones?? ? ? ? ? ? ? ? ?
Which is the most likely farm for each node ?
• Known root
Use of contact tracing data
Epidemiology
• L : Probability density
for latency (Γ)
• Ii : Probability density
for the infection date of farm i • Fi (t) : Probability for farm i to be infectious at date t 27 J anu ar y 2 001 0 3 Fe br uar y 200 1 1 0 Fe br uar y 200 1 1 7 Fe br uar y 200 1 2 4 Fe br uar y 200 1 03 M ar c h 2001 10 M ar c h 2001 17 M ar c h 2001 24 M ar c h 2001 31 M ar c h 20 01 07 A p ri l 2 001 14 A p ri l 2 001 21 A p ri l 2 001 28 A p ri l 2 001 05 M a y 200 1 12 M a y 200 1 19 M a y 200 1 26 M a y 200 1 02 J une 200 1 1 2 3 4 5 B C A D E F G I J K L M N O P 09 J une 200 1 27 J anu ar y 2 001 0 3 Fe br uar y 200 1 1 0 Fe br uar y 200 1 1 7 Fe br uar y 200 1 2 4 Fe br uar y 200 1 03 M ar c h 2001 10 M ar c h 2001 17 M ar c h 2001 24 M ar c h 2001 31 M ar c h 20 01 07 A p ri l 2 001 14 A p ri l 2 001 21 A p ri l 2 001 28 A p ri l 2 001 05 M a y 200 1 12 M a y 200 1 19 M a y 200 1 26 M a y 200 1 02 J une 200 1 1 2 3 4 5 B C A D E F G I J K L M N O P 09 J une 200 1
Animal movement ban
27 J anuar y 2001 0 3 Feb ruar y 20 01 1 0 Feb ruar y 20 01 1 7 Feb ruar y 20 01 2 4 Feb ruar y 20 01 03 M ar c h 20 01 10 M ar c h 20 01 17 M ar c h 20 01 24 M ar c h 20 01 31 M ar c h 2001 0 7 A pr il 2001 1 4 A pr il 2001 2 1 A pr il 2001 2 8 A pr il 2001 05 M a y 2001 12 M a y 2001 19 M a y 2001 26 M a y 2001 02 J une 2 001 09 J une 2 001 27 J anuar y 2001 0 3 Feb ruar y 20 01 1 0 Feb ruar y 20 01 1 7 Feb ruar y 20 01 2 4 Feb ruar y 20 01 03 M ar c h 20 01 10 M ar c h 20 01 17 M ar c h 20 01 24 M ar c h 20 01 31 M ar c h 2001 0 7 A pr il 2001 1 4 A pr il 2001 2 1 A pr il 2001 2 8 A pr il 2001 05 M a y 2001 12 M a y 2001 19 M a y 2001 26 M a y 2001 02 J une 2 001 09 J une 2 001 27 J 2 001 1 2 3 4 5 B C A D E F G I J K L M N O P 27 J 2 001 1 2 3 4 5 B C A D E F G I J K L M N O P
Epidemiology
• λij: Likelihood of
i Å { j rather than
another observed farm }
27 J anu ar y 2 001 0 3 Fe br uar y 200 1 1 0 Fe br uar y 200 1 1 7 Fe br uar y 200 1 2 4 Fe br uar y 200 1 03 M ar c h 2001 10 M ar c h 2001 17 M ar c h 2001 24 M ar c h 2001 31 M ar c h 20 01 07 A p ri l 2 001 14 A p ri l 2 001 21 A p ri l 2 001 28 A p ri l 2 001 05 M a y 200 1 12 M a y 200 1 19 M a y 200 1 26 M a y 200 1 02 J une 200 1 1 2 3 4 5 B C A D E F G I J K L M N O P 09 J une 200 1 27 J anu ar y 2 001 0 3 Fe br uar y 200 1 1 0 Fe br uar y 200 1 1 7 Fe br uar y 200 1 2 4 Fe br uar y 200 1 03 M ar c h 2001 10 M ar c h 2001 17 M ar c h 2001 24 M ar c h 2001 31 M ar c h 20 01 07 A p ri l 2 001 14 A p ri l 2 001 21 A p ri l 2 001 28 A p ri l 2 001 05 M a y 200 1 12 M a y 200 1 19 M a y 200 1 26 M a y 200 1 02 J une 200 1 1 2 3 4 5 B C A D E F G I J K L M N O P 09 J une 200 1
∑ ∑
∑
≠ = = = ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛ ⋅ ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛ ⋅ = n i k k k C C t i j C C t i ij I t F t I t F t k j i j 1 ) , min( 0 ) , min( 0 ) ( ) ( ) ( ) ( λAnimal movement ban
• L : Probability density
for latency (Γ)
• Ii : Probability density
for the infection date of farm i
• Fi (t) : Probability for
farm i to be infectious at date t
• λij: likelihood of i Å { j rather than another observed farm }
•
λ
ij can be computed for each transmission• Thus, for a complete transmission tree (k),
λ
k = Πλ
ijE L E L 1 {L,E} G I J F G I J F 2 {F,G} {1,2,K} K K 1728 trees Loglikelihood F requenc y -250 -200 -150 -100 0 5 1 0 15 20 2 5 30
λ
• And
λ
k can be computed for any tree… if all the possible trees can be enumerated
Æ Algorithm defining the possible trees by recurrence from the leaves back to the root
Genetics + epidemiology
All differing from contact tracing results
• Rescaled likelihood:
λ
’k =λ
k / Σλ
k• Which group of trees
represent 95% of the rescaled likelihood?
Which is the most likely group of trees?
4 trees
most likely tree
( # ) Number of distinct sources among the 4 most likely trees [ # ] Likelihood of the most probable transmission
Genetics + epidemiology
0 10 20 30 40 50 60 70 80 90 100 110 A B C D M E N G I J F K L 1 2 3 5 4 O P 0 10 20 30 40 50 60 70 80 90 100 110 A B C D M E N G I J F K L 1 2 3 5 4 O P 24 nu c leot ide s ubs ti tut ions S A R/ 19/ 2000 24 nu c leot ide s ubs ti tut ions S A R/ 19/ 2000 A K O L F
Genetics + epidemiology
Genetics + epidemiology
Spatial pattern
Short distance transmission 10 km A K B D E C G I J M N O L P F 10 km A K B D E C G I J M N O L P F MeanDistSim Fr eq ue nc y 4000 6000 8000 10000 0 5 00 0 1 00 00 1 50 00 2 00 00 P(1-sided) = 1.2 x 10-3 Mean distance 4850 m 7472 mConclusions
• The whole set of possible transmission trees is identified based on genetic data
• Their relative likelihood is evaluated based on epidemiological data
• Interesting method for real-time forensic applications
• Identifying the tree root
• Dealing with censoring / sampling issues • Weighting different sources of information
Difficulties
Summary
Cottam E.M. et al. (2008) Integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth disease virus. Proc. R. Soc. B 275: 887-895.