Article
Reference
Genetic evidence for complexity in ethnic differentiation and history in East Africa
POLONI, Estella S., et al.
Abstract
The Afro-Asiatic and Nilo-Saharan language families come into contact inWestern Ethiopia.
Ethnic diversity is particularly high in the South, where the Nilo-Saharan Nyangatom and the Afro-Asiatic Daasanach dwell. Despite their linguistic differentiation, both populations rely on a similar agripastoralist mode of subsistence. Analysis of mitochondrial DNA extracted from Nyangatom and Daasanach archival sera revealed high levels of diversity, with most sequences belonging to the L haplogroups, the basal branches of the mitochondrial phylogeny. However, in sharp contrast with other Ethiopian populations, only 5% of the Nyangatom and Daasanach sequences belong to haplogroups M and N. The Nyangatom and Daasanach were found to be significantly differentiated, while each of them displays close affinities with some Tanzanian populations. The strong genetic structure found over East Africa was neither associated with geography nor with language, a result confirmed by the analysis of 6711 HVS-I sequences of 136 populations mainly from Africa. Processes of migration, language shift and group absorption are documented by linguists [...]
POLONI, Estella S., et al. Genetic evidence for complexity in ethnic differentiation and history in East Africa. Annals of Human Genetics, 2009, vol. 73, no. 6, p. 582-600
DOI : 10.1111/j.1469-1809.2009.00541.x
Available at:
http://archive-ouverte.unige.ch/unige:2714
Disclaimer: layout of this document may differ from the published version.
Supporting Information for ms :
Genetic evidence for complexity in ethnic differentiation and history in East Africa by Estella S. Poloni, Yamama Naciri, Rute Bucho, Régine Niba, Barbara Kervaire, Laurent Excoffier, André Langaney and Alicia Sanchez‐Mazas
Text S1. Supporting Information on Materials and Methods and on Results.
References in Supporting Texts, Tables and Figures.
Figure S1. Median joining networks of mitochondrial DNA haplotypes, modified from Kivisild et al.
(2004), so as to include the Nyangatom, Daasanach and Turkana sequences from the LORV sample.
Figure S2. Map with approximate geographic location of the 136 population samples included in the HVS‐I database.
Figure S3. Three successive Principal Coordinates Analyses (PCA) of pairwise Reynolds’ genetic distances among populations computed from HVS‐I sequence‐based ΦST indices.
Table S1. Scored mutated positions in the LORV sequences and classification into haplogroups.
Table S2. The 136 published and new population samples of the mtDNA HVS‐I sequences database used in this study (from Africa, the Middle East and West Asia, and the South of Europe).
Table S3. TMRCA estimates for sub‐haplogroups observed in the LORV samples that included at least five sequences.
Table S4. HVS‐I genetic diversity indices, results of the tests of selective neutrality and population equilibrium, and results of the tests of population expansion performed on the 136 population.
Text S1
Supporting Information on Materials and Methods.
Samples
During the 1971‐1972 sampling campaigns, taking of blood samples with the individuals consent was performed in 5‐10 ml Vacutainer sterile tubes in several settlements of the southernmost section of the Omo River Valley (Southwestern Ethiopia):
‐ the Nyangatom samples were taken in 3 settlements on the banks of the Omo River (Kagnikagn, Shangura and Weleso) and in 6 settlements on the eastern bank of the Kibish River on the Ethiopia‐Sudan boder, directly north of the Nakua Mountains (Lopokori, Nawiapua, Lokwamugnan, Nawiatir, Natir and Rukruk);
‐ the Daasanach were taken in 3 hamlets in the Kalam region (Handall, Rate and Kalam‐
mission), in the village of Nyamorlu, located in the delta of the Omo River, and among the
Daasanach workers in the camp of the French component of the IORE (International Omo Research Expedition);
‐ the Turkana samples were taken in 4 villages located along the western bank of Lake Turkana (Kalokol, Lokitanyella, Kataboi and Loarygnak).
Maps and more details on the sampling are provided in Rodhain et al. (1972, 1975).
Blood fractions were separated by simple decantation and conserved at 4ºC for 2‐3 weeks before being shipped to the Pasteur Institute in Paris. There, the serum samples were kept in a freezer at ‐ 20ºC, unless they were de‐frozen for the purpose of epidemiological studies. This happened about 2 or 3 times since the early seventies. The remaining collection, on which we have worked, was made up of 2 ml of serum per sample on average.
Obviously, since the aim of the 1971‐1972 sampling campaigns was to investigate the geographic spread of a yellow fever epidemic that occurred some ten years earlier, the sampling strategy did not avoid related individuals. The field notes of these campaigns contain detailed information on each sampled individual, among which also kinship relationships (parents, spouses, children,
uncles/nephews). However, it is possible that this last information was not provided or checked in a systematic fashion, and even less so for the maternal line (since family membership of women in these patrilinear societies is subordinated to that of men). Thus, besides those cases where kinship relationships were explicitly noted, we cannot exclude relatedness between other individuals. This is even more probable in the case of small populations, such as the Nyangatom and the Daasanach, in which complex long‐term kinship relations are expected. However, we note that such kinship relationships, if overrepresented in the collection, should bias the estimations of diversity levels inferred for these populations in a conservative way, i.e. by reducing them.
Note that we avoided splitting the samples according to the sampling locations listed above because the field notes contain numerous entries specifying that individuals (and particularly so, women) came from another location than the one they were sampled in.
Nested PCRs
A first PCR (PCR1) was carried out to amplify the whole control (D‐loop) region (∼1274 bp) using primers L15888 and H592, and PCR1 products were directly used as templates for two separate PCR2 rounds specific to each locus with primers H16401 and L15996 for HVS‐I and primers H408 and L16552 for HVS‐II. A second approach was used on samples that did not amplify using the first method, probably because the DNA was too degraded to allow the amplification of the whole D‐loop during PCR1. A first PCR (PCR1) was therefore performed that targeted each of HVS‐I and HVS‐II separately with primer pairs L15988/H16410 for HVS‐I and L16521/H470 for HVS‐II, and PCR2 used the same primers as in the first approach.
SNP genotyping
A 289bp fragment including position 3594 was amplified using primers L3441 (5'ACT ACA ACC CTT CGC TGA CG3') and H3709 (5'TGA GAT TGT TTG GGC TAC TGC3') and used to classify individuals into haplogroups L3 (3594C) or L1/L2 (3594T), after enzymatic digestion using HpaI. A 279bp fragment including positions 10810 and 10873 was amplified using primers L10759 (5'AAT GCT AAA ACT AAT
CGT CC3') and H11018 (5'TTT TTT CGT GAT AGT GGT TC3') that allowed to discriminate between haplogroup L1 (10810C) or L2 (10810T) sequences on one hand (digestion with HinfI for 10810, and with MnlI for 10873), and on the other hand, to assign to haplogroup N (10873T) those sequences that were previously classified as L3. Finally a 219bp fragment including position 10400 was amplified using primers L10267 (5'CCC TCC TTT TAC CCC TAC CA3') and H10466 (5'TGT AAA TGA GGG GCA TTT GG3') and used to classify into haplogroup M (10400T) those sequences that were previously
classified as L3 and non‐N (digestion with AluI). SNPs 16230 and 16390 were additionally scored using the HVS‐I sequencing records. The first one was used to characterize haplogroup L0 (16230G) within the L1 lineage and the second one (16390A) to further confirm the classification into haplogroup L2 of those individuals that were formerly genotyped as 3594T and 10810T. Because of difficulties in amplifying the fragment including the two positions 10810 and 10873, two additional primer pairs were used: L10687 (5'TGG GCC TAG CCC TAC TAG TCT3') and H10839 (5'AAT TAG GCT GTG GGT GGT TG3') for SNP 10810 (172bp), and L10831 (5'AAT CAA CAC AAC CAC CCA CA3’) and H10968 (5'TGT GAG GGG TAG GAG TCA GG3') for SNP 10873 (157bp).
Supporting Information on Results.
Eliminating contamination suspicions
We considered that contamination could not be excluded when sets of identical sequences were obtained during the same extraction, amplification and/or sequencing sessions. This procedure ended up with 80 HVS‐I and HVS‐II sequences (representing 69 individuals) for which the possibility of contamination could not be ruled out on the basis of these criteria. For 27 individuals, serum was still available, and we tested these samples again, starting with newly serum‐extracted DNA. In one sample the initial sequence was not reproduced and thus contamination suspicion could not be ruled out (obviously, this sample was eliminated from the definitive data set). The proportion of sequences for which contamination by another sample could not be ruled out was 3.70% (1/27), which leads to the conservative assumption that about three sequences (80 x 3.70%) out of the 362 sequences of the definitive data set could reflect contamination, i.e. ∼0.8% of the total data set.
Success scores of HVS sequencing and SNPs genotyping in the LORV samples
Obviously, sequencing of the two hypervariable mtDNA segments and genotyping of the 4 SNPs was not successful for all the samples. Detailed success scores were as follows:
‐ HVS‐I, HVS‐II and the 4 SNPs: 28 Nyangatom, 8 Daasanach and 2 Turkana;
‐ HVS‐I, HVS‐II and 3 SNPs (missing result for SNP 10810): 8 Nyangatom and 1 Daasanach;
‐ HVS‐I, HVS‐II and 3 SNPs (missing result for SNP 10400): 1 Nyangatom and 2 Daasanach;
‐ HVS‐I, HVS‐II and 2 SNPs (missing result for SNPs 10810 and 10400): 1 Daasanach;
‐ HVS‐I and the 4 SNPs: 8 Nyangatom;
‐ HVS‐I and 3 SNPs (missing result for SNP 10810): 3 Nyangatom;
‐ HVS‐I and 2 SNPs (missing result for SNPs 10810 and 10400): 1 Nyangatom and 1 Daasanach;
‐ HVS‐II and the 4 SNPs: 11 Nyangatom;
‐ HVS‐I and HVS‐II: 36 Nyangatom, 25 Daasanach and 4 Turkana;
‐ only HVS‐I: 27 Nyangatom, 11 Daasanach and 4 Turkana;
‐ only HVS‐II: 52 Nyangatom, 10 Daasanach and 1 Turkana;
‐ only the 4 SNPs: 7 Nyangatom, 1 Daasanach and 1 Turkana.
The final data set of sequences was thus constituted of 112 Nyangatom, 49 Daasanach and 10 Turkana HVS‐I sequences of about 400 bp, and 136 Nyangatom, 47 Daasanach and 7 Turkana HVS‐II sequences of about 460 bp.
We identified only two cases of maternal kinship in the dataset, i.e. between samples N_182 and N_192, both bearing the same HVS‐I and HVS‐II (L2(pre)b) sequences, and between samples N_229 and N_234, both bearing the same HVS‐I and HVS‐II (L5a2) sequences (Table S1). However, as already explained above, other cases cannot be excluded. Consequently, we decided to conserve these 4 individuals in all the analyses.
The sampling locations represented by this dataset were as follows:
‐ Nyangatom (HVS‐I/HVS‐II): Nawiapua (27/15), Shangura (23/23), Nawiatir (20/24),
Lokwamugnan (13/31), Weleso (12/16), Rukruk (10/20), Lopokori (5/5) and Kagnikagn (2/2);
‐ Daasanach (HVS‐I/HVS‐II): Handall (27/25), Nyamorlu (11/10), Kalam‐mission (4/4) and Rate (2/2), as well as 5/6 unspecified samples that were taken from Daasanach workers in the French IORE camp;
‐ Turkana (HVS‐I/HVS‐II): Kalokol (4/5), Kataboi (3/0), Lokitanyella (2/1) and Loarygnak (1/1).
Genetic diversity of HVS sequences and RFLP in the LORV samples
The HVS‐I sequence alignment used to compare the LORV samples spanned from np 15997 to 16399, excluding the stretch between 16182 and 16193. For HVS‐II, we compared the sequences from np 16529 to 413, excluding a few indel positions in poly‐nucleotide tracks (nps 308, 309, 315, 356).
Ambiguous nucleotides (Y and R) were parsimoniously corrected according to sequences with which they shared all other mutations relative to the revised Cambridge Reference Sequence (rCRS, Andrews et al., 1999), i.e. three ambiguities in one HVS‐I sequence each (Table S1), and seven ambiguities in one HVS‐II sequence each. Thus we considered that heteroplasmy was implausible in these cases, except for the Nyangatom sequence number 380 ( np 16223) for which no close
matching haplotype was found. In HVS‐II, two mutations were found completely fixed with respect to the rCRS: np 16532, deleted in all the LORV samples, and a A to G transition at np 263.
Similar levels of diversity were observed in the three LORV samples, both for HVS‐I and HVS‐II. Gene diversity, mean number of pairwise differences (MNPD) and nucleotide diversity (π) in HVS‐I were of 0.985 (±0.004), 8.98 (±4.17) and 0.023 (±0.0118) in the Nyangatom, of 0.979 (±,0.009), 9.44 (±4.41) and 0.0241 (±0.0125) in the Daasanach, and of 0.978 (±0.054), 9.31 (±4.68) and 0.0238 (±0.0135) in the Turkana. The corresponding indices in HVS‐II were of 0.975 (±0.005), 5.32 (±2.58) and 0.0118 (±0.0064) in the Nyangatom, of 0.978 (±,0.012), 5.61 (±2.74) and 0.0125 (±0.0068) in the Daasanach, and of 1.0 (±0.076), 6.43 (±3.47) and 0.0143 (±0.0088) in the Turkana. Similarly, for both segments, haplotype richness in the Nyangatom (R(g) = 36.8, with g = 49 for HVS‐I, and R(g) = 31.9, with g = 47 for
HVS‐II) was equivalent to the observed number of haplotypes in the Daasanach (i.e. 34 and 35 respectively). As expected, the molecular diversity in HVS‐II was lower than that of HVS‐I in the three LORV samples, although the usable sequence was slightly longer for HVS‐II than for HVS‐I (391).
Indeed, both the number of polymorphic sites and the transition/transversion ration averaged over populations were twice as high in HVS‐I (92 and 11.0, respectively) than in HVS‐II (44 and 5.6, respectively). Finally, HVS‐I and HVS‐II unimodal mismatch distributions were observed in the Nyangatom and the Daasanach.
For all three samples and both D‐loop segments, Tajima’s D was negative but not significant.
Conversely, Fu’s Fs was found negative and highly significant both in the Nyangatom and the Daasanach (respectively, ‐24.6 and ‐15.2 for HVS‐I, ‐25.3 and ‐25.3 for HVS‐II, all values with
P<0.0005). For the Turkana, this last statistic did not reach significance, but this can be due to its very small sample size.
Surprisingly, genetic differentiation between the Nyangatom and the Daasanach was significant for HVS‐I (ΦST = 4.1%, P < 0.0001), but not for HVS‐II (ΦST = 0.4%, P = 0.218). On another hand, the small sample size of the Turkana could again explain that they were not found significantly differentiated either from the Nyangatom or from the Daasanach for both segments.
Classification of the LORV HVS‐I sequences into haplogroups
The classification of the LORV sequences is shown in Table S1 and Figure S1. Thirty‐six sequences were classified as belonging to the L0 haplogroup, , making up 16%, 33% and 20% of the Nyangatom, Daasanach and Turkana sequence samples, respectively (Figure S1a). Among these, we observed nine L0f sequences, eight of which are Daasanach and share the 16129‐16169‐16187‐16189‐16223‐
16230‐16278‐16311‐16327‐16368 motif with the single Amhara L0f sequence of Kivisild et al. (2004).
Twenty‐seven LORV sequences were initially assigned to L0a, ten of which were classified into the L0a1 clade and twelve into L0a2. As in Kivisild et al. (2004) we also observed 5 LORV sequences that remained unclassified at the L0a level, four of which (3 Nyangatom and 1 Daasanach) also lacked the 16188 transversion generally used to pinpoint to L0a sequences. Interestingly, these latter 4
sequences, along with the single Tigrai sequence of Kivisild et al. (2004), seem to build up yet another L0 clade, actually a sister‐clade of L0a, which we propose to label L0g. No L0d or L0k sequences were observed in the LORV samples.
We assigned only two LORV sequences to the L1 haplogroup, one of these being a Daasanach L1b sequence, the other one a Turkana L1c. This latter sequence harbors the 16293 substitution that defines the L1c1 clade in Salas et al. (2002) and it could belong to the L1c1b sub‐clade according to the classification of Batini et al. (2007), on the basis of HVS‐II variation.
By contrast to the scarcity of L1 sequences in the LORV samples, haplogroup L5 was represented by twenty‐seven sequences harboring the characteristic state at position 16166. Ten of these formed a star‐like cluster of three Nyangatom, five Daasanach and two Turkana sequences in the L5a1 clade, together with three Ethiopian sequences (Afar, Oromo and Amhara) of Kivisild et al. (2004). A star‐
like cluster of mainly Nyangatom sequences (8 Nyangatom, 1 Daasanach, 2 Turkana, and 2 Amhara) was also observed in the L5a2 clade. Two further Nyangatom sequences were similar to the L5b
haplotypes described by Kivisild et al. (2004), except that they lack the transitions at 16093 and 16295. Finally, a group of four identical Daasanach sequences remained unclassified at the L5 level.
A total of twenty‐one LORV sequences were assigned to the L2 haplogroup (Figure S1b), fourteen of which to the L2a clade (9 Nyangatom, 3 Daasanach, 2 Turkana, Figure S1c). We found also a single Nyangatom L2b sequence, whereas three other Nyangatom sequences remained unclassified at the L2 level. Because these three sequences shared the L2b state at positions 16114 and 16213, we labeled them L3(pre)b, following the classification scheme of Kivisild et al. (2004).
Three LORV sequences were assigned to haplogroup L6 because these harbored the characteristic state at position 16224, but were found to differ at least at two positions (16048 and 16311) from the Ethiopian and Yemeni L6 sequences of Kivisild et al. (2004).
Twenty‐one LORV sequences were classified as belonging to the L4 haplogroup, among which a Nyangatom sequence presenting the L4 HVS‐I defining motif (16223‐16311‐16362). All other L4 sequences fell into the L4g clade (i.e. 11 Nyangatom and 9 Daasanach), forming a star‐like cluster.
Here again a single Nyangatom sequence presented the L4g HVS‐I defining motif (16223‐16293T‐
16311‐16355‐16362‐16399), but its HVS‐II status could not be verified. We observed that all LORV L4g sequences shared a G at position 244 in HVS‐II (except for a Nyangatom sequence that seems to have reverted at this position). However, contrarily to Kivisild et al. (2004), we found that the transition at position 146 relative to the rCRS was found on multiple haplogroup backgrounds, such as on L6 sequences, most L2 sequences, as well as several L0 and L3 sequences. Thus, 146C was not considered as characteristic of L4g. A group of seven LORV sequences (4 Nyangatom and 3
Daasanach) formed a star‐like sub‐cluster distinguished from the L4g defining motif by a transition at position 16172 that was also shared with haplotype 132 of Kivisild et al. (2004). Consequently, this haplotype was also included in the L4g sub‐cluster, thus tentatively resolving the reticulation observed by Kivisild et al. (2004) between haplotypes 132, 133 and 134 of their study.
Fifty‐three LORV sequences were assigned to the L3 haplogroup (excluding M and N), representing respectively 38% and 18% of the Nyangatom and Daasanach sequences (Figure S1d). Despite the lack of coding‐region information, LORV sequences were found apparently in all of the L3 primary
branches observed by Kivisild et al. (2004), except for L3w. The L3i branch was the most represented in the LORV sample, with respectively fifteen Nyangatom, three Daasanach and one Turkana
sequences, along with a single Amhara sequence (Kivisild et al., 2004), constituting yet another sub‐
clade that we tentatively labeled L3i1 (defining HVS‐I motif : 16153‐16174‐16223‐16319). We note that the classification of the LORV sequences did not seem to confirm the association, put forward in Kivisild et al. (2004), of the 150‐152 motif in HVS‐II with the L3w haplogroup, as this motif was also observed in some sequences of other L3 branches (L3h, L3i, and L3x) as well as on some L2
sequences.
Finally, two Daasanach and one Nyangatom sequences were classified as M, and a further five Nyangatom as N, although we failed to achieve a molecular resolution allowing a more precise assignation of these sequences to specific haplogroups.
We estimated the TMRCA of particular clades in the mtDNA phylogeny of the LORV sequences, using the same approach as Kivisild et al. (2004), i.e. applying the method of Forster et al. (1996) and Saillard et al. (2000) to those sub‐haplogroups that included at least five sequences. Additionally, we
computed the 95% confidence interval of each point‐estimation by bootstrapping the sequences in the clade (10,000 replicates). These estimations are reported in Table S3.
References in Supporting Text, Tables and Figures
Andrews, R.M., Kubacka, I., Chinnery, P.F., Lightowlers, R.N., Turnbull, D.M. & Howell, N. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.
Nat Genet, 23, 147.
Bandelt, H.J., Lahermo, P., Richards, M. & Macaulay, V. (2001) Detecting errors in mtDNA data by phylogenetic analysis. Int J Legal Med, 115, 64‐9.
Batini, C., Coia, V., Battaggia, C., Rocha, J., Pilkington, M.M., Spedini, G., Comas, D., Destro‐Bisol, G. &
Calafell, F. (2007) Phylogeography of the human mitochondrial L1c haplogroup: genetic signatures of the prehistory of Central Africa. Mol Phylogenet Evol, 43, 635‐44.
Beleza, S., Gusmao, L., Amorim, A., Carracedo, A. & Salas, A. (2005) The genetic legacy of western Bantu migrations. Hum Genet, 117, 366‐75.
Bertranpetit, J., Sala, J., Calafell, F., Underhill, P.A., Moral, P. & Comas, D. (1995) Human
mitochondrial DNA variation and the origin of Basques. Ann Hum Genet, 59 ( Pt 1), 63‐81.
Bosch, E., Calafell, F., Gonzalez‐Neira, A., Flaiz, C., Mateu, E., Scheil, H.G., Huckenbeck, W.,
Efremovska, L., Mikerezi, I., Xirotiris, N., Grasa, C., Schmidt, H. & Comas, D. (2006) Paternal and maternal lineages in the Balkans show a homogeneous landscape over linguistic barriers, except for the isolated Aromuns. Ann Hum Genet, 70, 459‐87.
Brakez, Z., Bosch, E., Izaabel, H., Akhayat, O., Comas, D., Bertranpetit, J. & Calafell, F. (2001) Human mitochondrial DNA sequence variation in the Moroccan population of the Souss area. Ann Hum Biol, 28, 295‐307.
Calafell, F., Underhill, P., Tolun, A., Angelicheva, D. & Kalaydjieva, L. (1996) From Asia to Europe:
mitochondrial DNA sequence variability in Bulgarians and Turks. Ann Hum Genet, 60 ( Pt 1), 35‐49.
Cali, F., Le Roux, M.G., D'anna, R., Flugy, A., De Leo, G., Chiavetta, V., Ayala, G.F. & Romano, V. (2001) MtDNA control region and RFLP data for Sicily and France. Int J Legal Med, 114, 229‐31.
Cerny, V., Hajek, M., Cmejla, R., Bruzek, J. & Brdicka, R. (2004) mtDNA sequences of Chadic‐speaking populations from northern Cameroon suggest their affinities with eastern Africa. Ann Hum Biol, 31, 554‐69.
Chen, Y.S., Olckers, A., Schurr, T.G., Kogelnik, A.M., Huoponen, K. & Wallace, D.C. (2000) mtDNA variation in the South African Kung and Khwe‐and their genetic relationships to other African populations. Am J Hum Genet, 66, 1362‐83.
Cherni, L., Loueslati, B.Y., Pereira, L., Ennafaa, H., Amorim, A. & El Gaaied, A.B. (2005) Female gene pools of Berber and Arab neighboring communities in central Tunisia: microstructure of mtDNA variation in North Africa. Hum Biol, 77, 61‐70.
Coia, V., Destro‐Bisol, G., Verginelli, F., Battaggia, C., Boschi, I., Cruciani, F., Spedini, G., Comas, D. &
Calafell, F. (2005) Brief communication: mtDNA variation in North Cameroon: lack of Asian lineages and implications for back migration from Asia to sub‐Saharan Africa. Am J Phys Anthropol, 128, 678‐81.
Comas, D., Calafell, F., Bendukidze, N., Fananas, L. & Bertranpetit, J. (2000) Georgian and kurd mtDNA sequence analysis shows a lack of correlation between languages and female genetic lineages. Am J Phys Anthropol, 112, 5‐16.
Comas, D., Calafell, F., Mateu, E., Perez‐Lezaun, A. & Bertranpetit, J. (1996) Geographic variation in human mitochondrial DNA control region sequence: the population history of Turkey and its relationship to the European populations. Mol Biol Evol, 13, 1067‐77.
Comas, D., Plaza, S., Wells, R.S., Yuldaseva, N., Lao, O., Calafell, F. & Bertranpetit, J. (2004) Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur J Hum Genet, 12, 495‐504.
Corte‐Real, H.B., Macaulay, V.A., Richards, M.B., Hariti, G., Issad, M.S., Cambon‐Thomsen, A., Papiha, S., Bertranpetit, J. & Sykes, B.C. (1996) Genetic diversity in the Iberian Peninsula determined from mitochondrial sequence analysis. Ann Hum Genet, 60 ( Pt 4), 331‐50.
Destro‐Bisol, G., Coia, V., Boschi, I., Verginelli, F., Caglia, A., Pascali, V., Spedini, G. & Calafell, F. (2004) The analysis of variation of mtDNA hypervariable region 1 suggests that Eastern and Western Pygmies diverged before the Bantu expansion. Am Nat, 163, 212‐26.
Di Rienzo, A. & Wilson, A.C. (1991) Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc Natl Acad Sci U S A, 88, 1597‐601.
Fadhlaoui‐Zid, K., Plaza, S., Calafell, F., Ben Amor, M., Comas, D. & Bennamar El Gaaied, A. (2004) Mitochondrial DNA heterogeneity in Tunisian Berbers. Ann Hum Genet, 68, 222‐33.
Falchi, A., Giovannoni, L., Calo, C.M., Piras, I.S., Moral, P., Paoli, G., Vona, G. & Varesi, L. (2005) Genetic history of some western Mediterranean human isolates through mtDNA HVR1 polymorphisms. J Hum Genet.
Forster P, Harding R, Torroni A, Bandelt HJ. (1996) Origin and evolution of Native American mtDNA variation: a reappraisal. Am J Hum Genet. 59:935‐45.
Francalacci, P., Bertranpetit, J., Calafell, F. & Underhill, P.A. (1996) Sequence diversity of the control region of mitochondrial DNA in Tuscany and its implications for the peopling of Europe. Am J Phys Anthropol, 100, 443‐60.
Gonder, M.K., Mortensen, H.M., Reed, F.A., De Sousa, A. & Tishkoff, S.A. (2007) Whole‐mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol, 24, 757‐68.
Graven, L., Passarino, G., Semino, O., Boursot, P., Santachiara‐Benerecetti, S., Langaney, A. &
Excoffier, L. (1995) Evolutionary correlation between control region sequence and restriction polymorphisms in the mitochondrial genome of a large Senegalese Mandenka sample. Mol Biol Evol, 12, 334‐45.
Jackson, B.A., Wilson, J.L., Kirbah, S., Sidney, S.S., Rosenberger, J., Bassie, L., Alie, J.A., Mclean, D.C., Garvey, W.T. & Ely, B. (2005) Mitochondrial DNA genetic diversity among four ethnic groups in Sierra Leone. Am J Phys Anthropol, 128, 156‐63.
Kivisild, T., Reidla, M., Metspalu, E., Rosa, A., Brehm, A., Pennarun, E., Parik, J., Geberhiwot, T., Usanga, E. & Villems, R. (2004) Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet, 75, 752‐70.
Knight, A., Underhill, P.A., Mortensen, H.M., Zhivotovsky, L.A., Lin, A.A., Henn, B.M., Louis, D., Ruhlen, M. & Mountain, J.L. (2003) African Y Chromosome and mtDNA Divergence Provides Insight into the History of Click Languages. Curr Biol, 13, 464‐73.
Krings, M., Salem, A.E., Bauer, K., Geisert, H., Malek, A.K., Chaix, L., Simon, C., Welsby, D., Di Rienzo, A., Utermann, G., Sajantila, A., Paabo, S. & Stoneking, M. (1999) mtDNA analysis of Nile River
Valley populations: A genetic corridor or a barrier to migration? Am J Hum Genet, 64, 1166‐
76.
Macaulay, V., Richards, M., Hickey, E., Vega, E., Cruciani, F., Guida, V., Scozzari, R., Bonne‐Tamir, B., Sykes, B. & Torroni, A. (1999) The emerging tree of West Eurasian mtDNAs: a synthesis of control‐region sequences and RFLPs. Am J Hum Genet, 64, 232‐49.
Mateu, E., Comas, D., Calafell, F., Perez‐Lezaun, A., Abade, A. & Bertranpetit, J. (1997) A tale of two islands: population history and mitochondrial DNA sequence variation of Bioko and Sao Tome, Gulf of Guinea. Ann Hum Genet, 61 ( Pt 6), 507‐18.
Pereira, L., Macaulay, V., Torroni, A., Scozzari, R., Prata, M.J. & Amorim, A. (2001) Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet, 65, 439‐58.
Plaza, S., Calafell, F., Helal, A., Bouzerna, N., Lefranc, G., Bertranpetit, J. & Comas, D. (2003) Joining the pillars of Hercules: mtDNA sequences show multidirectional gene flow in the western Mediterranean. Ann Hum Genet, 67, 312‐28.
Plaza, S., Salas, A., Calafell, F., Corte‐Real, F., Bertranpetit, J., Carracedo, A. & Comas, D. (2004) Insights into the western Bantu dispersal: mtDNA lineage analysis in Angola. Hum Genet, 115, 439‐47.
Rando, J.C., Pinto, F., Gonzalez, A.M., Hernandez, M., Larruga, J.M., Cabrera, V.M. & Bandelt, H.J.
(1998) Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, near‐eastern, and sub‐Saharan populations. Ann Hum Genet, 62 (Pt 6), 531‐50.
Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C., Sellitto, D., Cruciani, F., Kivisild, T., Villems, R., Thomas, M., Rychkov, S., Rychkov, O., Rychkov, Y., Golge, M.,
Dimitrov, D., Hill, E., Bradley, D., Romano, V., Cali, F., Vona, G., Demaine, A., Papiha, S., Triantaphyllidis, C., Stefanescu, G., Hatina, J., Belledi, M., Di Rienzo, A., Novelletto, A., Oppenheim, A., Norby, S., Al‐Zaheri, N., Santachiara‐Benerecetti, S., Scozari, R., Torroni, A. &
Bandelt, H.J. (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet, 67, 1251‐76.
Rodhain, F., Ardoin, P., Metselaar, D., Salmon, A.M. & Hannoun, C. (1975) An epidemiologic and serologic study of arboviruses in Lake Rudolf basin. Trop Geogr Med, 27, 307‐12.
Rodhain, F., Hannoun, C. & Metselaar, D. (1972) [Epidemiological and serological study of the
arboviruses in the lower valley of the Omo (southern Ethiopia)]. Bull World Health Organ, 47, 295‐304.
Röhl, A., Brinkmann, B., Forster, L. & Forster, P. (2001) An annotated mtDNA database. Int J Legal Med, 115, 29‐39.
Rosa, A., Brehm, A., Kivisild, T., Metspalu, E. & Villems, R. (2004) MtDNA profile of West Africa Guineans: towards a better understanding of the Senegambia region. Ann Hum Genet, 68, 340‐52.
Saillard J, Forster P, Lynnerup N, Bandelt HJ, Nørby S. (2000) mtDNA variation among Greenland Eskimos: the edge of the Beringian expansion. Am J Hum Genet. 67:718‐26.
Salas, A., Comas, D., Lareu, M.V., Bertranpetit, J. & Carracedo, A. (1998) mtDNA analysis of the Galician population: a genetic edge of European variation. Eur J Hum Genet, 6, 365‐75.
Salas, A., Richards, M., De La Fe, T., Lareu, M.V., Sobrino, B., Sanchez‐Diz, P., Macaulay, V. &
Carracedo, A. (2002) The making of the African mtDNA landscape. Am J Hum Genet, 71, 1082‐111.
Salas, A., Richards, M., Lareu, M.V., Scozzari, R., Coppa, A., Torroni, A., Macaulay, V. & Carracedo, A.
(2004) The African diaspora: mitochondrial DNA and the Atlantic slave trade. Am J Hum Genet, 74, 454‐65.
Sanchez‐Mazas, A. & Poloni, E.S. (2008) Genetic diversity in Africa. In: Encyclopedia of Life Sciences) Encyclopedia of Life Sciences. Chichester: John Wiley & Sons.
Tagliabracci, A., Turchi, C., Buscemi, L. & Sassaroli, C. (2001) Polymorphism of the mitochondrial DNA control region in Italians. Int J Legal Med, 114, 224‐8.
Thomas, M.G., Weale, M.E., Jones, A.L., Richards, M., Smith, A., Redhead, N., Torroni, A., Scozzari, R., Gratrix, F., Tarekegn, A., Wilson, J.F., Capelli, C., Bradman, N. & Goldstein, D.B. (2002) Founding mothers of Jewish communities: geographically separated Jewish groups were independently founded by very few female ancestors. Am J Hum Genet, 70, 1411‐20.
Tishkoff, S.A., Gonder, M.K., Henn, B.M., Mortensen, H., Knight, A., Gignoux, C., Fernandopulle, N., Lema, G., Nyambo, T.B., Ramakrishnan, U., Reed, F.A. & Mountain, J.L. (2007) History of click‐
speaking populations of Africa inferred from mtDNA and Y chromosome genetic variation.
Mol Biol Evol, 24, 2180‐95.
Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K. & Wilson, A.C. (1991) African populations and the evolution of human mitochondrial DNA. Science, 253, 1503‐7.
Watson, E., Bauer, K., Aman, R., Weiss, G., Von Haeseler, A. & Paabo, S. (1996) mtDNA sequence diversity in Africa. Am J Hum Genet, 59, 437‐44.
Watson, E., Forster, P., Richards, M. & Bandelt, H.J. (1997) Mitochondrial footprints of human expansions in Africa. Am J Hum Genet, 61, 691‐704.
Figure S1
Median joining networks of mitochondrial DNA haplotypes, modified from Kivisild et al. (2004), so as to include the Nyangatom, Daasanach and Turkana sequences from the LORV sample.
Notes :
Nodes in the network represent haplotypes, and are drawn proportional to their frequency (see caption). Nyangatom, Daasanach and Turkana sequences are shown in green, blue and red, respectively, whereas the Ethiopian and Yemeni haplotypes of Kivisild et al. (2004) are shown in pink and yellow (see caption). LORV sequences IDs are reported in italics next to nodes. Mutated positions relative to the rCRS (i.e all transitions, unless specified) are shown along the branches connecting the nodes in the network; mutations in HVS‐I are shown in red (position minus 16000), those in HVS‐II in blue, and those in the coding‐region in dark‐brown. Branch lengths were only schematically drawn as proportional to the number of mutational steps, in order to respect the phylogenetic relationships observed by Kivisild et al. (2004).
a : Network of L0, L1 and L5 haplotypes. In the L0a1 clade, Daasanach number 213 and Nyangatom number 413 were reported at midway along the branch connecting haplotypes 36 and 37 of Kivisild et al. (2004), because these haplotypes are distinguished by a single transition at position 3866, not tested in our study. In the L0g clade, Nyangatom sequence 415 is distinguished from Nyangatom sequence 590 by two transitions at positions 16003 and 16004. However, these could be artifacts, since they are located near the beginning of the sequenced stretch. Turkana sequence K06, classified as L1c, was found very similar to a Tanzanian Turu L1c sequence (Gonder et al. 2007, GenBank EF184612), with which it shared the 16017 and 16163 states, while differing from it at positions 16158, 16209, and 151. In the L5a2 clade, Nyangatom sequence 381 is distinguished from sequences 244 (Nyangatom), 732 (Daasanach) and K65 (Turkana) by a transversion at 16399, but this could be a sequencing artifact as it is the last position sequenced. b : Network of L2, L6 and L4 haplotypes. The L2’L6 part of the network was built without considering position 16399, since this position was not scored for most haplotypes in Kivisild et al. (2004). Nevertheless, it is shown in brackets on the branch leading to L2d, as in Salas et al. (2004), since a G was observed at this position in all three Nyangatom L2d2 sequences (IDs 186, 223, 643). c : Network of L2a haplotypes. d : network of L3 haplotypes.
a
L1b1a
L0k
L0f root L0
L0d
L0g
546 623
590 415
692
277
734
754 735
L1
L1b
634
L1c L1c1 K06
L5
L5b
414 425 769, 773,
776, 782
L5a
L5a2 621
724 728 200 745 759
K41 K44
167 756
L5a1
L6 L4 L3 L2
L0a2 L0a
211
228 236 736
240 293 321 217 K34 429
201 361
K106
L0a1
664
428 684
280
726
245 252 413
213 733 721, 762,
767, 781
K21 229 234 262 412 421
381
416 732 K65
244 399C
233 231 360
317
111 254 319 311 176 291
355 360 362
192 93 187
223
172 209 213
265C 93 295
138
3594
129 187 189
166 148 189
254 311 368
10810
85 260 394
172 293
114
289 293
17 158 163 293
126 129 264 270
294 360 230
172 214 291A
148
51 164 188G
278
166C 223 287
169 327
173 368
239
223 230 274 352 52 172 290 325 354
327 316
63 129
3 4
170C 249 173
209 223
187 188 289
129 320 266
184 93
184 241
368 93 188 214 234 311 320 260
168
114A 162 278
209 93
293 320
263
093 150
287 129 294 287 293
Nyangatom Daasanach Turkana
Ethiopian (Kivisild et al. 2004) Yemeni (Kivisild et al. 2004)
1 2 3 4
b
L3
L4 L4g
132
379
783
L4a
L4a1
727 203 742 743 744
391
164 139 142 691
341 399
147 177 751 746
763
root
L5 L1 L0
L6
400
418
230 12
sequences
L2d1 L2 L2d
182 192
L2b 195 344
L2a
408
L2d2
186 223 643
294 75 93
274 189 344
136 192
111 230 86
169 284
293T 355 399
244 153 179 189 239 320 221 261
320
301
93 172 189 168 287
4 180
192 289 301
189
362
260
289 207T 93
207 207
261 318
163 356 264 320
220 244 93 189 207 265C
362 153 189
309 189 264
224
309 311
48 48
362
93 188G
173
295 390 278
3594
10810
129 187 (399) 189
129 189 223 300 354
311 111A 145 239 292 355 294
354 153 311 362
145
129 298
154 179 189
311 114A 390
213
93 184 234 311
Nyangatom Daasanach Turkana
Ethiopian (Kivisild et al. 2004) Yemeni (Kivisild et al. 2004)
1 2 3 4
133 244
c
543
L2a1
408
L2a1c
209 301 354 K110
214 226
L2a1a
322 365 667 7 sequences
L2a
L2d L2 L2b
5 sequen-
ces 665
242 123
738 K10 258
335 51
241C
294 189
93 139
309 37
129 286
150 124 362
292 188 256
213
92
311 192
192 309
354 172 222
309 287
316 172
172 311
229
291 229291 192 187
192 309
173 189
Nyangatom Daasanach Turkana
Ethiopian (Kivisild et al. 2004) Yemeni (Kivisild et al. 2004)
1 2 3 4
d
L3w
L3 L3d
L3bd L3d1
L3i L3d2
L3e
L3f L3f1
L3x1 L3x L3x2
L3e1 L3e3
L3b
L4
L2 L6 L3h
778
273 711
760
417 225
100, 126, 135, 194
723
149 163 423
168 402
176
L3e1a
160 243 404 739
122, 166 189
239
255 128
171
224, 256, 270, 272, 281, 343, 592, 620, 661
777 752 753
190, 191, 237 248, 426
398 5 sequen-
ces
K77 9 sequences 205
320 287
383 sequen-6
ces
L0 L1 L5
L3i1
141 276
165
179 256A
255
284
289 192
175 148
354 354
179 256A 399 124 153
169 311
209
260
311 266
292
111A 129
286 311
126 176 93
93
129 150 234
93 185 189 235 186 327 309
298 256
320
305
10398
66 136 368
171 189A 292 293 192
311 311
126
223
86 147 193 195
235 243
311
223 278
327 311
265T 327 185
129 391
335T 294
185 220
319 75 126
174
239 93
172 138T 66 230 86
278 93
362 316 360 327
254 385
319 111 256
93 368
Nyangatom Daasanach Turkana
Ethiopian (Kivisild et al. 2004) Yemeni (Kivisild et al. 2004)
1 2 3 4
Figure S2
Map with approximate geographic location of the 136 population samples included in the HVS‐I database.
Notes :
The samples are represented by dots color‐coded according to linguistic affiliation: Afro‐Asiatic (green), Nilo‐Saharan (red), Niger‐Congo (yellow), Khoisan and Khoisan‐related languages of Tanzania (dark blue), Indo‐European (light blue), Altaic (pink), and Basque (brown). We defined seven geographic subdivisions, represented by shaded areas on the map, that broadly correspond to those of Salas et al. (2002) and Sanchez‐Mazas & Poloni (2008), i.e. East Africa (EA), Central and Southwest Africa (CSWA), West Africa (WA), Southeast and South Africa (SESA), North Africa (NA), Middle East and West Asia (MEWA) and South Europe (SE).
TuruSandawe Datog Burunge Hadza Sukuma Hadzabe Nalu
NA WA
CSWA
SESA
EA
MEWA
SE
Figure S3
Three successive Principal Coordinates Analyses (PCA) of pairwise Reynolds’ genetic distances among populations computed from HVS‐I sequence‐based ΦST indices.
Notes :
The populations are color‐coded according to geographic location, as shown in the captions. The proportion of the total genetic variance explained by each of the two first coordinates is shown in brackets. a : The PCA of genetic distances among the 136 populations of the HVS‐I database highlights the genetic divergence of the Mbenzele and Ju|'hoansi from all other populations; a correlation coefficient r = 0.711 (95% C.I. = [0.617 ; 0.785]) was found between the latitudinal location of the populations and the PCA’s first axis coordinates. b : The PCA of genetic distances among 134 populations (i.e. excluding the Mbenzele and Ju|'hoansi) highlights the genetic divergence of the Herero; a correlation coefficient r = 0.877 (95% C.I. = [0.832 ; 0.911]) was found between the latitudinal location of the populations and the PCA’s first axis coordinates. c : The PCA of genetic distances among 133 populations (i.e. excluding the Mbenzele, Ju|'hoansi and Herero) resulted in a correlation coefficient r = 0.882 (95% C.I. = [0.838 ; 0.914]) between the latitudinal location of the populations and the PCA’s first axis coordinates.
0.5 0.5
0.3 0.4
a b c
0.0
14%)
0.0
e 1 (69.82%)
0.1 0.2
.36%)
Ju|'hoansi
‐0.5
Coordinate 1 (58.
Herero Coordinat ‐0.5
Datoga
Sandawe Nyangatom
Daasanach
‐0.2
‐0.1 0.0
Coordinate 1 (74
Mbenzele
‐1.0
‐1.0
‐0.5 0.0 0.5 1.0
Coordinate 2 (13.15%)
North Afri a East Afri a
Hadza Hadzabe Lomwe
Sekele
‐0.4
‐0.3
‐1.5
‐1.0 ‐0.5 0.0 0.5
Coordinate 2 (25.09%)
North Africa East Africa
Central and South‐West Africa West Africa
North Africa East Africa
Central and South‐West Africa West Africa South‐East and South Africa South Europe Middle East and West Asia
‐0.5
‐0.2 ‐0.1 0.0 0.1 0.2 0.3 0.4 0.5
Coordinate 2 (11.74%)
North Africa East Africa
C l d S h f i f i
Central and South‐West Africa West Africa South‐East and South Africa South Europe Middle East and West Asia
Central and South‐West Africa West Africa South‐East and South Africa South Europe Middle East and West Asia