Genetic evidence for complexity in ethnic differentiation and history in East Africa

(1)

Article

Reference

Genetic evidence for complexity in ethnic differentiation and history in East Africa

POLONI, Estella S., et al.

Abstract

The Afro-Asiatic and Nilo-Saharan language families come into contact inWestern Ethiopia.

Ethnic diversity is particularly high in the South, where the Nilo-Saharan Nyangatom and the Afro-Asiatic Daasanach dwell. Despite their linguistic differentiation, both populations rely on a similar agripastoralist mode of subsistence. Analysis of mitochondrial DNA extracted from Nyangatom and Daasanach archival sera revealed high levels of diversity, with most sequences belonging to the L haplogroups, the basal branches of the mitochondrial phylogeny. However, in sharp contrast with other Ethiopian populations, only 5% of the Nyangatom and Daasanach sequences belong to haplogroups M and N. The Nyangatom and Daasanach were found to be significantly differentiated, while each of them displays close affinities with some Tanzanian populations. The strong genetic structure found over East Africa was neither associated with geography nor with language, a result confirmed by the analysis of 6711 HVS-I sequences of 136 populations mainly from Africa. Processes of migration, language shift and group absorption are documented by linguists [...]

POLONI, Estella S., et al. Genetic evidence for complexity in ethnic differentiation and history in East Africa. Annals of Human Genetics, 2009, vol. 73, no. 6, p. 582-600

DOI : 10.1111/j.1469-1809.2009.00541.x

Available at:

http://archive-ouverte.unige.ch/unige:2714

Disclaimer: layout of this document may differ from the published version.

(2)

Supporting Information for ms :

Genetic evidence for complexity in ethnic differentiation and history in East Africa by Estella S. Poloni, Yamama Naciri, Rute Bucho, Régine Niba, Barbara Kervaire, Laurent Excoffier, André Langaney and Alicia Sanchez‐Mazas

Text S1. Supporting Information on Materials and Methods and on Results.

References in Supporting Texts, Tables and Figures.

Figure S1. Median joining networks of mitochondrial DNA haplotypes, modified from Kivisild et al.

(2004), so as to include the Nyangatom, Daasanach and Turkana sequences from the LORV sample.

Figure S2. Map with approximate geographic location of the 136 population samples included in the HVS‐I database.

Figure S3. Three successive Principal Coordinates Analyses (PCA) of pairwise Reynolds’ genetic distances among populations computed from HVS‐I sequence‐based Φ_ST indices.

Table S1. Scored mutated positions in the LORV sequences and classification into haplogroups.

Table S2. The 136 published and new population samples of the mtDNA HVS‐I sequences database used in this study (from Africa, the Middle East and West Asia, and the South of Europe).

Table S3. TMRCA estimates for sub‐haplogroups observed in the LORV samples that included at least five sequences.

Table S4. HVS‐I genetic diversity indices, results of the tests of selective neutrality and population equilibrium, and results of the tests of population expansion performed on the 136 population.

Text S1

Supporting Information on Materials and Methods.

Samples

During the 1971‐1972 sampling campaigns, taking of blood samples with the individuals consent was performed in 5‐10 ml Vacutainer sterile tubes in several settlements of the southernmost section of the Omo River Valley (Southwestern Ethiopia):

‐ the Nyangatom samples were taken in 3 settlements on the banks of the Omo River (Kagnikagn, Shangura and Weleso) and in 6 settlements on the eastern bank of the Kibish River on the Ethiopia‐Sudan boder, directly north of the Nakua Mountains (Lopokori, Nawiapua, Lokwamugnan, Nawiatir, Natir and Rukruk);

‐ the Daasanach were taken in 3 hamlets in the Kalam region (Handall, Rate and Kalam‐

mission), in the village of Nyamorlu, located in the delta of the Omo River, and among the

(3)

Daasanach workers in the camp of the French component of the IORE (International Omo Research Expedition);

‐ the Turkana samples were taken in 4 villages located along the western bank of Lake Turkana (Kalokol, Lokitanyella, Kataboi and Loarygnak).

Maps and more details on the sampling are provided in Rodhain et al. (1972, 1975).

Blood fractions were separated by simple decantation and conserved at 4ºC for 2‐3 weeks before being shipped to the Pasteur Institute in Paris. There, the serum samples were kept in a freezer at ‐ 20ºC, unless they were de‐frozen for the purpose of epidemiological studies. This happened about 2 or 3 times since the early seventies. The remaining collection, on which we have worked, was made up of 2 ml of serum per sample on average.

Obviously, since the aim of the 1971‐1972 sampling campaigns was to investigate the geographic spread of a yellow fever epidemic that occurred some ten years earlier, the sampling strategy did not avoid related individuals. The field notes of these campaigns contain detailed information on each sampled individual, among which also kinship relationships (parents, spouses, children,

uncles/nephews). However, it is possible that this last information was not provided or checked in a systematic fashion, and even less so for the maternal line (since family membership of women in these patrilinear societies is subordinated to that of men). Thus, besides those cases where kinship relationships were explicitly noted, we cannot exclude relatedness between other individuals. This is even more probable in the case of small populations, such as the Nyangatom and the Daasanach, in which complex long‐term kinship relations are expected. However, we note that such kinship relationships, if overrepresented in the collection, should bias the estimations of diversity levels inferred for these populations in a conservative way, i.e. by reducing them.

Note that we avoided splitting the samples according to the sampling locations listed above because the field notes contain numerous entries specifying that individuals (and particularly so, women) came from another location than the one they were sampled in.

Nested PCRs

A first PCR (PCR1) was carried out to amplify the whole control (D‐loop) region (∼1274 bp) using primers L15888 and H592, and PCR1 products were directly used as templates for two separate PCR2 rounds specific to each locus with primers H16401 and L15996 for HVS‐I and primers H408 and L16552 for HVS‐II. A second approach was used on samples that did not amplify using the first method, probably because the DNA was too degraded to allow the amplification of the whole D‐loop during PCR1. A first PCR (PCR1) was therefore performed that targeted each of HVS‐I and HVS‐II separately with primer pairs L15988/H16410 for HVS‐I and L16521/H470 for HVS‐II, and PCR2 used the same primers as in the first approach.

SNP genotyping

A 289bp fragment including position 3594 was amplified using primers L3441 (^5'ACT ACA ACC CTT CGC TGA CG^3') and H3709 (^5'TGA GAT TGT TTG GGC TAC TGC^3') and used to classify individuals into haplogroups L3 (3594C) or L1/L2 (3594T), after enzymatic digestion using HpaI. A 279bp fragment including positions 10810 and 10873 was amplified using primers L10759 (^5'AAT GCT AAA ACT AAT

(4)

CGT CC^3') and H11018 (^5'TTT TTT CGT GAT AGT GGT TC^3') that allowed to discriminate between haplogroup L1 (10810C) or L2 (10810T) sequences on one hand (digestion with HinfI for 10810, and with MnlI for 10873), and on the other hand, to assign to haplogroup N (10873T) those sequences that were previously classified as L3. Finally a 219bp fragment including position 10400 was amplified using primers L10267 (^5'CCC TCC TTT TAC CCC TAC CA^3') and H10466 (^5'TGT AAA TGA GGG GCA TTT GG^3') and used to classify into haplogroup M (10400T) those sequences that were previously

classified as L3 and non‐N (digestion with AluI). SNPs 16230 and 16390 were additionally scored using the HVS‐I sequencing records. The first one was used to characterize haplogroup L0 (16230G) within the L1 lineage and the second one (16390A) to further confirm the classification into haplogroup L2 of those individuals that were formerly genotyped as 3594T and 10810T. Because of difficulties in amplifying the fragment including the two positions 10810 and 10873, two additional primer pairs were used: L10687 (^5'TGG GCC TAG CCC TAC TAG TCT^3') and H10839 (^5'AAT TAG GCT GTG GGT GGT TG^3') for SNP 10810 (172bp), and L10831 (^5'AAT CAA CAC AAC CAC CCA CA^3’) and H10968 (^5'TGT GAG GGG TAG GAG TCA GG^3') for SNP 10873 (157bp).

Supporting Information on Results.

Eliminating contamination suspicions

We considered that contamination could not be excluded when sets of identical sequences were obtained during the same extraction, amplification and/or sequencing sessions. This procedure ended up with 80 HVS‐I and HVS‐II sequences (representing 69 individuals) for which the possibility of contamination could not be ruled out on the basis of these criteria. For 27 individuals, serum was still available, and we tested these samples again, starting with newly serum‐extracted DNA. In one sample the initial sequence was not reproduced and thus contamination suspicion could not be ruled out (obviously, this sample was eliminated from the definitive data set). The proportion of sequences for which contamination by another sample could not be ruled out was 3.70% (1/27), which leads to the conservative assumption that about three sequences (80 x 3.70%) out of the 362 sequences of the definitive data set could reflect contamination, i.e. ∼0.8% of the total data set.

Success scores of HVS sequencing and SNPs genotyping in the LORV samples

Obviously, sequencing of the two hypervariable mtDNA segments and genotyping of the 4 SNPs was not successful for all the samples. Detailed success scores were as follows:

‐ HVS‐I, HVS‐II and the 4 SNPs: 28 Nyangatom, 8 Daasanach and 2 Turkana;

‐ HVS‐I, HVS‐II and 3 SNPs (missing result for SNP 10810): 8 Nyangatom and 1 Daasanach;

‐ HVS‐I, HVS‐II and 3 SNPs (missing result for SNP 10400): 1 Nyangatom and 2 Daasanach;

‐ HVS‐I, HVS‐II and 2 SNPs (missing result for SNPs 10810 and 10400): 1 Daasanach;

‐ HVS‐I and the 4 SNPs: 8 Nyangatom;

‐ HVS‐I and 3 SNPs (missing result for SNP 10810): 3 Nyangatom;

‐ HVS‐I and 2 SNPs (missing result for SNPs 10810 and 10400): 1 Nyangatom and 1 Daasanach;

‐ HVS‐II and the 4 SNPs: 11 Nyangatom;

‐ HVS‐I and HVS‐II: 36 Nyangatom, 25 Daasanach and 4 Turkana;

(5)

‐ only HVS‐I: 27 Nyangatom, 11 Daasanach and 4 Turkana;

‐ only HVS‐II: 52 Nyangatom, 10 Daasanach and 1 Turkana;

‐ only the 4 SNPs: 7 Nyangatom, 1 Daasanach and 1 Turkana.

The final data set of sequences was thus constituted of 112 Nyangatom, 49 Daasanach and 10 Turkana HVS‐I sequences of about 400 bp, and 136 Nyangatom, 47 Daasanach and 7 Turkana HVS‐II sequences of about 460 bp.

We identified only two cases of maternal kinship in the dataset, i.e. between samples N_182 and N_192, both bearing the same HVS‐I and HVS‐II (L2(pre)b) sequences, and between samples N_229 and N_234, both bearing the same HVS‐I and HVS‐II (L5a2) sequences (Table S1). However, as already explained above, other cases cannot be excluded. Consequently, we decided to conserve these 4 individuals in all the analyses.

The sampling locations represented by this dataset were as follows:

‐ Nyangatom (HVS‐I/HVS‐II): Nawiapua (27/15), Shangura (23/23), Nawiatir (20/24),

Lokwamugnan (13/31), Weleso (12/16), Rukruk (10/20), Lopokori (5/5) and Kagnikagn (2/2);

‐ Daasanach (HVS‐I/HVS‐II): Handall (27/25), Nyamorlu (11/10), Kalam‐mission (4/4) and Rate (2/2), as well as 5/6 unspecified samples that were taken from Daasanach workers in the French IORE camp;

‐ Turkana (HVS‐I/HVS‐II): Kalokol (4/5), Kataboi (3/0), Lokitanyella (2/1) and Loarygnak (1/1).

Genetic diversity of HVS sequences and RFLP in the LORV samples

The HVS‐I sequence alignment used to compare the LORV samples spanned from np 15997 to 16399, excluding the stretch between 16182 and 16193. For HVS‐II, we compared the sequences from np 16529 to 413, excluding a few indel positions in poly‐nucleotide tracks (nps 308, 309, 315, 356).

Ambiguous nucleotides (Y and R) were parsimoniously corrected according to sequences with which they shared all other mutations relative to the revised Cambridge Reference Sequence (rCRS, Andrews et al., 1999), i.e. three ambiguities in one HVS‐I sequence each (Table S1), and seven ambiguities in one HVS‐II sequence each. Thus we considered that heteroplasmy was implausible in these cases, except for the Nyangatom sequence number 380 ( np 16223) for which no close

matching haplotype was found. In HVS‐II, two mutations were found completely fixed with respect to the rCRS: np 16532, deleted in all the LORV samples, and a A to G transition at np 263.

Similar levels of diversity were observed in the three LORV samples, both for HVS‐I and HVS‐II. Gene diversity, mean number of pairwise differences (MNPD) and nucleotide diversity (π) in HVS‐I were of 0.985 (±0.004), 8.98 (±4.17) and 0.023 (±0.0118) in the Nyangatom, of 0.979 (±,0.009), 9.44 (±4.41) and 0.0241 (±0.0125) in the Daasanach, and of 0.978 (±0.054), 9.31 (±4.68) and 0.0238 (±0.0135) in the Turkana. The corresponding indices in HVS‐II were of 0.975 (±0.005), 5.32 (±2.58) and 0.0118 (±0.0064) in the Nyangatom, of 0.978 (±,0.012), 5.61 (±2.74) and 0.0125 (±0.0068) in the Daasanach, and of 1.0 (±0.076), 6.43 (±3.47) and 0.0143 (±0.0088) in the Turkana. Similarly, for both segments, haplotype richness in the Nyangatom (R_(g) = 36.8, with g = 49 for HVS‐I, and R_(g) = 31.9, with g = 47 for

(6)

HVS‐II) was equivalent to the observed number of haplotypes in the Daasanach (i.e. 34 and 35 respectively). As expected, the molecular diversity in HVS‐II was lower than that of HVS‐I in the three LORV samples, although the usable sequence was slightly longer for HVS‐II than for HVS‐I (391).

Indeed, both the number of polymorphic sites and the transition/transversion ration averaged over populations were twice as high in HVS‐I (92 and 11.0, respectively) than in HVS‐II (44 and 5.6, respectively). Finally, HVS‐I and HVS‐II unimodal mismatch distributions were observed in the Nyangatom and the Daasanach.

For all three samples and both D‐loop segments, Tajima’s D was negative but not significant.

Conversely, Fu’s Fs was found negative and highly significant both in the Nyangatom and the Daasanach (respectively, ‐24.6 and ‐15.2 for HVS‐I, ‐25.3 and ‐25.3 for HVS‐II, all values with

P<0.0005). For the Turkana, this last statistic did not reach significance, but this can be due to its very small sample size.

Surprisingly, genetic differentiation between the Nyangatom and the Daasanach was significant for HVS‐I (Φ_ST = 4.1%, P < 0.0001), but not for HVS‐II (Φ_ST = 0.4%, P = 0.218). On another hand, the small sample size of the Turkana could again explain that they were not found significantly differentiated either from the Nyangatom or from the Daasanach for both segments.

Classification of the LORV HVS‐I sequences into haplogroups

The classification of the LORV sequences is shown in Table S1 and Figure S1. Thirty‐six sequences were classified as belonging to the L0 haplogroup, , making up 16%, 33% and 20% of the Nyangatom, Daasanach and Turkana sequence samples, respectively (Figure S1a). Among these, we observed nine L0f sequences, eight of which are Daasanach and share the 16129‐16169‐16187‐16189‐16223‐

16230‐16278‐16311‐16327‐16368 motif with the single Amhara L0f sequence of Kivisild et al. (2004).

Twenty‐seven LORV sequences were initially assigned to L0a, ten of which were classified into the L0a1 clade and twelve into L0a2. As in Kivisild et al. (2004) we also observed 5 LORV sequences that remained unclassified at the L0a level, four of which (3 Nyangatom and 1 Daasanach) also lacked the 16188 transversion generally used to pinpoint to L0a sequences. Interestingly, these latter 4

sequences, along with the single Tigrai sequence of Kivisild et al. (2004), seem to build up yet another L0 clade, actually a sister‐clade of L0a, which we propose to label L0g. No L0d or L0k sequences were observed in the LORV samples.

We assigned only two LORV sequences to the L1 haplogroup, one of these being a Daasanach L1b sequence, the other one a Turkana L1c. This latter sequence harbors the 16293 substitution that defines the L1c1 clade in Salas et al. (2002) and it could belong to the L1c1b sub‐clade according to the classification of Batini et al. (2007), on the basis of HVS‐II variation.

By contrast to the scarcity of L1 sequences in the LORV samples, haplogroup L5 was represented by twenty‐seven sequences harboring the characteristic state at position 16166. Ten of these formed a star‐like cluster of three Nyangatom, five Daasanach and two Turkana sequences in the L5a1 clade, together with three Ethiopian sequences (Afar, Oromo and Amhara) of Kivisild et al. (2004). A star‐

like cluster of mainly Nyangatom sequences (8 Nyangatom, 1 Daasanach, 2 Turkana, and 2 Amhara) was also observed in the L5a2 clade. Two further Nyangatom sequences were similar to the L5b

(7)

haplotypes described by Kivisild et al. (2004), except that they lack the transitions at 16093 and 16295. Finally, a group of four identical Daasanach sequences remained unclassified at the L5 level.

A total of twenty‐one LORV sequences were assigned to the L2 haplogroup (Figure S1b), fourteen of which to the L2a clade (9 Nyangatom, 3 Daasanach, 2 Turkana, Figure S1c). We found also a single Nyangatom L2b sequence, whereas three other Nyangatom sequences remained unclassified at the L2 level. Because these three sequences shared the L2b state at positions 16114 and 16213, we labeled them L3(pre)b, following the classification scheme of Kivisild et al. (2004).

Three LORV sequences were assigned to haplogroup L6 because these harbored the characteristic state at position 16224, but were found to differ at least at two positions (16048 and 16311) from the Ethiopian and Yemeni L6 sequences of Kivisild et al. (2004).

Twenty‐one LORV sequences were classified as belonging to the L4 haplogroup, among which a Nyangatom sequence presenting the L4 HVS‐I defining motif (16223‐16311‐16362). All other L4 sequences fell into the L4g clade (i.e. 11 Nyangatom and 9 Daasanach), forming a star‐like cluster.

Here again a single Nyangatom sequence presented the L4g HVS‐I defining motif (16223‐16293T‐

16311‐16355‐16362‐16399), but its HVS‐II status could not be verified. We observed that all LORV L4g sequences shared a G at position 244 in HVS‐II (except for a Nyangatom sequence that seems to have reverted at this position). However, contrarily to Kivisild et al. (2004), we found that the transition at position 146 relative to the rCRS was found on multiple haplogroup backgrounds, such as on L6 sequences, most L2 sequences, as well as several L0 and L3 sequences. Thus, 146C was not considered as characteristic of L4g. A group of seven LORV sequences (4 Nyangatom and 3

Daasanach) formed a star‐like sub‐cluster distinguished from the L4g defining motif by a transition at position 16172 that was also shared with haplotype 132 of Kivisild et al. (2004). Consequently, this haplotype was also included in the L4g sub‐cluster, thus tentatively resolving the reticulation observed by Kivisild et al. (2004) between haplotypes 132, 133 and 134 of their study.

Fifty‐three LORV sequences were assigned to the L3 haplogroup (excluding M and N), representing respectively 38% and 18% of the Nyangatom and Daasanach sequences (Figure S1d). Despite the lack of coding‐region information, LORV sequences were found apparently in all of the L3 primary

branches observed by Kivisild et al. (2004), except for L3w. The L3i branch was the most represented in the LORV sample, with respectively fifteen Nyangatom, three Daasanach and one Turkana

sequences, along with a single Amhara sequence (Kivisild et al., 2004), constituting yet another sub‐

clade that we tentatively labeled L3i1 (defining HVS‐I motif : 16153‐16174‐16223‐16319). We note that the classification of the LORV sequences did not seem to confirm the association, put forward in Kivisild et al. (2004), of the 150‐152 motif in HVS‐II with the L3w haplogroup, as this motif was also observed in some sequences of other L3 branches (L3h, L3i, and L3x) as well as on some L2

sequences.

Finally, two Daasanach and one Nyangatom sequences were classified as M, and a further five Nyangatom as N, although we failed to achieve a molecular resolution allowing a more precise assignation of these sequences to specific haplogroups.

We estimated the TMRCA of particular clades in the mtDNA phylogeny of the LORV sequences, using the same approach as Kivisild et al. (2004), i.e. applying the method of Forster et al. (1996) and Saillard et al. (2000) to those sub‐haplogroups that included at least five sequences. Additionally, we

(8)

computed the 95% confidence interval of each point‐estimation by bootstrapping the sequences in the clade (10,000 replicates). These estimations are reported in Table S3.

References in Supporting Text, Tables and Figures

Andrews, R.M., Kubacka, I., Chinnery, P.F., Lightowlers, R.N., Turnbull, D.M. & Howell, N. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.

Nat Genet, 23, 147.

Bandelt, H.J., Lahermo, P., Richards, M. & Macaulay, V. (2001) Detecting errors in mtDNA data by phylogenetic analysis. Int J Legal Med, 115, 64‐9.

Batini, C., Coia, V., Battaggia, C., Rocha, J., Pilkington, M.M., Spedini, G., Comas, D., Destro‐Bisol, G. &

Calafell, F. (2007) Phylogeography of the human mitochondrial L1c haplogroup: genetic signatures of the prehistory of Central Africa. Mol Phylogenet Evol, 43, 635‐44.

Beleza, S., Gusmao, L., Amorim, A., Carracedo, A. & Salas, A. (2005) The genetic legacy of western Bantu migrations. Hum Genet, 117, 366‐75.

Bertranpetit, J., Sala, J., Calafell, F., Underhill, P.A., Moral, P. & Comas, D. (1995) Human

mitochondrial DNA variation and the origin of Basques. Ann Hum Genet, 59 ( Pt 1), 63‐81.

Bosch, E., Calafell, F., Gonzalez‐Neira, A., Flaiz, C., Mateu, E., Scheil, H.G., Huckenbeck, W.,

Efremovska, L., Mikerezi, I., Xirotiris, N., Grasa, C., Schmidt, H. & Comas, D. (2006) Paternal and maternal lineages in the Balkans show a homogeneous landscape over linguistic barriers, except for the isolated Aromuns. Ann Hum Genet, 70, 459‐87.

Brakez, Z., Bosch, E., Izaabel, H., Akhayat, O., Comas, D., Bertranpetit, J. & Calafell, F. (2001) Human mitochondrial DNA sequence variation in the Moroccan population of the Souss area. Ann Hum Biol, 28, 295‐307.

Calafell, F., Underhill, P., Tolun, A., Angelicheva, D. & Kalaydjieva, L. (1996) From Asia to Europe:

mitochondrial DNA sequence variability in Bulgarians and Turks. Ann Hum Genet, 60 ( Pt 1), 35‐49.

Cali, F., Le Roux, M.G., D'anna, R., Flugy, A., De Leo, G., Chiavetta, V., Ayala, G.F. & Romano, V. (2001) MtDNA control region and RFLP data for Sicily and France. Int J Legal Med, 114, 229‐31.

Cerny, V., Hajek, M., Cmejla, R., Bruzek, J. & Brdicka, R. (2004) mtDNA sequences of Chadic‐speaking populations from northern Cameroon suggest their affinities with eastern Africa. Ann Hum Biol, 31, 554‐69.

Chen, Y.S., Olckers, A., Schurr, T.G., Kogelnik, A.M., Huoponen, K. & Wallace, D.C. (2000) mtDNA variation in the South African Kung and Khwe‐and their genetic relationships to other African populations. Am J Hum Genet, 66, 1362‐83.

Cherni, L., Loueslati, B.Y., Pereira, L., Ennafaa, H., Amorim, A. & El Gaaied, A.B. (2005) Female gene pools of Berber and Arab neighboring communities in central Tunisia: microstructure of mtDNA variation in North Africa. Hum Biol, 77, 61‐70.

Coia, V., Destro‐Bisol, G., Verginelli, F., Battaggia, C., Boschi, I., Cruciani, F., Spedini, G., Comas, D. &

Calafell, F. (2005) Brief communication: mtDNA variation in North Cameroon: lack of Asian lineages and implications for back migration from Asia to sub‐Saharan Africa. Am J Phys Anthropol, 128, 678‐81.

(9)

Comas, D., Calafell, F., Bendukidze, N., Fananas, L. & Bertranpetit, J. (2000) Georgian and kurd mtDNA sequence analysis shows a lack of correlation between languages and female genetic lineages. Am J Phys Anthropol, 112, 5‐16.

Comas, D., Calafell, F., Mateu, E., Perez‐Lezaun, A. & Bertranpetit, J. (1996) Geographic variation in human mitochondrial DNA control region sequence: the population history of Turkey and its relationship to the European populations. Mol Biol Evol, 13, 1067‐77.

Comas, D., Plaza, S., Wells, R.S., Yuldaseva, N., Lao, O., Calafell, F. & Bertranpetit, J. (2004) Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur J Hum Genet, 12, 495‐504.

Corte‐Real, H.B., Macaulay, V.A., Richards, M.B., Hariti, G., Issad, M.S., Cambon‐Thomsen, A., Papiha, S., Bertranpetit, J. & Sykes, B.C. (1996) Genetic diversity in the Iberian Peninsula determined from mitochondrial sequence analysis. Ann Hum Genet, 60 ( Pt 4), 331‐50.

Destro‐Bisol, G., Coia, V., Boschi, I., Verginelli, F., Caglia, A., Pascali, V., Spedini, G. & Calafell, F. (2004) The analysis of variation of mtDNA hypervariable region 1 suggests that Eastern and Western Pygmies diverged before the Bantu expansion. Am Nat, 163, 212‐26.

Di Rienzo, A. & Wilson, A.C. (1991) Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc Natl Acad Sci U S A, 88, 1597‐601.

Fadhlaoui‐Zid, K., Plaza, S., Calafell, F., Ben Amor, M., Comas, D. & Bennamar El Gaaied, A. (2004) Mitochondrial DNA heterogeneity in Tunisian Berbers. Ann Hum Genet, 68, 222‐33.

Falchi, A., Giovannoni, L., Calo, C.M., Piras, I.S., Moral, P., Paoli, G., Vona, G. & Varesi, L. (2005) Genetic history of some western Mediterranean human isolates through mtDNA HVR1 polymorphisms. J Hum Genet.

Forster P, Harding R, Torroni A, Bandelt HJ. (1996) Origin and evolution of Native American mtDNA variation: a reappraisal. Am J Hum Genet. 59:935‐45.

Francalacci, P., Bertranpetit, J., Calafell, F. & Underhill, P.A. (1996) Sequence diversity of the control region of mitochondrial DNA in Tuscany and its implications for the peopling of Europe. Am J Phys Anthropol, 100, 443‐60.

Gonder, M.K., Mortensen, H.M., Reed, F.A., De Sousa, A. & Tishkoff, S.A. (2007) Whole‐mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol, 24, 757‐68.

Graven, L., Passarino, G., Semino, O., Boursot, P., Santachiara‐Benerecetti, S., Langaney, A. &

Excoffier, L. (1995) Evolutionary correlation between control region sequence and restriction polymorphisms in the mitochondrial genome of a large Senegalese Mandenka sample. Mol Biol Evol, 12, 334‐45.

Jackson, B.A., Wilson, J.L., Kirbah, S., Sidney, S.S., Rosenberger, J., Bassie, L., Alie, J.A., Mclean, D.C., Garvey, W.T. & Ely, B. (2005) Mitochondrial DNA genetic diversity among four ethnic groups in Sierra Leone. Am J Phys Anthropol, 128, 156‐63.

Kivisild, T., Reidla, M., Metspalu, E., Rosa, A., Brehm, A., Pennarun, E., Parik, J., Geberhiwot, T., Usanga, E. & Villems, R. (2004) Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet, 75, 752‐70.

Knight, A., Underhill, P.A., Mortensen, H.M., Zhivotovsky, L.A., Lin, A.A., Henn, B.M., Louis, D., Ruhlen, M. & Mountain, J.L. (2003) African Y Chromosome and mtDNA Divergence Provides Insight into the History of Click Languages. Curr Biol, 13, 464‐73.

Krings, M., Salem, A.E., Bauer, K., Geisert, H., Malek, A.K., Chaix, L., Simon, C., Welsby, D., Di Rienzo, A., Utermann, G., Sajantila, A., Paabo, S. & Stoneking, M. (1999) mtDNA analysis of Nile River

(10)

Valley populations: A genetic corridor or a barrier to migration? Am J Hum Genet, 64, 1166‐

76.

Macaulay, V., Richards, M., Hickey, E., Vega, E., Cruciani, F., Guida, V., Scozzari, R., Bonne‐Tamir, B., Sykes, B. & Torroni, A. (1999) The emerging tree of West Eurasian mtDNAs: a synthesis of control‐region sequences and RFLPs. Am J Hum Genet, 64, 232‐49.

Mateu, E., Comas, D., Calafell, F., Perez‐Lezaun, A., Abade, A. & Bertranpetit, J. (1997) A tale of two islands: population history and mitochondrial DNA sequence variation of Bioko and Sao Tome, Gulf of Guinea. Ann Hum Genet, 61 ( Pt 6), 507‐18.

Pereira, L., Macaulay, V., Torroni, A., Scozzari, R., Prata, M.J. & Amorim, A. (2001) Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet, 65, 439‐58.

Plaza, S., Calafell, F., Helal, A., Bouzerna, N., Lefranc, G., Bertranpetit, J. & Comas, D. (2003) Joining the pillars of Hercules: mtDNA sequences show multidirectional gene flow in the western Mediterranean. Ann Hum Genet, 67, 312‐28.

Plaza, S., Salas, A., Calafell, F., Corte‐Real, F., Bertranpetit, J., Carracedo, A. & Comas, D. (2004) Insights into the western Bantu dispersal: mtDNA lineage analysis in Angola. Hum Genet, 115, 439‐47.

Rando, J.C., Pinto, F., Gonzalez, A.M., Hernandez, M., Larruga, J.M., Cabrera, V.M. & Bandelt, H.J.

(1998) Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, near‐eastern, and sub‐Saharan populations. Ann Hum Genet, 62 (Pt 6), 531‐50.

Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C., Sellitto, D., Cruciani, F., Kivisild, T., Villems, R., Thomas, M., Rychkov, S., Rychkov, O., Rychkov, Y., Golge, M.,

Dimitrov, D., Hill, E., Bradley, D., Romano, V., Cali, F., Vona, G., Demaine, A., Papiha, S., Triantaphyllidis, C., Stefanescu, G., Hatina, J., Belledi, M., Di Rienzo, A., Novelletto, A., Oppenheim, A., Norby, S., Al‐Zaheri, N., Santachiara‐Benerecetti, S., Scozari, R., Torroni, A. &

Bandelt, H.J. (2000) Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet, 67, 1251‐76.

Rodhain, F., Ardoin, P., Metselaar, D., Salmon, A.M. & Hannoun, C. (1975) An epidemiologic and serologic study of arboviruses in Lake Rudolf basin. Trop Geogr Med, 27, 307‐12.

Rodhain, F., Hannoun, C. & Metselaar, D. (1972) [Epidemiological and serological study of the

arboviruses in the lower valley of the Omo (southern Ethiopia)]. Bull World Health Organ, 47, 295‐304.

Röhl, A., Brinkmann, B., Forster, L. & Forster, P. (2001) An annotated mtDNA database. Int J Legal Med, 115, 29‐39.

Rosa, A., Brehm, A., Kivisild, T., Metspalu, E. & Villems, R. (2004) MtDNA profile of West Africa Guineans: towards a better understanding of the Senegambia region. Ann Hum Genet, 68, 340‐52.

Saillard J, Forster P, Lynnerup N, Bandelt HJ, Nørby S. (2000) mtDNA variation among Greenland Eskimos: the edge of the Beringian expansion. Am J Hum Genet. 67:718‐26.

Salas, A., Comas, D., Lareu, M.V., Bertranpetit, J. & Carracedo, A. (1998) mtDNA analysis of the Galician population: a genetic edge of European variation. Eur J Hum Genet, 6, 365‐75.

Salas, A., Richards, M., De La Fe, T., Lareu, M.V., Sobrino, B., Sanchez‐Diz, P., Macaulay, V. &

Carracedo, A. (2002) The making of the African mtDNA landscape. Am J Hum Genet, 71, 1082‐111.

(11)

Salas, A., Richards, M., Lareu, M.V., Scozzari, R., Coppa, A., Torroni, A., Macaulay, V. & Carracedo, A.

(2004) The African diaspora: mitochondrial DNA and the Atlantic slave trade. Am J Hum Genet, 74, 454‐65.

Sanchez‐Mazas, A. & Poloni, E.S. (2008) Genetic diversity in Africa. In: Encyclopedia of Life Sciences) Encyclopedia of Life Sciences. Chichester: John Wiley & Sons.

Tagliabracci, A., Turchi, C., Buscemi, L. & Sassaroli, C. (2001) Polymorphism of the mitochondrial DNA control region in Italians. Int J Legal Med, 114, 224‐8.

Thomas, M.G., Weale, M.E., Jones, A.L., Richards, M., Smith, A., Redhead, N., Torroni, A., Scozzari, R., Gratrix, F., Tarekegn, A., Wilson, J.F., Capelli, C., Bradman, N. & Goldstein, D.B. (2002) Founding mothers of Jewish communities: geographically separated Jewish groups were independently founded by very few female ancestors. Am J Hum Genet, 70, 1411‐20.

Tishkoff, S.A., Gonder, M.K., Henn, B.M., Mortensen, H., Knight, A., Gignoux, C., Fernandopulle, N., Lema, G., Nyambo, T.B., Ramakrishnan, U., Reed, F.A. & Mountain, J.L. (2007) History of click‐

speaking populations of Africa inferred from mtDNA and Y chromosome genetic variation.

Mol Biol Evol, 24, 2180‐95.

Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K. & Wilson, A.C. (1991) African populations and the evolution of human mitochondrial DNA. Science, 253, 1503‐7.

Watson, E., Bauer, K., Aman, R., Weiss, G., Von Haeseler, A. & Paabo, S. (1996) mtDNA sequence diversity in Africa. Am J Hum Genet, 59, 437‐44.

Watson, E., Forster, P., Richards, M. & Bandelt, H.J. (1997) Mitochondrial footprints of human expansions in Africa. Am J Hum Genet, 61, 691‐704.

(12)

Figure S1

Median joining networks of mitochondrial DNA haplotypes, modified from Kivisild et al. (2004), so as to include the Nyangatom, Daasanach and Turkana sequences from the LORV sample.

Notes :

Nodes in the network represent haplotypes, and are drawn proportional to their frequency (see caption). Nyangatom, Daasanach and Turkana sequences are shown in green, blue and red, respectively, whereas the Ethiopian and Yemeni haplotypes of Kivisild et al. (2004) are shown in pink and yellow (see caption). LORV sequences IDs are reported in italics next to nodes. Mutated positions relative to the rCRS (i.e all transitions, unless specified) are shown along the branches connecting the nodes in the network; mutations in HVS‐I are shown in red (position minus 16000), those in HVS‐II in blue, and those in the coding‐region in dark‐brown. Branch lengths were only schematically drawn as proportional to the number of mutational steps, in order to respect the phylogenetic relationships observed by Kivisild et al. (2004).

a : Network of L0, L1 and L5 haplotypes. In the L0a1 clade, Daasanach number 213 and Nyangatom number 413 were reported at midway along the branch connecting haplotypes 36 and 37 of Kivisild et al. (2004), because these haplotypes are distinguished by a single transition at position 3866, not tested in our study. In the L0g clade, Nyangatom sequence 415 is distinguished from Nyangatom sequence 590 by two transitions at positions 16003 and 16004. However, these could be artifacts, since they are located near the beginning of the sequenced stretch. Turkana sequence K06, classified as L1c, was found very similar to a Tanzanian Turu L1c sequence (Gonder et al. 2007, GenBank EF184612), with which it shared the 16017 and 16163 states, while differing from it at positions 16158, 16209, and 151. In the L5a2 clade, Nyangatom sequence 381 is distinguished from sequences 244 (Nyangatom), 732 (Daasanach) and K65 (Turkana) by a transversion at 16399, but this could be a sequencing artifact as it is the last position sequenced. b : Network of L2, L6 and L4 haplotypes. The L2’L6 part of the network was built without considering position 16399, since this position was not scored for most haplotypes in Kivisild et al. (2004). Nevertheless, it is shown in brackets on the branch leading to L2d, as in Salas et al. (2004), since a G was observed at this position in all three Nyangatom L2d2 sequences (IDs 186, 223, 643). c : Network of L2a haplotypes. d : network of L3 haplotypes.

(13)

a

L1b1a

L0k

L0f root L0

L0d

L0g

546 623

590 415

692

277

734

754 735

L1

L1b

634

L1c L1c1 ^K06

L5

L5b

414 425 769, 773,

776, 782

L5a

L5a2 ₆₂₁

724 728 200 745 759

K41 K44

167 756

L5a1

L6 L4 L3 L2

L0a2 L0a

211

228 236 736

240 293 321 217 K34 429

201 361

K106

L0a1

664

428 684

280

726

245 252 413

213 733 721, 762,

767, 781

K21 229 234 262 412 421

381

416 732 K65

244 399C

233 231 360

317

111 254 319 311 176 291

355 360 362

192 93 187

223

172 209 213

265C 93 295

138

3594

129 187 189

166 148 189

254 311 368

10810

85 260 394

172 293

114

289 293

17 158 163 293

126 129 264 270

294 360 230

172 214 291A

148

51 164 188G

278

166C 223 287

169 327

173 368

239

223 230 274 352 52 172 290 325 354

327 316

63 129

3 4

170C 249 173

209 223

187 188 289

129 320 266

184 93

184 241

368 93 188 214 234 311 320 260

168

114A 162 278

209 93

293 320

263

093 150

287 129 294 287 293

Nyangatom Daasanach Turkana

Ethiopian (Kivisild et al. 2004) Yemeni (Kivisild et al. 2004)

1 2 3 4

(14)

b

L3

L4 L4g

132

379

783

L4a

L4a1

727 203 742 743 744

391

164 139 142 691

341 399

147 177 751 746

763

root

L5 L1 L0

L6

400

418

230 12

sequences

L2d1 L2 L2d

182 192

L2b 195 344

L2a

408

L2d2

186 223 643

294 75 93

274 189 344

136 192

111 230 86

169 284

293T 355 399

244 153 179 189 239 320 221 261

320

301

93 172 189 168 287

4 180

192 289 301

189

362

260

289 207T 93

207 207

261 318

163 356 264 320

220 244 93 189 207 265C

362 153 189

309 189 264

224

309 311

48 48

362

93 188G

173

295 390 278

3594

10810

129 187 (399) 189

129 189 223 300 354

311 111A 145 239 292 355 294

354 153 311 362

145

129 298

154 179 189

311 114A 390

213

93 184 234 311

1 2 3 4

133 244

(15)

c

543

L2a1

408

L2a1c

209 301 354 K110

214 226

L2a1a

322 365 667 7 sequences

L2a

L2d L2 L2b

5 sequen-

ces 665

242 123

738 K10 258

335 51

241C

294 189

93 139

309 37

129 286

150 124 362

292 188 256

213

92

311 192

192 309

354 172 222

309 287

316 172

172 311

229

291 229291 192 187

192 309

173 189

1 2 3 4

(16)

d

L3w

L3 L3d

L3bd L3d1

L3i L3d2

L3e

L3f L3f1

L3x1 L3x L3x2

L3e1 L3e3

L3b

L4

L2 L6 L3h

778

273 711

760

417 225

100, 126, 135, 194

723

149 163 423

168 402

176

L3e1a

160 243 404 739

122, 166 189

239

255 128

171

224, 256, 270, 272, 281, 343, 592, 620, 661

777 752 753

190, 191, 237 248, 426

398 5 sequen-

ces

K77 9 sequences 205

320 287

383 sequen-⁶

ces

L0 L1 L5

L3i1

141 276

165

179 256A

255

284

289 192

175 148

354 354

179 256A 399 124 153

169 311

209

260

311 266

292

111A 129

286 311

126 176 93

93

129 150 234

93 185 189 235 186 327 309

298 256

320

305

10398

66 136 368

171 189A 292 293 192

311 311

126

223

86 147 193 195

235 243

311

223 278

327 311

265T 327 185

129 391

335T 294

185 220

319 75 126

174

239 93

172 138T 66 230 86

278 93

362 316 360 327

254 385

319 111 256

93 368

1 2 3 4

(17)

Figure S2

Map with approximate geographic location of the 136 population samples included in the HVS‐I database.

Notes :

The samples are represented by dots color‐coded according to linguistic affiliation: Afro‐Asiatic (green), Nilo‐Saharan (red), Niger‐Congo (yellow), Khoisan and Khoisan‐related languages of Tanzania (dark blue), Indo‐European (light blue), Altaic (pink), and Basque (brown). We defined seven geographic subdivisions, represented by shaded areas on the map, that broadly correspond to those of Salas et al. (2002) and Sanchez‐Mazas & Poloni (2008), i.e. East Africa (EA), Central and Southwest Africa (CSWA), West Africa (WA), Southeast and South Africa (SESA), North Africa (NA), Middle East and West Asia (MEWA) and South Europe (SE).

(18)

TuruSandawe Datog Burunge Hadza Sukuma Hadzabe Nalu

NA WA

CSWA

SESA

EA

MEWA

SE

(19)

Figure S3

Three successive Principal Coordinates Analyses (PCA) of pairwise Reynolds’ genetic distances among populations computed from HVS‐I sequence‐based Φ_ST indices.

Notes :

The populations are color‐coded according to geographic location, as shown in the captions. The proportion of the total genetic variance explained by each of the two first coordinates is shown in brackets. a : The PCA of genetic distances among the 136 populations of the HVS‐I database highlights the genetic divergence of the Mbenzele and Ju|'hoansi from all other populations; a correlation coefficient r = 0.711 (95% C.I. = [0.617 ; 0.785]) was found between the latitudinal location of the populations and the PCA’s first axis coordinates. b : The PCA of genetic distances among 134 populations (i.e. excluding the Mbenzele and Ju|'hoansi) highlights the genetic divergence of the Herero; a correlation coefficient r = 0.877 (95% C.I. = [0.832 ; 0.911]) was found between the latitudinal location of the populations and the PCA’s first axis coordinates. c : The PCA of genetic distances among 133 populations (i.e. excluding the Mbenzele, Ju|'hoansi and Herero) resulted in a correlation coefficient r = 0.882 (95% C.I. = [0.838 ; 0.914]) between the latitudinal location of the populations and the PCA’s first axis coordinates.

(20)

0.5 0.5

0.3 0.4

a b c

0.0

14%)

0.0

e 1 (69.82%)

0.1 0.2

.36%)

Ju|'hoansi

‐0.5

Coordinate 1 (58.

Herero Coordinat ‐0.5

Datoga

Sandawe Nyangatom

Daasanach

‐0.2

‐0.1 0.0

Coordinate 1 (74

Mbenzele

‐1.0

‐0.5 0.0 0.5 1.0

Coordinate 2 (13.15%)

North Afri a East Afri a

Hadza Hadzabe Lomwe

Sekele

‐0.4

‐0.3

‐1.5

‐1.0 ‐0.5 0.0 0.5

North Africa East Africa

Central and South‐West Africa West Africa

Central and South‐West Africa West Africa South‐East and South Africa South Europe Middle East and West Asia

‐0.5

‐0.2 ‐0.1 0.0 0.1 0.2 0.3 0.4 0.5

C l d S h f i f i

Central and South‐West Africa West Africa South‐East and South Africa South Europe Middle East and West Asia