HAL Id: hal-03241670
https://hal.archives-ouvertes.fr/hal-03241670
Preprint submitted on 28 May 2021
A draft genome sequence of the common, or spectacled
caiman Caiman crocodilus
Kenichi Okamoto, Nichole Dopkins, Elias Kinfu
Kenichi W. Okamoto, Nichole S. Dopkins, Elias S. Kinfu
Department of Biology
Abstract
The common, or spectacled, caiman Caiman crocodilus is an abundant, widely distributed Neotropical crocodilian exhibiting notable morphological and molecular diversification. The species also accounts for by far the largest share of crocodilian hides on the global market, with the C. crocodilus hide trade alone valued at about US$86.5 million per year. We obtained 239,911,946 paired-end reads comprising approximately 72 Gb using Illumina sequencing of tissue sampled from a single Caiman crocodilus individual. These reads were assembled de novo and progressively aligned against the genomes of increasingly closely related crocodilians; Liftoff was used to annotate the draft C. crocodilus genome assembly based on an Alligator mississippiensis (a confamilial species) annotation. The draft assembly and annotation are available at doi.org/10.5281/zenodo.4755063.
Keywords: Caiman crocodilus, spectacled caiman, genome, assembly, next-generation sequencing, crocodilian, vertebrate genome
Introduction
The common, or spectacled, caiman Caiman crocodilus is one of the most widely distributed and abundant crocodilian species, ranging continuously from Mexico to Argentina (Busack and Pandya 2001; US Fish and Wildlife Service 2018). A generalist predator, C. crocodilus is remarkably adaptable, occupying a wide range of habitats from urban areas to seasonal savannahs to tropical rainforests (Medem 1981, 1983), and has recently been introduced to Cuba, Puerto Rico and Florida, where it is considered an invasive species (US Fish and Wildlife Service 2018). The broad distribution and diversity of habitats have facilitated considerable intraspecific diversification within C. crocodilus; a recent single-locus molecular analysis by Roberto et al. (2020) identified between seven and ten lineages within C. crocodilus across differing biogeographic regions and watersheds throughout Central and South America. Within-species diversity is also morphologically apparent, with skull shape in particular exhibiting systematic patterns of regional differentiation (Medem 1955; Gans 1980; Medem 1981, 1983; Ayarzagüena 1984; Escobedo-Galván et al. 2015). These intraspecific patterns of cranial shape variation within C. crocodilus have been shown to parallel patterns of interspecific cranial diversity found in extant crocodilians (Okamoto et al. 2015).
Additionally, C. crocodilus is a species of commercial importance, chiefly in the leather industry. While the hides of C. crocodilus contain osteoderms that render the manufacturing process more difficult than for other crocodilians, a majority of the approximately 1.5 million crocodilian skins traded globally come from C. crocodilus (Brazaitis et al. 1998; Caldwell 2015). As with other crocodilians, most legal hides come from commercial farming operations, and the market for caiman hides is estimated to be over US$85 million (Caldwell 2015). Wild populations of C. crocodilus are also hunted for meat and even fishing bait (Da Silveira and Thorbjarnarson 1999; Brum et al. 2015; Pimenta et al. 2018) and provide ecosystem services including nutrient cycling and biological control (Valencia-Aguilar et al. 2013; Marley et al. 2019). Due to its role as an apex predator, C. crocodilus exhibits considerable bioaccumulation, with genotoxic analyses demonstrating molecular signatures of pollution on the C. crocodilus genome (Oliveira et al. 2021).
Thus, a draft genome sequence for C. crocodilus can not only help provide insight into evolutionary processes driving intraspecific diversification, but can also assist with improved husbandry, ecotoxicology and wildlife management.
Methods
DNA was extracted from a tissue sample belonging to a single Caiman crocodilus museum specimen (UF-FLMNH 171438) using the DNeasy kit from Qiagen (Hilden, Germany). DNA was quantitated using Thermo Fisher's (Waltham, MA, USA) PicoGreen kit (for a final PicoGreen concentration of 77.78 ng/µL). Tecan's (Männedorf, Switzerland) NuGEN Celero kit was then used to construct a paired-end library, which was subsequently sequenced on a single Illumina (San Diego, CA) NovaSeq S4 lane. This yielded 239,911,946 paired-end reads of 2 × 150 bp each. Nucleic acid isolation, quantitation, library generation and raw-read sequencing were performed at the University of Minnesota Genomics Center.
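As a rough consistency check, the read count and read length reported above can be converted into total sequenced bases and an approximate depth of coverage. The sketch below assumes that the 239,911,946 figure counts read pairs and uses the length of the final draft assembly as a stand-in for genome size; both are assumptions made for illustration, not statements taken from the sequencing report.

```python
# Back-of-the-envelope sequencing yield and depth.
# Assumptions (for illustration only): the reported count refers to read pairs,
# and the draft assembly length approximates the genome size.

READ_PAIRS = 239_911_946
READ_LENGTH_BP = 150
READS_PER_PAIR = 2
ASSEMBLY_LENGTH_BP = 2_341_057_913


def total_bases(read_pairs: int, read_length: int, reads_per_pair: int = 2) -> int:
    """Total number of sequenced bases across all reads."""
    return read_pairs * reads_per_pair * read_length


def mean_depth(bases: int, genome_size: int) -> float:
    """Average number of times each genome position is covered."""
    return bases / genome_size


if __name__ == "__main__":
    bases = total_bases(READ_PAIRS, READ_LENGTH_BP, READS_PER_PAIR)
    print(f"Total bases: {bases / 1e9:.1f} Gb")
    print(f"Approximate depth: {mean_depth(bases, ASSEMBLY_LENGTH_BP):.1f}x")
```

Under these assumptions the yield is consistent with the roughly 72 Gb quoted in the abstract.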
The reads were assembled de novo using the Iterative de Bruijn Graph Assembler (IDBA-UD; Peng et al. 2012). To assess the reliability of our pipeline from sequencing to de novo assembly using IDBA-UD, we repeated the sequencing and assembly using a museum-derived tissue sample from a single Alligator mississippiensis individual (UF-FLMNH 175565). This resulted in 249,325,204 paired-end reads of 2 × 150 bp each. As was the case for the C. crocodilus individual, the reads were then assembled de novo using IDBA-UD, and we used QUAST (Gurevich et al. 2013) to determine that the IDBA-UD assembly of A. mississippiensis captured approximately 94.2% of a recently published A. mississippiensis assembly (GCA_000281125.4; Rice et al. 2017), with an N50 of 21,172 bp based on de novo assembled contigs alone.
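For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together contain at least half of all assembled bases. It is an illustration only; tools such as QUAST may differ in details such as minimum contig length thresholds.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches half of the total assembly length; the
    length of the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy example: total length 32, so the threshold is 16.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the statistic is weighted by sequence length, a handful of long scaffolds can yield a large N50 even when an assembly also contains many short contigs.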
We scaffolded the resulting draft C. crocodilus contigs using a two-step procedure. First, we scaffolded the caiman's contigs against a Crocodylus porosus assembly (GCF_001723895.1; Ghosh et al. 2020) using RagTag (Alonge et al. 2019). We then re-scaffolded the resulting contigs/scaffolds against the confamilial Alligator mississippiensis assembly (GCA_000281125.4), again using RagTag. The draft assembly was then submitted to the National Center for Biotechnology Information (NCBI). Contaminants, mitochondrial DNA, vectors, adapters, and sequences shorter than 200 bp identified by NCBI were manually removed using SeqKit (Shen et al. 2016) and custom scripts (available at http://github.com/kewok/ncbi_scrubber).
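The clean-up step described above amounts to dropping records that are too short or that were flagged during submission screening. The sketch below illustrates that idea in plain Python; it is not the authors' ncbi_scrubber code and does not reproduce SeqKit. The function names, the `flagged_ids` argument and the toy records are hypothetical.

```python
from typing import AbstractSet, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scrub(
    records: Iterable[Record],
    min_length: int = 200,
    flagged_ids: AbstractSet[str] = frozenset(),
) -> Iterator[Record]:
    """Keep records at least min_length long whose ID is not flagged."""
    for header, seq in records:
        seq_id = header.split()[0] if header.split() else ""
        if len(seq) < min_length or seq_id in flagged_ids:
            continue
        yield header, seq


if __name__ == "__main__":
    toy = [">scaf1 kept", "ACGT" * 60, ">scaf2", "ACGT", ">scaf3", "A" * 300]
    # scaf2 is too short; scaf3 is treated as flagged (hypothetical).
    for header, seq in scrub(read_fasta(toy), flagged_ids={"scaf3"}):
        print(header, len(seq))
```

Streaming the records with generators keeps memory use low, which matters for an assembly with several hundred thousand sequences.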
The resulting scaffolded assembly (doi.org/10.5281/zenodo.4755063) was then masked using RepeatMasker (Smit et al. 2015) relying on the HMMER database (Finn et al. 2011) and with “alligator” specified as species. Finally, Liftoff (Shumate and Salzberg 2020) was used to generate a draft annotation based on the masked assembly using the annotations associated with A. mississippiensis (GCA_000281125.4; Rice et al. 2017) as a reference. table2asn_GFF (National Center for Biotechnology Information 2020) was then used to generate a Sequin file (National Center for Biotechnology Information (US) 2014), and features flagged as errors were manually removed using custom scripts (available at https://github.com/kewok/ncbi_scrubber); the draft annotation is available at doi.org/10.5281/zenodo.4755063.
Results
Our pipeline yielded a draft assembly of length 2,341,057,913 bp comprising 465,471 scaffolds and contigs, with an N50 of 70,464,410 bp (calculated with Proch::N50; Telatin 2018). A total of 297,374 gene features were predicted.
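Feature totals such as the one reported above can be tallied directly from a GFF3 annotation file. The sketch below counts entries by feature type (the third GFF3 column). It assumes the annotation is in standard nine-column GFF3; the file name in the usage note is hypothetical, and the text does not specify which feature types are included in the reported total.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally GFF3 feature types, skipping comments and malformed lines."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "scaf1\tLiftoff\tgene\t1\t100\t.\t+\t.\tID=g1\n",
        "scaf1\tLiftoff\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
        "scaf1\tLiftoff\texon\t1\t50\t.\t+\t.\tParent=m1\n",
    ]
    for feature_type, n in sorted(count_feature_types(example).items()):
        print(feature_type, n)
```

For a real annotation, the same function can be applied to an open file handle, for example `count_feature_types(open("annotation.gff3"))`.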
Conclusion
Here we have described the first draft assembly and annotation of the C. crocodilus genome. We expect this resource to assist natural resource management, agriculture and research into broader questions about the interplay between microevolutionary and macroevolutionary processes across broad biogeographic scales.
Acknowledgments
We are especially indebted to Dr. P. S. Soltis, T. A. Lott and the Herpetology Collection at the University of Florida - Florida Natural History Museum (UF-FLNHM) for generously providing us with tissue samples. We would also like to thank Dr. A. Deshpande, D. Johnson, E. Froehling and the staff at the University of Minnesota Genomics Center (Minneapolis, MN, USA) for isolating DNA from museum samples, library preparation and raw sequencing. We wish to thank S. Landwehr and the Minnesota Supercomputing Institute (MSI) at the University of Minnesota and Dr. J. P. Layfield at the University of St. Thomas for allowing us to access critical computational resources. Finally, we are very grateful to Dr. S. Pirro at Iridian Genomes (Bethesda, MD, USA) for valuable insight on scaffolding the draft assemblies. This research was made possible by start-up funds to KWO from the University of St. Thomas.
References
Alonge, M., Soyk, S., Ramakrishnan, S., Wang, X., Goodwin, S., Sedlazeck, F. J., Lippman, Z. B., and Schatz, M. C. 2019. RaGOO: Fast and accurate reference-guided scaffolding of draft genomes. Genome Biology 20:1–17.
Ayarzagüena, J. 1984. Variaciones en la dieta de Caiman sclerops. La relacion entre morfologia bucal y dieta. Memoria De La Sociedad De Ciencias Naturales La Salle 44:123–140.
Brazaitis, P., Watanabe, M. E., and Amato, G. 1998. The Caiman Trade. Scientific American 278:70–76.
Brum, S. M., Da Silva, V. M., Rossoni, F., and Castello, L. 2015. Use of dolphins and caimans as bait for Calophysus macropterus (Lichtenstein, 1819) (Siluriforme: Pimelodidae) in the Amazon. Journal of Applied Ichthyology 31:675–680.
Busack, S. D. and Pandya, S. 2001. Geographic variation in Caiman crocodilus and Caiman yacare (Crocodylia: Alligatoridae): Systematic and legal implications. Herpetologica 57:294–312.
Caldwell, J. 2015. World Trade in Crocodilian Skins 2013-2015. Technical report, UN Environment Programme World Conservation Monitoring Centre.
Da Silveira, R. and Thorbjarnarson, J. B. 1999. Conservation implications of commercial hunting of black and spectacled caiman in the Mamiraua Sustainable Development Reserve, Brazil. Biological Conservation 88:103–109.
Escobedo-Galván, A. H., Velasco, J. A., González-Maya, J. F., and Resetar, A. 2015. Morphometric analysis of the Rio Apaporis Caiman (Reptilia, Crocodylia, Alligatoridae). Zootaxa 4059:541–54.
Finn, R. D., Clements, J., and Eddy, S. R. 2011. HMMER web server: Interactive sequence similarity searching. Nucleic Acids Research 39:W29–W37.
Gans, C. 1980. Allometric Changes in the Skull and Brain of Caiman crocodilus. Journal of Herpetology 14:297–301.
Ghosh, A., Johnson, M. G., Osmanski, A. B., Louha, S., Bayona-Vásquez, N. J., Glenn, T. C., Gongora, J., Green, R. E., Isberg, S., Stevens, R. D., and Ray, D. A. 2020. A High-Quality Reference Genome Assembly of the Saltwater Crocodile, Crocodylus porosus, Reveals Patterns of Selection in Crocodylidae. Genome Biology and Evolution 12:3635–3646.
Gurevich, A., Saveliev, V., Vyahhi, N., and Tesler, G. 2013. QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075.
Marley, G., Lawrence, A. J., Phillip, D. A., and Hayden, B. 2019. Mangrove and mudflat food webs are segregated across four trophic levels, yet connected by highly mobile top predators. Marine Ecology Progress Series 632:13–25.
Medem, F. 1955. A new subspecies of Caiman sclerops from Colombia. Fieldiana: Zoology 37:339–343.
Medem, F. 1981. Los Crocodylia de Sur America Volumen I. Ministerio de Educación Nacional, Bogotá, Colombia.
Medem, F. 1983. Los Crocodylia de Sur America Volumen II. Ministerio de Educación Nacional, Bogotá, Colombia.
National Center for Biotechnology Information (US) 2014. Submitting Sequences using Specific NCBI Submission Tools., p. NBK566995. In The GenBank Submissions Handbook [Internet]. Bethesda, MD.
National Center for Biotechnology Information 2020. table2asn_GFF.
Okamoto, K. W., Langerhans, R. B., Rashid, R., and Amarasekare, P. 2015. Microevolutionary patterns in the common caiman predict macroevolutionary trends across extant crocodilians. Biological Journal of the Linnean Society In press.
Oliveira, V. C. S., Viana, P. F., Gross, M. C., Feldberg, E., Da Silveira, R., de Bello Cioffi, M., Bertollo, L. A. C., and Schneider, C. H. 2021. Looking for genetic effects of polluted anthropized environments on Caiman crocodilus crocodilus (Reptilia, Crocodylia): A comparative genotoxic and chromosomal analysis. Ecotoxicology and Environmental Safety 209:111835.
Peng, Y., Leung, H. C., Yiu, S. M., and Chin, F. Y. 2012. IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.
Pimenta, N. C., Barnett, A. A., Botero-Arias, R., and Marmontel, M. 2018. When predators become prey: Community-based monitoring of caiman and dolphin hunting for the catfish fishery and the broader implications on Amazonian human-natural systems. Biological Conservation 222:154–163.
Rice, E. S., Kohno, S., St John, J., Pham, S., Howard, J., Lareau, L. F., O’Connell, B. L., Hickey, G., Armstrong, J., Deran, A., Fiddes, I., Platt, R. N., Gresham, C., McCarthy, F., Kern, C., Haan, D., Phan, T., Schmidt, C., Sanford, J. R., Ray, D. A., Paten, B., Guillette, L. J., and Green, R. E. 2017. Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Research 27:686–696.
Roberto, I. J., Bittencourt, P. S., Muniz, F. L., Hernández-Rangel, S. M., Nóbrega, Y. C., Ávila, R. W., Souza, B. C., Alvarez, G., Miranda-Chumacero, G., Campos, Z., Farias, I. P., and Hrbek, T. 2020. Unexpected but unsurprising lineage diversity within the most widespread Neotropical crocodilian genus Caiman (Crocodylia, Alligatoridae). Systematics and Biodiversity 18:377–395.
Shen, W., Le, S., Li, Y., and Hu, F. 2016. SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11:e0163962.
Shumate, A. and Salzberg, S. L. 2020. Liftoff: accurate mapping of gene annotations. Bioinformatics In press.
Smit, A., Hubley, R., and Green, P. 2015. RepeatMasker Open-4.0.
Telatin, A. 2018. Proch::N50.
US Fish and Wildlife Service 2018. Common Caiman (Caiman crocodilus) Ecological Risk Screening Summary. Technical report, US Fish and Wildlife Service.
Valencia-Aguilar, A., Cortés-Gómez, A. M., and Ruiz-Agudelo, C. A. 2013. Ecosystem services provided by amphibians and reptiles in Neotropical ecosystems. International Journal of Biodiversity Science, Ecosystem Services and Management 9:257–272.