Chapter 4 - Materials and methods
4.10 RNA-seq data
4.10.1 RNA-seq data generation
RNA sequencing (RNA-seq) libraries were prepared with the Illumina TruSeq Stranded mRNA protocol and sequenced on a HiSeq 2500 machine as single-end, 100 base pair (bp) reads. Prior to analysis, RNA-seq reads were trimmed to remove 3’ end adapters.
4.10.2 Mapping of RNA-seq reads to reference genome
Reads were aligned using TopHat2 (Kim et al., 2013) to the Ensembl mm10 genome for mouse samples, the king cobra genome (Vonk et al., 2013) for corn snake samples and the Burmese python genome (Castoe et al., 2013) for python samples. The UCSC genome browser and the IGV genome browser were used to visualize the mapping.
4.10.3 Global analysis of the RNA-seq mapped data
BAM files generated by TopHat were then used to generate a count table with the function summarizeOverlaps from the GenomicAlignments R package (Lawrence et al., 2013). The count table was then analysed using the DESeq2 R package (Love et al., 2014).
Data were rlog-transformed, and sample distances were calculated using the R dist function and plotted as a heatmap. The rlog-transformed data were also used to generate a principal component analysis (PCA) plot with the plotPCA function.
Differential expression analysis was run using the DESeq function, and MA-plots showing differential expression between E9.5 mouse tail RNA transcripts and the other samples were produced with the plotMA function. A clustered differential expression heatmap was generated by selecting the 100 genes with the highest variance across samples (i.e. stages) and calculating, for each gene, the deviation of each sample from that gene's mean across samples.
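The gene selection behind the clustered heatmap can be illustrated with a minimal Python sketch. It is not the R code used for the analysis: the function name, the dictionary layout and the toy values are assumptions made for illustration, and the sample variance is used as the measure of variability.

```python
from statistics import mean, variance


def top_variable_centered(expr, n_top):
    """Select the n_top most variable genes and centre each on its own mean.

    expr maps a gene name to its transformed expression values, one per sample
    (hypothetical layout chosen for this sketch).
    """
    # Rank genes by their variance across samples, highest first.
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    centered = {}
    for gene in ranked[:n_top]:
        gene_mean = mean(expr[gene])
        # Deviation of each sample from the gene's mean across samples.
        centered[gene] = [value - gene_mean for value in expr[gene]]
    return centered


# Toy values only, not data from this study.
toy = {
    "geneA": [5.0, 5.0, 5.0],
    "geneB": [1.0, 3.0, 8.0],
    "geneC": [2.0, 3.0, 4.0],
}
result = top_variable_centered(toy, n_top=2)
print(result)
```

With n_top set to 100 and one value per sample, the returned matrix of deviations is what a clustering heatmap function would take as input.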
4.10.4 De novo transcript assembly with Trinity
We used Trinity (Grabherr et al., 2011) version 2.0.6 to assemble transcripts from RNA-seq data, without relying on genome sequence or annotations. To obtain maximum sensitivity, we combined the RNA-seq data for all available samples in the transcript reconstruction procedure. To reduce memory requirements, we used the read normalization procedure implemented in Trinity, run separately for each RNA-seq sample. To minimize the occurrence of retained introns in assembled transcripts, the minimum count for k-mers was set to 2 (Haas et al., 2013). As some retained introns or unspliced transcripts can still appear, we detected splice junctions in the assembled transcripts with TopHat (Kim et al., 2013) version 2.0.10, using the Trinity output as a genome index. The predicted splice junctions supported by at least 2 reads were considered to be putative retained introns, and the spliced form of the corresponding Trinity contig was added to the set of predicted transcripts, while also keeping the original unspliced form. Whenever several introns were detected in a single Trinity contig, a single spliced form removing all introns was constructed.
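The construction of the spliced form can be sketched as follows. This is a minimal illustration rather than the script used here: the function name, the 0-based half-open coordinates and the toy contig are assumptions, and overlapping junctions are simply rejected.

```python
def spliced_form(contig, junctions, min_reads=2):
    """Return the contig with all supported introns removed, or None if none qualify.

    junctions: iterable of (start, end, n_reads) tuples giving 0-based,
    half-open intron coordinates on the contig (hypothetical layout).
    """
    # Keep only junctions supported by at least min_reads reads.
    introns = sorted((s, e) for s, e, n in junctions if n >= min_reads)
    if not introns:
        return None
    pieces, pos = [], 0
    for start, end in introns:
        if start < pos:
            raise ValueError("overlapping introns are not handled in this sketch")
        pieces.append(contig[pos:start])  # exonic sequence before the intron
        pos = end                         # skip the intron itself
    pieces.append(contig[pos:])           # exonic sequence after the last intron
    return "".join(pieces)


# Toy contig: exons in upper case, two putative retained introns in lower case.
contig = "AAAA" + "gtccag" + "TTTT" + "gtggag" + "CCCC"
both = spliced_form(contig, [(4, 10, 5), (14, 20, 3)])
one = spliced_form(contig, [(4, 10, 5), (14, 20, 1)])
print(both, one)
```

Both the returned spliced sequence and the original contig would then be kept in the transcript set, as described above.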
4.10.5 Trinity transcript identification
To determine the genes from which Trinity-predicted transcripts were derived, we used an approach based on sequence similarity with annotated mouse proteins. First, we extracted mouse protein sequences from the Ensembl database, release 81. For genes that had multiple protein-coding isoforms, we kept only the longest protein. For the Hox genes, we manually verified annotated proteins and extracted the one that best fitted the canonical form.
Second, we determined all possible open reading frames (ORFs, defined as any stretch of codons starting at an ATG and ending at a stop codon) within each Trinity contig. Partial ORFs (lacking the initial ATG or the stop codon) starting or ending at the first/last base of the Trinity contig were also permitted. Third, we searched for sequence similarity between the translated ORFs and mouse proteins using blastp (Altschul et al., 1990) and we extracted the best hits between translated ORF and proteins, using the blastp alignment score as a criterion.
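The ORF definition above can be illustrated with a minimal Python sketch. It is an assumption-laden simplification, not the procedure actually used: it scans the forward strand only, treats each of the three frames as potentially open from the contig edge, and reports one ORF per stop-delimited segment. The function name and output layout are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Return (start, end, has_atg, has_stop) tuples, 0-based half-open.

    Forward strand only (an assumption of this sketch). An ORF without an
    initial ATG or without a stop codon is a partial ORF touching a contig end.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame      # a 5'-partial ORF may run in from the contig edge
        last_end = frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            last_end = i + 3
            if codon in STOP_CODONS:
                if start is not None and i > start:
                    orfs.append((start, i + 3, seq[start:start + 3] == "ATG", True))
                start = None           # wait for the next ATG in this frame
            elif start is None and codon == "ATG":
                start = i
        # An ORF still open at the end of the contig is 3'-partial.
        if start is not None and last_end > start:
            orfs.append((start, last_end, seq[start:start + 3] == "ATG", False))
    return orfs


orfs = find_orfs("CCATGAAATAGCC")
print(orfs)
```

Each reported ORF would then be translated and used as a blastp query against the mouse protein set.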
We discarded cases where the best hits between translated ORFs and mouse proteins were not unique. We further filtered the blastp hits to select those with a percentage sequence identity above 95% (for mouse contigs) or 40% (for corn snake), and which were aligned across at least 25% of the protein length. We thus were able to predict the identity of 14,286 Trinity contigs for the mouse and 12,702 contigs for the corn snake.
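The filtering rules can be summarized in a short Python sketch. The record layout (dictionary keys such as `score`, `identity`, `aligned_len` and `protein_len`) and the function name are assumptions made for illustration; they do not reproduce the blastp output format or the script used here.

```python
def select_hits(hits, min_identity, min_coverage=0.25):
    """Keep, per ORF, a unique best-scoring hit that passes both filters.

    hits: list of dicts with keys orf, protein, score, identity (percent),
    aligned_len and protein_len (hypothetical layout).
    """
    by_orf = {}
    for hit in hits:
        by_orf.setdefault(hit["orf"], []).append(hit)

    kept = []
    for group in by_orf.values():
        top_score = max(hit["score"] for hit in group)
        best = [hit for hit in group if hit["score"] == top_score]
        if len(best) != 1:
            continue  # best hit is not unique: discard this ORF
        hit = best[0]
        coverage = hit["aligned_len"] / hit["protein_len"]
        if hit["identity"] > min_identity and coverage >= min_coverage:
            kept.append(hit)
    return kept


# Toy records only, not results from this study.
toy_hits = [
    {"orf": "c1_orf1", "protein": "P1", "score": 200, "identity": 98.0, "aligned_len": 80, "protein_len": 100},
    {"orf": "c1_orf1", "protein": "P2", "score": 150, "identity": 99.0, "aligned_len": 90, "protein_len": 100},
    {"orf": "c2_orf1", "protein": "P3", "score": 90, "identity": 97.0, "aligned_len": 50, "protein_len": 100},
    {"orf": "c2_orf1", "protein": "P4", "score": 90, "identity": 97.0, "aligned_len": 50, "protein_len": 100},
    {"orf": "c3_orf1", "protein": "P5", "score": 300, "identity": 99.0, "aligned_len": 10, "protein_len": 100},
]
selected = select_hits(toy_hits, min_identity=95.0)
print([(hit["orf"], hit["protein"]) for hit in selected])
```

In this sketch the identity threshold would be set to 95 for mouse contigs and 40 for corn snake contigs, following the values given above.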
4.10.6 Gene expression quantification
We used the previously identified Trinity contig ORFs to compute gene expression levels. For each contig, we extracted the part of the ORF that aligned with the mouse protein and constructed a Bowtie2 (Langmead and Salzberg, 2012) index containing exclusively these sequences. We then aligned the RNA-seq reads to this index using TopHat 2.0.10 and computed expression levels for each selected Trinity contig using Cufflinks (Trapnell et al., 2012). All mapped reads were used for gene expression estimation, with the Cufflinks correction procedure for multi-mapped reads.
4.10.7 Gene expression normalization
We normalized gene expression levels across samples using a previously described approach (Brawand et al., 2011). Briefly, we identified the set of 200 genes that vary the least across samples in terms of expression ranks. We then linearly scaled the FPKM expression levels to bring the medians of these 200 genes to the same value for all samples.
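A minimal Python sketch of this rank-based scaling is given below. It is an illustration under stated assumptions, not the published implementation: rank variability is measured as the variance of ranks across samples, ties in ranks are broken by gene order, the common target is taken as the mean of the per-sample medians, and the medians are assumed to be non-zero. All names are hypothetical.

```python
from statistics import mean, median, pvariance


def rank_values(values):
    """Rank values from lowest (0) to highest; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def normalize(fpkm, n_stable):
    """Scale each sample so the median of the most rank-stable genes is shared.

    fpkm: dict mapping sample name to a list of FPKM values, with genes in the
    same order in every sample (hypothetical layout).
    """
    samples = list(fpkm)
    n_genes = len(fpkm[samples[0]])
    ranks = {s: rank_values(fpkm[s]) for s in samples}
    # Variability of each gene's rank across samples.
    spread = [pvariance([ranks[s][g] for s in samples]) for g in range(n_genes)]
    stable = sorted(range(n_genes), key=lambda g: spread[g])[:n_stable]
    medians = {s: median([fpkm[s][g] for g in stable]) for s in samples}
    target = mean(list(medians.values()))  # assumed common value
    return {s: [v * target / medians[s] for v in fpkm[s]] for s in samples}


# Toy values only: sample B is sample A measured at twice the depth.
toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]}
scaled = normalize(toy, n_stable=3)
print(scaled)
```

In the toy case the two samples become identical after scaling, which is the expected behaviour when they differ only by a global factor; for the analysis described above n_stable would be 200.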
Acknowledgements
I would like to thank Denis Duboule for having accepted me as a PhD student in his lab. He fosters an atmosphere of high scientific level and critical thinking, while providing the freedom for our own initiative and creativity. I appreciate the opportunities that he put at my disposal to gain experience and improve on different aspects of my academic career.
I am also thankful to José-Luis Gómez Skarmeta, Ivan Rodriguez and Marcelo Sánchez for agreeing to serve on my thesis jury. I look forward to discussing my results with such accomplished researchers.
I would like to acknowledge Moisés Mallo, my former supervisor, for the great collaboration that allowed for several mouse lines to be generated in a record amount of time.
It is a pleasure to keep such a good relationship even after the end of my Master. Thanks to Ana for having done the injections so skillfully.
I thank Michel Milinkovitch, who allowed me to use the snake eggs that were essential for my project. It was a luxury to have this opportunity to work so closely with him and his lab. I would also like to thank Adrien for bringing me the eggs and taking care of the snakes. I also thank Suzanne and Sophie, who were often helpful in assisting me or giving advice on snake-related issues.
Thanks to Leo and Joost, the two postdocs with whom I overlapped the longest and who were always happy to help, discuss ideas and give interesting input. It was a privilege to share the lab with Joost, the person who pioneered Hox gene studies in snakes and with whom I had extensive Evo-Devo related discussions. Sharing the office and the lab with Leo for over 4 years resulted in a great spirit of collaboration, support and friendship. Both of them were very important during my PhD, at both a professional and a personal level.
I would also like to acknowledge Sandra, a very talented and reliable help during the last year of my thesis. Her assistance was essential for completing the last necessary experiments for my project.
I am very thankful to Anouk, an extremely talented bioinformatician who worked a lot on the corn snake transcriptome data analysis and without whom I wouldn’t have had the resources to generate the interspecies comparative study. In addition to her expertise, she is an amazing person, always ready to help and patient enough to teach us R and RNA analysis tricks.
I would also like to acknowledge Joska, an invaluable presence in the lab. One can only benefit from discussing with such an experienced and knowledgeable researcher. I would also like to acknowledge Aurélie for helping me with the French version of the summary. She is a very talented, motivated and hard-working Master student whom I have the pleasure of co-supervising. I thank Béné as well as Hanh and Julien for keeping our mouse lines safe. I would like to thank Eddie for bringing to a new level all discussions related to 3D chromatin conformation techniques, normalizations and interpretations, and Imane for the great time I had supervising her one-month internship in this lab.
I would like to acknowledge the transgenic facilities at EPFL (Isabelle Barde) and CMU (Nicolas Steiner) as well as the genomics facility at CMU (Mylène Docquier) and the bioinformatics facility at the EPFL (Jacques Rougemont and Marion Leleu).
I am thankful to all members of the lab in Geneva or Lausanne past and present for a great atmosphere, advice and critical input.
Finally I would like to thank my family and my friends. My parents have always supported me in any decision and have from early on encouraged my curiosity for nature and science. My brother for constant challenging and companionship and my grandmothers for unconditional care. My friends for the fun spent together, essential for surviving the frustrations that come with research. I am thankful to Laurent for his unconditional support and care during this entire period.
References
transgenic mice. Mechanisms of Development 52, 291-303.
Beisel, C., and Paro, R. (2011). Silencing chromatin: comparing modes and
Esophageal Defects and Vertebral Transformations. Developmental Biology 177, 232-249.
of transcriptional and post-transcriptional regulation are required to define the domain of Hoxb4 expression. Development 130, 2717-2728.
Brent, A.E., and Tabin, C.J. (2002). Developmental regulation of somite derivatives: muscle, cartilage and tendon. Current Opinion in Genetics & Development 12, 548-557.
Burge, C., and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78-94.
Burke, A.C., Nelson, C.E., Morgan, B.A., and Tabin, C. (1995). Hox genes and the evolution of vertebrate axial morphology. Development 121, 333-346.
Cameron, R.A., Rowen, L., Nesbitt, R., Bloom, S., Rast, J.P., Berney, K., Arenas-Mena, C., Martinez, P., Lucas, S., Richardson, P.M., et al. (2006). Unusual gene order and organization of the sea urchin Hox cluster. Journal of Experimental Zoology Part B-Molecular and Developmental Evolution 306B, 45-58.
Carapuço, M., Nóvoa, A., Bobola, N., and Mallo, M. (2005). Hox genes specify
Elements Increases Responsiveness to Positional Information. Developmental Biology 171, 294-305.
paralogous genes hoxa-3 and hoxd-3 reveal synergistic interactions. Nature 370, 304-307.
protostome evolution. Nature 399, 772-776.
del Corral, R.D., and Storey, K.G. (2004). Opposing FGF and retinoid pathways: a
Hoxd Genes in Metanephric Kidney Development. PLoS Genet 3, e232.
through heterochrony. Development 1994, 135-142.
Duboule, D. (1998). Hox is in the hair: a break in colinearity? Genes &
Boundary Position and Regulates Segmentation Clock Control of Spatiotemporal Hox Gene Activation. Cell 106, 219-232.
Economides, K.D., Zeltser, L., and Capecchi, M.R. (2003). Hoxb13 mutations cause overgrowth of caudal spinal cord and tail vertebrae. Developmental Biology 256, 317-330.
Featherstone, M.S., Baron, A., Gaunt, S.J., Mattei, M.G., and Duboule, D. (1988). Hox-5.1 defines a homeobox-containing gene locus on mouse chromosome 2. Proceedings of the National Academy of Sciences of the United States of America 85, 4760-4764.
Fernandez-Teran, M., and Ros, M.A. (2008). The Apical Ectodermal Ridge: morphological aspects and signaling pathways. International Journal of Developmental Biology 52, 857-871.
Ferrier, D.E.K., and Minguillon, C. (2003). Evolution of the Hox/ParaHox gene clusters. International Journal of Developmental Biology 47, 605-611.
Feschotte, C., and Pritham, E.J. (2007). DNA Transposons and the Evolution of
Organization of Two Hemichordate Hox Clusters. Current Biology 22, 2053-2058.
Friedli, M., Barde, I., Arcangeli, M., Verp, S., Quazzola, A., Zakany, J., Lin-Marq, N.,
regulation during digit development. Developmental Biology 306, 847-859.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I.,
functional equivalence during paralogous Hox gene evolution. Nature 403, 661-665.
sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protocols 8, 1494-1512.
Held, L.I., Jr. (2014). How the Snake Lost its Legs: Curious Tales from the Frontier of Evo-Devo (Cambridge University Press).
Herault, Y., Beckers, J., Kondo, T., Fraudeau, N., and Duboule, D. (1998). Genetic
cluster: Its dispersed structure and residual colinear expression in development. Proceedings of the National Academy of Sciences of the United States of America 101, 15118-15123.
Infante, C.R., Mihala, A.G., Park, S., Wang, J.S., Johnson, K.K., Lauderdale, J.D., and Menke, D.B. (2015). Shared Enhancer Activity in the Limbs and Phallus and Functional Divergence of a Limb-Genital cis-Regulatory Element in Snakes. Developmental Cell 35, 107-119.
Kamm, K., Schierwater, B., Jakob, W., Dellaporta, S.L., and Miller, D.J. (2006). Axial
Kidwell, M.G., and Lisch, D. (1997). Transposable elements as sources of
and microarray-based analysis of protein location. Nature protocols 1, 729-748.
Lemons, D., and McGinnis, W. (2006). Genomic Evolution of Hox Gene Clusters. International Journal of Biological Sciences 2, 95-103.
Morimoto, M., Takahashi, Y., Endo, M., and Saga, Y. (2005). The Mesp2 transcription factor establishes segmental borders by suppressing Notch activity. Nature 435, 354-359.
Mortlock, D.P., and Innis, J.W. (1997). Mutation of HOXA13 in hand-foot-genital syndrome. Nat Genet 15, 179-180.
Noordermeer, D., Leleu, M., Schorderet, P., Joye, E., Chabaud, F., and Duboule, D.
transcriptional regulation has diverged significantly between human and mouse. Nat Genet 39, 730-732.
Olivera-Martinez, I., Harada, H., Halley, P.A., and Storey, K.G. (2012). Loss of FGF-Dependent Mesoderm Identity and Rise of Endogenous Retinoid Signalling Determine Cessation of Body Axis Elongation. PLoS Biol 10, e1001415.
Oosterveen, T., Niederreither, K., Dollé, P., Chambon, P., Meijlink, F., and Deschamps, J. (2003). Retinoids regulate the anterior expression boundaries of 5
′