1.4.2. Factors influencing mutation rates

As mentioned before, the most conventional approach for identifying somatic mutations and genes directly involved in tumorigenesis is to compare tumor and germline DNA across large sample sets and then apply statistical analysis to identify SMGs. It was initially thought that large datasets would increase the sensitivity and specificity of these analyses, but in most cases what increased was the number of false positives: highly mutable genes such as olfactory receptors, and large genes such as TTN and PCLO, were systematically identified and sometimes even nominated as cancer genes. To better understand mutational processes in cancer, over 3,000 tumor-germline pairs from 27 different cancer types, for which whole-exome or whole-genome sequencing had been carried out, were studied as part of the Pan-Cancer analysis project (Lawrence et al., 2013).

The authors observed large variation in mutation rates among different cancer types (Figure 5). The disparity among tumors of the same type was also striking: in melanoma, for example, some specimens had mutation rates as low as 1 mutation/Mb while others exceeded 100 mutations/Mb (Figure 5). Increased or decreased mutation rates could often be correlated with confounding factors, such as tobacco use in lung carcinomas, exposure or not to UV light in melanoma, or the presence of mismatch repair mutations in colon cancers.
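The mutations/Mb unit used above is a simple normalization of the mutation count by the amount of sequence in which mutations could be called. The following is a minimal sketch of that arithmetic; the function name, the sample names and the 30 Mb of covered exome are illustrative assumptions, not values from Lawrence et al. (2013).

```python
def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Somatic mutation frequency in mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical samples: (somatic mutation count, callable bases in the exome).
samples = {
    "sample_A": (30, 30_000_000),
    "sample_B": (3_300, 30_000_000),
}

rates = {name: mutations_per_mb(n, bases) for name, (n, bases) in samples.items()}
```

With these invented counts the two samples differ by two orders of magnitude, the kind of within-type spread described for melanoma.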

Lawrence and collaborators also examined the mutational profile of each tumor type and asked whether these profiles could be used to cluster different tumor types together. They observed that different types of lung cancer share the C>A signature consistent with tobacco exposure, while melanomas carry mostly C>T mutations, the classical sign of UV-light-induced mutagenesis.

Tumors of the gastrointestinal tract show C>T mutations in a CpG context, and epithelial cancers (bladder, cervical, and head and neck) display a large fraction of C>T/G mutations in a TpC context. The latter could be caused by APOBEC cytidine deaminases acting to restrict viral infection, a common contributing factor in a fraction of these cancer types.
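To make the notation concrete, the sketch below shows one common way to assign a single-base substitution to a class such as C>T and to test whether it lies in a CpG or TpC context. Substitutions are reported relative to the pyrimidine of the mutated base pair, so a G>A change is reverse-complemented and counted as C>T. All function names are hypothetical, and this is not taken from the pipeline of Lawrence et al. (2013).

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def classify(ref_context: str, alt: str) -> Tuple[str, str]:
    """Classify a substitution on the pyrimidine strand.

    ref_context: reference trinucleotide 5'-N[ref]N-3' centred on the mutated base.
    alt: the mutant base on the same strand.
    Returns (substitution class, trinucleotide context).
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or len(alt) != 1 or alt not in "ACGT":
        raise ValueError("expected a trinucleotide context and a single alt base")
    if alt == ref_context[1]:
        raise ValueError("alt base equals the reference base")
    if ref_context[1] in "AG":
        # Purine reference base: describe the event on the opposite strand.
        ref_context = reverse_complement(ref_context)
        alt = alt.translate(COMPLEMENT)
    return f"{ref_context[1]}>{alt}", ref_context


def is_cpg(context: str) -> bool:
    """True when the mutated C is followed by G (CpG context)."""
    return context[1] == "C" and context[2] == "G"


def is_tpc(context: str) -> bool:
    """True when the mutated C is preceded by T (TpC context)."""
    return context[0] == "T" and context[1] == "C"


cpg_example = classify("ACG", "T")   # C>T at a CpG site
tpc_example = classify("TGA", "A")   # G>A on this strand, i.e. C>T in TCA
```

Counting the resulting classes per tumor gives the kind of profile that can then be compared between tumor types.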

Mutation frequency was found to vary significantly across the genome of individual tumors and within tumor types. This could be a consequence of specific genomic features, such as gene expression (Pleasance et al., 2010a). The germline mutation rate is lower in highly expressed genes because of transcription-coupled repair (Fousteri and Mullenders, 2008), and the same trend was confirmed in the sample set of Lawrence and collaborators, where mutations were less frequent in highly expressed genes: the average mutation rate was almost three times higher in the lowest-expressed genes than in the highest-expressed ones. Another important feature was DNA replication timing, which is also known to correlate with germline mutation rate (Stamatoyannopoulos et al., 2009). Late-replicating regions are expected to have higher mutation rates, and this correlation was indeed observed in the dataset of Lawrence and collaborators, where the mutation rate was three times higher in the latest-replicating regions than in the earliest-replicating ones. These observations explain some of the false-positive cancer genes. For example, both olfactory receptors and large genes are lowly expressed, late replicating, and carry a large number of silent or intronic mutations (Lawrence et al., 2013).
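A comparison such as "three times higher in the lowest-expressed genes" can be obtained by ranking genes on a covariate, pooling mutations and covered bases within the extreme bins, and taking the ratio of the pooled rates. The sketch below illustrates this under stated assumptions: the `Gene` record, the tercile binning and every number in the toy data are invented for illustration and do not reproduce the analysis of Lawrence et al. (2013).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    expression: float    # hypothetical expression level (arbitrary units)
    mutations: int       # somatic mutations observed across the cohort
    covered_bases: int   # bases in which mutations could be called


def pooled_rate(genes: List[Gene]) -> float:
    """Mutations per megabase, pooled over a group of genes."""
    total_bases = sum(g.covered_bases for g in genes)
    return sum(g.mutations for g in genes) * 1e6 / total_bases


def rate_ratio_lowest_vs_highest(genes: List[Gene], n_bins: int = 3) -> float:
    """Pooled rate in the lowest-expression bin divided by that in the highest."""
    ranked = sorted(genes, key=lambda g: g.expression)
    size = len(ranked) // n_bins
    if size == 0:
        raise ValueError("not enough genes for the requested number of bins")
    return pooled_rate(ranked[:size]) / pooled_rate(ranked[-size:])


# Toy data, invented for illustration only.
genes = [
    Gene("G1", 0.1, 6, 1000), Gene("G2", 0.5, 6, 1000),
    Gene("G3", 5.0, 4, 1000), Gene("G4", 8.0, 3, 1000),
    Gene("G5", 50.0, 2, 1000), Gene("G6", 90.0, 2, 1000),
]
ratio = rate_ratio_lowest_vs_highest(genes)
```

The same function could be applied to replication timing by sorting on that covariate instead of expression. The sketch does not guard against a highest bin with zero mutations.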

The authors integrated their observations into a powerful algorithm to identify SMGs in cancer. This algorithm, called MutSigCV, takes into account the mutational covariates described in their publication and performs well when identifying SMGs, effectively reducing false positives. Since its development, MutSigCV has been the program of choice to identify cancer genes and has been used in numerous publications (Cancer Genome Atlas Research, 2012, Cancer Genome Atlas Network, 2015, Cancer Genome Atlas, 2015, Gao et al., 2014, Jones et al., 2012, Pugh et al., 2013, Pickering et al., 2014).

Figure 5. Somatic mutation frequencies in 27 different tumor types. Each dot was obtained by comparing one tumor with its matched normal sample; the vertical axis indicates the frequency of somatic mutations in the exome. Tumor types are ordered by their median somatic mutation frequency, and the number of samples per tumor type is indicated above the plot. Reproduced from Lawrence et al., 2013.

A study of a very similar dataset confirmed the results on mutation rates among cancer types and on mutational signatures, and showed how these can be used to cluster tumors by class and sometimes by etiology (Kandoth et al., 2013). This study additionally identified several genes that were very frequently mutated in cancer. The most frequently mutated gene in their 3,281 tumors of 12 different types was TP53 (42% of the samples), followed by PIK3CA (10% of the samples). Mutations in these two genes were specific to particular groups of cancers: TP53 mutations were more frequent in ovarian or endometrial carcinomas and basal breast cancer, while PIK3CA mutations did not occur in ovarian, lung or kidney cancers. Mutations in SMGs across the 12 cancer types were subjected to unsupervised clustering, and 72% of the samples clustered with tumors of the same tissue type, having mutations in the same driver genes. Pairwise comparisons among mutations in the 127 SMGs identified in this study found 14 mutually exclusive gene pairs. For example, TP53 and CDH1 are mutually exclusive in breast cancer (FDR 0.05), while TP53, PTEN, VHL, NPM1 and GATA3 are mutually exclusive across the full dataset (P=0.01). Conversely, a number of positive associations were detected, with 148 co-occurring pairs of mutated SMGs in the dataset.
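One simple way to test a single gene pair for mutual exclusivity or co-occurrence is a one-sided Fisher's exact test on the 2x2 table of mutated versus unmutated samples. The sketch below implements it with the hypergeometric distribution from the standard library. It is offered only as an illustration of the idea: it is not necessarily the test used by Kandoth et al. (2013), the function names are hypothetical, and the counts are invented. In a real analysis the p-values from all pairs would also need multiple-testing correction, such as the FDR quoted above.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(k: int, n_a: int, n_b: int, n_total: int) -> float:
    """P(exactly k samples carry both mutations) under independence,
    given n_a samples mutated in gene A and n_b in gene B out of n_total."""
    return comb(n_a, k) * comb(n_total - n_a, n_b - k) / comb(n_total, n_b)


def co_mutation_p_values(n_both: int, n_a: int, n_b: int,
                         n_total: int) -> Tuple[float, float]:
    """Return (p_exclusive, p_cooccur) from one-sided Fisher's exact tests.

    p_exclusive: probability of observing n_both or fewer double mutants.
    p_cooccur:   probability of observing n_both or more double mutants.
    """
    lowest = max(0, n_a + n_b - n_total)
    highest = min(n_a, n_b)
    if not lowest <= n_both <= highest:
        raise ValueError("n_both is inconsistent with the marginal counts")
    p_exclusive = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                      for k in range(lowest, n_both + 1))
    p_cooccur = sum(hypergeom_pmf(k, n_a, n_b, n_total)
                    for k in range(n_both, highest + 1))
    return p_exclusive, p_cooccur


# Invented example: 10 samples, 5 mutated in gene A, 5 in gene B, none in both.
p_exclusive, p_cooccur = co_mutation_p_values(n_both=0, n_a=5, n_b=5, n_total=10)
```

In this toy case the absence of any double mutant gives a small p_exclusive, which would count as evidence for mutual exclusivity before correction.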

Furthermore, Kandoth et al. (2013) were able to place mutations in temporal order within the history of a tumor by examining the variant allele fraction (VAF) distribution of mutations in some of the SMGs in acute myeloid leukemia (AML), breast and uterine/cervical cancers. TP53 had the highest VAF in these cancer types, suggesting that it tends to be mutated early in tumorigenesis, although its elevated VAF might instead be due to copy-neutral loss of heterozygosity (cnLOH), a common event in TP53 and other tumor suppressors such as BRCA1, BRCA2 and PTEN.

Other genes were also identified in specific tumor types. For example, in AML, DNMT3A and SMC3 had significantly higher VAFs than average (P<0.0003 and P<0.05, respectively).
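The reasoning behind VAF-based ordering, and the caveat about cnLOH, can be illustrated with two small functions. The first computes the observed VAF from read counts. The second gives the VAF expected for a mutation present in every tumor cell, using the standard relation between tumor purity, the number of mutated copies and the local copy number. The function names and the example values are assumptions made for illustration and are not taken from Kandoth et al. (2013).

```python
def vaf(alt_reads: int, ref_reads: int) -> float:
    """Observed variant allele fraction at one site."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def expected_clonal_vaf(purity: float, mutated_copies: int,
                        tumor_copy_number: int,
                        normal_copy_number: int = 2) -> float:
    """Expected VAF of a mutation carried by all tumor cells.

    purity: fraction of cells in the sample that are tumor cells (0 < purity <= 1).
    mutated_copies: copies of the locus per tumor cell that carry the mutation.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    mutant_alleles = purity * mutated_copies
    all_alleles = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    return mutant_alleles / all_alleles


observed = vaf(alt_reads=45, ref_reads=55)
heterozygous = expected_clonal_vaf(purity=1.0, mutated_copies=1, tumor_copy_number=2)
after_cnloh = expected_clonal_vaf(purity=1.0, mutated_copies=2, tumor_copy_number=2)
```

In a pure diploid tumor, a heterozygous clonal mutation is expected at a VAF of 0.5, whereas the same mutation after cnLOH occupies both copies and reaches 1.0. A high VAF alone therefore cannot distinguish an early event from one affected by cnLOH.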

These analyses are interesting because they can allow the identification of the primary drivers of tumorigenesis among all the contributing drivers in a specific cancer type. The results of this study, along with those described before and many other reports, have helped us understand the mechanisms of cancer and how this disease begins.

Even though most of the mutations identified in cancer are passenger mutations, they are the product of the same mutational processes that give rise to the drivers. These mutations bear the signature of those processes, reflecting the type of DNA damage, the duration and intensity of exposure, and the DNA repair mechanisms involved. What we ultimately see when a tumor is sequenced is a combination of different mutational signatures resulting from all the processes involved in cancer formation (Figure 6). Until recently, our understanding of these processes was limited to specific genes, but genome-wide approaches are starting to be applied to the analysis of large or comprehensive datasets (Helleday et al., 2014, Alexandrov and Stratton, 2014).

The most convenient dataset for studying the full landscape of mutational signatures in cancer is whole-genome sequencing (WGS) of cancer samples. Available sample sets of this kind are limited. Apart from a complete yet concise report of mutational signatures in melanoma and lung cancer (Pleasance et al., 2010a, Pleasance et al., 2010b) (briefly discussed above), none had been studied until Nik-Zainal et al. (2012) analyzed mutational signatures in the genomes of 21 breast cancers, nine of which had germline predisposing mutations in BRCA1 or BRCA2. The 21 tumors and their matched germline samples were sequenced to >30x and somatic variants were called. RNAseq was performed for 17 of the 21 samples.

The authors observed significant variation in the frequency of each type of substitution (C>A, C>G, C>T, T>A, T>C, and T>G) across all samples. When the bases immediately 5’ and 3’ of the mutated base were included as the mutation context, they observed that certain contexts are