Contents
1 Introduction 1
1.1 Breast Cancers . . . . 1
1.1.1 Breast Cancers Prognostication . . . . 3
1.1.2 Prediction of Treatment Efficiency in Breast Cancers . . . . . 4
1.1.3 High Throughput Tumour Profiling . . . . 6
1.2 Epigenetics . . . . 7
1.3 Infinium Technology . . . . 8
1.4 Extracting Signatures . . . . 9
2 Aim of the Thesis & Original Contributions 11
2.1 Original Work . . . . 12
2.1.1 Infinium HumanMethylation Beadarrays Evaluation . . . . 12
2.1.2 Breast Cancers MeTIL Signature Extraction . . . . 13
2.1.3 Other Related Projects . . . . 14
3 Biological Background 17
3.1 Breast Cancers . . . . 17
3.1.1 Anatomopathological, Histological Classification and Staging . . . . 17
3.1.2 Clinical and Molecular Classification . . . . 19
3.1.3 Tumour Microenvironment . . . . 23
3.2 Epigenetics . . . . 25
3.2.1 Epigenetics & the central dogma of molecular biology . . . . . 25
3.2.2 Chromatin Structure . . . . 26
3.2.3 Epigenetic Modifications . . . . 28
3.2.3.1 DNA Modifications . . . . 28
3.2.3.2 Histone Modifications . . . . 31
3.2.3.3 Noncoding RNAs . . . . 33
3.2.3.4 RNA Modifications . . . . 33
3.2.4 Epigenetics at cis-Regulatory Elements . . . . 33
3.2.4.1 Promoter . . . . 33
3.2.4.2 Gene Body . . . . 36
3.2.4.3 Enhancers . . . . 38
3.2.4.4 Regulatory Region Identification . . . . 39
3.2.5 Epigenetic Alterations in Breast Cancers . . . . 40
3.3 DNA Methylation Assessment . . . . 44
3.3.1 Bisulphite Conversion . . . . 44
3.3.2 Massively Parallel Sequencing . . . . 46
3.3.3 Infinium Beadarrays . . . . 49
3.3.3.1 Promoter-centric Infinium Arrays (GoldenGate and HumanMethylation27 Beadarrays) . . . . 49
3.3.3.2 High Coverage Infinium Arrays (HumanMethylation450 and HumanMethylation850 Beadarrays) . . . . 50
3.3.4 Methods for Validation at Single-site Scale . . . . 51
4 Bioinformatic Background 54
4.1 Unreliable Infinium Probes Filtering . . . . 54
4.1.1 High Detection P values . . . . 54
4.1.2 Cross-reactive Probes . . . . 55
4.1.3 Probes Containing Common SNPs . . . . 55
4.1.4 Probes Located on Heterochromosomes . . . . 56
4.2 Infinium HumanMethylation Beadarrays Normalisation . . . . 57
4.2.1 Inheritance from Expression Array Normalisation . . . . 57
4.2.2 Within-array Normalisation . . . . 60
4.2.3 Between-array Normalisation . . . . 64
4.3 Extracting Signatures from microarray data . . . . 67
4.3.1 Gene Expression Signatures in Breast Cancers . . . . 68
4.3.2 Machine-Learning-based Signature Extraction . . . . 71
4.3.2.1 Biological Knowledge . . . . 72
4.3.2.2 Feature Extraction . . . . 73
4.3.2.3 Filter Feature Selection . . . . 74
4.3.2.4 Embedded Feature Selection . . . . 76
4.3.2.5 Wrapper Feature Selection . . . . 78
4.3.2.6 Data Balancing . . . . 80
5 Infinium HumanMethylation Beadarrays Evaluation 82
5.1 Processing of Infinium HumanMethylation High-density Beadarrays . . . . 85
5.2 Dataset Description . . . . 86
5.3 Filtering Impact in 450k, 850k and RRBS Technologies . . . . 88
5.4 Evaluation of Normalisation Methods . . . . 90
5.4.1 Evaluation of 450k Within-array Normalisation Methods . . . 90
5.4.2 Evaluation of 450k Between-array Normalisation Methods . . 93
5.4.3 Evaluation of Normalisation Methods on 850k Data . . . . 95
5.4.4 Variance Heterogeneity . . . . 98
5.5 Biological Features Covered by Infinium Beadarrays . . . 103
5.5.1 Development of Alternative Annotations . . . 103
5.5.1.1 Regulatory Regions . . . 106
5.5.1.2 Association to Transcript . . . 106
5.5.1.3 CpG Island-associated Regions . . . 107
5.5.1.4 Promoter/Non-promoter Regions . . . 107
5.5.1.5 Illumina Default Annotation . . . 108
5.5.2 Infinium HumanMethylation850 Coverage Evaluation . . . 108
5.5.3 Epigenetic-based 850k Annotation . . . 112
5.5.4 Differential Methylation Analysis with 850k . . . 116
5.6 Discussion . . . 119
5.6.1 The Epigenetic-based Annotation We Developed Improves Infinium Interpretability . . . 119
5.6.2 Our Study Reveals the Broad Methylome View Provided by 850k Relative to RRBS . . . 121
5.6.3 Our Comparative Study Highlights PBC and NOOB as Best Within-array Normalisation . . . 122
5.6.4 Our Comparative Study Reveals that Between-array Normalisation can Artefactually Distort Data . . . 124
5.6.5 Our Between-replicates Analysis and Side Projects Show the Need for a Methylation Difference Threshold . . . . 125
6 The MeTIL Score: Predicting TIL Amount with DNA Methylation thanks to Machine Learning 131
6.1 Data and Cohort Description . . . 133
6.2 Derivation of the MeTIL Signature . . . 135
6.2.1 Initial Feature Selection . . . 135
6.2.2 Generation of a Signature Population . . . 137
6.2.3 Final Signature Selection . . . 142
6.3 Computation of the MeTIL Score from the Signature . . . 148
6.4 Evaluation of the MeTIL Score Performance . . . 150
6.4.1 Evaluation of TIL Distributions Using the MeTIL Score . . . 150
6.4.2 Prediction of Survival and Response to Chemotherapy with the MeTIL Score . . . 159
6.4.3 Evaluation of TILs through Bisulphite Pyrosequencing of MeTIL Markers . . . 162
6.4.4 Prediction of Survival Outcome in Other Cancer Types with the MeTIL Score . . . 164
6.5 Discussion . . . 165
6.5.1 Our Original Machine Learning Approach Extracts the Representative Signature from a Signatures Population . . . . 165
6.5.2 Our MeTIL Signature Specifically Reflects TILs . . . 169
6.5.3 Our MeTIL Score Predicts Outcome and Response to Chemotherapy . . . 171
6.5.4 Our MeTIL Score May Be Transferred to the Clinic Using Pyrosequencing . . . 172
6.5.5 Our MeTIL Score is Prognostic in Other Cancers . . . 172
7 Conclusions & Perspectives 175
7.1 Summary of the Contributions of this Thesis . . . 175
7.1.1 Infinium HumanMethylation Preprocessing . . . 175
7.1.2 Epigenetic-based Annotation . . . 176
7.1.3 Development of a Score Reflecting TILs . . . 177
7.2 Future Works . . . 178
7.2.1 Improvement of Infinium Processing . . . 178
7.2.2 Exploration of the Signature Population . . . 179
7.2.3 Extension of the MeTIL Signature . . . 180
7.2.4 Epigenetics in Breast Cancers . . . 180
A Background: Supplementary Information 181
A.1 Epigenetic Modifications . . . 181
A.1.1 Histone Modifications . . . 181
A.1.2 Noncoding RNAs . . . 183
A.1.3 RNA Modifications . . . 188
A.2 DNA Methylation Assessment . . . 194
A.2.1 Methylation-sensitive Restriction Enzymes . . . 194
A.2.2 Affinity Enrichment . . . 194
A.2.3 Massively Parallel Sequencing . . . 195
A.2.3.1 Restriction-based Sequencing (Methyl-seq) . . . 195
A.2.3.2 Affinity-based Sequencing (MeDIP & MethylCap) . . 195
A.2.4 Microarrays . . . 195
A.2.4.1 Restriction-based Microarrays (MethylScope and CHARM) . . . 195
A.3 Signature Extraction from Microarrays . . . 197
A.3.1 Cox Regression . . . 197
A.3.2 Feature Extraction . . . 197
A.3.3 Mutual Information . . . 198
A.3.4 Logistic Regression . . . 198
B Infinium Evaluation: Supplementary Material 199
B.1 Normalisation . . . 199
B.2 Biological Features Covered by Infinium Arrays . . . 205
C MeTIL Score: Supplementary Material 208
C.1 Extracting Signatures . . . 208
C.2 Patient Cohorts . . . 208
D Publications 224
Bibliography 315