HAL Id: pasteur-02088893
https://hal-pasteur.archives-ouvertes.fr/pasteur-02088893
Submitted on 3 Apr 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
A Blood RNA Signature Detecting Severe Disease in Young Dengue Patients at Hospital Arrival
Iryna Nikolayeva, Pierre Bost, Isabelle Casademont, Veasna Duong, Fanny Koeth, Matthieu Prot, Urszula Czerwinska, Sowath Ly, Kevin Bleakley,
Tineke Cantaert, et al.
To cite this version:
Iryna Nikolayeva, Pierre Bost, Isabelle Casademont, Veasna Duong, Fanny Koeth, et al.. A Blood RNA Signature Detecting Severe Disease in Young Dengue Patients at Hospital Arrival. Journal of Infec- tious Diseases, Oxford University Press (OUP), 2018, 217 (11), pp.1690-1698. �10.1093/infdis/jiy086�.
�pasteur-02088893�
The Journal of Infectious Diseases
A Blood RNA Signature Detecting Severe Disease in Young Dengue Patients at Hospital Arrival
Iryna Nikolayeva,
1Pierre Bost,
1,2Isabelle Casademont,
3,4Veasna Duong,
5Fanny Koeth,
3, 4Matthieu Prot,
3,4Urszula Czerwinska,
6Sowath Ly,
5Kevin Bleakley,
7,8Tineke Cantaert,
9Philippe Dussart,
5Philippe Buchy,
5,10Etienne Simon-Lorière,
3,4Anavaj Sakuntabhai,
3,4and Benno Schwikowski
11
Systems Biology Lab, Center for Bioinformatics, Biostatistics, and Integrative Biology (C3BI), USR 3756 – Institut Pasteur and CNRS,
2Graduate School of Life Sciences ED515, Sorbonne Universités UPMC Paris VI,
3Unité de Génétique fonctionnelle des maladies infectieuses, Institut Pasteur, and
4CNRS UMR2000: Génomique évolutive, modélisation et santé, Institut Pasteur, Paris, France;
5Virology Unit, Institut Pasteur du Cambodge, Institut Pasteur International Network, Phnom Penh, Cambodia;
6Institut Curie, PSL Research University, Mines ParisTech, INSERM, U 900,
7INRIA Saclay, Palaiseau, and
8Département de Mathématiques d’Orsay, Orsay, France;
9Immunology Group, Institut Pasteur du Cambodge, Institut Pasteur International Network, Phnom Penh, Cambodia; and
10GSK vaccines R&D, Singapore
Background. Early detection of severe dengue can improve patient care and survival. To date, no reliable single-gene biomarker exists. We hypothesized that robust multigene signatures exist.
Methods. We performed a prospective study on Cambodian dengue patients aged 4 to 22 years. Peripheral blood mononuclear cells (PBMCs) were obtained at hospital admission. We analyzed 42 transcriptomic profiles of patients with secondary dengue infected with dengue serotype 1. Our novel signature discovery approach controls the number of included genes and captures non- linear relationships between transcript concentrations and severity. We evaluated the signature on secondary cases infected with different serotypes using 2 datasets: 22 PBMC samples from additional patients in our cohort and 32 whole blood samples from an independent cohort.
Results. We identified an 18-gene signature for detecting severe dengue in patients with secondary infection upon hospital admission with a sensitivity of 0.93 (95% confidence interval [CI], .82–.98), specificity of 0.67 (95% CI, .53–.80), and area under the receiver operating characteristic curve (AUC) of 0.86 (95% CI, .75–.97). At validation, the signature had empirical AUCs of 0.85 (95% CI, .69–1.00) and 0.83 (95% CI, .68–.98) for the PBMCs and whole blood datasets, respectively.
Conclusions. The signature could detect severe dengue in secondary-infected patients upon hospital admission. Its genes offer new insights into the pathogenesis of severe dengue.
Keywords. dengue; severity; blood; RNA; transcriptome; biomarker; prognosis; risk; signature.
Dengue is the most widespread mosquito-borne viral infection worldwide. Currently, 40%–50% of the world population lives in areas at risk for dengue virus (DENV) transmission [1]. If the majority of dengue cases are uncomplicated, it is estimated that each year 500 000 cases, mostly children, progress to severe den- gue and require hospitalization. According to the World Health Organization (WHO), approximately 2.5% of those affected by severe dengue requiring hospitalization are still dying from complications [1]. The recent explosive spread of the related Zika virus might further increase this burden. Indeed, the com- plications associated with severe dengue are more common after secondary infection than after primary infection [2], and recent studies both in vitro and in vivo have highlighted the potential of anti-Zika immunity to trigger dengue enhancement [3]. As
recently highlighted by the WHO, robust and early detection of severe dengue, along with access to proper medical care, would not only decrease the fatality rate to 1% but also reduce health- care costs and economic burden of the disease [1].
Although diagnosis methods for dengue infection are well established, there are no prognostic tests to help the clinician evaluate the risk of infection progressing to severe dengue.
A number of signatures that use clinical variables for detecting severe cases of dengue infection have been proposed for adults and/or children [4–9]. Nevertheless, none of the signatures we found in the literature have been replicated on indepen- dent datasets. In addition to these studies, others have aimed to identify molecular biomarkers, based on either messenger RNA expression or on protein or cytokine levels. A number of genome-wide expression profiling studies have also been performed in Nicaragua, Cambodia, Thailand, and Vietnam [10–14]. Every study uncovered differentially expressed genes associated with severe dengue. Many of these genes have func- tions associated with innate immunity, vascular permeabil- ity, coagulation, neutrophil-derived antimicrobial resistance, inflammation, and lipid metabolism. However, their capacity to detect severe cases among dengue patients was not evaluated [6–8, 10–13], or the studies excluded children [9].
M A J O R A R T I C L E
© The Author(s) 2018. Published by Oxford University Press for the Infectious Diseases Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/
by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com DOI: 10.1093/infdis/jiy086
Received 13 October 2017; editorial decision 8 February 2018; accepted 13 February 2018.
Correspondence: Benno Schwikowski, PhD, Systems Biology Lab, C3BI Institut Pasteur 25–28 Rue du Dr. Roux, 75015 Paris, France (benno.schwikowski@pasteur.fr).
XX XXXX
OA-CC-BY-NC-ND
The Journal of Infectious Diseases
®2018;XX00:1–9
We hypothesized that a simple combination of a small number of gene expression markers may be robust enough to establish reproducible detection of severe cases among newly admitted dengue patients. With this in mind, we attempted to develop a signature discovery algorithm—one that allowed for not only linear but also more general monotonic relationships among features, meaning more complex, but still easily inter- pretable, relationships among genes.
MATERIALS AND METHODS Cohorts Studied
We conducted a prospective study in the Kampong Cham refer- ral hospital in Cambodia during a 3-year period (2011–2013).
Patients with suspected dengue infection were identified on the date of admission to participate in the study and were enrolled after obtaining agreement and written informed consent from the patient or from parents or guardians.
On the same day, the patient or his/her parents were inter- viewed and stated the day of fever onset. Patients were processed as follows: plasma was used for dengue confirmatory diagnos- tics, including serology and molecular diagnostics, as described elsewhere [15], and blood clots and peripheral blood mononu- clear cells (PBMCs) were kept for later analyses (see description below). Briefly, plasma was tested for dengue infection by NS1 rapid diagnostic test at the hospital and sent, on the same day, to Institut Pasteur Cambodia for DENV reverse transcription polymerase chain reaction (RT-PCR) confirmation and DENV serotyping. The RT-PCR result was sent back to the hospi- tal. A second blood sample was taken at the day of discharge from the hospital and sent to Institut Pasteur Cambodia. Both
samples were tested for immunoglobulin M by M antibody-cap- tured enzyme-linked immunosorbent assay and total antibod- ies by hemagglutination inhibition assay. Primary or secondary immune status of DENV infections was determined by hemag- glutination inhibition assay according to WHO criteria [16] and was confirmed by comparing the hemagglutination inhibition assay titer between the 2 samples. We subsequently used only the transcriptome of the first blood sample collected at hospital admission for multigene signature discovery.
Although 438 patients were initially enrolled, only a fraction donated an amount of blood compatible with the biochemis- try and serological tests, as well as the isolation of PBMCs for transcriptomic analysis. Out of these, only 1 case was a primary infection, and few were non–DENV-1-infected. To maximize the likelihood for obtaining a meaningful RNA signature, we further limited our training set to samples from secondary, DENV-1–infected patients. This training dataset consisted of 42 patients (15 with severe dengue, 27 with nonsevere den- gue). From the remaining patients, 22 samples (7 with severe dengue, 15 with nonsevere dengue, from secondary patients with different serotypes) contained sufficient amounts of RNA for quantitative real-time polymerase chain reaction (qRT- PCR). These 22 samples were used as a validation dataset. The mean ages of these 2 datasets were 9.0 ± 3.8 years (7.7 ± 2.4 y for severe patients; 9.8 ± 4.2 y for nonsevere patients) and 8.8 ± 3.1 years (7.0 ± 1.5 y for severe patients; 9.7 ± 4.3 for nonsevere patients) (Table 1 and Supplementary Material 1).
For this PBMC cohort, disease severity was classified accord- ing to the 2009 WHO criteria [16] using clinical and biolog- ical data recorded at admission and throughout the entire
Table 1. Two Cohorts and Training and Validation Subcohorts
Case characteristics
Clinical cohort Devignot et al[11] cohort
Entire clinical cohort
PBMC microarray subcohort
PBMC qRT-PCR
subcohort Whole blood microarray subcohort
Training set Validation set Validation set
Suspected dengue cases 438 42 22 32
Age, y 8.8 ± 3.3 9.0 ± 3.8 8.8 ± 3.1 7.8 ± 2.5
Sex, % female 51 52 60 66
Mean day of blood sampling after onset of fever
3.3 ± 1.5 3.1 ± 1.9 3.6 ± 1.3 5.1 ± 1.0
Confirmed dengue 316 42 22 32
DENV-1 265 42 10 4
DENV-2 15 0 6 3
DENV-3 0 0 0 16
DENV-4 28 0 6 1
Unknown 8 0 0 8
Secondary infection 183 42 22 32
Classification according to WHO 2009 criteria Nonsevere dengue, with or
without warning signs 236 27 15 14
Severe dengue 80 15 7 18
Abbreviations: DENV, dengue virus; PBMC, peripheral blood mononuclear cells; qRT-PCR, quantitative real-time polymerase chain reaction; WHO, World Health Organization.
hospitalization period. Additionally, an independent publicly available dataset was used for a second validation [11]. In that dataset, gene expression had been quantified globally using microarrays from Cambodian blood samples. Disease sever- ity was classified according to the description in the section
“Signature Discovery” below. Table 1 provides additional sum- mary information across all datasets we used.
Ethics Statement
The study was approved by the Cambodian National Ethics Committee for Health Research (approval no. 087NECHR /2011 and no. 063NECHR/2012). Before enrollment, written consent signed by the participant or by a legal representative for participants aged <16 years was obtained.
RNA Preparation, Microarray Hybridization, and Quantitative Real-Time Polymerase Chain Reaction Validation
RNA was extracted from PBMCs stored in RNA protect cell reagent (Qiagen, Hilden, Germany) with a miRNeasy kit (Qiagen), and RNA quality was checked on a BioAnalyzer 2100 (Agilent, Santa Clara, CA).
For microarray analysis of the training cohort, gene expression in PBMCs was analyzed using Affymetrix Human Transcriptome Array 2 (HTA2) GeneChips. The HTA2 chips were prepared, hybridized, and scanned according to the manufacturer’s instruc- tions. For qRT-PCR of the PBMC validation cohort, 200 ng of RNA were reverse-transcribed with SuperScript VILO cDNA synthesis kit (Invitrogen, Life Technologies, Carlsbad, CA) using a combination of random hexamer and Oligo(dT)12–18 primers.
TaqMan Gene Expression Assays (Life Technologies) were used for each candidate gene according to the manufacturer’s instruc- tions. Relative expression was calculated with the 2
-ΔΔCtmethod, using beta-glucuronidase as an endogenous control for normal- ization and a calibrator sample as a comparator for every sample.
Machine Learning Methodology
Our signature was created using a machine learning approach based on monotonic regression on a training cohort (Supplementary Material 3–6). Briefly, new predictions made by the signature are based on 0/1 (nonsevere/severe) predictions (“votes”) derived from pairs of transcripts in the signature. Measured concentrations for any given transcript pair are turned into a binary vote using a 2-dimensional monotonic function [17], a generalization of a linear function that monotonically increases or decreases with the concen- tration of each transcript. The final consensus prediction is “severe”
if the mean of all votes is above a suitably chosen threshold t, and
“nonsevere” otherwise. The performance of individual transcript pairs on future patients is estimated using cross-validation.
Initially, a set of transcript pairs with optimal performance estimate is determined. Using a permutation test, those genes that do not confer a statistical performance advantage over the performance of their partner alone are eliminated. The result- ing model represents a unique combination of lower- and
higher-complexity features tailored toward the discovery of complex disease signatures. The monotonic model represents a generalization of linear models. Nevertheless, the result- ing predictors can still be visually and intuitively understood.
Controlling the number of transcripts in the signature allows different trade-offs between performance, robustness, and assay cost (Supplementary Material 9).
The signature is rescaled to different datasets by mapping transcripts to genes and quantile-normalizing the expression values (Supplementary Material 6). In cases where only 1 of 2 transcript measurements is available, the missing variable is replaced by a duplicate of the variable that is available.
Signature Discovery
We applied the above machine learning approach to microarray transcriptomes of the PBMC training cohort, leading to an initial assessment of its performance via rigorous cross-validation. After applying quantile normalization (Supplementary Material 6), we evaluated this signature on 2 other datasets (Figure 1).
The first validation dataset consisted of 22 patients (7 with severe dengue, 15 with nonsevere dengue) from the PBMC validation cohort whose gene expression was measured using qRT-PCR.
The second validation dataset was an independent, publicly available, Cambodian whole blood dataset selected for its large size and high quality [11]. It consisted of whole blood transcriptome data from 48 dengue-infected patients. At the time of that study, phenotype was still established according to the 1997 WHO clas- sification: DSS (dengue shock syndrome), DHF (dengue hemor- rhagic fever), and DF (dengue fever) [18]. To make phenotype data comparable, we reclassified the disease severity as well as possible in terms of the 2009 WHO classification. We considered all 18 DSS patients as severe dengue and all 14 DF patients as nonsevere, considering that DF patients that are reclassified as severe dengue in the 2009 WHO classification are rare. The DHF patients could not be classified without additional clinical information that was unavailable to us and were thus excluded.
Performance Evaluation
We summarized the empirical performance of our signature using receiver operating characteristic curves, which consist of the different combinations of empirical true- and false-positive rates that are obtained by varying the above threshold t between 0 and 1. For evaluations of the performance of state-of-the-art machine learning methods, we used the implementations from the Python sklearn package [19].
RESULTS
Our automated machine learning approach for signature dis-
covery resulted in an 18-gene signature that allows the detec-
tion of severe dengue from a blood sample taken from dengue
patients upon hospital arrival. We obtained area under the
receiver operating characteristic curve (AUC) values of 0.86,
A 1.0
B
0.8
True Positive Rate or (Sensitivity)
False Positive Rate or (1-Sensitivity)
Ensemble Pair Monotonic Regression (AUC = 0.86 CI = [0.75 ; 0.97]) Random Forests (AUC = 0.64 CI = [0.47 ; 0.81])
K Nearest Neighbors (AUC = 0.76 CI = [0.61 ; 0.90]) Decision Trees (AUC = 0.77 CI = [0.63 ; 0.91]) SVM (AUC = 0.78 CI = [0.64 ; 0.92])
Linear Discriminant Analysis (AUC = 0.81 CI = [0.68 ; 0.94]) Logistic Regression (AUC = 0.81 CI = [0.69 ; 0.94]) Logistic LASSO (AUC = 0.79 CI = [0.66 ; 0.93])
Whole blood validation dataset (AUC = 0.83 CI = [0.68 ; 0.98]) PBMC validation dataset (AUC = 0.85 CI = [0.69 ; 1.00]) 0.6
0.4
0.2
0.0
0.0 0.2 0.4 0.6 0.8 1.0
1.0
0.8
True Positive Rate or (Sensitivity)
False Positive Rate or (1-Sensitivity) 0.6
0.4
0.2
0.0
0.0 0.2 0.4 0.6 0.8 1.0
Figure 2. Evaluation of classification performance. A, Cross-validated training data, various methods. B, Validation data, our signature.
Cross-validation
Blind validation
Rescale to qRT-PCR Rescale to microarray
PBMC cohort Additional patients qRT-PCR
Whole blood cohort Independent study Micro-array
SD cases Non-SD controls
DSS cases DF controls SD cases
Non-SD controls
Discovery PBMC cohort Micro-array
18-gene Biomarker
Figure 1. Discovery and validation of the severe dengue signature.
0.83, and 0.85 for the training set and 2 validation datasets with widely different characteristics, respectively (Figure 2).
Examples of empirical sensitivity-specificity combinations are shown in Table 2. The difference in the threshold required for PBMCs and whole blood may be attributable to the differ- ence in composition between these 2 sample types. Twelve of the 18 genes in the signature are immune-related (Table 3).
Certain genes have already been associated with severe dengue.
To determine whether the inclusion of a larger number of genes or the restriction to a linear state-of-the-art variable selec- tion model would have increased classification accuracy, we estimated the performance of several well-known classification methods (Figure 2A). Even though all pairs of confidence inter- vals overlap, the AUC estimate of our method was among the highest. This was also the case for the public whole blood dataset (Supplementary Material 10). For the PBMC qRT-PCR dataset, we only measured transcripts from our signature and therefore could not compare results with the performance of other meth- ods on this dataset, except logistic regression with a lasso penalty (“logistic lasso”). Contrary to our method, logistic lasso gener- ated a classifier whose performance was not better than random on the PBMC qRT-PCR dataset (Supplementary Material 11).
Figure 3 provides a visualization of the models associated with the transcript pairs of the signature. The panels show the points corresponding to transcripts from the PBMC training cohort. Different monotonic functions capture different types of gene–gene interactions. For example, for the second pair of transcripts (JUNB and ARG1), patients have a severe pheno- type when JUNB expression is high or ARG1 expression is low.
For the OX40L/CD40LG pair, OX40L and CD40LG both are underexpressed in the severe patients. For the EGR3/MGAM pair, the lower the EGR3 expression and the higher the MGAM expression, the more likely the patient is to be predicted severe.
DISCUSSION
We have identified and independently validated an RNA sig- nature for the detection of severe cases among young, second- ary-dengue patients from blood samples taken upon arrival at the hospital. Previous attempts have addressed the complexity
of dengue infection by measuring multiple molecules. Nhi et al identified 19 plasma proteins exhibiting significantly different relative concentrations (P < .05) on 16 patients (6 with severe dengue, 10 with nonsevere dengue) [20]. Among them, a com- bination of antithrombin III and angiotensin had strong power to detect the 6 severe dengue patients (AUC = 0.87). Pang et al developed a signature combining transcript, protein, and clini- cal markers, mostly linked to innate immunity and coagulation, that was able to detect patients with warning signs and need- ing to be hospitalized with sensitivity of 96% and specificity of 54.6% on a validation cohort that was chosen to match the learning cohort [9]. However, none of the signatures have been replicated on an independent, publicly available cohort.
Our objective was to identify an RNA signature to detect severe dengue cases from blood samples taken upon patient admission to hospital. We used data from a prospective study in Cambodia, in which young patients with suspected dengue gave blood samples upon hospital admission. Severe dengue cases were identified according to the WHO 2009 criteria using data at admission and during hospital stay. Because in our cohort only 1 severe patient appeared to have primary dengue and most had DENV-1 serotype, we limited our analysis to the 42 high-quality samples from secondary DENV-1–infected cases.
We then globally quantified gene expression profiles of PBMCs using transcriptome microarrays.
Our novel signature discovery approach models linear and nonlinear monotonic interactions between transcript levels with controlled complexity and preserves interpretability and applicability for small input datasets. To our knowledge, the resulting RNA signature comprising 18 genes represents the first signature to detect severe cases in dengue patients with a demonstrated high performance across several validation datasets.
The consistency of the empirical performance estimates of our RNA signature may indicate a high degree of robustness across a number of sample characteristics. First, blood sam- ple type: training and first validation datasets were generated from PBMCs, but the second validation dataset originates from whole blood. Second, DENV serotype: the signature was derived from DENV-1 samples, but the validation datasets were a mixture of different serotypes. Third, measurement technol- ogies: both array-based and qRT-PCR technologies were used to measure mRNA concentrations. Further heterogeneity arose from the fact that, even though dengue is considered a pedi- atric disease in Cambodia, the age of patients varied. In addi- tion, our data are from a single time sample from the day of hospital admission (ie, at variable times after fever onset). The above sources of heterogeneity point to possibilities to obtain refined, and potentially higher-performing, signatures with tighter performance estimates in studies with larger and more homogeneous cohorts, in particular for different age groups.
Finally, one important source of heterogeneity that we did not Table 2. Empirically Observed Combinations of Sensitivity and Specificity
Across Datasets
Sensitivity (95% CI)
Specificity
(95% CI) Threshold
aPBMC microarray training
dataset 0.93 (.82–.98) 0.67 (.53–.80) 0.40
PBMC qRT-PCR validation dataset
0.86 (.67–.96) 0.67 (.48–.84) 0.40 Whole blood microarray
validation dataset
0.83 (.67–.93) 0.79 (.63–.90) 0.91
Abbreviations: CI, confidence interval; PBMC, peripheral blood mononuclear cells; qRT- PCR, quantitative real-time polymerase chain reaction.
a