• Aucun résultat trouvé

Supplementary Information for deltaRpkm: an R package for a rapid detection of differential gene presence between related genomes

N/A
N/A
Protected

Academic year: 2021

Partager "Supplementary Information for deltaRpkm: an R package for a rapid detection of differential gene presence between related genomes"

Copied!
7
0
0

Texte intégral

(1)

Supplementary Information for

deltaRpkm: an R package for a rapid detection of

differential gene presence between related genomes

Hatice Akarsu

1,4#

, Lisandra Aguilar-Bultet

2,3,4#

, Laurent Falquet

1,4*

1

Department of Biology, University of Fribourg, Switzerland, 2

Institute of Veterinary Bacteriology, Vetsuisse Faculty, University of Bern, Bern, Switzerland, 3

Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern Switzerland, 4

Swiss Institute of Bioinformatics, BUGFri group, Fribourg, Switzerland.

*To whom correspondence should be addressed.

#

(2)

Fig. S1. RPKM values distribution of all genes in the dataset. This can be used to fine

(3)

Fig. S2. Dataset size effect on δRPKM values distribution. A. Boxplots for datasets from N=7 to N=225 samples. The dataset size does not influence the median δRPKM

values that are used when computing the differentially present gene selection based on the 2*standard deviation of median δRPKM values. Two datasets are highlighted for illustration, N=51 samples and N=225 samples. B. Dataset size effect on threshold value (2*standard

(4)

Fig. S3. The selected differentially present gene set is robust. Downsampling shows that

even with small size dataset, the identified genes overlap (N=115) highly with the datasets of greater size.

(5)

Fig. S4. deltaRpkm performance: dataset size effect on runtime. The whole analysis

pipeline with deltaRpkm can be run in less than 20min in R for a dataset with N=225 samples of Listeria monocytogenes (~3Mb, ~3K genes). Ubuntu 14.04, R 3.4.4, Intel Core i-4790 CPU @3.60Gzx8.

(6)

Fig. S5. deltaRpkm performance: dataset size effect on memory usage. The whole

analysis pipeline with deltaRpkm uses less than 4G of memory in R for a dataset with N=225 samples of Listeria monocytogenes (~3Mb, ~3K genes). Ubuntu 14.04, R 3.4.4, Intel Core i-4790 CPU @3.60Gzx8.

(7)

Fig. S6. deltaRpkm performance: real (A) versus randomized datasets (B). The genes

differential presence gives shorter and non-robust list of genes when using randomized datasets of different sizes. Corrected RPKM.

Figure

Fig.  S1.  RPKM  values distribution  of  all  genes  in the  dataset. This  can  be  used to fine  tune the heatmap colors breaks parameters
Fig.  S2.  Dataset  size  effect  on  δRPKM  values  distribution.  A.  Boxplots  for  datasets  from  N=7  to  N=225  samples
Fig. S3. The selected differentially present gene set is robust. Downsampling shows that  even with small size dataset, the identified genes overlap (N=115) highly with the datasets of  greater size
Fig.  S4.  deltaRpkm  performance:  dataset  size  effect  on  runtime.  The  whole  analysis  pipeline with deltaRpkm can be run in less than 20min in R for a dataset with N=225 samples  of Listeria monocytogenes (~3Mb, ~3K genes)
+3

Références

Documents relatifs

The same conclusions than in Section 4.2 can be driven: some metabolites were extracted by both analyses (creatinine, betaine, hippuric acid, guanidinoacetic acid, alanine,

Compared to baseline, the NIV mode corrected premature cycling in four ventilators (Extend, Newport e500, Raphael, Vela), and worsened it in four others (Elysée, Raphael, Servo i,

[ 8 , 9 ] to: (1) assess the efficiency of L. monocytogenes EGDe PrfA* invasion into HeLa cells treated with dimethyl sulf- oxide (DMSO) or SW172006 at 1 hour post infection (hpi);

pathogenesis-related protein 1; GO: Gene ontology; HIF1A: Hypoxia-inducible factor 1-alpha; IL8: Interleukin 8; IL24: Interleukin 24; JUN: Jun proto-oncogene; JUNB: Transcription

The ZIP employ two different process : a binary distribution that generate structural

The Sim.DiffProc package provides a simulation of diffusion processes and the differences methods of simulation of solutions for stochastic differential equations (SDEs) of the Itˆ

L’originalit´ e du travail pr´ esent´ e r´ eside avant tout dans la proposition d’une m´ etho- dologie probabiliste capable de prendre en compte les incertitudes dans la probl´

Solving the Kalman filter system associated with the centralized version of our alignment problem, we obtain an estimate of the relative distance of each agent i from the center of