• Aucun résultat trouvé

Statistical methodology for the analysis of dye-switch microarray experiments

N/A
N/A
Protected

Academic year: 2021

Partager "Statistical methodology for the analysis of dye-switch microarray experiments"

Copied!
18
0
0

Texte intégral

(1)

HAL Id: hal-01197571

https://hal.archives-ouvertes.fr/hal-01197571

Submitted on 6 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Statistical methodology for the analysis of dye-switch

microarray experiments

Tristan Mary-Huard, Julie Aubert, Nadera Mansouri, Olivier Sandra,

Jean-Jacques Daudin

To cite this version:

(2)

Statistical methodology for the analysis of dye-switch

microarray experiments

Tristan Mary-Huard, Julie Aubert, Nadera Mansouri-Attia, Olivier Sandra, Jean-Jacques Daudin

(3)

Background

We consider a two-color microarray experiment where two treatments are compared, and where the number of slides is fixed.

We expect the experimental design to guarantee the balance between the two fluorochromes,

a good compromise between technical and biological replicates.

(4)

Globally balanced design : dye-switch

array 1 2 3 4 5 6 7 8 9 10

Cy5 A1 B2 A3 B4 A5 B6 A7 B8 A9 B10

Cy3 B1 A2 B3 A4 B5 A6 B7 A8 B9 A10

Ai : ithbiological sample in condition A The 10 arrays are independent

10 biological replicates per condition, no technical replicate.

Thedye-switchdesign is often considered as optimal :

?maximum number of biological replicates,

?variance estimated with 9 df,

(5)

Individually-balanced design : dye-swap

array 1 2 3 4 5 6 7 8 9 10

Cy5 A1 B1 A2 B2 A3 B3 A4 B4 A5 B5

Cy3 B1 A1 B2 A2 B3 A3 B4 A4 B5 A5

5 pairs of biological samples

Each pair is used two times (with reverse dyes) Each dye-swap is independent of the other ones

Thedye-swapdesign is commonly used :

?used when the technical error is high,

?the statistical analysis (after averaging by swap) is straightforward,

(6)

Individually-balanced : “hybrid” dye-switch

array 1 2 3 4 5 6 7 8 9 10

Cy5 A1 B1 A2 B2 A3 B3 A4 B4 A5 B5

Cy3 B1 A2 B2 A3 B3 A4 B4 A5 B5 A1

5 biological replicates per condition

Each individual is used two times (one with Cy3, one with Cy5). Array k is correlated with k−1 and k+1

Thehybriddesign could be an alternative :

?variance estimated with more than 4 df.

?the statistical analysis is not straightforward,

(7)

Statistical model for a gene

After log-transformation :

Xi = µA + δl(i) + Bj(i) + Mi + Ti Yi = µB + δl0(i) + Bj0(i) + Mi + Ti0

Bj(i)N

(

0,σ2B

)

, j

(

i

)

: sample number corresponding to condition A and array i.

Mi∼N

(

0,σ2M

)

effect of the spot associated to the gene under concern in microarray i

(8)

Model on the difference of expressions

For slide i=1, ...,2n : Di = Xi−Yi

= µ +

(

−1

)

i+1δ +BD i+TDi

µ = µA− µB : true differential expression between A and B, BDi=Bj(i)−Bj0(i)∼N

(

0,2σ2B

)

,

TDi=Ti−Ti0∼N

(

0,2σ2T

)

,

δ = δ1− δ2: gene-specific dye bias. E

(

Di

)

= µ +

(

−1

)

i+1δ V

(

Di

)

= 2σ2B+2σ

2 T

(9)

Unbiased estimate of V

(

D

)

Unbiased estimator ofµ : D=2n1 P

Di

VD= 2n1

(

4σ2B+2σ2T

)

Naive variance estimate : S2= 1 2n−2

P

i

(

Di−D(i)

)

2

E

(

S2

)

=2σ2B+2σ2T<4σ2B+2σ2T

Unbiased variance estimate C=2n1

(10)

Test statistics (for each gene)

Mixed Model with ML estimation (ML) Mixed Model with REML estimation (REML) Naive Method (NM)

TN=

p

2npD S2

Unbiased Paired Method (UP)

TUP=

p

(11)

Level (in %) of the test procedures (simulation of 10000

genes, nominal level= 5%)

(12)
(13)

Compared CPU time

n UP CPU REML CPU no REML convergence

5 2.3 787 56.9

10 2.6 212 5

20 2.8 467 0

30 3.2 1046 0.16

(14)

Embriogenomics experiment

In Mammals, the implantation of the embryo is a key event in the establishment of a pregnancy.

Microarray experiment : 13300 genes in the endometrium of cows. Only 5 animals were available for each conditionhybrid dye-switch design.

Condition A : cyclic (day 20 of cycle)

(15)
(16)

Plots of variance estimates for 93 genes

x-axis :ML x-axis :ML x-axis :UP

y-axis :UP y-axis :REML y-axis :REML

(17)

Conclusions and perspectives

Practical guidelines:

?Correlations must be taken into account !

?UP should be preferred to REML (more robust, computationally efficient).

?ANOVA estimators for variances should be considered.

Extension to more complex designs:

?Loop designs

?Interwoven loop designs

(18)

Bibliography

T. Mary-Huard, J. Aubert , N. Mansouri-Attia, O. Sandra & J.J. Daudin (2008), Statistical methodology for the analysis of dye-switch microarray experiments, BMC Bioinformatics,9, 98.

E. Wit, A. Nobile, & R. Khanin (2005), Near-optimal designs for dual channel microarray studies, JRSSC,54(5), 817–830

Références

Documents relatifs

The world of learning games is currently reaping the benefits, not only of conceptual advances in the way designers and developers are linking learning and game play, but also

A critical aspect in the design of microarray studies is the determination of the sample size necessary to declare genes differentially expressed across different experi-

Semi-Lagrangian guiding center simulations are performed on sinu- soidal perturbations of cartesian grids, and on deformed polar grid with different boundary conditions..

Les défis liés à la recherche sur la formation auprès des étudiants francophones minoritaires Les contextes plus larges, dont les études sur les immigrations

L’économie Algérienne a donc franchi un pas très important, qui devait passer par l’instauration d’une économie de marché concurrentielle frappant à la porte de ce

If there is indeed a correlation between the relative gene expression levels and the class, more genes should appear to be correlated in the experimental condition than in the

To this end, we study a quantum switch that serves entangled states to pairs of users in a star topology, with the objective of determining the capacity of the switch, as well as

We would like to obtain closed-form expressions for the expected number of stored qubits at the switch as well as the capacity of the switch, expressed in the maximum possible number