Data analysis and stochastic modeling
Lecture 8 – Hypothesis testing
Guillaume Gravier
Data analysis and stochastic modeling – Hypothesis testing – p.
Where are we?
1. A gentle introduction to probability 2. Data analysis
3. Cluster analysis
4. Estimation theory and practice 5. Mixture models
6. Random processes 7. Queing systems 8. Hypothesis testing
◦ what for?
◦ the mechanics of statistical tests
◦ some common tests
Data analysis and stochastic modeling – Hypothesis testing – p.
What for?
Hypothesis testing = make some decision on whether something is true or not based on experimental evidences, yet knowing the risk we are taking with that decision.
Typical hypotheses we want to test are:
◦ the mean time to failure of a system exceeds a threshold
θ
0(or not)◦ the job arrival rate in a queing system is equal to
λ
0(or not)◦ the distribution of some samples is Gaussian (or not)
◦ two estimated mean values correspond to the same mean (or not)
◦ an observed arrival process is Poisonian (or not)
◦ etc.
Data analysis and stochastic modeling – Hypothesis testing – p.
The rainmakers example
◦ rain levels (in mm) are assumed to be Gaussian
N (600, 100)
◦ people claim they can increase rain levels by 50 mm and, using their method over 9 years, we observed the following rain levels
year 1 2 3 4 5 6 7 8 9
mm 510 614 780 512 501 534 603 788 650
◦ Is this a scam or not?
Two hypotheses are confronted
H
0m = 600
mmH
1m = 650
mmSince the method to increase rain levels is expensive, we want to use it only if we are pretty sure that it works, i.e. if we have only a 5 % chance of wrongly accepting
H
1from the evidences (riskα = 0.05
).Data analysis and stochastic modeling – Hypothesis testing – p.
The rainmakers example
(cont’d)We study the empirical mean
X
over the nine year period◦ if
H
0is true,X N (600, 100/ √
9)
[see lecture 4]◦ decision rule
⊲ if
X > k
, acceptH
1(or rejectH
0)⊲ if
X < k
, rejectH
1(or acceptH
0)⊲
k
is determined so that the probability of wrongly acceptingH
1is 0.05k = 600 + 100
√ 9 1.64 = 655
Conclusion: those rainmakers are ripping you off!
Data analysis and stochastic modeling – Hypothesis testing – p.
The rainmakers example
(cont’d)What if rainmakers were very unlucky over those nine years?
◦ if they were right,
X N (650, 100/ √ 9)
◦ an error is made each time
X < 655
◦ the probability of wrongly accepting
H
0(or wrongly rejectingH
1) is given by the riskβ
β = P
"
655 − 650 100/ √
9
#
= 0.56
◦
H
1defines the shape of the critical decision region (650 > 600
)but the threshold
k
only depends onH
0andα
◦ additional knowledge on
H
1is used to compute the riskβ
Data analysis and stochastic modeling – Hypothesis testing – p.
Concepts and vocabulary
◦ a statistical hypothesis is an assertion which can be valid or not
◦ a statistical test is a procedure to make a decision
◦ the null hypothesis
H
0is a claim that we are interested in accepting or rejecting◦ the alternative hypothesis
H
1is the contradiction ofH
0◦ the critical or rejection region is the region where we reject
H
0(abovek
in the example) as opposed to the acceptance region◦ wrongly rejecting
H
0is known as the type 1 error and the probabilityα
that this happens is the level of significance of the test
◦ wrongly rejecting
H
1is a type 2 error and the quantity1 − β
is thepower of the test
Hypotheses and types of errors
◦
α
= probability of choosingH
1whenH
0is true◦
β
= probability of choosingH
0whenH
1is truetruth\decision H0 H1
H0 1−α α H1 β 1−β
In practice,
α
is given by the decision maker andH
0corresponds to the following◦ a well established hypothesis, not contradicted so far
◦ a safe decision
e.g. when testing a vaccine,H0is the less favorable hypothesis
◦ the only hypothesis that is easy to formulate
e.g.m=m0is easir to formulate thanm6=m0
Methodology
1. define H
0and H
12. determine the variables on which to make the decision 3. determine the shape of the critical region based on H
14. compute the critical region given α
5. eventually compute the power of the test
6. compute the experimental value of the decision variable 7. accept or reject H
0Data analysis and stochastic modeling – Hypothesis testing – p.
Various tests to compare two mean values
H
0H
1 Reject if Test statisticµ = µ
0µ 6 = µ
0| z
0| > z
α/2(
σ
known)µ < µ
0z
0< − z
αz
0=
Xσ/¯−√µn0µ > µ
0z
0> z
αµ = µ
0µ 6 = µ
0| t
0| > t
α/2,n−1(
σ
unknown)µ < µ
0t
0< − t
α,n−1t
0=
Xs/¯−√µn0µ > µ
0t
0> t
α,n−1Data analysis and stochastic modeling – Hypothesis testing – p. 10
Optimal decision for simple tests
Consider
X
with densityf(x; θ)
, and denoteL( x , θ)
the density of the sample
x
.H0 θ =θ0 H1 θ =θ1
Maximize the power of the test!
⇓
P[W|H1] = 1−β=Z
W
L(x;θ1)dx
Jerzy Neyman, 1894 – 1981
Karl Pearson, 1857 – 1936
The optimal critical region is defined by the points such that
L( x ; θ
1)
L( x ; θ
0) > k
α.
Data analysis and stochastic modeling – Hypothesis testing – p. 11
Testing composite hypotheses
H
0θ = θ
0H
1θ = θ
1vs
H
0θ = θ
0H
1θ 6 = θ
1⇒
the riskβ
(and hence the power) depends onθ
H
0m = 600 H
1m > 600
Data analysis and stochastic modeling – Hypothesis testing – p. 12
The likelihood ratio test
H
0θ = θ
0H
1θ 6 = θ
0with
θ ∈ R
p◦ Test statistics
λ = L( x ; θ
0) sup
θ
L(x; θ)
⊲ Note that replacingL(x;θ0)bysup
θ
L(x;θ)is like using the ML estimate ofθ
◦ Critical region =
{ x | λ < K
α}
◦ Asymptotic distribution of the test statistics under
H
0:λ χ
2pData analysis and stochastic modeling – Hypothesis testing – p. 13
Classical test for mean values of N (m, σ)
◦ standard deviation is known
⊲ test statistics =
X N (m,
√σn)
⊲ critical region =
X > k
α⊲ cf. introductory example
◦ standard deviation is known
⊲ test statistics = Student
T
n−1= √
n − 1 X − m S
⊲ critical region =
| T
n−1| > k
α⊲ Example:
⋄
H
0m = 30
vs.H
1m 6 = 30
⋄ 15 samples with
x = 37.2
ands = 6.2
⋄ under
H
0,t = √
14 37.2 − 30
6.2 = 4.35
⋄ critical value at
α = 0.05
forT
14= 1.761⇒
REJECTH
0!Data analysis and stochastic modeling – Hypothesis testing – p. 14
The χ 2 fitting tests
X
is a random variable divided intok
classes of resp. probabilitiesp
1, p
2, . . . , p
kand we observe a sample of the variable with population size in each classN
1= n
1, N
2= n
2, . . . N
k= n
kNote thatE[Ni] =npi
We want to test if the underlying law of the observed process fits the theoretical distribution defined by the
p
i’s (H
0) or not.◦ test statistics is
D
2=
k
X
i=1
(N
i− np
i)
2np
i◦ asymptotic distribution of the test statistics =
χ
2k−1
Notes:
◦ ifmparameters are estimated from the sample (e.g.λin Poisson)D2 χ2k−1−m
◦ need for at least 5 (or 3) elements per class
◦ equivalent to the likelihood ratio test for complex hypotheses
Sample comparison & statistical significance
Are two independent samples of resp. sizes
n
1andn
2coming from the same population/distribution?Gaussian case:
X
1N (m
1, σ
1)
andX
1N (m
2, σ
2)
◦ test variance equality
⇒
Fisher-Snedecor◦ test mean equality
⇒
StudentT
n1+n2−2= √
n
1+ n
2− 2 (X
1− m
1) − (X
2− m
2) s
(n
1S
12+ n
2S
22) 1
n
1+ 1 n
2Note. The test is robust to distributions others than the Gaussian one.
Sample comparison & statistical significance
(cont’Are two independent samples of resp. sizes
n
1andn
2coming from the same population/distribution?General case: compare
{ x
1, . . . , x
n}
and{ y
1, . . . , y
m}
◦ Smirnov to compare empirical distributions
◦ Wilcoxon
⊲ mix all values
x
i andy
i and sort them (ascending order)⊲ test statistics
U
= number of pairs(x
i, y
j)
such thatx
i> y
j⋄ U = 0⇒x1, . . . , xn, y1, . . . ym
⋄ U =nm⇒y1, . . . ym, x1, . . . , xn
⋄ underH0the ranking should be “homogeneous”
⊲ asymptotic distribution is
N nm 2 ,
r nm(n + m + 1) 12
!
⊲ critical region
U −
nm2> k
αData analysis and stochastic modeling – Hypothesis testing – p. 17
Speaker or face identity verification
◦
H
0: the person is who he/she says he/she is◦
H
1: the person is an impostorp(y
1T; H
0) p(y
1T; H
1)
H
0>
H <
1β
In speaker verification
◦
p(y
1T; H
0)
= Gaussian mixture model trained with the speaker’s speech◦
p(y
1T; H
1)
= GMM of speech in generalData analysis and stochastic modeling – Hypothesis testing – p. 18
Speaker or face identity verification
(cont’d)Two errors : false acceptance(type 1) etfalse rejection(type 2).
0.1 0.5 2 5 10 20 30 40 50 60
0.005 0.025 0.1 0.5 2 5 10 20 30 40 50
Miss probability (in %)
False Alarms probability (in %) IRISA max. F-measure min. HTER LIA ENST/TSI
◦ text known or not
◦ size of the data set
◦ signal duration
◦ signal quality
◦ speaker
◦ ...
site ENST IRISA LIA
%fa 9.8 0.3 2.8
%fr 25.3 23.6 30.6
F-measure 46.9 84.3 66.0
Data analysis and stochastic modeling – Hypothesis testing – p. 19
Thanks for attending until the end!
Data analysis and stochastic modeling – Hypothesis testing – p. 20