• Aucun résultat trouvé

Miss probability (in %)

N/A
N/A
Protected

Academic year: 2022

Partager "Miss probability (in %)"

Copied!
5
0
0

Texte intégral

(1)

Data analysis and stochastic modeling

Lecture 8 – Hypothesis testing

Guillaume Gravier

[email protected]

Data analysis and stochastic modeling – Hypothesis testing – p.

Where are we?

1. A gentle introduction to probability 2. Data analysis

3. Cluster analysis

4. Estimation theory and practice 5. Mixture models

6. Random processes 7. Queing systems 8. Hypothesis testing

◦ what for?

◦ the mechanics of statistical tests

◦ some common tests

Data analysis and stochastic modeling – Hypothesis testing – p.

What for?

Hypothesis testing = make some decision on whether something is true or not based on experimental evidences, yet knowing the risk we are taking with that decision.

Typical hypotheses we want to test are:

◦ the mean time to failure of a system exceeds a threshold

θ

0(or not)

◦ the job arrival rate in a queing system is equal to

λ

0(or not)

◦ the distribution of some samples is Gaussian (or not)

◦ two estimated mean values correspond to the same mean (or not)

◦ an observed arrival process is Poisonian (or not)

◦ etc.

Data analysis and stochastic modeling – Hypothesis testing – p.

The rainmakers example

◦ rain levels (in mm) are assumed to be Gaussian

N (600, 100)

◦ people claim they can increase rain levels by 50 mm and, using their method over 9 years, we observed the following rain levels

year 1 2 3 4 5 6 7 8 9

mm 510 614 780 512 501 534 603 788 650

◦ Is this a scam or not?

Two hypotheses are confronted

H

0

m = 600

mm

H

1

m = 650

mm

Since the method to increase rain levels is expensive, we want to use it only if we are pretty sure that it works, i.e. if we have only a 5 % chance of wrongly accepting

H

1from the evidences (risk

α = 0.05

).

Data analysis and stochastic modeling – Hypothesis testing – p.

(2)

The rainmakers example

(cont’d)

We study the empirical mean

X

over the nine year period

◦ if

H

0is true,

X N (600, 100/ √

9)

[see lecture 4]

◦ decision rule

if

X > k

, accept

H

1(or reject

H

0)

if

X < k

, reject

H

1(or accept

H

0)

k

is determined so that the probability of wrongly accepting

H

1is 0.05

k = 600 + 100

√ 9 1.64 = 655

Conclusion: those rainmakers are ripping you off!

Data analysis and stochastic modeling – Hypothesis testing – p.

The rainmakers example

(cont’d)

What if rainmakers were very unlucky over those nine years?

◦ if they were right,

X N (650, 100/ √ 9)

◦ an error is made each time

X < 655

◦ the probability of wrongly accepting

H

0(or wrongly rejecting

H

1) is given by the risk

β

β = P

"

655 − 650 100/ √

9

#

= 0.56

H

1defines the shape of the critical decision region (

650 > 600

)

but the threshold

k

only depends on

H

0and

α

◦ additional knowledge on

H

1is used to compute the risk

β

Data analysis and stochastic modeling – Hypothesis testing – p.

Concepts and vocabulary

a statistical hypothesis is an assertion which can be valid or not

a statistical test is a procedure to make a decision

the null hypothesis

H

0is a claim that we are interested in accepting or rejecting

the alternative hypothesis

H

1is the contradiction of

H

0

the critical or rejection region is the region where we reject

H

0(above

k

in the example) as opposed to the acceptance region

◦ wrongly rejecting

H

0is known as the type 1 error and the probability

α

that this happens is the level of significance of the test

◦ wrongly rejecting

H

1is a type 2 error and the quantity

1 − β

is the

power of the test

Hypotheses and types of errors

α

= probability of choosing

H

1when

H

0is true

β

= probability of choosing

H

0when

H

1is true

truth\decision H0 H1

H0 1−α α H1 β 1−β

In practice,

α

is given by the decision maker and

H

0corresponds to the following

◦ a well established hypothesis, not contradicted so far

◦ a safe decision

e.g. when testing a vaccine,H0is the less favorable hypothesis

◦ the only hypothesis that is easy to formulate

e.g.m=m0is easir to formulate thanm6=m0

(3)

Methodology

1. define H

0

and H

1

2. determine the variables on which to make the decision 3. determine the shape of the critical region based on H

1

4. compute the critical region given α

5. eventually compute the power of the test

6. compute the experimental value of the decision variable 7. accept or reject H

0

Data analysis and stochastic modeling – Hypothesis testing – p.

Various tests to compare two mean values

H

0

H

1 Reject if Test statistic

µ = µ

0

µ 6 = µ

0

| z

0

| > z

α/2

(

σ

known)

µ < µ

0

z

0

< − z

α

z

0

=

Xσ/¯µn0

µ > µ

0

z

0

> z

α

µ = µ

0

µ 6 = µ

0

| t

0

| > t

α/2,n1

(

σ

unknown)

µ < µ

0

t

0

< − t

α,n1

t

0

=

Xs/¯µn0

µ > µ

0

t

0

> t

α,n−1

Data analysis and stochastic modeling – Hypothesis testing – p. 10

Optimal decision for simple tests

Consider

X

with density

f(x; θ)

, and denote

L( x , θ)

the density of the sample

x

.

H0 θ =θ0 H1 θ =θ1

Maximize the power of the test!

P[W|H1] = 1−β=

Z

W

L(x;θ1)dx

Jerzy Neyman, 1894 – 1981

Karl Pearson, 1857 – 1936

The optimal critical region is defined by the points such that

L( x ; θ

1

)

L( x ; θ

0

) > k

α

.

Data analysis and stochastic modeling – Hypothesis testing – p. 11

Testing composite hypotheses

H

0

θ = θ

0

H

1

θ = θ

1

vs

H

0

θ = θ

0

H

1

θ 6 = θ

1

the risk

β

(and hence the power) depends on

θ

H

0

m = 600 H

1

m > 600

Data analysis and stochastic modeling – Hypothesis testing – p. 12

(4)

The likelihood ratio test

H

0

θ = θ

0

H

1

θ 6 = θ

0

with

θ ∈ R

p

◦ Test statistics

λ = L( x ; θ

0

) sup

θ

L(x; θ)

Note that replacingL(x;θ0)bysup

θ

L(x;θ)is like using the ML estimate ofθ

◦ Critical region =

{ x | λ < K

α

}

◦ Asymptotic distribution of the test statistics under

H

0:

λ χ

2p

Data analysis and stochastic modeling – Hypothesis testing – p. 13

Classical test for mean values of N (m, σ)

standard deviation is known

test statistics =

X N (m,

σn

)

critical region =

X > k

α

cf. introductory example

standard deviation is known

test statistics = Student

T

n1

= √

n − 1 X − m S

critical region =

| T

n1

| > k

α

Example:

H

0

m = 30

vs.

H

1

m 6 = 30

⋄ 15 samples with

x = 37.2

and

s = 6.2

⋄ under

H

0,

t = √

14 37.2 − 30

6.2 = 4.35

⋄ critical value at

α = 0.05

for

T

14= 1.761

REJECT

H

0!

Data analysis and stochastic modeling – Hypothesis testing – p. 14

The χ 2 fitting tests

X

is a random variable divided into

k

classes of resp. probabilities

p

1

, p

2

, . . . , p

kand we observe a sample of the variable with population size in each class

N

1

= n

1

, N

2

= n

2

, . . . N

k

= n

k

Note thatE[Ni] =npi

We want to test if the underlying law of the observed process fits the theoretical distribution defined by the

p

i’s (

H

0) or not.

◦ test statistics is

D

2

=

k

X

i=1

(N

i

− np

i

)

2

np

i

◦ asymptotic distribution of the test statistics =

χ

2k

1

Notes:

ifmparameters are estimated from the sample (e.g.λin Poisson)D2 χ2k1m

need for at least 5 (or 3) elements per class

equivalent to the likelihood ratio test for complex hypotheses

Sample comparison & statistical significance

Are two independent samples of resp. sizes

n

1and

n

2coming from the same population/distribution?

Gaussian case:

X

1

N (m

1

, σ

1

)

and

X

1

N (m

2

, σ

2

)

◦ test variance equality

Fisher-Snedecor

◦ test mean equality

Student

T

n1+n22

= √

n

1

+ n

2

− 2 (X

1

− m

1

) − (X

2

− m

2

) s

(n

1

S

12

+ n

2

S

22

) 1

n

1

+ 1 n

2

Note. The test is robust to distributions others than the Gaussian one.

(5)

Sample comparison & statistical significance

(cont’

Are two independent samples of resp. sizes

n

1and

n

2coming from the same population/distribution?

General case: compare

{ x

1

, . . . , x

n

}

and

{ y

1

, . . . , y

m

}

◦ Smirnov to compare empirical distributions

◦ Wilcoxon

mix all values

x

i and

y

i and sort them (ascending order)

test statistics

U

= number of pairs

(x

i

, y

j

)

such that

x

i

> y

j

⋄ U = 0⇒x1, . . . , xn, y1, . . . ym

⋄ U =nm⇒y1, . . . ym, x1, . . . , xn

underH0the ranking should be “homogeneous”

asymptotic distribution is

N nm 2 ,

r nm(n + m + 1) 12

!

critical region

U −

nm2

> k

α

Data analysis and stochastic modeling – Hypothesis testing – p. 17

Speaker or face identity verification

H

0: the person is who he/she says he/she is

H

1: the person is an impostor

p(y

1T

; H

0

) p(y

1T

; H

1

)

H

0

>

H <

1

β

In speaker verification

p(y

1T

; H

0

)

= Gaussian mixture model trained with the speaker’s speech

p(y

1T

; H

1

)

= GMM of speech in general

Data analysis and stochastic modeling – Hypothesis testing – p. 18

Speaker or face identity verification

(cont’d)

Two errors : false acceptance(type 1) etfalse rejection(type 2).

0.1 0.5 2 5 10 20 30 40 50 60

0.005 0.025 0.1 0.5 2 5 10 20 30 40 50

Miss probability (in %)

False Alarms probability (in %) IRISA max. F-measure min. HTER LIA ENST/TSI

text known or not

size of the data set

signal duration

signal quality

speaker

...

site ENST IRISA LIA

%fa 9.8 0.3 2.8

%fr 25.3 23.6 30.6

F-measure 46.9 84.3 66.0

Data analysis and stochastic modeling – Hypothesis testing – p. 19

Thanks for attending until the end!

Data analysis and stochastic modeling – Hypothesis testing – p. 20

Références

Documents relatifs

Further, the degree of the negative effect of uncertainty on the exporting decision of firms varies depending on whether firms are risk averse or risk taking

[1] addresses two main issues: the links between belowground and aboveground plant traits and the links between plant strategies (as defined by these traits) and the

Cette désignation a pour objectif, si nécessaire, d’associer un proche aux choix thérapeutiques que pourraient être amenés à faire les médecins qui vous prendront

trouver sa forme, son expression, son style, son harmonie, non seulement par le verbe, mais aussi par l'action (les explorateurs furent souvent des rêveurs). L' Art forme et

We show empirically that utilizing the Same-Decision Probability is a viable and intu- itive approach for determining question selection in Bayesian-based Computer Adaptive Tests,

tinuous contact with Member States and Headquarters ensure that communicable disease information of international importance witlun a region is conveyed to health

In more concrete words, reaching voluntary agreements between sellers and buyers of environmental services frequently requires trust that can be achieved through the

Suppose R is a right noetherian, left P-injective, and left min-CS ring such that every nonzero complement left ideal is not small (or not singular).. Then R