• Aucun résultat trouvé

Paris, France

N/A
N/A
Protected

Academic year: 2022

Partager "Paris, France "

Copied!
3
0
0

Texte intégral

(1)

Jh.t"aoltier, C.O.R.E.P., Boul~e-Bil"tteourt, fa~nd G. Saph,

Rene

Descartes

University,

Paris, France

Il arrive couvent en analyse discriminante que l'appartenance des individus aux classes d'une partition de la population ne soit pae connue avec certitude, ou qu'il eoit d6licat d'attribuer strictement un individu fh une cat6gorie lorsque la partition est &&finie A partir d'une variable num6rique d&coup&e en classes.

Il semble alors plus raisonrtable de se donner une distribution de probabilit4 sur les classes qu'une fonction booliSenne d'appartenance surtout si un individu est proche de la fronti*re entre deux classes.

On 6tablit alors les modifications & apporter aux techniques usuelles de diacri- mination (factorielle et d€cisionnell ainsi que les cons6quencea de ces modi- fications aur les indices usuels de qualit4 d'une discrimination.

Keywords :Fuzzy Discrimination Discriminant Analysis

Discriminant analyais 1s the replicative or predictive study of a qualitative variable over a set of predictora that are generally numerical.

In certain situations, one cannot attribute with certainty a category of the variable to be explained to certain (even to ail) individuala in the sample. This is partieularly the caae if :

-

The classes are poorly defined, e.g., imprecise nomenclature or else discrete coding of a quantitative variable (1).

-

The clam to which an individual belongs cannot be determined with cer- tainty, e.g., appearance of anymptomafter absorption of a medicine. Thare la nothing to prove that thesymptomcould not have appeared naturally.

But it may also be that the determination of the claas ia too coatly, and that one is content with an eatimate.

From the forma1 viewpoint, we will therefore suppose that with each individual, there la aasociated a probability distribution P - (i) for the classes j

-

1,.

. .,

k of the variable to be explained, rather tlian aJset of mutuallyexclusive indi- cative variables.

(1) In this caae, the quantitative variable being divided up into classes,

an

individual in the neighborhood of the border between 2 clasaee probably cannot be sttributed to a single one of the classes, if only because of possible mea- auring errors.

Va ni11 dcnote bY

x

the i~itrix b , P) Of the p numericd predictors, n being the m p l e aire

hence ws are going to describe how to extend to thia situation the various

-

research results of diacriminant analyste, i . e . , seek the beat separation between the claasea in the sense of

a

particular criterion (geometrical methoda), or attempt directly to eatimate the Pj (1) for avery individual for which one sums up the p prmiictora (Bayeaian reaearch).

For gecmetrical reaearch, this extension is made by introducing weightings into the calculations and by counting esch observation i as an flement of al1 of the classes j for which ; Pj (il 4 0. The weighting of observation i in cleas j in then P ( i l 1 * . noka

d, im1

p. (1) tha wetght of the class j,

I It will be ahown that in reality, it is not neeeaaary to work on matrices of di- mension k.n, but only n (on condition of having written the programa on an ad hoc

,

baais).

The variods quallty measures of the discrimination are affected by the fuzziness of the claaaea of the variable to be explained, and for certain ones of these criteria, vie will propose a U n i t calculation that w i U make it possible to judge the real discriminating power of a variable, or of a set of variables, relative

to this limit. 4

As to the bayesian methoda, if one excepts the case in which one considered the Pj(i} as a sample of a random variable, the generalization is carried out in the same fashion as for the geometrical methods, aince the only things hffected by the fuzziness of the classification are the estimator~ cf the parameters of the conditional distributions.

However, another approach is possible : a direct search for a formula for adjuat- ment of the P by means of explanatory variables : logiatic regresaion or linear regresaion wigh conatraints

.

1. GEOMETRICAL METHQDS

1.1 Evaluation of the canter of gravity of the claaaea

WC will denote by P the diagonal matrix in x al of the Pj (i) aaaocisted with the group j, and

1

the vector of which the n components are equal to 1.

Since the classes are fuzzy, the centers of gravity g of each une of t h m are obtained by taking the averago, welghted by the PI (i), of the coordinates of ' the obeervatione, hence :

Ç =&'

p,

1.

1.2 Expression of the matrices of variance The total matrix of variance would then be :

if the data are centered.

(2)

1% c u r r e n t t e m 1s worth :

The i n t e r - c l a e ~ matrix o f v a r i a n c e E i8 t h e r e f o r e e q u i v a l e n t ta 4

1

And i t à c u r r e n t term I n :

1.3 Calculation o f t h e d i s t a n c e from a p o i n t t o t h e c e n t e r o f g r a v i t y o f t h e c l a s s e s

7

Let be e p o i n t o f

H'

; i f t h e m a t r i c e s V a r e n o t a i g n i f i c a n t l y d i f f e r e n t , on*

u a e ~ the çahalanobl m e t r i c

r1

t o c a l c d a t e t h e a e distanCe :

Me perccive t h a t t h e s e formulas hardly fmm thoae o f t h e cl&asic ca8e.

T h < b main i n t w à § a 18 t h a t they supplp zi method o f d i r e c t celculatl.on net brin- ginfc i n m a t r i c e s of dimension kn.

I 1.4 C r i t e r i a o f t h e q u a l i t y of the d i s c r i m i n a t i o n

1

I The f i r s t c r i t e r i o n t h a t coma% t o mind is t h e one of the percentage bf c o r r e c t c l a s s i f i c a t i o n by t h e method o f reassigrunent (about which it is known, ineiden- t e l l y , t h a t it y i e l d a b i a s e d r e s u l t s j . I n t h e c l s a s i c c a s e , t h i s percentage can r e s c h 100 % ; h e r e it i a q u i t e obvioualy l i m i t e d by :

l

!

1

1 I n t h e c a s e o f 2 moups, t h e Mahalanobis d i e t a n c e D between th* t u o c e n t e r a l o f t e n s e r v e s a s a c r i t e r i o n f o r a e p a r a b i l i t y . i n ptkrticular f o r t h e s e l e c t i o n of

t h e v a r i a b l e s , o t h e r c r i t e r i a such a s t h e F. ean be daduced from it f o r a monotonie t r w f e m a t i o n :

I n t h e usual c a s e , t h i s d i s t a n c e 1s n o t bounded above and may t h e o r e t i c a l l y be i n f i n i t e i f and only i f esch group i a reduced t o a p o i n t p r o j e c t e d o n t 0 s t r a i g h t l i n e gl g2. Here t h i a d i s t a n c e i 8 alwaye l i m i t e d . and t h i s l i m i t can be caicu- l a t e d by the following procedure :

D s a maximum i f , i n p r o j e c t i o n on a t r a i g h t U n e $1 Q , a l 1 t h e p o i n t s such

I t h a p ( i ) S Ã ˆ P ( i ) a r e confused i n a p o i n t Xi, and a l 1 t h e o t h e r a i n Xa. Hence one

d

l e d bec& t o a uni-dimensional problem on s t r a i g h t l i n e g l gi. Let us p l a c e

'

o u r s e l v e a i n t h i s c a s e . 02 being I n v a r i a n t f o r change o f r a d i u s and o f s c a l e on t h e v a r i a b l e s , one may suppose t h a t on a t r a i g h t U n e gl

a?,

t h e t o t a l v a r i a n c e is equsl t o 1. and t h a t t h e v a r i a b l e is cantered. From t h i a , one deduces a t

l t h e v a l u e s o f Xi and X2 and t h e value o f a 2 which is n o t 2er0, s i n c e t h e varian- c e s o f each w o u p cannot be z e r o i f t h e r e la a t l e a a t one i i n which P ( 1 ) i s

d i f f e r e n t from 0 o r from 1, 1

whence D2

< (a% -

g2j2

o2

I f we Wnote by P t h e p r o p o r t i o n o f I n d i v i d u e l 1 n f f e c t e d i n group 1,

~ u c h t h r t p i i ~ k p ~ t i ) :

(3)

The calculation o f

&

al and Q depende on the 3 pmametera t pl* al = pL (1)

,

k a2 = p2 ( i l .

I n the eaae o f k groups, a ueabZe c r i t m à ¯ o is t h e sum o f the W a l m o b i s distenc*s of the groups taken tw by t# ; es p r w i o u s l y , t h e a m of the

f g j ; gl) is wisum i f i n the space created b$f t h e k c e n t e r s o f p a v i t y , a l 1 o f t h e obsem8tione such t h a t p { 1 ) i a 8 d m m a r e proJ8ctea ont0 t h e same p o i n t

zj .

ûn Say alwayw suppdae t h e *ewations

&

o e n t e r e d ~ d t h a t the m t r i x o f over-al1 vapithrue i a Ik-%* which l m d 8 to a s i n g l e ' c o n f i - guretion of the

S.,

riegleeting ta& isometry. On8 c m Men c e l c u l a t e the c r i t e r i o n t h a t euppllee t h e desired increttwt, which depends m l y an khe & s t r i k t i m o f t h e veighting8.

ûn eoncreke calculation method cont$ists &n takinga MY mut of k p o i n t s

E~ . .. zk

and i n carrying out t h e l i n e w trensformntlm t h a t lead8 ta tha

3 4-

11.- PROBABXLISTIC E T H O D S

11.1. Bayeeien sethode with hypothesis o f normality See Mtchison esidBqg (1976).

I f one makes the hypothesis o f a normal distribution e l a s a , t h e only prob1em consiet6 i n estimating t h e p bsfore applying Bsyeet formula ( c f . Anderson).

The estimators o f t h e pj end o f m e precieely the g

previously de f ined. 3 and

11.2. Direct estimation of t h e P

Since one has a a m p l e of p and exp18natory v a r i a b l a s X, one may use the '

~ e g r e s s i o n technique@ i n th&! broad sense* o r :

Thie method reduces t o suppoaing t h a t Log is a l i n e e r function of the k

explanstory variables. The c o e f f i c i e n t s of these f u n c t i m a being estintated then by the method of a m i m m likelihood (Cox's nmdel).

b ) Linear regression unùe constra2nt

ûn regreaees eech p, on the explmatory v e r i a b l e v h i l e iinpming t h e c o n s t r a i n t

J

n p t i m i x a t i ~ p r q p m on conet$. I n o t h e r uordu, it is a q u e s t i ~ n of carrying o u t the ctiinonical a n e l y s i s betwesn a cmvex e o m and a v e c t o r i a l eub-space.

1 hHûBRSOi43.A (19721 S e p u e t e s m p l e l o g i s t i c discrimination,

l B i m e t r i h 59, t 9-35

1

MARTIN J . F . [19@0) Le codage flou e t s e s a p p l i c a t i o m a n s t a t i s t i q u e . The60 3Ã cycle Universite de Pau

SEBESTYEN (1962) qfDeeiaiori making p r o c e a d s i n .pattern r e c o g n i t i m "

Mc Millan, New York.

kknowledqement :

The aathbra wuld l i k e txi thank P. l k s s e f o r h i s cormnents anà corrections.

l

l

1

Références

Documents relatifs

A process should be able to determine how many messages are left to be processed on a given input channel. Two uses are readily thought of. Given pending inputs on several channels

mechanisms, with a forward error correction scheme being used to repair all single packet losses, and those receivers experiencing burst losses, and willing to accept

High resolution images are available for the media to view and download free of charge from www.sabmiller.com or www.newscast.co.uk Enquiries SABMiller plc Tel: +44 20 7659 0100

(b) An entrance meeting involving presentations by key representatives of the French Government, the General Directorate for Nuclear Safety and Radiation Protection (DGSNR),

At the Bundesamt für Strahlenschutz (the competent authority for the safe transport of radioactive material in Germany) she has been involved in the implementation by Germany of

Article 139 of the maritime regulations states that “Vessels carrying radioactive substances shall be required to provide current proof of financial responsibility and

At the Bundesamt für Strahlenschutz (the competent authority for the safe transport of radioactive material in Germany) she has been involved in the implementation of the 1985

“The UK TranSAS Mission shall address all modes of transport (road, rail, maritime and air) with an emphasis on maritime transport, and shall consider all relevant aspects of