# On the use of copulas to simulate multicriteria data




Abstract.

in practice

But sometimes only very few data are available; for example, from a preference learning point of view, the dataset should be so limited

1 Lab. ERIC, Université Lyon 2, email: Jairo.Cugliari@univ-lyon2.fr

2 Lab. ERIC, Université Lyon 2, email: Antoine.Rolland@univ-lyon2.fr

3 Lab. ERIC, Université Lyon 2, email: Thi-minh-thuy.Tran@etu.univ-lyon1.fr

If $U_1, \dots, U_n$, $n \geq 2$, are random variables with uniform distribution in $[0,1]$, then a copula $C : [0,1]^n \to [0,1]$ is defined as

$$C(u_1, \dots, u_n) = P(U_1 \leq u_1, \dots, U_n \leq u_n).$$

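Let $F(x_1, \dots, x_n)$ be the joint distribution function of the random vector $X = (X_1, \dots, X_n)$. Then there exists a copula $C$ such that

$$F(x_1, \dots, x_n) = C(F_1(x_1), \dots, F_n(x_n)), \qquad (2)$$

where $F_1(x_1), \dots, F_n(x_n)$ are the univariate marginal distribution functions of the vector $X$. Moreover, this representation is unique if the marginals are continuous, and the copula is then given by

$$C(u_1, \dots, u_n) = F(F_1^{-1}(u_1), \dots, F_n^{-1}(u_n)).$$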
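Read from right to left, the copula representation suggests a simple way to simulate: draw uniforms from the copula, then push each coordinate through the inverse of its marginal distribution function. The sketch below illustrates this under assumptions that are not in the paper: a bivariate Gaussian copula with correlation `rho = 0.7`, an exponential first marginal and a uniform second marginal, all chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals, then map each one to [0, 1] with the normal CDF.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_quantile(u, rate):
    """Inverse distribution function of an exponential marginal (illustrative choice)."""
    return -math.log(1.0 - u) / rate


def uniform_quantile(u, low, high):
    """Inverse distribution function of a uniform marginal on [low, high] (illustrative choice)."""
    return low + (high - low) * u


# Uniform pairs carry the dependence; the quantile functions impose the marginals.
uniforms = sample_gaussian_copula(rho=0.7, n=2000)
data = [(exponential_quantile(u1, 0.5), uniform_quantile(u2, 10, 20)) for u1, u2 in uniforms]
```

The dependence structure lives entirely in `uniforms`; swapping the two quantile functions changes the marginals of `data` without touching the copula.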

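Given the copula of $X$ and the marginal distributions $F_1, \dots, F_n$ of $X$, one can estimate the joint multivariate density as

$$f(x_1, \dots, x_n) = c(F_1(x_1), \dots, F_n(x_n))\, f_1(x_1) \cdots f_n(x_n),$$

where $f_1, \dots, f_n$ are the marginal densities and $c$ is the copula density

$$c(u_1, \dots, u_n) = \frac{\partial^n C(u_1, \dots, u_n)}{\partial u_1 \cdots \partial u_n}. \qquad (5)$$

The difficulty of this problem depends on the data dimension $n$. In the bivariate case, i.e. $n = 2$, only one pair-copula must be estimated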

$$f(x_1, \dots, x_n) = f(x_n)\, f(x_{n-1} \mid x_n) \cdots f(x_1 \mid x_2, \dots, x_n).$$

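$$f(x_i \mid v) = c_{x_i, v_j \mid v_{-j}}\big(F(x_i \mid v_{-j}),\, F(v_j \mid v_{-j})\big) \times f(x_i \mid v_{-j}),$$

where $v_j$ is one component of the vector $v$ and $v_{-j}$ denotes $v$ with $v_j$ removed.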

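For instance, with three variables $x_1$, $x_2$ and $x_3$:

$$f(x_1 \mid x_2, x_3) = c_{12 \mid 3}\big(F(x_1 \mid x_3),\, F(x_2 \mid x_3)\big) \times f(x_1 \mid x_3).$$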
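As a worked illustration, the conditional-density formula can be applied repeatedly to write a full three-dimensional density in terms of pair-copulas only. The derivation below is a sketch that uses one particular ordering of the variables (other orderings give other, equally valid decompositions), and the pair-copula labels $c_{23}$ and $c_{13}$ follow the same convention as $c_{12 \mid 3}$.

```latex
\begin{aligned}
f(x_1, x_2, x_3)
  &= f(x_3)\, f(x_2 \mid x_3)\, f(x_1 \mid x_2, x_3) \\
f(x_2 \mid x_3)
  &= c_{23}\big(F_2(x_2), F_3(x_3)\big)\, f_2(x_2) \\
f(x_1 \mid x_2, x_3)
  &= c_{12 \mid 3}\big(F(x_1 \mid x_3), F(x_2 \mid x_3)\big)\, f(x_1 \mid x_3) \\
f(x_1 \mid x_3)
  &= c_{13}\big(F_1(x_1), F_3(x_3)\big)\, f_1(x_1) \\[4pt]
\Rightarrow\; f(x_1, x_2, x_3)
  &= f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
     c_{23}\big(F_2(x_2), F_3(x_3)\big)\,
     c_{13}\big(F_1(x_1), F_3(x_3)\big)\,
     c_{12 \mid 3}\big(F(x_1 \mid x_3), F(x_2 \mid x_3)\big)
\end{aligned}
```

Three marginal densities and three pair-copulas are therefore enough to describe the joint density in dimension three.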

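The conditional distribution functions are obtained from the pair-copulas as

$$h(x \mid v, \theta) = F(x \mid v) = \frac{\partial\, C_{x, v_j \mid v_{-j}}\big(F(x \mid v_{-j}),\, F(v_j \mid v_{-j}) \mid \theta\big)}{\partial F(v_j \mid v_{-j})},$$

and $h^{-1}(u \mid v, \theta)$ denotes the inverse of this function with respect to its first argument.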

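Sample $w_1, w_2, \dots, w_n$ independently from the uniform distribution on $[0,1]$, then set

$$\begin{aligned}
x_1 &= w_1, \\
x_2 &= F^{-1}(w_2 \mid x_1), \\
x_3 &= F^{-1}(w_3 \mid x_1, x_2), \\
&\;\;\vdots \\
x_n &= F^{-1}(w_n \mid x_1, \dots, x_{n-1}).
\end{aligned}$$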
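The recursive sampling scheme is easiest to see in the bivariate case, where a single inverse conditional distribution function is needed. The sketch below assumes a Clayton copula with parameter `theta > 0`, chosen here only because its conditional distribution function has a closed-form inverse; the paper does not state that this family is used. The function names are hypothetical.

```python
import random


def clayton_h(u2, u1, theta):
    """Conditional CDF F(u2 | u1) = dC(u1, u2)/du1 for a Clayton copula (theta > 0)."""
    return u1 ** (-theta - 1.0) * (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta - 1.0)


def clayton_h_inverse(w, u1, theta):
    """Inverse of F(. | u1): returns u2 such that F(u2 | u1) = w."""
    return ((w ** (-theta / (1.0 + theta)) - 1.0) * u1 ** (-theta) + 1.0) ** (-1.0 / theta)


def sample_clayton(theta, n, seed=0):
    """Conditional inversion: x1 = w1, x2 = F^{-1}(w2 | x1)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # 1 - random() lies in (0, 1], which avoids raising 0 to a negative power.
        w1, w2 = 1.0 - rng.random(), 1.0 - rng.random()
        x1 = w1
        x2 = clayton_h_inverse(w2, x1, theta)
        sample.append((x1, x2))
    return sample


def kendall_tau(pairs):
    """Plain O(n^2) Kendall rank correlation, used only as a sanity check."""
    score = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            product = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += 1 if product > 0 else -1
    return score / (len(pairs) * (len(pairs) - 1) / 2)


sample = sample_clayton(theta=2.0, n=500)
tau = kendall_tau(sample)  # for a Clayton copula the population value is theta / (theta + 2)
```

In higher dimensions the same idea applies, except that each conditional distribution function is itself built by nesting pair-copula h-functions.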


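The simulation is carried out in R with the CDVine package:

Step 1.

Step 2. Pair-copula families are selected with CDVineCopSelect from CDVine.

Step 3. Parameters are estimated by maximum likelihood with CDVineMLE.

Step 4. New data are simulated with CDVineSim from CDVine.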

Step 5.

vs

2

Alternatively, one could use a binary classifier to test whether the merged data are easy to discriminate in terms of the added labels real vs. simulated. We use the Random Forest algorithm [10] as a su-

Case 1.

Case 2.

Case 3.

each of the MCDA data sets. In the first two cases, it is hard to tell


Figure 1 (pairwise plots of the criteria; panels labelled C1, C2, C3 and C4)

In order to ensure that results are not purely due to randomness, we also produced 1000 simulated data sets without any hypothesis of dependence between criteria, i.e. we generated criteria values independently, following only the marginal distributions for each criterion on each dataset. The results of such simulation are also listed in Table 1.
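The independent baseline described above can be sketched as follows. This is a minimal illustration, assuming that "following only the marginal distributions" is approximated by resampling each criterion column independently from its observed values; the paper does not say whether empirical or fitted marginals were used. The toy data set and the function name are hypothetical.

```python
import random


def simulate_independent(data, n_sim, seed=0):
    """Draw each criterion independently from its empirical marginal distribution."""
    rng = random.Random(seed)
    columns = list(zip(*data))  # one tuple of observed values per criterion
    return [tuple(rng.choice(column) for column in columns) for _ in range(n_sim)]


# Hypothetical toy data: 4 alternatives on 2 perfectly dependent criteria.
toy = [(1, 10), (2, 20), (3, 30), (4, 40)]
simulated = simulate_independent(toy, n_sim=1000)
```

Each simulated column keeps the values seen in the original column, but combinations such as `(1, 40)` that never occur in `toy` now appear, which is exactly the loss of dependence the baseline is meant to expose.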

| Data set | With copulas | Without copulas |
|---|---|---|
| Case 1 [6] | 0.534 | 0.5 |
| Case 2 [11] | 0.573 | 0.012 |
| Case 3 [17] | 0.204 | 0.119 |

Table 1. Random Forest error ratio.
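The error ratios in Table 1 come from a classifier trained to separate real from simulated rows: an error near 0.5 means the two sets are hard to tell apart, while an error near 0 means they are easy to discriminate. The sketch below shows the principle only. To stay dependency-free it replaces the Random Forest used in the paper with a leave-one-out 1-nearest-neighbour classifier, and it runs on synthetic Gaussian data rather than the MCDA data sets; both substitutions are assumptions made for illustration.

```python
import random


def loo_1nn_error(real, simulated):
    """Leave-one-out error of a 1-nearest-neighbour classifier on real-vs-simulated labels."""
    points = [(row, 0) for row in real] + [(row, 1) for row in simulated]
    errors = 0
    for i, (x, label) in enumerate(points):
        best_dist, best_label = float("inf"), None
        for j, (y, other_label) in enumerate(points):
            if i == j:
                continue  # the point itself is left out
            dist = sum((a - b) ** 2 for a, b in zip(x, y))
            if dist < best_dist:
                best_dist, best_label = dist, other_label
        errors += best_label != label
    return errors / len(points)


rng = random.Random(0)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
same = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]        # same distribution
shifted = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(200)]     # clearly different

error_same = loo_1nn_error(real, same)        # expected to be close to 0.5
error_shifted = loo_1nn_error(real, shifted)  # expected to be close to 0
```

A simulation method that reproduces the real data well should behave like `same`, not like `shifted`.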

Case 1.


## References
