• Aucun résultat trouvé

CADNA demo

N/A
N/A
Protected

Academic year: 2022

Partager "CADNA demo"

Copied!
25
0
0

Texte intégral

(1)

CADNA demo

Fabienne Jézéquel

Sorbonne Université, Laboratoire d’Informatique de Paris 6 (LIP6), France

Workshop on Large-scale Parallel Numerical Computing Technology RIKEN Center for Computational Science, Kobe, Japan, 6-8 June 2019

(2)

gunzip cadna_c-3.1.3.tar.gz tar -xvf cadna_c-3.1.3.tar cd cadna_c-3.1.3

./configure --prefix=‘pwd‘ --enable-fortran make install

Automatic detection of OpenMP, MPI. Up to 10 libraries can be created.

Ex: libcadnaC.a, libcadnaCdebug.a, libcadnaMPICforOpenMP.a, libcadnaMPICdebugforOpenMP.a, libcadnaMPIFortran.a,...

optimized versions: inlining, -O3 debug versions: no inlining, -O0, -g

(3)

Outline

1 CADNA installation

2 Numerical validation of C/C++ codes

3 Numerical validation of Fortran codes

4 The CADTRACE tool

(4)

P=333.75y6+x2(11x2y2y6121y42)+5.5y8+x/(2y) withx=77617andy=33096

Exact result :P-0.827396059946821368141165095479816292

§Execution without/with CADNA

(5)

Example 2

The roots of the following second order equation are computed:

0.3x22.1x+3.675=0.

The exact values are:

Discriminantd=0 x1=x2=3.5

§Execution without/with CADNA:

without CADNA:

wrong branchingthe result is false with CADNA:

i f(d==0.) is satisfied ifdis numerical noise

(6)

The determinant of Hilbert’s matrix of size 11 defined by ai,j=1/(i+j1)

is computed without pivoting strategy.

After triangularization, the determinant is the product of the diagonal elements.

§Execution without/with CADNA

(7)

Example 4

[J.-M. Muller, 1987]

The 25 first iterations of the following recurrent sequence are computed:

Un+1=1111130/Un+3000/(UnUn−1)

withU0=5.5andU1=61/11. The exact limit is 6.

§Execution without/with CADNA

With CADNA, instabilities related to DSA self-validation are detected.

Then, the accuracy estimation is not reliable.

(8)

A root of the polynomial

f(x)=1.47x3+1.19x21.83x+0.45

is computed by Newton’s method. The sequence is initialized withx=0.5. The iterative algorithm

xn+1=xnf(xn)/f0(xn) is stopped by the criterion

|xnxn−1| <10−12.

§Execution without/with CADNA

(9)

Example 5

Without CADNA

stopping criterionfabs(x-y)<1.e-12 x( 35) = +4.285714252078272e-01 x( 36) = +4.285714252078272e-01

The stopping criterion is satisfied by chance.

With CADNA

stopping criterionfabs(x-y)<1.e-12 x(100) = 0.4285714E+000

x(101) = 0.4285714E+000

ifx-yis numerical noise, the test is not satisfied

maximum number of iterations

Because of instable divisions detected by CADNA, a multiple root is suspected.

(10)

With CADNA

stopping criterionfabs(x-y)<=1.e-12or x==y x( 23) = 0.428571437E+000

x( 24) = 0.42857143E+000

ifx-yis numerical noise, the test is satisfied.

we simplify the faction, and so compte a simple root.

stopping criterion x==y

x( 45) = 0.428571428571430E+000 x( 46) = 0.428571428571429E+000

(11)

Example 6

A linear system of size 4 is solved using Gaussian elimination with partial pivoting.

§Execution without/with CADNA Without CADNA

wheni=2,a[2][2]is 4864.

But that value has actually no correct digits (associate exact value: 0).

a[2][2]is chosen as the pivot and leads to round-off errors in the subsequent computation.

With CADNA

One can observe thata[2][2]has no correct digits.

The test fabsf(a[j][i])>pmax fails.

a[3][2], that is computed accurately, is chosen as the pivot.

(12)

We compute several times:

x=6.83561e+5; y=6.83560e+5; z=1.00000000007;

r = ((z-x)+y) + ((z-y)+x-2);

Exact result:1.4 10−10

§Execution without/with CADNA Without CADNA:

using IEEE double precision with rounding to nearest r=2.32830643653870E-10

With CADNA:

we perform close evaluations:((zx)+y)and((zy)+x2).

If the same rounding mode is chosen for both parts, the final result appears as

(13)

Example with OpenMP

[Eberhart et al., 2016]

The sum S of an arrayAof size2n withn=106is computed in single precision:

#pragma omp parallel for for (i=0;i<2*n;i=i+2) {

A[i+1]=(float)i+1.;

A[i]=-(float)i;

} S=0.;

#pragma omp parallel for reduction(+:S) schedule(static,1) for (i=0;i<2*n;i++)

S=S+A[i];

Exact result:S=106

§Execution without/with CADNA

(14)

# threads without CADNA with CADNA without CADNA with CADNA 1 1.000000E+06 1.000000E+06 1.000000E+06 1.000000E+06 2 1.000000E+06 1.000000E+06 1.966080E+06 @.0 3 1.000000E+06 1.000000E+06 1.000000E+06 1.000000E+06 32 1.000000E+06 1.000000E+06 1.892352E+06 @.0 64 1.000000E+06 1.000000E+06 1.787904E+06 @.0 128 1.000000E+06 1.000000E+06 1.609728E+06 @.0 239 1.000000E+06 1.000000E+06 1.000000E+06 1.000000E+06 240 1.000000E+06 1.000000E+06 1.617920E+05 @.0

(static): the loop is divided into chunks of size2n/T whereT is the number of threads.

Each thread has in charge contiguous elements of A and alternatively sums positive and negative values: all results are accurate.

(static,1): the chunk size is 1.

(15)

With (static,1) and an even number of threads

With 2 threads:

A: 0 1 2 3 4 5 6 7 8 9 10 11 . . .

thread 0 thread 1 thread 0 computesP(n)=Pn1

i=0(2i)= −n2+n and thread 1Q(n)=Pn1

i=0(2i+1)=n2. Fork=2, . . . ,n,

P(k)=P(k1)(2k2)= −((k1)2+k1)(2k2) andQ(k)=Q(k1)+2k1=(k1)2+2k1.

The computation ofP(k)andQ(k)involves values with different orders of magnitude and generates round-off errors forksufficiently high.

With an even number of threads>2: same phenomenon.

The reduction involves inaccurate results with close absolute values and different signs and thus provides numerical noise.

(16)

For example, with 3 threads:

A: 0 1 2 3 4 5 6 7 8 9 10 11 . . .

thread 0 thread 1 thread 2

Each thread has in charge positive and negative elements of A.

No cancellation occurs and all results are accurately computed.

(17)

Example with MPI

[S.M. Rump, 1983]

We compute using 4 MPI processesP=9x4y4+2y2withx=10864and y=18817.

Processes 1, 2 and 3:

compute respectively9x4,y4and2y2 send their local result to process 0.

Process 0 receives and sums the results.

Exact result :P=1.

§Execution without/with CADNA mpirun -np 4 exampleMPI1

mpirun -np 4 exampleMPI1_cad

(18)

We compute using 4 MPI processesP=9x y +2y withx=10864and y=18817.

Processes 1, 2 and 3:

compute respectively9x4,y4and2y2in parallel using OpenMP send their local result to process 0.

Process 0 receives and sums the results.

Exact result :P=1.

§Execution without/with CADNA mpirun -np 4 exampleMPI_OMP1 mpirun -np 4 exampleMPI_OMP1_cad

(19)

Outline

1 CADNA installation

2 Numerical validation of C/C++ codes

3 Numerical validation of Fortran codes

4 The CADTRACE tool

(20)

interface operator(+)

module procedure add_double_st_double_st end interface operator(+)

interface

pure function cpp_add_double_st_double_st(a, b) bind(C) import double_st

type(double_st), intent(in) :: a type(double_st), intent(in) :: b

type(double_st) cpp_add_double_st_double_st end function cpp_add_double_st_double_st end interface

in cadna_add_binding.cc:

double_st cpp_add_double_st_double_st( double_st &a, double_st &b ) { return a+b; }

+ specific Fortran sources for overloading of array functions (MATMUL, SUM,

(21)

Outline

1 CADNA installation

2 Numerical validation of C/C++ codes

3 Numerical validation of Fortran codes

4 The CADTRACE tool

(22)

Installation

gunzip cadtrace-2.2.tar.gz tar -xvf cadtrace-2.2.tar cd cadtrace-2.2

./configure --prefix=‘pwd‘

make install

Usegdbwith thegdb_c.infile provided in theextra-filesdirectory and generate agdb.outfile.

§Example with a C program in theexamplesCdirectory:

gdb ex5_cad<gdb_c.in>gdb.out To obtain a detailed list of instabilities:

(23)

§Example with a Fortran program in theexamplesFortrandirectory:

gdb ex5_cad<gdb_c.in>gdb.out To obtain a detailed list of instabilities:

cadtrace_gcc gdb.out

You can also specify the number of function calls that generate each instability.

For instance, to get 3 levels of function calls:

cadtrace_gcc -n 3 gdb.out

Remark: thecadna_enable,cadna_disablefunctions may help for numerical debugging.

(24)

Thanks to Jean-Marie Chesneaux, Julien Brajard, Romuald Carpentier, Patrick Corde, Pacôme Eberhart, Pierre Fortin, Jean-Luc Lamotte, ...

(25)

Thanks to Jean-Marie Chesneaux, Julien Brajard, Romuald Carpentier, Patrick Corde, Pacôme Eberhart, Pierre Fortin, Jean-Luc Lamotte, ...

Thank you for your attention!

Références

Documents relatifs

On peut sans doute dans l’histoire des relations de l’Homme avec les autres êtres vivants, végétaux ou animaux, pro- poser, au moins comme base de discussion, l’existence

Ce livre, écrit dans un genre jugé audacieux de nos jours (il s’agit de conversations philosophiques rappelant à la fois les dialogues de Platon et les

« Symposium on applied organometallic chemistry » à Munich en novembre 1993 entre le Prix Nobel de chimie Jean-Marie Lehn, le docteur Peter Gölitz, éditeur de la revue Angewandte

Les utilisateurs doivent toujours associer à toute unité documentaire les éléments bibliographiques permettant de l'identifier correctement et notamment toujours faire mention du nom

Cette méthode s’applique en mécanique classique comme en relativité restreinte ou générale; elle structure les interactions de la matière avec le champ électro-magnétique

M´ ethodes de G´ eom´ etrie diff´ erentielle en Physique Math´ ematique, volume 836 of Lect.. Groupes

-Deuxième corps électoral ou liste électorale spéciale provinciale (Question régie par l’article 188 de la loi organique du 19 mars 1999) tout citoyen de

J ’ ai commencé à travailler dans son laboratoire Nancéen et depuis 40 années nous avons toujours collaboré et refait le monde.. L ’ écologie : une idée visionnaire à l