• Aucun résultat trouvé

Reproducible and Accurate Matrix Multiplication for High-Performance Computing

N/A
N/A
Protected

Academic year: 2021

Partager "Reproducible and Accurate Matrix Multiplication for High-Performance Computing"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-01215627

https://hal.archives-ouvertes.fr/hal-01215627

Submitted on 23 Nov 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Reproducible and Accurate Matrix Multiplication for High-Performance Computing

Caroline Collange, David Defour, Stef Graillat, Roman Iakymchuk

To cite this version:

Caroline Collange, David Defour, Stef Graillat, Roman Iakymchuk. Reproducible and Accurate Matrix Multiplication for High-Performance Computing. SCAN: Scientific Computing, Computer Arithmetic and Validated Numerics, Sep 2014, Wuerzburg, Germany. pp.42-43. �hal-01215627�

(2)

Reproducible and Accurate Matrix Multiplication for High-Performance Computing

Sylvain Collange, David Defour, Stef Graillat, and Roman Iakymchuk

INRIA – Centre de recherche Rennes – Bretagne Atlantique Campus de Beaulieu, F-35042 Rennes Cedex, France

sylvain.collange@inria.fr DALI–LIRMM, Universit´e de Perpignan 52 avenue Paul Alduy, F-66860 Perpignan, France

david.defour@univ-perp.fr

Sorbonne Universit´es, UPMC Univ Paris 06, UMR 7606, LIP6 F-75005 Paris, France

CNRS, UMR 7606, LIP6, F-75005 Paris, France Sorbonne Universit´es, UPMC Univ Paris 06, ICS

F-75005 Paris, France

{stef.graillat, roman.iakymchuk}@lip6.fr

Keywords: Matrix multiplication, reproducibility, accuracy, long ac- cumulator, multi-precision, multi- and many-core architectures.

The increasing power of current computers enables one to solve more and more complex problems. This, therefore, requires to perform a high number of floating-point operations, each one leading to a round- o↵error. Because of round-o↵ error propagation, some problems must be solved with a longer floating-point format.

As Exascale computing (1018 operations per second) is likely to be reached within a decade, getting accurate results in floating-point arithmetic on such computers will be a challenge. However, another challenge will be the reproducibility of the results – meaning getting a bitwise identical floating-point result from multiple runs of the same code – due to non-associativity of floating-point operations and dy- namic scheduling on parallel computers.

42

(3)

Reproducibility is becoming so important that Intel proposed a

“Conditional Numerical Reproducibility” (CNR) in its MKL (Math Kernel Library). However, CNR is slow and does not give any guar- antee concerning the accuracy of the result. Recently, Demmel and Nguyen [1] proposed an algorithm for reproducible summation. Even though their algorithm is fast, no information is given on the accuracy.

More recently, we introduced [2] an approach to compute determin- istic sums of floating-point numbers efficiently and with the best pos- sible accuracy. Our multi-level algorithm consists of two main stages:

filtering that relies upon fast vectorized floating-point expansions; ac- cumulation which is based on superaccumulators in a high-radix carry- save representation. We presented implementations on recent Intel desktop and server processors, on Intel Xeon Phi accelerator, and on both AMD and NVIDIA GPUs. We showed that the numerical repro- ducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.

In this talk, we will present a reproducible and accurate (rounding to the nearest) algorithm for the product of two floating-point matrices in parallel environments like GPU and Xeon Phi. This algorithm is based on the DGEMM implementation. We will show that the perfor- mance of our algorithm is comparable with the classic DGEMM.

References:

[1] J. Demmel, H. D. Nguyen, Fast Reproducible Floating-Point Summation, Proceeding of the 21st IEEE Symposium on Computer Arithmetic, Austin, Texas, USA (2013), pp. 163-172.

[2] S. Collange, D. Defour, S. Graillat, R. Iakymchuk, Full- Speed Deterministic Bit-Accurate Parallel Floating-Point Summa- tion on Multi- and Many-Core Architectures,Research Report. HAL ID: hal-00949355. February 2014.

43

Références

Documents relatifs

Any E-function is solution of an E-operator so that the finite non-zero singularities of the minimal non-zero operator satisfied by a given E-function are

Hence, the surface position results from the interplay of the three involved physical mechanisms: surface tension and probe/liquid interaction forces act directly, whereas

More precisely, we provide sharp conditions on the network architecture and the sample under which the error on the parameters defining the network scales linearly with

Dans le quatrième chapitre, nous avons étudié les propriétés structurales, électroniques, optiques des composés binaires ZnO, ZnS, CdO ,CdS, et CdTe en utilisant la méthode

In all these works, global existence for arbitrary large data is shown by combining the standard energy estimate, which holds for general solutions of the Navier-Stokes equations,

later, Britton Harris (1965) suggested that the Alonso approach could also be used to tackle the problem of consumer choice among a fixed, heterogen- eous housing stock in the

The rest of this thesis is organized as follows: after briefly presenting in Chapter 2 an overview of the modern FPGA architecture and particularly, the features relevant for

Based on the review of the current research, these children were fostered in a language sensitive way placing emphasis in the relationship of different