• Aucun résultat trouvé

Meta-programming for Cross-Domain Tensor Optimizations

N/A
N/A
Protected

Academic year: 2021

Partager "Meta-programming for Cross-Domain Tensor Optimizations"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01939535

https://hal-mines-paristech.archives-ouvertes.fr/hal-01939535

Submitted on 29 Nov 2018

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Meta-programming for Cross-Domain Tensor Optimizations

Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Claude Tadonki

To cite this version:

Adilla Susungi, Norman A. Rink, Jeronimo Castrillon, Claude Tadonki. Meta-programming for Cross-Domain Tensor Optimizations. conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), Nov 2018, Boston, United States. �hal-01939535�

(2)

Meta-programming for Cross-Domain Tensor Optimizations

Adilla Susungi, Norman A. Rink, Albert Cohen, Jer´

onimo Castrill´

on, Claude Tadonki

adilla.susungi@mines-paristech.fr — norman.rink@tu-dresden.de

Design and semantics of a tensor optimization meta-language

Functional transformation meta-language with high-level abstractions for tensor

computations to enable the composition of different type of transformations (e.g. loop, algebraic or layout transformations).

Formal specification of tensor operations

and loop transformations using denotational semantics.

Examples of use: empirical tuning engines, meta-programming optimizations by experts.

TeML Grammar

hprogrami ::= hstmti hprogrami | 

hstmti ::= hid i = hexpressioni | hid i = @ hid i: hexpressioni | codegen (hidsi)

| init (. . . )

hexpressioni ::= hTexpressioni | hLexpressioni

hTexpressioni ::= tensor ([hintsi]) | eq (hid i, hitersi? → hitersi)

| vop (hid i, hid i, [hitersi?, hitersi?])

| op (hid i, hid i, [hitersi?, hitersi?] → hitersi)

hLexpressioni ::= build (hid i)

| stripmine (hid i, hinti, hinti) | interchange (hid i, hinti, hinti) | ...

hitersi ::= [hidsi]

hidsi ::= hid i (, hid i)* hintsi ::= hinti (, hinti)*

Code Sample

# -- Begin program specification

w = tensor(double, [13])

u = tensor(double, [13, 13, 13]) L = tensor(double, [13, 13])

M_ = outerproduct([w, w, w])

Lh = div(L, w, [[i1, i2], [i2]] -> [i1, i2]) ,→ M = entrywise_mul(M_, u) r1 = contract(Lh, M, [[2, 1]]) r2 = contract(Lh, M, [[2, 2]]) r3 = contract(Lh, M, [[2, 3]])

# -- End program specification

# Code generation without transformations

l1 = build(M_) l2 = build(Lh) l3 = build(M) l4 = build(r1) l5 = build(r2) l6 = build(r3) codegen([l1, l2, l3, l4, l5, l6])

# Code generation with loop fusions only

l7 = fuse(l4, l5, 3) l8 = fuse(l7, l6, 3)

codegen([l1, l2, l3, l7])

# Code generation with fusion, parallelism and vectorization ,→ l9 = parallelize(l1, 1, None) l10 = parallelize(l2, 1, None) l11 = parallelize(l3, 1, None) l12 = parallelize(l8, 1, None) l13 = vectorize(l9, 3) l14 = vectorize(l10, 2) l15 = vectorize(l11, 3) codegen([l13, l14, l15, l12]) Semantics Foundations Domains and state

T = { h(op, S, I), tsi | (ts = []) ∨ (ts = [t1, . . . , tk] ∧ ti ∈ T)}

L = { hid, [x1, . . . , xk]i | xi ∈ L ∪ T } σ : identifier → (T + L) Valuation functions PprogJs pK = PprogJpK ◦ PstmtJsK EtJtensor(S)K = λσ.h(, S, ), []i EtJeq(t, I0 → I1)K =

λσ.let h(op, S, I0), ysi = σ(t)

y = h(op, S, I00), ysi x = h(, S0, I1), []i in h(=, •, •), [x, y]i , where  I 0 6=  ∧ I00 = I0 , if I 0 =  I0 =  ∧ I00 = I0 , if I0 6= 

ElJbuild(t)K = λσ.let r = “number of iterators in σ(t)”

ik = (0, ubk, 1) for k = 1, . . . , r in hi1, . . . , hir, [σ(t)]i . . . i , where σ(t) = h(=, •, •), [x, y]i ElJstripmine(l, r, v)K = λσ.let hi1, . . . hir, xsi . . . i = σ(l) (b, e, 1) = ir i0r = (0, (e − b)/v − 1, 1) i0r+1 = (b + v · i0r, b + v · i0r + (v − 1), 1) in hi1, . . . hi0r, [hi0r+1, xsi]i . . . i ElJinterchange(l, r1, r2)K = λσ.let hi1, . . . hir1, . . . hir2, xsi . . . i . . . i = σ(l) in hi1, . . . hir2, . . . hir1, xsi . . . i . . . i Tensor Expressions Matrix transposition A = tensor([N1, N2])

B = eq(A, [i1, i2] -> [i2, i1])

(=, •, •)

(B, [N2, N1], [i2, i1]) (A, [N1, N2], [i1, i2])

ElJbuild(B)Kσ2 = hi1, [hi2, [σ2(B)]i]i :

for (int i1 = 0; i1 <= (N1-1); i1++) for (int i2 = 0; i2 <= (N2-1); i2++) B[i2][i1] = A[i1][i2]; Compositions Contraction PstmtJt0 = contract(t0, t1, [r0, r1])K = Pprog u v t2 = vmul(t0, t1, [I, J ]) t0 = add(t0, t2, [I0, ] → I0) } ~ where

I = [i0, . . . , i(r0 − 1), k, i(r0 + 1), . . . , is0]

J = [j0, . . . , j(r1 − 1), k, j(r1 + 1), . . . , js1]

I0 = (I \{k}) || (J \{k})

Tiling

Initial loop nest

i1 i3 i5 xs

stripmine n( , 3, v) has introduced i02, i04, and i06

i01 i02 i03 i04 i05 i06 xs

* After three times interchange n.

i01 i03 i05 i02 i04 i06 xs Experiments Helmholtz 1 2 4 8 0 2 4 6 Cores Pluto TeML Mttkrp 1 2 4 8 0 2 4 6 8 Cores

TensorFlow Pluto TeML

Grouped convolutions 1 2 4 8 0 2 4 6 8 10 Cores Pluto TeML

On Intel(R) Core(TM) i7-4910MQ CPU

(2.90GHz, 8 hyperthreads, 8192KB of shared L3 cache), Ubuntu 16.04.

Generated C programs compiled with the Intel C compiler ICC 18.02 (flags: -O3 -xHost

-qopenmp) With TeML:

Pluto (v 0.11.4) optimizations reproducible

Capability to express better optimization paths

Future Work

Abstractions for memory virtualization, stencil patterns, sparse tensors and corresponding semantics, extensions for parallelism support, type system

Références

Documents relatifs

Using the Fo¨rster formulation of FRET and combining the PM3 calculations of the dipole moments of the aromatic portions of the chromophores, docking process via BiGGER software,

Si certains travaux ont abordé le sujet des dermatoses à l’officine sur un plan théorique, aucun n’a concerné, à notre connaissance, les demandes d’avis

Later in the clinical course, measurement of newly established indicators of erythropoiesis disclosed in 2 patients with persistent anemia inappropriately low

Households’ livelihood and adaptive capacity in peri- urban interfaces : A case study on the city of Khon Kaen, Thailand.. Architecture,

3 Assez logiquement, cette double caractéristique se retrouve également chez la plupart des hommes peuplant la maison d’arrêt étudiée. 111-113) qu’est la surreprésentation

Et si, d’un côté, nous avons chez Carrier des contes plus construits, plus « littéraires », tendant vers la nouvelle (le titre le dit : jolis deuils. L’auteur fait d’une

la RCP n’est pas forcément utilisée comme elle devrait l’être, c'est-à-dire un lieu de coordination et de concertation mais elle peut être utilisée par certains comme un lieu

The change of sound attenuation at the separation line between the two zones, to- gether with the sound attenuation slopes, are equally well predicted by the room-acoustic diffusion