• Aucun résultat trouvé

Compiling Image Processing Applications for Many-Core Accelerators

N/A
N/A
Protected

Academic year: 2021

Partager "Compiling Image Processing Applications for Many-Core Accelerators"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01254412

https://hal-mines-paristech.archives-ouvertes.fr/hal-01254412

Submitted on 12 Jan 2016

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Compiling Image Processing Applications for Many-Core Accelerators

Pierre Guillou

To cite this version:

Pierre Guillou. Compiling Image Processing Applications for Many-Core Accelerators . ACACES Summer School : Eleventh International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems, Jul 2015, Fiuggi, Italy. �hal-01254412�

(2)

Compiling Image Processing Applications

for Many-Core Accelerators

Pierre Guillou

– CRI MINES ParisTech, PSL Research University

Image Processing

image analysis:

detect geometrical structures in an image

mathematical morphology:

image analysis theory and technique based

on lattices theory

Mathematical Morphology Base Operators

arithmetic operators

unary (pixel ⊗ parameter, 1 input image) binary (pixel ⊗ pixel, 2 input images) + − × ÷ min max = & | ∼

morphological operators

stencils

neighbor selection + min/max/avg

reduction operators

global max/min/sum

other operators

threshold, mask, log2, . . .

=⇒ Sigma-C agent library

The MPPA-256 Chip

Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster

I/O cluster

I/O

clu

ster

I/O

clu

ster

I/O cluster

Shared

memory

(2 MB)

System core Compute core Compute core Compute core Compute core Compute core Compute core Compute core Compute core Debug support unit Compute core Compute core Compute core Compute core Compute core Compute core Compute core Compute core

NoC Interface

NoC Interface

Host

machine

Local

memory

(3 GB)

PCI-E

DDR

Example: Licence Plate Extraction

Input Output

Sigma-C, a Dataflow Programming Language

Agent foo

=⇒

input 0 input 1 output a g e n t foo () { // d e s c r i b e a g e n t i n t e r f a c e i n t e r f a c e { in<int> input0 , i n p u t 1 ; out<int> o u t p u t ; // d e c l a r e the s t a t e m a c h i n e spec{ i n p u t 0 [2] , input1 , o u t p u t [ 3 ] } ; }

// loop over the s t a t e

void s t a r t() e x c h a n g e ( i n p u t 0 inp0 [2] , i n p u t 1 inp1 , o u t p u t outp [3]) { outp [0] = inp0 [0]; outp [1] = inp1 ; outp [2] = inp0 [1]; } } s u b g r a p h bar () { // d e s c r i b e s u b g r a p h i n t e r f a c e i n t e r f a c e { /* ... */ } map { // i n s t a n c i a t e a g e n t s a g e n t a1 = new A g e n t 1 (); a g e n t a3 = new S u b g r a p h 3 (); // ... // c o n n e c t a g e n t s to s u b g r a p h i n t e r f a c e s c o n n e c t ( input0 , a1 . i n p u t 0 ); c o n n e c t ( a5 . output , o u t p u t 1 ); // ... // c o n n e c t a g e n t s c o n n e c t ( a1 . output0 , a2 . i n p u t ); c o n n e c t ( a3 . output , a5 . i n p u t 1 ); // ... } } Agent 1 Agent 2 Subgraph 3 Agent 4 Agent 5

Subgraph bar

Compilation Chain

original application source-to-source

compiler call graph optimisations control code streaming kernels operator library

target-specific compiler GCC compute binaries host binary

Runtime Environment

Accelerator runtime

on I/O clusters

Host runtime

Control code

Compute clusters

Unix

pip

es

PCIe

NoC

launch computation send load from HD receive write on HD transfer transfer transfer store stream images read store result computation n computation i computation 1

Optimisations

unrolling of converging loops

arithmetic operators aggregation

generation of kernel-specific convolutions

data parallelization for compute-intensive operators

Results: Execution Times and Energy Consumption

(MPPA-256 = 1, lower is better)

anr999 antibio burner deblocking lp oop retina toggle GMEAN 10−2

10−1 100 101

Image processing applications

Relative

execution

time

MPPA-256 (Sigma-C/10 W) SPoC (FPGA/25 W)

AMD 4-core (OpenCL/60 W) Quadro 600 (OpenCL/40 W) Tesla C 2050 (OpenCL/240 W)

anr999 antibio burner deblocking lp oop retina toggle GMEAN 100

101 102

Image processing applications

Relative

energy

consumption

Future Work

Other programming models:

Pthreads/OpenMP on compute clusters, communication library between clusters OpenCL via local memory pagination

Improve data-parallelism to take better advantage of the curent architecture

Implement more complex algorithms: watershed, arrow, labelling, minima, . . .

References

Pierre Guillou, Fabien Coelho, and François Irigoin.

Automatic Streamization of Image Processing Applications.

The 27th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2014.

Références

Documents relatifs

Figure 3: Improvements in Public Sector Management benefit a Digital Government initiative, and this benefits the Delivery of Government Services2. This chapter intends to

Internal Report (National Research Council of Canada. Division of Building Research), 1986-02-01.. READ THESE TERMS AND CONDITIONS CAREFULLY BEFORE USING

La résistance des meubles rembourrés à l'inflammation est beaucoup plus difficile à évaluer que celle des matelas, car les matériaux utilisés pour fabriquer les premiers, ainsi

Keywords: Mathematical editing, content markup, packrat parsing, syntax checker, syntax corrector, TEX

By contrast, in the remainder of the genome, comparison of chimpanzee draft sequence with human reference sequence suggests that the gene content of the two species differs by &lt;

En s’appuyant sur les connaissances générales du fonctionnement des cours d’eau, sur l’étude des impacts des modifications des débits sur les communautés

All these groups found that the serogroup antigen has a high lipid content, although it is not known whether this could account for the &gt; 80% (saline extract pool B, phenol

Instead he amassed a remarkable collection of marble and porphyry objects and of Asian porcelains, almost all mounted in elegant gilt bronze settings (Fig. [43] The combined display