
Safouane NOUBIR, Yannick BORNAT and Bertrand LE GAL

IMS Laboratory, UMR CNRS 5218,

Bordeaux INP, University of Bordeaux, France

A scalable and efficient digital signal processing system for real time biological spike detection


ICECS: 25th IEEE International Conference on Electronics, Circuits and Systems, Bordeaux, France, 9-12 December 2018

(Figure 1 panels: (a) spike detection, with blocks HPF, SWT, Mono and CM between IN and OUT; (b) inter-channel correlation.)

Fig. 1. Digital signal processing chain used to extract information from biological signals

In order to maintain a high level of flexibility and genericity, we chose to generate the hardware from models using HLS tools [1]. Consequently, all the modules presented above were described at the behavioral level in SystemC. These SystemC models were made flexible and generic through the use of template parameters.

For example, the number of channels or the data type can be adapted to the application's needs. Data transfers between the modules go through FIFOs, which smooth the processing load over time. To improve the implementation of each module, annotations were added to its source code; they specify, for example, which code sections should be unrolled or pipelined.
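To illustrate why FIFOs between modules smooth the processing load, the following Python sketch (not the authors' SystemC code) models a bursty producer, such as a spike detector emitting several events in one sample period, feeding a consumer that can handle at most a fixed number of events per tick. The function name `simulate_fifo` and the tick-based timing are hypothetical simplifications; the peak occupancy it reports is the FIFO depth such a burst pattern would need.

```python
from collections import deque


def simulate_fifo(bursts, drain_per_tick):
    """Hypothetical model of two modules linked by a FIFO.

    bursts[t] is the number of events the producer pushes at tick t;
    the consumer pops at most drain_per_tick events per tick.
    Returns (peak FIFO occupancy, ticks needed to drain everything).
    """
    fifo = deque()
    peak = 0
    t = 0
    while t < len(bursts) or fifo:
        if t < len(bursts):
            # Producer side: push this tick's burst, tagged with its arrival tick.
            fifo.extend([t] * bursts[t])
        peak = max(peak, len(fifo))
        # Consumer side: bounded processing rate per tick.
        for _ in range(min(drain_per_tick, len(fifo))):
            fifo.popleft()
        t += 1
    return peak, t


# A burst of 5 events absorbed by a consumer handling 1 event per tick.
print(simulate_fifo([5, 0, 0, 0, 0], 1))  # (5, 5)
```

The consumer never sees more than its own rate; the burst is spread over five ticks at the cost of a FIFO at least five entries deep.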

IV. EXPERIMENTAL RESULTS

The main objective of the project was to design a real-time system able to process the signals of at least 64 electrodes in parallel. The electrodes are sampled at 10 kHz.

These specifications match those of the analog acquisition boards available on the market.

The SystemC models describing the system (Figure 1) were provided to the Vivado HLS 2018.1 synthesis tool.

The tool generates the RTL hardware architecture. To ease placement and routing on an Artix-7 (XC7A100T-1CSG324C) device, the target clock frequency of the architecture was set to 100 MHz.
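A quick way to see what the 100 MHz target leaves for each sample is to divide the clock rate by the aggregate sample rate. The sketch below assumes, as a simplification not stated in the text, that all channels share one time-multiplexed datapath; the function name is hypothetical.

```python
def cycles_per_sample(f_clk_hz, n_channels, fs_hz):
    """Clock cycles available per channel sample, assuming one shared,
    time-multiplexed datapath processes every channel in turn."""
    return f_clk_hz / (n_channels * fs_hz)


# 64 electrodes sampled at 10 kHz, architecture clocked at 100 MHz.
budget = cycles_per_sample(100e6, 64, 10e3)
print(budget)  # 156.25
```

Under this assumption, any per-sample processing that fits in about 156 cycles, or that is pipelined to accept one sample per cycle, keeps up with the real-time input.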

The whole system, made of 10 modules described in SystemC (about 981 lines of C++ code), was synthesized by the HLS tool in a single pass. After placement and routing, the generated digital architecture, which meets the specifications, uses 1315 LUTs, 1887 FFs, 6 DSPs and 26 BRAMs (18k), i.e. about 9% of the FPGA circuit. The system was functionally validated on the FPGA using several peripherals (OLED screen, UART monitoring, SD card reading) with both real and synthetic data.
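The utilization figures can be checked against the device capacity. The sketch below uses the XC7A100T capacities (63,400 LUTs, 126,800 flip-flops, 240 DSP slices, 135 36-Kb block RAMs, i.e. 270 BRAM18); the dictionary layout and function name are illustrative. Under these numbers, BRAM is by far the most used resource (about 9.6%), which appears consistent with the ~9% quoted above, while logic stays near 2%.

```python
# Capacities of the Artix-7 XC7A100T (BRAM counted in 18-Kb blocks).
XC7A100T = {"LUT": 63_400, "FF": 126_800, "DSP": 240, "BRAM18": 270}

# Post-place-and-route usage reported for the 64-channel system.
USED = {"LUT": 1315, "FF": 1887, "DSP": 6, "BRAM18": 26}


def utilization(used, capacity):
    """Percentage of each resource type consumed."""
    return {k: 100.0 * used[k] / capacity[k] for k in used}


util = utilization(USED, XC7A100T)
for resource, pct in util.items():
    print(f"{resource:7s} {pct:5.1f} %")
print("most used:", max(util, key=util.get))  # BRAM18
```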

From the post-place-and-route data, we observe that the system can be greatly extended in terms of the number of channels processed in parallel. Indeed, on the target device, up to 700 channels can be processed in parallel if the maximum event rate at the output of the spike detection stage is about 100 spikes per second per channel.
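One simple cost model, assumed here rather than taken from the text, reproduces the order of magnitude of this limit: if each spike of a channel triggers 2n correlation-matrix cell updates (the row and column of the spiking channel) at one cell per clock cycle, then n channels each firing r spikes/s need 2·n²·r cycles per second, so n ≤ √(f_clk / 2r). The function name and the `cycles_per_cell` parameter are hypothetical.

```python
import math


def max_channels(f_clk_hz, spikes_per_s, cycles_per_cell=1):
    """Largest channel count n such that 2 * n**2 * r * c <= f_clk.

    Assumed cost model: every spike updates 2n matrix cells (row and
    column of the spiking channel), each costing cycles_per_cell cycles.
    """
    return math.isqrt(f_clk_hz // (2 * spikes_per_s * cycles_per_cell))


print(max_channels(100_000_000, 100))  # 707, close to the ~700 quoted
print(max_channels(100_000_000, 50))   # 1000: fewer spikes, more channels
```

The model also shows why the limit depends on the spike rate: halving the rate per channel raises the channel limit by a factor of about √2.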

Figure 2 analyzes the resources occupied by the system as a function of the number of channels processed. Memory is the bottleneck of the system: currently, it is impossible to exceed 512 channels on a Nexys-4 FPGA board. To go beyond this threshold, and thus address more complex cell experiments, it is necessary to switch at least to a mid-range FPGA device.
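Why memory dominates can be estimated from the size of the n × n correlation matrix used by the inter-channel correlation stage. The sketch assumes, hypothetically, that each weight is stored as an 18-bit word in BRAM18 blocks configured as 1K × 18, and it counts only the matrix, not the other buffers. Under these assumptions the matrix grows quadratically and alone needs about 256 BRAM18 at 512 channels, close to the 270 BRAM18 of an XC7A100T, which is in line with the memory bottleneck described above.

```python
import math


def bram18_for_matrix(n_channels, word_bits=18):
    """BRAM18 blocks needed to hold an n x n weight matrix.

    Assumes each BRAM18 is used in 1K x 18 mode (1024 words of up to 18 bits)
    and that wider words are split across ceil(word_bits / 18) blocks.
    """
    words = n_channels * n_channels
    blocks_per_1k_words = math.ceil(word_bits / 18)
    return math.ceil(words / 1024) * blocks_per_1k_words


for n in (64, 128, 256, 512, 700):
    print(n, bram18_for_matrix(n))
```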

V. CONCLUSION

This demonstration highlights an application from the biomedical field whose processing characteristics differ considerably from those of video and communication systems. It also shows that HLS methodologies are now mature enough to generate complete processing systems. However, the achieved performance depends on the quality of the SystemC models and on the user-defined annotations.

ACKNOWLEDGEMENT

The authors thank the SIS GPU at IMS Laboratory for its financial support.

REFERENCES

[1] Xilinx, Vivado Design Suite User Guide, UG902 (v2017.4), 2018

[2] J.H. Siegle, A. Cuevas López, Y.A. Patel, K. Abramov, S. Ohayon and J. Voigts, Open Ephys: an open-source, plugin-based platform for multichannel electrophysiology. J Neural Eng 14: 045003

[3] G. Le Masson, S. Renaud-Le Masson, D. Debay, and T. Bal, Feedback inhibition controls spike transfer in hybrid thalamic circuits. Nature, vol. 417, pp. 854–858, 2002.

[4] A. Quotb, Y. Bornat, M. Raoux, J. Lang, S. Renaud, NeuroBetaMed: A re-configurable wavelet-based event detection circuit for in vitro biological signals. ISCAS, 2012.

[5] E. Casseau, B. Le Gal, P. Bomel, C. Jego, S. Huet and E. Martin, C-based rapid prototyping for digital signal processing. EUSIPCO, 2005.

[6] R.R. Harrison, A low-power integrated circuit for adaptive detection of action potentials in noisy signals, IEEE EMBC, 2003.

[7] Yang Dan, Mu-ming Poo, Spike Timing-Dependent Plasticity of Neural Circuits. Neuron, vol. 44, 2004.

(Figure 2 axes: FPGA BRAM usage (%) versus number of channels, 0 to 2,048; boards: Nexys-4, VC-707, Genesis-2.)

Fig. 2. Number of supported channels depending on the FPGA device


a pair of filters (high-pass filter, Eq. 2.3, and low-pass filter, Eq. 2.4):

$$d^{l}[n] = \sum_{k=1}^{2^{l}} g^{l}[k]\, a^{l-1}[n-k] \qquad (2.3)$$

$$a^{l}[n] = \sum_{k=1}^{2^{l}} h^{l}[k]\, a^{l-1}[n-k] \qquad (2.4)$$

In equations 2.3 and 2.4, l denotes the decomposition level to which the filter belongs. Each level suits a particular use and signal type; the appropriate level generally depends on the SNR, the sampling rate and the spike frequency. To handle several types of signals, a multi-level SWT must be implemented.

However, moving from one level to the next requires twice as many samples (2, 4, 8, 16, ...):

$$d^{1}[n] = g^{1}[1]\, a^{0}[n-1] + g^{1}[2]\, a^{0}[n-2] \qquad (2.5)$$

$$d^{2}[n] = g^{2}[1]\, a^{1}[n-1] + g^{2}[2]\, a^{1}[n-2] + g^{2}[3]\, a^{1}[n-3] + g^{2}[4]\, a^{1}[n-4] \qquad (2.6)$$

Finally, the value provided by the RMS computation is used as the threshold. However, it must be scaled before being compared with the filtered biological signal, as shown in Figure 2.7.
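Equations 2.3 to 2.6 and the RMS threshold can be prototyped in a few lines before writing the SystemC model. In this Python sketch the filter coefficients are placeholders (a 2-tap difference for level 1), the RMS is computed over the whole buffer instead of a sliding window, and the comparison uses the absolute value of the filtered signal; all three are assumptions, as are the function names.

```python
import math


def swt_level(prev, coeffs):
    """One decomposition level following Eq. 2.3/2.4:
    out[n] = sum_{k=1}^{K} coeffs[k-1] * prev[n-k], with K = 2**l taps
    and samples before the start of the buffer taken as zero."""
    out = []
    for n in range(len(prev)):
        acc = 0.0
        for k in range(1, len(coeffs) + 1):
            if n - k >= 0:
                acc += coeffs[k - 1] * prev[n - k]
        out.append(acc)
    return out


def rms(x):
    """Root mean square of a buffer."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def detect(filtered, scale):
    """Indices where |filtered| exceeds the scaled RMS threshold."""
    threshold = scale * rms(filtered)
    return [n for n, v in enumerate(filtered) if abs(v) > threshold]


a0 = [0, 0, 0, 1, 0, 0, 0, 0]   # input with a single spike
g1 = [1.0, -1.0]                # placeholder level-1 high-pass (2 taps)
d1 = swt_level(a0, g1)
print(d1)                       # [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0]
print(detect(d1, scale=1.5))    # [4, 5]
```

Level 2 would reuse `swt_level` on the level-1 approximation with 4 taps, following Eq. 2.6.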

Figure 2.7 – Biological signal before and after HPF & SWT (panels: input, filtered signal, detected events).

2.3 Synthesis and implementation

The signal processing chain presented above was described in SystemC. Modeling it in SystemC and validating it in simulation took 5 weeks.

Normally, the chain is suited to neurons and pancreatic cells, but it


Figure 3.6 – Illustration of the matrix

try to work around the problem; Figure 3.8 shows how the limit evolves as a function of the spike frequency. Some cells also have a spike frequency below 100 Hz and would therefore have a higher limit.

3.5 Implementation results

Using the same parameters as in the first part (64 channels, 10 kHz sampling frequency and a Nexys 4 board), all the systems were implemented. The project uses a total of 10 modules described in SystemC, i.e. about 981 lines of C++ code. The digital architecture generated from these modules uses 1315 LUTs, 1887 FFs, 6 DSPs and 26 BRAMs (18k), roughly 9% of the board's resources. Given the high resource consumption, it is impossible to exceed 64 channels on the Nexys 4. Figure 3.9 shows the evolution of resource usage as a function of the number of channels for different boards.


(read and write), which amounts to 16 operations. In general, each spike leads to 2n cell updates, i.e. 4n operations. The matrices below show the updates caused by a single spike.

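$$
\begin{bmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix}
0 & 0 & \beta_1 & 0\\
0 & 0 & \beta_2 & 0\\
\alpha_1 & \alpha_2 & \alpha_3 + \beta_3 & \alpha_4\\
0 & 0 & \beta_4 & 0
\end{bmatrix}
$$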
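The update pattern shown in the matrices can be written as a short routine. In this hypothetical Python sketch, a spike of neuron j adds the α terms to row j and the β terms to column j. It returns the number of cell updates (2n, the diagonal cell being updated twice) and the number of memory operations if each update is a read followed by a write (4n), which gives 8 updates and 16 operations for n = 4 as in the text. How α and β are computed is left out.

```python
def apply_spike(w, j, alpha, beta):
    """Add alpha to row j and beta to column j of the n x n matrix w.

    Returns (cell updates, memory operations), counting one read and one
    write per cell update. The diagonal cell w[j][j] is updated twice.
    """
    n = len(w)
    updates = 0
    for i in range(n):          # row j: alpha_1 ... alpha_n
        w[j][i] += alpha[i]
        updates += 1
    for i in range(n):          # column j: beta_1 ... beta_n
        w[i][j] += beta[i]
        updates += 1
    return updates, 2 * updates


w = [[0] * 4 for _ in range(4)]
print(apply_spike(w, 2, [1, 2, 3, 4], [10, 20, 30, 40]))  # (8, 16)
for row in w:
    print(row)
```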

3.1.2 The exponential

As seen previously, the weight is modified using the formula $e^{-t/\tau}$, where t is the time between two consecutive spikes. The HLS tool provides library functions that compute the exponential directly; however, the exponential is a complex function that requires many resources and, in some cases, a long latency (Table 3.1).

Table 3.1 – Resource usage of the HLS Maths exponential

                 float        fixed
Quantization     32b          16b
Design           sequential   sequential
LUTs             864          2660
FFs              245          349
BRAMs (18k)      1            3
DSPs             7            1
Latency          16           211

3.1.3 Differential equation

One possible way to simplify the computation is to continuously solve the differential equation whose solution is the exponential:

$$\frac{\partial y(t)}{\partial t} = -\frac{1}{\tau}\, y(t) \qquad (3.3)$$

$$y_n = x_n + \frac{\tau - 1}{\tau}\, y_{n-1} \qquad (3.4)$$
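Equation 3.4 is a first-order recursive filter, so the exponential never has to be evaluated: each new sample only needs one multiply-add. The Python sketch below (function names hypothetical) implements it in floating point and, as an extra assumption not in the text, in integer fixed point with τ a power of two, where the multiplication by (τ − 1)/τ reduces to a shift and a subtraction. For τ = 4 the impulse response 1, 0.75, 0.5625, ... approximates $e^{-n/4}$ ≈ 1, 0.78, 0.61, ...; the approximation improves as τ grows relative to the sampling step.

```python
def exp_trace(x, tau):
    """Eq. 3.4: y[n] = x[n] + ((tau - 1) / tau) * y[n-1]."""
    decay = (tau - 1.0) / tau
    y, out = 0.0, []
    for xn in x:
        y = xn + decay * y
        out.append(y)
    return out


def exp_trace_fixed(x, shift, one=1 << 15):
    """Same recurrence in integers with tau = 2**shift:
    y[n] = x[n]*one + y[n-1] - (y[n-1] >> shift)."""
    y, out = 0, []
    for xn in x:
        y = xn * one + y - (y >> shift)
        out.append(y)
    return out


print(exp_trace([1, 0, 0, 0], 4.0))            # [1.0, 0.75, 0.5625, 0.421875]
print(exp_trace_fixed([1, 0, 0, 0], 2, 1024))  # [1024, 768, 576, 432]
```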

Solving the equation is equivalent to filtering the spikes through a filter whose impulse response is $e^{-t/\tau}$. In this case, each exponential will be transformed into

(Figure panels: Neuron 1, Neuron 2, Influence.)

and at each spike, the weight is modified (incremented or decremented).

This change is computed from the time t elapsed between two successive spikes, using the formula $e^{-t/\tau}$.

Figure 3.2 shows the change a spike can cause as a function of the elapsed time.

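If neuron j spikes:

$$w_{i \to j}(t) = w_{i \to j}(t - dt) + \sum_{\text{spikes}} e^{-\Delta t/\tau} \qquad (3.1)$$

$$w_{j \to i}(t) = w_{j \to i}(t - dt) - \sum_{\text{spikes}} e^{-\Delta t/\tau} \qquad (3.2)$$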
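Equations 3.1 and 3.2 can be sketched as follows, under assumptions not fixed by the text: spike times are in the same unit as τ, the sum runs over all earlier spikes of neuron i, and Δt is the delay between each of those spikes and the current spike of neuron j. The function name and data layout are hypothetical; in hardware the sum could instead be maintained by the recursive filter of Eq. 3.4 rather than recomputed.

```python
import math


def weight_update(w, spike_times, j, t, tau):
    """Apply Eqs. 3.1-3.2 when neuron j spikes at time t.

    For every other neuron i, s = sum over i's earlier spikes of
    exp(-dt / tau); w[i][j] is increased by s and w[j][i] decreased by s.
    """
    for i, times in enumerate(spike_times):
        if i == j:
            continue
        s = sum(math.exp(-(t - ts) / tau) for ts in times if ts < t)
        w[i][j] += s   # Eq. 3.1
        w[j][i] -= s   # Eq. 3.2
    spike_times[j].append(t)


w = [[0.0, 0.0], [0.0, 0.0]]
spikes = [[0.0], []]            # neuron 0 fired at t = 0
weight_update(w, spikes, j=1, t=10.0, tau=10.0)
print(w)                        # w[0][1] = e^-1, w[1][0] = -e^-1
```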

In the culture medium, a large potential may propagate, so an electrode can pick up signals from several cells, which can introduce errors in the weight computation. In what follows, we therefore assume that each electrode corresponds to a single neuron.

3.1 Limit analysis

3.1.1 Some details on this method

Using the method described above, a system takes the spikes as input and computes the weight between each pair of neurons. These weights are stored in an n × n matrix, referred to below as the correlation matrix.

It is important to understand that updating the correlation matrix requires a large number of operations. For example, with 4 neurons, each detected spike requires modifying 8 cells of the matrix, and each modification represents two operations

(Figure label: inter-spike time dependent value update, decreasing exponential penalty.)

explored according to system requirements. This design space exploration can also be applied to other FPGA devices or families. Indeed, drug screening in the pharmaceutical industry could require processing thousands of channels, which would justify other architectural solutions.

B. Relative comparison

Hardware generation from models is fast and enables design space exploration. However, the generated architectures are expected to be less efficient. Consequently, the cost of this flexibility was evaluated. A fixed-point version of the presented processing architecture was previously designed and optimized by hand for a wider system [2]. This hand-made architecture was designed and area-optimized to process 64 channels in parallel. The same setup was applied to the architectures generated from models, on the same FPGA board. Post-PaR results are provided in Table I.

On the one hand, the comparison shows that model-based architectures are less efficient in terms of hardware complexity: all hardware resources are affected.

Nevertheless, the model-based architectures provide a lower latency and thus a higher processing throughput. To close the gap in hardware complexity, it would be necessary to develop a specific model for SWT processing that reproduces the area optimizations applied to the hand-crafted RTL architecture.
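Converting the latencies of Table I into time makes the comparison more concrete. The sketch below divides each latency in cycles by the reported Fmax, i.e. it assumes each design is clocked at its own Fmax (at the 100 MHz target used elsewhere, the values would be correspondingly larger); the dictionary keys are descriptive labels, not names from the paper. All four designs stay far below the 100 µs sample period of a 10 kHz acquisition.

```python
# (latency in cycles, Fmax in MHz) taken from Table I.
TABLE_I = {
    "HLS fixed, pipeline": (72, 200),
    "HLS float, pipeline": (125, 200),
    "HLS fixed, semi-seq.": (664, 166),
    "Hand-made [2], semi-seq.": (3204, 153),
}

SAMPLE_PERIOD_US = 1e6 / 10e3   # 10 kHz sampling -> 100 us


def latency_us(cycles, fmax_mhz):
    """Latency in microseconds when clocked at fmax_mhz."""
    return cycles / fmax_mhz


for name, (cycles, fmax) in TABLE_I.items():
    lat = latency_us(cycles, fmax)
    share = 100 * lat / SAMPLE_PERIOD_US
    print(f"{name:26s} {lat:7.3f} us  ({share:.1f}% of a sample period)")
```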

On the other hand, the HLS-generated architecture is attractive in terms of processing efficiency and also in terms of development time: whereas the hand-crafted architecture required months, the first model-based architecture was developed in weeks. The model-based methodology now enables design space exploration (e.g. switching from a semi-sequential architecture to a pipelined one) in minutes, whereas the hand-crafted architecture requires a partial redesign involving days of work. Finally, the floating-point system consumes more area than the others, but it spares the designer the fixed-point refinement work, which makes testing ideas more efficient.

V. CONCLUSION

In this article, the presented work demonstrates the interest of model-based methodologies for prototyping or designing real-time FPGA architectures for cell activity detection. Indeed, even if a hardware complexity overhead exists, it remains limited.

(Figure 5 panels: number of LUTs and BRAM18 usage (%) versus number of channels, 10² to 10³, for the float and fixed versions.)

Fig. 5. Hardware complexity depending on the data representation format and the number of processed channels.

However, it is important to note that the achieved performance depends heavily on the model quality and on the user's application knowledge.

ACKNOWLEDGEMENT

REFERENCES

[1] G. L. Masson, S. R.-L. Masson, D. Debay, and T. Bal, “Feedback inhibition controls spike transfer in hybrid thalamic circuits,” Nature, vol. 417, pp. 854–858, 2002.

[2] A. Pirog, Y. Bornat, R. Perrier, M. Raoux, M. Jaffredo, A. Quotb, J. Lang, N. Lewis, and S. Renaud, “Multimed: An integrated, multi-application platform for the real-time recording and sub-millisecond processing of biosignals,” Sensors, vol. 18, no. 7, 2018.

[3] J. A. et al., “Design space exploration of LDPC decoders using high-level synthesis,” IEEE Access, 2017.

[4] X. Liu, Y. Chen, T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen, “High level synthesis of complex applications: An H.264 video decoder,” in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016, pp. 224–233.

[5] E. Casseau, B. Le Gal, P. Bomel, C. Jego, S. Huet, and E. Martin, “C-based rapid prototyping for digital signal processing,” in Proceedings of the 13th European Signal Processing Conference (EUSIPCO), 2005, pp. 1–4.

[6] S. K. Jain and B. Bhaumik, “An ultra low power ECG signal processor design for cardiovascular disease detection,” in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, August 2015, pp. 857–860.

[7] X. Liu, Y. J. Zheng, M. W. Phyu, B. Zhao, M. Je, and X. J. Yuan, “A miniature on-chip multi-functional ECG signal processor with 30 µW ultra-low power consumption,” in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, pp. 2577–2580.

[8] A. Quotb, Y. Bornat, M. Raoux, J. Lang, and S. Renaud, “Neurobetamed: A re-configurable wavelet-based event detection circuit for in vitro biological signals,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2012, pp. 1532–1535.

[9] R. Harrison, “A low-power integrated circuit for adaptive detection of action potentials in noisy signals,” in Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 4, 2003, pp. 3325–3328.

[10] A. Maccione, M. Gandolfo, P. Massobrio, A. Novellino, S. Martinoia, and M. Chiappalone, “A novel algorithm for precise identification of spikes in extracellularly recorded neuronal signals,” Journal of Neuroscience Methods, vol. 177, no. 1, pp. 241–249, 2009.

[11] G. Martin and G. Smith, “High-level synthesis: Past, present, and future,” IEEE Design & Test of Computers, 2009.

[12] P. Coussy and A. Morawiec, Eds., High-Level Synthesis from Algorithm to Digital Circuit. Springer, 2008.

[13] Xilinx, Vivado Design Suite User Guide - High-Level Synthesis, ug902 (v2017.1) ed., 2017.

TABLE I
HARDWARE COMPARISON WITH HANDMADE DESIGN

                   HLS generated                         [2]
                   fixed        float        fixed       fixed
Quantization       18b          32b          18b         –
Design             pipeline     pipeline     semi-seq.   semi-seq.
LUTs               1003         2899         979         352
FFs                1242         3935         1097        181
BRAMs              7.5          6            6.5         5
DSPs               6            43           4           2
Fmax (MHz)         200          200          166         153
Latency (cycles)   72           125          664         3204