Automatic labeling of cortical sulci using patch- or CNN-based segmentation techniques combined with bottom-up geometric constraints


HAL Id: hal-02517321

https://hal.archives-ouvertes.fr/hal-02517321

Submitted on 24 Mar 2020



Léonie Borne, Denis Rivière, Martial Mancip, Jean-François Mangin

To cite this version:

Léonie Borne, Denis Rivière, Martial Mancip, Jean-François Mangin. Automatic labeling of cortical sulci using patch- or CNN-based segmentation techniques combined with bottom-up geometric constraints. Medical Image Analysis, Elsevier, 2020, 62, pp.101651. ⟨10.1016/j.media.2020.101651⟩. ⟨hal-02517321⟩


Contents lists available at ScienceDirect

Medical Image Analysis

journal homepage: www.elsevier.com/locate/media


Léonie Borne a,∗, Denis Rivière a, Martial Mancip b, Jean-François Mangin a

a Université Paris-Saclay, CEA, CNRS, Neurospin, Baobab, Gif-sur-Yvette, 91191, France

b Maison de la Simulation, CNRS, CEA Saclay, Gif-sur-Yvette, 91191, France

Article info

Article history: Received 28 May 2019 Revised 14 January 2020 Accepted 16 January 2020 Available online 28 February 2020

Keywords: Convolutional neural network; Multi-atlas segmentation; Cortical sulci labeling

Abstract

The extreme variability of the folding pattern of the human cortex makes the recognition of cortical sulci, both automatic and manual, particularly challenging. Reliable identification of the human cortical sulci in their entirety is extremely difficult and is practiced by only a few experts. Moreover, these sulci correspond to more than a hundred different structures, which makes manual labeling long and fastidious and therefore limits access to the large labeled databases needed to train machine learning models. Here, we seek to improve the current model proposed in the Morphologist toolbox, a widely used sulcus recognition toolbox included in the BrainVISA package. Two novel approaches are proposed: patch-based multi-atlas segmentation (MAS) techniques and convolutional neural network (CNN)-based approaches. Both are currently applied to anatomical segmentation because they embed much better representations of inter-subject variability than approaches based on a single template atlas. However, these methods typically focus on voxel-wise labeling, disregarding certain geometrical and topological properties of interest for sulcus morphometry. Therefore, we propose to refine these approaches with domain-specific bottom-up geometric constraints provided by the Morphologist toolbox. These constraints are used to assign a single sulcus label to each topologically elementary fold, the building blocks of the pattern recognition problem. To eliminate the shortcomings associated with Morphologist's pre-segmentation into elementary folds, we complement this regularization scheme with a top-down perspective that triggers an additional cleavage of the elementary folds when required. All the newly proposed models outperform the current Morphologist model, the most efficient being a CNN U-Net-based approach that carries out sulcus recognition within a few seconds.

1. Introduction

The surface of the brain is divided into many convolutions, called gyri, delimited by folds, called sulci. The main sulci are considered as the limits between functionally and architecturally different regions. Additionally, cortex morphometry is used to quantify brain development and degenerative diseases. Despite the many tools available for 3D visualization of sulci, sulci labeling is a long and fastidious process. It takes several hours for an expert to label all sulci in a single brain, and reliable labeling requires the opinion of several experts. Yet, because of the large variability of the folding pattern in the general population, inferring developmental biomarkers requires mining data from a large number of brains. These biomarkers may correspond to characteristics of the sulci, such as size, depth or opening. However, these measures require the prior labeling of sulci. Therefore, automating sulcus recognition is essential.

∗ Corresponding author.

E-mail address: [email protected] (L. Borne).

Nevertheless, learning to label sulci is an extremely complex challenge for several reasons. First, as illustrated in Fig. 1, sulci are highly variable structures: some sulci are absent in more than 70% of brains, and some subjects have up to 8 sulci missing. Additionally, each brain contains more than 120 different sulci, and only a small number of segmentation algorithms are designed to handle so many structures. Finally, the number of manually labeled subjects that can be used for supervised learning is limited.

https://doi.org/10.1016/j.media.2020.101651


Fig. 1. Illustration of cortical folds variability. The manual labeling of the three right hemispheres represented here shows the variability of cortical sulci by their shape, size, and position.

1.1. Overview of automatic sulci recognition methods

Algorithms dedicated to automatic sulci recognition are primarily based on graph representations, which encode the relative positions of the sulci with respect to each other as well as their location in a standardized space (Royackkers et al., 1998; Riviere et al., 2002; Vivodtzev et al., 2006; Shi et al., 2007; Yang and Kruggel, 2009; Belaggoune et al., 2014). To make recognition robust, other methods have modeled inter-subject variability using several frameworks ranging from principal component analysis to Bayesian approaches (Lohmann and von Cramon, 2000; Behnke et al., 2003; Fischl et al., 2004; Perrot et al., 2011). All of these methods combine a segmentation algorithm with a classification algorithm: the sulci are first extracted, according to different representations, then labeled.

In this paper, the objective is to improve the model proposed in the BrainVISA/Morphologist package (Perrot et al., 2011). To do this, we focused on two aspects of the pipeline: on the one hand, the sulci labeling algorithm and, on the other hand, the regularization of the results. Note that we did not try to improve the sulci extraction algorithm.

1.2. New sulci labeling approaches: MAS and CNN

Currently, the sulci labeling model proposed in the BrainVISA/Morphologist package, referred to as the Statistical Probabilistic Anatomy Map (SPAM) model in this paper, is based on a Bayesian approach. As this labeling model has shown significant weaknesses, we drew inspiration from two of the most widely used segmentation approaches for biomedical applications today: multi-atlas segmentation (MAS) and convolutional neural networks (CNNs).

MAS techniques, initially introduced by Rohlfing et al. (2004), use each manually segmented image as an atlas: the atlases are registered to the image to be segmented, and the best-matching ones contribute to the segmentation. Because they do not model the segmentation problem with a single average model, MAS techniques represent anatomical variability more accurately. These techniques are now widely used, but they have a major disadvantage: registering the atlases to the images is particularly expensive.
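To make the MAS principle concrete, the sketch below implements its simplest variant in Python/NumPy: the atlases are assumed to be already registered to the target (the expensive step is omitted), the k most similar atlases are kept according to a global sum of squared intensity differences (an assumed similarity measure), and their labels are fused by per-voxel majority vote. The function name `mas_majority_vote` is hypothetical; this is an illustration of generic MAS, not the method proposed in this paper.

```python
import numpy as np

def mas_majority_vote(target, atlas_images, atlas_labels, k=3):
    """Label `target` by majority vote over the k most similar atlases.

    Assumes every atlas image/label volume is already registered to
    `target` (the costly registration is not shown) and labels are ints >= 0.
    """
    images = np.asarray(atlas_images, dtype=float)
    labels = np.asarray(atlas_labels, dtype=int)
    # Global similarity: sum of squared differences to the target image.
    ssd = ((images - target[None]) ** 2).reshape(len(images), -1).sum(axis=1)
    best = np.argsort(ssd)[:k]  # indices of the k best-matching atlases
    # Count, at every voxel, how many selected atlases propose each label.
    n_labels = labels.max() + 1
    votes = np.zeros((n_labels,) + target.shape, dtype=int)
    for a in best:
        for lab in range(n_labels):
            votes[lab] += labels[a] == lab
    return votes.argmax(axis=0)  # most voted label per voxel
```

Atlas selection is what lets MAS ignore atlases whose anatomy differs strongly from the target, instead of averaging them in as a single template would.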

Among the many variations of these techniques, the patch-based approach introduced by Coupé et al. (2011) and Rousseau et al. (2011) has particularly attracted our attention. By using a patch-based search strategy to identify matches with the atlases, the image no longer needs to be aligned globally with all the atlases via expensive non-linear registration. The registration and selection of matching patches can then be greatly accelerated thanks to the Optimized PatchMatch algorithm proposed by Ta et al. (2014). This algorithm adapts to 3D image segmentation the PatchMatch algorithm (Barnes et al., 2009), which aims to assign to each patch of an image a similar patch in another image.
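The PatchMatch principle referred to above can be sketched as follows: a generic 2D version in the spirit of Barnes et al. (2009), with random initialization, propagation of good matches from already visited neighbours, and random search in windows of decreasing radius. This is an illustrative sketch under our own assumptions (2D images, sum-of-squared-differences cost, hypothetical function names); it is neither the Optimized PatchMatch of Ta et al. (2014) nor the skeleton-restricted 3D variant used in this work.

```python
import numpy as np

def patch_ssd(A, B, ay, ax, by, bx, r):
    """SSD between the (2r+1)x(2r+1) patches centred at (ay, ax) in A and (by, bx) in B."""
    pa = A[ay - r:ay + r + 1, ax - r:ax + r + 1]
    pb = B[by - r:by + r + 1, bx - r:bx + r + 1]
    return float(((pa - pb) ** 2).sum())

def patchmatch(A, B, r=1, iters=4, seed=0):
    """Return a nearest-neighbour field (patch centre in B for each valid centre
    of A) and its SSD cost; invalid (border) positions keep an infinite cost."""
    rng = np.random.default_rng(seed)
    (H, W), (Hb, Wb) = A.shape, B.shape
    nnf = np.zeros((H, W, 2), dtype=int)
    cost = np.full((H, W), np.inf)

    def try_match(y, x, by, bx):
        # Accept candidate (by, bx) only if its patch fits in B and lowers the cost.
        if r <= by < Hb - r and r <= bx < Wb - r:
            d = patch_ssd(A, B, y, x, by, bx, r)
            if d < cost[y, x]:
                nnf[y, x] = (by, bx)
                cost[y, x] = d

    # 1) Random initialization.
    for y in range(r, H - r):
        for x in range(r, W - r):
            try_match(y, x, rng.integers(r, Hb - r), rng.integers(r, Wb - r))
    for it in range(iters):
        s = 1 if it % 2 == 0 else -1  # alternate the scan direction
        for y in range(r, H - r)[::s]:
            for x in range(r, W - r)[::s]:
                # 2) Propagation: shift the match of the previously visited neighbour.
                for dy, dx in ((s, 0), (0, s)):
                    ny, nx = y - dy, x - dx
                    if r <= ny < H - r and r <= nx < W - r:
                        try_match(y, x, nnf[ny, nx, 0] + dy, nnf[ny, nx, 1] + dx)
                # 3) Random search around the current best, halving the radius.
                rad = max(Hb, Wb)
                while rad >= 1:
                    try_match(y, x,
                              nnf[y, x, 0] + rng.integers(-rad, rad + 1),
                              nnf[y, x, 1] + rng.integers(-rad, rad + 1))
                    rad //= 2
    return nnf, cost
```

Because a candidate is only accepted when it lowers the cost, every iteration can only improve the field, which is why a few passes are usually enough.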

Inspired by these approaches, we propose two algorithms for cortical sulci recognition. The first is directly inspired by Romero et al. (2017), who propose a cerebellum lobule segmentation method using an approach similar to the one originally proposed by Coupé et al. (2011) and Rousseau et al. (2011), with some improvements. In the second algorithm, we propose a new patch generation strategy based on a high-level representation of the sulci, as the standard way of extracting cubic patches does not seem capable of optimally exploiting the sulci geometry and the relations between them, which we believe to be the discriminative features for their recognition. These two algorithms will be designated respectively by PMAS (for Patch-based MAS) and HPMAS (for Patch-based MAS with High-level representation of the data).

CNNs were initially developed to address image classification problems and are now renowned for their effectiveness on numerous computer vision problems. These techniques allow effective image analysis by learning an abstract representation of the image. For segmentation, the first approach was proposed by Ciresan et al. (2012), where a neural network was trained to classify each voxel of the image to be segmented from its surrounding patch. Since then, new approaches have made it possible to segment the entire image using fully convolutional neural networks, such as the one initially proposed by Long et al. (2015) for semantic segmentation. For segmentation problems in medical imaging, the most commonly used architecture is the U-Net, a fully convolutional neural network initially proposed by Ronneberger et al. (2015), whose adaptation to 3D images was proposed in (Çiçek et al., 2016; Milletari et al., 2016). Here, we propose to compare two approaches based on CNNs. The first is inspired by Ciresan et al. (2012), adapted to address problems associated with 3D imaging. The second uses the 3D U-Net architecture proposed in (Çiçek et al., 2016). These two approaches will be called PCNN (for Patch-based CNN) and UNET, respectively. To the best of our knowledge, despite their current popularity, no MAS or CNN-based approach has yet been proposed for cortical sulci recognition. Note that these two approaches are generally used to segment the entire image, whereas in this study only the pre-segmented folds need to be labeled, which requires several adjustments to the proposed models.

1.3. Bottom-up geometric constraints

There is no guarantee that the geometric definition of a sulcus, as a set of topologically simple surfaces, is respected by the MAS and CNN-based methods described above. This is particularly disadvantageous for morphometric studies whose measurements are based on the definition of sulci. To remedy this, the BrainVISA/Morphologist pipeline provides an algorithm for the bottom-up aggregation of voxels into elementary folds, which are the geometric building blocks of the problem. Once the voxels have been labeled by one of the methods proposed above, the results can be regularized at the scale of the elementary folds. However, the upstream extraction of the elementary folds may sometimes be inaccurate. Even from the same MRI, vastly different fragmentations can be obtained because of stochastic optimizations embedded in the pipeline. This was previously a problem in the model proposed in (Perrot et al., 2011), which uses the same geometric entities to perform recognition but cannot automatically re-divide the elementary folds.

In this paper, we propose to use voxel-wise labeling to give a top-down perspective to a traditional bottom-up pattern recognition system. Thus, the initial cutting into elementary folds proposed by BrainVISA/Morphologist is challenged by voxel-wise labeling, eliminating under-segmentation errors in the model. The proposed approach is particularly robust to the spatial inconsistencies that can occur during voxel labeling and to a potentially incorrect definition of the upstream geometric entities.
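The bottom-up part of this regularization can be illustrated with a short sketch, assuming that voxel-wise label scores are summed over each elementary fold and that the best-scoring label is then given to the whole fold. This aggregation rule and the function name are assumptions made for illustration; the top-down re-division of folds (Section 3.3) is not modeled here.

```python
import numpy as np

def regularize_by_fold(scores, fold_ids):
    """Give every elementary fold a single label.

    scores:   (n_voxels, n_labels) voxel-wise label scores (e.g. probabilities)
    fold_ids: (n_voxels,) elementary-fold index of each skeleton voxel
    Returns one label per voxel, constant within each fold.
    """
    scores = np.asarray(scores, dtype=float)
    fold_ids = np.asarray(fold_ids)
    out = np.empty(len(fold_ids), dtype=int)
    for f in np.unique(fold_ids):
        members = fold_ids == f
        # Sum the voxel scores over the fold and keep the best label.
        out[members] = scores[members].sum(axis=0).argmax()
    return out
```

For example, a fold whose voxels mostly, but not unanimously, favour label 0 is relabeled entirely as 0, which removes isolated voxel-level inconsistencies.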

2. Database

The training base is composed of 62 healthy brains selected from different heterogeneous databases and labeled with a nomenclature containing 63 labels for the right hemisphere and 64 for the left hemisphere. The “unknown” label is used to designate unidentified structures (usually small sulci). The two ventricles are labeled but not considered as sulci. Most of the subjects are right-handed men aged 25 to 35 years.

Unfortunately, there is no gold standard definition of sulci morphology. Even the boundaries of the well-known central sulcus can be difficult to define (Fig. 2). Moreover, Fig. 2 shows that the definition of sulci morphology impacts the level of granularity of the nomenclature. Therefore, for this study, the elementary folds of each brain were manually labeled according to a sulcus nomenclature following a long iterative process to achieve a consensus across a panel of several experts on cortex morphology. The last iteration of the database labeling was performed using the TileViz visualization tool (Mancip et al., 2018). This tool allows the entire database to be visualized and labeled simultaneously on a wall of screens (see Fig. 19 in the supplementary material). Until now, it was only possible to label and simultaneously evaluate a limited number of hemispheres, generally four, on a standard screen. This tool therefore helps to limit the labeling bias induced by a restricted view of the database. To support this new iteration, the elementary folds were manually cut when necessary, which was not possible during the study of Perrot et al. (2011).

Fig. 2. Where should the central sulcus end? The folds that may belong to the central sulcus are shown in red. Limits 1 or 2 can be chosen according to the morphological definition of the central sulcus used. Note that depending on the definition chosen, the question then arises of adding a label to the nomenclature to identify the sulcus located between boundaries 1 and 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Note that, compared to traditional labeling approaches where only one expert labels the images, this database has been progressively labeled by several experts, both successively and simultaneously. This consensus-based labeling has sometimes led to the introduction of new sulci labels when it was considered necessary, making the video wall essential. However, the different experts have therefore not produced independent labelings, which prevents us from assessing human-level performance on this dataset.

Compared to (Perrot et al., 2011), the same MRI acquisitions were used, but a new iteration of labeling was performed, resulting in the introduction of four new sulci in the nomenclature used. The new nomenclature is described in Fig. 3. A more detailed description is provided in Fig. 23 of the supplementary material. The manually labeled database is now available on the BrainVISA website (http://brainvisa.info/data/sulci_database/base_62/2019).

3. Method

The Morphologist/BrainVISA pipeline presented in (Perrot et al., 2011) has two major deficiencies. First, the SPAM model of sulci labeling makes obvious labeling errors that are problematic in practice; typically, it tends to duplicate the central sulcus, which is an aberration. Second, the model uses bottom-up geometric constraints to group the voxels to be labeled into elementary folds, and this step is subject to errors. In this article, we therefore seek to improve the performance of the sulci labeling model and its robustness to under-segmentation errors in elementary folds.

In this section, sulci labeling from an MRI is described in three steps (Fig. 4). First, the folds are segmented from the MRI using the BrainVISA/Morphologist pipeline (Section 3.1). Then, they are labeled using different algorithms (Section 3.2). Finally, the agglomeration of the voxels into elementary folds proposed by the BrainVISA/Morphologist pipeline is used to regularize the results (Section 3.3).

Note that the strategies used to set the method hyperparameters are detailed in the supplementary material.

3.1. Folds representation

The Morphologist pipeline of the BrainVISA software (www.brainvisa.info), a widely used resource for studying cortical anatomy, first represents the folds as a set of voxels corresponding to a skeleton of the cerebrospinal fluid filling the fold, and then labels them using the SPAM model (Fig. 5). This first step of fold segmentation is common to all the models presented in this article. It consists of three major steps: first, the segmentation of white and grey matter from MRI, then the extraction of the skeleton of cortical folds, followed by its division into elementary folds. The fragmentation into elementary folds satisfies topological and geometric constraints specific to the sulci's definition. It is first based on the topological characterization of a simple surface proposed by Malandain et al. (1993), which isolates surface pieces that do not include any junction. The skeleton is also fragmented at the level of the buried gyri (Fig. 6).

Fig. 3. New nomenclature used to label sulci. The visualization of the sulci labels is done thanks to the SPAM representation used by Perrot et al. (2011), which averages the position of the sulci as probability maps that are thresholded for this image. The new nomenclature includes 63 labels for the right hemisphere and 64 for the left hemisphere. Only the left hemisphere is represented in this figure. The right hemisphere has the same labels except the S.GSM. label. Compared to Perrot et al. (2011), two new sulci are labeled (S.intraCing. and S.R.sup.). The ventricle label does not correspond to a sulcus label, but belongs to the fold skeleton extracted by the BrainVISA/Morphologist toolbox. Only the “unknown” label is not shown in this figure. Please refer to Fig. 23 of the supplementary material for English translations of each label.

Fig. 4. MRI to labeled cortical sulci: a three-step pipeline. First, the fold skeleton is extracted using the BrainVISA/Morphologist toolbox. This toolbox also makes it possible to fragment the skeleton into elementary folds. Second, skeleton voxels are labeled by different algorithms. Algorithms based on MAS techniques (PMAS, HPMAS) and CNN-based algorithms (PCNN, UNET) label each skeleton voxel, while the SPAM algorithm directly labels the elementary folds. Third, voxelwise labeling is regularized through the elementary folds, while automatically re-dividing them when the labeling indicates that it is required.

The skeleton representation has three main advantages. First, this 3D representation is essential during manual labeling because it allows the visualization of the relative position of the sulci with respect to each other and the evaluation of their depth, size, etc. Additionally, the agglomeration of the voxels into elementary folds speeds up labeling by giving a label to a set of voxels rather than to individual voxels. Second, as the data are strongly influenced by the type of MRI sequence, the age of the subjects (which has a significant impact on the opening of the sulci) or even their pathologies, this pre-processing provides an effective normalization of the data. Moreover, since the folds are segmented beforehand, the algorithm can focus on labeling alone. Finally, this representation has previously been used in other pipelines to automate the calculation of measurements (depth, length, connectivity, etc.) used in morphometric studies or to realign the brains according to the major sulci (Auzias et al., 2011; 2013), which is why we have chosen to keep it. Conversely, had we built a model that both extracts and labels the sulci without relying on this representation, its results would very probably not conform to the representation used by these pipelines, and significant post-processing steps would be necessary.

Fig. 5. A computer vision pipeline mimicking a human anatomist (Mangin et al., 2015). A: interface between the cerebral envelope and the cortex. B: interface between white matter and grey matter. C: extraction of the fold skeleton. D: cutting of the skeleton into elementary folds. E: folds labeling using the SPAM model of Perrot et al. (2011).

Fig. 6. Schematic representation of the fold skeleton. The fragmentation into elementary folds isolates the internal and external branches and cuts the skeleton at the level of the buried gyri. Image taken from Riviere et al. (2002).

Although the extraction of the fold skeleton is robust, its fragmentation into elementary folds shows significant instabilities: vastly different fragmentations can be observed from the same MRI (Fig. 7). Several stochastic optimizations are included in the segmentation pipeline (e.g. for bias correction, brain masking, skeletonization, etc.). These optimizations have only a slight impact on the shape of the resulting fold skeleton. However, for the topological fragmentation into elementary folds, a single voxel can make the difference. Thus, these stochastic optimizations can have important consequences on the fragmentation of large simple surfaces. To remedy this, during manual labeling, the folds were cut manually when necessary. During automatic labeling, we propose a technique, based on a clustering algorithm, to automatically re-divide the elementary folds from a voxelwise labeling during the regularization step. This technique is described in Section 3.3.

Fig. 7. Extraction of the elementary folds from the same MRI. In the two lower brains, each color represents a different elementary fold. We observed that the skeleton extraction is visually stable, but its division into elementary folds can produce very different results. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
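Why a single voxel can matter is easy to illustrate. The toy example below uses plain 26-connected components, a much cruder criterion than the simple-surface characterization of Malandain et al. (1993) used by Morphologist, to show that one voxel can merge two folds or, if it disappears, split one. The geometry and function name are illustrative assumptions.

```python
from collections import deque
import itertools

# The 26 neighbour offsets of a voxel in a 3D grid.
OFFSETS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def count_components(voxels):
    """Number of 26-connected components in a set of (x, y, z) voxels (BFS)."""
    remaining, n = set(voxels), 0
    while remaining:
        n += 1
        queue = deque([remaining.pop()])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
    return n

# Two flat 4 x 5 surface pieces in the plane z = 0, two voxels apart along x...
plate_a = {(x, y, 0) for x in range(0, 4) for y in range(5)}
plate_b = {(x, y, 0) for x in range(5, 9) for y in range(5)}
# ...joined by a single bridging voxel.
bridge = {(4, 2, 0)}

print(count_components(plate_a | plate_b | bridge))  # 1: one fold
print(count_components(plate_a | plate_b))           # 2: two folds
```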

3.2. Labeling methods

The methods described below seek to automatically label the voxels of the fold skeleton. While most of the possible labels correspond to cortical sulci, three other labels are used: those corresponding to the right and left ventricles, and the “unknown” label. In the methods presented here, the ventricles are treated as sulci, as they are relatively stable anatomical structures of the brain's negative mold. However, the “unknown” label, corresponding to voxels that do not belong to any of the other labeled structures, must be treated differently in some cases.

3.2.1. Statistical probabilistic anatomy map (SPAM) models

In this comparative study, the reference method is the one described in (Perrot et al., 2011), where the authors propose a coherent Bayesian framework to automatically identify sulci based on a probabilistic atlas (a mixture of SPAM models) while simultaneously estimating normalization parameters. This method, currently available in the BrainVISA/Morphologist pipeline, has been widely used on very large databases for large-scale morphometric studies (Le Guen et al., 2019). However, the model still makes obvious errors, and we believe that this is because the SPAM approach is based on a single template atlas, which prevents it from fully representing the high variability of folding patterns. Each sulcus can have several configurations, which may prove difficult to represent with a single average model.
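The averaging effect of a single-template model can be illustrated with a deliberately simplified sketch: one probability map per label, obtained by averaging label indicators over aligned atlases, followed by a voxel-wise argmax on the skeleton. This is an assumption-laden reduction of the SPAM idea, not the method of Perrot et al. (2011): it omits the Bayesian framework, the mixture of models and the joint estimation of normalization parameters, and the function names are hypothetical.

```python
import numpy as np

def build_spams(atlas_labels, n_labels):
    """One probability map per label: fraction of aligned atlases in which
    each voxel carries that label. Output shape: (n_labels, *volume_shape)."""
    labels = np.asarray(atlas_labels)  # (n_atlases, *volume_shape)
    return np.stack([(labels == k).mean(axis=0) for k in range(n_labels)])

def label_with_spams(spams, skeleton_mask):
    """Give each skeleton voxel its most probable label; -1 elsewhere."""
    skeleton_mask = np.asarray(skeleton_mask, dtype=bool)
    out = np.full(skeleton_mask.shape, -1, dtype=int)
    out[skeleton_mask] = spams[:, skeleton_mask].argmax(axis=0)
    return out
```

If a sulcus occupies two distinct positions across atlases, its probability mass is split between two weaker blobs, which illustrates why a single average model struggles with multiple folding configurations.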

3.2.2. MAS approaches

Two MAS approaches, PMAS and HPMAS, are compared in this section. The first approach is largely inspired by the one proposed in (Romero et al., 2017) in which, unlike most MAS approaches, matches are searched for between cubic patches instead of between full images. The second MAS algorithm presented here, described in Borne et al. (2018), aims to define a library of local patches embedding enough geometrical information to minimize ambiguities when searching for a high-similarity hit in the unknown subject morphology. Therefore, instead of taking native cubic patches, this algorithm builds virtual patches containing whole sulci.

Fig. 8. Comparison of MAS approaches: PMAS vs. HPMAS. First, the patches are designed. Second, they are transferred to a new image to be labeled, where the fold skeleton has been extracted. Third, the best matches are selected and the patch labels are propagated to the image to be labeled. Finally, the propagated labels are used to calculate the label score maps. To make the figures as readable as possible, we have chosen to represent the images in 2D, while they are processed in 3D. All images are represented at 2 × 2 × 2 mm resolution, while for HPMAS, images are processed at the acquisition resolution. The acronym ANNs refers to the Approximate Nearest Neighbors patches obtained by multiple runs of the Optimized PatchMatch (OPM) algorithm.

These two approaches are described in four steps (Fig. 8): first, the design of the patches (patch generation); second, the strategy for realigning the patches and selecting the best matches (distance calculation); third, the strategy for propagating the labels from the patch to the brain to be labeled (label propagation); and finally, the combination of the labels of the propagated patches (label fusion).

Patch-based MAS approach (PMAS)

Patch generation. The patches are cubes containing the fold skeleton. They are extracted from images with a resolution of 2 × 2 × 2 mm that have been automatically aligned, by the BrainVISA/Morphologist pipeline, to the well-known MNI space (Collins et al., 1994), which aligns the rough shapes of the brains through an affine transformation. We chose to harmonize the resolution of the images at 2 × 2 × 2 mm because it seemed sufficient to recognize the sulci visually.

We chose to consider only the patches whose central voxel belongs to the fold skeleton, for two main reasons. First, it limits the number of patch matches that require optimization, as the skeleton voxels represent only a small part of the image's voxels. Second, since the patches are extracted from binarized images, the distance between two patches is meaningful only if the patches contain a minimum number of skeleton voxels.

As proposed in (Giraud et al., 2016), we adopted a multi-scale approach, which involves the independent use of several patch sizes (determined by inner cross-validation), to produce several score maps per label, which are then averaged.
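The patch generation step can be sketched as follows, assuming the skeleton is a binary NumPy volume already resampled to 2 × 2 × 2 mm in MNI space. Only skeleton voxels serve as patch centres, and cubes of several half-sizes are extracted independently to mirror the multi-scale scheme. The half-sizes in the usage line are placeholders, since the actual sizes were set by inner cross-validation; the function name is hypothetical.

```python
import numpy as np

def extract_skeleton_patches(skeleton, half_sizes):
    """Cubic patches centred on every skeleton voxel, for several sizes.

    skeleton:   3D boolean array (binarized fold skeleton)
    half_sizes: e.g. (2, 3) gives 5^3 and 7^3 patches
    Returns {h: (centres, patches)} with patches of shape (n, 2h+1, 2h+1, 2h+1).
    """
    skeleton = np.asarray(skeleton, dtype=bool)
    centres = np.argwhere(skeleton)  # only skeleton voxels are patch centres
    out = {}
    for h in half_sizes:
        padded = np.pad(skeleton, h)  # zero (False) padding avoids border checks
        # Voxel (x, y, z) sits at (x+h, y+h, z+h) in the padded volume.
        patches = np.stack([
            padded[x:x + 2 * h + 1, y:y + 2 * h + 1, z:z + 2 * h + 1]
            for x, y, z in centres
        ])
        out[h] = (centres, patches)
    return out

# Placeholder usage: patches = extract_skeleton_patches(skeleton, half_sizes=(2, 3))
```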

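Distance calculation. In order to find the most similar set of patches, we aimed to minimize the following distance d between two patches $P(S_A)$ and $P(S_B)$, respectively belonging to the fold skeletons $S_A$ and $S_B$ (superimposed by a simple translation):

$$
d\big(P(S_A) \rightarrow S_B\big) = \frac{1}{|P(S_A)|} \sum_{p_A \in P(S_A)} \min_{p_B \in S_B} \left[ d_E^2(p_A, p_B) \right] \qquad (2)
$$

where $d_E$ denotes the Euclidean distance and $|P(S_A)|$ the number of skeleton voxels in the patch. Note that, in order to avoid border effects, the closest neighbor of $p_A$ is searched for in the entire skeleton $S_B$ and not only among the skeleton voxels contained in the patch $P(S_B)$.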
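Eq. (2) translates directly into code. The sketch below assumes that patches and skeletons are given as arrays of voxel coordinates, with the patch already translated onto $S_B$, and uses a brute-force nearest-neighbour search for clarity (a k-d tree would be the usual choice for full skeletons). The function name is hypothetical.

```python
import numpy as np

def pmas_distance(patch_points, skeleton_points):
    """Eq. (2): mean over p_A in P(S_A) of the squared Euclidean distance
    to the closest voxel of the *whole* skeleton S_B.

    patch_points:    (n, 3) coordinates of the skeleton voxels in P(S_A)
    skeleton_points: (m, 3) coordinates of all voxels of S_B
    """
    a = np.asarray(patch_points, dtype=float)
    b = np.asarray(skeleton_points, dtype=float)
    # Pairwise squared distances, shape (n, m).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(sq.min(axis=1).mean())
```

Searching over all of `skeleton_points`, not only over the voxels falling inside the matched cube, is what avoids the border effects mentioned above.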

Realigning and comparing all the patches in the database for each skeleton voxel to be labeled would be extremely expensive, making it impossible to label within a reasonable time. Additionally, it would increase the probability of spurious matches between remote areas of the brain, whereas the images are already roughly aligned with each other. It is important to note that because we use binarized images, the risk of obtaining false positives is higher than usual.

In (Romero et al., 2017), the Optimized PatchMatch Label fusion (OPAL) (Ta et al., 2014; Giraud et al., 2016) was used. This segmentation method is based on the Optimized PatchMatch (OPM) algorithm, which uses a cooperative and random strategy resulting in a very low computational burden. Compared to the PatchMatch algorithm (Barnes et al., 2009) from which it is inspired, OPM is adapted to 3D anatomical segmentation by taking into account the rough alignment of images. Here, as only patches whose central voxel belongs to the fold skeleton are considered, an adapted version of the OPM algorithm has been implemented. Please refer to the supplementary material for more details.

Fig. 9. Calculation of the distance from the patch $P(S_A)$ to the skeleton $S_B$ for the PMAS method. The grey voxel represents the central voxel of the patch $P(S_A)$, which is superposed with a voxel of the skeleton $S_B$. For each voxel $p_A \in P(S_A)$, we look for its closest neighbor among the voxels of the skeleton $S_B$. The Euclidean distance between these two voxels is calculated. The distances over all the points $p_A \in P(S_A)$ and their nearest neighbors are then averaged to obtain $d(P(S_A) \rightarrow S_B)$.

Label propagation. In order to select several Approximate Nearest Neighbor (ANN) patches per skeleton voxel for a given patch size, multiple independent OPM runs were launched. The number of ANNs to be selected is determined by inner cross-validation. Once the ANNs have been selected, all the voxels of each ANN patch participate in the labeling, as done in (Rousseau et al., 2011; Giraud et al., 2016). However, only a few skeleton voxels of the patch overlap with the skeleton voxels to be labeled. We therefore propose to propagate the label of each skeleton voxel of the patch to its nearest neighbor in the skeleton to be labeled.
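The nearest-neighbour propagation rule can be written compactly, assuming coordinates are already expressed in the space of the brain to be labeled. Each skeleton voxel of a matched patch casts one vote, for its own label, on the closest skeleton voxel of the target, so no vote is lost when the two skeletons do not overlap exactly. The function name and array layout are illustrative.

```python
import numpy as np

def propagate_patch_labels(patch_points, patch_labels, target_points, n_labels):
    """Votes on the target skeleton obtained by sending the label of every
    skeleton voxel of a matched (ANN) patch to its nearest target voxel.

    patch_points:  (n, 3) skeleton voxels of the ANN patch, in target space
    patch_labels:  (n,) their manual labels (ints in [0, n_labels))
    target_points: (m, 3) skeleton voxels of the brain to be labeled
    Returns an (m, n_labels) array of vote counts.
    """
    a = np.asarray(patch_points, dtype=float)
    t = np.asarray(target_points, dtype=float)
    sq = ((a[:, None, :] - t[None, :, :]) ** 2).sum(axis=-1)
    nearest = sq.argmin(axis=1)  # closest target voxel for each patch voxel
    votes = np.zeros((len(t), n_labels))
    # Unbuffered add so that repeated (voxel, label) pairs are all counted.
    np.add.at(votes, (nearest, np.asarray(patch_labels)), 1.0)
    return votes
```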

Label fusion. For this method, we have implemented the non-local patch-based label fusion used in (Romero et al., 2017). In this strategy, the distance between patches is used to perform a robust weighted average of the labels. The label fusion strategy corresponds to the multipoint estimation described in (Rousseau et al., 2011). Once the non-local means estimator has been calculated for all patch sizes, the final estimation is obtained by averaging these estimations through a late fusion (Snoek et al., 2005). Thus, a score map is estimated for each label in the database.

The “unknown” label, present in the manually labeled database, is treated like a sulcus label.
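A sketch of this fusion stage, under stated assumptions: each match contributes its label votes weighted by exp(-d/h), with d the patch distance and h a bandwidth (a non-local-means style kernel chosen here for illustration; the exact weighting of Romero et al. (2017) and Rousseau et al. (2011) may differ), and the per-size score maps are then averaged as in the late fusion step. Function names are hypothetical.

```python
import numpy as np

def nonlocal_fusion(votes_per_match, distances, h=1.0):
    """Weighted average of the label votes brought by several matches.

    votes_per_match: (k, m, n_labels) votes from k matched patches
    distances:       (k,) patch distances (e.g. Eq. (2)) of those matches
    Weights use an exp(-d / h) kernel (assumption).
    Returns (m, n_labels) scores normalized per voxel where votes exist.
    """
    w = np.exp(-np.asarray(distances, dtype=float) / h)
    # Sum over matches: w[i] * votes_per_match[i].
    fused = np.tensordot(w, np.asarray(votes_per_match, dtype=float), axes=1)
    total = fused.sum(axis=1, keepdims=True)
    return np.divide(fused, total, out=np.zeros_like(fused), where=total > 0)

def late_fusion(score_maps_per_size):
    """Average the score maps obtained with the different patch sizes."""
    return np.mean(np.stack(score_maps_per_size), axis=0)
```

Matches with a small distance dominate the estimate, while poor matches are strongly down-weighted rather than discarded.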

Patch-based MAS approach with High level representation of the data (HPMAS)

Fig. 10. 3D representation of the HPMAS method. As in Fig. 8, which represents the method in 2D, the approach is described in four steps: generating the virtual patches, registering them on the image to be labeled, propagating the labels of the selected virtual patches, and finally merging the propagated labels to obtain the final labeling.

As the standard way of extracting patches does not seem capable of exploiting the sulci geometry and the relations between them, which we believe to be the distinguishing features necessary for recognition, we have proposed a new virtual patch generation strategy based on a high-level representation of the sulci (Borne et al., 2018). This framework is well suited to leverage more information about the different folding configurations in the training dataset.

Note that this method is the only one of the newly proposed methods to have been specifically developed for the recognition of cortical sulci. It includes many arrangements specific to this application. Its complex design gives an idea of the scores that can be obtained by pushing as far as possible in this direction. To facilitate the understanding of this ad hoc method, Fig. 10 represents the pipeline in 3D, which complements the 2D representation provided in Fig. 8.

Patch generation. In order to take into account as much geometric information as possible, the idea was to define virtual patches containing whole sulci. These virtual patches correspond to a voxel cloud representing a pair of sulci, extracted from MNI space at the image resolution. Defining patches as clouds of voxels rather than as cubes allows the sulcus to be taken into account in its entirety without contaminating the patch with all its surrounding sulci. Note that the shape of small sulci is not specific enough to prevent spurious hits. That is why we have chosen to aggregate two sulci to create discriminative local shapes. In the following, we define a type of virtual patch for each pair of sulci that are neighbors in the brain.

In practice, a pair of sulci is selected if the two sulci are neighbors in at least one brain of the atlas dataset, according to the topology provided by the BrainVISA/Morphologist pipeline that produces the folds. This pipeline endows the list of folds with a graph structure whose edges correspond either to direct connections or to the fact that two folds are separated by a piece of gyrus. Finally, each type is made up of the instances of the pair of sulci in the atlas dataset, most of the time as many shapes as atlases (some atlases miss a few small sulci) (Fig. 10.1).

Note that the “unknown” label is the only one not used to form virtual patches, as it does not constitute a coherent structure like the other labels. Thus, unlike in the PMAS method, the “unknown” label is not treated like the other sulcus labels.
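The selection of virtual-patch types can be sketched as a union, over atlases, of the edges of the fold graph after mapping each fold to its sulcus label, with the “unknown” label excluded as stated above. The input format (one list of neighbouring label pairs per atlas) is a hypothetical simplification of the graph produced by Morphologist, and the labels in the test are placeholders.

```python
def select_sulcus_pairs(atlas_graphs, excluded=("unknown",)):
    """Types of virtual patches: unordered pairs of sulcus labels that are
    neighbours in at least one atlas.

    atlas_graphs: one list per atlas of (label_a, label_b) pairs of
                  neighbouring folds (direct connection or separation by a
                  piece of gyrus), already mapped to sulcus labels
    """
    pairs = set()
    for edges in atlas_graphs:
        for a, b in edges:
            # Skip self-pairs (two folds of the same sulcus) and excluded labels.
            if a != b and a not in excluded and b not in excluded:
                pairs.add(tuple(sorted((a, b))))  # (a, b) and (b, a) are one type
    return sorted(pairs)
```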

Distance calculation. For the distance calculation step, the set of folds of the brain to segment and the virtual patches of the library are represented by point clouds. In order to find an optimal alignment of each virtual patch into the skeleton point cloud of the brain to segment, the well-known iterative closest point algorithm (Besl and McKay, 1992) is used, with the robust implementation of Holz et al. (2015). This algorithm iteratively adjusts the transformation (translation and rotation) in order to minimize the distance between two sets of points. Note that, compared to the PMAS approach, which only uses translations to superimpose patches, the registration here allows rotations.
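The alignment step can be illustrated with a minimal point-to-point ICP, sketched below under simplifying assumptions: brute-force nearest neighbours, a closed-form rigid update (Kabsch/SVD), and no outlier rejection. It is not the robust variant of Holz et al. (2015) used here, and all function names are hypothetical.

```python
import numpy as np


def nearest_indices(src, dst):
    """Index in `dst` of the closest point to each point of `src` (brute force)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def best_rigid_transform(src, dst):
    """Rotation R and translation t minimizing sum ||R s + t - d||^2 (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))  # forbid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c


def icp(patch, skeleton, n_iter=50, tol=1e-9):
    """Rigidly align `patch` (N, 3) onto `skeleton` (M, 3).

    Returns the aligned patch and the accumulated rotation and translation."""
    current = np.asarray(patch, dtype=float).copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(n_iter):
        matched = skeleton[nearest_indices(current, skeleton)]
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        # RMS residual to this iteration's matches; it never increases across ICP iterations
        err = np.sqrt(((current - matched) ** 2).sum(axis=1).mean())
        if prev_err - err < tol:
            break
        prev_err = err
    return current, R_total, t_total
```

For real skeletons with thousands of voxels, a k-d tree (for instance `scipy.spatial.cKDTree`) would replace the brute-force search.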

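To build the measure used to rank the matches, the nearest voxel in the new fold skeleton $S_B$ of each skeleton voxel $p_A \in P(S_A)$ is saved as an activated voxel $p^*_B \in S^*_{B,P(S_A)}$. The distance from the virtual patch $P(S_A)$ to $S_B$ is then

$$d\big(P(S_A) \rightarrow S_B\big) = \frac{1}{\left|S^*_{B,P(S_A)}\right|} \sum_{p_A \in P(S_A)} \left\| p_A - p^*_B \right\|, \qquad p^*_B = \arg\min_{p_B \in S_B} \left\| p_A - p_B \right\| \tag{3}$$

Note that by dividing by $\left|S^*_{B,P(S_A)}\right|$, we take into account the number of different activated points. This penalizes virtual patches in which several points activate the same point of the skeleton to be labeled (Fig. 11).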
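A direct transcription of Eq. (3) is sketched below, assuming both point sets are given as (N, 3) arrays of voxel coordinates; the function name is hypothetical and the nearest-neighbour search is brute force.

```python
import numpy as np


def patch_to_skeleton_distance(patch, skeleton):
    """d(P(S_A) -> S_B): sum of nearest-neighbour distances divided by the
    number of distinct skeleton voxels that were activated."""
    d2 = ((patch[:, None, :] - skeleton[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)                      # index of p*_B for each p_A
    total = np.sqrt(d2[np.arange(len(patch)), nearest]).sum()
    n_activated = np.unique(nearest).size            # |S*_{B,P(S_A)}|
    return total / n_activated
```

For example, a two-voxel patch at x = 0 and x = 1 registered on a skeleton with voxels at x = 0 and x = 5 activates only the voxel at x = 0: the distance is 1 / 1 = 1.0, whereas a plain mean would give 0.5.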

Fig. 11. Calculation of the distance from the virtual patch $P(S_A)$ to the skeleton $S_B$ for the HPMAS method. For the sake of clarity, the skeletons $S_A$ and $S_B$ represented do not overlap in this Figure. For each voxel $p_A \in P(S_A)$, we look for its closest neighbor among the voxels of the skeleton $S_B$. The Euclidean distance between these two voxels is calculated. The distances over all the points $p_A \in P(S_A)$ and their nearest neighbor are then summed and divided by the number of different activated points $p^*_B$ to obtain $d(P(S_A) \rightarrow S_B)$. The two configurations represented are penalized by the division by the number of different activated points rather than by the number of points in $P(S_A)$ as for a classical average. In the first configuration, we observe that the proposed distance penalizes the virtual patch more if its shape is more complex or if its size is larger than the structure on which it has been registered. In the second configuration, we observe a greater penalization of the virtual patch if it has only one connected component and if it is registered on two different components.

For each type of virtual patch, all matches are ranked according to the distance proposed above. A fixed number of matches (determined by inner cross-validation) leading to the shortest distances is selected to propagate the two parent sulci. All types of virtual patches are selected the same number of times, even if they are not all equally informative. It is important to note that some sulcus instances are selected several times, because they win the competition for several virtual patch types, but their multiple contributions will be associated with slightly different alignments. Hence, sulcus instances maximizing regional similarity to the unknown subject get more weight.
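The selection rule can be sketched as follows, assuming each candidate match is stored as a hypothetical (patch_type, distance, match_id) tuple and that `n_select` is the fixed number chosen by inner cross-validation.

```python
from collections import defaultdict


def select_matches(matches, n_select):
    """Keep, for each virtual-patch type, the `n_select` matches with the smallest
    distance. `matches` is an iterable of (patch_type, distance, match_id)."""
    by_type = defaultdict(list)
    for patch_type, distance, match_id in matches:
        by_type[patch_type].append((distance, match_id))
    # Every type contributes the same number of matches, whatever its informativeness.
    return {
        patch_type: [m for _, m in sorted(cands, key=lambda c: c[0])[:n_select]]
        for patch_type, cands in by_type.items()
    }
```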

Label Propagation. After optimal alignment to the unknown subject, each selected virtual patch propagates the label of each of its voxels to that voxel's nearest neighbor in the target brain. To respect the virtual patch structure, each connected set of voxels in the virtual patch should correspond to a unique connected set in the target brain: the smallest non-connected sets are excluded (Fig. 10.3).
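A minimal sketch of the propagation step is given below, assuming the patch has already been aligned and that labels are plain strings; the connected-set consistency check described above is deliberately left out, and the function name is hypothetical.

```python
import numpy as np


def propagate_labels(patch_points, patch_labels, target_points):
    """Each aligned patch voxel activates its nearest target voxel with its label.

    Returns {target_index: set_of_labels}. The connected-set filtering of the
    method is not implemented in this sketch."""
    d2 = ((patch_points[:, None, :] - target_points[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    activated = {}
    for tgt, lab in zip(nearest.tolist(), patch_labels):
        activated.setdefault(tgt, set()).add(lab)
    return activated
```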

Label Fusion. After propagation of all the virtual patches $p \in V_l$ that contain the sulcus $l$, the score map $S_l$ is calculated by averaging the number of times the point of coordinates $(x, y, z)$ is activated by the different virtual patches:

$$S_l(x, y, z) = \frac{\sum_{p \in V_l} \mathrm{act}_p(x, y, z)}{|V_l|} \tag{4}$$

with $\mathrm{act}_p(x, y, z)$ equal to 1 if the voxel of coordinates $(x, y, z)$ is activated by the patch $p$, and 0 otherwise.

Compared to PMAS, where patches are weighted by their distance to the patch to be labeled, here each propagated point has the same weight in the label fusion. In order to perform a similar weighting, we have tested the use of the distance from the entire virtual patch to the skeleton to be labeled. This did not seem to significantly improve the results. We also tried to weight by the distance from the virtual patch point to the point it has activated, without any further improvement. We think it is essential to combine these two distances when weighting, for example by averaging them. However, our attempts have also been unsuccessful so far, so we chose to avoid weighting.

Fig. 12. Comparison of CNN-based approaches: PCNN vs. UNET. Boxes represent feature maps. The number of channels is denoted next to each feature map. The size of the feature map is indicated after the @ when appropriate. N is the number of different labels to be predicted. For the sake of clarity, input and output are represented in 2D rather than 3D.

As the “unknown” label does not belong to any virtual patch, its score map is empty. This label will be selected only if the score maps of all other labels are also empty for a given elementary fold.
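Eq. (4) and the fallback to the “unknown” label can be sketched as below. The sketch assumes activations are stored as one 0/1 row per virtual patch over a fixed list of target voxels, and it applies the fallback per voxel rather than per elementary fold for brevity; function names are hypothetical.

```python
import numpy as np


def score_map(activation_masks):
    """S_l: fraction of the virtual patches in V_l that activate each voxel.
    `activation_masks` has shape (|V_l|, n_voxels) with act_p in {0, 1}."""
    masks = np.asarray(activation_masks, dtype=float)
    return masks.sum(axis=0) / masks.shape[0]


def assign_labels(score_maps):
    """score_maps: dict label -> (n_voxels,) array. Returns the best label per
    voxel, or 'unknown' where every score map is zero."""
    names = list(score_maps)
    stacked = np.stack([score_maps[n] for n in names])
    best = stacked.argmax(axis=0)
    return [names[b] if stacked[b, i] > 0 else "unknown" for i, b in enumerate(best)]
```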

3.2.3. CNN-based approaches

As this is the first time that CNNs are used for sulci labeling, we take inspiration from two models that have proven their efficacy in medical image segmentation (Fig. 12): the first is a patch-based approach inspired by (Ciresan et al., 2012) and the second an approach that processes the entire image with a 3D U-Net as in (Çiçek et al., 2016). First, the common modalities used during the training of these two networks are detailed, followed by an individual description of each network. The models presented are implemented using the PyTorch library (Paszke et al., 2017).

Data. All the fold skeletons are registered in MNI space and used as input: they correspond to 3D binary volumes with a common resolution of 2 × 2 × 2 mm, where the voxels belonging to the skeleton are one and the others are zero. In order to augment the training dataset, a rotation in a random direction with a random angle (following a Gaussian distribution $\mathcal{N}\big(0, (\pi/16)^2\big)$) is applied to the images at each epoch.
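The augmentation can be sketched with NumPy and SciPy as follows. Assumptions: the rotation is applied about the volume centre, nearest-neighbour resampling keeps the volume binary, and σ = π/16 follows the reading of the distribution given above; helper names are hypothetical. Nearest-neighbour rotation of a one-voxel-thick skeleton can create small gaps.

```python
import numpy as np
from scipy import ndimage


def random_rotation(rng, sigma=np.pi / 16):
    """Rotation about a uniformly random axis by an angle drawn from N(0, sigma^2)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)            # isotropic Gaussian -> uniform direction
    angle = rng.normal(0.0, sigma)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' rotation formula
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def rotate_volume(volume, R):
    """Rotate a binary volume about its centre with nearest-neighbour resampling."""
    center = (np.array(volume.shape) - 1) / 2.0
    # affine_transform maps each output coordinate o to the input coordinate R @ o + offset
    offset = center - R @ center
    return ndimage.affine_transform(volume, R, offset=offset, order=0,
                                    mode="constant", cval=0)
```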

At the output of the neural network, a score per label present in the database is obtained for each voxel. The “unknown” label is treated like a sulcus label.

Training design. The weights of the neural networks were initialized as in (LeCun et al., 2012). Stochastic gradient descent was used for training, with the learning rate and momentum determined by 3-fold inner cross-validation. The learning rate was halved when the loss function had not improved for two consecutive epochs. After four consecutive epochs without improvement, training was stopped. The selected trained neural network corresponds to the epoch obtaining the lowest error rate $E_{SI}$, described in (Perrot et al., 2011) and in Section 3.4.
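The schedule described above can be expressed as a small framework-agnostic helper (hypothetical name), assuming that “improvement” means a strictly lower validation loss than the best seen so far. Selecting the checkpoint with the lowest $E_{SI}$ is not shown.

```python
class PlateauSchedule:
    """Halve the learning rate after every `halve_after` epochs without improvement
    and request a stop after `stop_after` consecutive epochs without improvement."""

    def __init__(self, lr, halve_after=2, stop_after=4):
        self.lr = lr
        self.halve_after, self.stop_after = halve_after, stop_after
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record this epoch's loss; return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
            return False
        self.bad_epochs += 1
        if self.bad_epochs >= self.stop_after:
            return True
        if self.bad_epochs % self.halve_after == 0:
            self.lr /= 2
        return False
```

PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` (with `factor=0.5`) offers similar behaviour, although its `patience` counter is defined slightly differently.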

The loss function used is the cross-entropy loss. In most cases, for unbalanced problems, the loss function must be weighted to avoid favoring the labels most involved in backpropagation because of their higher frequency in the database. Although the average size of each sulcus is extremely unbalanced, we have chosen not to weight this loss function, because large sulci are also the most interesting from a neuroanatomical point of view and need to be better recognized than small ones.

Patch-based model with a 3D CNN (PCNN). The PCNN method adapts the approach proposed in (Ciresan et al., 2012), addressing a segmentation problem as a classification of each voxel based on its environment contained in a patch. Here, only voxels belonging to the skeleton are selected to participate in the classification.

We designed the architecture of the neural network so that it takes as input cubic patches of 6.2 cm side, which we considered large enough to identify their central voxel. During training, the dropout strategy (Srivastava et al., 2014) with a probability of 0.5 is used on fully connected layers. Batch normalization (Ioffe and Szegedy, 2015) was also used on convolutional and fully connected layers. The batch size has been set at 100 to minimize learning time and fit in memory. To ensure that the inner cross-validation is not too time-consuming, only three epochs are calculated for each hyperparameter value tested.
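Patch extraction for PCNN can be sketched as below, assuming the 2 mm isotropic skeleton volumes described above, so that a 6.2 cm side corresponds to 31 voxels, and zero padding at the volume borders; names are hypothetical.

```python
import numpy as np

VOXEL_SIZE_MM = 2.0
PATCH_SIDE_MM = 62.0
HALF = int(round(PATCH_SIDE_MM / VOXEL_SIZE_MM)) // 2   # 31 voxels -> 15 on each side


def extract_patches(volume, half=HALF):
    """Return the skeleton voxel coordinates and one cubic patch centred on each."""
    padded = np.pad(volume, half, mode="constant")
    coords = np.argwhere(volume > 0)             # only skeleton voxels are classified
    size = 2 * half + 1
    patches = np.empty((len(coords), size, size, size), dtype=volume.dtype)
    for i, (x, y, z) in enumerate(coords):
        # Voxel (x, y, z) sits at (x + half, ...) in the padded volume, so the
        # slice [x, x + size) is centred on it.
        patches[i] = padded[x:x + size, y:y + size, z:z + size]
    return coords, patches
```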

3D U-Net based model (UNET). For the UNET method, the network architecture used is the one presented in (Çiçek et al., 2016), with the PyTorch implementation of (Wolny and enfisan, 2019). The particularity of this application of U-Net lies in the fact that all the voxels that do not belong to the fold skeleton, i.e. a large majority of the voxels in the image, do not need to be classified. Indeed, as the values predicted by U-Net are masked by the segmentation of the fold skeleton made upstream, the background voxels do not need to be predicted and therefore do not need to be learned. Thus, during training, voxels that do not belong to sulci are not used for gradient backpropagation. The batch size has been set at 1 in order to fit in memory.
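The effect of restricting backpropagation to skeleton voxels can be sketched with a NumPy cross-entropy that averages only over a boolean skeleton mask; the array layout and the function name are assumptions.

```python
import numpy as np


def masked_cross_entropy(logits, target, skeleton_mask):
    """Mean cross-entropy restricted to skeleton voxels.

    logits: (n_labels, X, Y, Z) raw scores; target: (X, Y, Z) integer labels;
    skeleton_mask: (X, Y, Z) booleans, True on fold-skeleton voxels.
    Background voxels do not enter the mean, so in an autodiff framework
    they would produce no gradient."""
    z = logits - logits.max(axis=0, keepdims=True)        # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    picked = np.take_along_axis(log_probs, target[None], axis=0)[0]
    return -picked[skeleton_mask].mean()
```

In PyTorch, the same effect can be obtained by setting the target of background voxels to a sentinel value passed as `ignore_index` to `torch.nn.functional.cross_entropy`.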

3.3. Bottom-up geometric constraints

In order to standardize the results, the voxels were agglomerated into elementary folds. However, these folds are not always sufficiently fragmented, so we propose to use the label score maps to reconsider their fragmentation.

$$B = \sum_{k} n_k \,(c_k - c)(c_k - c)^{T} \tag{7}$$

with $N$ the number of voxels in the elementary fold $E$, $C_k$ the set of voxels in cluster $k$, $c_k$ the center of cluster $k$, $c$ the center of $E$, and $n_k$ the number of points in cluster $k$.

The ratio is higher when clusters are dense and well separated. If this score was higher than a threshold determined by inner cross-validation, the partitioning was performed. When an elementary fold was split in two, each of the two clusters obtained was also challenged with the same procedure, until all the elementary folds had a Calinski-Harabasz index below the threshold.
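The recursive re-division can be sketched as below. Assumptions: each voxel of an elementary fold is described by a feature vector (for instance its coordinates, possibly concatenated with its label scores), the fold is bisected with a simple deterministic 2-means, and the index is the standard Calinski-Harabasz ratio built from the quantities defined above. All names are hypothetical.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Standard Calinski-Harabasz index tr(B) / tr(W) * (N - K) / (K - 1)."""
    ks = np.unique(labels)
    N, K = len(X), len(ks)
    c = X.mean(axis=0)                              # centre of the elementary fold
    tr_b = tr_w = 0.0
    for k in ks:
        C_k = X[labels == k]
        c_k = C_k.mean(axis=0)
        tr_b += len(C_k) * ((c_k - c) ** 2).sum()   # trace of n_k (c_k - c)(c_k - c)^T
        tr_w += ((C_k - c_k) ** 2).sum()            # within-cluster scatter
    return np.inf if tr_w == 0 else tr_b / tr_w * (N - K) / (K - 1)


def two_means(X, n_iter=20):
    """Deterministic 2-means seeded with the extreme points along the main axis."""
    Xc = X - X.mean(axis=0)
    axis = np.linalg.svd(Xc, full_matrices=False)[2][0]
    proj = Xc @ axis
    centers = np.stack([X[proj.argmin()], X[proj.argmax()]]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        if labels.min() == labels.max():            # degenerate: a single cluster
            break
        centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
    return labels


def recursive_split(X, threshold, min_size=2):
    """Bisect while the CH index exceeds `threshold`; return lists of voxel indices."""
    if len(X) < 2 * min_size:
        return [np.arange(len(X))]
    labels = two_means(X)
    too_small = np.bincount(labels, minlength=2).min() < min_size
    if too_small or calinski_harabasz(X, labels) <= threshold:
        return [np.arange(len(X))]
    groups = []
    for k in (0, 1):
        sub = np.flatnonzero(labels == k)
        groups += [sub[g] for g in recursive_split(X[sub], threshold, min_size)]
    return groups
```

scikit-learn's `sklearn.metrics.calinski_harabasz_score` computes the same index and can serve as a cross-check.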

3.4. Performance evaluation of labeling models

As in Perrot et al. (2011), two measures were used to compare the different models proposed above: $E_{local}$ at the sulcus scale and $E_{SI}$ at the subject scale. Error rates were assessed by 10-fold cross-validation. One model was trained per hemisphere.

3.4.1. Mean/max error rates

To take into account the variability of the fragmentation into elementary folds, and therefore the robustness of the labeling methods to this variability, each image was re-segmented ten times (see Fig. 20 in supplementary material). If the image belonged to the training set, only the segmentation used for manual labeling was considered. However, if the image belonged to the test set, ten other segmentations (whose true labels have been transferred from the manual segmentation) were labeled and used to quantify the error rates. Note that the manual segmentation was not used to calculate error rates. Using ten different segmentations for each sulcus highlights the weaknesses of the BrainVISA/Morphologist preprocessing, since we can compute errors from the worst result, typically associated with an issue of under-segmentation.

To quantify errors for each new segmentation, the manual labeling of the initial segmentation must be transferred to the new one. Because of the variability of the segmentations obtained and the sparsity of the fold skeleton, the simple superposition of images was insufficient. Skeleton voxels that do not overlap with those of the initial segmentation were given the label of the nearest skeleton voxel of the initial segmentation. To do this, a Voronoi diagram of the manual labeling is computed. Note that the elementary folds were not used to transfer the labeling and that the true labeling was at the voxel scale.
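The label transfer can be sketched with SciPy's Euclidean distance transform, whose `return_indices` option gives, for every voxel, the nearest labeled voxel, i.e. a discrete Voronoi diagram. The sketch assumes integer label volumes with 0 for background; the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def transfer_labels(manual_labels, new_skeleton):
    """Copy manual labels onto a new skeleton segmentation.

    manual_labels: integer volume, 0 = background, >0 = label of the initial skeleton.
    new_skeleton: boolean volume of the re-segmented skeleton.
    Each new skeleton voxel receives the label of the nearest labeled voxel."""
    # The EDT measures distances to the zero voxels of its input; passing
    # `manual_labels == 0` makes the labeled voxels the zeros, so `idx`
    # points every voxel to its nearest labeled voxel (a Voronoi partition).
    idx = ndimage.distance_transform_edt(
        manual_labels == 0, return_distances=False, return_indices=True)
    voronoi = manual_labels[tuple(idx)]
    return np.where(new_skeleton, voronoi, 0)
```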

For each subject, from the ten segmentations, the average of the errors ($E_{SI}^{mean}$ and $E_{local}^{mean}$) and the maximum error ($E_{SI}^{max}$ and $E_{local}^{max}$) were calculated. Note that the training segmentation used for manual labeling was not used in the error calculation, because it would bias our evaluation. By considering the maximum error rates, labeling errors due to model variability were highlighted. In most models, these errors were related to an incorrect fragmentation of the fold skeleton into elementary folds. Only the PMAS labeling model is not deterministic: it includes stochastic optimizations that can penalize the calculation of maximum error rates.

3.4.2. Error at the sulcus scale: $E_{local}$

Given a sulcus $l$,

$$E_{local}(l) = \frac{FP_l + FN_l}{FP_l + FN_l + TP_l} \tag{8}$$

with $TP_l$, $FP_l$ and $FN_l$ respectively the number of true positive, false positive and false negative voxels for the sulcus $l$.

It is important to note that the error rate is one when the sulcus is absent but labeled by the model, and likewise when the sulcus is present but not labeled by the model. As small sulci are frequently absent, this explains why error rates can be highly variable when averaging the error rates per subject.
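Eq. (8), including the two edge cases just described, can be sketched from voxel-wise label arrays; returning 0 when the sulcus is both absent and unlabeled is an assumption, and function names are hypothetical.

```python
import numpy as np


def voxel_counts(true_labels, pred_labels, label):
    """(TP, FP, FN) voxel counts of `label` given voxel-wise true and predicted labels."""
    t, p = true_labels == label, pred_labels == label
    return int((t & p).sum()), int((~t & p).sum()), int((t & ~p).sum())


def e_local(tp, fp, fn):
    """Sulcus-scale error (FP + FN) / (FP + FN + TP), Eq. (8)."""
    denom = fp + fn + tp
    # Absent and unlabeled: 0 by assumption. Absent but labeled (tp = fn = 0)
    # or present but unlabeled (tp = fp = 0) both give exactly 1.
    return 0.0 if denom == 0 else (fp + fn) / denom
```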

3.4.3. Error at the subject scale: $E_{SI}$

Given a set of sulci $L$,

$$E_{SI} = \sum_{l \in L} w_l \, \frac{FP_l + FN_l}{FP_l + FN_l + 2\,TP_l} \tag{9}$$

with $w_l = s_l / \sum_{l' \in L} s_{l'}$ and $s_l = FN_l + TP_l$ the true size of sulcus $l$. The error at the subject scale aggregates local errors into a single measurement. As explained in Perrot et al. (2011), each component of the sum over labels differs from $E_{local}(l)$ on two points. First, true positives are counted twice as compared to false positives and false negatives, in order to remove errors shared by several labels, since each extra sulcal piece for a given label is a missing part for another label. Second, each component is weighted according to the sulcus true size, so that each local component counts as much as its size.
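Eq. (9) can be sketched from per-label (TP, FP, FN) voxel counts; the dictionary layout and the function name are hypothetical. Labels absent from the ground truth get a zero weight, since $s_l = 0$.

```python
def e_si(per_label_counts):
    """Subject-scale error, Eq. (9). per_label_counts: dict label -> (tp, fp, fn)."""
    sizes = {l: tp + fn for l, (tp, fp, fn) in per_label_counts.items()}  # s_l
    total = sum(sizes.values())
    err = 0.0
    for l, (tp, fp, fn) in per_label_counts.items():
        denom = fp + fn + 2 * tp            # true positives counted twice
        if denom:
            err += sizes[l] / total * (fp + fn) / denom   # weight w_l = s_l / sum s
    return err
```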

Compared to Perrot et al. (2011), three labels were not included in the set of sulci (unknown and both ventricles). These labels are not really sulcus labels but correspond to other structures that are not pertinent to our study. Thus, the scores presented here for the SPAM method are worse than those presented in Perrot et al. (2011) for four reasons. First, because the two ventricle labels, removed here, considerably improved the scores. Second, because we cut the elementary folds during manual labeling, while the SPAM model cannot automatically correct this kind of under-segmentation error. Third, because we are interested in the mean/max of the error rates. Finally, because the error rates are estimated by 10-fold cross-validation and not by leave-one-out cross-validation. Moreover, the addition of the four new sulci labels and our refined labeling of the training dataset may also have impacted the results.

Fig. 13. Comparison of $E_{SI}$ error rates by model. Once the 10 segmentations have been labeled by hemisphere, we consider the average error in the upper chart and the maximum error in the lower chart. The box extends from the lower to upper quartile error values, with a line at the median. The whiskers extend from the box to show the minimum and maximum limits of the error rates. The SPAM model is represented in red, the PMAS model in blue, the HPMAS model in green, the PCNN model in yellow and the UNET model in purple. For the four new models, three modalities are represented: first, labeling at the voxel scale, then labeling after regularization at the elementary fold scale (+ reg.), and finally the labeling obtained after automatic re-division of the elementary folds (+ reg. + cut.). The models are compared by Wilcoxon signed-rank test. The p-values of the differences in model performances are written above and below the compared models. The p-value is written in black if it is less than 0.05 and in red otherwise. Regularization by elementary folds significantly improves results. Automatic fold re-division also significantly improves results. All regularized models are significantly better than the SPAM model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.4.4. Error rate comparison

During the 10-fold cross-validation, each fold contained approximately 6 labeled hemispheres to test the model's performance. Error rates are calculated by hemisphere and then averaged over the entire database to obtain the mean error rates per model. When not specified, the average error rate includes the right and left hemispheres. In order to compare the models in pairs, a Wilcoxon signed-rank test was performed between the error rate lists for each hemisphere. If the p-value was less than 0.05, the error rates were considered significantly different.

4. Results

4.1. Which is the best model?

In order to compare the five models presented above, we were interested in the $E_{SI}^{mean}$ and $E_{SI}^{max}$ of each model, trained separately on each hemisphere (Fig. 13). Please refer to the supplementary material for the numerical values of the error rates per hemisphere (Table 1).


First, we observed that all of the new approaches proposed with regularization per elementary fold were significantly better than the SPAM approach (also based on this regularization), which suggests that a model based on an average template is not the most appropriate to represent the high variability of cortical folds. Second, for the four proposed methods, regularization of the label score maps by elementary folds significantly improved the results compared to voxel labeling. Most importantly, the automatic re-division of these elementary folds also significantly improved the four methods. Thus, the top-down refinement of bottom-up regularization proposed in this paper is particularly relevant.

Third, when comparing the new models in pairs, the models seem to demonstrate equivalent performance.

Concerning the PCNN and UNET models, these results demonstrate the high efficiency of neural networks, even for the recognition of structures as variable as cortical folds. However, it is surprising that the UNET model was not better than the PCNN model, given its deeper architecture.

The fact that these four models do not stand out radically on this dataset suggests that they may have reached the limit of what can be learned from this database, probably because of its insufficient size to represent the high variability of cortical folds. Moreover, the fold variability is such that the manual labeling of a brain raises many questions, and it is possible that the models have reached human-level performance. Unfortunately, since manual labeling is based on consensus among several experts, it is impossible for us to assess human-level performance on this database.

Finally, with regard to the computation time required to label a hemisphere, the SPAM model takes about 5 min, while the UNET model takes about 20 s, PCNN slightly more than a minute, and PMAS and HPMAS several hours. Although the PMAS model could be much faster with code optimization as in (Giraud et al., 2016), the UNET model is currently by far the fastest. Thus, since the UNET model has the lowest error rates and is the fastest, we propose to study in more detail the differences between this model and the SPAM model in the following section. In the rest of this study, the UNET model will therefore refer to the model with regularization using elementary folds and automatic re-division of these folds, if necessary.

4.2. Which sulci are better recognized?

Concerning $E_{local}^{mean}$ and $E_{local}^{max}$, the SPAM model has average/max error rates from 5% to 77%, while the error rates of the UNET model vary between 2% and 68%. Comparing the $E_{local}^{max}$ of each sulcus (Fig. 14), we can see that the difference between the error rates of both models for a given sulcus reaches up to 25%. Overall, almost all sulci were better recognized by the UNET model; only about twenty sulci are less well recognized. Their comparison with the Wilcoxon signed-rank test, controlling the false discovery rate with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), showed that around 13% of sulci were significantly better recognized by UNET than by SPAM, while none were significantly less well recognized. In the figure, we can also see that the sulci with the highest labeling error rates using the UNET model are also the smallest. This is probably because small sulci are generally also the most variable and were already less well recognized by the SPAM model. Please refer to the supplementary material for the exact values of sulcus error rates (Tables 2 and 3). In order to visualize the location of the sulci better recognized than before, Figs. 15 and 16 give graphical comparisons of sulcus error rates between SPAM and UNET labeling. In Fig. 16, it can be seen that the differences in performance between the SPAM and UNET models are not spatially uniform. This may be due to the fact that some regions have more variable fold patterns than others, and the recognition of their sulci was more severely penalized by the use of a mono-template approach. We also noted that the sulci best recognized by the UNET model are also those that were most impacted by under-segmentation errors in elementary folds.
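The false discovery rate control used for these per-sulcus comparisons can be sketched as the standard Benjamini-Hochberg step-up procedure, assuming the per-sulcus p-values (e.g. from `scipy.stats.wilcoxon`) are already computed; the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of the hypotheses rejected at false discovery rate `alpha`."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m ...
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()
        reject[order[:k + 1]] = True    # ... and reject all hypotheses up to that rank
    return reject
```

`statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` provides an equivalent, well-tested implementation.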

In the next section, we focus on the impact of the significant improvement in central sulcus recognition, for which the $E_{local}^{max}$ value has gone from about 8% using the SPAM model to only 3% with the best UNET model.

4.3. Experiment on an external database demonstrating the clinical advantage

Here, the SPAM model and the UNET model were trained on the entire manually labeled database. The hyperparameters of the UNET model were estimated over the entire database, using the same procedures as during inner cross-validation, i.e. by performing a 3-fold cross-validation to select the hyperparameter values that minimize error rates. The database used by Sun et al. (2012) to study the effect of handedness on the shape of the central sulcus was labeled manually and automatically by these two models. This database contains 23 consistent natural dextrals, matched for age and sex (mean age 34, range 22-59 years; 17 males, 6 females), and 18 similar natural sinistrals (mean age 36, range 25-56 years; 12 males, 6 females). The database used in Sun et al. (2012) also contains a group of 34 forced dextrals that is not studied here.

We propose to investigate the asymmetry index $I$ of the central sulcus length along the brain hull between the left ($l_{\text{S.C.\_left}}$) and right ($l_{\text{S.C.\_right}}$) hemispheres:

$$I = \frac{l_{\text{S.C.\_left}} - l_{\text{S.C.\_right}}}{l_{\text{S.C.\_left}} + l_{\text{S.C.\_right}}} \tag{10}$$

Note that in the nomenclature proposed in this paper, two sulcus labels belong to the central sulcus: “S.C.” and “S.C._sylvian”. Therefore, the lengths of these two “sub-sulci” are added together to obtain $l_{\text{S.C.}}$.
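Eq. (10), with $l_{\text{S.C.}}$ obtained by summing the two central sulcus labels, reduces to a few lines; the dictionary of per-label lengths (e.g. in millimetres) and the function names are hypothetical.

```python
def central_sulcus_length(lengths):
    """l_S.C.: sum of the lengths of the 'S.C.' and 'S.C._sylvian' labels."""
    return lengths.get("S.C.", 0.0) + lengths.get("S.C._sylvian", 0.0)


def asymmetry_index(left_lengths, right_lengths):
    """I = (l_left - l_right) / (l_left + l_right); positive when the left is longer."""
    left = central_sulcus_length(left_lengths)
    right = central_sulcus_length(right_lengths)
    return (left - right) / (left + right)
```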

With manual labeling, there is a significant difference between left-handed and right-handed people (Fig. 17): left-handed people have on average a longer central sulcus in the right hemisphere than in the left, and vice versa for right-handed people. However, when focusing on the asymmetry index with SPAM labeling, no significant difference was found, whereas this difference was significant with UNET labeling.

Considering the worst labeling errors of each model (see Fig. 21 in supplementary material), we observe that the SPAM model can double the size of the central sulcus by labeling completely unrelated large structures, whereas the UNET model only adds small fragments.

5. Discussion

5.1. PMAS

Considering the hyperparameters selected during the inner cross-validation (see Fig. 22 in supplementary material), it seems that this method would benefit from increasing the number of ANNs selected per voxel. Indeed, the number of ANNs selected is always 10, which is the upper limit of the values proposed in the inner cross-validation. However, testing a larger number of ANNs would require optimizing the code currently in use, and it is very likely that the model would not gain much in performance. Indeed, the evolution of the scores according to the number of ANNs suggests that a plateau is reached and that increasing this hyperparameter would have little influence on the ranking of the methods obtained.


Fig. 14. $E_{local}^{max}$ per sulcus. The graphs on the left and on the right present $E_{local}^{max}$ for the sulci of the left and right hemispheres, respectively. The SPAM model is represented in blue and the UNET (+ reg. + cut.) in pink. The significant differences (p-value < 0.05) are marked with a star. The star is black when the difference is still significant after controlling the false discovery rate through the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). Sulci are sorted from top to bottom, from the smallest to the largest. The average sulci sizes, ranging from about 15 mm³ to more than 2000 mm³ per subject, are represented on the black graph.

5.2. HPMAS

With regard to HPMAS, the choice to use sulci pairs to form patches is questionable, since there is no evidence suggesting that two sulci are sufficient to prevent spurious hits, especially when two small sulci are associated. In order to create distinguishable local shapes, patches containing three or more sulci should also be considered. However, it would be too expensive to take into account all combinations of three neighboring sulci, as is done for pairs of sulci. To remedy this, criteria for selecting relevant patch types should be determined, but none of the criteria we tested improved the results sufficiently to be considered here.

5.3. PCNN and UNET

Compared to the approach proposed by Ciresan et al. (2012), the PCNN approach has two major differences. In (Ciresan et al., 2012), several patch sizes, processed by several neural networks in parallel, were used to label each pixel, whereas our PCNN approach is based on only one patch size. Moreover, the neural network used for PCNN is not deep (only one hidden layer) compared to (Ciresan et al., 2012). However, after trying to make the network architecture more complex by increasing the number of hidden layers or using multiple patch sizes, we did not observe significant improvements in the results. It is worth noting that the PCNN model achieves performance comparable to the UNET model, although the U-Net architecture is much deeper and previous studies suggest that it should achieve better results (Ronneberger et al., 2015).

Fig. 15. $E_{local}$ error rate per sulcus for SPAM and UNET models. The UNET model corresponds to the one after re-division of the elementary folds. Once the 10 segmentations have been labeled by hemisphere, we consider the average error per sulcus in the left column and the maximum error in the right column. The external and internal sides are represented for each of the right and left hemispheres.

Fig. 16. Comparison of $E_{local}^{max}$ error rates between the SPAM model and the UNET model. The left column represents the difference between the $E_{local}^{max}$ of the SPAM model and of the UNET model. The right column shows the p-value of the Wilcoxon test between each model. Note that the scale of the color palette used to represent p-values is logarithmic. In order to visualize the sulci significantly better recognized, the threshold 0.05 is indicated, and the threshold at the star corresponds to the first sulci considered significantly better when controlling the false discovery rate through the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).

5.4. Unknown label

In this paper, except for the HPMAS model, the “unknown” label in the manually labeled database is treated like the other sulcus labels. However, although the “unknown” label represents about 0.5% of the skeleton voxels of the manual labeling, this proportion is zero for the labels of the HPMAS model. Moreover, the PMAS and PCNN models label around 0.02% of voxels as “unknown”, and the SPAM and UNET models 0.04%. These figures show that treating the “unknown” label like the other labels is insufficient. Models should also assign the “unknown” label to structures where they are not sufficiently confident.

Fig. 17. Comparison of the asymmetry index I between right-handed and left-handed people. The left/middle/right graphs respectively show the results obtained with manual/SPAM/UNET labeling. The index for right-handed people is represented in blue and the one for left-handed people in green. The p-values of the t-test for the means of these two independent samples of scores are indicated on the graphs. With manual labeling, there is a significant difference: in left-handed people, the central sulcus is longer in the right hemisphere than in the left, while the opposite holds in right-handed people. The same significant difference is observed with the UNET model labeling but not with the SPAM model labeling. The box extends from the lower to upper quartile index I values, with a line at the median. The whiskers extend from the box to show the minimum and maximum values.

However, since all the new methods were compared to the SPAM model, which treats the unknown label like the sulcus labels, we chose not to address this point in this paper.

6. Conclusion

To summarize, the new methods presented in this paper outperform the current SPAM model provided by the Morphologist toolbox of BrainVISA. Compared to the SPAM model, the best models have a 4% higher recognition rate, and 15% of sulci are significantly better recognized. By automatically re-dividing the elementary folds, the new models are considerably more robust to under-segmentation errors. In practice, these improvements make it possible to reproduce findings that were previously only possible with manual labeling. The UNET model will soon be available in the BrainVISA/Morphologist toolbox.

In this paper, methods based on MAS or CNNs give approximately the same results for the automatic recognition of cortical sulci. However, although CNN-based methods have a much longer training process than MAS-based methods, they are significantly faster at labeling. Therefore, CNN-based methods are far more practical to use. The UNET method labels a brain in only twenty seconds, whereas the SPAM method takes about ten minutes. It is interesting to note that patch-based MAS approaches are also beginning to integrate deep learning techniques (Manjón et al., 2018), probably because of their ability to effectively summarize the data and their speed of execution.

Furthermore, the top-down refinement of bottom-up regularization significantly improves the results. Indeed, voxel-wise labeling is used to give a top-down perspective to a traditional bottom-up pattern recognition process that agglomerates the voxels into elementary folds: these folds can therefore be automatically re-divided when necessary. Thus, the labeling is robust to under-segmentation errors, unlike the SPAM method, which does not provide voxel-wise labeling. Note that despite the definition of elementary folds specific to the problem posed here, defining a coherent geometric entity is a legitimate concern addressed in many segmentation problems, for example by using super-pixels (Giraud et al., 2017; Soltaninejad et al., 2017) that group the most similar connected pixels together so that they have the same label.

In order to improve the current performance of the model, several options remain to be considered. Second, the inputs currently contain the fold skeleton in order to normalize the data with respect to acquisition and age biases. However, the input can be enriched by taking into account the grey/white matter segmentation or directly the normalized MRI. For instance, we could consider integrating these data into new input channels for CNN-based approaches. Finally, in order to take advantage of the large unlabeled databases currently available, a semi-supervised strategy would be particularly attractive to better represent the variability of the cortical folds.

In the near future, considering that the labeling model seems sufficiently reliable to us, we would like to reconsider the number of sulci used in the nomenclature on the basis of the sulci most often confused by the model. Indeed, the error rates of some small sulci are still too high for them to be used in morphological studies. Allowing users to choose the level of granularity of the nomenclature will let them work with sufficiently stable labels for the structures of interest to them.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 785907 (HBP SGA2), No. 720270 (HBP SGA1) and No. 604102 (HBP’s ramp-up phase), and from the FR-MDIC20161236445.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.media.2020.101651.

References

Auzias, G., Colliot, O., Glaunes, J.A., Perrot, M., Mangin, J.-F., Trouve, A., Baillet, S., 2011. Diffeomorphic brain registration under exhaustive sulcal constraints. IEEE Trans. Med. Imaging 30 (6), 1214–1227.

Auzias, G., Lefevre, J., Le Troter, A., Fischer, C., Perrot, M., Régis, J., Coulon, O., 2013. Model-driven harmonic parameterization of the cortical surface: hip-hop. IEEE Trans. Med. Imaging 32 (5), 873–887.

Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B., 2009. Patchmatch: a randomized correspondence algorithm for structural image editing. In: ACM Transactions on Graphics (ToG), vol. 28. ACM, p. 24.