HAL Id: hal-02517321
https://hal.archives-ouvertes.fr/hal-02517321
Submitted on 24 Mar 2020
Automatic labeling of cortical sulci using patch- or
CNN-based segmentation techniques combined with
bottom-up geometric constraints
Léonie Borne, Denis Rivière, Martial Mancip, Jean-François Mangin
To cite this version:
Léonie Borne, Denis Rivière, Martial Mancip, Jean-François Mangin. Automatic labeling of cortical sulci using patch- or CNN-based segmentation techniques combined with bottom-up geometric constraints. Medical Image Analysis, Elsevier, 2020, 62, pp.101651. 10.1016/j.media.2020.101651. hal-02517321.
Medical Image Analysis
journal homepage: www.elsevier.com/locate/media
Léonie Borne a,∗, Denis Rivière a, Martial Mancip b, Jean-François Mangin a
a Université Paris-Saclay, CEA, CNRS, Neurospin, Baobab, Gif-sur-Yvette, 91191, France
b Maison de la Simulation, CNRS, CEA Saclay, Gif-sur-Yvette, 91191, France
Article info
Article history: Received 28 May 2019; Revised 14 January 2020; Accepted 16 January 2020; Available online 28 February 2020
Keywords: Convolutional neural network; Multi-atlas segmentation; Cortical sulci labeling
Abstract
The extreme variability of the folding pattern of the human cortex makes the recognition of cortical sulci, both automatic and manual, particularly challenging. Reliable identification of the full set of human cortical sulci is extremely difficult and is practiced by only a few experts. Moreover, these sulci correspond to more than a hundred different structures, which makes manual labeling long and tedious and therefore limits access to large labeled databases to train machine learning. Here, we seek to improve the current model proposed in the Morphologist toolbox, a widely used sulcus recognition toolbox included in the BrainVISA package. Two novel approaches are proposed: patch-based multi-atlas segmentation (MAS) techniques and convolutional neural network (CNN)-based approaches. Both are currently applied for anatomical segmentations because they embed much better representations of inter-subject variability than approaches based on a single template atlas. However, these methods typically focus on voxel-wise labeling, disregarding certain geometrical and topological properties of interest for sulcus morphometry. Therefore, we propose to refine these approaches with domain-specific bottom-up geometric constraints provided by the Morphologist toolbox. These constraints are utilized to provide a single sulcus label to each topologically elementary fold, the building blocks of the pattern recognition problem. To eliminate the shortcomings associated with the Morphologist's pre-segmentation into elementary folds, we complement this regularization scheme using a top-down perspective which triggers an additional cleavage of the elementary folds when required. All the newly proposed models outperform the current Morphologist model, the most efficient being a CNN U-Net-based approach which carries out sulcus recognition within a few seconds.
© 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)
1. Introduction
The surface of the brain is divided into many convolutions, called gyri, delimited by folds, called sulci. The main sulci are considered as the limits between functionally and architecturally different regions. Additionally, cortex morphometry is used to quantify brain development and degenerative diseases. Despite the many tools available for 3D visualization of sulci, sulci labeling is a long and tedious process. It takes several hours for an expert to label all sulci in a single brain and reliable labeling requires the opinion of several experts. However, because of the large variability of the folding pattern in the general population, inferring developmental biomarkers requires the mining of data from a large number of brains. These biomarkers may correspond to characteristics of the sulci, such as size, depth or opening. However, these measures require the prior labeling of sulci. Therefore, automation of the sulcus recognition is essential.
Nevertheless, learning to label sulci is an extremely complex challenge for several reasons. First, as illustrated in Fig. 1, sulci are highly variable structures: some sulci are even absent in more than 70% of brains and some subjects have up to 8 sulci missing. Additionally, each brain contains more than 120 different sulci and only a small number of segmentation algorithms are made for as many structures. Finally, the number of manually labeled subjects which can be used for supervised learning is limited.
∗ Corresponding author.
E-mail address: [email protected] (L. Borne).
Fig. 1. Illustration of cortical folds variability. The manual labeling of the three right hemispheres represented here shows the variability of cortical sulci by their shape, size, and position.
1.1. Overview of automatic sulci recognition methods
Algorithms dedicated to automatic sulci recognition are primarily based on graphical representations, which represent the relative positions of the sulci with respect to each other, as well as their position in a standardized space (Royackkers et al., 1998; Riviere et al., 2002; Vivodtzev et al., 2006; Shi et al., 2007; Yang and Kruggel, 2009; Belaggoune et al., 2014). To ensure their robust recognition, other methods have previously been experimented with to model inter-subject variability using several frameworks ranging from principal component analysis to Bayesian approaches (Lohmann and von Cramon, 2000; Behnke et al., 2003; Fischl et al., 2004; Perrot et al., 2011). All of these methods are based on a segmentation algorithm followed by a classification algorithm, in which the sulci are first extracted, according to different representations, then labeled.
In this paper, the objective is to improve the model proposed in the BrainVISA/Morphologist package (Perrot et al., 2011). To do this, we focused on two aspects of the pipeline: on the one hand, the sulci labeling algorithm and, on the other hand, the regularization of the results. Note that we did not try to improve the sulci extraction algorithm.
1.2. New sulci labeling approaches: MAS and CNN
Currently, the sulci labeling model proposed in the BrainVISA/Morphologist package, referred to as the Statistical Probabilistic Anatomy Map (SPAM) model in this paper, is based on a Bayesian approach. As this labeling model has shown significant weaknesses, we have been inspired by two segmentation approaches for biomedical applications that are among the most widely used today: multi-atlas segmentation (MAS) and convolutional neural networks (CNNs).
MAS techniques, initially introduced by Rohlfing et al. (2004), use each manually segmented image as an atlas: the atlases are registered to the image to be segmented and the best matches are used to participate in the segmentation. Thus, MAS techniques make it possible to more accurately represent anatomical variability by not attempting to model a segmentation problem using an average model. These techniques are now widely used, but have a major disadvantage: the registration of the atlases to the images is particularly expensive.
Among the many variations of these techniques, the patch-based approaches introduced by Coupé et al. (2011) and Rousseau et al. (2011) have particularly attracted our attention. By using a patch-based search strategy to identify matches with the atlases, the image no longer needs to be aligned globally with all the atlases via expensive non-linear registration. Moreover, the registration and selection of matching patches can be greatly accelerated thanks to the Optimized PatchMatch algorithm proposed by Ta et al. (2014). This algorithm is an adaptation to the segmentation of 3D images of the PatchMatch algorithm (Barnes et al., 2009), which aims to assign to each patch of an image a similar patch in another image.
Inspired by these approaches, we propose two algorithms for cortical sulci recognition. The first is directly inspired by Romero et al. (2017), who propose a cerebellum lobule segmentation method using an approach similar to the one originally proposed by Coupé et al. (2011) and Rousseau et al. (2011), with some improvements. In the second algorithm, we propose a new patch generation strategy based on a high level representation of the sulci, as the standard way of extracting cubic patches does not seem capable of optimally exploiting the sulci geometry and the relations between them, which we believe to be the discriminative features for their recognition. These two algorithms will be designated respectively by PMAS (for Patch-based MAS) and HPMAS (for Patch-based MAS with High level representation of the data).
CNNs were initially developed to address problems in image classification and are now renowned for their formidable effectiveness in dealing with numerous computer vision problems. These techniques allow effective image analysis by learning an abstract representation of the image. Concerning segmentation problems, the first approach was proposed approximately ten years ago by Ciresan et al. (2012), where a neural network was trained to classify each voxel of the image to be segmented from its surrounding patch. Since then, new approaches allow the segmentation of the entire image using fully convolutional neural networks, such as the one initially proposed by Long et al. (2015) and dedicated to semantic segmentation. Concerning segmentation problems in medical imaging, the most commonly used architecture is the U-Net, a fully convolutional neural network which was initially proposed by Ronneberger et al. (2015) and whose adaptation to 3D images was proposed in (Çiçek et al., 2016; Milletari et al., 2016). Here, we propose to compare two approaches based on CNNs. The first is inspired by Ciresan et al. (2012), adapted to address problems associated with 3D imaging. The second uses the 3D U-Net architecture proposed in (Çiçek et al., 2016). These two approaches will be called PCNN (for Patch-based CNN) and UNET, respectively.
To the best of our knowledge, despite their current popularity, no MAS or CNN-based approach has yet been proposed for cortical sulci recognition. Note that these two approaches are generally used to segment the entire image while in this study only the pre-segmented folds need to be labeled, requiring several adjustments in the proposed models.
1.3. Bottom-up geometric constraints
There is no guarantee that the geometric definition of a sulcus, as a set of topologically simple surfaces, is respected in the case of the MAS and CNN-based methods described above. This is particularly disadvantageous for morphometric studies whose measurements are based on the definition of sulci. To remedy this, the BrainVISA/Morphologist pipeline provides an algorithm for bottom-up aggregation of voxels into elementary folds, which are the geometric building blocks of the problem. Once the voxels have been labeled by one of the methods proposed above, it is possible to regularize the results at the scale of the elementary folds. However, the upstream extraction of the elementary folds may sometimes be inaccurate. Even from the same MRI, vastly different fragmentations can be obtained because of stochastic optimizations embedded in the pipeline. This was previously a problem in the model proposed in (Perrot et al., 2011), which uses the same geometric entities to perform recognition, but is not capable of automatically re-dividing the elementary folds.
In this paper, we propose to use voxel-wise labeling to give a top-down perspective to a traditional bottom-up pattern recognition system. Thus, the initial cutting into elementary folds proposed by BrainVISA/Morphologist is challenged by voxel-wise labeling, eliminating under-segmentation errors in the model. The proposed approach is particularly robust to the spatial inconsistencies that can occur during voxel labeling and to the potential incorrect definition of upstream geometric entities.
2. Database
The training base is composed of 62 healthy brains selected from different heterogeneous databases and labeled with a model containing 63 sulci for the right hemisphere and 64 for the left hemisphere. The “unknown” label is used to designate unidentified structures (usually small sulci). The two ventricles are labeled but not considered as sulci. Most of the subjects are right-handed men, aged 25 to 35 years old.
Unfortunately, there is no gold standard definition of sulci morphology. Even the boundaries of the well-known central sulcus can be difficult to define (Fig. 2). Moreover, Fig. 2 shows that the definition of sulci morphology impacts the level of granularity of the nomenclature. Therefore, for this study, the elementary folds of each brain were manually labeled according to a sulcus nomenclature following a long iterative process to achieve a consensus across a panel of several experts on cortex morphology. The last iteration of the database labeling was performed using the TileViz visualization tool (Mancip et al., 2018). This tool allows the entire database to be visualized and labeled simultaneously on a wall of screens (see Fig. 19 in supplementary material). Until now it was only possible to label and simultaneously evaluate a limited number of hemispheres, generally four, on a standard screen. Thus, this tool helps to limit the bias of labeling induced by a restricted view of the database. To support this new iteration, the elementary folds were manually cut when necessary, which was not possible during the study of Perrot et al. (2011).
Fig. 2. Where should the central sulcus end? The folds that may belong to the central sulcus are shown in red. Limits 1 or 2 can be chosen according to the morphological definition of the central sulcus used. Note that depending on the definition chosen, the question then arises of adding a label to the nomenclature to identify the sulcus located between boundaries 1 and 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Note that compared to traditional labeling approaches where only one expert can label images, this database has been progressively labeled by several experts, both successively and simultaneously. This consensus-based labeling has sometimes led to the introduction of new sulci labels when it was considered necessary, making it essential to use the video wall. However, the different experts have thus not produced independent labelings, which prevents us from assessing human-level performance on this dataset.
Compared to (Perrot et al., 2011), the same MRI acquisitions were used but a new iteration of labeling was performed, resulting in the introduction of four new sulci in the nomenclature used. The new nomenclature is described in Fig. 3. A more detailed description is provided in Fig. 23 of the supplementary materials. The manually labeled database is now available on the BrainVISA website (http://brainvisa.info/data/sulci_database/base_62/2019).
3. Method
The Morphologist/BrainVISA pipeline presented in (Perrot et al., 2011) has two major deficiencies. First, the SPAM model of sulci labeling makes obvious labeling errors that are problematic in practice. Typically, it tends to duplicate the central sulcus, which is an aberration. Second, the model uses bottom-up geometric constraints to group the voxels to be labeled in elementary folds, and this step is subject to errors. In this article, we therefore seek to improve the performance of the sulci labeling model and its robustness to under-segmentation errors in elementary folds.
In this section, sulci labeling from an MRI is described in three steps (Fig. 4). First, the folds are segmented from the MRI using the BrainVISA/Morphologist pipeline (3.1.). Then, they are labeled using different algorithms (3.2.). Finally, the agglomeration of the voxels into elementary folds proposed by the BrainVISA/Morphologist pipeline is used to regularize the results (3.3.).
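The regularization itself is described in Section 3.3. As a rough illustration only of what "regularizing voxel-wise labeling at the scale of the elementary folds" can mean, the sketch below assigns to every voxel of a fold the label with the highest summed score over that fold. This aggregation rule, the function name `regularize_by_fold` and the toy arrays are assumptions made for the example; the sketch does not include the automatic re-division of folds.

```python
import numpy as np

def regularize_by_fold(voxel_scores, fold_ids):
    """Give one label per elementary fold (hypothetical helper).

    voxel_scores: (n_voxels, n_labels) label scores from any voxel-wise method.
    fold_ids:     (n_voxels,) index of the elementary fold of each voxel.
    Assumption: the fold label is the one with the highest summed score.
    """
    labels = np.empty(len(fold_ids), dtype=int)
    for fold in np.unique(fold_ids):
        mask = fold_ids == fold
        labels[mask] = voxel_scores[mask].sum(axis=0).argmax()
    return labels

# Toy example: 5 skeleton voxels, 2 labels, 2 elementary folds.
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])
folds = np.array([0, 0, 0, 1, 1])
voxelwise = scores.argmax(axis=1)          # third voxel disagrees with its fold
regularized = regularize_by_fold(scores, folds)
```

In this toy case the third voxel is labeled differently from the rest of its fold by the voxel-wise decision, and the fold-level aggregation removes that inconsistency.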
Note that the strategies used to set the method hyperparameters are detailed in the supplementary material.
3.1. Folds representation
The Morphologist pipeline of the BrainVISA software (www.brainvisa.info), a widely used resource for studying cortical anatomy, first represents the folds as a set of voxels corresponding to a skeleton of the cerebrospinal fluid filling the fold and then labels them using the SPAM model (Fig. 5). This first step of fold segmentation is common to all the models presented in this article. It consists of three major steps: first, the segmentation of white and grey matter from MRI, then the extraction of the skeleton of cortical folds, followed by its division into elementary folds. The fragmentation into elementary folds satisfies topological and geometric constraints specific to the sulci's definition. It is first based on the topological characterization of a simple surface proposed by Malandain et al. (1993), which isolates surface pieces that do not include any junction. The skeleton is also fragmented at the level of the buried gyri (Fig. 6).
Fig. 3. New nomenclature used to label sulci. The visualization of the sulci labels is done thanks to the SPAM representation used by Perrot et al. (2011) which averages the position of the sulci as probability maps that are thresholded for this image. The new nomenclature includes 63 labels for the right hemisphere and 64 for the left hemisphere. Only the left hemisphere is represented in this figure. The right hemisphere has the same labels except the S.GSM. label. Compared to Perrot et al. (2011), two new sulci are labeled (S.intraCing. and S.R.sup.). The ventricle label does not correspond to a sulcus label, but belongs to the fold skeleton extracted by the BrainVISA/Morphologist toolbox. Only the “unknown” label is not shown in this figure. Please refer to Fig. 23 of the supplementary material section for English translations of each label.
Fig. 4. MRI to labeled cortical sulci: a three-step pipeline. First, the fold skeleton is extracted using the BrainVISA/Morphologist toolbox. This toolbox also makes it possible to fragment the skeleton into elementary folds. Second, skeleton voxels are labeled by different algorithms. Algorithms based on MAS techniques (PMAS, HPMAS) and CNN-based algorithms (PCNN, UNET) label each skeleton voxel while the SPAM algorithm directly labels the elementary folds. Third, voxelwise labeling is regularized through the elementary folds, while automatically re-dividing them when the labeling indicates that it is required.
The skeleton representation has three main advantages. First, this 3D representation is essential during manual labeling because it allows the visualization of the relative position of the sulci with respect to each other and the evaluation of their depth, size, etc. Additionally, the agglomeration of the voxels into elementary folds makes it possible to speed up labeling by giving a label to a set of voxels rather than to each voxel individually. Second, as the data are particularly influenced by the type of MRI sequence, the age of the subjects (which has a significant impact on the opening of the sulci) or even their pathologies, this pre-processing enables optimal normalization of the data. Moreover, the algorithm can focus on labeling only, after the segmentation. Finally, this representation has previously been used in other pipelines, making it possible to automate the calculation of measurements (depth, length, connectivity, etc.) used in morphometric studies or to realign the brains according to the major sulci (Auzias et al., 2011; 2013), which is why we have chosen to keep it. Had we instead chosen to construct a model to recognize the sulci that carries out both their extraction and labeling without relying on this representation, it is highly probable that the results obtained would not conform to the representation used by these pipelines and that some significant postprocessing steps would be necessary.
Fig. 5. A computer vision pipeline mimicking a human anatomist (Mangin et al., 2015). A: interface between the cerebral envelope and the cortex. B: interface between white matter and grey matter. C: extraction of the fold skeleton. D: cutting of the skeleton into elementary folds. E: Folds labeling using the SPAM model of Perrot et al. (2011).
Fig. 6. Schematic representation of the fold skeleton. The fragmentation into elementary folds isolates the internal and external branches and cuts the skeleton at the level of the buried gyri. Image taken from Riviere et al. (2002).
Although the extraction of the fold skeleton is robust, its fragmentation into elementary folds demonstrates certain significant instabilities: vastly different fragmentations can be observed from the same MRI (Fig. 7). Several stochastic optimizations were included in the segmentation pipeline (e.g. for bias correction, brain masking, skeletonization, etc.). These optimizations only have a slight impact on the shape of the resulting fold skeleton. However, for the topological fragmentation into elementary folds, a single voxel can make the difference. Thus, these stochastic optimizations can have important consequences on the fragmentation of large simple surfaces. To remedy this, during manual labeling, the folds were cut manually when necessary. During automatic labeling, we propose a technique, based on a clustering algorithm, to automatically redivide the elementary folds from a voxelwise labeling during the regularization step. This technique is described in Section 3.3.
Fig. 7. Extraction of the elementary folds from the same MRI. In the two lower brains, each color represents a different elementary fold. We observed that the skeleton extraction is visually stable, but its division into elementary folds can produce very different results. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
3.2. Labeling methods
The methods described below seek to automatically label the voxels of the fold skeleton. Among the possible labels, while most correspond to cortical sulci, three other labels are used: those corresponding to the right and left ventricles and the “unknown” label. In the methods presented here, the ventricles are treated as sulci, as they are relatively stable anatomical structures of the brain negative mold. However, the “unknown” label, corresponding to voxels that do not belong to any of the other labeled structures, must be treated differently in some cases.
3.2.1. Statistical probabilistic anatomy map (SPAM) models
In this comparative study, the reference method corresponds to the one described in (Perrot et al., 2011), which proposes a coherent Bayesian framework to automatically identify sulci based on a probabilistic atlas (a mixture of SPAM models) while simultaneously estimating normalization parameters. This method, currently available in the BrainVISA/Morphologist pipeline, has been widely used on very large databases for large-scale morphometric studies (Le Guen et al., 2019). However, the model is still making obvious errors and we believe that this is due to the fact that the SPAM approach is based on a single template atlas, which prevents it from fully representing the high variability of folding patterns. Each sulcus can have several configurations, which may prove difficult to represent with a single average model.
3.2.2. MAS approaches
Two MAS approaches, PMAS and HPMAS, are compared in this section. The first approach is largely inspired by the one proposed in (Romero et al., 2017) in which, unlike most MAS approaches, similar atlases are searched for by comparing cubic patches instead of full images. The second MAS algorithm presented here, and described in Borne et al. (2018), aims to define a library of local patches embedding enough geometrical information to minimize ambiguities when searching for a high similarity hit in the unknown subject morphology. Therefore, instead of taking native cubic patches, this algorithm builds virtual patches containing whole sulci.
Fig. 8. Comparison of MAS approaches: PMAS vs. HPMAS. First, the patches are designed. Second, they are transferred to a new image to be labeled, where the fold skeleton has been extracted. Third, the best matches are selected and patch labels are propagated on the image to be labeled. Finally, the propagated labels are used to calculate the label score maps. In order to make the figures as readable as possible, we have chosen to represent the images in 2D while they are processed in 3D. All images are represented in 2 × 2 × 2 mm resolution, while for HPMAS, images are processed with the acquisition resolution. The acronym ANNs refers to the Approximate Nearest Neighbors patches obtained by multiple runs of the Optimized PatchMatch (OPM) algorithm.
These two approaches are described in four steps: first, the design of the patches (patch generation), second, the strategy of realigning the patches between them and selecting the best matches (distance calculation), third, the strategy of propagating the labels from the patch to the brain to be labeled (label propagation) and finally the combination of the labels of the propagated patches (label fusion) (Fig. 8).
Patch-based MAS approach (PMAS)
Patch generation. The patches are cubes containing the fold skeleton. They are extracted from images with a resolution of 2 × 2 × 2 mm that have been automatically registered by the BrainVISA/Morphologist pipeline to the well-known MNI space (Collins et al., 1994), which aligns the rough shapes of the brains through an affine transformation. We chose to harmonize the resolution of the images at 2 × 2 × 2 mm because it seemed sufficient to us to visually recognize the sulci.
We chose to take into account only the patches with the central voxel belonging to the fold skeleton for two main reasons. First, it limits the number of patch matches that require optimization as the voxels belonging to the skeleton represent only a small part of the image's voxels. Second, since the patches are extracted from binarized images, the calculation of the distance between two patches can be successful only if the patches contain a minimum number of skeleton voxels.
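The restriction to patches centered on skeleton voxels can be sketched as follows. This is a minimal illustration on a toy binary volume, assuming a NumPy boolean array as skeleton image; the function name `skeleton_patches` and the `half_size` parameter are hypothetical and do not come from the BrainVISA/Morphologist toolbox.

```python
import numpy as np

def skeleton_patches(skeleton, half_size):
    """Yield (center, patch) for every skeleton voxel of a binary 3D volume.

    Each patch is a cube of side 2 * half_size + 1 centered on a skeleton
    voxel; non-skeleton voxels are never used as patch centers.
    """
    # Zero-pad so that patches centered near the border keep a cubic shape.
    padded = np.pad(skeleton, half_size, mode="constant", constant_values=False)
    side = 2 * half_size + 1
    for x, y, z in np.argwhere(skeleton):
        yield (x, y, z), padded[x:x + side, y:y + side, z:z + side]

# Toy volume: a small planar "fold" of 16 voxels in a 6 x 6 x 6 grid.
volume = np.zeros((6, 6, 6), dtype=bool)
volume[2, 1:5, 1:5] = True
patches = list(skeleton_patches(volume, half_size=1))
```

Only 16 of the 216 voxels of the toy volume produce a patch, which illustrates the first reason given above: the number of candidate matches scales with the size of the skeleton rather than with the size of the image.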
As proposed in (Giraud et al., 2016), we adopted a multi-scale approach, which involves the independent use of several patch sizes (determined by inner cross validation), to produce several score maps per label, which are then averaged.
Distance calculation. In order to find the most similar set of patches, we aimed to optimize the following distance $d$ between two patches $P(S_A)$ and $P(S_B)$, respectively belonging to the fold skeletons $S_A$ and $S_B$ (superimposed by a simple translation):

$$d(P(S_A), P(S_B)) = \frac{d(P(S_A) \to S_B) + d(P(S_B) \to S_A)}{2} \tag{1}$$

The measurement from a patch $P(S_A)$ to a fold skeleton $S_B$ corresponds to the average of the quadratic Euclidean distances $d_E$ between the skeleton voxels $p_A \in P(S_A)$ and their nearest neighbor in the fold skeleton $S_B$ (Fig. 9):

$$d(P(S_A) \to S_B) = \frac{1}{|P(S_A)|} \sum_{p_A \in P(S_A)} \min_{p_B \in S_B} \left[ d_E^2(p_A, p_B) \right] \tag{2}$$

Note that, in order to avoid border effects, the closest neighbor of $p_A$ is searched in the entire skeleton $S_B$ and not only among the skeleton voxels contained in the patch $P(S_B)$.
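As an illustration of Eqs. (1) and (2), the following sketch computes the two directed distances and their average on small point sets. It assumes that patches and skeletons are given as arrays of voxel coordinates and uses a brute-force nearest-neighbor search; the function names are hypothetical and the sketch says nothing about how the search is optimized in practice.

```python
import numpy as np

def directed_distance(patch_pts, skeleton_pts):
    """Eq. (2): mean, over the patch voxels, of the squared Euclidean
    distance to the nearest voxel of the whole other skeleton."""
    diff = patch_pts[:, None, :] - skeleton_pts[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)      # (n_patch, n_skeleton) squared distances
    return sq.min(axis=1).mean()

def symmetric_distance(patch_a, skel_a, patch_b, skel_b):
    """Eq. (1): average of the two directed distances."""
    return 0.5 * (directed_distance(patch_a, skel_b)
                  + directed_distance(patch_b, skel_a))

# Toy skeletons: two parallel lines of voxels, one unit apart.
skel_a = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
skel_b = np.array([[0, 1, 0], [1, 1, 0], [2, 1, 0], [3, 1, 0]], dtype=float)
patch_a = skel_a[:2]     # skeleton voxels of S_A falling inside the patch
patch_b = skel_b[2:]     # skeleton voxels of S_B falling inside the patch

d_ab = directed_distance(patch_a, skel_b)   # every voxel is 1 unit away
d_ba = directed_distance(patch_b, skel_a)   # squared distances 1 and 2
d = symmetric_distance(patch_a, skel_a, patch_b, skel_b)
```

Note that each directed term searches the nearest neighbor in the entire opposite skeleton, as stated after Eq. (2), which is why the full arrays `skel_a` and `skel_b` are passed rather than the opposite patch.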
Realigning and comparing all the patches in the database for each skeleton voxel to be labeled would be extremely expensive, making it impossible to label within a reasonable time. Additionally, it would increase the probability of spurious matching between remote areas in the brain while the images are already roughly aligned with each other. It is important to note that because we use binarized images, the risk of obtaining false positives is higher than usual.
In (Romero et al., 2017), the Optimized PatchMatch Label fusion (OPAL) (Ta et al., 2014; Giraud et al., 2016) was used. This segmentation method is based on the Optimized PatchMatch (OPM) algorithm which uses a cooperative and random strategy resulting in a very low computational burden. Compared to the PatchMatch algorithm (Barnes et al., 2009) from which it is inspired, OPM is adapted to 3D anatomical segmentation by taking into account the rough alignment of images. Here, as only patches with the central voxel belonging to the fold skeleton are considered, an adapted version of the OPM algorithm has been implemented. Please refer to supplementary material for more details.
Fig. 9. Calculation of the distance from the patch $P(S_A)$ to the skeleton $S_B$ for the PMAS method. The grey voxel represents the central voxel of the patch $P(S_A)$ which is superposed with a voxel of the skeleton $S_B$. For each voxel $p_A \in P(S_A)$, we look for its closest neighbor among the voxels of the skeleton $S_B$. The Euclidean distance between these two voxels is calculated. The distances over all the points $p_A \in P(S_A)$ and their nearest neighbors are then averaged to obtain $d(P(S_A) \to S_B)$.
Label propagation. In order to select several Approximate Nearest Neighbor (ANN) patches per skeleton voxel for a given patch size, multiple independent OPM runs were launched. The number of ANNs to be selected is determined by inner cross-validation. Once the ANNs have been selected, all the voxels of each ANN patch participate in the labeling, as done in (Rousseau et al., 2011; Giraud et al., 2016). However, only a few skeleton voxels of the patch overlap with the skeleton voxels to be labeled. Thus, we propose to propagate the label of each skeleton voxel of the patch to its nearest neighbor in the skeleton to be labeled.
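The nearest-neighbor propagation described above can be sketched as a vote accumulation. The example assumes that a selected ANN patch is given as coordinates of its skeleton voxels with integer labels, already superimposed on the skeleton to be labeled; `propagate_labels` is a hypothetical name and the unweighted vote is a simplification, since the weighting is handled at the label fusion step.

```python
import numpy as np

def propagate_labels(patch_pts, patch_labels, target_pts, n_labels):
    """Send the label of each patch skeleton voxel to its nearest voxel of
    the skeleton to be labeled; return an (n_target, n_labels) vote array."""
    votes = np.zeros((len(target_pts), n_labels))
    diff = patch_pts[:, None, :] - target_pts[None, :, :]
    nearest = np.sum(diff ** 2, axis=-1).argmin(axis=1)  # nearest target voxel
    # np.add.at accumulates correctly when several patch voxels hit one target.
    np.add.at(votes, (nearest, patch_labels), 1.0)
    return votes

# Skeleton to be labeled (3 voxels) and one ANN patch (3 labeled voxels).
target = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
patch = np.array([[0, 1, 0], [2, 1, 0], [2, 0, 1]], dtype=float)
labels = np.array([0, 1, 1])
votes = propagate_labels(patch, labels, target, n_labels=2)
```

Here the middle target voxel receives no vote from this patch, while the last one receives two votes for label 1; votes from the other ANN patches would be accumulated in the same way.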
Label fusion. For this method, we have implemented the non-local patch-based label fusion used in (Romero et al., 2017). In this strategy, the distance between patches is used to perform a robust weighted average of the labels. The label fusion strategy corresponds to the multipoint estimation described in (Rousseau et al., 2011). Once the non-local means estimator has been calculated for all patch sizes, the final estimation is obtained by averaging these estimations thanks to a late fusion (Snoek et al., 2005). Thus, a score map is estimated for each label in the database.
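The two stages above, a distance-weighted average of the propagated labels followed by a late fusion over patch sizes, can be sketched as follows. The exponential weight exp(-d / h) and the bandwidth `h` are assumptions borrowed from common non-local means formulations; the exact weighting of the cited works is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def fuse_labels(vote_maps, distances, h=1.0):
    """Weighted average of the vote maps of the selected ANN patches.

    vote_maps: list of (n_voxels, n_labels) arrays, one per ANN patch.
    distances: patch distance of each ANN (smaller means more similar).
    Assumption: weights are exp(-d / h).
    """
    weights = np.exp(-np.asarray(distances, dtype=float) / h)
    stacked = np.stack(vote_maps)                 # (n_ann, n_voxels, n_labels)
    return np.tensordot(weights, stacked, axes=1) / weights.sum()

def late_fusion(score_maps_per_scale):
    """Average the score maps computed independently for each patch size."""
    return np.mean(np.stack(score_maps_per_scale), axis=0)

# Two ANN patches voting for 2 voxels and 2 labels.
v1 = np.array([[1.0, 0.0], [0.0, 1.0]])
v2 = np.array([[0.0, 1.0], [0.0, 1.0]])
equal = fuse_labels([v1, v2], distances=[0.0, 0.0])   # plain average
closer = fuse_labels([v1, v2], distances=[0.0, 2.0])  # v1 dominates
final = late_fusion([equal, closer])                  # two "patch sizes"
```

With equal distances the fusion reduces to a plain average, whereas a more distant patch contributes less to the score map; the late fusion then averages the maps obtained for each patch size.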
Concerning the “unknown” label, present in the manually labeled database, it is treated like a sulcus label.
Patch-based MAS approach with High level representation of the data (HPMAS)
Fig. 10. 3D representation of the HPMAS method. As in Fig. 8, which represents the method in 2D, the approach is described in four steps: generating the virtual patches, registering them on the image to be labeled, propagating the labels of the selected virtual patches and finally merging the propagated labels to obtain the final labeling.
As the standard way of extracting patches does not seem capable of exploiting the sulci geometry and the relations between them, which we believe to be the distinguishing features necessary for recognition, we have proposed a new virtual patch generation strategy based on a high level representation of the sulci (Borne et al., 2018). This framework is well adapted to leverage more information about the different folding configurations in the training dataset.
Note that this method is the only one of the proposed new methods to have been specifically developed for the recognition of cortical sulci. It includes many arrangements specific to this application. Its complex design gives an idea of the scores that can be obtained by pushing as far as possible in this direction. To facilitate the understanding of this ad-hoc method, Fig. 10 represents the pipeline in 3D, which complements the 2D representation provided in Fig. 8.
Patch generation. In order to take into account as much geometric information as possible, the idea was to define virtual patches containing whole sulci. These virtual patches correspond to a voxel cloud representing a pair of sulci, extracted from MNI space at the image resolution. Defining patches as clouds of voxels rather than as cubes makes it possible to take into account the sulcus in its entirety without contaminating the patch with all its surrounding sulci. Note that the shape of small sulci is not specific enough to prevent spurious hits. That is why we have chosen to aggregate two sulci to create discriminative local shapes. In the following, we define a type of virtual patches for each pair of sulci that are neighbors in the brain.
In practice, a pair of sulci is selected if the two sulci are neighbors in at least one brain of the atlas dataset, according to the topology provided by the BrainVISA/Morphologist pipeline that produces the folds. This pipeline endows the list of folds with a graph structure corresponding to either direct connections or to the fact that two folds are separated by a piece of gyrus. Finally, each type is made up of the instances of the pair of sulci in the atlas dataset, most of the time as many shapes as atlases (some atlases miss a few small sulci) (Fig. 10.1).
Note that only the unknown sulcus label is not selected to form virtual patches, as it does not constitute a coherent structure like the other labels. Thus, unlike the previous PMAS method, the unknown label is not treated like other sulcus labels.
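The construction of the virtual patch types can be sketched with plain Python containers. The data layout below (one dictionary per atlas listing the sulci present and the neighbor relations of its fold graph) and the label names are hypothetical. The sketch follows one reading of the text: a pair defines a type as soon as it is neighboring in at least one atlas, and every atlas containing both sulci then provides an instance.

```python
def virtual_patch_types(atlases, excluded=("unknown",)):
    """Map each neighboring pair of sulci to the atlases providing an instance.

    atlases: dict atlas_id -> {"sulci": set of labels,
                               "neighbors": iterable of (label, label)}.
    """
    # Step 1: a pair is kept if it is neighboring in at least one atlas.
    pairs = set()
    for atlas in atlases.values():
        for a, b in atlas["neighbors"]:
            if a != b and a not in excluded and b not in excluded:
                pairs.add(frozenset((a, b)))
    # Step 2: every atlas containing both sulci provides one instance.
    return {
        pair: sorted(aid for aid, atlas in atlases.items()
                     if pair.issubset(atlas["sulci"]))
        for pair in pairs
    }

# Hypothetical toy dataset with three atlases and three sulci A, B, C.
atlases = {
    "atlas1": {"sulci": {"A", "B", "C", "unknown"},
               "neighbors": [("A", "B"), ("B", "unknown")]},
    "atlas2": {"sulci": {"A", "B", "C"}, "neighbors": [("B", "C")]},
    "atlas3": {"sulci": {"A", "C"}, "neighbors": []},
}
types = virtual_patch_types(atlases)
```

In this toy dataset the pair (B, unknown) is discarded, and the third atlas contributes no instance because it lacks sulcus B, mirroring the remark that some atlases miss a few small sulci.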
Distance calculation. For the distance calculation step, the set of folds of the brain to segment and the virtual patches of the library are represented by point clouds. In order to find an optimal alignment of each virtual patch onto the skeleton point cloud of the brain to segment, the well-known iterative closest point algorithm (Besl and McKay, 1992) is used, with the robust implementation of Holz et al. (2015). This algorithm iteratively adjusts the transformation (translation and rotation) in order to minimize the distance between two sets of points. Note that, compared to the PMAS approach, which only uses translations to superimpose patches, the registration here allows rotations.
To build the measure used to rank the matches, the nearest voxels in the new fold skeleton $S_B$ of each skeleton voxel $p_A \in P(S_A)$ are saved as activated voxels $p_B^* \in S^*_{B,P(S_A)}$. Then, the measure corresponds to the sum of the quadratic distances between the skeleton voxels and their corresponding activated voxels, divided by the number of different activated voxels:

$$d(P(S_A) \to S_B) = \frac{1}{|S^*_{B,P(S_A)}|} \sum_{p_A \in P(S_A)} \min_{p_B \in S_B} \left[ d_E^2(p_A, p_B) \right] \qquad (3)$$

Note that by dividing by $|S^*_{B,P(S_A)}|$, we take into account the number of different activated points. This penalizes virtual patches in which several points activate the same point of the skeleton to be labeled (Fig. 11).
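As an illustration of Eq. (3), the following minimal sketch computes the distance for two small point clouds. It assumes that the virtual patch has already been aligned (the ICP step is not reproduced), uses a brute-force nearest-neighbor search, and the function name `patch_to_skeleton_distance` is hypothetical rather than taken from the authors' code.

```python
def patch_to_skeleton_distance(patch_points, skeleton_points):
    """Eq. (3): sum of squared nearest-neighbor distances, divided by the
    number of distinct activated skeleton voxels (assumes a non-empty patch)."""
    total = 0.0
    activated = set()  # indices of skeleton voxels hit at least once
    for pa in patch_points:
        best_idx, best_d2 = None, float("inf")
        for idx, pb in enumerate(skeleton_points):
            d2 = sum((a - b) ** 2 for a, b in zip(pa, pb))
            if d2 < best_d2:
                best_idx, best_d2 = idx, d2
        total += best_d2
        activated.add(best_idx)
    return total / len(activated)


# Two patch points that both activate the same skeleton voxel ...
collapsed = patch_to_skeleton_distance([(0, 0, 0), (1, 0, 0)],
                                       [(0, 0, 1), (5, 0, 0)])
# ... versus two patch points that activate two different voxels.
spread = patch_to_skeleton_distance([(0, 0, 0), (1, 0, 0)],
                                    [(0, 0, 1), (1, 0, 1)])
```

In the first call the sum of squared distances is divided by one activated voxel instead of two patch points, so the match is penalized relative to a classical average.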
Fig. 11. Calculation of the distance from the virtual patch $P(S_A)$ to the skeleton $S_B$ for the HPMAS method. For the sake of clarity, the skeletons $S_A$ and $S_B$ represented do not overlap in this figure. For each voxel $p_A \in P(S_A)$, we look for its closest neighbor among the voxels of the skeleton $S_B$. The Euclidean distance between these two voxels is calculated. The distances over all the points $p_A \in P(S_A)$ and their nearest neighbor are then summed and divided by the number of different activated points $p_B^*$ to obtain $d(P(S_A) \to S_B)$. The two configurations represented are penalized by the division by the number of different activated points rather than by the number of points in $P(S_A)$ as for a classical average. On the first configuration, we observe that the proposed distance penalizes the virtual patch more if its shape is more complex or if its size is larger than the structure on which it has been registered. On the second configuration, we observe a greater penalization of the virtual patch if it has only one connected component and if it is registered on two different components.
For each type of virtual patch, all matches are ranked according to the distance proposed above. A fixed number of matches (determined by inner cross-validation) leading to the shortest distances is selected to propagate the two parent sulci. All types of virtual patches are selected the same number of times even if they are not all equally informative. It is important to note that some sulcus instances are selected several times, because they win the competition for several virtual patch types, but their multiple contributions will be associated with slightly different alignments. Hence, sulcus instances maximizing regional similarity to the unknown subject get more weight.
Label Propagation. After its optimal alignment to the unknown subject, each selected virtual patch propagates the label of each of its voxels to its nearest neighbor in the target brain. To take the virtual patch structure into account, each connected set of voxels in the virtual patch should correspond to a unique connected set in the target brain: the smallest non-connected sets are excluded (Fig. 10.3).
Label Fusion. After complete propagation of all the proposed virtual patches $p \in V_l$ that contain the sulcus $l$, the score map $S_l$ is calculated by averaging the number of times the point of coordinates $(x, y, z)$ is activated by the different virtual patches:

$$S_l(x, y, z) = \frac{\sum_{p \in V_l} \mathrm{act}_p(x, y, z)}{|V_l|} \qquad (4)$$

with $\mathrm{act}_p(x, y, z)$ equal to 1 if the voxel of coordinates $(x, y, z)$ is activated by the patch $p$, and to 0 otherwise.
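The score map of Eq. (4) can be sketched as follows. This is only an illustrative sketch: each virtual patch is assumed to be summarized by the set of target voxels it activates, and the name `score_map` and the sparse dictionary output are choices made here, not the authors' implementation.

```python
from collections import Counter


def score_map(activations_per_patch):
    """Eq. (4): fraction of the virtual patches in V_l that activate each voxel.

    activations_per_patch: one collection of (x, y, z) tuples per patch in V_l.
    Returns a sparse map {voxel: score}; voxels never activated are omitted.
    """
    n_patches = len(activations_per_patch)
    counts = Counter()
    for activated in activations_per_patch:
        counts.update(set(activated))  # act_p is binary: count each voxel once
    return {voxel: c / n_patches for voxel, c in counts.items()}


scores = score_map([{(0, 0, 0), (1, 0, 0)}, {(1, 0, 0)}])
empty = score_map([])  # a label with no virtual patch has an empty score map
```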
Compared to PMAS, where patches are weighted by their distance to the patch to be labeled, here each propagated point has the same weight in the label fusion. In order to perform a similar weighting, we have tested the use of the distance from the entire virtual patch to the skeleton to be labeled. This did not seem to significantly improve the results. We also tried to weight by the distance from the virtual patch point to the point it has activated, without any further improvement. We think it is essential to combine these two distances when weighting, for example by averaging the two distances. However, our attempts have also been unsuccessful so far, so we chose to avoid weighting.

Fig. 12. Comparison of CNN-based approaches: PCNN vs. UNET. Boxes represent feature maps. The number of channels is denoted next to each feature map. The size of the feature map is indicated after the @ when appropriate. N is the number of different labels to be predicted. For the sake of clarity, input and output are represented in 2D rather than 3D.
As the “unknown” label does not belong to any virtual patch, its score map is empty. This label will be selected only if the score maps of all other labels are also empty for a given elementary fold.
3.2.3. CNNs based approaches
As this is the first time that CNNs are used for sulci labeling, we take inspiration from two models that have proven their efficacy in medical image segmentation (Fig. 12): the first is a patch-based approach inspired by Ciresan et al. (2012) and the second is an approach that treats the entire image with a 3D U-Net, as in Çiçek et al. (2016). First, the common modalities used during the training of these two networks are detailed, followed by an individual description of each network. The models presented are implemented using the PyTorch library (Paszke et al., 2017).
Data. All the fold skeletons are registered in MNI space and used as input: they correspond to 3D binary volumes with a common resolution of 2 × 2 × 2 mm, where the voxels belonging to the skeleton are one and the others are zero. In order to augment the training dataset, a rotation in a random direction with a random angle (following a Gaussian distribution N(0,16π2)) is applied to the images at each epoch.
At the output of the neural network, a score per label present in the database is obtained per voxel. The “unknown” label is treated like a sulcus label.
Training design. The weights of the neural networks were initialized as in LeCun et al. (2012). Stochastic gradient descent was used for training, with learning rate and momentum determined by 3-fold inner cross-validation. The learning rate was halved when the loss function had not improved for two consecutive epochs. After four consecutive epochs without improvement, training was stopped. The selected trained neural network corresponds to the epoch obtaining the lowest error rate $E_{SI}$, described in Perrot et al. (2011) and in the following section.
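The learning rate schedule and stopping rule described above can be sketched in a few lines. The sketch below is an interpretation under stated assumptions: the text does not say whether the counter of epochs without improvement is reset after the learning rate is halved, so this version keeps counting; the function name, the default learning rate and the loss values are hypothetical.

```python
def simulate_schedule(losses, lr=0.1, halve_after=2, stop_after=4):
    """Replay a sequence of per-epoch losses and return (epoch, lr) pairs.

    The learning rate is halved after `halve_after` consecutive epochs without
    improvement, and training stops after `stop_after` such epochs.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement of the loss
    history = []
    for epoch, loss in enumerate(losses):
        history.append((epoch, lr))
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale == stop_after:
                break  # early stopping
            if stale % halve_after == 0:
                lr /= 2
    return history


history = simulate_schedule([1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.5])
```

With these made-up losses, the learning rate is halved after epoch 3 and training stops after epoch 5, so the last loss value is never reached.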
The loss function used is the cross-entropy loss. In most cases, for unbalanced problems, the loss function must be weighted to avoid favoring the labels most involved in backpropagation, due to their higher presence in the database. Although the average size of each sulcus is extremely unbalanced, we have chosen not to weight this loss function because large sulci are also the most interesting from a neuroanatomical point of view and need to be better recognized than small ones.
Patch-based model with a 3D CNN (PCNN). The PCNN method adapts the approach proposed in Ciresan et al. (2012), addressing a segmentation problem as a classification of each voxel based on its environment contained in a patch. Here, only voxels belonging to the skeleton are selected to participate in the classification.
We designed the architecture of the neural network so that it takes cubic patches of 6.2 cm side as input, which we considered to be large enough to identify the central voxel. During training, the dropout strategy (Srivastava et al., 2014) with a probability of 0.5 is used on fully connected layers. Batch normalization (Ioffe and Szegedy, 2015) was also used on convolutional and fully connected layers. The batch size has been set at 100 to minimize learning time and fit in memory. In order to ensure that the inner cross-validation is not too time-consuming, only three epochs are computed for each hyperparameter value tested.
3D U-Net based model (UNET). For the UNET method, the network architecture used is the one presented in Çiçek et al. (2016), with the PyTorch implementation of (Wolny and enfisan, 2019). The particularity of this application of U-Net lies in the fact that all the voxels that do not belong to the fold skeleton, i.e. a large majority of the voxels in the image, do not need to be classified. Indeed, as the values predicted by U-Net are masked by the segmentation of the fold skeleton made upstream, the background voxels do not need to be predicted and therefore do not need to be learned. Thus, during training, all voxels that do not belong to sulci are not used for gradient backpropagation. The batch size has been set at 1 in order to fit in memory.
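To make the masking idea concrete, here is a minimal sketch of a cross-entropy loss restricted to skeleton voxels, written in plain Python rather than PyTorch. It is not the authors' training code: the function name and the flat per-voxel lists are assumptions. In PyTorch, a similar effect is commonly obtained with the `ignore_index` argument of `torch.nn.CrossEntropyLoss`, although the text does not state which mechanism was used.

```python
import math


def masked_cross_entropy(scores, labels, skeleton_mask):
    """Mean cross-entropy over skeleton voxels only.

    scores: per-voxel lists of raw class scores (logits).
    labels: per-voxel index of the true class.
    skeleton_mask: per-voxel booleans, True for voxels of the fold skeleton.
    Background voxels contribute nothing to the loss, hence nothing to gradients.
    """
    total, n_used = 0.0, 0
    for voxel_scores, label, in_skeleton in zip(scores, labels, skeleton_mask):
        if not in_skeleton:
            continue
        peak = max(voxel_scores)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in voxel_scores))
        total += log_norm - voxel_scores[label]
        n_used += 1
    return total / n_used if n_used else 0.0


loss = masked_cross_entropy(
    scores=[[0.0, 0.0], [10.0, -10.0], [2.0, 2.0]],
    labels=[0, 1, 0],
    skeleton_mask=[True, False, True],  # the badly predicted voxel is background
)
```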
3.3. Bottom-up geometric constraints
In order to standardize the results, the voxels were agglomerated into elementary folds. However, these folds are not always sufficiently fragmented, so we propose to use the label score maps to reconsider their fragmentation.
The straightforward approach to regularize the results is a weighted majority vote. The scores of each elementary fold were averaged by label and the label with the highest score was selected. This strategy was used as a reference to evaluate the impact of the automatic re-division of elementary folds.
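The weighted majority vote can be sketched as follows, assuming that each voxel of an elementary fold carries a dictionary of label scores. The function name `label_fold` and the label names "A" and "B" are placeholders, not names from the Morphologist toolbox.

```python
def label_fold(voxel_scores):
    """Average the label scores over the voxels of one elementary fold and
    return the winning label together with the mean score of each label."""
    totals = {}
    for scores in voxel_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    n_voxels = len(voxel_scores)
    means = {label: total / n_voxels for label, total in totals.items()}
    return max(means, key=means.get), means


# Three voxels of the same fold; the middle one would vote "B" on its own.
winner, means = label_fold([
    {"A": 0.9, "B": 0.1},
    {"A": 0.4, "B": 0.6},
    {"A": 0.8, "B": 0.2},
])
```

The whole fold receives the single label "A", which is how the regularization overrides isolated voxel-wise disagreements.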
In this paper, we propose to re-divide the elementary folds with the help of Ward's hierarchical agglomerative clustering method (Ward Jr, 1963). Clustering of each elementary fold was performed based on the label score maps. In order to ensure spatial consistency, a spatial connectivity constraint was imposed during cluster agglomeration. Then, the Calinski-Harabasz index $I_{CH}$ (Caliński and Harabasz, 1974), implemented in the scikit-learn library (Pedregosa et al., 2011), was used to quantify the quality of the proposed clustering. This score corresponds to the ratio of the between-clusters dispersion mean $B$ and the within-cluster dispersion $W$:

$$I_{CH} = \frac{\mathrm{Tr}(B)}{\mathrm{Tr}(W)} \times (N - 2) \qquad (5)$$

$$W = \sum_{k=1}^{2} \sum_{x \in C_k} (x - c_k)(x - c_k)^T \qquad (6)$$

$$B = \sum_{k=1}^{2} n_k (c_k - c)(c_k - c)^T \qquad (7)$$

with $N$ the number of voxels in the elementary fold $E$, $C_k$ the set of voxels in cluster $k$, $c_k$ the center of cluster $k$, $c$ the center of $E$, and $n_k$ the number of points in cluster $k$.
The ratio is higher when clusters are dense and well separated. If this score was higher than a threshold determined by inner cross-validation, the partitioning was performed. When an elementary fold was split in two, each of the two clusters obtained was also challenged with the same procedure, until all the elementary folds had a Calinski-Harabasz index below the threshold.
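The split criterion of Eqs. (5) to (7) can be illustrated for a candidate partition into two clusters. The sketch below only evaluates the index and the threshold test; the Ward clustering with its spatial connectivity constraint is not reproduced, and the helper names, the feature vectors and the threshold values are hypothetical. It relies on the identities Tr(W) = sum of squared distances to the cluster centers and Tr(B) = sum over clusters of n_k times the squared distance between cluster center and global center.

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz_two_clusters(cluster_1, cluster_2):
    """I_CH = Tr(B) / Tr(W) * (N - 2) for a partition into two clusters.

    Each point is the feature vector of a voxel (here, its label scores).
    Assumes the within-cluster dispersion Tr(W) is not zero.
    """
    all_points = cluster_1 + cluster_2
    c = centroid(all_points)
    tr_w = tr_b = 0.0
    for cluster in (cluster_1, cluster_2):
        c_k = centroid(cluster)
        tr_w += sum(sq_dist(x, c_k) for x in cluster)
        tr_b += len(cluster) * sq_dist(c_k, c)
    return tr_b / tr_w * (len(all_points) - 2)


def should_split(cluster_1, cluster_2, threshold):
    """Accept the split only if the index exceeds the chosen threshold."""
    return calinski_harabasz_two_clusters(cluster_1, cluster_2) > threshold


index = calinski_harabasz_two_clusters([[0.0, 0.0], [0.0, 2.0]],
                                       [[10.0, 0.0], [10.0, 2.0]])
```

In a recursive re-division, the same test would be applied again to each of the two clusters obtained, until no candidate split passes the threshold.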
3.4. Performance evaluation of labeling models
As in Perrot et al. (2011), two measures were used to compare the different models proposed above: $E_{local}$ at the sulcus scale and $E_{SI}$ at the subject scale. Error rates were assessed by 10-fold cross-validation. One model was trained per hemisphere.
3.4.1. Mean/max error rates
To take into account the variability of the fragmentation into elementary folds and therefore the robustness of the labeling methods to this variability, each image was re-segmented ten times (see Fig. 20 in supplementary material). Thus, if the image belonged to the training set, only the segmentation used for manual labeling was considered. However, if the image belonged to the test set, ten other segmentations (whose true labels have been transferred from the manual segmentation) were labeled and used to quantify the error rates. Note that the manual segmentation was not used to calculate error rates. Using ten different segmentations for each sulcus highlights the weaknesses of the BrainVISA/Morphologist preprocessing since we can compute errors from the worst result, typically associated with an issue of under-segmentation.
To quantify errors, for each new segmentation, the manual labeling on the initial segmentation must be transferred to the new one. Because of the variability of the segmentations obtained and the sparsity of the fold skeleton, the simple superposition of images was insufficient. Skeleton voxels that do not overlap with those of the initial segmentation were given the label of the nearest skeleton voxel of the initial segmentation. To do this, a Voronoi diagram of the manual labeling is computed. Note that the elementary folds were not used to transfer the labeling and that the true labeling was defined at the voxel scale.
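The label transfer amounts to a nearest-neighbor lookup, which is what a Voronoi diagram of the labeled voxels encodes. A minimal brute-force sketch, with hypothetical names and toy coordinates (an actual implementation would more likely rely on a distance transform or a spatial index):

```python
def transfer_labels(initial_voxels, initial_labels, new_voxels):
    """Give each voxel of the new skeleton the label of the nearest voxel of
    the manually labeled skeleton. Overlapping voxels are at distance zero and
    therefore keep their own label; ties go to the first voxel in the list."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    transferred = []
    for voxel in new_voxels:
        nearest = min(range(len(initial_voxels)),
                      key=lambda i: sq_dist(initial_voxels[i], voxel))
        transferred.append(initial_labels[nearest])
    return transferred


labels = transfer_labels(
    initial_voxels=[(0, 0, 0), (10, 0, 0)],
    initial_labels=["A", "B"],
    new_voxels=[(0, 0, 0), (1, 0, 0), (8, 1, 0)],
)
```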
For each subject, from the ten segmentations, the average of the errors ($E_{SI}^{mean}$ and $E_{local}^{mean}$) and the maximum error ($E_{SI}^{max}$ and $E_{local}^{max}$) were calculated. Note that the training segmentation used for manual labeling was not used in the error calculation because it would bias our evaluation. By considering the maximum error rates, labeling errors due to model variability were highlighted. In most models, these errors were related to an incorrect fragmentation of the fold skeleton into elementary folds. Only the PMAS labeling model is not deterministic: it includes stochastic optimizations that can penalize the calculation of maximum error rates.
3.4.2. Error at the sulcus scale: Elocal
Given a sulcus $l$,

$$E_{local}(l) = \frac{FP_l + FN_l}{FP_l + FN_l + TP_l} \qquad (8)$$

with $TP_l$, $FP_l$ and $FN_l$ respectively the number of true positive, false positive and false negative voxels for the sulcus $l$.
It is important to note that the error rate is one when the sulcus is absent but labeled by the model, and likewise when the sulcus is present but not labeled by the model. As small sulci are frequently absent, this explains why error rates can be highly variable when averaging the error rates per subject.
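Eq. (8) and the two limit cases mentioned above can be checked with a short sketch operating on per-voxel label lists. The function name and the label names are placeholders; returning `None` when the sulcus is neither present nor predicted is a choice made here, since the text does not define that case.

```python
def e_local(true_labels, pred_labels, sulcus):
    """Eq. (8): (FP + FN) / (FP + FN + TP) for one sulcus, at the voxel scale."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        if pred == sulcus and truth == sulcus:
            tp += 1
        elif pred == sulcus:
            fp += 1
        elif truth == sulcus:
            fn += 1
    denom = fp + fn + tp
    return (fp + fn) / denom if denom else None


truth = ["A", "A", "B", "B"]
pred = ["A", "B", "B", "B"]
err_a = e_local(truth, pred, "A")                       # one of two voxels missed
absent_but_labeled = e_local(["B", "B"], ["A", "B"], "A")
present_but_missed = e_local(["A", "B"], ["B", "B"], "A")
```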
3.4.3. Error at the subject scale: ESI
Given a set of sulci $L$,

$$E_{SI} = \sum_{l \in L} w_l \times \frac{FP_l + FN_l}{FP_l + FN_l + 2 \times TP_l} \qquad (9)$$

with $w_l = s_l / \sum_{l' \in L} s_{l'}$ and $s_l = FN_l + TP_l$ the true size of the sulcus $l$.

The error at the subject scale allows local errors to be aggregated into a single measurement. As explained in Perrot et al. (2011), each component of the sum over labels differs on two points compared to $E_{local}(l)$. First, true positives are counted twice as compared to the false positives and false negatives, in order to remove errors shared by several labels, since each extra sulcal piece for a given label is a missing part for another label. Second, each component was weighted according to the true size of the sulcus so that each local component counts as much as its size.
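A minimal sketch of Eq. (9), again on per-voxel label lists. The function name and the label names are placeholders, and the sketch assumes that at least one sulcus of the set is present in the true labeling so that the weights are defined.

```python
def e_si(true_labels, pred_labels, sulci):
    """Eq. (9): size-weighted sum of (FP + FN) / (FP + FN + 2 TP) over sulci."""
    sizes, terms = {}, {}
    for sulcus in sulci:
        tp = fp = fn = 0
        for truth, pred in zip(true_labels, pred_labels):
            if pred == sulcus and truth == sulcus:
                tp += 1
            elif pred == sulcus:
                fp += 1
            elif truth == sulcus:
                fn += 1
        sizes[sulcus] = fn + tp  # true size s_l of the sulcus
        denom = fp + fn + 2 * tp
        terms[sulcus] = (fp + fn) / denom if denom else 0.0
    total_size = sum(sizes.values())
    # A sulcus absent from the true labeling has s_l = 0 and thus zero weight.
    return sum(sizes[s] / total_size * terms[s] for s in sulci)


# One voxel of "A" mislabeled as "B": it is a false negative for "A"
# and a false positive for "B", hence the factor 2 on true positives.
subject_error = e_si(["A", "A", "B", "B"], ["A", "B", "B", "B"], ["A", "B"])
```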
Compared to Perrot et al. (2011), three labels were not included in the set of sulci (unknown and both ventricles). These labels were not considered as sulcus labels, but correspond to other structures, not pertinent to our study. Thus, the scores presented here for the SPAM method are worse than those presented in Perrot et al. (2011) for four reasons. First, because removing the two labels considerably improved the scores. Second, because we cut the elementary folds during manual labeling while the SPAM model cannot automatically correct this kind of sub-segmentation error. Third, because we are interested in the mean/max of the error rates. Finally, because the error rates are estimated by 10-fold cross-validation and not by leave-one-out cross-validation. Moreover, the addition of the four new sulci labels and our refined labeling of the training dataset may also have impacted the results.

Fig. 13. Comparison of $E_{SI}$ error rates by model. Once the 10 segmentations have been labeled by hemisphere, we consider the average error in the upper chart and the maximum error in the lower chart. The box extends from the lower to upper quartile error values, with a line at the median. The whiskers extend from the box to show the minimum and maximum limits of the error rates. The SPAM model is represented in red, the PMAS model in blue, the HPMAS model in green, the PCNN model in yellow and the UNET model in purple. For the four new models, three modalities are represented: first, labeling at the voxel scale, then labeling after regularization at the elementary fold scale (+ reg.), and finally the labeling obtained after automatic re-division of the elementary folds (+ reg. + cut.). The models are compared by Wilcoxon signed-rank test. The p-values of the differences in model performances are written above and below the compared models. The p-value is written in black if it is less than 0.05 and in red otherwise. Regularization by elementary folds significantly improves results. Automatic fold re-division also significantly improves results. All regularized models are significantly better than the SPAM model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
3.4.4. Error rate comparison
During the 10-fold cross-validation, each fold contained approximately 6 labeled hemispheres used to test the model's performance. Error rates are calculated by hemisphere and then averaged over the entire database to obtain the mean error rates per model. When not specified, the average error rate includes the right and left hemispheres. In order to compare the models in pairs, a Wilcoxon signed-rank test was performed between the error rate lists for each hemisphere. If the p-value was less than 0.05, the error rates were considered significantly different.
4. Results
4.1. Which is the best model?
In order to compare the five models presented above, we were interested in the $E_{SI}^{mean}$ and $E_{SI}^{max}$ of each model, trained separately on each hemisphere (Fig. 13). Please refer to the supplementary materials for the numerical values of the error rates per hemisphere (Table 1).
First, we observed that all of the new approaches proposed with regularization per elementary fold were significantly better than the SPAM approach (also based on this regularization), which suggests that a model based on an average template was not the most appropriate to represent the high variability of cortical folds.
Second, with regard to the four proposed methods, regularization by elementary folds of the label score maps significantly improved the results compared to voxel labeling. Most importantly, the automatic re-division of these elementary folds also significantly improved the four methods. Thus, the use of top-down refinement of bottom-up regularization is particularly relevant in this paper.
Third, when comparing the new models in pairs, the models seem to demonstrate equivalent performance.
Concerning the PCNN and UNET models, this paper consequently demonstrates the efficiency of neural networks, even for the recognition of structures as variable as cortical folds. However, it is surprising that the UNET model was not better than the PCNN model, given its deeper architecture.
The fact that these four models do not stand out radically from one another on this dataset suggests that these models may have reached the limit of what can be learned from this database, probably due to its insufficient size to represent the high variability of cortical folds. Moreover, the fold variability is such that manual labeling of a brain raises many questions, and it is possible that the models have reached human-level performance. Unfortunately, since manual labeling is based on consensus among several experts, it is impossible for us to assess human-level performance on this database.
Finally, with regard to the computation time required to label a hemisphere, the SPAM model takes about 5 min, while the UNET model takes about 20 s, PCNN takes slightly more than a minute, and PMAS and HPMAS take several hours. Although the PMAS model could be much faster by optimizing the code as in Giraud et al. (2016), the UNET model is currently by far the fastest. Thus, since the UNET model has the lowest error rates and is the fastest, we propose to study in more detail the differences between this model and the SPAM model in the following section. In the rest of this study, the UNET model will therefore refer to the model with regularization using elementary folds and automatic re-division of these, if necessary.
4.2. Which sulci are better recognized?
Concerning $E_{local}^{mean}$ and $E_{local}^{max}$, the SPAM model has average/max error rates from 5% to 77% while the error rates of the UNET model vary between 2% and 68%. Comparing the $E_{local}^{max}$ of each sulcus (Fig. 14), we can see that the difference between the error rates of both models for a given sulcus reaches up to 25%. Finally, almost all sulci were better recognized by the UNET model; only about twenty sulci are less well recognized. Their comparison with the Wilcoxon signed-rank test, controlling the false discovery rate with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), showed that around 13% of sulci were significantly better recognized by UNET than SPAM, while none were significantly less well recognized. In the figure, we can also see that the sulci with the highest labeling error rates using the UNET model are also the smallest. This is probably due to the fact that small sulci are generally also the most variable and were already less well recognized by the SPAM model. Please refer to the supplementary material for exact values of sulcus error rates (Tables 2 and 3). In order to visualize the location of the sulci better recognized than before, Figs. 15 and 16 give graphical comparisons of sulcus error rates between SPAM and UNET labeling. In Fig. 16, it can be seen that the differences in performance between the SPAM and UNET models are not spatially uniform. This may be due to the fact that some regions have more variable fold patterns than others and the recognition of their sulci was more severely penalized by the use of a mono-template approach. We also noted that the sulci best recognized by the UNET model are also those that were most impacted by sub-segmentation errors in elementary folds.
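For readers unfamiliar with the Benjamini-Hochberg procedure used to control the false discovery rate across sulci, a minimal sketch is given below. The function name and the p-values are made up for illustration; they are not results from this study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test, True if the null hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank k such that p_(k) <= (k / m) * alpha
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Three of these four p-values are below 0.05, but only one survives correction.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

This mirrors the distinction made in Fig. 14 between differences that are significant at the 0.05 level and those that remain significant after correction.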
In the next section, we focus on the impact of the significant improvement in central sulcus recognition, in which the $E_{local}^{max}$ value has gone from about 8% using the SPAM model to only 3% with the best UNET model.
4.3. Experiment on an external database demonstrating the clinical advantage
Here, the SPAM model and the UNET model were trained on the entire manually labeled database. The hyperparameters of the UNET model were estimated over the entire database, using the same procedures as during inner cross-validation, i.e. by performing a 3-fold cross-validation to select the hyperparameter values that minimize error rates. The database used by Sun et al. (2012) to study the effect of handedness on the shape of the central sulcus was labeled manually and automatically by these two models. This database contains 23 consistent age and sex matched natural dextrals (mean age 34, range 22-59 years; 17 males, 6 females) and 18 similar natural sinistrals (mean age 36, range 25-56 years; 12 males, 6 females). The database used in Sun et al. (2012) also contains a group of 34 forced dextrals that is not studied here.
We propose to investigate the asymmetry index $I$ of the central sulcus length along the brain hull between the left ($l_{S.C.\_left}$) and right ($l_{S.C.\_right}$) hemispheres:

$$I = \frac{l_{S.C.\_left} - l_{S.C.\_right}}{l_{S.C.\_left} + l_{S.C.\_right}} \qquad (10)$$
Note that in the nomenclature proposed in this paper, two sulci labels belong to the central sulcus: “S.C.” and “S.C._sylvian.”. Therefore, the lengths of these two “sub-sulci” are added together to obtain $l_{S.C.}$.
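The asymmetry index of Eq. (10), including the summation over the two central sulcus labels, can be sketched as follows. The function name and the lengths are made up for illustration; only the two label names come from the nomenclature above.

```python
CENTRAL_SULCUS_LABELS = ("S.C.", "S.C._sylvian.")


def asymmetry_index(left_lengths, right_lengths):
    """Eq. (10): (l_left - l_right) / (l_left + l_right), where each length is
    the sum of the lengths of the two sub-sulci of the central sulcus."""
    l_left = sum(left_lengths.get(label, 0.0) for label in CENTRAL_SULCUS_LABELS)
    l_right = sum(right_lengths.get(label, 0.0) for label in CENTRAL_SULCUS_LABELS)
    return (l_left - l_right) / (l_left + l_right)


# Hypothetical lengths in mm: a positive index means a longer left central sulcus.
index = asymmetry_index({"S.C.": 90.0, "S.C._sylvian.": 10.0},
                        {"S.C.": 75.0, "S.C._sylvian.": 5.0})
```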
With manual labeling, there is a significant difference between left-handed and right-handed people (Fig. 17): left-handed people have on average a longer central sulcus in the right hemisphere than in the left, and vice versa for right-handed people. However, when focusing on the asymmetry index with SPAM labeling, no significant difference was found, whereas this difference was significant with UNET labeling.
Considering the worst labeling errors (see Fig. 21 in supplementary material) of each model, we observe that the SPAM model can double the size of the central sulcus by labeling completely unrelated large structures, whereas the UNET model only adds small fragments.
5. Discussion
5.1. PMAS
Considering the hyperparameters selected during the inner cross-validation (see Fig. 22 in supplementary material), it seems that this method would benefit from increasing the number of ANNs selected per voxel. Indeed, the number of ANNs is automatically set to 10, which is the upper limit of the values proposed in the inner cross-validation. However, testing a larger number of ANNs would require optimization of the code currently in use and it is very likely that the model would not gain much in performance. Indeed, the evolution of the scores according to the number of ANNs suggests that a plateau is reached and that increasing this hyperparameter would have little influence on the ranking of the methods obtained.
Fig. 14. $E_{local}^{max}$ per sulcus. The graph on the left and the graph on the right present $E_{local}^{max}$ for the sulci on the left hemisphere and on the right hemisphere, respectively. The SPAM model is represented in blue and the UNET (+ reg. + cut.) is represented in pink. The significant differences (p-value < 0.05) are marked with a star. The star is black when the difference is still significant after controlling the false discovery rate through the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). Sulci are sorted from top to bottom, from the smallest to the largest. The average sulci sizes, ranging from about 15 mm³ to more than 2000 mm³ on average per subject, are represented on the black graph.
5.2. HPMAS
With regard to HPMAS, the choice to use sulci pairs to form patches is questionable, since there is no evidence suggesting that two sulci are sufficient to prevent spurious hits, especially when two small sulci are associated. In order to create distinguishable local shapes, patches containing three or more sulci should also be considered. However, it would be too expensive to take into account all combinations of three neighboring sulci, as is done for pairs of sulci. To remedy this, criteria for selecting relevant patch types should be determined, but none of the criteria we tested improved the results sufficiently to be considered here.
5.3. PCNN and UNET
Compared to the approach proposed by Ciresan et al. (2012), the PCNN approach has a major difference. In Ciresan et al. (2012), several patch sizes, processed by several neural networks in parallel, were used to label each pixel, whereas our PCNN approach is based on only one patch size. Moreover, the neural network used for PCNN is not deep (only one hidden layer) compared to Ciresan et al. (2012). However, after trying to make the network architecture more complex by increasing the number of hidden layers or using multiple patch sizes, we did not observe significant improvements in the results. It is worth noting that the PCNN model achieves performance comparable to the UNET model although the U-Net architecture is much deeper and previous studies suggest that it should achieve better results (Ronneberger et al., 2015).

Fig. 15. $E_{local}$ error rate per sulcus for SPAM and UNET models. The UNET model corresponds to the one after re-division of the elementary folds. Once the 10 segmentations have been labeled by hemisphere, we consider the average error per sulcus in the left column and the maximum error in the right column. The external and internal sides are represented for each of the right and left hemispheres.

Fig. 16. Comparison of $E_{local}^{max}$ error rates between the SPAM model and the UNET model. The left column represents the difference between the $E_{local}^{max}$ of the SPAM model and of the UNET model. The right column shows the p-value of the Wilcoxon test between each model. Note that the scale of the color palette used to represent p-values is logarithmic. In order to visualize the sulci significantly better recognized, the threshold 0.05 is indicated and the threshold at the star corresponds to the first sulci considered significantly better by controlling the false discovery rate through the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
5.4. Unknown label
In this paper, except for the HPMAS model, the “unknown” label in the manually labeled database is treated like the other sulci labels. However, although the “unknown” label represents about 0.5% of the skeleton voxels of manual labeling, this proportion is null if we consider the labels of the HPMAS model. Moreover, the PMAS and PCNN models label around 0.02% of voxels as “unknown” and the SPAM and UNET models 0.04%. These figures show that treating the “unknown” label like the other labels is insufficient. Models should also assign the “unknown” label to structures for which they are not sufficiently confident.
However, since all the new methods were compared to the SPAM model, which treats the unknown label like the sulci labels, we chose not to address this point in this paper.

Fig. 17. Comparison of the asymmetry index I between right-handed and left-handed people. The left/middle/right graphs respectively show the results obtained with manual/SPAM/UNET labeling. The index for right-handed people is represented in blue and the one for left-handed people in green. The p-values of the t-test for the means of these two independent samples of scores are indicated on the graphs. With manual labeling, there is a significant difference: in left-handed people, the central sulcus is longer in the right hemisphere than in the left, while this is the opposite in right-handed people. The same significant difference is observed with the UNET model labeling but not with the SPAM model labeling. The box extends from the lower to upper quartile index I values, with a line at the median. The whiskers extend from the box to show the minimum and maximum values.
6. Conclusion
To summarize, the new methods presented in this paper outperform the current SPAM model provided by the Morphologist toolbox of BrainVISA. Compared to the SPAM model, the best models have a 4% higher recognition rate and 15% of sulci are significantly better recognized. By automatically re-dividing the elementary folds, the new models are considerably more robust to under-segmentation errors. In practice, these improvements make it possible to reproduce findings that were previously only possible with manual labeling. The UNET model will soon be available in the BrainVISA/Morphologist toolbox.
In this paper, the methods based on MAS or CNNs give approximately the same results for the automatic recognition of cortical sulci. However, although CNN-based methods have a particularly long training process compared to MAS-based methods, they are significantly faster when labeling a new brain. Therefore, CNN-based methods are far more productive in practice. The UNET method labels a brain in only twenty seconds, whereas the SPAM method takes about ten minutes. It is interesting to note that patch MAS approaches are also beginning to integrate deep learning techniques (Manjón et al., 2018), probably due to their ability to effectively summarize the data and their rapidity of execution.
Furthermore, the top-down refinement of bottom-up regularization significantly improves the results. Indeed, voxel-wise labeling is used to give a top-down perspective to a traditional bottom-up pattern recognition process that agglomerates the voxels into elementary folds: these folds can therefore be automatically re-divided when necessary. Thus, the labeling is robust to under-segmentation errors, unlike the SPAM method, which does not provide voxel-wise labeling. Note that despite the definition of elementary folds specific to the problem posed here, defining a coherent geometric entity is a legitimate concern addressed in many segmentation problems, for example by using super-pixels (Giraud et al., 2017; Soltaninejad et al., 2017) that group the most similar connected pixels together so that they have the same label.
In order to improve the current performance of the model, several options remain to be considered. First, the inputs currently contain the fold skeleton in order to normalize the data for acquisition and age biases. However, the input can be enriched by taking into account grey/white matter segmentation or directly the normalized MRI. For instance, we could consider integrating this data into new input channels for CNN-based approaches. Finally, in order to take advantage of the large unlabeled databases currently available, a semi-supervised strategy would be particularly attractive to better represent the variability of the cortical folds.
In the near future, considering that the labeling model seems sufficiently reliable to us, we would like to reconsider the number of sulci used in the nomenclature on the basis of the sulci most often confused by the model. Indeed, the error rates of some small sulci are still too high to be used in morphological studies. By allowing users to choose the level of granularity of the nomenclature, they will be able to use a sufficiently stable labeling of the structures of interest to them.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 785907 (HBP SGA2), No. 720270 (HBP SGA1) and No. 604102 (HBP’s ramp-up phase), and from the FR-MDIC20161236445.
Supplementary material
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.media.2020.101651.
References
Auzias, G., Colliot, O., Glaunes, J.A., Perrot, M., Mangin, J.-F., Trouve, A., Baillet, S., 2011. Diffeomorphic brain registration under exhaustive sulcal constraints. IEEE Trans. Med. Imaging 30 (6), 1214–1227.
Auzias, G., Lefevre, J., Le Troter, A., Fischer, C., Perrot, M., Régis, J., Coulon, O., 2013. Model-driven harmonic parameterization of the cortical surface: hip-hop. IEEE Trans. Med. Imaging 32 (5), 873–887.
Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B., 2009. Patchmatch: a randomized correspondence algorithm for structural image editing. In: ACM Transactions on Graphics (ToG), vol. 28. ACM, p. 24.