model architecture for classiﬁcation of histology sections When machine vision meets histology: A comparative evaluation of Medical Image Analysis

(1)

ContentslistsavailableatScienceDirect

Medical Image Analysis

journalhomepage:www.elsevier.com/locate/media

When machine vision meets histology: A comparative evaluation of model architecture for classiﬁcation of histology sections ^R

Cheng Zhong

^a

, Ju Han

^a

, Alexander Borowsky

^c

, Bahram Parvin

^b

, Yunfu Wang

^a^,^d^,^∗

, Hang Chang

^a^,^∗

aLawrence Berkeley National Laboratory, Berkeley CA USA

bDepartment of Electrical and Biomedical Engineering, University of Nevada, Reno, NV USA

cCenter for Comparative Medicine, University of California, Davis,CA, USA

dDepartment of Neurology, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei, China

a rt i c l e i nf o

Article history:

Received 25 February 2016 Revised 12 August 2016 Accepted 26 August 2016 Available online 9 September 2016 Keywords:

Computational histopathology Classiﬁcation

Unsupervised feature learning Sparse feature encoder

a b s t ra c t

Classification ofhistologysections inlarge cohorts,intermsofdistinct regionsofmicroanatomy(e.g., stromal)andhistopathology(e.g.,tumor,necrosis),enablesthequantificationoftumorcomposition,and theconstructionofpredictivemodelsofgenomicsandclinicaloutcome.Totacklethelargetechnicalvari- ationsandbiologicalheterogeneities,whichareintrinsicinlargecohorts,emergingsystemsutilizeeither priorknowledgefrompathologistsorunsupervisedfeaturelearningforinvariant representationofthe underlyingpropertiesinthedata.However,toalargedegree,thearchitecturefortissuehistologyclassi- ficationremainsunexploredandrequiresurgentsystematicalinvestigation.Thispaperisthefirstattempt toprovideinsightsintothreefundamentalquestionsintissuehistologyclassification:I.Isunsupervised featurelearningpreferable tohumanengineeredfeatures?II.Does cellularsaliencyhelp?III.Doesthe sparsefeatureencodercontributetorecognition?Weshowthat(a)inI,bothCellularMorphometricFea- ture andfeaturesfromunsupervisedfeaturelearningleadtosuperiorperformance whencomparedto SIFT and[Color,Texture];(b)inII,cellularsaliencyincorporationimpairstheperformance forsystems builtuponpixel-/patch-levelfeatures;and(c)inIII,theeffectofthesparsefeatureencoderiscorrelated withtherobustnessoffeatures,andtheperformancecanbeconsistentlyimprovedbythemulti-stageex- tensionofsystemsbuiltuponbothCellularMorphmetricFeatureandfeaturesfromunsupervisedfeature learning. TheseinsightsarevalidatedwithtwocohortsofGlioblastomaMultiforme(GBM)and Kidney ClearCellCarcinoma(KIRC).

1. Introduction

Although molecular characterization of tumors through gene expressionanalysishasbecomeastandardizedtechnique,bulktu- morgene expression data provide only an average genome-wide measurementforabiopsyandfailtorevealinherentcellularcom- positionandheterogeneity ofatumor.Ontheother hand,histol- ogysectionsprovidewealthofinformationaboutthetissuearchi- tecturethat containsmultiple celltypesatdifferentstatesofcell cycles.Thesesectionsareoftenstainedwithhematoxylinandeosin (H&E)stains,which labelDNA(e.g.,nuclei) andproteincontents, respectively,invariousshadesofcolor.Furthermore,morphomet- ricabberationsintumorarchitectureoftenleadtodiseaseprogres-

R Related resources have been released for public consumption at http://bmihub.org .

∗ Corresponding authors.

E-mail address: [email protected] (H. Chang).

sion,anditisthereforedesirabletoquantifytumorarchitectureas well asthe corresponding morphometricabberationsin largeco- hortsfortheconstructionofpredictivemodelsofendpoints,e.g., clinicaloutcome,whichhavethepotentialforimproveddiagnosis andtherapy.

Despitetheeffortsbysome researchersonreducinginter-and intra-pathologist variations (Dalton et al., 2000) during manual analysis,thisapproachisnotascalablesolution,andthereforeim- pedes theeffective representationandrecognition fromlargeco- horts for scientific discoveries. With its value resting on captur- ingdetailedmorphometricsignaturesandorganization,automatic quantitative analysis of a large collection of histological data is highly desirable, and is unfortunately impaired by a number of barriersmostlyoriginatingfromthetechnicalvariations(e.g.,fix- ation, staining) and biological heterogeneities (e.g.,cell type, cell state)always presentedinthedata.Specifically,ahistological tis- suesection refers toan image ofa thinsliceoftissueapplied to amicroscopicslideandscannedfromalightmicroscope,andthe http://dx.doi.org/10.1016/j.media.2016.08.010

(2)

technical variationsand biological heterogeneitieslead tosigniﬁ- cantcolorvariationsbothwithinandacrosstissuesections.Forex- ample,withinthesametissuesection,nuclearsignal(color)varies from light blue to darkblue due to the variationsof their chro- matincontent;andnuclearintensityinonetissuesectionmaybe very close to the backgroundintensity (e.g., cytoplasmic, macro- molecularcomponents)inanothertissuesection.

It is also worth to mentionthat alternativestaining (e.g., ﬂu- orescence)andmicroscopymethods(multi-spectralimaging)have beenproposedandstudiedinordertoovercomethefundamental limitations/challengesintissuehistology(Stacketal.,2014;Leven- son etal., 2015;Rimm,2014;Huang etal.,2013;Ghaznavi etal., 2013);however,H&Estainedtissuesectionsarestillthegoldstan- dardfortheassessmentoftissueneoplasm. Furthermore,theeﬃ- cientandeffectiverepresentationandinterpretationofH&Etissue histology sections in large cohorts (e.g., The Cancer Genome At- lasdataset)havethepotential toprovidepredictivemodelsofge- nomicsandclinicaloutcome,andarethereforeurgentlyrequired.

Althoughmany techniqueshavebeendesignedanddeveloped for tissue histology classification (see Section 2), the architecture fortissue histologyclassification remains largely unexplored and requires urgent systematical investigation. To fulfil thisgoal, our paper provides insights to three fundamental questions in tissue histology classification: I. Is unsupervised feature learning preferable to human engineered features? II. Does cellular prior knowledge help? III. Does the sparse feature encoder contribute to recognition? The novelty of our work resides in three folds:

(i) architecture design: we have systematically experimented the system architecture with various combinations of feature types, feature extraction strategies and intermediate layers based on sparsity/locality-constrained feature encoders, which ensures the extensive evaluation and detailed insights on impact of the key componentsduringthearchitectureconstruction;(ii)experimental design: ourexperimental evaluationhas beenperformedthrough cross-validationon twoindependent datasetswithdistinct tumor types, where both datasets have been curated by our patholo- gisttoprovideexamplesofdistinctregionsofmicroanatomy(e.g., stromal) andhistopathology (e.g., tumor,necrosis) withsufficient amount of technical variations and biological heterogeneities, so thatthearchitecturecanbe faithfullytestedandvalidatedagainst importanttopicsinhistopathology(seeSection4fordetails).More importantly, such an experimental design (combination of cross- validation andvalidationonindependent datasests),tothe maxi- mumextent,ensurestheconsistencyandunbiasednessofourfind- ings;and(iii)outcome: themajoroutcome ofourwork arewell- justified insights in the architecture design/construction. Specifi- cally,wesuggestthatthesparsefeatureencodersbasedonCellu- larMorphometricFeature andfeatures fromunsupervisedfeature learningprovidethebestconfigurationsfortissuehistologyclassi- fication.Furthermore,theseinsightsalsoledtotheconstructionof a highlyscalable andeffective system(CMF-PredictiveSFE-KSPM, seeSection4fordetails)fortissuehistologyclassification.Finally, webelievethatourworkwillnotonlybenefittheresearchincom- putationalhistopathology,butwill alsobenefitthecommunityof medicalimageanalysisatlargebysheddinglightsonthesystem- aticalstudyofotherimportanttopics.

Organization ofthispaperis asfollows:Section 2reviewsre- latedworks. Section3 describesvarious componentsfor thesys- temarchitectureduringevaluation.Section4elaboratesthedetails ofourexperimentalsetup,followedbyadetaileddiscussiononthe experimentalresults.Lastly,Section5concludesthepaper.

2. Relatedwork

Current work on histology section analysis is typically foru- mulatedandperformedatmultiple scalesforvarious endpoints,

andseveraloutstandingreviewscanbefoundinDemirandYener (2009); Gurcan et al. (2009). From our perspective, the trends are: (i) nuclear segmentation and organization for tumor grad- ingand/orthepredictionoftumorrecurrence(Basavanhallyetal., 2009; Doyle et al., 2011). (ii) patch level analysis(e.g., small regions)(Bhagavatulaetal.,2010;Kongetal.,2010),usingcolorand texturefeatures, fortumor representation.and(iii)detectionand representationof theauto-immune responseasa prognostic tool forcancer(Fatakdawalaetal.,2010).

Whileourfocusisontheclassificationofhistologysectionsin large cohorts, in terms of distinct regions of microanatomy (e.g., stromal)andhistopathology(e.g.,tumor,necrosis),themajorchal- lengeresidesinthelarge amountsoftechnical variationsandbi- ological heterogeneities in the data (Kothari et al., 2012), which typicallyleadstotechniques thatare tumortype specific oreven laboratoryspecific.Themajoreffortsaddressingthisissuefallinto twodistinctcategories:(i) fine-tuninghumanengineeredfeatures (Bhagavatula et al., 2010; Kong et al., 2010; Kothari etal., 2012;

Chang etal., 2013a);and(ii) applyingautomatic feature learning (Huang et al., 2011; Chang et al., 2013c) for robust representation.Specifically,theauthorsinBhagavatulaetal.(2010)designed multi-scale image features to mimic the visual cues that experts utilizedforthe automatic identification anddelineation ofgerm- layercomponents inH&Estained tissue histologysectionsofter- atomas derived from human and nonhuman primate embryonic stem cells;the authors inKong et al. (2010) integrated multiple texturefeatures (e.g., waveletfeatures) into a texture-basedcon- tentretrievalframeworkfortheidentificationoftissueregionsthat informdiagnosis; the work in Kothari et al. (2012) utilized various features (e.g., color, texture and shape) for the study of visual morphometric patterns across tissue histology sections; and thework inChangetal.(2013a)constructedthecellularmorpho- metric context based on various cellular morphometric features foreffectiverepresentationandclassificationofdistinct regionsof microanatomyandhistopathology.Althoughmanysuccessful systems havebeen designed anddeveloped, based on human engineeredfeatures,forvarioustasksincomputationalhistopathology, the generality/applicability of such systems to different tasks or to different cohorts can sometimes be limited, as a result, sys- temsbasedonunsupervisedfeaturelearninghavebeenbuiltwith demonstratedadvantagesespeciallyforthestudyoflargecohorts, amongwhich,boththeauthors inHuangetal.(2011)andChang etal. (2013c) utilized sparse coding techniques forunsupervised charactorizationoftissuemorphometricpatterns.

Furthermore, tissue histology classification can be considered asaspecific applicationofimagecategorization inthecontext of computer vision research, where spatial pyramid matching(SPM) (Lazebniketal.,2006)hasclearlybecomethemajorcomponentof thestate-of-artsystems(Everinghametal.,2012)foritseffective- ness in practice. Meanwhile, sparsity/locality-constrained feature encoders,throughdictionarylearning,havealsobeenwidelystud- ied,and theimprovement inclassification performance has been confirmedinvarious applications (Yang etal., 2009; Wangetal., 2010;Changetal.,2013a).

The evolution of our research on the classiﬁcation of histology sections contains several stages: (i) kernel-based classiﬁca- tionbuilt-uponhumanengineeredfeature(e.g.,SIFTfeatures)(Han etal., 2011); (ii) independent subspaceanalysis forunsupervised discovery of morphometric signatures without the constraint of beingableto reconstructtheoriginal signal (Le etal., 2012);(iii) singlelayerpredictivesparsedecompositionforunsuperviseddis- covery of morphometric signatures with the constraint of being able to reconstruct the original signal (Nayak et al., 2013); (iv) combination of either prior knowledge (Chang et al., 2013a) or predictivesparsedecomposition(Changetal., 2013c)withspatial pyramid matching; and(v) morerecently, stacking multiple pre-

(3)

Table 1

Annotation of abbreviations in the paper, where FE stands for feature extraction; SFE stands for sparse feature encoding; and SPM stands for spatial pyramid matching. Here, we also provide the dimension information about (i) original features (outcome of FE); (ii) sparse codes (outcome of SFE); (iii) final representation (outcome of final spatial pooling, i.e., SPM); and (iv) final prediction (outcome of architectures as a one-dimensional class label.).

Category Abbreviation Description Dimension

FE CMF Cellular Morphometric Feature 15

DSIFT Dense SIFT 128

SSIFT Salient SIFT 128

DCT Dense [Color,Texture] 203

SCT Salient [Color,Texture] 203

DPSD Dense PSD 1024

SPSD Salient PSD 1024

SFE SC Sparse Coding 1024

GSC Graph Regularized Sparse Coding 1024

LLC Locality-Constraint Linear Coding 1024 LCDL Locality-Constraint Dictionary Learning 1024

SPM KSPM Kernal SPM (256, 512, 1024)

LSPM Linear SPM (256, 512, 1024)

Architecture FE-KSPM Architectures without the sparse feature encoder 1 FE-SFE-LSPM Architectures with the sparse feature encoder 1

Table 2

Annotation of important terms used in this paper.

Term Description

Human Engineered Features Refers to features that are pre-determined by human experts, with manually ﬁxed ﬁlters/kernels/templates during extraction.

Cellular Prior Knowledge Refers to the morphometric information, in terms of shape, intensity, etc., that are extracted from each individual cell/nucleus

Cellular Saliency Refers to perceptually salient regions

corresponding to cells/nulei in tissue histology sections.

Multi-Stage System Speciﬁcally refers to the architectures with multiple stacked feature extraction/abstratcion layers.

Single-Stage System Speciﬁcally refers to the architectures with a single feature extraction layer.

dictivesparse coding modules into deep hierarchy (Chang et al., 2013d). And this paperbuilds onour longstandingexpertise and experiencestoprovide(i)extensiveevaluationonthemodelarchi- tecturefortheclassiﬁcationofhistologysections;and(ii)insights onseveralfundamentalquestionsfortheclassiﬁcationofhistology sections,which,hopefully, willshed lightsontheanalysisofhis- tology sectionsinlarge cohorts towards the ultimategoalof im- provedtherapyandtreatment.

3. Modelarchitecture

Toensuretheextensiveevaluationanddetailedinsightsonim- pactofthekey components duringthearchitecture construction, wehavesystematically experimentedthemodelarchitecture with variouscombinationsoffeaturetypes,featureextractionstrategies andintermediatelayersbasedonsparsity/locality-constrainedfeatureencoders. Andthissection describeshowwebuiltthetissue classiﬁcationarchitectureforevaluation.Tables1and2summarize theaberrationsandimportantterms,respectively,anddetailedde- scriptionsarelistedinthesectionsasfollows,

3.1.Featureextractionmodules(FE)

The majorbarrierintissuehistologyclassiﬁcation,inlargeco- horts,stemsfromthelargetechnicalvariationsandbiologicalhet- erogeneities,whichrequiresthe featurerepresentationto capture theintrinsicpropertiesinthedata.Inthiswork,wehaveevaluated threedifferentfeaturesfromtwodifferentcategories(i.e.,human- engineeredfeatureandunsupervisedfeaturelearning).Detailsare asfollows,

Cellular Morphometric Feature - CMF: The cellular morphometric features are human-engineered biological meaningful cellular-level features, which are extracted based on segmented nuclearregions overthe inputimage. Ithasbeenrecentlyshown that tissue classificationsystems basedon CMF areinsensitive to segmentationstrategies(Changetal.,2013a).Inthiswork,weem- ploythe segmentation strategy proposed inChang et al.(2013b), andsimply usethe same setof features asdescribedin Table3. Itisworthtomentionthatalthoughgenericcellularfeatures,e.g., Zernikemonments(Apostolopoulosetal.,2011;Asadietal.,2006), havebeensuccessfullyappliedinvariousbiomedicalapplications, wechoosetouseCMFdueto(i)itsdemonstratedpowerintissue histology classification (Chang et al., 2013a); and(ii) the limited impactbyincludingthosegenericcellularfeatures onboth evaluation andunderstandingofthe benefitsintroduced bythe sparse featureencoders.

Dense SIFT - DSIFT: The dense SIFT features are human- engineered features, which are extracted from regularly-spaced patches overthe inputimage, withtheﬁxed patch-size(16 × 16 pixels)andstep-size(8pixels).

Salient SIFT - SSIFT: The salient SIFT features are human- engineeredfeatures,whichareextractedfrompatchescenteredat segmentednuclearcenters(Changetal.,2013b)overtheinputimage,withaﬁxedpatch-size(16×16pixels).

Dense [Color,Texture] - DCT: The dense [Color,Texture] features are human engineered features, and formed as a concatenation of texture and mean color with the ﬁxed patch-size (20

× 20 pixels) and step-size (20 pixels), where color features are extracted in the RGBcolor space, and texturefeatures (in terms ofmeanandvariation ofﬁlterresponses) areextractedvia steer-

(4)

Table 3

Cellular morphometric features, where the curvature values were computed with σ= 2 . 0 , and the nuclear background region is deﬁned to be the region outside the nuclear region, but inside the bounding box of nuclear boundary.

Feature Description

Nuclear Size #pixels of a segmented nucleus

Nuclear Voronoi Size #pixels of the voronoi region, where the segmented nucleus resides Aspect Ratio Aspect ratio of the segmented nucleus

Major Axis Length of Major axis of the segmented nucleus Minor Axis Length of Minor axis of the segmented nucleus

Rotation Angle between major axis and x axis of the segmented nucleus Bending Energy Mean squared curvature values along nuclear contour

STD Curvature Standard deviation of absolute curvature values along nuclear contour Abs Max Curvature Maximum absolute curvature values along nuclear contour

Mean Nuclear Intensity Mean intensity in nuclear region measured in gray scale

STD Nuclear Intensity Standard deviation of intensity in nuclear region measured in gray scale Mean Background Intensity Mean intensity of nuclear background measured in gray scale

STD Background Intensity Standard deviation of intensity of nuclear background measured in gray scale Mean Nuclear Gradient Mean gradient within nuclear region measured in gray scale

STD Nuclear Gradient Standard deviation of gradient within nuclear region measured in gray scale

Table 4

Properties of various features in evaluation. Note, all human-engineered features are pre-determined and dataset independent; while features from unsupervised feature learning are task/dataset-dependent, and are able to capture task/dataset- speciﬁc information, such as potentially meaningful morphometric patterns in tissue histology.

FE Design Target Biological Information

CMF Human-Engineered Cell (dataset independent) Cellular morphometric information SIFT Human-Engineered Generic (dataset independent) NA

CT Human-Engineered Color and texture patterns (dataset independent) NA

PSD Learned Generic (dataset dependent) Dataset dependent

able ﬁlters (Young andLesperance, 2001) with 8 directions (

θ

∈

{

⁰,^π₈,^π₄,³₈^π,¹₂^π,⁵₈^π,³₄^π,⁷₈^π

}

⁾^and⁵^scales⁽

σ

∈{1,2,3,4,5}) on thegrayscaleimage.

Salient [Color,Texture] - SCT: The salient [Color,Texture] features are human-engineered features, which are extracted on patches centered at segmented nuclear centers (Chang et al., 2013b)overtheinputimage,withaﬁxedpatch-size(20×20pix- els).

Dense PSD -DPSD: The unsupervised features arelearned by predictive sparse decomposition (PSD) on randomly sampledim- age patches following the protocol in Chang et al. (2013c), and thedensePSDfeaturesareextractedfromregularly-spacedpatches over theinput image, withthe ﬁxedpatch-size (20× 20 pixels), step-size(20pixels)andnumberofbasisfunctions(1024).Brieﬂy, givenX=[x₁,...,x_N]∈R^m^×^Nasasetofvectorizedimagepatches, weformulatedthePSDoptimizationproblemas:

Bmin,Z,W

^X⁻^BZ

²F+

λ

^Z

¹⁺

^Z⁻^WX

²F

s.t.

^bi

²2=1,

∀

ⁱ⁼¹^,^.^.^.^,^h ⁽¹⁾

where B=[b1,...,b_h]∈R^m^×h is aset ofthe basisfunctions; Z= [z₁,...,z_N]∈R^h^×^N is the sparse feature matrix; W∈R^h^×^m is the auto-encoder;

λ

^is^theeregularizationconstant.Thegoalofjointly minimizing Eq.(1) with respect to the triple < B, Z, W > is to enforce the inference ofthe regressorWX tobe resemble tothe optimalsparsecodesZthatcanreconstructXoverB(Kavukcuoglu etal.,2008).Inourimplementation,thenumberofbasisfunctions (B) isﬁxedto be1024,

λ

^was^ﬁxed^to^be ^0.3,empirically, forthe bestperformance.

Salient PSD - SPSD: The salient PSD features are extracted on patches centered at segmented nuclear centers (Chang et al., 2013b) over theinput image, withthe ﬁxed patch-size(20 × 20 pixels)andﬁxednumberofbasisfunctions(1024).

The properties of aforementioned features are summarized in Table4.Notethatsalientfeaturesarenot included,giventhefact thattheyonlydifferfromtheircorrespondingdenseversionswith

extrasaliencyinformation.Itisclearthat,differentfromSIFTand CT,whicharegenericfeaturesdesignedforgeneralpurposes,both CMFandPSDcanencodebiologicalmeaningfulinformation,where theformerworksinapre-determinedmannerwhilethelatterhas thepotentialtocapturebiologicalmeaningfulpatternsinanunsu- pervisedfashion.Therefore,within thecontext oftissuehistology classiﬁcation,CMFandPSDhavethepotential toworkbetterdue totheseintrinsicproperties,asshowninourevaluation.

3.2.Sparsefeatureencodingmodules(SFE)

It has been shown recently (Yang et al., 2009; Wang et al., 2010) that the impose ofthe feature encoder through dictionary learning,withsparsityorlocalityconstraint,significantlyimproves theefficacy ofexisting image classification systems.Therationale is thatthesparsefeature encoderfunctionsas an additionalfeature extraction/abstractionoperation,andthusaddsanextralayer(stage) to the feature extraction componentof the system. Therefore, itex- tendstheoriginalsystemwithmultiplefeatureextraction/abstraction stages, which is able to capture intrinsic patterns at the higher- level,as suggested in Jarrett et al. (2009). To study the impact of the sparse feature encoder on tissue histology classification, we adoptthreedifferentsparsity/locality-constrainedfeatureencoders forevaluation. Briefly,let Y=[y₁,...,y_M]∈Râ^×M be a setof features,C=[c₁,...,c_M]∈R^b^×^M be theset ofsparsecodes, andB= [b₁,...,b_b]∈Râ^×^bbeasetofbasisfunctionsforfeatureencoding, thefeatureencodersaresummarizedasfollows,

SparseCoding-(SC):

minB,C

M

i=1

||

^yi−Bc_i

||

²⁺

λ||

^ci

||

¹^; ^s.t.

||

^bi

||

^≤¹^,

∀

ⁱ ⁽²⁾

where||b_i||is aunit 2-normconstraintfor avoidingtrivialsolu- tions,and||c_i||1 isthe1-normenforcingthesparsityofc_i.Inour implementation,the numberof basis functions(B) is ﬁxed tobe 1024,

λ

^is^ﬁxed^to^be^0.15,empirically,forthebestperformance.

(5)

GraphRegularizedSparseCoding-(GSC)(Zhengetal.,2011) minB,C

M

i=1

||

^yi−Bc_i

||

²+

λ||

^ci

||

1+

α

^Tr

(

^CLC^T

)

; s.t.

||

^bi

||

≤1,

∀

ⁱ

(3) where||b_i|| isa unit 2-normconstraintforavoiding trivialsolu- tions,and||c_i||₁isthe1-normenforcingthesparsityofc_i,Tr(·)is thetraceofmatrix·,ListheLaplacianmatrix,andthethirdterm encodestheLaplacianregularizer(BelkinandNiyogi,2003).Please referto Zheng etal. (2011)fordetails ofthe formulation.In our implementation,the numberofbasis functions (B) isﬁxed to be 1024,theregularizationparameters,

λ

^and

α

âre^fixed^to^be¹ând

5,respectively,forthebestperformance.

Locality-ConstraintLinearCoding-(LLC)(Wangetal., 2010):

minB,C

M

i=1

||

^yi−Bc_i

||

²⁺

λ||

^dic_i

||

¹^; ^s.t. ¹^ci=1,

∀

ⁱ ⁽⁴⁾

wheredenotestheelement-wisemultiplication,andd_i∈R^ben- codesthesimilarityofeachbasisvectortotheinputdescriptory_i, Speciﬁcally,

di=exp

dist

(

^yi,B

)

σ

(5)

wheredist(^yi,B)=[dist(^yi,b₁),...,dist(^yi,b_b)^], dist(y_i, b_j) is the Euclidean distance between y_i and b_j,

σ

^is ^used ^to ^control ^the

weight decayspeed forthe locality adaptor. In our implementation, the number of basis functions (B) is ﬁxed to be 1024, the regularization parameters

λ

^and

σ

âre ^fixed ^to ^be ⁵⁰⁰ ând^100,

respectively,toachievethebestperformance.

Locality-Constraint Dictionary Learning - (LCDL) (Zhou and Barner,2013):TheLCDLoptimizationproblemisformulatedas:

minB,C

^Y⁻^BC

²F+

λ

^N

i=1

K

j=1

c²_ji

yi−bj

²

2

+

μ

^C

²F

s.t.

1^Tci=1

∀

ⁱ

(

∗

)

c_ji=0 ifb_j∈/

τ

(

^yi

) ∀

^i,^j

(

^∗∗

)

⁽⁶⁾

whereτ(y_i)isdeﬁnedasthe

τ

-neighborhoodcontaining

τ

^near-

estneighborsofy_i,and

λ

^,

μ

^are^positiveregularizationconstants.

μ

^C

²F isincluded fornumerical stability of theleast–squares so- lution.The sum-to-oneconstraint(∗) followsfrom thesymmetry requirement, while the locality constraint (∗∗) ensures that y_i is reconstructedbyatomsbelongingtoits

τ

-neighborhood,allowing c_i to characterize the intrinsic local geometry.In our implementation,the numberofbasis functions(B) is ﬁxedto be 1024,the regularizationparameters

λ

^and

μ

âre ^fixed ^to^be ^0.3ând^0.001,

respectively,andtheneighborhoodsize

τ

îs^fixed^to^be^5,êmpiri-

cally,toachievethebestperformance.

The major differences of aforementioned sparse feature en- codersresideintwofolds:

1.Objective:

(a) SC: Learning sets of over-complete bases for eﬃcient data representation,originallyappliedtomodelingthehumanvi- sualcortex;

(b) GSC:learningthesparserepresentationsthatexplicitlytake intoaccountthelocalmanifoldstructureofthedata;

(c) LLC:generatingdescriptorsforimageclassiﬁcationbyusing eﬃcientlocality-enforcingterm;

(d) LCDLlearningasetoflandmarkpointstopreservethelocal geometryofthenonlinearmanifold;

2. LocalityEnforcingStrategy:

(a) SC:None;

(b)GSC: using graph Laplacian to enforce the smoothness of sparserepresentationsalongthegeodesicsofthedataman- ifold;

(c)LLC: usinga localityadaptor whichpenalizesfar-waysam- ples with larger weights. During optimization, the basis functions are normalized after each iteration, which could causethe learnedbasis functionsdeviatefromthe original manifoldandthereforeloselocality-preservationproperty;

(d) LCDLderiving anupper-bound forreconstructingan intrinsic nonlinear manifoldwithout imposing any constraintof theenergyofbasisfunctions;

Itisclearthat SCisthemostgeneralapproachfordatarepresen- tation purpose.Although various locality-constrained sparse coding techniques have demonstrated success in many applications (Zhengetal.,2011;Wangetal.,2010;ZhouandBarner,2013),their distancemetricinEuclideanSpacehasimposedimplicithypothe- sis on themanifoldof thetarget feature space, which mightpo- tentiallyimpairtheperformance,asreﬂectedinourevaluation.

3.3. Spatialpyramidmatchingmodules(SPM)

Asanextension ofthetraditionalBagofFeatures (BoF)model, SPM has become a major component of state-of-art systems for image classification and object recognition (Everingham et al., 2012). Specifically, SPM consists of two steps: (i) vector quantization fortheconstruction of dictionaryfrominput; and(ii) histogram (i.e., histogram of dictionary elements derived in previ- ous step) concatenation from image subregions for spatial pooling. Mostrecently,theeffectiveness ofSPMforthe taskoftissue histologyclassificationhasalsobeendemonstratedinChangetal.

(2013a); 2013c). Therefore, we include two variations of SPM as acomponentofthearchitecturefortissuehistologyclassiﬁcation, whicharedescribedasfollows,

Kernel SPM(KSPM Lazebnik etal., 2006):The nonlinear kernel SPM that uses spatial-pyramid histograms offeatures. In our implementation,weﬁxthelevelofpyramidtobe3.

LinearSPM(LSPMYangetal.,2009):ThelinearSPMthatuses thelinearkernelonspatial-pyramidpoolingofsparsecodes.Inour implementation,we ﬁx thelevel ofpyramid to be 3,andchoose the max pooling function on the absolute sparse codes, as sug- gestedinYangetal.(2009);Changetal.(2013a).

Thechoiceofspatialpyramidmatchingmoduleismadetoop- timizetheperformance/efficiencyoftheentireclassificationarchi- tecture.Experimentally,wefindthat(i)FE-KSPMoutperformsFE- LSPM; and(ii)FE-SFE-LSPM andFE-SFE-KSPM have similarper- formance,whiletheformerismorecomputationallyefficientthan thelatter.Therefore,weadoptFE-SFE-LSPMandFE-KSPMduring theevaluation.

As suggested in Jarrett et al. (2009), the vector quantization componentofSPMcanbeseenasanextremecaseofsparsecod- ing, and the local histogram construction/concatenation component ofSPM canbe considered asaspecial formofspatialpool- ing.Asaresult,SPMisconceptuallysimilartothecombinationof sparse codingwithspatialpooling, andthereforeis ableto serve asan extra layer (stage)for featureextraction. Consequently,FE- KSPM can be considered as a single-stage system, and FE-SFE- LSPMcanbeconsideredasamulti-stagesystemwithtwofeature extraction/abstractionlayers.

3.4. Classiﬁcation

Forarchitecture:FE-SFE-LSPM,weemployedthelinearSVMfor classification,thesameasinWangetal.(2010);Yangetal.(2009). Forarchitecture:FE-KSPM,thehomogeneouskernelmap (Vedaldi andZisserman,2012)wasfirstapplied,followedbylinearSVMfor classification.

(6)

Fig. 1. GBM Examples. First column: Tumor; Second column: Transition to necrosis; Third column: Necrosis. Note that the phenotypic heterogeneity is highly diverse in each column.

4. Experimentalevaluationofmodelarchitecture 4.1. Experimentalsetup

Our extensive evaluation is performed based on the cross- validationstrategywith10iterations,wherebothtrainingandtest- ing images are randomly selected per iteration, andthe ﬁnal re- sults are reported as the mean and standard error of the cor- rectclassiﬁcationrateswithvariousdictionarysizes(256,512,1024) on the following two distinct datasets, curatedfrom (i) Glioblas- tomaMultiforme(GBM)and(ii)KidneyRenalClearCellCarcinoma (KIRC)fromThe CancerGenome Atlas(TCGA), whichare publicly available from the NIH (National Institute of Health) repository.

The curationis performedby ourpathologist inorder toprovide examples ofdistinct regions of microanatomy (e.g., stromal) and histopathology(e.g.,tumor,necrosis)withsufficientamountofbi- ologicalheterogeneities andtechnicalvariations, sothat theclas- sificationmodelarchitecturecanbefaithfullytestedandvalidated againstimportantstudies.Furthermore,thecombinationofexten- sive cross-validationandindependentvalidation ondatasets with distinct tumortypes,tothemaximumextent,ensurestheconsis- tency and unbiasedness of our findings. The detaileddescription ofourdatasetsaswellasthecorrespondingtaskforumulationare describedasfollows,

GBM Dataset: In brain tumors, necrosis, proliferation of vas- culature, and infiltration of lymphocytes are important prognostic factors. And, some of these analyses, such as the quantifica- tion of necrosis, have to be defined andperformed asclassifica-

tiontasksinhistologysections.Furthermore,necrosisisadynamic process anddifferentstagesof necrosisexist (e.g.,fromcellsini- tiatinga necrosisprocess tocomplete lossofchromatincontent).

Therefore,thecapabilityofidentification/classificationoftheseend points,e.g.,necrosis-relatedregions,inbraintumor histologysec- tions, is highly demanded. In thisstudy, we aim to validate the modelarchitectureforthethree-categoryclassification(i.e.,Tumor, Necrosis, andTransition to Necrosis) on theGBM dataset, where theimagesarecuratedfromthewholeslideimages(WSI)scanned with a 20X objective (0.502 micron/pixel). Representative exam- plesofeachclasscanbefoundinFig.1,whichrevealasignificant amountofintra-classphenotypicheterogeneity.Suchahighlyhet- erogenousdatasetprovides an idealtest caseforthequantitative evaluation of the composition of model architecture andits impact,intermsofperformanceandrobustness,ontheclassification ofhistologysections.Specifically, thenumberofimagespercate- goryare628,428and324,respectively,andmostimagesare1000

× 1000pixels. Forthis task,we train, withvarious modelarchi- tectures,on160imagesper categoryandtestedonthe rest,with threedifferentdictionarysizes:256,512and1024.

KIRC Dataset: Recent studies on quantitative histologyanaly- sis (Lan etal., 2015;Rogojanu et al., 2015; Huijbers et al., 2013;

deKruijfetal.,2011)revealthatthetumor-stromaratioisaprog- nosticfactorinmanydifferenttumor types,anditisthereforein- terestinganddesirableto knowhowsuch an indexplays itsrole inKIRC,whichcanbefulfilledwithtwo stepsasfollows,(i) identification/classificationoftumor/stromalregionsintissuehistology sectionsfortheconstructionoftumor-stromaratio;and(ii)correl-

(7)

Fig. 2. KIRC examples. First column: Tumor; Second column: Normal; Third column: Stromal. Note that (a) in the first column, there are two different types of tumor corresponding to clear cell carcinoma, with the loss of cytoplasm (first row), and granular tumor (second row), respectively; and (b) in the second column, staining protocol is highly varied. The cohort contains a significant amount of tumor heterogeneity that is coupled with technical variation.

ativeanalysisofthederivedtumor-stromaratiowithclinicalout- come.Therefore, inthis study,we aimto validate the modelarchitectureforthethree-categoryclassification(i.e.,Tumor,Normal, andStromal) on the KIRCdataset, wherethe images are curated fromthewhole slideimages(WSI) scanned witha40X objective (0.252micron/pixel).Representativeexamplesofeachclasscanbe foundinFig.2,which(i)containtwodifferenttypesoftumorcor- respondingtoclearcellcarcinoma,withthelossofcytoplasm(first row),andgranulartumor(secondrow),respectively;and(ii)reveal largetechnicalvariations(i.e.,intermsofstainingprotocol),espe- ciallyinthenormalcategory.Thecombinationofthelargeamount ofbiologicalheterogeneityandtechnicalvariationsinthiscurated datasetprovidesan idealtest caseforthequantitative evaluation ofthecomposition ofmodelarchitecture andits impact,interms ofperformance androbustness, on the classification of histology sections.Specifically, thenumberofimagesper categoryare 568, 796and784,respectively,andmostimagesare1000×1000pix- els. Forthis task, we train, with various model architectures, on 280imagespercategoryandtestedontherest,withthreediffer- entdictionarysizes:256,512and1024.

4.2.Isunsupervisedfeaturelearningpreferabletohumanengineered features?

Feature extractionisthevery ﬁrststepfortheconstruction of classiﬁcation/recogonitionsystem, andis one of themost impor- tantfactors that affecttheperformance. Toanswer thisquestion, weevaluated fourwell-selected featuresbased ontwovastly dif-

ferent tumor types as described previously. The evaluation was carried out withthe FE-KSPM architecture forits simplicity,and the performance wasillustrated in Fig. 3 for the GBM andKIRC datasets. It is clear that the systems based on CMF (CMF-KSPM) and PSD(PSD-KSPM) have the top performances, which are due toi) thecriticalrole ofcellularmorphometric contextduringthe pathologicaldiagnosis,assuggestedinChangetal.(2013a);andii) thecapabilityofunsupervisedfeaturelearningincapturingintrin- sicmorphometricpatternsinhistologysections.

4.3. Doescellularsaliencyhelp?

CMF differs from DSIFT, DCT and DPSD in that (1) CMF char- acterizes biological meaningful properties at cellular-level, while DSIFT,DCTandDPSDarepurelypixel/patch-levelfeatureswithout anyspecific biological meaning; (2)CMF is extractedper nuclear regionwhichiscellular-saliency-aware,whileDSIFT,DCTandDPSD areextractedperregularly-spacedimagepatchwithoutusingcel- lularinformationasprior.Anillustrationofaforementionedfeature extractionstrategiescanbefoundinFig.4.Recentstudy(Wuetal., 2013)indicatesthatsaliency-awarenessmaybehelpfulforthetask of image classification, thus it will be interesting to figure out whetherSIFT,[Color,Texture]andPSDfeaturescanbeimprovedby theincorporationofcellular-saliencyasprior.Therefore,wedesign salientSIFT(SSFIT),salient[Color,Texture] andsalientPSD(SPSD) features, which are only extracted at nuclear centroid locations.

Comparison ofclassiﬁcation performance betweendense features andsalientfeatures,withtheFE-KSPM architecture,is illustrated

(8)

256 512 1024 Dictionary Size 70

75 80 85 90

Performance (%)

Feature evaluation on GBM dataset

CMF-KSPM DPSD-KSPM DSIFT-KSPM DCT-KSPM

256 512 1024

Dictionary Size 80

85 90 95

Performance (%)

Feature evaluation on KIRC dataset

CMF-KSPM DPSD-KSPM DSIFT-KSPM DCT-KSPM

Fig. 3. Evaluation of different f eatures with FE-KSPM architecture on both GBM (left) and KIRC (right) datasets. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

Fig. 4. Illustration of dense feature extraction strategy (left) and salient feature extraction strategy (right), where dense features are extracted on regularly-spaced patches, while salient features are extracted on patches centered at segmented nuclear centers. Here, yellow rectangle and red blob represent feature extraction patch/grid and segmented nuclear region, respectively.

in Fig. 5 forGBM and KIRC datasets, which show that, forSIFT, [Color,Texture] andPSDfeatures, cellular-saliency-awarenessplays a negativerole for thetask oftissue histologyclassiﬁcation. One possible explanation is that, different from CMF, which encodes speciﬁcbiologicalmeaningsandsummarizestissueimagewithin- trinsicbiological-context-basedrepresentation,SIFT,[Color,Texture]

andPSDleadtoappearance-basedimagerepresentation,andthus requiredensesamplingallovertheplaceinordertofaithfullyas- sembletheviewoftheimage.

4.4. Doesthesparsefeatureencoderhelp?

The evaluationof systems with the sparse feature encoder is carried out with the configuration FE-SFE-LSPM, where LSPM is used instead of KSPM for improved efficiency. Classification performance isillustrated inFig.6 andFig.7fortheGBM andKIRC datasets,respectively;andtheresultsshowthat,comparedtoFE- KSPM,

1. ForFE=CMF and SFE ∈{SC,GSC,LLC,LCDL}, FE-SFE-LSPM con- sistentlyimprovestheclassiﬁcationperformanceforbothGBM andKIRCdatasets;

2. ForFE∈{SIFT,[Color,Texture]}andSFE∈{SC,GSC,LLC,LCDL},FE- SFE-LSPM improves the performance for KIRC dataset; while impairstheperformanceforGBMdataset;

3. ForFE=PSD, FE-SFE-LSPMimproves theperformance forboth GBM and KIRC datasets, with SFE = SC; while, in gen-

eral, impairs the performance for both datasets, with SFE ∈ {GSC,LLC,LCDL}.

The observations above suggest that, the effect of the sparse feature encoder highlycorrelates withthe robustnessof thefea- turesbeingused,andsigniﬁcantimprovementofperformancecan beachievedconsistently acrossdifferent datasetswiththechoice of CMF. It is also interesting to notice that, with the choice of PSD, the sparse feature encoder only helps improve the performance with sparse coding (SC) as the intermediate feature extraction layer. A possible explanation is that, compared to CMF whichhasrealphysicalmeanings,thePSDfeatureresidesinahy- per spaceconstructed fromunsupervised feature learning, where Euclidean-distance, as a critical part of GSC, LLC and LCDL, may notapply.

Furthermore,it is alsointeresting and importantto know the effectofincorporatingdeeplearningforfeatureextraction.There- fore, for further validation, we have also evaluated two popular deep learning techniques, namely Stacked PSD (Chang et al., 2013d) and Convolutional Neural Networks (CNN) (Lecun et al., 1998;HuangandLeCun,2006;Krizhevskyetal.,2012).Speciﬁcally,

1.StackedPSD-KSPM:fortheevaluationofStackedPSD,thesame protocolasinChangetal.(2013d)isutilized.Briefly,two lay- ersofPSD,with2048(firstlayer) and1024(secondlayer) basisfunctions, respectively,arestackedtoformadeeparchitec- tureforthefeatureextractionon 20× 20image-patcheswith astep-sizefixedtobe20,empirically,forbestperformance.Af-

(9)

256 512 1024 Dictionary Size

20 30 40 50 60 70 80 90

Performance (%)

Dense vs salient features on GBM dataset

DPSD-KSPM SPSD-KSPM DSIFT-SC-LSPM SSIFT-SC-LSPM DSIFT-KSPM SSIFT-KSPM DCT-KSPM SCT-KSPM

256 512 1024

Dictionary Size 55

60 65 70 75 80 85 90 95 100

Performance (%)

Dense vs salient features on KIRC dataset

DPSD-KSPM SPSD-KSPM DSIFT-SC-LSPM SSIFT-SC-LSPM DSIFT-KSPM SSIFT-KSPM DCT-KSPM SCT-KSPM

Fig. 5. Evaluation of dense feature extraction and salient feature extraction strategies with the FE-KSPM architecture on both GBM (left) and KIRC (right) datasets, where solid line and dashed line represent systems built upon dense feature and salient feature, respectively. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

256 512 1024

Dictionary Size 82

84 86 88 90 92 94

Performance (%)

CMF-based architectures on GBM dataset

CMF-LCDL-LSPM CMF-LLC-LSPM CMF-SC-LSPM CMF-GSC-LSPM CMF-KSPM(baseline)

256 512 1024

Dictionary Size 82

84 86 88 90 92 94

Performance (%)

DPSD-based architectures on GBM dataset

DPSD-LCDL-LSPM DPSD-LLC-LSPM DPSD-SC-LSPM DPSD-GSC-LSPM DPSD-KSPM(baseline)

256 512 1024

Dictionary Size 70

75 80 85

Performance (%)

DSIFT-based architectures on GBM dataset

DSIFT-LCDL-LSPM DSIFT-LLC-LSPM DSIFT-SC-LSPM DSIFT-GSC-LSPM DSIFT-KSPM(baseline)

256 512 1024

Dictionary Size 60

65 70 75 80

Performance (%)

DCT-based architectures on GBM dataset

DCT-LCDL-LSPM DCT-LLC-LSPM DCT-SC-LSPM DCT-GSC-LSPM DCT-KSPM(baseline)

Fig. 6. Evaluation of the architectures with sparse feature encoders ( FE-SFE-LSPM ) on GBM dataset. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

ter thepatch-basedextraction,thesameprotocolasshownin FE-KSPMisutilizedforclassiﬁcation.

2. AlexNet-KSPM: for the evaluation of CNN, we adopt one of the mostpowerful deep neural network architecture:AlexNet (Krizhevsky et al., 2012) with the Caffe (Jia et al., 2014) implementation. Given (i) the extremely large scale (60 million parameters) of the AlexNet architecture; (ii) the signiﬁcantly smaller data-scale of GBM and KIRC, compared to ImageNet (Dengetal.,2009) withone thousandcategoriesandmillions

ofimages,whereAlexNetisoriginallytrained;and(iii)thesig- nificantdeclineofperformance dueto over-fittingthatweex- periencewiththeend-to-endtuningofAlexNetonourdataset as a result of (i) and (ii), we simply adopt the pre-trained AlexNet for feature extraction on 224 × 224 image-patches with a step-size fixed to be 45, empirically, for best performance.After the patch-basedextraction,thesameprotocolas showninFE-KSPM is utilizedfor classification.It is worth to

(10)

90 92 94 96 98

Performance (%)

CMF-based architectures on KIRC dataset

CMF-LCDL-LSPM CMF-LLC-LSPM CMF-SC-LSPM CMF-GSC-LSPM CMF-KSPM(baseline)

256 512 1024

Dictionary Size 90

92 94 96 98

Performance (%)

DPSD-based architectures on KIRC dataset

DPSD-LCDL-LSPM DPSD-LLC-LSPM DPSD-SC-LSPM DPSD-GSC-LSPM DPSD-KSPM(baseline)

256 512 1024

Dictionary Size 88

90 92 94 96 98

Performance (%)

DSIFT-based architectures on KIRC dataset

DSIFT-LCDL-LSPM DSIFT-LLC-LSPM DSIFT-SC-LSPM DSIFT-GSC-LSPM DSIFT-KSPM(baseline)

256 512 1024

Dictionary Size 82

84 86 88 90 92

Performance (%)

DCT-based architectures on KIRC dataset

DCT-LCDL-LSPM DCT-LLC-LSPM DCT-SC-LSPM DCT-GSC-LSPM DCT-KSPM(baseline)

Fig. 7. Evaluation of the architectures with sparse feature encoders ( FE-SFE-LSPM ) on KIRC dataset. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

4 2 0 1 2

1 5 6

5 2

Dictionary Size 96

96.5 97 97.5 98 98.5 99 99.5

Performance (%)

Evaluation of deep-learning-based architectures on KIRC dataset

StackedPSD-KSPM AlexNet-KSPM CMF-LLC-LSPM CMF-KSPM(baseline)

4 2 0 1 2

1 5 6

5 2

Dictionary Size 90

90.5 91 91.5 92 92.5 93 93.5 94 94.5

Performance (%)

Evaluation of deep-learning-based architectures on GBM dataset

StackedPSD-KSPM AlexNet-KSPM CMF-LCDL-LSPM CMF-KSPM(baseline)

Fig. 8. Evaluation of the effect of incorporating deep learning for feature extraction on both GBM and KIRC datasets. Note that, given the various combinations of FE-SFE- L SPM , CMF-LCDL-L SPM and CMF-LLC-L SPM are chosen for GBM and KIRC datasets, respectively, for their best performance. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

mentionthatsuchanapproachfallsintothecategoriesofboth deeplearningandtransferlearning.

Experimentalresults,illustratedinFig.8,suggestthat,

1. Bothsparse feature encodersand feature extractionstrategies based on deep learning techniques consistently improve the performanceoftissuehistologyclassiﬁcation;

2. The extremely large-scale convolutional deepneural networks (e.g., AlexNet), pre-trained on extremely large-scale dataset (e.g.,ImageNet),canbedirectlyapplicabletothetaskoftissue histologyclassiﬁcationduetothecapabilityofdeepneuralnet- worksincapturingtransferablebaseknowledgeacrossdomains

(Yosinskietal., 2014). Althoughthefine-tuning ofAlexNet towards our datasets shows significant performance drop due to the problem of over-fitting, the direct deployment of pre- trained deep neural networksstill provides a promising solu- tionfortaskswithlimiteddataandlabels,whichisverycom- moninthefieldofmedicalimageanalysis.

4.5.Revisitonspatialpooling

Tofurther study the impact of poolingstrategy, we also pro- videextensiveexperimental evaluationonone ofthemostpopu- larpoolingstrategies(i.e.,maxpooling)inplaceofspatialpyramid

(11)

75 80 85 90

Performance (%)

CMF-based architectures on GBM dataset

CMF-LCDL-LSPM CMF-LLC-LSPM CMF-SC-LSPM CMF-GSC-LSPM CMF-LCDL-Max CMF-LLC-Max CMF-SC-Max CMF-GSC-Max

256 512 1024

Dictionary Size 75

80 85 90

Performance (%)

DPSD-based architectures on GBM dataset

DPSD-LCDL-LSPM DPSD-LLC-LSPM DPSD-SC-LSPM DPSD-GSC-LSPM DPSD-LCDL-Max DPSD-LLC-Max DPSD-SC-Max DPSD-GSC-Max

256 512 1024

Dictionary Size 84

86 88 90 92 94 96 98

Performance (%)

CMF-based architectures on KIRC dataset

CMF-LCDL-LSPM CMF-LLC-LSPM CMF-SC-LSPM CMF-GSC-LSPM CMF-LCDL-Max CMF-LLC-Max CMF-SC-Max CMF-GSC-Max

256 512 1024

Dictionary Size 84

86 88 90 92 94 96 98

Performance (%)

DPSD-based architectures on KIRC dataset

DPSD-LCDL-LSPM DPSD-LLC-LSPM DPSD-SC-LSPM DPSD-GSC-LSPM DPSD-LCDL-Max DPSD-LLC-Max DPSD-SC-Max DPSD-GSC-Max

Fig. 9. Evaluation of the impact of different spatial pooling strategies with the FE-SFE-LSPM framework on both GBM and KIRC datasets. Note that, given many of the popular spatial pooling strategies, max pooling is chosen due to the extensive justiﬁcation by both biophysical evidence in the visual cortex and researches in image categorization tasks. The derived architecture is described as FE-SFE-Max , and only the top-two-ranked features (i.e., DPSD and CMF) are involved during evaluation. Here, the performance is reported as the mean and standard error of the correct classiﬁcation rate, as detailed in Section 4 .

matchingwithinFE-SFE-LSPMframework,whichisdeﬁnedasfol- lows,

max: f_j=max

{|

^c1j

|

^,

|

^c2j

|

^,^.^.^.^,

|

^cM j

|}

⁽⁷⁾

whereC=[c₁,...,c_M]∈R^b^×^M isthesetofsparsecodesextracted froman image,c_ijisthe matrixelementati-throwandj-th col- umnofC,and f=[f₁,...,f_b] isthepooledimagerepresentation.

Thechoice ofmax poolingprocedure has beenjustiﬁed by both biophysicalevidence in the visual cortex (Serreetal., 2005) and researchesinimagecategorization(Yangetal.,2009),andthede- rivedarchitecture isdescribedasFE-SFE-Max. Inourexperimen- talevaluation,wefocusonthetop-two-rankedfeatures(i.e.,DPSD andCMF),where the corresponding comparisons ofclassiﬁcation performanceare illustratedinFig. 9.Itisclearthat systemswith SPM poolingconsistently outperformssystems with max pooling withvariouscombinationsoffeaturetypesandsparsefeatureen- coders.Apossibleexplanationisthatthevectorquantizationstep inSPMcanbeconsideredasanextremecaseofsparsecoding(i.e., withasinglenon-zeroelementineachsparsecode);andthelocal histogramconcatenationstepinSPM canbeconsidered asa spe- cialformofspatialpooling.As aresult,SPMisconceptuallysimi- lartoanextralayerofsparsefeatureencodingandspatialpooling, assuggestedinJarrettetal.(2009),andthereforeleadstoanim- provedperformance,comparedtothearchitecturewithmaxpool- ing.

4.6. Revisitoncomputationalcost

Inadditiontoclassificationperformance,anothercriticalfactor, in clinical practice, is the computational efficiency. Therefore, in thissection, we provided a detailed evaluationon computational cost of various systems. Given the fact that (i) training can al- waysbecarriedoutoff-line;(ii)theclassificationofthesystemsin evaluationareallbasedonlinearSVM,ourevaluationoncompu- tational efficiencyfocuseson on-linefeature extraction(including sparse featureencoding),whichis themosttime-consumingpart duringthetestingphase.AsshowninTable5,

1. SIFT features are the most computational eﬃcient features amongalltheonesincomparison.However,thesystemsbuilt onSIFTfeaturesgreatlysufferfromthetechnicalvariationsand biological heterogeneities in both datasets, and therefore are not goodchoices forthe classiﬁcation oftissue histology sections;

2. Giventhefact that the nuclearsegmentation is a prerequisite forsalient feature extraction (e.g.,SPSD, SSIFT andSCT), sys- temsbuilt upon salientfeatures may not be necessarilymore eﬃcientthan systemsbuiltupon dense features.Furthermore, since the salient features typically impair the tissue histology classiﬁcation performance, they are therefore not recom- mended;

model architecture for classiﬁcation of histology sections When machine vision meets histology: A comparative evaluation of Medical Image Analysis

Medical Image Analysis

When machine vision meets histology: A comparative evaluation of model architecture for classiﬁcation of histology sections R

Cheng Zhong

, Ju Han

, Alexander Borowsky

, Bahram Parvin

, Yunfu Wang

, Hang Chang

θ

{

}

σ

λ

∀

λ

λ

||

||

λ||

||

||

||

∀

λ

||

||

λ||

||

α

(

)

||

||

∀

λ

α

||

||

λ||

||

∀

(

)

σ

σ

λ

σ

λ

μ

∀

(

)

(

) ∀

(

)

τ

τ

λ

μ

μ

τ

λ

μ

τ

{|

|

|

|

|

|}

When machine vision meets histology: A comparative evaluation of model architecture for classiﬁcation of histology sections ^R