Deep learning onto graph space: application to image-based insect recognition

(1)


Maxime Martineau

Advised by Donatello Conte, Romain Raveaux and Gilles Venturini

Jury: Luc Brun, Pierre Héroux, Kaspar Riesen and Christel Vrain

13 November 2019

LIFAT (EA 6300)

(2)

Introduction

(4)

Why arthropod identification

• Applied entomology

• Estimation of the insect populations

• Biodiversity assessment

• Integrated pest management


(5)

Arthropod identification

• Trap-based harvest

• Human identification procedure

• Use of taxonomical keys

[Figure: example taxonomical key. Wings? yes: Color? (yellow: Shape? (round: Bumblebee; thin: Bee); red: Ladybug). Wings? no: Nb legs? (6: Cockroach; 8: Spider).]
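A taxonomical key of this kind reads naturally as nested decisions. The sketch below encodes the toy key from this slide (wings, colour, shape, number of legs); the function name `identify` and its argument names are hypothetical, and the key itself is only the illustrative one shown here, not a real entomological key.

```python
def identify(wings, color=None, shape=None, legs=None):
    """Follow the toy key: wings -> colour -> shape, or no wings -> number of legs."""
    if wings:
        if color == "yellow":
            if shape == "round":
                return "Bumblebee"
            if shape == "thin":
                return "Bee"
        elif color == "red":
            return "Ladybug"
    else:
        if legs == 6:
            return "Cockroach"
        if legs == 8:
            return "Spider"
    return None  # the key has no branch for this combination


specimen = identify(wings=True, color="yellow", shape="thin")
```

Every question requires an expert observation, which is part of what makes manual identification costly.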

(6)

Why automation?

• Complex task

• Lots of individuals to identify (population estimation)

• Needs a lot of qualified workforce


(7)

Arthropod identification

How to automate the task?

• 3D images

• sound

• genomics

• 2D images

• cost effective

• available

• easy capture

(8)

Theoretical context

Image classification

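Let $x \in \mathbb{R}^{w \times h \times 3}$ be an image and $y \in \mathcal{Y}$ its class (in the class set $\mathcal{Y}$).

We are searching for the classifier function $f$ such that:

\[ f : \mathbb{R}^{w \times h \times 3} \to \mathcal{Y}, \qquad x \mapsto y \]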

Image-based insect classification

• Multi-granularity

• High intra-class variability

• Low inter-class variability

• Different sceneries (lab, field, …)

[Figure: taxonomic granularity levels: order, family, genus, species]

(9)

State of the art of insect image recognition

(10)

State of the art

• No quality survey

• Study of 44 papers on insect image identification

• Analysis through several axes:

• Image capture

• Feature extraction

• Classification


(11)

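State of the art¹

[Figure: overview of the surveyed pipelines. Image sources: entomart, Gassoumi 2000, janzen.sas.upenn.edu. Features: colour, SIFT, shape, ... Mid-level representations: MLP, BoW, sparse coding, stacked auto-encoders. Classifiers: SVM, decision tree, MLP, kNN, ... Ensembles: bagging, boosting, ...]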

¹ Maxime Martineau et al. “A survey on image-based insect classification”. In: Pattern Recognition 65 (2017), pp. 273–284.

(12)

Feature extraction in insect image recognition

Category                       Levels                                                                 Nb approaches
Handcrafted features           Domain-dependent: wing venations, geometry                             25
                               Global and generic image features: shape, color, texture, raw pixel
                               Local features: SIFT, others
Mid-level features             Unsupervised representations: BoW, PCA                                 17
                               Supervised representations: MLP, sparse coding
Hierarchical representations   Auto-encoder                                                           1

Table 1: Feature taxonomy for insect recognition from images

(13)

Contributions

Scientific lock                   Contribution
No deep learning based approach   Applying CNN on insect image recognition
Almost no structural approach     Graph-based representation and classification
No standard benchmark dataset     ImageNet-derived dataset

(14)

Research directions

Directions

• Statistical: CNN

• Structural: Graph-based

• Application to insect image recognition

(15)


(16)

Convolutional neural networks and transfer learning

(17)

Convolutional Neural Networks

Model:

Image a priori           CNN characteristics    Layer
Translation invariance   Shared weights         Convolution
Compositionality         Multi-scale            Pooling
Spatial localization     Spatial localization   Convolution & Pooling

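Learning problem:

\[ \min_{\Theta} \frac{1}{|X|} \sum_{(x,y) \in (X,Y)} L(f_\Theta(x), y) \]

Optimization method:

\[ \Theta_t = \Theta_{t-1} - \alpha \frac{\partial L(f_\Theta(x), y)}{\partial \Theta} \]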
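The update rule can be illustrated on a deliberately tiny problem. This is a minimal sketch, assuming a one-parameter linear model f_Θ(x) = Θ·x and a squared loss (both chosen only for illustration; a CNN has millions of parameters and uses automatic differentiation). The names `sgd_step` and `fit` are hypothetical.

```python
def sgd_step(theta, x, y, alpha):
    """One update: theta_t = theta_{t-1} - alpha * dL/dtheta, with L = (theta*x - y)^2."""
    grad = 2.0 * (theta * x - y) * x  # analytic derivative of the squared loss
    return theta - alpha * grad


def fit(data, alpha=0.05, epochs=200, theta=0.0):
    """Sweep over the training pairs repeatedly, applying one update per pair."""
    for _ in range(epochs):
        for x, y in data:
            theta = sgd_step(theta, x, y, alpha)
    return theta


# Toy training set generated by y = 2x, so the minimiser is theta = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta = fit(data)
```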

(18)

A Convolutional Neural Network architecture: VGG-16

Extraction: 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2

Classification: flatten → fc, 4096 → fc, 4096 → fc, 1000

• Good example of a CNN architecture close to the state of the art

• Extraction-Classification pattern

(19)

Convolutional Neural Networks

• State of the art in image classification

• Can learn complex mapping between images and classes

• Sensitive to class-cardinality balance

• ⚠ Needs a lot of data

                     ImageNet-1000²   Target entomological set
# images             1 000 000        3 000
# classes            1000             30
avg # images/class   1 000            100
min cardinalities    732              33
max cardinalities    1300             370

² Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision 115.3 (Dec. 2015), pp. 211–252. ISSN: 1573-1405. DOI: 10.1007/s11263-015-0816-y.

(20)

Proposed method

Limitation                 Solution
Sensitivity to imbalance   Weighting the loss
Too little data            Reduce number of parameters
                           Transfer learning

(21)

Reduce number of parameters

                                      Flattening                                                         Average pooling³
I/O                                   $l_1 : \mathbb{R}^{7 \times 7 \times 512} \to \mathbb{R}^{25088}$   $l_2 : \mathbb{R}^{7 \times 7 \times 512} \to \mathbb{R}^{512}$
Translation invariance                No                                                                 Yes (1 scalar per filter)
Next layer complexity ($n$ neurons)   $O(25088n)$                                                        $O(512n)$
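The two output sizes can be checked directly on a 7×7×512 feature map. This is a pure-Python sketch with random values standing in for real activations; the helper names `flatten` and `global_avg_pool` are hypothetical, and the cyclic row shift is used only as a simple stand-in for a translation.

```python
import random

H, W, C = 7, 7, 512
random.seed(0)
# Feature map indexed as fmap[row][col][channel], filled with placeholder values.
fmap = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]


def flatten(t):
    """Concatenate every activation: 7 * 7 * 512 = 25088 values."""
    return [v for row in t for cell in row for v in cell]


def global_avg_pool(t):
    """Average each channel over all spatial positions: one scalar per filter."""
    h, w, c = len(t), len(t[0]), len(t[0][0])
    return [sum(t[i][j][k] for i in range(h) for j in range(w)) / (h * w) for k in range(c)]


flat = flatten(fmap)
pooled = global_avg_pool(fmap)

# A cyclic shift of the rows changes the flattened vector but not the pooled one.
shifted = fmap[1:] + fmap[:1]
pooled_shifted = global_avg_pool(shifted)
```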

³ Min Lin, Qiang Chen, and Shuicheng Yan. “Network In Network”. In: CoRR abs/1312.4400 (2013).

(22)

Reduced VGG-16

Original VGG-16: 138M parameters

Reduced VGG-16: 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → global avgpool → fc, 256 → fc, 1000

15M parameters (90% reduction)
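The 138M and 15M figures can be reproduced by counting weights layer by layer. The sketch below assumes 3-channel inputs, 3×3 kernels, one bias per output unit, and a 7×7×512 map before the classifier (as stated on the previous slide); the helper names are hypothetical.

```python
# Output channels of each conv layer; "M" marks a max-pooling layer (no parameters).
CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
            512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, k=3):
    """Sum k*k*in*out weights plus out biases over all conv layers."""
    total = 0
    for item in cfg:
        if item == "M":
            continue
        total += k * k * in_channels * item + item
        in_channels = item
    return total


def fc_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


conv = conv_params(CONV_CFG)
full = conv + fc_params([7 * 7 * 512, 4096, 4096, 1000])  # flatten head
reduced = conv + fc_params([512, 256, 1000])              # global avgpool head
reduction = 1 - reduced / full
```

Under these assumptions almost all of the saving comes from the first fully connected layer, whose input shrinks from 25088 to 512 values.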

(23)

Transfer learning⁴

[Figure: network ending in fc, 256 → fc, n]

⁴ S. J. Pan and Q. Yang. “A Survey on Transfer Learning”. In: IEEE Transactions on Knowledge and Data Engineering 22.10 (Oct. 2010), pp. 1345–1359. ISSN: 1041-4347.

(24)

How much do we have to learn?⁵

[Figure: network ending in fc, 256 → fc, 1000, split into frozen layers and learnt layers]

⁵ Jason Yosinski et al. “How transferable are features in deep neural networks?” In: Advances in neural information processing systems. 2014, pp. 3320–3328.

(25)

Mitigating the imbalance problem

Weighting the loss depending on class cardinalities:

\[ \min_{\Theta} \frac{1}{|X|} \sum_{(x,y) \in (X,Y)} w_y \, L(f_\Theta(x), y) \]

\[ w_y = \frac{\max_{y' \in C} \mathrm{Card}(y')}{\mathrm{Card}(y)} \]
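A minimal sketch of this weighting, assuming class labels are given as a plain list and per-sample losses are already computed. The function names are hypothetical; the cardinalities 370 and 33 are the max and min of the target entomological set quoted earlier.

```python
from collections import Counter


def class_weights(labels):
    """w_y = (cardinality of the largest class) / (cardinality of class y)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {c: largest / n for c, n in counts.items()}


def weighted_mean_loss(losses, labels, weights):
    """(1/|X|) * sum over samples of w_y * L(f(x), y)."""
    return sum(weights[y] * loss for loss, y in zip(losses, labels)) / len(losses)


labels = ["a"] * 370 + ["b"] * 33
w = class_weights(labels)
loss = weighted_mean_loss([1.0] * len(labels), labels, w)
```

With unit per-sample losses, each class contributes the same total (370) to the sum, which is the intended balancing effect.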

(26)


(27)

Graph neural networks

(28)

Introduction to graph neural networks

Classic NNs operate on Euclidean spaces

→ Find replacement layers and operators to work on non-Euclidean spaces

(29)

Graph classification

Searching for $f$ such that:

\[ f : \mathcal{G} \to \mathcal{Y} \]

• Embedding

• Explicit (G →Rⁿ) (ex. Graph NN/Graph convolution)

• Implicit (⟨G,G⟩ →R)

• Graph space

• Graph matching (G × G →R)

(30)

Graph classification

f:G → Y

• Embedding

• Explicit (G →Rⁿ) (ex. Graph NN/Graph convolution)

• Graph space


(31)

Graph matching

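[Figure: graph $G_1$ (nodes a, b, c) matched to graph $G_2$ (nodes 1, 2 and null nodes $\epsilon_1$, $\epsilon_2$), with $m_{a,1} = 1$, $m_{b,2} = 1$, $m_{c,\epsilon_2} = 1$]

\[ \min_m d(G_1, G_2, m) \]

\begin{align}
d(G_1, G_2, m) &= \sum_{m_{i,k}=1} d(i,k) + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} d(ij,kl) \tag{1a} \\
&= d(a,1) + d(b,2) + d(c,\epsilon_2) + d(ab,12) + d(cb,\epsilon\epsilon_2) \tag{1b}
\end{align}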
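The cost of a matching and its minimisation can be sketched by brute force on graphs of this size. Everything below is an illustrative assumption rather than the method of the slides: graphs are stored as (node labels, edge labels) dictionaries, `None` plays the role of ε, node substitution costs the absolute label difference, and every deletion or insertion costs 1. Exhaustive enumeration is only viable for a handful of nodes.

```python
from itertools import permutations

EPS = None  # null node or edge (deletion / insertion)


def matching_cost(g1, g2, m, d_node, d_edge):
    """Sum of node terms d(i,k) and edge terms d(ij,kl) induced by the matching m."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    cost = 0.0
    for i, k in m.items():  # substitution, or deletion when k is EPS
        cost += d_node(nodes1[i], EPS if k is EPS else nodes2[k])
    for k in set(nodes2) - set(m.values()):  # unmatched nodes of g2 are inserted
        cost += d_node(EPS, nodes2[k])
    matched_edges = set()
    for e, label in edges1.items():
        i, j = tuple(e)
        image = frozenset((m[i], m[j]))  # where the edge ij lands in g2
        if image in edges2:
            cost += d_edge(label, edges2[image])
            matched_edges.add(image)
        else:
            cost += d_edge(label, EPS)  # edge deleted
    for e in set(edges2) - matched_edges:  # unmatched edges of g2 are inserted
        cost += d_edge(EPS, edges2[e])
    return cost


def best_matching(g1, g2, d_node, d_edge):
    """Enumerate every matching of g1's nodes to g2's nodes or EPS; keep the cheapest."""
    n1, n2 = list(g1[0]), list(g2[0])
    targets = n2 + [EPS] * len(n1)
    best_cost, best_m = float("inf"), None
    for perm in set(permutations(targets, len(n1))):
        m = dict(zip(n1, perm))
        c = matching_cost(g1, g2, m, d_node, d_edge)
        if c < best_cost:
            best_cost, best_m = c, m
    return best_cost, best_m


def d_node(a, b):
    return 1.0 if a is EPS or b is EPS else abs(a - b)


def d_edge(a, b):
    return 1.0 if a is EPS or b is EPS else abs(a - b)


g1 = ({"a": 0.0, "b": 1.0, "c": 5.0}, {frozenset("ab"): 1.0, frozenset("cb"): 1.0})
g2 = ({1: 0.0, 2: 1.0}, {frozenset((1, 2)): 1.0})
cost, m = best_matching(g1, g2, d_node, d_edge)
```

On this toy pair the cheapest matching sends a to 1, b to 2 and deletes c together with the edge cb, mirroring the structure of equation (1b).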

(32)

Graph-perceptron⁷

Perceptron⁶: $I \to \langle I, \beta \rangle \to$ Heaviside

Graph-perceptron: $G_I \to \min_m d(G_I, G_M, m, \beta) \to$ Heaviside

⁶ F. Rosenblatt. “The Perceptron – a perceiving and recognizing automaton”. Cornell Aeronautical Laboratory (1957), Report 85-460-1.

⁷ Maxime Martineau et al. “Learning error-correcting graph matching with a multiclass neural network”. In: Pattern Recognition Letters (2018). ISSN: 0167-8655.

(33)

Parameterizing graph matching

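[Figure: input graph $G_I$ (nodes a, b, c) matched to model graph $G_M$ (nodes 1, 2 and null nodes $\epsilon_1$, $\epsilon_2$), with $m_{a,1} = 1$, $m_{b,2} = 1$, $m_{c,\epsilon_2} = 1$]

\begin{align}
d(G_I, G_M, m, \beta) &= \sum_{m_{i,k}=1} \beta_k \, d(i,k) + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} \beta_{kl} \, d(ij,kl) \tag{2a} \\
&= \beta_1 d(a,1) + \beta_2 d(b,2) + \beta_{\epsilon_2} d(c,\epsilon_2) \tag{2b} \\
&\quad + \beta_{12} d(ab,12) + \beta_{\epsilon\epsilon_2} d(cb,\epsilon\epsilon_2) \tag{2c}
\end{align}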

(34)

Graph classification

f:G → Y

• Embedding

• Explicit (G →Rⁿ) (ex.Graph NN/Graph convolution)

• Graph space


(35)

Convolution on graphs⁸

Spectral definitions

• analogy of the convolution theorem:

\[ \mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \]

• definition of convolution filter in the frequency domain

• unstable across domain
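The convolution theorem behind the spectral definitions can be checked numerically in the classical setting. This sketch uses a naive discrete Fourier transform on short 1-D signals and circular convolution; it illustrates the Euclidean (regular grid) case only, whereas spectral graph methods replace the Fourier basis with the eigenvectors of the graph Laplacian. All names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_conv(f, g):
    """Direct circular convolution: (f * g)[t] = sum_s f[s] g[(t - s) mod n]."""
    n = len(f)
    return [sum(f[s] * g[(t - s) % n] for s in range(n)) for t in range(n)]


f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 0.25, 0.0]
direct = circular_conv(f, g)
# Multiply the spectra pointwise, then transform back.
via_spectrum = [v.real for v in idft([a * b for a, b in zip(dft(f), dft(g))])]
```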

Spatial definitions

• Explicit computation of convolution

• definition of convolution filter in the spatial domain

• No trivial definition

⁸ Michael M. Bronstein et al. “Geometric deep learning: going beyond Euclidean data”. In: CoRR abs/1611.08097 (2016). arXiv: 1611.08097.

(36)

An example of spatial graph convolution: MoNet⁹

Degradation of graphs to Riemannian manifolds:

• Non-Euclidean

• Node neighbourhoods as compact Euclidean spaces

[Figure: example graph with nodes a, b, c, d, e, f, g]

⁹ Federico Monti et al. “Geometric deep learning on graphs and manifolds using mixture model CNNs”. In: CoRR abs/1611.08402 (2016). arXiv: 1611.08402.

(37)

An example of spatial graph convolution: MoNet⁹

Degradation of graphs to Riemannian manifolds:

• Non-Euclidean

• Node neighbourhoods as compact Euclidean spaces

Computation of convolution at node e

[Figure: example graph with nodes a, b, c, d, e, f, g]

⁹ Monti et al., “Geometric deep learning on graphs and manifolds using mixture model CNNs”.

(38)

MoNet: local operation¹⁰

[Figure: node neighbourhood graphlet $g^e_I$ (nodes b, c, e, f, g) and its local/tangent space $N[e]$ with x and y axes, in which the neighbours of e are placed at coordinates (0,3), (1,2), (−2,1), (−1,−1)]

(39)

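MoNet: local operation¹⁰

[Figure: node neighbourhood graphlet $g^e_I$ (nodes b, c, e, f, g) with neighbour coordinates (0,3), (1,2), (−2,1), (−1,−1), and the local space covered by four Gaussian kernels $(\mu_1, \Sigma_1)$, $(\mu_2, \Sigma_2)$, $(\mu_3, \Sigma_3)$, $(\mu_4, \Sigma_4)$]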

(40)

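MoNet: local operation¹⁰

[Figure: neighbours b, c, f, g of node e in the local space, covered by four Gaussian kernels $(\mu_1, \Sigma_1)$, $(\mu_2, \Sigma_2)$, $(\mu_3, \Sigma_3)$, $(\mu_4, \Sigma_4)$]

Convolution computation at e:

\[ \begin{pmatrix} w_1(e) & w_2(e) & w_3(e) & w_4(e) \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} \]

\[ w_i(x) = \sum_{x' \in N[x]} \exp\left( -\tfrac{1}{2} (x' - \mu_i)^T \Sigma_i^{-1} (x' - \mu_i) \right) \]

Parameters: $\beta_i$ and $\mu_i, \Sigma_i$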

(41)

MoNet: local operation


• Transforms the graph into a Riemannian manifold

• Limited use of topology
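The local operation can be sketched with the four neighbour coordinates shown on the earlier slide. This follows the formula as written on the slides, which sums Gaussian kernel responses over the neighbourhood without node features, and it assumes diagonal covariances to avoid a matrix inverse. The kernel centres, variances and β values are arbitrary placeholders, and the function names are hypothetical.

```python
import math


def gaussian_weight(u, mu, var):
    """exp(-1/2 (u - mu)^T Sigma^{-1} (u - mu)) with Sigma = diag(var)."""
    q = sum((ui - mi) ** 2 / vi for ui, mi, vi in zip(u, mu, var))
    return math.exp(-0.5 * q)


def monet_response(neigh_coords, kernels, betas):
    """Compute w_i = sum over neighbours of kernel i, then the product sum_i w_i * beta_i."""
    weights = [sum(gaussian_weight(u, mu, var) for u in neigh_coords)
               for mu, var in kernels]
    return sum(w * b for w, b in zip(weights, betas)), weights


# Local coordinates of the neighbours of node e, taken from the slide.
coords = [(0, 3), (1, 2), (-2, 1), (-1, -1)]
# Placeholder kernels (mean, diagonal variances): one centred on each neighbour.
kernels = [((0.0, 3.0), (1.0, 1.0)), ((1.0, 2.0), (1.0, 1.0)),
           ((-2.0, 1.0), (1.0, 1.0)), ((-1.0, -1.0), (1.0, 1.0))]
betas = [0.5, -1.0, 2.0, 0.25]
out, weights = monet_response(coords, kernels, betas)
```

Only the coordinates enter the computation, not the edges between neighbours, which is one way to read the "limited use of topology" remark.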

(42)

Our Graph convolution proposal

Convolution purely in graph space

• Using graph matching

• No degradation of the graph representation

[Figure: example graph with nodes a, b, c, d, e, f, g]

(43)

Our Graph convolution proposal

Convolution purely in graph space

• Using graph matching

• No degradation of the graph representation

Computation of convolution at node e

[Figure: example graph with nodes a, b, c, d, e, f, g]

(44)

Our GCNN approach: local operation

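[Figure: input graphlet $g^e_I$ (nodes b, c, e, f, g) matched to the filter graph $G_F$ (nodes n, o, p, q, r), with $m_{c,p} = 1$ and $m_{e,n} = 1$]

\begin{align}
(G_I * G_F)(e) &= \max_m \sum_{m_{i,k}=1} \langle L_i, \beta_k \rangle + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} \langle L_{ij}, \beta_{kl} \rangle \tag{3a} \\
&= \langle L_e, \beta_n \rangle + \langle L_c, \beta_p \rangle + \langle L_{ec}, \beta_{np} \rangle \tag{3b}
\end{align}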

(45)

Graph convolution complexity

• Bipartite matching solver used

• $O(n^5)$ complexity per local operation

• Use of a simplified model (no edges): $O(n^3)$
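Without edge terms, the local operation reduces to an assignment problem: pick the one-to-one matching that maximises the summed inner products between node labels and filter weights. The sketch below solves it by exhaustive enumeration purely for readability; it is factorial in the number of nodes, whereas the slides rely on a bipartite matching solver. The function names, and the assumption that every filter node must be matched to a distinct graphlet node, are illustrative.

```python
from itertools import permutations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def node_only_graph_conv(node_labels, filter_weights):
    """max over one-to-one matchings m of sum_{m_{i,k}=1} <L_i, beta_k> (no edge terms)."""
    n, k = len(node_labels), len(filter_weights)
    assert n >= k, "each filter node needs a distinct graphlet node"
    best_score, best_match = float("-inf"), None
    # perm[j] is the graphlet node assigned to filter node j.
    for perm in permutations(range(n), k):
        score = sum(dot(node_labels[i], filter_weights[j]) for j, i in enumerate(perm))
        if score > best_score:
            best_score, best_match = score, perm
    return best_score, best_match


labels = [(1, 0), (0, 1), (2, 2)]   # L_i for three graphlet nodes
filters = [(0, 2), (3, 0)]          # beta_k for two filter nodes
score, match = node_only_graph_conv(labels, filters)
```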

(46)

Testing Graph Convolution

n×n conv, 32 → maxpool/2 → n×n conv, 64 → maxpool/2 → n×n conv, 128 → maxpool/2 → global avgpool → fc, n

(47)

Testing Graph Convolution

Dataset           Training set   Validation set   Testing set
MNIST-original    48 000         12 000           10 000
MNIST-rotated¹¹   10 000         2 000            50 000
MNIST-reduced

[Figure: graph representations of a digit: 1/4 grid; 75 superpixels Region Adjacency Graph]

¹¹ Hugo Larochelle et al. “An empirical evaluation of deep architectures on problems with many factors of variation”. In: Proceedings of the 24th international conference on Machine learning. ACM. 2007, pp. 473–480.

(48)

Results for Graph Convolution

Repr.            Dataset   CNN       MoNet     Ours
1/4 grid         reduced   99.88 %   99.40 %   97.76 %
                 mixed     89.87 %   88.90 %   95.63 %
75 superpixels   reduced   –         92.70 %   89.53 %
                 mixed     –         92.90 %   94.17 %

Results on MNIST-2class

• reduced: no rotation

• mixed: rotation only during testing

(49)


(50)

Experiments on insect image recognition

(51)

Experiments on insect image recognition

Datasets

(52)

IRBI dataset

[Figure: capture setup with smartphone, diffusion dome, LED and specimen]

(53)

ImageNet-arthropods

• Images in ImageNet category arthropods

• Cardinality reduction (to match IRBI avg # of images per class)

(54)

Datasets

Dataset               Nb of classes   µ(Card(c))   σ(Card(c))
IRBI                  30              85           71
ImageNet-arthropods   443             96           78

(55)

Experiments on insect image recognition

Results

(56)

Transfer learning parametrization

[Figure: network ending in fc, 256 → fc, 1000]

(57)

CNNs

Model           IRBI                          ImageNet-arthropods
                Top-1          Top-5          Top-1          Top-5
SIFTBoW         52.3 % ± 3.7   82.7 % ± 3.3   11.7 % ± 0.2   25.9 % ± 0.4
VGG16-frsc      54.0 % ± 5.0   84.9 % ± 3.0   26.9 % ± 0.7   50.1 % ± 0.7
VGG16-fitu      72.0 % ± 3.2   92.1 % ± 1.1   42.7 % ± 0.9   69.4 % ± 0.6
VGG16-fitu/w    73.6 % ± 1.8   92.4 % ± 2.2   43.5 % ± 1.1   71.3 % ± 0.8
VGG16-fitu7/w   72.4 % ± 2.8   92.6 % ± 2.1   43.3 % ± 0.6   71.8 % ± 0.4

Table 2: Recognition rates on 5-fold cross-validation

• SIFTBoW: SIFT descriptors classified through codebook

• VGG16: CNN approaches

• frsc: from random weights

• fitu: with transfer learning

• /w: with loss weighting

• fitu7: with transfer learning (7 last layers)
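For reference, Top-1 and Top-5 recognition rates count a sample as correct when its true class is among the 1 or 5 highest-scoring classes. A minimal sketch, assuming per-class scores are given as plain lists and labels as class indices; the function name and the toy scores are hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true class is among the k best-scored classes."""
    hits = 0
    for row, y in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += y in ranked[:k]
    return hits / len(labels)


# Three samples, three classes (toy scores).
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
```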

(58)

Graph CNNs

Model        IRBI
             Top-1          Top-5
SIFTBoW      52.3 % ± 3.7   82.7 % ± 3.3
VGG16-frsc   54.0 % ± 5.0   84.9 % ± 3.0

Table 3: Recognition rates on 5-fold cross-validation

Model          IRBI
               Top-1     Top-5
Our graphcnn   30.40 %   71.03 %

Table 4: Recognition rates using region adjacency graphs

• Different network size

• No transfer learning

(59)

Conclusions and perspectives

(60)

Conclusions

• Thorough state of the art of insect image recognition

• Application of CNN through transfer learning

• Graph-based approaches proposals

• Graph perceptron

• Graph Convolutional neural networks

• Publicly available dataset proposed: ImageNet-arthropods


(61)

Perspectives

• Graph Convolution optimization

• CNN pipeline improvements

• Other research directions

• Metric learning

• Zero-shot learning

(62)

Graph convolution neural network

• Graph matching solver to be changed

• Lower complexity: under $O(n^3)$

• Differentiability: $\frac{\partial L(f_W(x), y)}{\partial W}$ depends on the solver iterations

(63)

CNN pipeline improvement

• Dealing with multi-individual images

• Network returns a list of locations and classes

• Existing CNN-based approaches: R-CNN, YOLO, …

• Extra labelling needed

(64)

Towards metric learning

• Unknown query image

• Model returns a list of the $k$ nearest known images

• Potentially useful to the user

• Neural approach: siamese network


(65)

Zero-shot learning

What if an unknown class occurs?

Existing approaches:

• Using metric learning

• Using a-priori knowledge (existing approaches with Graph CNN¹²)

[Figure: knowledge graph linking the classes zebra, deer and okapi to the attributes legs, body, striped and brown]

¹² Xiaolong Wang, Yufei Ye, and Abhinav Gupta. “Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs”. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2018.

(66)

Insect recognition application


(67)

Upcoming academic projects

• ANR projects

• Ecophyto ANR

• Biodiversité ANR (Challenge IA biodiv)

• New datasets from IRBI and IRSTEA

(68)

Publications i

In proceedings

Martineau, Maxime et al. “Approches connexionnistes pour la reconnaissance d’images d’insectes”. In: Journées ORASIS’17. Colleville-sur-Mer, France, 2017.

Raveaux, Romain et al. “Learning Graph Matching with a Graph-Based Perceptron in a Classification Context”. In: Graph-Based Representations in Pattern Recognition - 11th IAPR-TC-15 International Workshop, GbRPR 2017, Proceedings. 2017, pp. 49–58.

Martineau, Maxime et al. “Effective Training of Convolutional Neural Networks for Insect Image Recognition”. In: International Conference on Advanced Concepts for Intelligent Vision Systems. Springer. 2018, pp. 426–437.

(69)

Publications ii

Journal articles

Martineau, Maxime et al. “A survey on image-based insect classification”. In: Pattern Recognition 65 (2017), pp. 273–284. ISSN: 0031-3203.

Alaei, Alireza et al. “Blind document image quality prediction based on modification of quality aware clustering method integrating a patch selection strategy”. In: Expert Systems with Applications 108 (2018), pp. 183–192. ISSN: 0957-4174.

Martineau, Maxime et al. “Learning error-correcting graph matching with a multiclass neural network”. In: Pattern Recognition Letters (2018). ISSN: 0167-8655.

(70)

Thanks


(71)

Thanks for your attention

(72)

Graph matching

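[Figure: matching between a graph with nodes a, b, c, d, e, f, g and a graph with nodes 1, 2, 3, 4]

Let $G^1 = (N^1, E^1, L^1)$ and $G^2 = (N^2, E^2, L^2)$ be two graphs.

Problem: Graph Matching formulation

\begin{align}
y^* = \operatorname*{argmin}_{y} \; & d(G^1, G^2, y), \tag{4a} \\
\text{subject to} \quad & y \in \{0,1\}^{n_1 n_2} \tag{4b} \\
& \sum_{i=1}^{n_1} y_{i,a} = 1 \quad \forall a \in [1, \cdots, n_2] \tag{4c} \\
& \sum_{a=1}^{n_2} y_{i,a} = 1 \quad \forall i \in [1, \cdots, n_1] \tag{4d}
\end{align}

\begin{align}
d(G^1, G^2, y) &= \sum_{y_{ia}=1} d_V(L^1_i, L^2_a) \tag{5} \\
&\quad + \sum_{y_{ia}=1} \sum_{y_{jb}=1} d_E(L^1_{ij}, L^2_{ab}) = y^T D y \tag{6}
\end{align}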

(73)

Graph classification

Input:

[Figure: input graph (nodes a, b, c, d, e, f, g) matched against a model graph (nodes 1, 2, 3, 4)]


(74)

Parametrized graph matching

Problem

Parametrized Graph Matching Problem

\[ y^* = \operatorname*{argmin}_{y} \; d(G^1, G^2, y, \beta) \tag{10a} \]

(75)

Parametrized graph matching

Problem

\[ y^* = \operatorname*{argmin}_{y} \; d(G^1, G^2, y, \beta) \tag{10a} \]

How to define $\beta$?

(76)


(77)

Parametrized distance function example

(78)

Testing graph perceptron: data

Database        size (TrS,TeS)   # classes   node labels   edge labels   |V|    |E|    max|V|   max|E|
LETTER (high)   (750,750)        15          x,y           none          4.7    4.5    9        9
GREC            (286,528)        22          x,y           Line types    11.5   12.2   25       30

(79)

Results

Database   Method                        η_TrS    η        std(η)   time      std(time)
GREC       Proposal                      0.9733   0.9488   0.1054   87.31     24.49
           NW-1-NN (β = 1)               NA       0.5235   0.0561   1588.83   870.46
           T-1-NN (Riesen and Bunke)     NA       0.9992   0.0096   1789.52   990.08
LETTER     Proposal                      0.8610   0.8262   0.1279   31.09     6.42
           NW-1-NN (β = 1)               NA       0.9735   0.0294   1584.15   510.37
           T-1-NN (Riesen and Bunke)     NA       0.9735   0.0295   1573.96   490.51

T-1-NN: Kaspar Riesen and Horst Bunke. “Approximate graph edit distance computation by means of bipartite graph matching”. In: Image Vision Comput. 27.7 (2009), pp. 950–959.

(80)

Learning rate impact


(81)

Demo

(82)

Conclusion

• Parametrized graph matching optimization problem solved with a graph-based perceptron

• Large speed-up with little accuracy loss compared to 1-NN classifier

• Simple paradigm with parameters to explore: Graph models types, stacking up neurons, …

