Deep learning onto graph space: application to image-based insect recognition
Maxime Martineau
Advised by Donatello Conte, Romain Raveaux and Gilles Venturini
Jury: Luc Brun, Pierre Héroux, Kaspar Riesen and Christel Vrain
13 November 2019
LIFAT (EA 6300)
Table of contents
1. Introduction
2. State of the art of insect image recognition
3. Convolutional neural networks and transfer learning
4. Graph neural networks
5. Experiments on insect image recognition
   Datasets
   Results
6. Conclusions and perspectives
Introduction
Why arthropod identification
• Applied entomology
• Estimation of the insect populations
• Biodiversity assessment
• Integrated pest management
Arthropod identification
• Trap-based harvest
• Human identification procedure
• Use of taxonomical keys
[Figure: example of a taxonomical key]
Wings?
  yes → Color?
    yellow → Shape?
      round → Bumblebee
      thin → Bee
    red → Ladybug
  no → Nb legs?
    6 → Cockroach
    8 → Spider
Why automation?
• Complex task
• Lots of individuals to identify (population estimation)
• Requires a large qualified workforce
Arthropod identification
How to automate the task?
• 3D images
• sound
• genomics
• 2D images
• cost effective
• available
• easy capture
Theoretical context
Image classification
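Let $x \in \mathbb{R}^{w \times h \times 3}$ be an image and $y \in \mathcal{Y}$ its class (in the class set $\mathcal{Y}$).
We are searching for the classifier function $f$ such that:
\[ f : \mathbb{R}^{w \times h \times 3} \to \mathcal{Y}, \qquad x \mapsto y \]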
Image-based insect classification
• Multi-granularity
• High intra-class variability
• Low inter-class variability
• Different settings (lab, field, …)
[Figure: taxonomic granularity levels: order, family, genus, species]
State of the art of insect image recognition
State of the art
• No quality survey
• Study of 44 papers on insect image identification
• Analysis through several axes:
• Image capture
• Feature extraction
• Classification
State of the art [1]
[Figure: overview of the surveyed approaches. Features: colour, SIFT, shape, ... Representations: MLP, BoW, sparse / stacked auto-encoders, ... Classifiers: SVM, DTree, MLP, kNN, bagging, boosting, ... Image sources: entomart, Gassoumi 2000, janzen.sas.upenn.edu]
[1] Maxime Martineau et al. “A survey on image-based insect classification”. In: Pattern Recognition 65 (2017), pp. 273–284.
Feature extraction in insect image recognition
Category | Levels | Nb approaches
Handcrafted features | Domain-dependent: wing venations, geometry | 25
 | Global and generic image features: shape, color, texture, raw pixel |
 | Local features: SIFT, others |
Mid-level features | Unsupervised representations: BoW, PCA | 17
 | Supervised representations: MLP, sparse coding |
Hierarchical representations | Auto-encoder | 1
Table 1: Feature taxonomy for insect recognition from images
Contributions
Scientific obstacle | Contribution
No deep-learning-based approach | Applying CNN to insect image recognition
Almost no structural approach | Graph-based representation and classification
No standard benchmark dataset | ImageNet-derived dataset
Research directions
• Statistical direction: CNN
• Structural direction: graph-based
• Application to insect image recognition
Convolutional neural networks and transfer learning
Convolutional Neural Networks
Model:
Image a priori | CNN characteristic | Layer
Translation invariance | Shared weights | Convolution
Compositionality | Multi-scale | Pooling
Spatial localization | Spatial localization | Convolution & Pooling
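Learning problem
\[ \min_{\Theta} \frac{1}{|X|} \sum_{(x,y) \in (X,Y)} L(f_\Theta(x), y) \]
Optimization method:
\[ \Theta_t = \Theta_{t-1} - \alpha \frac{\partial L(f_\Theta(x), y)}{\partial \Theta} \]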
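The update rule above can be sketched on a toy problem. This is only an illustration under stated assumptions, not the training code used in the thesis: the model $f_\Theta(x) = \Theta x$, the squared-error loss, the data and the step size `alpha` are all made up, and the gradient is averaged over the whole set rather than taken per sample.

```python
# Toy gradient descent: Theta_t = Theta_{t-1} - alpha * dL/dTheta.
# Assumptions (not from the slides): scalar model f(x) = theta * x,
# squared-error loss, gradient averaged over the full dataset.

def mean_loss(theta, data):
    """Average squared error of the model theta * x over the dataset."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def mean_gradient(theta, data):
    """Derivative of mean_loss with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)

def gradient_descent(data, theta0=0.0, alpha=0.05, steps=200):
    """Apply the update rule `steps` times starting from theta0."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * mean_gradient(theta, data)
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
theta = gradient_descent(data)
print(theta, mean_loss(theta, data))
```

With this step size the iterates contract towards the value that generated the data; a larger `alpha` could make the same loop diverge.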
A convolutional neural network architecture: VGG-16
Extraction: 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2
Classification: flatten → fc, 4096 → fc, 4096 → fc, 1000
• Good example of a CNN architecture close to the state of the art
• Extraction-Classification pattern
Convolutional Neural Networks
• State of the art in image classification
• Can learn complex mapping between images and classes
• Sensitive to class-cardinality balance
• Warning: needs a lot of data
 | ImageNet-1000 [2] | Target entomological set
# images | 1 000 000 | 3 000
# classes | 1000 | 30
avg # images/class | 1 000 | 100
min cardinality | 732 | 33
max cardinality | 1300 | 370
[2] Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision 115.3 (Dec. 2015), pp. 211–252. ISSN: 1573-1405. DOI: 10.1007/s11263-015-0816-y.
Proposed method
Limitation | Solution
Sensitivity to imbalance | Weighting loss
Too few data | Reduce number of parameters; transfer learning
Reduce number of parameters
 | Flattening | Average pooling [3]
I/O | $l_1: \mathbb{R}^{7 \times 7 \times 512} \to \mathbb{R}^{25088}$ | $l_2: \mathbb{R}^{7 \times 7 \times 512} \to \mathbb{R}^{512}$
Translation invariance | No | Yes (1 scalar per filter)
Next layer complexity ($n$ neurons) | $O(25088n)$ | $O(512n)$
[3] Min Lin, Qiang Chen, and Shuicheng Yan. “Network In Network”. In: CoRR abs/1312.4400 (2013).
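The comparison above can be checked on a random 7×7×512 feature map. This is a minimal sketch in plain Python, assuming a nested-list tensor layout (height, width, channel) and a circular one-column shift as a stand-in for translation; the helper names are hypothetical.

```python
import random

H, W, C = 7, 7, 512  # feature map size at the end of the VGG-16 extraction part

def flatten(fmap):
    """l1: concatenate every value, keeping spatial positions distinct."""
    return [v for row in fmap for pixel in row for v in pixel]

def global_avg_pool(fmap):
    """l2: one scalar per filter, the mean over all spatial positions."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

def shift_columns(fmap):
    """Circular shift by one column (an idealised translation)."""
    return [row[-1:] + row[:-1] for row in fmap]

def dense_weights(n_inputs, n_neurons):
    """Number of weights of a fully connected layer fed with n_inputs values."""
    return n_inputs * n_neurons

random.seed(0)
fmap = [[[random.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
flat, pooled = flatten(fmap), global_avg_pool(fmap)
shifted = shift_columns(fmap)

print(len(flat), len(pooled))                          # output sizes of l1 and l2
print(dense_weights(len(flat), 256), dense_weights(len(pooled), 256))
```

The pooled vector is unchanged (up to rounding) by the shift, while the flattened vector is reordered; real translations also move content across the border, so the invariance is only approximate in practice.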
Reduced VGG-16
VGG-16 (138M parameters): 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → flatten → fc, 4096 → fc, 4096 → fc, 1000
Reduced VGG-16 (15M parameters): 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → global avgpool → fc, 256 → fc, 1000
90% reduction
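The two parameter counts can be recomputed from the layer lists. The sketch below assumes that biases are counted and that the extraction part ends on a 7×7×512 feature map, as in the flattening comparison; variable names are illustrative.

```python
# (input channels, output channels) of the thirteen 3x3 convolutions of VGG-16.
VGG16_CONV = [(3, 64), (64, 64),
              (64, 128), (128, 128),
              (128, 256), (256, 256), (256, 256),
              (256, 512), (512, 512), (512, 512),
              (512, 512), (512, 512), (512, 512)]

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

extraction = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONV)

# Original head: flatten -> fc 4096 -> fc 4096 -> fc 1000
original_total = (extraction + fc_params(7 * 7 * 512, 4096)
                  + fc_params(4096, 4096) + fc_params(4096, 1000))
# Reduced head: global avgpool -> fc 256 -> fc 1000
reduced_total = extraction + fc_params(512, 256) + fc_params(256, 1000)

reduction = 1 - reduced_total / original_total
print(original_total, reduced_total, round(100 * reduction, 1))
```

Under these assumptions the totals come out at about 138.4M and 15.1M parameters, a reduction of roughly 89%, consistent with the rounded figures on the slide. Almost all of the saving comes from the first fully connected layer.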
Transfer learning [4]
VGG-16 (1000 classes): 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → flatten → fc, 4096 → fc, 4096 → fc, 1000
Reduced VGG-16 (n classes): 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → global avgpool → fc, 256 → fc, n
[4] S. J. Pan and Q. Yang. “A Survey on Transfer Learning”. In: IEEE Transactions on Knowledge and Data Engineering 22.10 (Oct. 2010), pp. 1345–1359. ISSN: 1041-4347.
How much do we have to learn? [5]
3x3 conv, 64 → 3x3 conv, 64 → maxpool/2 → 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2 → 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2 → global avgpool → fc, 256 → fc, 1000
[Figure: each layer of the network is marked as either Frozen or Learnt]
[5] Jason Yosinski et al. “How transferable are features in deep neural networks?” In: Advances in Neural Information Processing Systems. 2014, pp. 3320–3328.
Mitigating the imbalance problem
Weighting the loss depending on class cardinalities:
\[ \min_{\Theta} \frac{1}{|X|} \sum_{(x,y) \in (X,Y)} w_y \cdot L(f_\Theta(x), y) \]
\[ w_y = \frac{\max_{y' \in C} \mathrm{Card}(y')}{\mathrm{Card}(y)} \]
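As a worked example of the weighting formula, the sketch below computes $w_y$ for three hypothetical classes whose cardinalities reuse the minimum (33), average (100) and maximum (370) reported for the target entomological set. Class names and per-sample losses are made up.

```python
def class_weights(cardinalities):
    """w_y = (largest class cardinality) / Card(y) for every class y."""
    largest = max(cardinalities.values())
    return {label: largest / count for label, count in cardinalities.items()}

def weighted_mean_loss(losses, labels, weights):
    """(1/|X|) * sum over samples of w_y * L, given per-sample losses."""
    return sum(weights[y] * loss for loss, y in zip(losses, labels)) / len(losses)

# Hypothetical classes; only the three cardinalities come from the slides.
cardinalities = {"large": 370, "average": 100, "small": 33}
weights = class_weights(cardinalities)
print(weights)

# Two samples with the same raw loss: the rarer class contributes more.
print(weighted_mean_loss([1.0, 1.0], ["large", "average"], weights))
```

The most frequent class keeps a weight of 1 and every other class is scaled up, so an error on a rare class costs as much as several errors on a frequent one.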
Graph neural networks
Introduction to graph neural networks
Classic neural networks operate on Euclidean spaces
→ Find replacement layers and operators that work on non-Euclidean spaces
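Graph classification
Searching for $f$ such that:
\[ f : \mathcal{G} \to \mathcal{Y} \]
• Embedding
• Explicit ($\mathcal{G} \to \mathbb{R}^n$) (e.g. graph NN / graph convolution)
• Implicit ($\langle \mathcal{G}, \mathcal{G} \rangle \to \mathbb{R}$)
• Graph space
• Graph matching ($\mathcal{G} \times \mathcal{G} \to \mathbb{R}$)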
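Graph matching
[Figure: graph $G_1$ (nodes a, b, c) matched against graph $G_2$ (nodes 1, 2 and dummy nodes $\epsilon_1$, $\epsilon_2$), with $m_{a,1} = 1$, $m_{b,2} = 1$, $m_{c,\epsilon_2} = 1$]
\[ \min_m d(G_1, G_2, m) \]
\begin{align}
d(G_1, G_2, m) &= \sum_{m_{i,k}=1} d(i,k) + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} d(ij,kl) \tag{1a} \\
&= d(a,1) + d(b,2) + d(c,\epsilon_2) + d(ab,12) + d(cb,\epsilon\epsilon_2) \tag{1b}
\end{align}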
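For graphs as small as the one in the figure, the minimisation over $m$ can be done by enumeration. The following is a sketch under simplifying assumptions, not the solver used in the thesis: node labels are scalars, substitution cost is the absolute label difference, deleting a node or an edge costs 1, and only elements of $G_1$ are charged. All label values are made up.

```python
from itertools import product

EPS = None  # dummy node: matching a G1 node to EPS deletes it

def node_cost(label1, label2):
    """d(i, k): absolute label difference, or 1.0 for a deletion."""
    return 1.0 if label2 is None else abs(label1 - label2)

def edge_cost(mapped_edge, edges2):
    """d(ij, kl): 0.0 if the mapped edge exists in G2, else 1.0 (deletion)."""
    return 0.0 if mapped_edge in edges2 else 1.0

def matching_cost(m, nodes1, edges1, nodes2, edges2):
    """Equation (1a) for a given matching m (dict: G1 node -> G2 node or EPS)."""
    cost = sum(node_cost(nodes1[i], nodes2.get(k)) for i, k in m.items())
    for i, j in edges1:
        cost += edge_cost(frozenset((m[i], m[j])), edges2)
    return cost

def best_matching(nodes1, edges1, nodes2, edges2):
    """Enumerate every matching in which real G2 nodes are used at most once."""
    names1, targets = list(nodes1), list(nodes2) + [EPS]
    best = None
    for choice in product(targets, repeat=len(names1)):
        real = [k for k in choice if k is not EPS]
        if len(real) != len(set(real)):
            continue  # two G1 nodes mapped to the same G2 node
        m = dict(zip(names1, choice))
        cost = matching_cost(m, nodes1, edges1, nodes2, edges2)
        if best is None or cost < best[0]:
            best = (cost, m)
    return best

nodes1 = {"a": 1.0, "b": 2.0, "c": 5.0}   # G1 node labels (made up)
edges1 = [("a", "b"), ("c", "b")]
nodes2 = {"1": 1.1, "2": 2.1}             # G2 node labels (made up)
edges2 = {frozenset(("1", "2"))}

cost, matching = best_matching(nodes1, edges1, nodes2, edges2)
print(cost, matching)
```

With these labels the enumeration recovers the matching of the figure (a to 1, b to 2, c deleted). The search space grows exponentially with the number of nodes, which is why practical systems rely on approximate solvers such as bipartite matching.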
Graph-perceptron [7]
Perceptron [6]: $I \to \langle I, \beta \rangle \to$ Heaviside
Graph-perceptron: $G_I \to \min_m d(G_I, G_M, m, \beta) \to$ Heaviside
[6] F. Rosenblatt. “The Perceptron – a perceiving and recognizing automaton”. In: Cornell Aeronautical Laboratory (1957), Report 85-460-1.
[7] Maxime Martineau et al. “Learning error-correcting graph matching with a multiclass neural network”. In: Pattern Recognition Letters (2018). ISSN: 0167-8655.
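Parameterizing graph matching
[Figure: input graph $G_I$ (nodes a, b, c) matched against model graph $G_M$ (nodes 1, 2 and dummy nodes $\epsilon_1$, $\epsilon_2$), with $m_{a,1} = 1$, $m_{b,2} = 1$, $m_{c,\epsilon_2} = 1$]
\begin{align}
d(G_I, G_M, m, \beta) &= \sum_{m_{i,k}=1} \beta_k \, d(i,k) + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} \beta_{kl} \, d(ij,kl) \tag{2a} \\
&= \beta_1 d(a,1) + \beta_2 d(b,2) + \beta_{\epsilon_2} d(c,\epsilon_2) \tag{2b} \\
&\quad + \beta_{12} d(ab,12) + \beta_{\epsilon\epsilon_2} d(cb,\epsilon\epsilon_2) \tag{2c}
\end{align}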
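A small worked example of equations (2b) and (2c): each elementary cost of the matching is scaled by the parameter attached to the model element it involves. The cost values and the "learned" parameters below are hypothetical and only illustrate the arithmetic.

```python
def parameterized_distance(terms, beta):
    """Sum of beta_k * d(i, k) and beta_kl * d(ij, kl) over the matched pairs.

    `terms` lists (model element, elementary cost) pairs for a fixed matching.
    """
    return sum(beta[element] * cost for element, cost in terms)

# Matching of the figure: a->1, b->2, c->eps2, ab->12, cb->eps eps2.
# The elementary costs are made-up values.
terms = [("1", 0.1), ("2", 0.1), ("eps2", 1.0), ("12", 0.0), ("eps eps2", 1.0)]

unit_beta = {element: 1.0 for element, _ in terms}           # recovers equation (1b)
learned_beta = {"1": 2.0, "2": 2.0, "eps2": 0.5,             # hypothetical values
                "12": 1.0, "eps eps2": 0.5}

print(parameterized_distance(terms, unit_beta))
print(parameterized_distance(terms, learned_beta))
```

Setting every parameter to 1 gives back the unparameterized distance, so the parameters can be read as per-element importance weights on the model graph.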
Convolution on graphs [8]
Spectral definitions
• Analogy of the convolution theorem:
\[ \mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \]
• Definition of the convolution filter in the frequency domain
• Unstable across domains
Spatial definitions
• Explicit computation of the convolution
• Definition of the convolution filter in the spatial domain
• No trivial definition
[8] Michael M. Bronstein et al. “Geometric deep learning: going beyond Euclidean data”. In: CoRR abs/1611.08097 (2016). arXiv: 1611.08097.
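The convolution theorem behind the spectral definitions can be verified numerically in the simplest setting, a regular 1-D periodic signal, where the Fourier transform is the discrete Fourier transform. The sketch below uses made-up signals and plain-Python transforms; it illustrates the theorem only, not a graph convolution layer.

```python
import cmath

def dft(x):
    """Discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolution(f, g):
    """Spatial-domain definition: (f * g)[t] = sum_s f[s] * g[(t - s) mod n]."""
    n = len(f)
    return [sum(f[s] * g[(t - s) % n] for s in range(n)) for t in range(n)]

f = [1.0, 2.0, 0.0, -1.0]   # made-up signal
g = [0.5, 0.0, 0.25, 0.0]   # made-up filter

direct = circular_convolution(f, g)
via_spectrum = [v.real for v in idft([a * b for a, b in zip(dft(f), dft(g))])]
print(direct)
print(via_spectrum)
```

Both routes give the same values up to rounding. On a general graph the Fourier basis is replaced by the eigenvectors of the graph Laplacian, which differ from one graph to another.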
An example of spatial graph convolution: MoNet [9]
Degradation of graphs to Riemannian manifolds:
• Non-Euclidean
• Node neighbourhoods as compact Euclidean spaces
Computation of convolution at node e
[Figure: example graph with nodes a, b, c, d, e, f, g]
[9] Federico Monti et al. “Geometric deep learning on graphs and manifolds using mixture model CNNs”. In: CoRR abs/1611.08402 (2016). arXiv: 1611.08402.
MoNet: local operation [10]
[Figure: node neighbourhood graphlet $g_e^I$ (nodes b, c, e, f, g) mapped to the local/tangent space $N[e]$, with neighbour coordinates (0,3), (1,2), (−2,1), (−1,−1) and four Gaussian kernels $(\mu_1, \Sigma_1), \dots, (\mu_4, \Sigma_4)$]
Convolution computation at e:
\[ \begin{pmatrix} w_1(e) & w_2(e) & w_3(e) & w_4(e) \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} \]
\[ w_i(x) = \sum_{x' \in N[x]} \exp\left( -\tfrac{1}{2} (x' - \mu_i)^T \Sigma_i^{-1} (x' - \mu_i) \right) \]
Parameters: $\beta_i$ and $\mu_i$, $\Sigma_i$
• Transforms graph to Riemannian manifold
• Limited use of topology
[10] Monti et al., “Geometric deep learning on graphs and manifolds using mixture model CNNs”.
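The local operation can be sketched directly from the formula for $w_i(x)$ as written on the slide. The neighbour coordinates below are the four values shown in the figure; the kernel means, covariances and the $\beta_i$ are made-up values, and this is not the reference MoNet implementation.

```python
import math

def gaussian_weight(points, mu, sigma):
    """w_i(x): sum over neighbours x' of exp(-1/2 (x'-mu)^T Sigma^-1 (x'-mu)).

    `sigma` is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    total = 0.0
    for x, y in points:
        dx, dy = x - mu[0], y - mu[1]
        q = (dx * (inv[0][0] * dx + inv[0][1] * dy)
             + dy * (inv[1][0] * dx + inv[1][1] * dy))
        total += math.exp(-0.5 * q)
    return total

def monet_response(points, kernels, betas):
    """Row vector (w_1 .. w_K) times column vector (beta_1 .. beta_K)."""
    return sum(beta * gaussian_weight(points, mu, sigma)
               for beta, (mu, sigma) in zip(betas, kernels))

neighbours = [(0, 3), (1, 2), (-2, 1), (-1, -1)]   # coordinates from the figure
identity = ((1.0, 0.0), (0.0, 1.0))
kernels = [((0.0, 3.0), identity), ((1.0, 2.0), identity),     # hypothetical
           ((-2.0, 1.0), identity), ((-1.0, -1.0), identity)]  # mu_i, Sigma_i
betas = [0.5, -1.0, 2.0, 0.25]                                 # hypothetical beta_i

print(monet_response(neighbours, kernels, betas))
```

Only the neighbour coordinates enter the computation: once the graphlet has been mapped to the tangent space, which nodes are connected to which no longer matters, which is the limited use of topology noted above.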
Our graph convolution proposal
Convolution purely in graph space
• Using graph matching
• No degradation of the graph representation
Computation of convolution at node e
[Figure: example graph with nodes a, b, c, d, e, f, g]
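Our GCNN approach: local operation
[Figure: input graphlet $g_e^I$ (nodes b, c, e, f, g) matched against filter graph $G_F$ (nodes n, o, p, q, r), with $m_{c,p} = 1$ and $m_{e,n} = 1$]
\begin{align}
(G_I * G_F)(e) &= \max_m \sum_{m_{i,k}=1} \langle L_i, \beta_k \rangle + \sum_{m_{i,k}=1} \sum_{m_{j,l}=1} \langle L_{ij}, \beta_{kl} \rangle \tag{3a} \\
&= \langle L_e, \beta_n \rangle + \langle L_c, \beta_p \rangle + \langle L_{ec}, \beta_{np} \rangle \tag{3b}
\end{align}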
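Equation (3a) can be illustrated by brute force on a two-node graphlet. This is a sketch under assumptions that are not taken from the thesis: the matching is a full injective assignment of graphlet nodes to filter nodes, an edge term is added only when both the input edge and the filter edge exist, and all label and parameter values are made up.

```python
from itertools import permutations

def dot(u, v):
    """Inner product <u, v> of two label vectors."""
    return sum(a * b for a, b in zip(u, v))

def graph_conv_response(node_labels, edge_labels, node_betas, edge_betas):
    """max over matchings m of sum <L_i, beta_k> + sum <L_ij, beta_kl>."""
    nodes = list(node_labels)
    best_score, best_match = float("-inf"), None
    for assigned in permutations(node_betas, len(nodes)):
        m = dict(zip(nodes, assigned))
        score = sum(dot(node_labels[i], node_betas[m[i]]) for i in nodes)
        for pair, label in edge_labels.items():
            i, j = tuple(pair)
            mapped = frozenset((m[i], m[j]))
            if mapped in edge_betas:  # filter edge exists
                score += dot(label, edge_betas[mapped])
        if score > best_score:
            best_score, best_match = score, m
    return best_score, best_match

# Input graphlet restricted to nodes e and c; filter restricted to n and p.
node_labels = {"e": (1.0, 0.0), "c": (0.0, 1.0)}
edge_labels = {frozenset(("e", "c")): (1.0,)}
node_betas = {"n": (2.0, 0.0), "p": (0.0, 3.0)}
edge_betas = {frozenset(("n", "p")): (0.5,)}

score, match = graph_conv_response(node_labels, edge_labels, node_betas, edge_betas)
print(score, match)
```

The best matching here is e to n and c to p, which reproduces the three terms of equation (3b). Enumerating permutations is only viable for tiny graphlets; the thesis uses a bipartite matching solver instead.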
Graph convolution complexity
• Bipartite matching solver used
• $O(n^5)$ complexity per local operation
• Use of a simplified model (no edges): $O(n^3)$
Testing Graph Convolution
n×n conv, 32 → maxpool/2 → n×n conv, 64 → maxpool/2 → n×n conv, 128 → maxpool/2 → global avgpool → fc, n
Testing Graph Convolution
Dataset | Training set | Validation set | Testing set
MNIST-original | 48 000 | 12 000 | 10 000
MNIST-rotated [11] | 10 000 | 2 000 | 50 000
MNIST-reduced | | |
[Figure: graph representations of a digit: 1/4 grid; 75 superpixels (Region Adjacency Graph)]
[11] Hugo Larochelle et al. “An empirical evaluation of deep architectures on problems with many factors of variation”. In: Proceedings of the 24th International Conference on Machine Learning. ACM. 2007, pp. 473–480.
Results for Graph Convolution
Repr. | Dataset | CNN | MoNet | Ours
1/4 grid | reduced | 99.88 % | 99.40 % | 97.76 %
1/4 grid | mixed | 89.87 % | 88.90 % | 95.63 %
75 superpixels | reduced | | 92.70 % | 89.53 %
75 superpixels | mixed | | 92.90 % | 94.17 %
Results on MNIST-2class
• reduced: no rotation
• mixed: rotation only during testing
Experiments on insect image recognition
Experiments on insect image recognition
Datasets
IRBI dataset
[Figure: IRBI capture setup: smartphone, diffusion dome, LED, specimen]
ImageNet-arthropods
• Images in ImageNet category arthropods
• Cardinality reduction (to match IRBI avg # of images per class)
Datasets
Dataset | Nb of classes | µ(Card(c)) | σ(Card(c))
IRBI | 30 | 85 | 71
ImageNet-arthropods | 443 | 96 | 78
Experiments on insect image recognition
Results
Transfer learning parametrization
3x3 conv, 64 3x3 conv, 64 maxpool/2 3x3 conv, 128 3x3 conv, 128 maxpool/2 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 maxpool/2 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 maxpool/2 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 maxpool/2 global avgpool
fc, 256 fc, 1000
CNNs
Model | IRBI Top-1 | IRBI Top-5 | ImageNet-arthropods Top-1 | ImageNet-arthropods Top-5
SIFTBoW | 52.3 % ± 3.7 | 82.7 % ± 3.3 | 11.7 % ± 0.2 | 25.9 % ± 0.4
VGG16-frsc | 54.0 % ± 5.0 | 84.9 % ± 3.0 | 26.9 % ± 0.7 | 50.1 % ± 0.7
VGG16-fitu | 72.0 % ± 3.2 | 92.1 % ± 1.1 | 42.7 % ± 0.9 | 69.4 % ± 0.6
VGG16-fitu/w | 73.6 % ± 1.8 | 92.4 % ± 2.2 | 43.5 % ± 1.1 | 71.3 % ± 0.8
VGG16-fitu7/w | 72.4 % ± 2.8 | 92.6 % ± 2.1 | 43.3 % ± 0.6 | 71.8 % ± 0.4
Table 2: Recognition rates on 5-fold cross-validation
• SIFTBoW: SIFT descriptors classified through codebook
• VGG16: CNN approaches
• frsc: from random weights
• fitu: with transfer learning
• /w: with loss weighting
• fitu7: with transfer learning (7 last layers)
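For reference, Top-1 and Top-5 recognition rates count a sample as correct when its true class is among the 1 or 5 highest-scored classes. A minimal sketch of that computation, with made-up scores over three classes (so Top-2 stands in for Top-5):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k best-scored classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Made-up classifier outputs: one row of class scores per sample.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

In a 5-fold cross-validation this rate would be computed once per fold and then summarised by its mean and spread.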
Graph CNNs
Model | IRBI Top-1 | IRBI Top-5
SIFTBoW | 52.3 % ± 3.7 | 82.7 % ± 3.3
VGG16-frsc | 54.0 % ± 5.0 | 84.9 % ± 3.0
Table 3: Recognition rates on 5-fold cross-validation
Model | IRBI Top-1 | IRBI Top-5
Our graphcnn | 30.40 % | 71.03 %
Table 4: Recognition rates using region adjacency graphs
• Different network size
• No transfer learning
Conclusions and perspectives
Conclusions
• Thorough state of the art of insect image recognition
• Application of CNN through transfer learning
• Proposals of graph-based approaches
• Graph perceptron
• Graph Convolutional neural networks
• Publicly available dataset proposed: ImageNet-arthropods
Perspectives
• Graph Convolution optimization
• CNN pipeline improvements
• Other research directions
• Metric learning
• Zero-shot learning
Graph convolution neural network
• Graph matching solver to be changed
• Lower complexity: under $O(n^3)$
• Differentiability: $\frac{\partial L(f_W(x), y)}{\partial W}$ depends on the solver iterations
CNN pipeline improvement
• Dealing with images containing multiple individuals
• The network returns a list of locations and classes
• Existing CNN-based approaches: R-CNN, YOLO, …
• Extra labelling needed
Towards metric learning
• Unknown query image
• Model returns a list of the k nearest known images
• Potentially useful to the user
• Neural approach: siamese network
Zero-shot learning
What if an unknown class occurs?
Existing approaches:
• Using metric learning
• Using a priori knowledge (existing approaches with Graph CNN [12])
[Figure: knowledge graph linking the classes zebra, deer and okapi to attributes (legs, body, striped, brown)]
[12] Xiaolong Wang, Yufei Ye, and Abhinav Gupta. “Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs”. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2018.
Insect recognition application
Upcoming academic projects
• ANR projects
• Ecophyto ANR
• Biodiversité ANR (Challenge IA biodiv)
• New datasets from IRBI and IRSTEA
Publications
In proceedings
Martineau, Maxime et al. “Approches connexionnistes pour la reconnaissance d’images d’insectes”. In: Journées ORASIS’17. Colleville-sur-Mer, France, 2017.
Raveaux, Romain et al. “Learning Graph Matching with a Graph-Based Perceptron in a Classification Context”. In: Graph-Based Representations in Pattern Recognition - 11th IAPR-TC-15 International Workshop, GbRPR 2017, Proceedings. 2017, pp. 49–58.
Martineau, Maxime et al. “Effective Training of Convolutional Neural Networks for Insect Image Recognition”. In: International Conference on Advanced Concepts for Intelligent Vision Systems. Springer. 2018, pp. 426–437.
Journal articles
Martineau, Maxime et al. “A survey on image-based insect classification”. In: Pattern Recognition 65 (2017), pp. 273–284. ISSN: 0031-3203.
Alaei, Alireza et al. “Blind document image quality prediction based on modification of quality aware clustering method integrating a patch selection strategy”. In: Expert Systems with Applications 108 (2018), pp. 183–192. ISSN: 0957-4174.
Martineau, Maxime et al. “Learning error-correcting graph matching with a multiclass neural network”. In: Pattern Recognition Letters (2018). ISSN: 0167-8655.
Thanks
Thanks for your attention
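Graph matching
[Figure: graph with nodes a, b, c, d, e, f, g matched against graph with nodes 1, 2, 3, 4]
Let $G_1 = (N_1, E_1, L_1)$ and $G_2 = (N_2, E_2, L_2)$ be two graphs.
Problem: graph matching formulation
\begin{align}
y^* &= \operatorname*{argmin}_y \; d(G_1, G_2, y), \tag{4a} \\
\text{subject to} \quad & y \in \{0,1\}^{n_1 n_2} \tag{4b} \\
& \sum_{i=1}^{n_1} y_{i,a} = 1 \quad \forall a \in [1, \cdots, n_2] \tag{4c} \\
& \sum_{a=1}^{n_2} y_{i,a} = 1 \quad \forall i \in [1, \cdots, n_1] \tag{4d}
\end{align}
\begin{align}
d(G_1, G_2, y) &= \sum_{y_{ia}=1} d_V(L^1_i, L^2_a) \tag{5} \\
&\quad + \sum_{y_{ia}=1} \sum_{y_{jb}=1} d_E(L^1_{ij}, L^2_{ab}) = y^T D y \tag{6}
\end{align}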
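The quadratic form $y^T D y$ can be made concrete on two 2-node graphs. The layout of $D$ is an assumption (the slides do not define it): node costs $d_V$ sit on the diagonal, and each symmetric off-diagonal entry holds half of the edge cost $d_E$, so that the ordered double sum counts every pair of edges once in total. All cost values are made up.

```python
N1, N2 = 2, 2  # number of nodes in G1 and G2

def index(i, a):
    """Position of the assignment variable y_{i,a} in the flattened vector y."""
    return i * N2 + a

def build_cost_matrix(d_v, d_e):
    """Assumed layout: d_V on the diagonal, d_E / 2 on compatible off-diagonal pairs."""
    size = N1 * N2
    D = [[0.0] * size for _ in range(size)]
    for i in range(N1):
        for a in range(N2):
            D[index(i, a)][index(i, a)] = d_v[i][a]
            for j in range(N1):
                for b in range(N2):
                    if i != j and a != b:
                        D[index(i, a)][index(j, b)] = d_e / 2
    return D

def quadratic_cost(y, D):
    """y^T D y for a 0/1 assignment vector y."""
    n = len(y)
    return sum(y[p] * D[p][q] * y[q] for p in range(n) for q in range(n))

d_v = [[0.1, 0.9],   # d_V(L1_i, L2_a): made-up node substitution costs
       [0.8, 0.2]]
d_e = 0.5            # made-up cost of matching the single edge of G1 to that of G2
D = build_cost_matrix(d_v, d_e)

identity = [1, 0, 0, 1]  # node 0 -> node 0, node 1 -> node 1
swapped = [0, 1, 1, 0]   # node 0 -> node 1, node 1 -> node 0
print(quadratic_cost(identity, D), quadratic_cost(swapped, D))
```

Both vectors satisfy constraints (4c) and (4d): each node of one graph is assigned to exactly one node of the other. The minimiser over such vectors is the optimal matching $y^*$.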
Graph classification
Input: graph $G_1$ (nodes a, b, c, d, e, f, g)
Model: graph $G_2$ (nodes 1, 2, 3, 4)
The input graph is matched against the model graph with the same graph matching formulation, equations (4a) to (6).
Parametrized graph matching
Problem: parametrized graph matching
\[ y^* = \operatorname*{argmin}_y \; d(G_1, G_2, y, \beta) \tag{10a} \]
How to define $\beta$?
Parametrized distance function example
Testing graph perceptron: data
Database | size (TrS, TeS) | # classes | node labels | edge labels | |V| | |E| | max|V| | max|E|
LETTER (high) | (750, 750) | 15 | x,y | none | 4.7 | 4.5 | 9 | 9
GREC | (286, 528) | 22 | x,y | Line types | 11.5 | 12.2 | 25 | 30
Results
Dataset | Method | η TrS | η | std(η) | time | std(time)
GREC | Proposal | 0.9733 | 0.9488 | 0.1054 | 87.31 | 24.49
GREC | NW-1-NN (β = 1) | NA | 0.5235 | 0.0561 | 1588.83 | 870.46
GREC | T-1-NN (Riesen and Bunke) | NA | 0.9992 | 0.0096 | 1789.52 | 990.08
LETTER | Proposal | 0.8610 | 0.8262 | 0.1279 | 31.09 | 6.42
LETTER | NW-1-NN (β = 1) | NA | 0.9735 | 0.0294 | 1584.15 | 510.37
LETTER | T-1-NN (Riesen and Bunke) | NA | 0.9735 | 0.0295 | 1573.96 | 490.51
T-1-NN: Kaspar Riesen and Horst Bunke. “Approximate graph edit distance computation by means of bipartite graph matching”. In: Image Vision Comput. 27.7 (2009), pp. 950–959.
Learning rate impact
Demo
Conclusion
• Parametrized graph matching optimization problem solved with a graph-based perceptron
• Large speed-up with little accuracy loss compared to 1-NN classifier
• Simple paradigm with parameters to explore: graph model types, stacking up neurons, …