Effective Training of Convolutional Neural Networks for Insect Image Recognition
Maxime Martineau, Romain Raveaux, Clément Chatelain, Donatello Conte, Gilles Venturini
ACIVS
LIFAT EA 6300
Outline
1. Context
2. Theoretical context
3. State of the art?
4. Convolutional Neural Networks
5. Proposed method
6. Results
Context
Arthropod identification
Figure 1: Examples of insect images. Top: image acquired in a controlled
environment. Bottom: image acquired in a field-based environment.
Applications
• Applied entomology
• Estimation of insect populations
• Biodiversity assessment
• Integrated pest management
Why automation?
• Complex task
• Requires a large, qualified workforce
Arthropod identification
How to automate the task?
Theoretical context
Image classification
Let $x \in \mathbb{R}^{n \times m \times 3}$ be an image and $C$ a set of classes.
We are looking for a classifier function $f$ such that:
$f : \mathbb{R}^{n \times m \times 3} \to C, \quad x \mapsto f(x)$
Image-based insect classification
• High intra-class variability
• Sometimes low inter-class variability
• Multi-granularity
• Different settings (lab, field, ...)
Taxonomic levels, from coarse to fine: order, family, genus, species
State of the art?
A survey on image-based insect classification?
44 articles
Features
• Granularity
• Number of taxons
• Type of capture
• Constrained pose?
• Datasets
• Area of image
• Preprocessing
• Types of features used
• Classifier(s) used
• Accuracy
• Validation
• Cited
Clustering
State of the art
Figure: pipeline found in the surveyed articles.
• Image capture
• Feature extraction: colour, SIFT, shape, ...; MLP, BoW, sparse coding, stacked auto-encoders, ...
• Classification: SVM, decision tree, MLP, kNN, bagging, boosting, ...
Image sources: entomart; Gassoumi 2000; janzen.sas.upenn.edu
Features used
Category: levels
• Handcrafted features
  • Domain-dependent: wing venation, geometry
  • Global and generic image features: shape, color, texture, raw pixels
  • Local features: SIFT, others
• Mid-level features
  • Unsupervised representations: BoW, PCA
  • Supervised representations: MLP, sparse coding
• Hierarchical representations: auto-encoder
Conclusions
• Approaches are becoming more generic and more learning-based.
• However, none of them uses a Convolutional Neural Network.
Convolutional Neural Networks
Neural networks using convolution
Convolution
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
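The operation illustrated on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `conv2d_valid` and the toy 5x5 image and 3x3 kernel are assumptions made for the example, not values taken from the talk. It uses no padding and a stride of 1, and, like most CNN libraries, it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows), no padding, stride 1."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy binary image and kernel (hypothetical values chosen for illustration).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5x5 input and a 3x3 kernel give a 3x3 feature map. In a convolutional layer the kernel weights are learned rather than fixed by hand.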
Convolutional Neural Networks
• State of the art in image classification
• Can learn complex mappings between images and classes
• Need a lot of data
A lot of data
ImageNet:
• 1000 classes
• 15 M images
Our dataset:
• 30 classes
• 3000 images
Proposed method
Apply transfer learning efficiently to CNNs for insect image recognition.
Proposed method
Source network (ImageNet-1000):
• 3x3 conv, 64 → 3x3 conv, 64 → maxpool/2
• 3x3 conv, 128 → 3x3 conv, 128 → maxpool/2
• 3x3 conv, 256 → 3x3 conv, 256 → 3x3 conv, 256 → maxpool/2
• 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2
• 3x3 conv, 512 → 3x3 conv, 512 → 3x3 conv, 512 → maxpool/2
• flatten → fc, 4096 → fc, 4096 → fc, 1000
Target network:
• Same five convolutional blocks as the source network
• global avgpool → fc, 256 → fc, n
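To give a sense of scale for the change of classification head, the sketch below counts the weights and biases of the fully connected layers in both networks. It rests on two assumptions that the slide does not state: a 224x224 input, so that the last max-pooling layer outputs a 7x7x512 map, and n = 30 target classes, as in the dataset described earlier. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumption: 224x224 input, halved by 5 max-pooling layers -> 7x7x512.
FLATTENED = 7 * 7 * 512   # size of the vector after "flatten"
POOLED = 512              # size of the vector after global average pooling
N_CLASSES = 30            # assumption: n = 30, as in the insect dataset

# Source head: flatten -> fc 4096 -> fc 4096 -> fc 1000
source_head = (
    dense_params(FLATTENED, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)

# Target head: global avgpool -> fc 256 -> fc n
target_head = dense_params(POOLED, 256) + dense_params(256, N_CLASSES)

print(f"source head: {source_head:,} parameters")   # 123,642,856
print(f"target head: {target_head:,} parameters")   # 139,038
```

Under these assumptions, the target head has roughly 900 times fewer parameters than the source head, which is a plausible fit for a dataset of about 3000 images.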
Results
Comparative study
Model        IRBI                           ImageNet-arthropods
             Top-1          Top-5           Top-1          Top-5
SIFTBoW      52.3 % ± 3.7   82.7 % ± 3.3    11.7 % ± 0.2   25.9 % ± 0.4
VGG16-frsc   54.0 % ± 5.0   84.9 % ± 3.0    26.9 % ± 0.7   50.1 % ± 0.7
VGG16-fitu   73.6 % ± 1.8   92.4 % ± 2.2    43.5 % ± 1.1   71.3 % ± 0.8
Table 1: Recognition rates on 5-fold cross-validation
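As a reminder of what the Top-1 and Top-5 columns measure, here is a minimal sketch of top-k accuracy and of a mean and spread over folds. The scores, labels and per-fold accuracies are invented for the example and are not results from the study. Whether the ± values in Table 1 are sample standard deviations is also an assumption.

```python
from statistics import mean, stdev


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical class scores for 4 images and 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)  # 0.5 0.75

# Hypothetical per-fold accuracies from a 5-fold cross-validation.
fold_accuracies = [0.70, 0.74, 0.72, 0.76, 0.73]
fold_mean = mean(fold_accuracies)
fold_std = stdev(fold_accuracies)
print(f"{100 * fold_mean:.1f} % ± {100 * fold_std:.1f}")
```

Top-5 accuracy is the same computation with k = 5, which is why it is always at least as high as Top-1.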
How much do we have to learn?
Figure: the source (ImageNet-1000) and target networks, as on the Proposed method slide.
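One way to quantify this question is to count how many parameters are trained when only the last k convolutional blocks are fine-tuned together with the new head. The sketch below is an illustration under stated assumptions, not the protocol of the study: it assumes 3 input channels, n = 30 classes, and that layers are unfrozen block by block from the top. The names `BLOCKS`, `block_params` and `trainable_params` are hypothetical.

```python
# (input channels, output channels, number of 3x3 conv layers) per block,
# read off the architecture diagram; 3 input channels assumed (RGB).
BLOCKS = [(3, 64, 2), (64, 128, 2), (128, 256, 3), (256, 512, 3), (512, 512, 3)]


def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out


def block_params(c_in, c_out, n_layers):
    """First layer maps c_in -> c_out, the remaining ones c_out -> c_out."""
    return conv_params(c_in, c_out) + (n_layers - 1) * conv_params(c_out, c_out)


per_block = [block_params(*b) for b in BLOCKS]

# New head: global avgpool -> fc 256 -> fc n, with n = 30 assumed.
HEAD = (512 * 256 + 256) + (256 * 30 + 30)


def trainable_params(k):
    """Parameters trained when the last k conv blocks and the head are unfrozen."""
    return HEAD + sum(per_block[len(per_block) - k:])


for k in range(len(BLOCKS) + 1):
    print(f"last {k} block(s) unfrozen: {trainable_params(k):,} parameters")
```

With every convolutional block frozen, only the 139,038 head parameters are learned. Unfreezing all five blocks brings the total to 14,853,726.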