
Proposed approach and network design


In fact, the deconvolution layer carries out the inverse of the convolution layer's operation. Similarly, the unpooling layer rebuilds what the pooling layer sub-sampled. These definitions explain the symmetric arrangement of the layers.

Figure 3.20: Max unpooling process

The working process of the deconvolution layer remains the same as that of the convolution layer; the unpooling layer, however, requires some additional bookkeeping. When picking the highest of the 4 pixel values during pooling, whenever we intend to perform unpooling later, we store the locations of those max values. The unpooling layer that receives the low-dimensional data then places each value back at its stored max location and restores the data size by filling the rest with zeros, as summarized in figure 3.20.
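To make the mechanism concrete, here is a minimal sketch of pooling with stored max locations followed by unpooling. It assumes PyTorch, which the thesis does not prescribe; MaxPool2d with return_indices=True and MaxUnpool2d are simply one way to realize the process of figure 3.20.

```python
import torch
import torch.nn as nn

# Pooling that also returns the locations of the maxima, so the
# unpooling layer can later put the values back in those positions.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)         # toy 4x4 single-channel input
pooled, indices = pool(x)           # 2x2 maxima + their stored locations
restored = unpool(pooled, indices)  # 4x4 again: maxima in place, zeros elsewhere
```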

3.4 Solution design

The solution that we suggest for the painter recognition task follows the end-to-end learning concept. Its structure starts with the training set, which consists of digital images of artworks. These samples are introduced sequentially, one by one, to the first block, which is the trainable feature extractor.

The latter learns the most significant and discriminant features from the input images. These features become the input of a trainable classifier that finally attributes the right painter to the introduced artwork.

The whole process is summarized in the flow chart illustrated in figure 3.21.

Figure 3.21: Solution main components

3.4.1 Convolutional Auto-Encoders

3.4.1.1 Trainable features extractor

Generally, as we explained in section 1.4.1, the features in typical pattern recognition solutions are designed to be manually extracted from the input training samples and then used to help the model learn how to distinguish the members of each output class.

However, in deep architectures, the input samples not only make the classifier learn; they also make the feature extraction block learn what features to extract. As mentioned in section 2.3.1, in painter classification we have a small amount of training data, which is why we use CAE networks in this block.

In our feature extraction network, the input samples are volumes of size 3×256×256. They first go through a padding layer that adds two lines of null pixels at the input borders. Next, we apply a convolution layer that convolves the image with 100 filters of size 5×5, obtaining 100 output feature maps of size 256×256. The next layer is the pooling one: it receives the 100×256×256 volume and outputs a 100×128×128 volume. We then apply zero padding again and feed the volume to the second convolution layer, which convolves its input with 200 filters of size 5×5 and outputs 200 feature maps.

The 200×128×128 volume then gets sub-sampled to 200×64×64 by the second pooling layer.
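For reference, the encoder half described above can be sketched as follows, again assuming PyTorch; setting padding=2 plays the role of the explicit zero-padding layers, and the layer names are ours:

```python
import torch.nn as nn

# Encoder half of the CAE (a sketch; the original implementation may differ).
# padding=2 replaces the explicit zero-padding layers of the text, and
# return_indices=True stores the max locations needed later for unpooling.
conv1 = nn.Conv2d(3, 100, kernel_size=5, padding=2)    # 3x256x256   -> 100x256x256
pool1 = nn.MaxPool2d(2, return_indices=True)           # 100x256x256 -> 100x128x128
conv2 = nn.Conv2d(100, 200, kernel_size=5, padding=2)  # 100x128x128 -> 200x128x128
pool2 = nn.MaxPool2d(2, return_indices=True)           # 200x128x128 -> 200x64x64
```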

We come now to the second part of the network, which is symmetric to the first one. It starts with an unpooling layer that returns the volume to 200×128×128. We then apply a deconvolution layer that takes this volume and convolves it with 100 filters. Afterwards we add a padding layer, followed by an unpooling layer that returns the volume to 100×256×256.

The final convolution layer takes the 100×256×256 input volume and returns an output that should be similar to the input image. The whole process of our solution's feature extractor block (except the padding layers) is illustrated in figure 3.22.

Figure 3.22: CAE designed layers
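Continuing the sketch, the decoder half mirrors the encoder. Modeling each deconvolution as a same-size 5×5 convolution is our assumption, since the text only specifies the filter counts and sizes:

```python
# Decoder half mirroring the encoder sketch above. The stored pooling
# indices restore the max locations; the "deconvolutions" are modeled
# here as same-size 5x5 convolutions (an implementation assumption).
unpool2 = nn.MaxUnpool2d(2)                              # 200x64x64   -> 200x128x128
deconv2 = nn.Conv2d(200, 100, kernel_size=5, padding=2)  # 200x128x128 -> 100x128x128
unpool1 = nn.MaxUnpool2d(2)                              # 100x128x128 -> 100x256x256
deconv1 = nn.Conv2d(100, 3, kernel_size=5, padding=2)    # 100x256x256 -> 3x256x256

def reconstruct(x):
    h, idx1 = pool1(conv1(x))        # encode, keeping max locations
    h, idx2 = pool2(conv2(h))
    h = deconv2(unpool2(h, idx2))    # decode in reverse order
    return deconv1(unpool1(h, idx1)) # should approximate the input image
```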

We can notice that the information going through the network is three-dimensional. The convolution and deconvolution layers change the first dimension, which corresponds to the depth of the volume (the number of feature maps), while the pooling and unpooling layers act on the two last, spatial, dimensions.

3.4.1.2 Trainable classifier

We can understand from the previous section that the first network's output is by no means a classification result; what matters are the middle layers, where features are learned by encoding and decoding the input samples.

Having trained this network, we can now move to the next step and carry out the supervised classification task.

In our solution, we decided to take advantage of CNNs: on the one hand they are powerful, and on the other hand they are well suited to being combined with a CAE feature extractor.

Figure 3.23: CNN designed layers

In order to build our CNN and at the same time link it to the first block, we remove the decoder part (deconvolutions and unpoolings) from the CAE and add two fully connected layers of sizes 400 and 200. Finally, we stack on top of them the softmax layer explained in section 2.2.1.2, with N output classes. N here corresponds to the number of painters who painted the input samples, so it depends on the training data set used. The full process can be found in figure 3.23.
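A minimal sketch of the resulting classifier, reusing the encoder dimensions; the flattening step, the placeholder value of N and the absence of activation functions between the fully connected layers are our assumptions, not specified in the text:

```python
import torch.nn as nn

N = 10  # number of painters; depends on the training data set (placeholder)

# Encoder (decoder removed) followed by two fully connected layers
# of sizes 400 and 200, then a softmax over the N painters.
classifier = nn.Sequential(
    nn.Conv2d(3, 100, kernel_size=5, padding=2),    # 3x256x256 -> 100x256x256
    nn.MaxPool2d(2),                                # -> 100x128x128
    nn.Conv2d(100, 200, kernel_size=5, padding=2),  # -> 200x128x128
    nn.MaxPool2d(2),                                # -> 200x64x64
    nn.Flatten(),                                   # -> 200*64*64 features
    nn.Linear(200 * 64 * 64, 400),
    nn.Linear(400, 200),
    nn.Linear(200, N),
    nn.Softmax(dim=1),                              # painter probabilities
)
```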

3.5 Conclusion

In this chapter, we detailed the system architecture with respect to the thematic-related ground-truth limits. In the next chapter, we will focus on the computational issues of deep neural network training and expose the way in which we deployed, trained and tested the conceived solution. We will finally give a set of experimental results, along with short interpretations.
