
Ministère de l'Enseignement Supérieur et de la Recherche Scientifique
Université Tunis El Manar
Université Paris Descartes
École Nationale d'Ingénieurs de Tunis

École Doctorale STI

MÉMOIRE

Presented for the award of the

Research Master's Degree (Diplôme de Mastère de Recherche)

Speciality: Traitement de l'Information et Complexité du Vivant

Prepared by:

Nadine BEN ALAYA

Deep Learning For Painters Recognition

Defended on 02/02/2018 before the examination committee composed of:

President: M. Nicolas Loménie
Reviewer: M. Michel Soto
Supervisor: M. Walid Ayadi
Co-supervisor: Mme. Alice Porebski

Research laboratory:

Laboratoire d'Informatique Signal et Image de la Côte d'Opale


It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers ...

At some stage therefore, we should have to expect the machines to take control.

Alan Turing


To my father Farhat, the super-hero of my life, whose great sacrifices are absolutely what led me to be what I am,

to my mother Amina, for the great, infinite support, tenderness and care that escorted me all my life long,

to Nidhal, my discovery of the world and the person who has unconsciously inspired my future decisions,

to Tawfik, my marvellous life-long companion, whose contribution to my life and the beauty of whose presence by my side no words might describe,

to Nour, Myriam, Hasna, Sabrine and Rim, the pretty little girls who are enlightening my life and making it full of joy and happiness,

Nadine Ben Alaya, January, 2018


Preface

This master's research thesis is the follow-up of a six-month graduation internship within LISIC (Laboratoire d'Informatique Signal et Image de la Côte d'Opale), which belongs to the Université du Littoral Côte d'Opale in France.

During the end-of-studies internship, we developed a deep-learning-based architecture that recognizes the makers of artworks. The chosen architectures were stacked auto-encoders and convolutional auto-encoders.

The work done during this master's thesis consisted in verifying whether spatio-frequency data can be an input to a deep architecture. We also analyzed the evolution of the classification accuracy rate when introducing several types of wavelet-transformed images as input to convolutional auto-encoders.


Acknowledgements

There are many persons who contributed, directly or not, to the accomplishment of this project and to whom I am extremely thankful. I would thus like to take this opportunity to express my gratitude to them.

First, I would like to thank Mr. Denis HAMAD, Head of the LISIC ImAP team, for giving me this precious opportunity to accomplish my graduation project at LISIC laboratory.

My deepest appreciation goes to my supervisor Mrs. Alice POREBSKI for supporting me and providing me with assistance in the different phases of the project. I want to thank her for her wise advice and for always sharing mature, deep and interesting conversations with me. Mrs. POREBSKI always believed in me, trusted my choices and appreciated the way I work, and that was very encouraging for me.

I would also like to express my sincere gratitude to my supervisor Mr. Nicolas VANDERBROUCK for his careful observations and precious advice.

I would like to thank my school supervisor Mr. Walid AYADI for regularly following up with me, for his supervision, and for the great, interesting and innovative suggestions he gave throughout the project.

Furthermore, I would also like to acknowledge with much appreciation the EILCO team for their warm welcome, especially Mr. Olivier CALIN, who provided me with much help.

I am also really grateful to my beloved parents, my brother and my friends for their love, encouragement and support.

Finally, I want to express my thanks to the jury members.


Abstract

Recognizing the painters of artworks is considered an important computer vision application for several reasons. In fact, paintings are defined as the private property of their painters, and any forgery or theft is treated as a punishable crime. This is what spurred the prosperity of two main domains: artwork authentication and artist identification.

In this work we focus on the painter recognition problem. Instead of following the majority of methods and relying on image-processing-based techniques, we present an innovative, complete identification chain based on deep learning. This approach combines two deep architectures: auto-encoders and convolutional neural networks. Hence, contrary to what painter recognizers used to do, we act on the raw pixel data, with neither handcrafted feature extraction nor data pre-processing.

We first train a convolutional auto-encoder on the challenging painter data set called Painting-91, and later use this network to initialize a supervised convolutional classification network.

Keywords: Deep learning, Convolutional Neural Networks, Auto-Encoders, Convolutional Auto-Encoders, Tensors, Torch.

Résumé

Recognizing the artists behind artworks is considered an application of real importance in the field of computer vision, for several reasons. Indeed, painted canvases are defined as the private property of the corresponding painter, and any attempt at theft or forgery is classified as a crime. This is what has contributed to the prosperity of two domains: artwork authentication and artist identification.

In this work, we concentrate on the painter recognition problem. Instead of following the majority of methods, which are based on the application of image processing techniques, we present a complete and innovative identification chain based on deep learning. This approach combines the deep architectures known as auto-encoders and convolutional neural networks. Thus, contrary to what is generally done on this topic, we start from raw pixel data, applying neither pre-processing nor manual feature extraction.

We therefore first train a deep convolutional auto-encoder, which is subsequently used to initialize a convolutional network suited to supervised classification. In the remainder of the work, we evaluate the introduction of spatio-frequency information as raw data to the deep neural network. To do so, besides the original images of the training data set, we additionally introduce their wavelet transforms, while keeping the deep architecture used previously.

Mots clés : Deep Learning, Convolutional Neural Networks, Auto-Encoders, Convolutional Auto-Encoders, Tensors, Torch.


Contents

General introduction

1 Painters Identification: State Of The Art
    1.1 Introduction
    1.2 Image Classification for Painter Recognition
        1.2.1 Artists Identification through history
        1.2.2 Features extraction
        1.2.3 Trainable classifiers
    1.3 Images transformed to frequency domain
        1.3.1 Fourier transform
        1.3.2 Gabor transform
        1.3.3 Wavelet transform
    1.4 Literature review
    1.5 Conclusion

2 Deep Learning overview
    2.1 Introduction
    2.2 Neural Network Model
        2.2.1 Biological Insight
        2.2.2 Formal Artificial Neuron
        2.2.3 Artificial Neural Networks
            2.2.3.1 Feed-Forward Networks
            2.2.3.2 Activation Function
    2.3 Deep Neural Networks
        2.3.1 Deep Learning Basis
        2.3.2 Deep Architectures types
        2.3.3 Networks with 3D volumes of neurons
        2.3.4 Deep Unsupervised Neural Networks
            2.3.4.1 Auto-encoders
            2.3.4.2 Restricted Boltzmann Machines
        2.3.5 Training Neural Networks
            2.3.5.1 Back propagation
            2.3.5.2 Gradient computing and Loss Function
    2.4 Conclusion

3 Proposed approach and network design
    3.1 Introduction
    3.2 Preliminaries
        3.2.1 Convolutional Neural Networks
            3.2.1.1 Biological similarity and basic principle
            3.2.1.2 Architectures and common layers
            3.2.1.3 Over-fitting problem
        3.2.2 Stacked auto-encoders
    3.3 Theoretical proposed approach
        3.3.1 Deep architecture for small amounts of data
        3.3.2 Convolutional Auto-encoders
    3.4 Solution design
        3.4.1 Convolutional Auto-Encoders
            3.4.1.1 Trainable features extractor
            3.4.1.2 Trainable classifier
    3.5 Conclusion

4 System implementation and experimental results
    4.1 Introduction
    4.2 Working environment
        4.2.1 Deep Learning and computational environment
            4.2.1.1 CPU vs GPU for DNN training
            4.2.1.2 Employed hardware properties
        4.2.2 Software environment
            4.2.2.1 Matlab
            4.2.2.2 CUDA
            4.2.2.3 Torch library and iTorch notebook
        4.2.3 Virtual Environment
            4.2.3.1 Amazon Elastic Compute
            4.2.3.2 EC2 instance setting
    4.3 Building data set
        4.3.1 Data structures: From Vectors and Matrices to Tensors
        4.3.2 Employed Training Set
        4.3.3 Wavelets from features to raw data
    4.4 Experimental results
        4.4.1 Classify with auto-encoder
        4.4.2 Classify with CAE and CNN
            4.4.2.1 Configuration and testing protocol
            4.4.2.2 Classification results
        4.4.3 Wavelets as input to CAE architecture
            4.4.3.1 Configuration and testing protocol
            4.4.3.2 Classification results
    4.5 Conclusion

Conclusion and future directions

Bibliography

List of Figures

1 EILCO - École D'ingénieurs du Littoral Côte d'Opale
2 Louvre Lens Museum
3 U2S laboratory logo
1.1 Classification of paintings based on their painters
1.2 Image classification process
1.3 Histogram of Oriented Gradient example [6]
1.4 Input image (left) processed by LBP (right) [7]
1.5 Data representation in input space and in the features space [8]
1.6 Images with different edges contained with their FT transform below [11]
1.7 Images with different edges contained with their FT transform below
1.8 Output of the implementation of Gabor filters with different orientations [12]
1.9 Different Wavelet shapes [14]
1.10 Building the Feature Vector based on the Wavelet transform [4]
2.1 Structure of a biologic neuron
2.2 Formal Artificial Neuron Model
2.3 (a) Feedforward Neural Network, (b) Recurrent Neural Network
2.4 Feed-forward Network with a set of hidden layers
2.5 Sigmoid Function Curve
2.6 Hyperbolic Tangent Curve
2.7 Rectified Linear Unit activation functions
2.8 (1) Typical pattern recognition process. (2) Deep learning based recognition process
2.9 Deep architectures types
2.10 3D volumes of neurons among the network
2.11 Neurons connection in convolutional neural networks
2.12 Auto-encoder structure
2.13 Restricted Boltzmann Machine structure
2.14 Stacked RBM structure and training process
2.15 Back-propagation algorithm in a flow chart
2.16 Curve of Loss with respect to the weights
3.1 Low, Mid and High level features
3.2 Typical CNN architecture
3.3 Inputs and outputs of a first convolution layer
3.4 The output neuron of the convolution of one filter with one receptive field
3.5 Basic convolution filter representation
3.6 Convolution when receptive field and filter are similar
3.7 Convolution when receptive field and filter are different
3.8 Low level features
3.9 Six convolution filters output
3.10 Inputs and outputs of a mid placed convolution layer
3.11 (a) Mid level features. (b) High level features
3.12 Pooling operation model
3.13 Fully connected set of layers attached to the convolution block
3.14 Softmax layer architecture
3.15 Well tuned vs Over-fitted model
3.16 Dropout layer structure
3.17 Stacked auto-encoder structure and training scheme
3.18 Pooling/Unpooling, Convolution/Deconvolution layers
3.19 Convolutional auto-encoder typical network
3.20 Max unpooling process
3.21 Solution main components
3.22 CAE designed layers
3.23 CNN designed layers
4.1 Evolution of error rate with respect to number of GPU entries
4.2 Comparison between GPU and CPU in terms of DNN training duration
4.3 Torch main packages
4.4 Choosing the appropriate AMI
4.5 Choosing the appropriate GPU instance type
4.6 Raising the volume size
4.7 SSH protocol setting
4.8 Configuring the instance security group
4.9 Setting the key pairs for the instance
4.10 AWS EC2 instance details interface
4.11 Example of Tensors with different sizes
4.12 Example of a wavelet-transformed painting image that will be among the networks' training data
4.13 Matlab graphical interface for training monitoring
4.14 Test case 1: Resulting Convolution Matrix
4.15 Test case 2: Resulting Convolution Matrix
4.16 Test case 3: Resulting Convolution Matrix
4.17 Test case 4: Resulting Convolution Matrix
4.18 Test case 5: Resulting Convolution Matrix
4.19 Test case 6: Resulting Convolution Matrix
4.20 Test case 7: Resulting Convolution Matrix
4.21 Painters with particular painting style
4.22 Painters with common painting styles

List of Tables

3.1 Biologic Neuron vs Artificial Neuron
4.1 Test Cases with Stacked Auto-encoders
4.2 Test Cases with Haar, Daubechies and Meyer wavelet transforms
4.3 Test Cases with several degrees of Haar wavelets

List of Abbreviations

AE: Auto-Encoder
AI: Artificial Intelligence
ANN: Artificial Neural Network
CAE: Convolutional Auto-Encoder
CNN: Convolutional Neural Network
CPU: Central Processing Unit
DL: Deep Learning
DNN: Deep Neural Network
EC2: Elastic Compute Cloud
FC: Fully Connected
GPU: Graphical Processing Unit
L: Loss function
ML: Machine Learning
MLP: Multi-Layer Perceptron
MSE: Mean Square Error
NN: Neural Network
RBM: Restricted Boltzmann Machine
ReLU: Rectified Linear Unit
RNN: Recurrent Neural Network
SGD: Stochastic Gradient Descent
TanH: Hyperbolic Tangent
W: Neuron Weights


General introduction

Context and problem statement

As image acquisition techniques have advanced over the previous decade, well-known museums have started to collect large digital libraries of their collections. Besides, the cross-disciplinary synergy between image analysis researchers and art historians has attained a high level. In fact, technology developers have become able to focus on image analysis tasks that support, on the one hand, art historians' missions, like painting analysis, and on the other hand, critical points of image acquisition such as storage, database search, etc. [1].

In particular, the problem of painter identification seems mature enough to be considered an interesting field for applying state-of-the-art image processing techniques. Usually, in order to attribute a painting to an artist, experts use current knowledge of the artist's frequent practices, in addition to a highly meticulous comparison of technical data that may be acquired through several advanced techniques. Experts may also rely on a visual assessment of the way artists draw their brushstrokes.

That is what led experts to trust mathematical image analysis in the identification process.

Although it is an interesting and important task, painter recognition remained uncompetitive, in terms of error rates, with other identification tasks such as digit or object recognition, owing to the task's complexity. However, advances in computer vision and artificial intelligence started to accelerate rapidly: manually extracted features became more and more powerful, as did the numerous supervised classification algorithms.

With the recent spread of deep learning, painter recognition moved to a new, more advanced level. Deep neural networks started as a biologically inspired model and have now become an integral pillar of the mathematics and machine learning framework. DNNs are relevant to regression and classification as supervised tasks, to dimensionality reduction and representation learning as unsupervised tasks, and can even be interpreted from a probabilistic perspective.

This project, proposed by the Louvre-Lens museum, belongs exactly to this context. Basically, the initial main theme is to treat the case of the Le Nain brothers. These are three painters, called Antoine, Louis and Mathieu Le Nain, who painted together on the same canvases for a particular period of time. The museum aims to distinguish who did exactly what in each of their artworks. The complexity of this theme resides in the fact that the brothers did not fix specific regions for each of them before starting: they meaningfully painted together on all shapes and details. We can even find regions that contain superposed brushstrokes attributable to different painters.

The first objective in dealing with this problem is hence to design a deep-learning-based solution that does not need a huge amount of data to be well tuned. That is to say, a solution that works on typical painting identification but which is apt to be used for the Le Nain brothers' problem.

Project framework

LISIC Laboratory

Created on January 1, 2010, LISIC resulted from the merger of two laboratories of the Université du Littoral Côte d'Opale: LIL (Littoral Computer Science Laboratory) and LASL (Laboratory for the Analysis of Coastal Systems). It includes 40 professors and 16 doctoral students who develop research activities in the field of information, computer science and technology.

It is composed of 4 research teams: the Multi-modeling and Software Evolution team, the Optimization Simulation Evolutionary Modelisation team, the Information Perception and Fusion Systems team, and the Images and Learning team, called ImAP, which is responsible for the internship proposal. The ImAP activities are related to the fields of image analysis (classification) and image synthesis (simulation of lighting). In addition, the team develops common activities around classification and learning, applied to images of both natural and artificial origin.

Figure 1: EILCO - École D’ingénieurs du Littoral Côte d’Opale


Louvre Lens Museum

Louvre-Lens is a public administrative institution for cultural cooperation. It was founded by the Lens-Liévin agglomeration community, the Pas-de-Calais department and the Nord-Pas-de-Calais regional council, and inaugurated on December 4th, 2012. This "second Louvre" is located in Lens, in Pas-de-Calais, and is directed by Marie Lavandier. It is an autonomous institution, linked to the Louvre museum in Paris by a scientific and cultural convention. The museum is built on the site of the former No. 9 pit of the Lens mines. The new building, under the supervision of the Nord-Pas-de-Calais regional council, hosts semi-permanent exhibitions representative of all the collections of the Louvre Museum, which are regularly renewed. It also hosts temporary exhibitions at the national or international level. The museum should be served by the Artois-Gohelle tramway in the 2020s; in the meantime, shuttles connect it to the nearby stations.

Figure 2: Louvre Lens Museum

Signal and Systems Unity - U2S

The U2S laboratory, created in 2003, is a research structure that belongs to the National Engineering School of Tunis (ENIT). It brings together about fifty researchers (5 professors, 15 assistants and assistant professors, and more than 20 PhD students). Senior and junior U2S researchers have 4 types of balanced profiles across telecommunications, electrical engineering, statistics and computer science. What brings U2S researchers together is their ability to use and design signal, systems and statistics tools for R&D applications.

U2S essentially develops tools for the analysis and design of signal processing and system control algorithms that are non-stationary, multidimensional and nonlinear. It is also beginning to acquire dual expertise in Information Processing and Complexity of Life (TICV), supported by the TICV Master's program that it manages as a joint degree with the Master's degree in Mathematics and Computer Science of Paris Descartes University. The work explained in this report was done as a Master's thesis within this program.

Figure 3: U2S laboratory logo

Thesis overview

In this report, we detail the main theories and technologies that took part in the elaboration of the solution. We start, in the first chapter, with a state of the art that limits the scope of our study; in this chapter, we give a brief literature review covering most of the works done in the field of painter identification. In the next chapter, we introduce an overview that encloses the basics behind artificial neural networks and deep neural architectures. In the third chapter, we detail the configuration of the networks we propose to adopt, in addition to the necessary theoretical background that helps make them well understood. We finish by describing the computational resources used for running and testing the networks, in addition to the experiments conducted in this work.


Chapter 1

Painters Identification: State Of The Art

Sommaire

1.1 Introduction
1.2 Image Classification for Painter Recognition
    1.2.1 Artists Identification through history
    1.2.2 Features extraction
    1.2.3 Trainable classifiers
1.3 Images transformed to frequency domain
    1.3.1 Fourier transform
    1.3.2 Gabor transform
    1.3.3 Wavelet transform
1.4 Literature review
1.5 Conclusion


1.1 Introduction

This chapter presents a state of the art of image classification, specifically digitized-painting classification, and of the painter identification task. First, a brief overview of painting recognition methods through history is given. Next, the typical image-processing-based classification model is analyzed, with a review of some well-known tools related to its several components. We finish with a review of some of the related works done in the field of artwork classification.

1.2 Image Classification for Painter Recognition

1.2.1 Artists Identification through history

Painting authentication, the task of determining whether or not a given artwork was painted by a specific painter, was originally developed to cope with art forgery, which has been an active business for thousands of years. Traditionally, image authentication techniques for forgery detection relied principally on the discerning abilities of experts to deduce the authenticity of a painting or of an artist's well-known work. Over time, these manually performed authentication techniques have been greatly enhanced by exploiting characteristics beyond those visible to the human eye, and by using new technologies to upgrade the traditional methods, such as spectrometry, chemical analysis, X-ray and infrared imaging [2].

Figure 1.1: Classification of paintings based on their painters

In recent years, with the rapid advancement in the digital acquisition, editing and production of paintings and artworks, automated painting analysis and painter recognition have become important tasks not only for forgery detection, but also for archiving and retrieving artworks. With vast digital collections available, especially on the internet and in libraries and museums, painter recognition can also be very useful for providing crucial information such as authorship. Actually, this artist-based classification makes it possible to create indexes for retrieving and organizing painting collections, to identify unknown paintings, and to gain new insights into the artistic style of given artists from their works [3].

With the availability of recent high-resolution digital technology, the existing capabilities of art analysis and authorship detection are being further enhanced by new statistical and image processing techniques, by which an artist's style can be described using mathematical tools applied to high-resolution digitized versions of paintings and artworks. We show in figure 1.1 examples of digitized paintings classified into 5 classes based on their painters.

Figure 1.2: Image classification process

In fact, knowing that calligraphy and signatures have been used for decades as a singular mark of an individual, it is evident that every person has his or her own particular way of moving the hand while painting or writing. Therefore, every painter can normally be identified from his own way of striking the painting board with the brush, leaving personal patterns that can be detected by applying computer vision and pattern recognition techniques to high-resolution images of paintings [4].

In this context, the painter recognition task can be presented as an image classification problem: deciding which artist painted a given painting based on the analysis of a set of hidden descriptors. Given as input a set of painting images by various artists (with multiple paintings per painter), the purpose is to automatically extract and analyze these descriptors in order to associate a given painting with the corresponding artist.

Usually, this classification process is carried out through a typical set of stages, as shown in figure 1.2.

(23)

1.2.2 Features extraction

In general, a digitized version of a painting can be represented as an RGB image that contains a large quantity of visual data with various complex relationships, called image features, such as statistical descriptors or spatial features. In order to discover these hidden relationships in a large amount of image data, many feature extraction techniques can be applied as a fundamental step of any content-based classification problem [5].

In fact, this makes it possible to reduce the high dimensionality of visual data to low-dimensional representations that can be easily manipulated for image understanding. As a result, each painting image is represented as a feature vector whose dimension equals the number of chosen features, allowing a transition from the image space to the feature space. Features extracted from the image may be either global or local.

Global features usually describe the considered image as a whole in order to generalize the entire object. They include shape descriptors, contour representations, as well as texture features. Some examples of global features worth mentioning here are Invariant Moments, Shape Matrices and the Histogram of Oriented Gradients (HOG) (shown in figure 1.3).

Figure 1.3: Histogram of Oriented Gradient example [6]

On the other hand, local descriptors focus on image patches that are considered key points in the image. Scale Invariant Feature Transform (SIFT), Local Binary Patterns (LBP, shown in figure 1.4), Speeded-Up Robust Features (SURF), Maximally Stable Extremal Regions (MSER) and Fast REtinA Keypoint (FREAK) are some examples of local features.


Figure 1.4: Input image (left) processed by LBP (right) [7]

In general, for low-level applications such as classification and object detection, it is more suitable to take advantage of global features, whereas for higher-level applications such as object recognition, local features are more commonly used. In some image processing tasks, combining global and local features can improve recognition accuracy, yet this may come at the cost of computational overhead.
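As a concrete illustration of combining a global and a local descriptor, here is a minimal Python sketch using scikit-image's `hog` and `local_binary_pattern` functions; the image array and all parameter values are illustrative placeholders, not settings taken from this work:

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

# Placeholder for a grayscale painting patch loaded elsewhere.
rng = np.random.default_rng(0)
image = (rng.random((128, 128)) * 255).astype(np.uint8)

# Global descriptor: Histogram of Oriented Gradients over the whole patch.
hog_vec = hog(image, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2))

# Local descriptor: Local Binary Patterns, summarised as a histogram.
# With P=8 "uniform" patterns, the codes fall in the range [0, 9].
lbp = local_binary_pattern(image, P=8, R=1.0, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

# One feature vector per image, as described in section 1.2.2.
feature_vector = np.concatenate([hog_vec, lbp_hist])
print(feature_vector.shape)
```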

1.2.3 Trainable classifiers

Once the feature vector has been extracted, the image processing flowchart reaches the classification step. Generally, the objective of a classification protocol is to assign an input pattern to a particular class with reference to the corresponding feature vector. The extracted vector maps the input data into a new representation space, called the feature space, where the elements of the data set become more easily separable, as shown in figure 1.5.

The computer vision literature presents a multitude of classifiers employed to solve the pattern classification problem. The latter's complexity depends basically on how irregular the feature values are among patterns of the same class, compared to the differences between feature values of patterns belonging to different classes. Hence, the accuracy obtained with a specific classifier depends meaningfully on the employed data set, and reaching the best possible performance on a specific pattern recognition task does not depend solely on finding the best performing single classifier.

Practically, there are several cases where no classifier, used individually, can reach an acceptable classification accuracy level. In such cases it is often better to combine the results of a set of classifiers in order to reach the most accurate decision. That is to say, since each classifier has its own way of operating well on a given set of input feature vectors, under appropriate assumptions, combining a variety of classifiers can lead to better generalization performance than any single trainable classifier.

Figure 1.5: Data representation in input space and in the features space [8]

Due to their variety, computer vision tasks such as object detection, pattern recognition and identification are no longer restricted to testing a single approach on the studied case and application. Instead, different approaches are increasingly compared, each of which may combine a multitude of previously employed methods. In this context, among the well-known supervised and unsupervised classifiers in the image-related literature, we can mention the Bayesian classifier [9], Decision Trees [9], Parzen Windows [9], k-Nearest Neighbors [9], Maximum Likelihood classification [10], Support Vector Machines [10] and the family of neural networks such as the Multi-Layer Perceptron and Recurrent or Feed-Forward Neural Networks [9].
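To make the classifier-combination idea concrete, the following hedged sketch compares two single classifiers against a simple majority-vote combination with scikit-learn; the data here are synthetic stand-ins, whereas in our setting X would hold one feature vector per painting and y the painter labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for painting feature vectors and painter labels.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
svm = SVC(kernel="rbf", probability=True, random_state=0)
# Soft voting averages the per-class probabilities of both classifiers.
combined = VotingClassifier([("knn", knn), ("svm", svm)], voting="soft")

for name, clf in [("kNN", knn), ("SVM", svm), ("combined", combined)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean())
```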

1.3 Images transformed to frequency domain

Digitized images, like signals, are unstructured data types heavily rich with hidden information. That is why, in general, image processing tasks are relatively complicated, and working on such cases requires deep work in several directions, such as descriptor extraction and classifiers, as already explained in the previous sections. The representation space remains an interesting area to explore as well.

Images are originally given in the spatial domain, where pixels are referenced by their locations and intensities. Over the years, researchers have shown that moving from the spatial domain to the frequency domain is meaningfully useful in image understanding tasks. In this context, a multitude of well-performing transforms have been explored, yet the Fourier transform, Gabor filters and wavelets remain the most known and commonly used.

1.3.1 Fourier transform

The Fourier transform is a valuable tool in the image processing domain. It is theoretically based on decomposing the image of interest into its sine and cosine components. This transformation results in a new depiction of the initial input in the frequency domain (also called the Fourier domain) corresponding to the initial spatial domain. Each point of an image in the Fourier domain refers to a specific frequency of pixel intensity that appears somewhere in the spatial domain image.

Figure 1.6: Images with different edges contained with their FT transform below [11]

Actually, experiments in favor of the Fourier transform proved that it is usually convenient to characterize several image processing operations by how they behave toward the frequencies appearing in the image. From a conceptual rather than theoretical point of view, the Fourier transform tells us what is happening in the original image representation (where shapes, colours and edges appear) in terms of frequencies. Associating what happens in the image with pixel intensity behaviour comes down to the fact that what we see in the original image depends on the values taken by the data's frequencies as well as on their distribution.

For example, whenever we need to blur an image, we just have to eliminate high frequencies. Conversely, when we eliminate low frequencies, we get closer to the edges contained in the image. Finally, if we enhance high frequencies while maintaining low frequencies, we are sharpening the image.
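A minimal NumPy sketch of these frequency-domain manipulations (the image array and the cut-off radius of 20 are placeholders chosen purely for illustration):

```python
import numpy as np

# Placeholder grayscale image; in practice a digitized painting patch.
rng = np.random.default_rng(0)
image = rng.random((256, 256))

# 2-D Fourier transform, shifted so low frequencies sit at the centre.
spectrum = np.fft.fftshift(np.fft.fft2(image))

h, w = image.shape
Y, X = np.ogrid[:h, :w]
dist2 = (Y - h // 2) ** 2 + (X - w // 2) ** 2
low_pass = dist2 <= 20 ** 2   # boolean mask keeping only low frequencies

# Keeping low frequencies blurs; keeping high frequencies isolates edges.
blurred = np.fft.ifft2(np.fft.ifftshift(spectrum * low_pass)).real
edges = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_pass)).real
```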

Figure 1.6 shows the Fourier transform of two images containing different varieties of edges. The obvious periodic texture in the vertical direction of the brick image (top left) explains the closely spaced horizontal components contained in the corresponding FT (bottom left). Similarly, the FT associated with the lit cubes image shows bright lines extending toward high frequencies, aligned perpendicularly to the relevant edges contained in the spatial domain image.

Hence, whenever we are in an area of the image with a strong-contrast sharp edge, the gray intensities change very quickly, which requires plenty of high-frequency power to follow such an edge. That is what justifies the existence of those bright lines in the corresponding magnitude spectrum. The Fourier transform is a well-performing image processing tool with a wide range of applications, among which we can list image filtering, image analysis, image reconstruction and especially image compression.

However, this does not deny that it has a non-negligible limit, namely the lack of information it gives about the location of intensity values. In other words, the Fourier transform does not give us any idea of where a behaviour of a particular frequency took place in the spatial domain. That is to say, it is important to know that our input image contains edges, for example, but it is also very useful to understand where exactly those edges appear in the image. Gabor filters are one of the methods that address this issue.

1.3.2 Gabor transform

Gabor filters, also called Gaussian filters, shown in figure 1.7, belong to the class of linear filters, with the particularity of being oriented. They make it possible to highlight textures as well as homogeneous zones of an image.

Thanks to the Gaussian form of the Gabor filter, the envelopes of the filtered images carry local spectral information at each pixel. Furthermore, they provide information on the energy content of the image in the direction of the chosen filter. Figure 1.8 shows the output obtained when applying Gabor filters in different directions to an input face image.


Figure 1.7: Images with different edges contained with their FT transform below.

Gabor filters usefully assign frequency information to locations; however, the size of the windows used in this method, being inflexible and non-adaptive to the frequency variations of the analyzed signals, remains a weak point of this transform. It is solved by wavelets, the subject of the next section.
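A hedged sketch of an oriented Gabor filter bank using OpenCV's `getGaborKernel`; the image is a synthetic placeholder, and the kernel size, sigma, wavelength (`lambd`) and aspect ratio (`gamma`) are illustrative values only:

```python
import cv2
import numpy as np

# Placeholder grayscale image; in practice a digitized painting patch.
image = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

# Bank of Gabor kernels at four orientations.
responses = []
for theta in np.arange(0, np.pi, np.pi / 4):
    kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0.0)
    responses.append(cv2.filter2D(image, cv2.CV_32F, kernel))

# Energy of each oriented response, the local spectral cue described above.
energies = [float(np.mean(r ** 2)) for r in responses]
print(energies)
```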

Figure 1.8: Output of the implementation of Gabor filters with different ori- entations [12]

1.3.3 Wavelet transform

The formalism of the 1D M-band wavelet transform was developed for continuous signals. This transform is characterized by a scale function and M-1 wavelet functions, each of which can be translated with respect to a real location parameter and dilated/contracted by a strictly positive scale factor [13]. Thanks to its local treatment of the data and its flexible wavelets in terms of dilation/contraction and translation, this transform somewhat overcomes the limits of the Fourier and Gabor transforms mentioned in the two previous sections.

Figure 1.9: Different Wavelet shapes [14]

Figure 1.9 shows different shapes of wavelets that might be used in the domain transform operation. The wavelet transform is widely used in the feature extraction component of classification systems. It is employed in several applications to build a consistent feature vector (figure 1.10) that later serves for the classifier's training.
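A small PyWavelets sketch of the 2-D discrete wavelet transform we will later feed to the networks; the image array is a placeholder, and the Haar wavelet is just one of the shapes shown in figure 1.9:

```python
import numpy as np
import pywt

# Placeholder for a digitized painting converted to grayscale.
image = np.random.random((256, 256))

# Single-level 2-D discrete wavelet transform with the Haar wavelet:
# one approximation band and three detail bands
# (horizontal, vertical, diagonal).
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# Classical wavelet-based texture features: the energy of each band,
# concatenated into a feature vector as in figure 1.10.
features = np.array([np.sum(b ** 2) for b in (cA, cH, cV, cD)])
print(features)
```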

Figure 1.10: Building the Feature Vector based on the Wavelet transform [4]

In this section, concerning domain transforms and their methods, going from the Fourier to the wavelet transform, we contented ourselves with a short overview free of mathematical details. In fact, we will not need to get into the detailed theories; nevertheless, we should understand the basic definitions of the frequency-domain data that we will be dealing with in the next chapters.


1.4 Literature review

Several works have dealt with analyzing paintings and visual arts based on many types of features. Among the feature extraction techniques found in the literature we can cite curvelets [15], wavelets [16], craquelure and contourlet transforms [17], which are basically used to detect small differences in brushstrokes and painting degradation that can describe a painter's particular style for identification purposes. Other global features such as LBP [18], GIST [19], PHOG [20] and SIFT [21] can also be used in the same context.

For the classification process, many algorithms have been applied to art analysis and painter recognition. For instance, we can find the Support Vector Machine (SVM) [4], Nearest Neighbor (NN), Genetic-Algorithm-based Weighted Nearest Neighbor (GA-based NN) [22], the Naive Bayes classifier [23] and Artificial Neural Networks (ANN) [24].

The majority of the used methods have focused on image processing and manually extracted features, used with domain-specific knowledge and tailored to only some specific datasets (infrared reflectograms [25], ink paintings, etc.).

Therefore, following the great success achieved by the family of neural networks called deep neural networks, researchers in the painter identification domain became more and more involved and aimed to employ the strength of those architectures to identify the authors of artworks.

Looking at things globally, the application of DNNs (an overview of those architectures is the subject of the next chapter) to the painter recognition task seems at odds with the requirements of deep architectures. We can even notice that in several works, researchers resorted to boosting the algorithms with typical image processing, particularly manually extracted features, in order to enhance the resulting accuracy.

Therefore, we focus in the present report on a generic approach that does not rely on any domain knowledge. We thus propose to study the use of deep neural networks, or deep learning technology, for painting authentication and painter recognition, operating at the raw pixel level without incorporating any pre-processing or manual feature extraction tasks. That is to say, we deal with the data-architecture mismatch that exists between digitized paintings as training data and deep neural networks as a classification tool.

After achieving an acceptable classification rate with deep nets, we explore a new approach that consists in making deep learning deal with a new kind of input data: the frequency-domain information obtained from the digitized paintings' wavelet transforms.

1.5 Conclusion

In this chapter, we gave a brief literature review of painter classification, as well as a short study of the different components of a typical image-processing-based recognition system. Frequency-domain transforms took a considerable part of this theoretical study, since we will take advantage of such information as extra input data to the classification system designed in this work.

Obviously, most of the proposed painting recognition approaches are based on image processing tools; yet, for many reasons, we have chosen to adopt a deep architecture and adapt it to perform the artwork authorship identification well.

Deep neural networks remain the basic theory behind this project; hence, in the next chapter, we will sweep through their basic theory and enumerate their different architectures [26].


Chapter 2

Deep Learning overview

Sommaire

2.1 Introduction
2.2 Neural Network Model
    2.2.1 Biological Insight
    2.2.2 Formal Artificial Neuron
    2.2.3 Artificial Neural Networks
        2.2.3.1 Feed-Forward Networks
        2.2.3.2 Activation Function
2.3 Deep Neural Networks
    2.3.1 Deep Learning Basis
    2.3.2 Deep Architectures types
    2.3.3 Networks with 3D volumes of neurons
    2.3.4 Deep Unsupervised Neural Networks
        2.3.4.1 Auto-encoders
        2.3.4.2 Restricted Boltzmann Machines
    2.3.5 Training Neural Networks
        2.3.5.1 Back propagation
        2.3.5.2 Gradient computing and Loss Function
2.4 Conclusion


2.1 Introduction

In this chapter, we explore the deep neural networks domain with a reasonable amount of detail. In order to ensure a pedagogical and clear understanding of these very powerful classifiers, it is necessary to break the task down and start the overview with the most elementary unit of their architectures, namely the formal artificial neuron, before moving on to a synthetic overview of how networks are built.

2.2 Neural Network Model

Deep learning, also called hierarchical learning or end-to-end learning, is part of the family of machine learning algorithms. It is based on a biologically inspired model, artificial neural networks, and is leading image classification and object/speech recognition tasks to be performed in a way closer to human perception, thus bringing the 'Artificial Intelligence (AI)' field nearer to what it claims to be.

2.2.1 Biological Insight

Biological systems are able to perform computational tasks of high complexity in a way that lets them endure in their surroundings. These complex behaviours are handled by a nervous system composed of neurons, or nerve cells (see figure 2.1). Among the hallmarks of such systems, the scalability they achieve from just a few hundred neurons is the most noteworthy one.

Figure 2.1: Structure of a biologic neuron

More specifically, it seems that neurons may be combined in such a way that an increase in the number of neurons contributes to a boost in cognitive capabilities [27]. Although the complex way in which a large number of neurons can behave together is not well understood, the equations ruling the excitability of a single neuron are well established, as in, for instance, the Hodgkin-Huxley model [28].

A neuron gathers, at the level of its dendrites, a lot of information, carried by what neuroscience calls neurotransmitters. The latter are considered excitatory inputs to which the neuron reacts as follows: its membrane potential escalates progressively, and whenever the membrane voltage reaches a particular threshold, an action potential is launched and propagated along the axon until the post-synaptic neuron receives it.

2.2.2 Formal Artificial Neuron

The artificial neuron, also called the perceptron, is a very simple processing unit modeled by a mathematical function. It has a limited, predefined number of inputs, each of them linked to the neuron by a weighted connection, as illustrated in figure 2.2.

Figure 2.2: Formal Artificial Neuron Model

The input neurons can be viewed as a vector $X = (x_1, x_2, \ldots, x_n)$, where $n$ is the dimension and $x_i$ corresponds to the activation of the $i$-th input neuron. The activation value $y$ of the perceptron can be computed from the link weights $w_i$ and the input activations $x_i$ according to formula (2.1):

$$y = \Phi\left(b + \sum_i w_i x_i\right) \quad (2.1)$$

Similarly to biological neurons, where the dendrites take as input electrical signals conferred by the axons of other input neurons, in the artificial neuron those signals are modeled as numerical values. Between the axons and the dendrites, those electrical signals get modulated by various amounts; likewise, in the perceptron, each input value gets multiplied by the weight of its link.

Firing an output signal only when the overall strength of the neuron's input signals exceeds a definite threshold is modeled in the perceptron by computing the weighted sum of the inputs. This sum represents the overall strength of the input signals, to which a step function is applied in order to define the perceptron's output value and feed it into the next neurons.
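A minimal sketch of this computation in Python (the input values, weights and bias are arbitrary illustrative numbers):

```python
import numpy as np

def perceptron(x: np.ndarray, w: np.ndarray, b: float) -> int:
    # Weighted sum of the inputs plus bias, as in equation (2.1).
    z = b + np.dot(w, x)
    # Step activation: fire (1) only when the total input strength
    # exceeds the threshold, here folded into the bias term.
    return 1 if z > 0 else 0

x = np.array([0.5, -1.0, 2.0])   # input neuron activations
w = np.array([0.8, 0.2, -0.4])   # connection weights
print(perceptron(x, w, b=0.1))   # 0.1 + 0.4 - 0.2 - 0.8 = -0.5 -> 0
```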

Several relevant learning approaches exist to train a single neuron, and most of the supervised ones are built on the basis of refining the weights in a way that decreases the difference between the target and the obtained output. We can take the examples of the Widrow-Hoff [29], Perceptron [30] and Gradient Descent [31][32] learning rules.

2.2.3 Artificial Neural Networks

As mentioned before, the single unitary artificial neuron is a very simple processing unit, and it is obvious that this model cannot handle more than very simple computational tasks.

However, given that biological neurons combined together can increase performance on cognitive tasks, the idea arose to arrange these unitary modules together in a neural network and make them able to perform operations of higher complexity.

These arranged neurons are actually able to calculate the activations of a batch of output neurons once the input neurons' activations are computed. This calculation frequently involves a number of intermediary computational tasks that are performed by a set of hidden neurons arranged in what we call hidden layers.

2.2.3.1 Feed-Forward Networks

In theory, nothing forbids artificial neurons from being organized in a fully random way; yet, in practice, they are generally arranged in a graph that has to be acyclic. This property ensures that a neuron's input will not depend, directly or indirectly, on its own output. Networks established with a topology that respects this condition are called feed-forward artificial neural networks (figure 2.3 (a)). This name is given to these architectures because the activation spreads straight forward through the network.


The restriction mentioned above does not mean that neuron graphs containing cyclic connections are not considered neural networks; they actually belong to another family of networks called recurrent neural networks, illustrated in figure 2.3 (b). These models are known for their performance, especially when used for modeling dynamic systems; yet, at the same time, those cyclic links between neurons take the training of the network to a higher complexity level and raise extra computational challenges.

Figure 2.3: (a) Feedforward Neural Network, (b) Recurrent Neural Network

Recurrent neural networks are out of the scope of this study, since they are not relevant to the requirements of our problem; we will rather be focusing on feed-forward nets, in which the perceptrons are arranged in layers.

Feed-forward networks are also called multilayer perceptrons, in reference to the arrangement of the units in layers. The vector $x$ described before refers here to the first, input layer $l_0$. Subsequent neurons are grouped in the middle layers, and they have the particularity of exclusively receiving inputs from the previous layer of perceptrons. Such an architecture allows the activations of the neurons to be computed in a layer-wise, feed-forward manner.

The parameters labeling the links between neurons and serving the computation of a layer's activation are stored together to form the weights matrix, commonly named W. Each network contains at least two layers: the input and the output layers. Any extra layer other than these two is called a hidden layer, since it has no connection outside the network, and it makes the network part of the multi-layered networks category. We illustrate in figure 2.4 an example of a feed-forward network with a set of hidden layers.


Figure 2.4: Feed-froward Network with a set of hidden layers

We should also mention that each neuron in one layer is linked to every neuron in the next layer, while perceptrons belonging to the same layer never have connections between them.

2.2.3.2 Activation Function

For the Multi-Layer Perceptron (MLP), considering two subsequent layers y and x, the matrix notation of the computation rule is given by equation (2.2):

$$y = f(Mx + b) \quad (2.2)$$

where $f$ is what we call an activation function, a very crucial component of artificial neural network models. Basically, it is the agent responsible for learning and extracting meaning from highly complex, non-linear functions.

The role of this activation function is to bring non-linearity to the neural network, because we always try to conceive powerful ANNs in a way that enables them to represent and learn any arbitrary and perplexing function that computes outputs given the inputs. If we do not use this activation function, with its non-linear particularity, the neural network turns into a simple linear regression model, which is much easier to solve but weakly powerful in extracting sense from difficult data.
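A layer-wise forward pass following equation (2.2), sketched in NumPy; the layer sizes and random weights are illustrative, whereas a trained network would obtain W and b from the learning procedures discussed later:

```python
import numpy as np

def layer_forward(x, W, b, f):
    # One feed-forward step: each output neuron sees every input
    # neuron (full connectivity), cf. equation (2.2).
    return f(W @ x + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input layer (4 neurons)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # hidden layer (5 neurons)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # output layer (3 neurons)

h = layer_forward(x, W1, b1, sigmoid)   # hidden activations
y = layer_forward(h, W2, b2, sigmoid)   # network output, shape (3,)
print(y)
```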

Recently, with the technological advancement we have known in terms of computing capabilities and infrastructure robustness, neural networks are back in the scope of study of many groups of researchers, which has made them evolve in several respects. One of these is the activation function, for which several common choices exist, each relevant to a particular problem.

Sigmoid Activation Function:

The Sigmoid activation function is given by:

$$\phi(z) = \frac{1}{1 + e^{-z}} \quad (2.3)$$

The sigmoid maps the input value onto a [0, 1] range, as shown by its curve in figure 2.5. It was a popular, simple and easy-to-handle activation function, but it has shown several problems and limits that made it fall out of wide practice. We can take the example of the vanishing gradient problem (the problem faced by the majority of gradient-based training methods when combined with certain activation functions).

Figure 2.5: Sigmoid Function Curve

The sigmoid also has the problem of not being zero-centered, which makes the optimization task harder. Finally, the sigmoid function is not appreciated because it saturates and converges slowly.

Hyperbolic Tangent Activation Function - TanH:

TanH is given by the mathematical formula:

$$\phi(z) = \frac{e^{2z} - 1}{e^{2z} + 1} \quad (2.4)$$

It was proposed to remedy some of the sigmoid's limits, and it does so by being zero-centered, with a range enlarged to [-1, 1] as shown in figure 2.6. This makes optimization easier, but the vanishing gradient problem still persists.

Figure 2.6: Hyperbolic Tangent Curve

Rectified Linear Unit Activation Function:

The Rectified Linear Unit was recently introduced to the list of activation functions, later becoming the most popular and widely used one. It is appreciated for its simplicity; it is given by:

$$\phi(z) = \max(0, z) \quad (2.5)$$

It is the only proposed activation function that solves the vanishing gradient problem, but its use is restricted: it can only be introduced within the network's hidden layers. It is however possible to add a Softmax layer for classification tasks, or a linear layer for regression ones, as the last output layer of the network.

There are three variants of the ReLU function, a variety that results from coping with some negative particularities of the gradient computation: ReLU [33], Leaky ReLU, and the Maxout function or Randomized Leaky ReLU [34]. Their three respective curves are shown in figure 2.7.
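The activation functions above, written out in NumPy (the leaky slope of 0.01 is a common illustrative choice, not a value prescribed by this work):

```python
import numpy as np

def sigmoid(z):                  # equation (2.3): output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                     # equation (2.4): zero-centred, in (-1, 1)
    return np.tanh(z)

def relu(z):                     # equation (2.5)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):   # keeps a small gradient for z < 0
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-3.0, 3.0, 7)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(z), 3))
```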


Figure 2.7: Rectified Linear Unit activation functions

2.3 Deep Neural Networks

2.3.1 Deep Learning Basis

Deep learning is part of the hierarchical learning field, where models try to learn hierarchical representations of complex data. With each of their layers representing a particular level of the hierarchy, neural networks, explained in the previous section, are the most suitable architecture for deep learning modeling.

Figure 2.8: (1) Typical pattern recognition process. (2) Deep learning based recognition process

Most pattern recognition solutions contain a block for feature extraction and another for classification, which uses the extracted features and returns the results we are looking for, as shown in figure 2.8 (1).

The idea behind deep learning is to remove all the handcrafted tasks that appear in typical pattern recognition models based on image processing tools for feature extraction, and to force the model to perform them automatically, as shown in figure 2.8 (2).

Hence, applying deep learning yields a kind of end-to-end learning method, in which we replace the feature extraction block that used to be manually engineered with a trainable algorithm able to extract the features best suited to the data and to the nature of the classification task.

2.3.2 Deep Architectures types

Deep neural networks have a deep multilayer architecture, and there are three types of them: feed-forward, feed-back and bi-directional networks, illustrated respectively in figure 2.9. The feed-forward model is the most used network for solving the majority of recognition tasks.

Figure 2.9: Deep architectures types

The second type of architecture is an upside-down generative model called a feed-back network. In this standard, the information is spread in the opposite direction compared to feed-forward deep networks; this way, the model aims to calculate the input of the model given its output. This kind of network is extremely exhaustive, but it is quite appreciated for problems treated with Bayesian methods, since it is possible to perform this inverted computation using Bayesian probability. As a learning method, it is not commonly used in practice, but it is still interesting conceptually.

The third type of deep network architecture is a model that combines the two preceding models. It is an unsupervised learning model that involves bidirectional connections between its blocks. At the beginning of the 2000s, a group of researchers established a distant working environment in order to relaunch the work on machine learning applied to computer vision algorithms. As a first step, they focused on unsupervised methods in order to solve the training problem of very deep networks.

Actually, it is possible to pre-train these networks in an unsupervised way, in order to start from a refined state compared to a random initialization, and thereafter to focus on fine-tuning the model through typical supervised methods.

Considerable effort was dedicated to this point, involving many techniques, of which the best known and most practiced are Restricted Boltzmann Machines and stacked sparse auto-encoders. This model has been used with good success on several problems, and it has been shown that applying this learning method to relatively small databases can improve performance in a meaningful way [35] [36].

2.3.3 Networks with 3D volumes of neurons

The typical neural networks detailed in the previous sections are designed to receive single vectors as inputs and then transform them through a succession of hidden layers. The major shortcoming of this architecture is its vulnerability to scaling up when full-resolution images are used. Take for example the well-known object classification dataset CIFAR-10.

Figure 2.10: 3D volumes of neurons within the network


Its images are 32 pixels high, 32 pixels wide and have 3 color channels. This means that a single perceptron belonging to the first hidden layer of a typical fully-connected NN is a neuron that has 32 × 32 × 3 = 3072 parameters to learn.

This number of weights per neuron may still look convenient and manageable, yet it is obvious that this structure cannot scale up to high-resolution images. For example, with a set of images of a more realistic size such as 256 × 256 × 3, the input vector forces each neuron to hold more than 190,000 weights; furthermore, the nature of the network would generally require a multitude of such neurons.

Hence, due to this full connectivity, the number of parameters per neuron may grow in an uncontrollable manner, and the model will quickly face the over-fitting problem.
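These parameter counts can be verified with a back-of-the-envelope Python sketch (the figures are those discussed above, nothing more):

def weights_per_neuron(height, width, depth):
    # A fully-connected neuron receives one weight per input value.
    return height * width * depth

print(weights_per_neuron(32, 32, 3))    # 3072 weights for a CIFAR-10 image
print(weights_per_neuron(256, 256, 3))  # 196608 weights for a 256x256 image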

Convolutional Neural Networks are among the most powerful and recognized feed-forward networks in the deep nets family. They belong to the artificial NNs whose architecture enables them to overcome the scaling limit previously faced with fully-connected structures. They take advantage of the inputs being images and force the network to behave in a more judicious way.

Figure 2.11: Neurons connection in convolutional neural networks

Neurons that belong to ConvNet layers, contrary to those that shape regular network layers, are arranged in what we call 3D volumes of neurons, as shown in figure 2.10. The three dimensions refer to the Height, Width and Depth.

This kind of network also differs in terms of neuron connections. In the convolutional topology, each neuron belonging to a layer of the network is only linked to a particular, restricted set of perceptrons from the previous layer, instead of being connected to all the neurons as is the case in fully-connected architectures, as shown in figure 2.11.

As we can see in the third layer of the network in figure 2.10, the CNN organizes its perceptrons in three-dimensional volumes and, while moving from one layer to the next, transforms them into output volumes of different sizes.
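As an illustration of this local connectivity and of the volume-to-volume transformation, here is a minimal PyTorch sketch; the layer sizes are arbitrary choices made for the example and do not correspond to a specific published network:

import torch
import torch.nn as nn

# One convolutional layer: each output neuron is connected only to a
# 3x3 local region across the input depth, not to every input value.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)  # input volume: depth 3, height 32, width 32
y = conv(x)                    # output volume: depth 16, height 32, width 32
print(x.shape, y.shape)

# Each of the 16 filters holds 3*3*3 weights plus one bias: 448 parameters
# in total, far fewer than the 3072 weights of one fully-connected neuron.
print(sum(p.numel() for p in conv.parameters()))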

2.3.4 Deep Unsupervised Neural Networks

The neural networks on which we have focused so far are generally relevant to supervised learning tasks. Yet, when we introduced the bi-directional deep architectures as one of the deep neural network structures in the previous sections, we gave a small glimpse of their use for building an unsupervised learning version of deep NNs.

Hence, if we are to cover the basics of deep learning, we have to understand how its algorithms learn representations from complex data in an unsupervised manner.

2.3.4.1 Auto-encoders

Neural networks can be set up to perform unsupervised learning in different manners. For example, they can perform compression, dimensionality reduction or the learning of new sparse representations of the data. Those tasks may be realized with the three-layered, symmetric neural network called an auto-encoder [37].

Basically, the main task of an auto-encoder is to reproduce its input data almost identically. It is trained to learn a hidden representation of the input, called "the code", which is the part responsible for the upcoming input reconstruction.

As we can see in figure 2.12, the auto-encoder is, as first introduced, a symmetric neural network that contains three layers. The first one is the input layer and, in terms of input reconstruction, it is called the encoder.


It maps the input vector into a latent representation of lower dimension by applying the deterministic function given in formula (2.6). That representation is modeled by the middle hidden layer. The output layer, also called the decoder, uses that code and regenerates the input through the reverse mapping given by formula (2.7).

h = f_θ(x) = f(Wx + b)    (2.6)

y = f′_θ′(h) = f′(W′h + b′)    (2.7)

Auto-encoders are trained in an unsupervised way, so the datasets used for learning are logically sets of samples with no labels or classes, such as D = {x1, x2, ..., xN}. Yet, in practice, the training sets used for auto-encoders are supervised datasets with (sample, label) couples respecting the restriction that each target is identically the input itself. That supervised training set is modeled by D = {(x1, x1), (x2, x2), ..., (xN, xN)} and it perfectly summarizes the idea behind the auto-encoder.
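A small NumPy sketch of formulas (2.6) and (2.7) may help fix the notation; the layer sizes, the choice of a sigmoid for f and f′ and the random initialization are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
n_input, n_code = 8, 3  # input and code dimensions (illustrative)

W, b = rng.normal(size=(n_code, n_input)), np.zeros(n_code)     # encoder
W2, b2 = rng.normal(size=(n_input, n_code)), np.zeros(n_input)  # decoder

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # h = f(Wx + b): maps the input to a lower-dimensional code (2.6).
    return sigmoid(W @ x + b)

def decode(h):
    # y = f'(W'h + b'): reconstructs the input from the code (2.7).
    return sigmoid(W2 @ h + b2)

x = rng.uniform(size=n_input)
y = decode(encode(x))
print(np.mean((x - y) ** 2))  # reconstruction error before any training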

Figure 2.12: Auto-encoder structure

2.3.4.2 Restricted Boltzmann Machines

A Restricted Boltzmann Machine (RBM) [38] is a stochastic, generative ANN that can learn a probability distribution over its inputs. RBMs have found their fields of application in dimensionality reduction, collaborative filtering, classification, topic modeling and feature learning. They have the advantage of being trainable in an unsupervised manner when the nature of the task requires it.

The neurons of an RBM must constitute a bipartite graph, that is to say two groups of neurons with symmetric links between the two groups but no connections within a group, as shown in figure 2.13. The two sets of neurons are called the hidden and visible layers of the Boltzmann machine.

To train a Restricted Boltzmann Machine, we use an algorithm called the contrastive divergence algorithm [35].
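The sketch below shows one step of contrastive divergence (CD-1) in NumPy under simplifying assumptions, binary units, biases omitted and a single training sample, to give the flavor of the update; it is not a complete RBM trainer:

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0):
    # Positive phase: hidden probabilities given the data, then a sample.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.uniform(size=n_hidden) < h0_prob).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)
    # CD-1: difference between positive and negative correlations.
    return lr * (np.outer(v0, h0_prob) - np.outer(v1_prob, h1_prob))

v = rng.integers(0, 2, size=n_visible).astype(float)
W += cd1_update(v)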

Figure 2.13: Restricted Boltzmann Machine structure

Similarly to auto-encoders, Restricted Boltzmann Machines can be stacked on top of each other [39], as shown in figure 2.14, to gain more network depth.

In particular, stacked RBMs are part of what we call deep belief networks, where we can optionally use SGD-based back-propagation to fine-tune the entire network.

Figure 2.14: Stacked RBM structure and training process

2.3.5 Training Neural Networks

We have explained in the previous sections the biological basis behind neural networks. We have also detailed the architectures and the mathematical theory of some of their models, such as AEs and CNNs. To summarize, neural networks are simply sets of neurons provided with parameters, and the arrangement of these parameters determines the network type.

Finally, each conceived network has a particular task to fulfill, such as object classification, digit recognition or handwritten text recognition, and to do so the model has to undergo training.

Training a neural network consists in adjusting the neurons' weights in a way that enables the model to return as many right answers as possible at test time. In the literature there are several methods to train a NN, but the most widely used one is called back-propagation.

2.3.5.1 Back propagation

Since we started talking about DL, we have mentioned that it recognizes things in a way similar to how human beings do. Thus, to understand how neural network learning works, we should link it to the steps taken in the human learning process.

When we are first born, we have absolutely no idea about the scenes we see and we cannot carry out any object or face recognition. Later, our parents or teachers "label" the things and the persons we see, and thus we progressively learn to recognize them.

In terms of neural networks, learning means adjusting the values of the weights associated with the neurons, thus making them able to recognize specific features of each object class or of a particular person. The fresh state of our brain when we cannot recognize anything, a baby's case, is modeled in the neural network by the random initial state of the weights.

The initial state being defined, the back-propagation process [40] can now start. It takes place in four successive steps: the forward pass, the loss function computation, the backward pass and the neurons' weights update. The first step (forward pass) consists in forwarding the input training image through the whole network.

The weights being randomly initialized, the output vector is a kind of fair repartition of the probabilities among the output classes. It gives preference to none of them, and such an output is expected, since we know that a good decision about the input's nature depends entirely on the configuration of the weights; arbitrarily fixed parameters can never lead to a meaningful output vector. Here comes the second step of back-propagation, but before getting into its details we should understand what we mean by a loss function.

A loss function is a measure that tells us about the performance of our regression/classification model. It computes how far the model's result is from the ground truth: whenever the computed value of this function is high, we can consider that the model is making many mistakes and is far from where it is supposed to be, whereas whenever we get small values when computing the loss function, we can say that our model performs the classification or regression task well.

Figure 2.15: Back-propagation algorithm in a flow chart

One of the training objectives is to drive this function's values towards zero. There are many different types of loss functions, but in order to continue with the back-propagation flow we will consider the best-known one, named Mean Squared Error.
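As a toy illustration of how this function behaves, consider a 4-class output vector compared with a one-hot target; the numbers below are made up for the example:

import numpy as np

def mse(prediction, target):
    # Mean Squared Error: average squared gap between output and label.
    return np.mean((prediction - target) ** 2)

target = np.array([0.0, 1.0, 0.0, 0.0])                # one-hot label
print(mse(np.full(4, 0.25), target))                   # untrained, near-uniform output: 0.1875
print(mse(np.array([0.05, 0.9, 0.03, 0.02]), target))  # trained output: close to 0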

It is obvious that the first computation of the loss function will return a very high value, for the reasons mentioned before. Taking advantage of having introduced the loss function, we can now define the back-propagation objective as follows: all we need to do is make the model produce the minimum value of L. In this terminology, the backward pass works to identify the weights that contributed most to distorting the results.

Finally, the last step, the weights update, adjusts those detected parameters in a way that brings the output closer to the target, or what we call the label in the learning jargon.

These four steps form what we call an epoch and, in order to perform back-propagation that returns a well-adjusted model, we generally reiterate these epochs a particular number of times, commonly fixed in advance by the programmer. The whole process described in this section is summarized in the flow chart in figure 2.15.
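The four steps of an epoch can be condensed into a few lines of Python. The sketch below trains a single linear neuron with full-batch gradient descent on synthetic data; it only illustrates the epoch structure and is not the exact procedure used for deep networks:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))             # toy training inputs
t = X @ np.array([1.0, -2.0, 0.5, 3.0])  # toy targets (labels)
w = rng.normal(size=4)                   # random initial weights

learning_rate, n_epochs = 0.05, 100
for epoch in range(n_epochs):
    y = X @ w                            # 1. forward pass
    loss = np.mean((y - t) ** 2)         # 2. loss computation (MSE)
    grad = 2.0 * X.T @ (y - t) / len(X)  # 3. backward pass: dL/dw
    w -= learning_rate * grad            # 4. weights update

print(loss)  # close to zero once the weights are adjusted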


2.3.5.2 Computing the Gradient of the Loss Function

Until now, we could see back-propagation as an algorithm that enables neural networks to learn good representations of the data and, in the end, to perform the classification/regression task well. Besides, back-propagation has behind it a solid mathematical background that justifies its suitability for neural network training. This learning algorithm is based on Stochastic Gradient Descent (SGD).

Figure 2.16: Curve of Loss with respect to the weights

We now know enough about training neural networks to easily understand that the loss, or literally the precision of the network's output, depends on its weights. Besides, whenever we want to track the evolution of this loss and handle it, we have to model this link mathematically by introducing the derivative of L with respect to the weights. That entity, ∂L/∂W, is called the gradient.

Computing this gradient will not only lead us to find the lowest point of the error/weight curve shown in figure 2.16, and thus achieve the training goal, but it is also part of the weights update. In fact, after finding the weights that are disrupting the results, the formula we use to update them involves these values. For a parameter ω_i and a learning rate α, it is given as follows:

ω_i = ω_i − α ∂L(ω)/∂ω_i    (2.8)
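Formula (2.8) can be checked numerically: the sketch below estimates the gradient of a toy loss by finite differences and applies one update step; the loss function and the learning rate are arbitrary choices made for the illustration:

import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-6):
    # Finite-difference estimate of dL/dw_i, one weight at a time.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0])
loss_fn = lambda w: np.sum(w ** 2)  # toy loss, exact gradient is 2w
grad = numerical_gradient(loss_fn, w)
w = w - 0.1 * grad                  # one application of formula (2.8)
print(grad, w)                      # approx. [2, -4] and [0.8, -1.6]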



2.4 Conclusion

In this chapter, we gave a brief survey of deep learning systems, with their theoretical basis and some architectures. As mentioned in the first chapter, most of the proposed painting classification approaches are based on image processing tools; we have instead chosen to adopt a deep architecture and adapt it to perform the artwork property identification well. The basis behind this approach and the design of our final network will be detailed in the next chapter.
