

2.2.3 Artificial Neural Networks

As mentioned before, the single unitary artificial neuron is a very simple processing unit, and it is clear that such a model cannot handle more than very basic computational tasks.

However, since biological neurons achieve far better performance on cognitive tasks when combined together, the idea arose to arrange these unitary modules into a neural network so that they can perform operations of a higher complexity level.

Neurons arranged in this way are able to compute the activation of a set of output neurons once the activations of the input neurons have been computed. This calculation frequently involves a number of intermediate computational steps that are performed by a set of hidden neurons arranged in what we call hidden layers.

2.2.3.1 Feed-Forward Networks

In theory, nothing forbids artificial neurons from being organized in a fully random way, yet in practice they are generally arranged in a graph that has to be acyclic. This property ensures that a neuron's input does not depend, directly or indirectly, on its own output. Networks built with a topology that respects this condition are called Feed-Forward Artificial Neural Networks (figure 2.3 (a)). The name comes from the fact that the activation spreads straight forward through the network.


The restriction mentioned above does not mean that neuron graphs containing cyclic connections are not considered neural networks; they simply belong to another family of networks called Recurrent Neural Networks, as illustrated in figure 2.3 (b). These models are known for their performance, especially when used for modeling dynamic systems, yet at the same time the cyclic links between neurons take the training of the network to a higher level of complexity and raise extra computational challenges.

Figure 2.3: (a) Feedforward Neural Network, (b) Recurrent Neural Network

Recurrent neural networks are out of the scope of this study since they are not relevant to the requirements of our problem; we will rather focus on feed-forward nets in which the perceptrons are arranged in layers.

Feed-Forward Networks are also called Multilayer Perceptrons, with reference to the arrangement of the units in layers. The vector x described before refers here to the first input layer l0. Subsequent neurons are grouped in the middle layers and have the particularity of exclusively receiving inputs from the previous layer of perceptrons. Such an architecture allows the activation of the neurons to be computed in a layer-wise, feed-forward manner.

The parameters labeling the links between neurons and used to compute a layer's activation are stored together to form the weight matrix, commonly named W. Each network contains at least two layers: the input and the output layers. Any extra layer other than these two is called a hidden layer, since it has no connection outside the network, and its presence makes the network part of the multi-layered networks category. We illustrate in figure 2.4 an example of a feed-forward network with a set of hidden layers.

Figure 2.4: Feed-forward Network with a set of hidden layers

We should also mention that each neuron in one layer is linked to every neuron of the next layer, while perceptrons belonging to the same layer never have connections between them.
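To make this layered structure concrete, here is a minimal Python/NumPy sketch (not taken from the thesis; the layer sizes are arbitrary assumptions) of how such a fully connected topology can be stored as one weight matrix and one bias vector per pair of consecutive layers:

```python
import numpy as np

# Illustrative layer sizes: input layer l0, two hidden layers, output layer.
layer_sizes = [4, 8, 8, 3]
rng = np.random.default_rng(0)

# W[i] holds the weights linking layer i to layer i+1. Its shape (n_out, n_in)
# reflects that every neuron receives an input from every neuron of the
# previous layer, while neurons of the same layer are never connected.
W = [rng.standard_normal((n_out, n_in)) * 0.1
     for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
b = [np.zeros(n_out) for n_out in layer_sizes[1:]]
```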

2.2.3.2 Activation Function

For the Multi-Layer Perceptron (MLP), considering two consecutive layers with input activation x and output activation y, the matrix notation of the computation rule is given by equation (2.2):

y = f(Wx + b)   (2.2)

where f is what we call an activation function, a crucial component of Artificial Neural Network models. It is essentially the agent responsible for learning and extracting meaning from highly complex, non-linear functions.
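As an illustration only (continuing the small NumPy sketch above, with a sigmoid chosen arbitrarily as f), the rule of equation (2.2) can be applied layer after layer to propagate an input vector up to the output layer:

```python
def forward(x, W, b, f):
    """Apply y = f(W x + b) at each layer, in a feed-forward manner."""
    a = x
    for W_l, b_l in zip(W, b):
        a = f(W_l @ a + b_l)      # equation (2.2) for one layer
    return a

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
y = forward(np.ones(4), W, b, sigmoid)   # activation of the output layer
```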

The role of this activation function is to bring non-linearity into the neural network: we want to build ANNs powerful enough to represent and learn any arbitrary, intricate function that computes outputs from the inputs. If we do not use an activation function with this non-linear property, the neural network turns into a simple Linear Regression model, which is much easier to solve but far less powerful at extracting meaning from difficult data.
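This collapse to a linear model can be checked directly in the running sketch: composing two layers without any activation is the same as applying a single linear map.

```python
# Without a non-linear f, two stacked layers reduce to one linear transformation:
# W2 (W1 x + b1) + b2  ==  (W2 W1) x + (W2 b1 + b2)
x = rng.standard_normal(4)
stacked   = W[1] @ (W[0] @ x + b[0]) + b[1]
collapsed = (W[1] @ W[0]) @ x + (W[1] @ b[0] + b[1])
assert np.allclose(stacked, collapsed)   # i.e. a plain linear (regression) model
```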

Recently, thanks to the technological advances in computing capabilities and infrastructure robustness, Neural Networks have come back into the scope of study of many research groups, which has made them evolve on several points. One of these points is the activation function, for which a number of common choices exist, each relevant to a particular kind of problem.

Sigmoid Activation Function:

The Sigmoid activation function is given by:

φ(z) = 1 / (1 + e^(−z))   (2.3)

Sigmoid maps the input value onto the [0, 1] range, as shown by its curve in figure 2.5. It used to be a popular, simple and easy-to-handle activation function, but it has shown several problems and limits that made it fall out of common practice. A typical example is the vanishing gradient problem, which affects most gradient-based training methods when combined with certain activation functions.

Figure 2.5: Sigmoid Function Curve

The sigmoid also has the problem of not being zero-centered, which makes the optimization task harder. Finally, the sigmoid function is not appreciated because it saturates and converges slowly.
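A short, self-contained illustration of this saturation issue (not from the thesis): the derivative of the sigmoid never exceeds 0.25 and becomes vanishingly small for large |z|, which is exactly what hampers gradient-based training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # equation (2.3), output in [0, 1]

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                # maximum value 0.25, reached at z = 0

# Saturated inputs give near-zero gradients: the vanishing gradient problem.
print(sigmoid_grad(np.array([0.0, 5.0, -10.0])))   # ≈ [2.5e-01, 6.6e-03, 4.5e-05]
```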

Hyperbolic Tangent Activation Function - TanH:

TanH is given by the mathematical formula:

φ(z) = (e^(2z) − 1) / (e^(2z) + 1)   (2.4)

It was proposed to remedy some of the Sigmoid's limits, and it does so by being zero-centered, with a range enlarged to [-1, 1], as shown in figure 2.6.

This makes optimization easier but the problem of Gradient Vanishing still persists.

Figure 2.6: Hyperbolic Tangent Curve
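For illustration, equation (2.4) can be coded directly and checked against NumPy's built-in implementation (a sketch, not thesis material; note that the exponential form may overflow for very large inputs):

```python
import numpy as np

def tanh(z):
    return (np.exp(2 * z) - 1.0) / (np.exp(2 * z) + 1.0)   # equation (2.4)

z = np.array([-2.0, 0.0, 2.0])
print(tanh(z))        # zero-centred output in [-1, 1]
print(np.tanh(z))     # matches the built-in hyperbolic tangent
```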

Rectified Linear Unit Activation Function:

The Rectified Linear Unit was introduced more recently to the list of activation functions and has since become the most popular and widely used one. It is appreciated for its simplicity and is simply given by:

φ(z) = max(0, z)   (2.5)

Among the activation functions presented here, it is the one that avoids the vanishing gradient problem, but its use comes with a restriction: it is applied only within the network's hidden layers. For the last output layer, it is possible to add a Softmax layer for classification tasks or a linear layer for regression ones.
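A hedged sketch of such an output head, continuing the earlier NumPy example (the output weights W_out and b_out below are hypothetical, introduced only for illustration):

```python
def softmax(z):
    e = np.exp(z - z.max())       # shift for numerical stability
    return e / e.sum()

W_out = rng.standard_normal((3, 8)) * 0.1   # hypothetical output-layer weights
b_out = np.zeros(3)

hidden = np.maximum(0.0, W[0] @ np.ones(4) + b[0])   # ReLU inside a hidden layer
probs  = softmax(W_out @ hidden + b_out)             # classification: probabilities summing to 1
# For a regression task, the last layer would return W_out @ hidden + b_out directly.
```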

There are three variants of the ReLU function, and this variety results from coping with some negative particularities of the gradient computation: ReLU [33], Leaky ReLU, and the Maxout or Randomized Leaky ReLU function [34]. Their respective curves are shown in figure 2.7.
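These variants can be sketched as follows (illustrative only; the Leaky slope of 0.01 and the random-slope range are common choices, not values taken from the thesis):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)                 # equation (2.5)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)      # small fixed slope for z < 0

def randomized_leaky_relu(z, low=1/8, high=1/3, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    alpha = rng.uniform(low, high)            # slope drawn at random during training
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(z))          # [0.  0.  0.  3.]
print(leaky_relu(z))    # [-0.02  -0.005  0.  3.]
```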
