Artificial Neural Networks: From Perceptron to Deep Learning 1 © 2021 ⏐Younès Bennani - USPN
Artificial Neural Networks
From Perceptron to Deep Learning
Younès BENNANI Full Professor
Master of Science in Informatics
Exploration Informatique des Données et Décisionnel (EID²), Science des Données (WISD & MASD), Mathématiques des Données (MD)
© 2001-2021 @Y. Bennani: This document is the property of Younès Bennani, Professor at USPN. It may not be distributed or reproduced without his written authorization ([email protected]).
BOOKS
- Artificial Intelligence: A Modern Approach, by Stuart Russell, Peter Norvig. Prentice Hall Series in Artificial Intelligence, Hardcover, Import, 2015.
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, Aaron Courville. Adaptive Computation and Machine Learning series, Hardcover, 2017.
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, Jerome Friedman. Second Edition (Springer Series in Statistics), Hardcover, 2017.
- Machine Learning, by Tom M. Mitchell. Indian Edition, 2017.
- Pattern Recognition and Machine Learning, by Christopher M. Bishop. Information Science and Statistics, Hardcover, 2010.
- The Nature Of Statistical Learning Theory, by Vladimir Vapnik. Second edition, Springer, 2011.
- Apprentissage connexionniste, by Younès Bennani. Hermès / Lavoisier, 2006, Traité IC2, Série Informatique et systèmes d'information.
- Apprentissage artificiel : Concepts et algorithmes, by Antoine Cornuéjols & Laurent Miclet. Eyrolles, 2010, EAN13: 9782212124712.
- Apprentissage machine, de la théorie à la pratique, by Massih-Reza Amini. Eyrolles, 2015, EAN13: 9782212138009.
- Data mining et statistique décisionnelle, by Stéphane Tufféry. Editions TECHNIP, 2012.
- Intelligence artificielle, by Stuart Russell, Peter Norvig. Pearson, 10/12/2010 (3rd edition), EAN13: 9782744074554.
- L'apprentissage profond, by Yoshua Bengio, Aaron Courville, Ian Goodfellow. Preface by Francis Bach. Florent Massot Eds, 2018, EAN 979-1097160432, ISBN 1097160432.
Practical work (lab work)
An open source machine learning framework that accelerates the path from research prototyping to production deployment.
https://pytorch.org
Course materials
Interactive Teaching Space
lipn.univ-paris13.fr/~bennani/enseignements
Deep Learning (DL)
Password: epi-m2-info-rna
Contents
Theoretical formalisms, models, and learning algorithms
- Motivations
- What is machine learning?
- Basic elements (formal neuron, architecture, parameters, ...)
- Adaline and Perceptron
- Multi-Layer Perceptron (MLP)
- Structured and convolutional networks
- Auto-Encoder / Auto-Associator (AE) networks
- Stacked Auto-Encoder (SAE)
- Radial Basis Function (RBF) networks
- Learning Vector Quantization (LVQ)
- Self-Organizing Maps (SOM)
- Deep Self-Organizing Maps (DeepSOM)
- Restricted Boltzmann Machine (RBM)
- Deep Belief Network (DBN)
- Generative Adversarial Network (GAN)
Applications
Artificial Intelligence (AI) can isolate your face from a crowd
AI for the detection and segmentation of objects of interest
Image segmentation
Google Self-Driving Car Project
MIT: the "autonomous" car
Medical diagnosis: spectacular results!
«Dermatologist-level Classification of Skin Cancer with Deep Neural Networks» Andre Esteva, Brett Kuprel, Rob Novoa, Justin Ko, Susan Swetter, Helen Blau, Sebastian Thrun - Nature, 2017
Expert-level classification of skin cancer.
AI invites itself into the world of art ...
A work painted by an AI was sold at auction in New York for more than $430,000.
115 years on, AI made it possible to complete a symphony by the Czech composer Antonín Dvořák.
AI "plays" better than the great champions
- Chess: the famous 1997 defeat of World Chess Champion Garry Kasparov by Deep Blue, an AI designed by IBM.
- In 2011, Watson beat the human players on the Jeopardy! TV game show.
- In 2017, AlphaGo Zero surpassed all the champions:
  - in 3 hours, AlphaGo Zero learned the fundamentals of the game of Go;
  - after 21 days, it equalled AlphaGo Master, which had beaten world champion Ke Jie.
Categories/Positions in an image
Position of the face in a picture
[Figure: network with 30 x 32 inputs and four outputs (Left, Straight, Right, Up). Panels: hidden layer weights after 1 epoch; hidden layer weights after 25 epochs; output layer weights (including w0 = θ) after 1 epoch.]
After Beat, 2002
Expression of the face in a picture
210 images (246x256, TIFF): 10 Japanese women, 6 expressions + 1 neutral pose.
After Beat, 2002
Facial Recognition
"DeepFace", Facebook's new facial recognition system
Unmanned aeroplanes soon?
Google Planes!
Once again, artificial intelligence has beaten a big name in its field: General Gene Lee, pilot trainer for the US Air Force. The AI won every round of the simulation.
"I was surprised by its responsiveness. Alpha seemed to be aware of all my intentions and reacted instantly to my flight changes and missile deployments."
Gene Lee, July 2016
Eliminating language barriers Worldwide by using AI
DeepL is a German company that has set itself the goal of eliminating language barriers worldwide by using artificial intelligence.
(www.DeepL.com)
Assistance for the blind
DeepDream: Google's psychedelic dream machine
AI & Machine Learning
Machine Learning paradigms taxonomy
What is Machine Learning?
Diameter (cm)   Height (m)
12              2.7
19              4.7
15              3.9
11              3.1
23              5.9
27              6.7
32              8.2
10              2.6
28              6.3
35              9.2
20              6.4
9               2.1
17              3.6
[Figure: scatter plot of the data in the table, Height (m) against Diameter (cm).]
Forecast: for a tree with Diameter = 19 cm, Height = ?
What is Machine Learning?
[Figure: the same scatter plot with a straight line through the points; a is the slope and b the bias (intercept).]
Y = a . X + b
Height = a . Diameter + b
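As an illustration of the straight-line model Height = a . Diameter + b, here is a minimal sketch that estimates a and b from the 13 measurements of the table. It assumes an ordinary least-squares fit, which the slides do not name explicitly; the function name `fit_line` is hypothetical.

```python
# Diameter (cm) and height (m) of the 13 trees from the table.
diameters = [12, 19, 15, 11, 23, 27, 32, 10, 28, 35, 20, 9, 17]
heights = [2.7, 4.7, 3.9, 3.1, 5.9, 6.7, 8.2, 2.6, 6.3, 9.2, 6.4, 2.1, 3.6]


def fit_line(xs, ys):
    """Return the slope a and bias b that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(diameters, heights)
forecast = a * 19 + b  # height forecast for a diameter of 19 cm
print(f"a = {a:.3f}, b = {b:.3f}, forecast = {forecast:.2f} m")
```

Once a and b are known, forecasting the height of a new tree only requires evaluating the line at its diameter.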
[Figure sequence: the same plot repeated over several slides, with different values of the line parameters a and b.]
What is Machine Learning?
[Diagram: Input X → Machine Learning → Output Y. Learning phase: the (X, Y) pairs of the table are presented to the system.]
Reality is more complex!
Recommendations at Amazon
A personalized space is highlighted on the homepage, and surely not by chance.
[Diagram: Input → Machine Learning → Output]
To make recommendations, Amazon uses all the information it has:
- the products I bought;
- the products I said I already own;
- the products I viewed;
- links between products, extrapolated from the shopping lists created by all customers;
- ...
Output: will you like, click, comment, interact, ...?
Detection and investigation of credit card fraud on the Internet
[Diagram: Input → Machine Learning → Output: Fraud?]
Image recognition
Machine Learning
• Flower?
• Flamingo?
• Lamp?
• Robot?
• ...?
For real, high-dimensional problems, the regression line is not enough!
More complex and richer models: artificial neural networks
Large size vs. complexity
Extract features
- Image 400x400: 160 000 numbers as input. Question: Cat or Other?
- Image 1000x1000: 10^6 numbers as input. Question: Motorbike? Car? Bus? Train? ...?
[Diagram: image → feature extraction algorithm → extracted (relevant) characteristics: Face/Other, Pose, Lighting, Expression.]
Discovering the hidden structure of high-dimensional data
Example: images of a person's face, 1000x1000 pixels = 1,000,000 dimensions
• But the position of the face is described by 3 Cartesian coordinates and 3 Euler angles.
• The human face has about 50 muscles.
• The hidden dimension is therefore < 56.
The relevant representation of a face image must contain far fewer than 10^6 values!
• Motorbike?
• Car?
• Bus?
• Train?
• ...?
[Diagram: raw image (millions of pixels) → feature extraction algorithm → extracted characteristics (a few dozen) → recognition using an Artificial Neural Network.]
Image Abstraction
Extract features
[Diagram: raw image (millions of pixels) → Deep Network: extraction of hierarchical characteristics → classification.]
Learning the characteristics
[Diagram: the same deep network; the hierarchical characteristics form a new representation space.]
How to learn features with such networks?
Yann LE CUN, Léon BOTTOU
Le Cun Y., Boser B., Denker J.S., Henderson D., Howard R.E., Hubbard W., Jackel L.D. (1989): «Back-propagation applied to handwritten zip code recognition», Neural Computation, Vol. 1, pp. 541-551.
BENNANI Y. & GALLINARI P. (1991), “On The Use Of TDNN-Extracted Features Information In Talker Identification”, IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’91, May 14-17, pp. 385-388, Toronto, CANADA
Younès BENNANI, Patrick GALLINARI
[Diagram: speech signal (time x frequency) → input layer → convolutional hidden layers → extracted features → one HMM per speaker (HMM Speaker 1, HMM Speaker 2, ..., HMM Speaker N).]
Big names in digital technology recruit big names in Machine Learning
Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew Ng
Why this strong comeback?
- Big Data: data in all directions ... An exponential explosion in the quantity of data.
- GPU computing: using the graphics processing unit (GPU) in parallel with the CPU to accelerate computations.
NVIDIA DGX-1 Deep Learning System
How does the GPU work?
The graphics card becomes a "coprocessor" that supports the main processor of a system.
CPU: Central processing unit
GPU: Graphics processing unit
The challenge of learning in deep networks
Learning in Deep Networks
- Constructing a feature space
  - Note that this is what we do with kernels in SVMs or with hidden layers in MLPs, but now we build the representation space using deep architectures.
  - Unsupervised learning between layers can decompose the problem into distributed sub-problems (with higher levels of abstraction), to be further decomposed at successive layers.
- Difficulties in supervised learning of deep networks
  - The first layers of an MLP are not well learned
    - Gradient diffusion: the error shrinks as it propagates back to the earlier layers; the gradient propagates "badly" from the output to the input.
    - This leads to very slow learning.
    - The deeper the network, the greater its degree of non-linearity, which increases the chances of running into these obstacles to optimisation.
    - The lower layers are left with transformations of the input that are not very useful.
    - We need a way to help the first layers do an efficient job.
  - Often not enough labelled data is available
    - Can we use unsupervised/semi-supervised approaches to take advantage of unlabelled data?
  - Deep networks tend to have more local minima problems than simple networks during supervised learning.
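The gradient diffusion effect can be made concrete with a toy computation. The sketch below is only an illustration under strong assumptions: a chain of single sigmoid units, one per layer, all with a hypothetical weight of 1. It multiplies the local derivatives along the chain, as the chain rule does during back-propagation, and shows how the gradient with respect to the input shrinks with depth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def input_gradient(depth, x=0.5, weight=1.0):
    """d(output)/d(input) for `depth` sigmoid units chained one after another."""
    grad = 1.0
    activation = x
    for _ in range(depth):
        out = sigmoid(weight * activation)
        grad *= weight * out * (1.0 - out)  # chain rule through one layer
        activation = out
    return grad


gradients = {depth: input_gradient(depth) for depth in (1, 2, 5, 10)}
for depth, grad in gradients.items():
    print(f"depth {depth:2d}: gradient = {grad:.3e}")
```

Since the derivative of the sigmoid never exceeds 0.25, each extra layer in this toy chain divides the gradient by at least 4, which is one way to see why the first layers learn slowly.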
The original idea (≈1940)
• Living organisms, even primitive ones (e.g. insects), carry out complex information-processing tasks:
  • Orientation
  • Communication
  • Social behavior
  • ...
• The processing power of their nervous system comes from:
  • Massive interconnection (10^14 connections in humans)
  • A large number of simple processing units (10^11 in humans): neurons
• The initial motivation was neuromimicry:
  • however, the 1940s vision was rather simplistic;
  • biological reality has since proved more complex.
• On the other hand, this idea has proved very fruitful in mathematics and engineering.
The artificial neuron
1943, McCulloch & Pitts
A model and not a copy of the biological neuron = an elementary processor characterized by:
• input signals: $x = (x_0, x_1, \ldots, x_d)$
• weights of connections: $w = (w_0, w_1, \ldots, w_d)$
• activation function: $F(x, w)$
• internal activation state: $a = F(x, w)$
• transition function: $f(a)$
• output state: $y = f(a)$
[Diagram: the inputs $x_0, x_1, \ldots, x_d$, weighted by $w_0, w_1, \ldots, w_d$, feed $F(x, w)$, which produces $a$; $f(a)$ then gives the output $y$.]
Definition: A formal (artificial) neuron is a processing unit that receives input data, in the form of a vector, and produces a real output. This output is a function of the inputs and the weights of the connections.
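The artificial neuron
[Diagram: inputs $x_0, x_1, \ldots, x_d$, weights $w_{j0}, w_{j1}, \ldots, w_{jn}$, a summation unit computing $a$, and a transition function $f$.]
$F(x, w) = w \cdot x = \sum_i w_i x_i$
$y = f\left(\sum_i w_i x_i\right)$, with for example $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$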
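The computation performed by such a neuron can be sketched in a few lines. This is a minimal illustration, assuming the dot-product activation F(x, w) = w.x and the hyperbolic tangent as transition function; the input and weight values are hypothetical.

```python
import math


def neuron_output(x, w, f=math.tanh):
    """Return y = f(a), where a = F(x, w) = sum_i w_i * x_i."""
    a = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return f(a)


x = [1.0, 0.5, -2.0]   # hypothetical inputs (x[0] = 1 acts as a constant input)
w = [0.1, 0.8, -0.3]   # hypothetical connection weights
y = neuron_output(x, w)
print(y)
```

Passing another function as `f` (for instance a threshold or a sigmoid) changes the transition function without touching the weighted sum.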
Activation functions
• Hyperbolic tangent: $f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
• Logistic sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$, with derivative $f'(x) = f(x)\,(1 - f(x))$
• $f(x) = \frac{e^{x} - 1}{e^{x} + 1}$
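A quick numerical check can help relate these functions to each other. The sketch below, with hypothetical function names, verifies the derivative formula f'(x) = f(x)(1 - f(x)) for the logistic sigmoid by finite differences, and checks that (e^x - 1)/(e^x + 1) coincides with tanh(x/2), a standard identity.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def bipolar(x):
    """The function (e^x - 1) / (e^x + 1)."""
    return (math.exp(x) - 1.0) / (math.exp(x) + 1.0)


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 0.7
analytic = logistic(x) * (1.0 - logistic(x))
numeric = numerical_derivative(logistic, x)
print(analytic, numeric)
print(bipolar(x), math.tanh(x / 2.0))
```

Having the derivative expressed through the function value itself is convenient in gradient-based learning, because the forward-pass output can be reused.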
Activation functions
Rectified Linear Units (ReLU): f(x) = max(0, x)
- More efficient gradient propagation: the derivative is 0 or a constant, which can simply be folded into the learning rate.
- More efficient computation: only comparison, addition and multiplication.
  - Leaky ReLU: f(x) = x if x > 0, else a.x, where 0 < a ≤ 1, so that the derivative is not 0 and some learning can still happen in that case.
  - Lots of other variations.
- Sparse activation: for example, in a randomly initialized network, only about 50% of the hidden units are activated (have a non-zero output).
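The following sketch illustrates ReLU, Leaky ReLU and the sparse-activation remark. It rests on assumptions that are not in the slides: the leak slope a = 0.01 and the pre-activations of the hidden units being drawn from a zero-mean Gaussian, which is what makes roughly half of them negative.

```python
import random


def relu(x):
    return x if x > 0 else 0.0


def leaky_relu(x, a=0.01):
    """a is the small slope used for negative inputs (0.01 is an assumed value)."""
    return x if x > 0 else a * x


random.seed(0)
# Pre-activations of 10 000 hypothetical hidden units, drawn from N(0, 1).
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
active = sum(1 for z in pre_activations if relu(z) > 0)
active_fraction = active / len(pre_activations)
print(f"fraction of active units: {active_fraction:.2f}")
```

With symmetric random pre-activations, about half of the units output exactly zero, which matches the order of magnitude quoted above.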
The Neuron Distance
[Diagram: inputs $x_0, x_1, \ldots, x_n$ connected by the weights $w_{j0}, w_{j1}, \ldots$ to unit $j$, whose activation is $a_j$.]
$a_j = F(x, w) = \|w - x\|^2 = \sum_{i \in \mathrm{Upstream}(j)} (w_{ji} - x_i)^2$
$\|w - x\|^2 = \|w\|^2 - 2\langle w, x\rangle + \|x\|^2$
$s_j = f\left(\sum_{i \in \mathrm{Upstream}(j)} (w_{ji} - x_i)^2\right)$
Artificial Neural Network
Definition: A neural network is a weighted directed graph consisting of a set of units (or automata) that perform elementary computations, structured in successive layers and able to exchange information through the connections between them.
[Diagram: inputs $x_0, x_1, \ldots, x_D$; outputs $y_1(x, w), \ldots, y_K(x, w)$.]
- A network is characterized by:
  - its architecture
  - the functions of its elements
- Massively parallel architecture.
- System based on the cooperation of several simple units (formal neurons).
- Adaptation of the network parameters (synaptic efficiency).
How do Artificial Neural Networks work?
Adaline: Adaptive Linear Element
Stanford, 1960, Bernard Widrow*
[Diagram: inputs $x_0 = 1, x_1, \ldots, x_D$ with weights $w_0, w_1, \ldots, w_D$ feeding a summation unit that outputs $a$.]
It is an adaptive linear element:
$y = \sum_{i=0}^{D} w_i x_i$
The unit $x_0$, whose activation is set to 1, is called the bias unit. It introduces the bias term into the network:
$y = \psi(x, w) = w_0 + \sum_{i=1}^{D} w_i x_i = w_0 + w^T x$
Threshold function:
$f(x) = 1$ if $x > 0$, $-1$ if $x < 0$
[Figure: in the $(x_1, x_2)$ plane, the decision boundary is the line $w_0 + w^T x = 0$, orthogonal to $w$, with $l = \frac{w^T x}{\|w\|} = -\frac{w_0}{\|w\|}$.]
* Widrow B., Hoff M.E. (1960): « Adaptive switching circuits », IRE WESCON Conv. Record, part 4, pp. 96-104.
[Figure: two classes of points (x and o) separated by a line, in two panels: "Without bias" and "With bias".]
It can be used in discrimination (classification) between 2 classes:
$\psi(x, w) = w_0 + w^T x$, with $\psi(x, w) > 0$ if $x \in C_1$ and $\psi(x, w) < 0$ if $x \in C_2$
[Figure: the classes $C_1$ (where $\psi(x, w) > 0$) and $C_2$ (where $\psi(x, w) < 0$) on either side of the boundary $\psi(x, w) = 0$.]
If we call $z^{(n)} = (x^{(n)}, d^{(n)})$ the example taken into account at iteration $n$, where $d^{(n)}$ is the desired output and $y = w\,x^{(n)}$ the computed output, we define the square of the instantaneous error by:
$\tilde{R}^{(n)}_{Adaline}(w) = (d^{(n)} - w\,x^{(n)})^2$
The mean squared error (MSE) is defined as the mean of the squared instantaneous errors over all the examples:
$\tilde{R}_{Adaline}(w) = \frac{1}{N} \sum_{n=1}^{N} \tilde{R}^{(n)}_{Adaline}(w)$
There are several learning algorithms.
Gradient descent techniques (steepest descent):
suppose that at time $t$ the weights of the Adaline are $w(t)$ and we present the example $(x^{(n)}, d^{(n)})$; then the weights are changed by:
$w(t+1) = w(t) - \varepsilon(t)\,\nabla_w \tilde{R}^{(n)}_{Adaline}(w)$
where $\varepsilon(t)$ is the step of the gradient and the instantaneous gradient is
$\nabla_w \tilde{R}^{(n)}_{Adaline}(w) = \frac{\partial \tilde{R}^{(n)}_{Adaline}(w)}{\partial w} = -2\,(d^{(n)} - w\,x^{(n)})\,x^{(n)}$
This rule is called the stochastic gradient rule, the Widrow-Hoff rule, the delta rule, or the µ-LMS (Least Mean Square) rule.
Learning algorithm ADALINE
1- Initialize the weights $W_0$ randomly.
2- Randomly choose a couple of data $(x^{(n)}, d^{(n)})$.
3- Determine the value of the error: $e^{(n)}(t) = d^{(n)} - w\,x^{(n)}$
4- Calculate a gradient approximation: $\nabla_w \tilde{R}^{(n)}_{Adaline}(w) = -2\,e^{(n)}(t)\,x^{(n)}$
5- Adapt the weights: $w(t+1) = w(t) - \varepsilon(t)\,\nabla_w \tilde{R}^{(n)}_{Adaline}(w)$, where $\varepsilon(t)$ is the step of the gradient.
6- Repeat from 2 to 5 until an acceptable error value is obtained.
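The steps of the Adaline learning algorithm can be sketched as follows. This is an illustrative implementation under assumptions of my own: a constant gradient step of 0.01 rather than a time-varying ε(t), a fixed number of iterations as stopping rule, and the OR truth table (in +1/-1 coding, with the bias unit x0 = 1) as training data. The function name `train_adaline` is hypothetical.

```python
import random

# OR truth table in +1/-1 coding; each input vector starts with the bias unit x0 = 1.
DATA = [
    ((1, 1, 1), 1),
    ((1, 1, -1), 1),
    ((1, -1, 1), 1),
    ((1, -1, -1), -1),
]


def train_adaline(data, step=0.01, iterations=5000, seed=0):
    rng = random.Random(seed)
    # 1- initialize the weights randomly
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]))]
    for _ in range(iterations):
        x, d = rng.choice(data)                             # 2- choose an example
        error = d - sum(wi * xi for wi, xi in zip(w, x))    # 3- error e = d - w.x
        gradient = [-2.0 * error * xi for xi in x]          # 4- instantaneous gradient
        w = [wi - step * gi for wi, gi in zip(w, gradient)] # 5- weight update
    return w


weights = train_adaline(DATA)
outputs = [sum(wi * xi for wi, xi in zip(weights, x)) for x, _ in DATA]
for (x, d), y in zip(DATA, outputs):
    print(x, d, round(y, 2))
```

The linear outputs do not reach the targets exactly, since no linear function fits the OR table perfectly, but their signs agree with the desired outputs, which is what matters for classification.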
Adaline: Example
Problem: find an Adaline able to learn the truth table of a Boolean function of 2 variables.
Data: truth table of the function $\psi(x, w) = x_1 \lor x_2$ ($x_1$ or $x_2$), with the bias unit $x_0 = 1$:
x0   x1   x2   d
 1    1    1    1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1
$D_N = \{(x^{(1)}, d^{(1)}), (x^{(2)}, d^{(2)}), \ldots, (x^{(N)}, d^{(N)})\}$
$D_4 = \{ ((1, 1)^T, 1);\ ((1, -1)^T, 1);\ ((-1, 1)^T, 1);\ ((-1, -1)^T, -1) \}$
[Diagram: Adaline with inputs $x_0 = 1$, $x_1$, $x_2$, weights $w_0$, $w_1$, $w_2$, activation $a = \sum_{i=0}^{n} w_i x_i$ and output $y$.]
[Figure: the four points in the $(x_1, x_2)$ plane, separated by the line $w_1 x_1 + w_2 x_2 + w_0 = 0$.]
Adaline: limitations
$\psi(x, w) = x_1 \oplus x_2 = XOR(x_1, x_2)$
x0   x1   x2   d
 1    1    1   -1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1
[Figure: the four points $x^1, x^2, x^3, x^4$ in the $(x_1, x_2)$ plane, with a question mark on the separating line.]
Linear separability
« Two classes of objects, described in a space of dimension n, are said to be "linearly separable" if they lie on either side of a hyperplane in the representation space. »
[Figure: two panels showing classes A and B in the $(x_1, x_2)$ plane: linearly separable and non-linearly separable.]
Madaline: Multi-Adaptive Linear Element
Adaline with polynomial preprocessing
$\psi(x, w) = x_1 \oplus x_2$
[Diagram: the inputs $x_0$, $x_1$, $x_2$ are expanded into the terms $x_1$, $x_1^2$, $x_1 x_2$, $x_2$, $x_2^2$ (Id denotes the identity), which feed an Adaline with weights $w_0, w_1, \ldots, w_5$.]
Decision boundary:
$w_1 x_1 + w_2 x_1^2 + w_3 x_1 x_2 + w_4 x_2 + w_5 x_2^2 + w_0 = 0$
[Figure: in the $(x_1, x_2)$ plane, this boundary can be a separation ellipse.]
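To illustrate why polynomial preprocessing lifts the XOR limitation, the sketch below trains a single Adaline on the expanded inputs (1, x1, x1^2, x1*x2, x2, x2^2) with the Widrow-Hoff rule. The constant step, the zero initialization and the cyclic presentation of the examples are simplifying assumptions, and the function names are hypothetical.

```python
# XOR truth table in +1/-1 coding: ((x1, x2), desired output).
XOR_DATA = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]


def expand(x1, x2):
    """Polynomial preprocessing: (1, x1, x1^2, x1*x2, x2, x2^2)."""
    return (1.0, x1, x1 * x1, x1 * x2, x2, x2 * x2)


def train(data, step=0.05, epochs=200):
    w = [0.0] * 6  # weights (w0, w1, ..., w5)
    for _ in range(epochs):
        for (x1, x2), d in data:
            phi = expand(x1, x2)
            error = d - sum(wi * pi for wi, pi in zip(w, phi))
            # Widrow-Hoff update: w <- w + 2 * step * error * phi
            w = [wi + 2.0 * step * error * pi for wi, pi in zip(w, phi)]
    return w


w = train(XOR_DATA)
outputs = [sum(wi * pi for wi, pi in zip(w, expand(*x))) for x, _ in XOR_DATA]
print([round(wi, 3) for wi in w])
print([round(y, 3) for y in outputs])
```

In the +1/-1 coding, XOR equals -x1*x2, so the product term alone carries the solution; the learned weight on x1*x2 ends up close to -1 while the others stay close to 0.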
Madaline = a set of parallel Adalines
$\psi(x, w) = y = x_1 \oplus x_2$
[Diagram: Adaline 1 (weights $w_{01}, w_{11}, w_{21}$) and Adaline 2 (weights $w_{02}, w_{12}, w_{22}$) both receive $x_0$, $x_1$, $x_2$ and output $z_1$ and $z_2$; Adaline 3 (weights $w_{31}, w_{32}$) computes $z_1 \land z_2$.]
[Table: inputs $x_0, x_1, x_2$, desired output $d$, and the outputs of Adaline 1 and Adaline 2 for the four patterns $x^1, \ldots, x^4$.]
[Figure: in the $(x_1, x_2)$ plane, the separating lines of Adaline 1 and Adaline 2; in the $(z_1, z_2)$ plane, $z_1 \land z_2$ yields $x_1 \oplus x_2$.]
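One possible Madaline for XOR can be written down by hand. The sketch below is an assumption-laden illustration, not the weights of the slides: it takes Adaline 1 as an OR, Adaline 2 as a NAND, and lets Adaline 3 compute z1 AND z2, each with a bias weight and a sign threshold.

```python
def adaline_sign(weights, inputs):
    """Thresholded Adaline: +1 if the weighted sum is positive, else -1."""
    a = sum(w * x for w, x in zip(weights, inputs))
    return 1 if a > 0 else -1


# Hand-chosen weights (bias, first input, second input); illustrative assumptions.
W_ADALINE_1 = (1.0, 1.0, 1.0)     # z1 = x1 OR x2
W_ADALINE_2 = (1.0, -1.0, -1.0)   # z2 = NOT (x1 AND x2)
W_ADALINE_3 = (-1.0, 1.0, 1.0)    # y  = z1 AND z2


def madaline_xor(x1, x2):
    z1 = adaline_sign(W_ADALINE_1, (1, x1, x2))
    z2 = adaline_sign(W_ADALINE_2, (1, x1, x2))
    return adaline_sign(W_ADALINE_3, (1, z1, z2))


for x1, x2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x1, x2, madaline_xor(x1, x2))
```

Each of the first two Adalines draws one line in the (x1, x2) plane; the third keeps only the region lying on the positive side of both, which is exactly where XOR is true.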
Perceptron
Rosenblatt F., 1957, 1962*
The perceptron does not designate a single model but groups together an important family of algorithms.
The perceptron = adaptive machine used to solve classification problems (discrimination).
[Diagram: X → perceptron → $\psi(x, w)$]
* Rosenblatt F. (1957): « The perceptron: a perceiving and recognizing automaton », Reports 85-460-1, Cornell Aeronautical Lab., Ithaca, N.Y.
* Rosenblatt F. (1962): « Principles of Neurodynamics: perceptrons and theory of brain mechanisms », Spartan Books, Washington.
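Perceptron
Rosenblatt F., 1957, 1962
[Diagram: retina X → association cells $\varphi_1, \varphi_2, \ldots, \varphi_D$ (plus a constant 1) → weights $w_0, w_1, \ldots, w_n$ → summation $\sum_{i=0}^{n} w_i \varphi_i(x)$ → decision cell.]
- The retina: receives the information from the outside.
- The association cells: each cell has a transition function defined on the retina: $\varphi_i(x): R \to \mathbb{R}$
- The decision cell: $\psi(x, w) = f\left(w_0 + \sum_{i=1}^{D} w_i \varphi_i(x)\right) = f(w^T \varphi)$
with $w^T = (w_0, w_1, \ldots, w_D)$, $\varphi = (1, \varphi_1(x), \varphi_2(x), \ldots, \varphi_D(x))$ and
$f(x) = 1$ if $x \geq 0$, $-1$ if $x < 0$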
Perceptron
2-class case
A Perceptron can be seen as a 2-class classifier:
$C_1 = \{x \in R : \psi(x, w) = 1\}$
$C_2 = \{x \in R : \psi(x, w) = -1\}$
$\psi(x, w) = f\left(w_0 + \sum_{i=1}^{D} w_i \varphi_i(x)\right) = 1$ if $x \in C_1$, $-1$ if $x \in C_2$
[Figure: the two classes of points on either side of the boundary, with $\psi(x, w) = 1$ on the $C_1$ side and $\psi(x, w) = -1$ on the $C_2$ side.]
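If we call $z^{(n)} = (x^{(n)}, d^{(n)})$ the example taken into account at iteration $n$, with $d^{(n)} = 1$ for $x^{(n)} \in C_1$ and $d^{(n)} = -1$ for $x^{(n)} \in C_2$, a correct classification requires
$w^T \varphi^{(n)} > 0$ for $x^{(n)} \in C_1$ and $w^T \varphi^{(n)} < 0$ for $x^{(n)} \in C_2$, that is, $\forall x^{(n)},\ w^T \varphi^{(n)} d^{(n)} > 0$
We define the instantaneous error by:
$\tilde{R}^{(n)}(w) = -w^T \varphi^{(n)} d^{(n)}$
The mean error is:
$\tilde{R}(w) = \frac{1}{N} \sum_{n=1}^{N} \tilde{R}^{(n)}(w)$
where only the misclassified examples contribute, so that, up to the factor $1/N$, the quantity to minimize is $-\sum_{x^{(n)}\ \text{misclassified}} w^T \varphi^{(n)} d^{(n)}$.
There are several learning algorithms.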
Gradient descent techniques (steepest descent):
suppose that at time $t$ the weights of the Perceptron are $w(t)$ and we present the example $(x^{(n)}, d^{(n)})$; then the weights are changed by:
$w(t+1) = w(t) - \varepsilon(t)\,\nabla_w \tilde{R}^{(n)}(w)$
where $\varepsilon(t)$ is the step of the gradient and the instantaneous gradient is
$\nabla_w \tilde{R}^{(n)}(w) = \frac{\partial \tilde{R}^{(n)}(w)}{\partial w} = -\varphi^{(n)} d^{(n)}$
Stopping criterion = satisfactory classification rate.
Perceptron
Learning algorithm: 2-class case
1- Initialize $w(0)$ randomly.
2- Randomly choose a couple of data $(x^{(n)}, d^{(n)})$.
3- Compute the output of the perceptron, $f\left(w_0 + \sum_{i=1}^{D} w_i \varphi_i(x^{(n)})\right) = f(w^T \varphi^{(n)})$, and compare it to $d^{(n)}$.
4- Adapt the weights:
   If $x^{(n)}$ is well classified, $f(w^T \varphi^{(n)}) = d^{(n)}$: $w(t+1) = w(t)$
   If $x^{(n)}$ is misclassified, $f(w^T \varphi^{(n)}) \neq d^{(n)}$: $w(t+1) = w(t) + \varepsilon(t)\,\varphi^{(n)} d^{(n)}$
   where $\varepsilon(t)$ is the step of the gradient.
5- Repeat from 2 to 4 until an acceptable error value is obtained.
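Perceptron example
$\psi(x, w) = x_1 \lor x_2$
x0   x1   x2   d
 1    1    1    1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1
[Diagram: perceptron with $\varphi_1(x^{(n)}) = x_1^{(n)}$, $\varphi_2(x^{(n)}) = x_2^{(n)}$ and a constant input 1, weights $w_0, w_1, w_2$, computing $\sum_{i=0}^{D} w_i \varphi_i(x)$.]
[Figure: the four points $x^1, \ldots, x^4$ in the $(x_1, x_2)$ plane.]
Applet: http://lcn.epfl.ch/tutorial/french/perceptron/html/index.html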
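The 2-class Perceptron learning rule can be tried on this OR example with a short sketch. Several choices here are assumptions rather than the slides' prescriptions: the features are the raw inputs plus a constant 1, the step is constant and equal to 1, the examples are shuffled at each pass instead of being drawn one by one, and training stops when a full pass makes no mistake.

```python
import random

# OR truth table in +1/-1 coding: ((x1, x2), desired output d).
DATA = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]


def phi(x):
    """Feature vector (1, phi_1(x), phi_2(x)) with phi_i(x) = x_i."""
    return (1.0, float(x[0]), float(x[1]))


def f(a):
    return 1 if a >= 0 else -1


def predict(w, x):
    return f(sum(wi * pi for wi, pi in zip(w, phi(x))))


def train_perceptron(data, step=1.0, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    for _ in range(max_epochs):
        examples = list(data)
        rng.shuffle(examples)
        mistakes = 0
        for x, d in examples:
            if predict(w, x) != d:
                # misclassified: w <- w + step * phi * d
                w = [wi + step * d * pi for wi, pi in zip(w, phi(x))]
                mistakes += 1
        if mistakes == 0:
            break
    return w


w = train_perceptron(DATA)
print(w, [predict(w, x) for x, _ in DATA])
```

Because the OR classes are linearly separable, the loop stops after a finite number of corrections, unlike what would happen on the XOR table.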
Perceptron
p-class case: $C_1, C_2, \ldots, C_p$
[Diagram: X → association cells $\varphi_1, \varphi_2, \ldots, \varphi_D$ (plus a constant 1) → one linear unit per class, $\sum_{j=0}^{D} w_{ij} \varphi_j(x)$ for $i = 1, \ldots, p$ → Max → $\psi(x, w)$.]
$\psi(x, w) = C_k \iff \forall j \neq k,\ w_k^T \varphi > w_j^T \varphi$
that is,
$\psi(x, w) = C_k \iff \forall j \neq k,\ \sum_{i=0}^{D} w_{ki} \varphi_i > \sum_{i=0}^{D} w_{ji} \varphi_i$
Perceptron
Learning algorithm: p-class case
1- Initialize $w(0)$ randomly.
2- Randomly choose a learning example $x^{(n)} \in C_k$.
3- Compute the output of the perceptron, $\psi(x^{(n)}, w) = C_j \iff j = \mathrm{Argmax}_{l}\ w_l^T \varphi^{(n)}$, and compare it to $C_k$.
4- Adapt the weights:
   If $x^{(n)}$ is well classified ($C_j = C_k$): $w(t+1) = w(t)$
   If $x^{(n)}$ is misclassified ($C_j \neq C_k$):
   $w_k(t+1) = w_k(t) + \varepsilon(t)\,\varphi^{(n)}$
   $w_j(t+1) = w_j(t) - \varepsilon(t)\,\varphi^{(n)}$
   $w_l(t+1) = w_l(t) \quad \forall l \neq k, j$
   where $\varepsilon(t)$ is the step of the gradient.
5- Repeat from 2 to 4 until an acceptable error value is obtained.
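The p-class rule can be sketched in the same spirit. Everything specific below is an assumption made for illustration: the nine 2-D points and their three classes are invented toy data, the features are the raw coordinates plus a constant 1, the step is constant, the weights start at zero, and the examples are visited cyclically until a full pass makes no mistake.

```python
# Hypothetical 2-D toy data: (point, class index), three linearly separable classes.
DATA = [
    ((2, 0), 0), ((3, 1), 0), ((3, -1), 0),
    ((-2, 2), 1), ((-1, 3), 1), ((-3, 3), 1),
    ((-2, -2), 2), ((-1, -3), 2), ((-3, -3), 2),
]


def phi(x):
    return (1.0, float(x[0]), float(x[1]))


def predict(weights, x):
    """Index of the class whose linear unit gives the largest score (the Max unit)."""
    scores = [sum(wi * pi for wi, pi in zip(w, phi(x))) for w in weights]
    return scores.index(max(scores))


def train(data, n_classes=3, step=1.0, max_epochs=1000):
    weights = [[0.0, 0.0, 0.0] for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, k in data:
            j = predict(weights, x)
            if j != k:
                p = phi(x)
                # reinforce the true class k, penalize the wrongly chosen class j
                weights[k] = [w + step * pi for w, pi in zip(weights[k], p)]
                weights[j] = [w - step * pi for w, pi in zip(weights[j], p)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights


weights = train(DATA)
print([predict(weights, x) for x, _ in DATA])
```

Only two weight vectors change at each correction, the one of the true class and the one of the class that was wrongly selected; all the others are left untouched.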
Perceptron vs Adaline
2-class case
[Figure: Class A and Class B with the two separating lines.]
- Solution found by the Adaline: the most robust separation between the classes.
- Solution found by the Perceptron: a separation that minimizes the number of errors.
[Diagram, for the Perceptron and for the Adaline: the error is obtained by comparing the supervision signal with the computed output.]
Perceptron convergence theorem
« If a set of examples is linearly separable, then the Perceptron learning algorithm converges to a correct solution in a finite number of iterations »
Arbib M.A. (1987): « Brains, Machines, and Mathematics », Berlin, Springer-Verlag.
Rosenblatt F. (1962): « Principles of Neurodynamics », N.Y., Spartan.
Block H.D. (1962): « The Perceptron: A Model for Brain Functioning », Reviews of Modern Physics 34, 123-135.
Minsky M.L. & Papert S.A. (1969): « Perceptrons », Cambridge, MIT Press.
Diederich S. & Opper M. (1987): « Learning of Correlated Patterns in Spin-Glass Networks by Local Learning Rules », Physical Review Letters 58, 949-952.
Multi-layer Architecture
The credit assignment problem
Given a layered network and a set of input-output example pairs:
- The Perceptron learning algorithm can be applied to determine W2.
- We do not know the desired outputs of the hidden units, so we cannot apply the Perceptron learning algorithm to determine W1.
[Diagram: inputs x0, x1, ..., xD → weights W1 → hidden layer → weights W2 → output, compared with the desired output d.]
The gradient back-propagation algorithm brings a simple solution to this problem.
The post connectionist break
Bryson A., Denham W., Dreyfus S. (1963): « Optimal Programming Problems With Inequality Constraints. I: Necessary Conditions for Extremal Solutions », AIAA Journal, Vol. 1, pp. 25-44.
LeCun Y. (1986): « Learning Processes in Asymmetric Threshold Network », Disordered Systems and Biological Organizations, Les Houches, France, Springer, pp. 223-240.
Rumelhart D., Hinton G.E., Williams R. (1986): « Learning Internal Representations by Error Propagation », in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I, Bradford Books, Cambridge, MA, pp. 318-362, MIT Press.