
Artificial Neural Networks: From Perceptron to Deep Learning 1 © 2021 ⏐Younès Bennani - USPN

Artificial Neural Networks

From Perceptron to Deep Learning

Younès BENNANI Full Professor

Master of Science in Informatics

Exploration Informatique des Données et Décisionnel (EID²), Science des Données (WISD & MASD), Mathématiques des Données (MD)

© 2001-2021 @Y. Bennani: This document is the property of Younès Bennani, Professor at USPN. It may not be distributed or reproduced without his written permission ([email protected]).


BOOKS

- Artificial Intelligence: A Modern Approach, by Stuart Russell, Peter Norvig. Prentice Hall Series in Artificial Intelligence, Hardcover (Import), 2015.
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, Aaron Courville. Adaptive Computation and Machine Learning series, Hardcover, 2017.
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, Jerome Friedman. Second Edition (Springer Series in Statistics), Hardcover, 2017.
- Machine Learning, by Tom M. Mitchell. Indian Edition, 2017.
- Pattern Recognition and Machine Learning, by Christopher M. Bishop. Information Science and Statistics, Hardcover, 2010.
- The Nature of Statistical Learning Theory, by Vladimir Vapnik. Second edition, Springer, 2011.

BOOKS (in French)

- Apprentissage connexionniste, by Younès Bennani. Hermès / Lavoisier, 2006, Traité IC2, Série Informatique et systèmes d'information.
- Apprentissage artificiel: Concepts et algorithmes, by Antoine Cornuéjols & Laurent Miclet. Eyrolles, 2010, EAN13: 9782212124712.
- Apprentissage machine, de la théorie à la pratique, by Massih-Reza Amini. Eyrolles, 2015, EAN13: 9782212138009.
- Data mining et statistique décisionnelle, by Stéphane Tufféry. Editions TECHNIP, 2012.
- Intelligence artificielle, by Stuart Russell, Peter Norvig. Publisher: Pearson, 10/12/2010 (3rd edition), EAN13: 9782744074554.
- L'apprentissage profond, by Yoshua Bengio, Aaron Courville, Ian Goodfellow. Preface by Francis Bach. Florent Massot Eds, 2018, EAN 979-1097160432, ISBN 1097160432.

Practical work (lab work)

PyTorch: an open source machine learning framework that accelerates the path from research prototyping to production deployment.

https://pytorch.org


Course materials

Interactive Teaching Space

lipn.univ-paris13.fr/~bennani/enseignements

Deep Learning (DL)

Password: epi-m2-info-rna

Contents

- Theoretical formalisms, models, and learning algorithms
  - Motivations
  - What is machine learning?
  - Basic elements (formal neuron, architecture, parameters, ...)
  - Adaline and Perceptron
  - Multi-Layer Perceptron (MLP)
  - Structured and convolutional networks
  - Auto-Encoder / Auto-Associator (AE) networks
  - Stacked Auto-Encoder (SAE)
  - Radial Basis Function networks (RBF)
  - Learning Vector Quantization (LVQ)
  - Self-Organizing Maps (SOM)
  - Deep Self-Organizing Maps (DeepSOM)
  - Restricted Boltzmann Machine (RBM)
  - Deep Belief Network (DBN)
  - Generative Adversarial Network (GAN)
- Applications

Artificial Intelligence (AI) can isolate your face from a crowd.

AI for the detection and segmentation of objects of interest

Image segmentation


Google Self-Driving Car Project

MIT: the "autonomous" car

Medical diagnosis: spectacular results!

«Dermatologist-level Classification of Skin Cancer with Deep Neural Networks» Andre Esteva, Brett Kuprel, Rob Novoa, Justin Ko, Susan Swetter, Helen Blau, Sebastian Thrun - Nature, 2017

Expert-level classification of skin cancer (Nature, 2017).

AI makes its way into the world of art ...

A painting created by an AI was sold at auction in New York for more than $430,000.

115 years later, AI made it possible to complete a symphony by the Czech composer Antonín Dvořák.

AI "plays" better than the great champions

- Chess: the famous 1997 defeat of World Chess Champion Garry Kasparov against Deep Blue, an AI designed by IBM.
- In 2011, Watson beat humans at the Jeopardy! TV game show.
- In 2017, AlphaGo Zero surpassed all the champions:
  - in 3 hours, AlphaGo Zero learned the fundamentals of the game of Go;
  - after 21 days, it equalled AlphaGo Master, which had beaten world champion Ke Jie.

Categories/Positions in an image

Position of the face in a picture

[Figure: network with 30 x 32 inputs and four outputs (Left, Straight, Right, Up); panels show the hidden-layer weights after 1 epoch and after 25 epochs, and the output-layer weights (including w0 = θ) after 1 epoch.]

Expression of the face in a picture

After Beat, 2002: 210 images (246x256, TIFF) of 10 Japanese women, with 6 expressions + 1 neutral pose.

After Beat, 2002

Facial Recognition

"DeepFace", Facebook's new facial recognition system

Unmanned aeroplanes soon? Google Planes!

Once again, artificial intelligence has beaten a big name in its field: retired US Air Force Colonel Gene Lee, a trainer of US Air Force pilots. The AI won every round of the simulation.

"I was surprised by its responsiveness. Alpha seemed to be aware of all my intentions and reacted instantly to my flight changes and missile deployments."

Gene Lee, July 2016

Eliminating language barriers worldwide using AI

DeepL is a German company that has set itself the goal of eliminating language barriers worldwide by using artificial intelligence.

(www.DeepL.com)


Assistance for the blind

DeepDream: Google's psychedelic dream machine

AI & Machine Learning


Machine Learning paradigms taxonomy

Diameter (cm)   Height (m)
12              2.7
19              4.7
15              3.9
11              3.1
23              5.9
27              6.7
32              8.2
10              2.6
28              6.3
35              9.2
20              6.4
9               2.1
17              3.6

What is Machine Learning?

[Figure: scatter plot of Height (m) versus Diameter (cm) for the trees in the table above.]

Forecast: what is the height of a tree with a diameter of 19 cm?

[Figure: the same scatter plot, Height (m) versus Diameter (cm), with the value to forecast.]

Linear model: a straight line with slope $a$ and bias $b$:

$$Y = a \cdot X + b, \qquad \text{Height} = a \cdot \text{Diameter} + b$$

[Figure: scatter plot of Height (m) versus Diameter (cm) with a candidate line.]

[Figures: the same scatter plot with successive candidate lines, each defined by a slope a and a bias b.]

[Diagram: Input X → Machine Learning → Output Y. During the Learning Phase, the (X, Y) pairs of the table above are used to build the model.]

Reality is more complex!

Recommendations at Amazon

A personalized space is highlighted on the homepage, and surely not by chance.

[Diagram: Input → Machine Learning → Output: will you like, click, comment, interact, ...?]

To make recommendations, Amazon uses all the information it has:
- the products I bought;
- the products that I said I already have;
- the products I consulted;
- links between products, extrapolated from the shopping lists created by all customers;
- ...

Detection and investigation of credit card fraud on the Internet

[Diagram: Input → Machine Learning → Output: fraud?]

Image recognition

[Diagram: image → Machine Learning → Flower? Flamingo? Lamp? Robot? ...?]

For real, large-scale problems, the regression line is not enough!

More complex and richer models are needed: artificial neural networks.

Large size vs. complexity
- Image 400x400: 160,000 numbers as input. Cat or Other?
- Image 1000x1000: 10^6 numbers as input. Motorbike? Car? Bus? Train? ...?

Extract features

Algorithm: extraction of features. Extracted (relevant) characteristics: Face/Other, Pose, Lighting, Expression.

Example: images of a person's face
1000x1000 pixels = 1,000,000 dimensions

Discovering the hidden structure of high-dimensional data:
- but the face has 3 Cartesian coordinates and 3 Euler angles;
- and the human face has about 50 muscles;
- so the hidden dimension is < 56.

The relevant representation of a face image must therefore contain far fewer than 10^6 values!

Recognition using an Artificial Neural Network:
Raw image (millions of pixels) → Extraction of features → Extracted characteristics (a few dozen) → Motorbike? Car? Bus? Train? ...?

Image Abstraction:
Raw image (millions of pixels) → Deep Network: extraction of hierarchical characteristics → Classification

Learning the characteristics

Raw image (millions of pixels) → Deep Network: extraction of hierarchical characteristics into a new representation space → Classification

How to learn features with such networks?

[Photos: Yann LE CUN, Léon BOTTOU]

Le Cun Y., Boser B., Denker J.S., Henderson D., Howard R.E., Hubbard W., Jackel L.D. (1989): «Back-propagation applied to handwritten zip code recognition», Neural Computation, Vol. 1, pp. 541-551.

How to learn features with such networks?

BENNANI Y. & GALLINARI P. (1991): «On The Use Of TDNN-Extracted Features Information In Talker Identification», IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '91, May 14-17, pp. 385-388, Toronto, Canada.

[Figure: speech input layer (time × frequency) → convolutional hidden layers → extracted features → one HMM per speaker (HMM Speaker 1, HMM Speaker 2, ..., HMM Speaker N).]

[Photos: Younès BENNANI, Patrick GALLINARI]

Big names in digital technology recruit big names in Machine Learning: Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew Ng.

Why this strong comeback?
- Big Data: data in all directions... An exponential explosion in the quantity of data.
- GPU (Graphics Processing Unit) computing: using the GPU in parallel with the CPU to accelerate computations. Example: the NVIDIA DGX-1 Deep Learning System.

How does the GPU work?

The graphics card becomes a "coprocessor" that supports the main processor of a system.

CPU: Central Processing Unit; GPU: Graphics Processing Unit

Learning in Deep Networks

- Constructing a feature space
  - Note that this is what we do with kernels in SVMs or with hidden layers in MLPs, etc., but now we build the representation space using deep architectures.
  - Unsupervised learning between layers can decompose the problem into distributed sub-problems (with higher levels of abstraction), which are further decomposed at successive layers.
- Difficulties in supervised learning of deep networks
  - The first layers of an MLP are not learned well:
    - Gradient diffusion: the error signal shrinks as it propagates back to the earlier layers, so the gradient propagates "badly" from the output to the input.
    - This leads to very slow learning.
    - The deeper the network, the more non-linear it is, which increases the chances of running into these obstacles to optimisation.
    - The lower layers are left with not very useful transformations of the input.
    - We need a way to help the first layers do an efficient job.
  - Often, not enough labelled data are available:
    - Can we use unsupervised or semi-supervised approaches to take advantage of unlabelled data?
  - Deep networks tend to have more local-minima problems than simple networks during supervised learning.

The challenge of learning in deep networks
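The gradient diffusion effect can be reproduced on a toy example. The sketch below is an illustration under simplifying assumptions (a hypothetical chain of single sigmoid units, each with weight 1 and no bias, fed with the input 0.5): by the chain rule, every layer multiplies the back-propagated gradient by $w\,f'(a)$, and since $f'(a) = f(a)(1 - f(a)) \le 0.25$ for the sigmoid, the gradient reaching the first layer shrinks geometrically with depth.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def input_gradient(depth, weight=1.0, x=0.5):
    """d(output)/d(input) for a toy chain of `depth` single sigmoid units,
    all sharing the same scalar weight and no bias (hypothetical network)."""
    grad = 1.0
    for _ in range(depth):
        y = sigmoid(weight * x)
        # Chain rule: each layer multiplies the gradient by w * f'(a),
        # where f'(a) = f(a) * (1 - f(a)) <= 0.25 for the sigmoid.
        grad *= weight * y * (1.0 - y)
        x = y
    return grad


for depth in (1, 2, 5, 10, 20):
    print(f"depth {depth:2d}: gradient at the input = {input_gradient(depth):.3e}")
```

With small weights like these, the first layers receive a learning signal many orders of magnitude weaker than the last ones, which is why they learn so slowly.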

• Living organisms, even primitive ones (e.g. insects), carry out complex information-processing tasks:
  • orientation
  • communication
  • social behaviour
  • ...
• The processing power of their nervous system comes from:
  • massive interconnection (10^14 connections in humans);
  • a large number of simple processing units, the neurons (10^11 in humans).
• The initial motivation was neuromimicry (imitating the nervous system):
  • however, the 1940s vision was rather simplistic;
  • the biological reality has since proved more complex.
• On the other hand, this idea has proved very fruitful in mathematics and engineering.

The original idea (≈1940)

1943, McCulloch & Pitts

A model, not a copy, of the biological neuron: an elementary processor characterized by
- input signals $x = (x_0, x_1, \dots, x_d)$;
- connection weights $w = (w_0, w_1, \dots, w_d)$;
- an activation function $F(x, w)$;
- an internal activation state $a = F(x, w)$;
- a transition function $f(a)$;
- an output state $y = f(a)$.

[Figure: inputs x0, x1, ..., xd with weights w0, w1, ..., wd feed F(x, w), which gives the activation a; f(a) gives the output y.]

The artificial neuron

Definition: A formal (artificial) neuron is a processing unit that receives input data in the form of a vector and produces a real-valued output. This output is a function of the inputs and of the connection weights.

[Figure: unit j receives inputs x_0, x_1, ..., x_d through weights w_{j0}, w_{j1}, ..., w_{jd}.]

$$a = F(x, w) = w \cdot x = \sum_{i} w_i x_i, \qquad y = f\Big(\sum_{i} w_i x_i\Big)$$

with, for example, $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.
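A formal neuron takes only a few lines of code. In this sketch (our own, with hypothetical names), `x` and `w` are plain Python lists of the same length, $F$ is the dot product and $f$ defaults to the hyperbolic tangent given above.

```python
import math


def formal_neuron(x, w, f=math.tanh):
    """Formal neuron: internal activation a = F(x, w) = w . x, output y = f(a).

    x and w are sequences of the same length; x[0] can be set to 1 so that
    w[0] acts as a bias weight.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    a = sum(wi * xi for wi, xi in zip(w, x))   # internal activation state
    return f(a)                                # output state


y = formal_neuron([1.0, 0.5, -1.0], [0.1, 0.8, 0.3])
print(y)   # tanh(0.1 + 0.4 - 0.3)
```

Passing another function as `f` (for example a sigmoid) changes only the transition function, not the activation.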

Activation functions

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

$$f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x)\,\big(1 - f(x)\big)$$

$$f(x) = \frac{e^{x} - 1}{e^{x} + 1}$$
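The functions above, and the identity $f'(x) = f(x)(1 - f(x))$ for the logistic sigmoid, can be checked numerically. This sketch (function names are ours) compares the analytic derivative with a central finite difference. Note also that $\frac{e^x - 1}{e^x + 1} = \tanh(x/2)$, so the third function is a rescaled hyperbolic tangent.

```python
import math


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bipolar_sigmoid(x):
    """(e^x - 1) / (e^x + 1), values in (-1, 1); equal to tanh(x / 2)."""
    return (math.exp(x) - 1.0) / (math.exp(x) + 1.0)


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)                        # f'(x) = f(x) (1 - f(x))


def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)    # central finite difference


for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_prime(x), numerical_derivative(sigmoid, x))
```

The derivative of the sigmoid is cheap to compute because it reuses the value $f(x)$ already obtained in the forward pass.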


Rectified Linear Units (ReLU): $f(x) = \max(0, x)$

- More efficient gradient propagation: the derivative is 0 or constant, and the constant can simply be folded into the learning rate.
- More efficient computation: only comparison, addition and multiplication.
  - Leaky ReLU: $f(x) = x$ if $x > 0$, else $a x$, where $0 < a < 1$, so that the derivative is not 0 and some learning can still take place for $x \le 0$.
  - Lots of other variations.
- Sparse activation: for example, in a randomly initialized network, only about 50% of hidden units are activated (have a non-zero output).
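The following sketch illustrates these points (it is not taken from the course). `relu_prime` is 0 or 1, `leaky_relu_prime` never vanishes when `alpha > 0`, and `active_fraction` estimates how many ReLU units are active for one random input. The layer sizes, the zero-mean Gaussian initialization and `alpha = 0.01` are assumptions.

```python
import random


def relu(x):
    return x if x > 0 else 0.0


def relu_prime(x):
    return 1.0 if x > 0 else 0.0          # derivative is 0 or a constant


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x


def leaky_relu_prime(x, alpha=0.01):
    return 1.0 if x > 0 else alpha        # never 0 as long as alpha > 0


def active_fraction(n_inputs=100, n_hidden=1000, seed=0):
    """Fraction of ReLU hidden units with a non-zero output for one random
    input, with weights drawn from a zero-mean Gaussian (assumed init)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
    active = 0
    for _ in range(n_hidden):
        w = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        if relu(sum(wi * xi for wi, xi in zip(w, x))) > 0:
            active += 1
    return active / n_hidden


print(active_fraction())
```

Because the weights are symmetric around zero, each pre-activation is positive with probability 1/2, which is where the "about 50%" figure comes from.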

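The distance neuron

[Figure: unit j receives inputs x_0, x_1, ..., x_n through weights w_{j0}, w_{j1}, ..., w_{jn}.]

Instead of a dot product, the activation is the squared distance between the weight vector and the input:

$$a_j = F(x, w) = \lVert w_j - x \rVert^2 = \sum_{i \in \mathrm{Up}(j)} (w_{ji} - x_i)^2, \qquad s_j = f\Big(\sum_{i \in \mathrm{Up}(j)} (w_{ji} - x_i)^2\Big)$$

where $\mathrm{Up}(j)$ (in French, Amont(j)) is the set of units upstream of $j$, i.e. the units $i = 0, \dots, n$ connected to $j$. Note that

$$\lVert w - x \rVert^2 = \lVert w \rVert^2 - 2\,\langle w, x \rangle + \lVert x \rVert^2$$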
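A distance neuron differs from the dot-product neuron only in how the activation is computed. The sketch below (hypothetical names) computes $a_j = \lVert w_j - x\rVert^2$ directly and through the expansion $\lVert w\rVert^2 - 2\langle w, x\rangle + \lVert x\rVert^2$. The output function $f(a) = e^{-a}$ is an assumption for illustration only: it makes the neuron respond most strongly when the input is close to its weight vector.

```python
import math


def distance_activation(x, w):
    """a_j = ||w_j - x||^2 = sum over upstream units i of (w_ji - x_i)^2."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))


def expanded_activation(x, w):
    """The same quantity via ||w||^2 - 2 <w, x> + ||x||^2."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return sum(wi * wi for wi in w) - 2.0 * dot + sum(xi * xi for xi in x)


def distance_neuron(x, w, f=lambda a: math.exp(-a)):
    """s_j = f(||w_j - x||^2); f(a) = exp(-a) is only an illustrative choice."""
    return f(distance_activation(x, w))


x = [1.0, 2.0, 0.5]
w = [0.0, 2.5, 1.0]
print(distance_activation(x, w), expanded_activation(x, w), distance_neuron(x, w))
```

The expanded form shows that, when $\lVert w \rVert$ and $\lVert x \rVert$ are fixed, minimizing the distance amounts to maximizing the dot product.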

Artificial Neural Network

[Figure: inputs x_0, x_1, ..., x_D feed the network, which produces outputs y_1(x, w), ..., y_K(x, w).]

Definition: A neural network is a weighted directed graph consisting of a set of units (or automata) performing elementary computations, structured in successive layers that exchange information through the connections linking them.

- A network is characterized by:
  - its architecture;
  - the functions of its elements.
- Massively parallel architecture.
- System based on the cooperation of several simple units (formal neurons).

Adaptation of the network parameters (synaptic efficiency).

How do Artificial Neural Networks work?

Adaline: Adaptive Linear Element
Stanford, 1960, Bernard Widrow*

[Figure: inputs x_0 = 1, x_1, ..., x_D with weights w_0, w_1, ..., w_D are summed into a = Σ_{i=0}^{D} w_i x_i, which gives the output y.]

It is an adaptive linear element:

$$y = \sum_{i=0}^{D} w_i x_i$$

The unit $x_0$, whose activation is fixed to 1, is called the bias unit. It introduces bias terms into the network:

$$y = \psi(x, w) = w_0 + \sum_{i=1}^{D} w_i x_i = w_0 + w^T x$$

For classification, the output is passed through the threshold function

$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$

In the plane $(x_1, x_2)$, the decision boundary $w_0 + w^T x = 0$ is a line; for any point $x$ on it, $l = \dfrac{w^T x}{\lVert w \rVert} = -\dfrac{w_0}{\lVert w \rVert}$ is the signed distance of the line from the origin.

* Widrow B., Hoff M.E. (1960): « Adaptive switching circuits », IRE WESCON Convention Record, part 4, pp. 96-104.

[Figure: two classes (x and o) in the plane, without bias and with bias. Without bias, the separating line must pass through the origin; with bias, it can be shifted to separate the classes.]

It can be used for discrimination (classification) between 2 classes $C_1$ and $C_2$:

[Figure: points of classes C_1 and C_2 on either side of the line ψ(x, w) = 0, with ψ(x, w) > 0 on one side and ψ(x, w) < 0 on the other.]

$$\psi(x, w) = w_0 + w^T x \quad \begin{cases} > 0 & \text{if } x \in C_1 \\ < 0 & \text{if } x \in C_2 \end{cases}$$

If we call $z^{(n)} = (x^{(n)}, d^{(n)})$ the example taken into account at iteration $n$, where $d^{(n)}$ is the desired output, we define the squared instantaneous error by:

$$R^{(n)}_{\text{Adaline}}(w) = \big(d^{(n)} - w^T x^{(n)}\big)^2$$

where $y = w^T x^{(n)}$ is the computed output (with $x_0 = 1$, so the bias is included in $w^T x^{(n)}$).

The mean squared error (MSE) is defined as the mean of the squared instantaneous errors observed over all examples:

$$R_{\text{Adaline}}(w) = \frac{1}{N} \sum_{n=1}^{N} R^{(n)}_{\text{Adaline}}(w)$$

There are several learning algorithms.

Gradient descent techniques (steepest descent): suppose that at time $t$ the weights of the Adaline are $w(t)$ and we present the example $(x^{(n)}, d^{(n)})$; then the weights are changed by:

$$w(t+1) = w(t) - \varepsilon(t)\, \nabla_w R^{(n)}_{\text{Adaline}}(w)$$

where $\varepsilon(t)$ is the step of the gradient and the instantaneous gradient is

$$\nabla_w R^{(n)}_{\text{Adaline}}(w) = \frac{\partial R^{(n)}_{\text{Adaline}}(w)}{\partial w} = -2\,\big(d^{(n)} - w^T x^{(n)}\big)\, x^{(n)}$$

This rule is called the stochastic gradient rule, the Widrow-Hoff rule, the Widrow-Hoff delta rule, or the µ-LMS (Least Mean Square) rule.


Learning algorithm ADALINE

1- Initialize the weights $w(0)$ randomly.
2- Randomly choose an example $(x^{(n)}, d^{(n)})$.
3- Determine the value of the error: $e^{(n)}(t) = d^{(n)} - w(t)^T x^{(n)}$.
4- Calculate the gradient approximation: $\nabla_w R^{(n)}_{\text{Adaline}}(w) = -2\, e^{(n)}(t)\, x^{(n)}$.
5- Adapt the weights: $w(t+1) = w(t) - \varepsilon(t)\, \nabla_w R^{(n)}_{\text{Adaline}}(w)$, where $\varepsilon(t)$ is the step of the gradient.
6- Repeat from 2 to 5 until an acceptable error value is obtained.
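The steps of the ADALINE algorithm can be sketched in Python as follows. This is an illustration, not the course's reference code: it assumes a constant step $\varepsilon$ instead of $\varepsilon(t)$, and replaces the "acceptable error" stopping test by a fixed number of presentations. The bias unit $x_0 = 1$ is prepended to each input, and all names are hypothetical.

```python
import random


def adaline_output(w, x):
    """Linear output y = w0 + sum_i w_i x_i (w[0] is the weight of the bias unit x0 = 1)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def adaline_train(data, eps=0.01, epochs=200, seed=0):
    """Widrow-Hoff (LMS) rule with a constant step eps.

    data: list of (x, d) pairs; x is a tuple of inputs without the bias unit.
    Runs epochs * len(data) presentations instead of testing an error threshold.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(data[0][0]) + 1)]  # 1- random init
    for _ in range(epochs * len(data)):
        x, d = rng.choice(data)                          # 2- random example
        xa = (1.0,) + tuple(x)                           # prepend the bias unit x0 = 1
        e = d - adaline_output(w, x)                     # 3- error e(n) = d(n) - w.x(n)
        grad = [-2.0 * e * xi for xi in xa]              # 4- instantaneous gradient
        w = [wi - eps * gi for wi, gi in zip(w, grad)]   # 5- w <- w - eps * grad
    return w


def adaline_classify(w, x):
    """Threshold the linear output: 1 if positive, -1 otherwise."""
    return 1 if adaline_output(w, x) > 0 else -1


# Usage on the OR truth table (inputs and desired outputs in {-1, 1}).
OR_DATA = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w = adaline_train(OR_DATA)
print([round(wi, 2) for wi in w])
print([adaline_classify(w, x) for x, _ in OR_DATA])
```

On the OR truth table, the least-squares solution is $w_0 = w_1 = w_2 = 0.5$ (the ±1 input columns are orthogonal), and the stochastic updates settle close to it; thresholding the output then reproduces the truth table.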

Adaline: Example

Data: truth table of a Boolean function of 2 variables.

Problem: find an Adaline able to learn the truth table of a Boolean function of 2 variables.

Function: $\psi(x, w) = x_1 \lor x_2$ ($x_1$ OR $x_2$)

x0   x1   x2   d
 1    1    1    1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1

In general, the training set is $D_N = \{(x^{(1)}, d^{(1)}), (x^{(2)}, d^{(2)}), \dots, (x^{(N)}, d^{(N)})\}$; here

$$D_4 = \left\{ \left(\begin{pmatrix}1\\1\end{pmatrix}, 1\right);\ \left(\begin{pmatrix}1\\-1\end{pmatrix}, 1\right);\ \left(\begin{pmatrix}-1\\1\end{pmatrix}, 1\right);\ \left(\begin{pmatrix}-1\\-1\end{pmatrix}, -1\right) \right\}$$

[Figure: Adaline with the bias unit x0 = 1 and inputs x1, x2, weights w0, w1, w2, computing a = Σ_{i=0}^{2} w_i x_i and the output y; the four examples plotted in the (x1, x2) plane.]

[Figure: the Adaline above and the four examples in the (x1, x2) plane, separated by the learned line]

$$w_1 x_1 + w_2 x_2 + w_0 = 0$$

Adaline: limitations

$$\psi(x, w) = x_1 \oplus x_2 = \mathrm{XOR}(x_1, x_2)$$

x0   x1   x2   d
 1    1    1   -1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1

[Figure: the four XOR examples in the (x1, x2) plane: which line could separate the two classes?]

Linear separability

[Figure: classes A and B in the (x1, x2) plane, linearly separable vs. non-linearly separable.]

« Two classes of objects, described in a space of dimension n, are said to be "linearly separable" if they lie on either side of a hyperplane in the representation space. »

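Madaline: Multi-Adaptive Linear Element

Goal: $\psi(x, w) = x_1 \oplus x_2$

Adaline with polynomial preprocessing

[Figure: the inputs x1 and x2 are expanded (identity and squaring blocks) into x1, x1², x1·x2, x2, x2², which feed an Adaline with weights w0, ..., w5.]

The decision boundary is then

$$w_1 x_1 + w_2 x_1^2 + w_3 x_1 x_2 + w_4 x_2 + w_5 x_2^2 + w_0 = 0$$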

[Figure: in the (x1, x2) plane, this boundary is a separating ellipse.]
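With ±1 inputs, the polynomial preprocessing is enough to make XOR learnable by a single Adaline. The sketch below is our own illustration: the feature order matches $w_0, \dots, w_5$ in the equation above, the Widrow-Hoff rule is applied with a constant step, zero initial weights and a shuffled presentation order, and the XOR targets follow the ±1 convention of the truth table. Since $x_1^2 = x_2^2 = 1$ for these inputs, the useful feature is $x_1 x_2$, and training drives $w_3$ towards $-1$ (indeed $-x_1 x_2$ is exactly the XOR target).

```python
import random


def poly_features(x1, x2):
    """Polynomial preprocessing (1, x1, x1^2, x1*x2, x2, x2^2), matching w0..w5."""
    return (1.0, x1, x1 * x1, x1 * x2, x2, x2 * x2)


def lms_fit(samples, eps=0.02, epochs=300, seed=0):
    """Widrow-Hoff rule on pre-expanded feature vectors, shuffled each epoch."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            phi, d = samples[k]
            e = d - sum(wi * pi for wi, pi in zip(w, phi))
            w = [wi + 2.0 * eps * e * pi for wi, pi in zip(w, phi)]  # w - eps * (-2 e phi)
    return w


# XOR with the {-1, 1} convention: d = 1 when exactly one input is 1.
XOR_DATA = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
samples = [(poly_features(*x), d) for x, d in XOR_DATA]
w = lms_fit(samples)
outputs = [sum(wi * pi for wi, pi in zip(w, phi)) for phi, _ in samples]
print([round(wi, 3) for wi in w])
print([1 if y > 0 else -1 for y in outputs])
```

This particular solution gives the degenerate conic $x_1 x_2 = 0$ (the two axes) as boundary rather than an ellipse; other weight choices in the same family give other conics.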

Madaline = a set of parallel Adalines

[Figure: Adaline 1 (weights w_01, w_11, w_21) and Adaline 2 (weights w_02, w_12, w_22) both receive x0, x1, x2 and produce z1 and z2; Adaline 3 (weights w_31, w_32) combines them into the output y = z1 ∧ z2 = x1 ⊕ x2. Each of Adaline 1 and Adaline 2 defines one line in the (x1, x2) plane.]

[Table: for each of the four examples, the outputs z1 and z2 of Adalines 1 and 2 and the desired output d.]
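A Madaline for XOR can be written down by hand. In this sketch the weights are our own illustrative choice (not read from the slides): Adaline 1 computes $x_1 \lor x_2$, Adaline 2 computes $\lnot(x_1 \land x_2)$, and Adaline 3 computes $z_1 \land z_2$, all with outputs in $\{-1, 1\}$ and the threshold $f(a) = 1$ if $a \ge 0$, else $-1$.

```python
def threshold(a):
    """f(a) = 1 if a >= 0 else -1."""
    return 1 if a >= 0 else -1


def adaline_unit(w0, w1, w2, u1, u2):
    """Thresholded Adaline with two inputs: f(w0 + w1*u1 + w2*u2)."""
    return threshold(w0 + w1 * u1 + w2 * u2)


def madaline_xor(x1, x2):
    # Hand-picked, illustrative weights (not taken from the slides):
    z1 = adaline_unit(1, 1, 1, x1, x2)       # Adaline 1: x1 OR x2
    z2 = adaline_unit(1, -1, -1, x1, x2)     # Adaline 2: NOT (x1 AND x2)
    return adaline_unit(-1, 1, 1, z1, z2)    # Adaline 3: z1 AND z2


for x1 in (1, -1):
    for x2 in (1, -1):
        print(x1, x2, madaline_xor(x1, x2))
```

None of the weighted sums is ever exactly 0 on the four examples, so the convention chosen at the threshold does not matter here.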

Perceptron
Rosenblatt F., 1957, 1962*

1957, Frank Rosenblatt

The perceptron does not designate a single model but brings together an important family of algorithms.

The perceptron = an adaptive machine used to solve classification (discrimination) problems.

[Figure: input X → perceptron → output ψ(x, w)]

* Rosenblatt F. (1957): « The perceptron: a perceiving and recognizing automaton », Report 85-460-1, Cornell Aeronautical Laboratory, Ithaca, N.Y.
* Rosenblatt F. (1962): « Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms », Spartan Books, Washington.

[Figure: the retina feeds the association cells φ_1, φ_2, ..., φ_D; the decision cell computes Σ_{i=0}^{D} w_i φ_i(x) with weights w_0, w_1, ..., w_D and outputs ψ(x, w).]

- The retina receives the information from the outside.
- The association cells: each cell has a transition function defined on the retina, $\varphi_i(x): R \to \mathbb{R}$.
- The decision cell computes

$$\psi(x, w) = f\Big(w_0 + \sum_{i=1}^{D} w_i \varphi_i(x)\Big) = f(w^T \varphi), \qquad w^T = (w_0, w_1, \dots, w_D), \quad \varphi = \big(1, \varphi_1(x), \varphi_2(x), \dots, \varphi_D(x)\big)^T$$

with the threshold function

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Perceptron: 2-class case

[Figure: examples of classes C_1 and C_2 separated by the boundary between the regions ψ(x, w) = 1 and ψ(x, w) = -1.]

A Perceptron can be seen as a 2-class classifier:

$$C_1 = \{x \in R : \psi(x, w) = 1\}, \qquad C_2 = \{x \in R : \psi(x, w) = -1\}$$

$$\psi(x, w) = f\Big(w_0 + \sum_{i=1}^{D} w_i \varphi_i(x)\Big) = \begin{cases} 1 & \text{if } x \in C_1 \\ -1 & \text{if } x \in C_2 \end{cases}$$

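If we call $(x^{(n)}, d^{(n)})$ the example taken into account at iteration $n$, with

$$d^{(n)} = \begin{cases} 1 & \text{if } x^{(n)} \in C_1 \\ -1 & \text{if } x^{(n)} \in C_2 \end{cases}$$

a correct classification requires $w^T \varphi^{(n)} > 0$ if $x^{(n)} \in C_1$ and $w^T \varphi^{(n)} < 0$ if $x^{(n)} \in C_2$, that is:

$$\forall x^{(n)}, \quad w^T \varphi^{(n)} d^{(n)} > 0$$

For a misclassified example, we define the instantaneous error by (and $R^{(n)}(w) = 0$ for a well-classified one):

$$R^{(n)}(w) = -\,w^T \varphi^{(n)} d^{(n)}$$

The mean error over the training set is:

$$R(w) = \frac{1}{N} \sum_{n=1}^{N} R^{(n)}(w) = -\frac{1}{N} \sum_{x^{(n)} \text{ misclassified}} w^T \varphi^{(n)} d^{(n)}$$

There are several learning algorithms.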

Perceptron: 2-class case

Gradient descent techniques (steepest descent): suppose that at time $t$ the weights of the Perceptron are $w(t)$ and we present the example $(x^{(n)}, d^{(n)})$; then the weights are changed by:

$$w(t+1) = w(t) - \varepsilon(t)\, \nabla_w R^{(n)}(w)$$

where $\varepsilon(t)$ is the step of the gradient and the instantaneous gradient is

$$\nabla_w R^{(n)}(w) = \frac{\partial R^{(n)}(w)}{\partial w} = -\,d^{(n)} \varphi^{(n)}$$

Stopping criterion: a satisfactory classification rate.

Perceptron learning algorithm: 2-class case

1- Initialize the weights $w(0)$ randomly.
2- Randomly choose an example $(x^{(n)}, d^{(n)})$.
3- Compute the output of the perceptron, $\psi(x^{(n)}, w) = f\big(w^T \varphi^{(n)}\big) = f\Big(\sum_{i=0}^{D} w_i \varphi_i(x^{(n)})\Big)$, and compare it to $d^{(n)}$.
4- Adapt the weights:
   - if $x^{(n)}$ is well classified, $\psi(x^{(n)}, w) = d^{(n)}$: $w(t+1) = w(t)$;
   - if $x^{(n)}$ is misclassified, $\psi(x^{(n)}, w) \neq d^{(n)}$: $w(t+1) = w(t) + \varepsilon(t)\, d^{(n)} \varphi^{(n)}$,
   where $\varepsilon(t)$ is the step of the gradient.
5- Repeat from 2 to 4 until an acceptable error value is obtained.
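The 2-class perceptron rule can be sketched as below. Assumptions (not from the course): the association cells are the identity, so $\varphi(x) = (1, x_1, \dots, x_D)$; examples are visited in a random order within each epoch; and training stops at the first epoch with no misclassification (a 100% classification rate). For linearly separable data such as the OR example, the convergence theorem guarantees this happens after a finite number of updates.

```python
import random


def perceptron_output(w, phi):
    """psi(x, w) = f(w^T phi), with f(a) = 1 if a >= 0 else -1."""
    return 1 if sum(wi * pi for wi, pi in zip(w, phi)) >= 0 else -1


def perceptron_train(data, eps=0.1, max_epochs=100, seed=0):
    """2-class perceptron rule. data: list of (x, d) with d in {-1, 1}.
    Assumes phi(x) = (1, x1, ..., xD): identity association cells plus bias."""
    rng = random.Random(seed)
    samples = [((1.0,) + tuple(x), d) for x, d in data]
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0][0]))]   # 1- random init
    for _ in range(max_epochs):
        errors = 0
        rng.shuffle(samples)                                   # 2- random order
        for phi, d in samples:
            if perceptron_output(w, phi) != d:                 # 3- compare with d
                w = [wi + eps * d * pi for wi, pi in zip(w, phi)]   # 4- misclassified
                errors += 1
        if errors == 0:          # 5- stop when every example is well classified
            break
    return w


OR_DATA = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w = perceptron_train(OR_DATA)
print([perceptron_output(w, (1.0,) + x) for x, _ in OR_DATA])
```

Unlike the Adaline, the weights change only when an example is misclassified, so training stops as soon as the OR table is reproduced.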

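Perceptron example

x0   x1   x2   d
 1    1    1    1
 1    1   -1    1
 1   -1    1    1
 1   -1   -1   -1

$\psi(x, w) = x_1 \lor x_2$, computed as $f\big(\sum_{i=0}^{D} w_i \varphi_i(x)\big)$ with the association functions $\varphi_1(x^{(n)}) = x_1^{(n)}$ and $\varphi_2(x^{(n)}) = x_2^{(n)}$.

[Figure: perceptron with weights w0, w1, w2, and the examples in the (x1, x2) plane.]

Applet: http://lcn.epfl.ch/tutorial/french/perceptron/html/index.html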

Perceptron: p-class case $C_1, C_2, \dots, C_p$

[Figure: one output cell per class; cell i computes Σ_{j=0}^{D} w_{ij} φ_j(x) with weights w_{i0}, w_{i1}, ..., w_{iD}, and a Max unit selects the decision.]

$$\psi(x, w) = C_i \iff \forall k \neq i, \quad w_i^T \varphi > w_k^T \varphi \iff \forall k \neq i, \quad \sum_{j=0}^{D} w_{ij} \varphi_j(x) > \sum_{j=0}^{D} w_{kj} \varphi_j(x)$$

Perceptron learning algorithm: p-class case

1- Initialize the weights randomly.
2- Randomly choose a learning example $(x^{(n)}, d^{(n)})$, with $x^{(n)} \in C_k$.
3- Compute the output of the perceptron, $\psi(x^{(n)}, w) = C_l \iff l = \operatorname{Argmax}_i\, w_i^T \varphi^{(n)}$, and compare it to $d^{(n)} = C_k$.
4- Adapt the weights:
   - if $x^{(n)}$ is well classified ($C_l = C_k$): $w_i(t+1) = w_i(t)$ for all $i$;
   - if $x^{(n)}$ is misclassified ($C_l \neq C_k$):
     $w_k(t+1) = w_k(t) + \varepsilon(t)\, \varphi^{(n)}$,
     $w_l(t+1) = w_l(t) - \varepsilon(t)\, \varphi^{(n)}$,
     $w_i(t+1) = w_i(t)$ for all $i \neq k, l$,
   where $\varepsilon(t)$ is the step of the gradient.
5- Repeat from 2 to 4 until an acceptable error value is obtained.
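A minimal sketch of the p-class rule, under the same assumptions as for the 2-class sketch idea: $\varphi(x) = (1, x_1, \dots, x_D)$, random initialization, shuffled presentation, and a stop after an error-free epoch. Classes are indexed 0 to $p-1$ in the code (for $C_1, \dots, C_p$), and the three-cluster dataset is hypothetical toy data chosen to be linearly separable.

```python
import random


def scores(W, phi):
    return [sum(wij * pj for wij, pj in zip(wi, phi)) for wi in W]


def predict_class(W, phi):
    """Return l = Argmax_i w_i^T phi (ties go to the lowest index)."""
    s = scores(W, phi)
    return max(range(len(W)), key=lambda i: s[i])


def multiclass_perceptron(data, n_classes, eps=0.1, max_epochs=200, seed=0):
    """p-class perceptron rule. data: list of (x, k), k in 0..n_classes-1.
    Assumes phi(x) = (1, x1, ..., xD) (identity association cells + bias)."""
    rng = random.Random(seed)
    samples = [((1.0,) + tuple(x), k) for x, k in data]
    dim = len(samples[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_classes)]  # step 1
    for _ in range(max_epochs):
        errors = 0
        rng.shuffle(samples)                       # step 2: random order of examples
        for phi, k in samples:
            l = predict_class(W, phi)              # step 3
            if l != k:                             # step 4: misclassified
                W[k] = [w + eps * p for w, p in zip(W[k], phi)]   # reinforce true class
                W[l] = [w - eps * p for w, p in zip(W[l], phi)]   # penalize the winner
                errors += 1
        if errors == 0:                            # every example well classified
            break
    return W


# Hypothetical toy data: three clusters, classes numbered 0, 1, 2.
DATA = [((2, 0), 0), ((3, 1), 0), ((3, -1), 0),
        ((-1, 2), 1), ((0, 3), 1), ((-2, 3), 1),
        ((-1, -2), 2), ((0, -3), 2), ((-2, -3), 2)]
W = multiclass_perceptron(DATA, n_classes=3)
print([predict_class(W, (1.0,) + x) for x, _ in DATA])
```

Only the two weight vectors involved in a mistake (the true class and the wrongly winning class) are modified, exactly as in step 4.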

Perceptron vs Adaline: 2-class case

[Figure: classes A and B. The solution found by the Adaline is the most robust separation between the classes; the solution found by the Perceptron is a separation that minimizes the number of errors.]

[Diagram: for both the Perceptron and the Adaline, the error is obtained by comparing the supervision with the computed output.]

Perceptron convergence theorem

« If a set of examples is linearly separable, then the Perceptron learning algorithm converges to a correct solution in a finite number of iterations. »

- Arbib M.A. (1987): « Brains, Machines, and Mathematics », Berlin, Springer-Verlag.
- Rosenblatt F. (1962): « Principles of Neurodynamics », N.Y., Spartan.
- Block H.D. (1962): « The Perceptron: A Model for Brain Functioning », Reviews of Modern Physics 34, 123-135.
- Minsky M.L. & Papert S.A. (1969): « Perceptrons », Cambridge, MIT Press.
- Diederich S. & Opper M. (1987): « Learning of Correlated Patterns in Spin-Glass Networks by Local Learning Rules », Physical Review Letters 58, 949-952.

Multi-layer Architecture

The credit assignment problem

Given a layered network and a set of input-output example pairs:
- we can apply the Perceptron learning algorithm to determine W2;
- but we do not know the desired outputs of the hidden units!
- so we cannot apply the Perceptron learning algorithm to determine W1.

[Figure: input x = (x0, x1, ..., xD) → weights W1 → hidden units → weights W2 → output, compared with the desired output d.]

The Gradient Back-Propagation algorithm brings a simple solution to this problem.
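As a preview of back-propagation, here is a minimal sketch (our own, with hypothetical names) for a network with one hidden layer of tanh units, a linear output and the squared error $E = \frac{1}{2}(d - y)^2$. The output error is sent back through $W_2$ and the derivative $1 - h_j^2$ of tanh, which gives each hidden unit its share of the error: this is how the credit assignment problem is solved. The analytic gradients are checked against central finite differences.

```python
import math
import random


def forward(W1, W2, x):
    """One hidden layer of tanh units, linear output; x0 = 1 and h0 = 1 are bias units."""
    xa = [1.0] + list(x)
    h = [math.tanh(sum(w * xi for w, xi in zip(row, xa))) for row in W1]
    ha = [1.0] + h
    y = sum(w * hi for w, hi in zip(W2, ha))
    return xa, h, ha, y


def loss(W1, W2, x, d):
    return 0.5 * (d - forward(W1, W2, x)[3]) ** 2


def backprop(W1, W2, x, d):
    """Gradients of E = 0.5 * (d - y)^2 with respect to W1 and W2."""
    xa, h, ha, y = forward(W1, W2, x)
    delta_out = y - d                                   # dE/dy at the output
    gW2 = [delta_out * hi for hi in ha]
    # Credit assignment: each hidden unit receives the output error weighted by
    # its outgoing weight and multiplied by tanh'(a) = 1 - h^2.
    delta_h = [delta_out * W2[j + 1] * (1.0 - h[j] ** 2) for j in range(len(h))]
    gW1 = [[dj * xi for xi in xa] for dj in delta_h]
    return gW1, gW2


def numerical_grad(W1, W2, x, d, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    def central(container, i):
        old = container[i]
        container[i] = old + eps
        plus = loss(W1, W2, x, d)
        container[i] = old - eps
        minus = loss(W1, W2, x, d)
        container[i] = old                              # restore the weight
        return (plus - minus) / (2.0 * eps)

    gW1 = [[central(row, i) for i in range(len(row))] for row in W1]
    gW2 = [central(W2, j) for j in range(len(W2))]
    return gW1, gW2


rng = random.Random(0)
D, H = 2, 3                                             # inputs, hidden units
W1 = [[rng.uniform(-1, 1) for _ in range(D + 1)] for _ in range(H)]
W2 = [rng.uniform(-1, 1) for _ in range(H + 1)]
x, d = (0.5, -1.0), 1.0

bp1, bp2 = backprop(W1, W2, x, d)
num1, num2 = numerical_grad(W1, W2, x, d)
max_diff = max(
    max(abs(g - n) for g, n in zip(bp2, num2)),
    max(abs(g - n) for rg, rn in zip(bp1, num1) for g, n in zip(rg, rn)),
)
print("max |backprop - finite difference| =", max_diff)
```

The finite-difference check costs two forward passes per weight, whereas back-propagation obtains all the gradients from one forward and one backward pass, which is what makes it practical for large networks.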

The post-connectionist break

- Control issues: 1963, Bryson A., Denham W., Dreyfus S.: gradient back-propagation.
- The 80s: the exact scope of the work of Minsky & Papert comes to be correctly understood.
- The rediscovery of Gradient Back-Propagation: 1986, LeCun Y.; 1986, Rumelhart D., Hinton G.E., Williams R.

Bryson A., Denham W., Dreyfus S. (1963): « Optimal Programming Problem With Inequality Constraints. I: Necessary Conditions for Extremal Solutions », AIAA Journal, Vol. 1, pp. 25-44.

LeCun Y. (1986): « Learning Processes in Asymmetric Threshold Network », Disordered Systems and Biological Organizations, Les Houches, France, Springer, pp. 223-240.

Rumelhart D., Hinton G.E., Williams R. (1986): « Learning Internal Representations by Error Propagation », in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I, Bradford Books, Cambridge, MA, MIT Press, pp. 318-362.
