A 4H INTRODUCTION TO IMAGE PROCESSING

(1)

5 episodes of Lupin to present the fundamental techniques and deep learning for image processing

Figure: example modalities (echocardiography, MRI T1w, 3D CT scan)

(2)

PART A - IMAGE PROCESSING

Introduction Basic operations Image Filtering, Segmentation, Registration

(3)

WHAT IS AN IMAGE?

• Image definition

• An image may be defined as a two-dimensional function, f(x,y)

• x and y are spatial (plane) coordinates

• the amplitude of f at any pair of coordinates (x,y) is called intensity or gray level of the image at that point

(4)

WHAT IS AN IMAGE?

• Digital Image definition

• The definition of f may be extended:

• as an n-dimensional function,

• e.g. 3D: f(x,y,z) or image sequence f(x,y,t)

• with amplitudes composed as a vector of data,

• e.g. color image: 3 components at each point; complex number

→ Matrices are used to represent, store and manipulate images

Example (RGB image): f(x₁,y₁) = {217, 182, 167}_RGB

(5)

FUNDAMENTAL STEPS IN DIP

Problem domain → Acquisition → Image → Pre-processing → Image → Processing → Extracted attributes

• Acquisition

• Sensors, lighting, image formation, …

• Pre-processing (low- and mid-level)

• Image filtering and enhancement, image restoration, multiresolution processing, reconstruction

• Processing (mid- and high-level)

• Segmentation, morphological processing, image compression, representation and description, measurements, object recognition, image classification, …

(6)

SYSTEM

Problem domain → image processing system, made of:

• Image sensors

• Specialized image processing hardware

• Processor (DSP, computer)

• Image processing software

• Mass storage

• Image displays

• Network

(7)

A1-BASIC OPERATIONS ON IMAGE

Histogram, Spatial operation, Geometric transform

(8)

LINEAR AND NONLINEAR OPERATIONS

• Important classification of an image-processing method

• Is it a linear or a nonlinear method?

• Let H be a general operator

$$H[f(x,y)] = g(x,y)$$

• H is said to be a linear operator if

$$H[a_i f_i(x,y) + a_j f_j(x,y)] = a_i H[f_i(x,y)] + a_j H[f_j(x,y)] = a_i g_i(x,y) + a_j g_j(x,y)$$

→ Tools: vector-matrix, PDE, set, and related operators (+, *, convolution)…
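The linearity condition can be checked numerically on a small example. The sketch below is only an illustration: the function names and the two 2×2 images are assumptions, not part of the course material. It tests the condition for a sum operator (linear) and a max operator (nonlinear).

```python
def apply_sum(img):
    """Sum operator: adds up all pixel values (linear)."""
    return sum(sum(row) for row in img)

def apply_max(img):
    """Max operator: returns the largest pixel value (nonlinear)."""
    return max(max(row) for row in img)

def combine(a_i, f_i, a_j, f_j):
    """Pixel-wise linear combination a_i*f_i + a_j*f_j."""
    return [[a_i * p + a_j * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(f_i, f_j)]

def is_linear_on(H, a_i, f_i, a_j, f_j):
    """Check H[a_i f_i + a_j f_j] == a_i H[f_i] + a_j H[f_j] on one example."""
    left = H(combine(a_i, f_i, a_j, f_j))
    right = a_i * H(f_i) + a_j * H(f_j)
    return left == right

f1 = [[0, 2], [2, 3]]
f2 = [[6, 5], [4, 7]]
print(is_linear_on(apply_sum, 1, f1, -1, f2))  # True
print(is_linear_on(apply_max, 1, f1, -1, f2))  # False
```

A single counterexample is enough to show that an operator is nonlinear; passing on one example does not prove linearity.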

(9)

OPERATIONS ON IMAGES

• Spatial Operations - Single pixel operations

• r: original intensity I(i,j),

• s: new intensity J(i,j),

• T: a transformation function

$$s = T(r)$$

• Range of intensities: [0, L-1]

Figure: some basic grey-level transformation functions used for image enhancement

Example: negative (original image and its negative)
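As a minimal sketch of a single-pixel operation, the negative can be written as s = T(r) = (L - 1) - r over the intensity range [0, L-1]. The function name and the sample image are illustrative assumptions.

```python
def negative(image, L=256):
    """Apply s = T(r) = (L - 1) - r to every pixel of a grey-level image."""
    return [[(L - 1) - r for r in row] for row in image]

img = [[0, 64], [128, 255]]
print(negative(img))  # [[255, 191], [127, 0]]
```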

(10)

OPERATIONS ON IMAGES: HISTOGRAM

• Examples

Figure: example images with their histograms (labels: Mean = 100, Mean = 130, 20)

Histogram Equalization

Figure: intensity mapping v = f(u), with u in [0, 255]
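One common way to build the mapping v = f(u) for histogram equalization is to use the normalized cumulative histogram. The sketch below assumes 8-bit grey levels and a list-of-lists image; the function names are hypothetical.

```python
def histogram(image, L=256):
    """Count how many pixels take each grey level in [0, L-1]."""
    h = [0] * L
    for row in image:
        for u in row:
            h[u] += 1
    return h

def equalization_lut(image, L=256):
    """Build the lookup table v = f(u) from the normalized cumulative histogram."""
    h = histogram(image, L)
    n = sum(h)
    lut, cum = [], 0
    for count in h:
        cum += count
        lut.append(round((L - 1) * cum / n))
    return lut

def equalize(image, L=256):
    """Apply the lookup table to every pixel."""
    lut = equalization_lut(image, L)
    return [[lut[u] for u in row] for row in image]

# Low-contrast image: all values packed between 100 and 103
img = [[100, 100, 101], [101, 102, 103]]
out = equalize(img)
print(out)
```

After equalization the four grey levels are spread over the full range, the brightest one being mapped to 255.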

(12)

LOCAL OPERATIONS ON IMAGES

• Arithmetic Operations

• Application: corrupted image g obtained by adding the noise η to a noiseless image f

$$g(x,y) = f(x,y) + \eta(x,y)$$

• Assumptions:

• at every pair of coordinates (x,y) the noise is uncorrelated

• the noise has zero average value

→ Noise reduction by summing (averaging) a set of K noisy images

$$\bar{g}(x,y) = \frac{1}{K} \sum_{i=1}^{K} g_i(x,y)$$

(13)

Figure: noise reduction by averaging (original image and averaged results)
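The effect of averaging can be simulated in a few lines. This sketch assumes zero-mean, uncorrelated Gaussian noise added to a constant image; with K images the residual noise standard deviation is expected to fall roughly as 1/√K. All names and values (image size, sigma = 25, K = 16) are illustrative choices.

```python
import random
import statistics

def noisy_copy(image, sigma, rng):
    """Return g = f + eta, with eta zero-mean Gaussian noise of std sigma."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]

def average_images(images):
    """Pixel-wise mean of K images of identical size."""
    K = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / K for c in range(cols)]
            for r in range(rows)]

def noise_std(image, reference):
    """Standard deviation of the difference between an image and the noiseless one."""
    errors = [p - q for row, ref in zip(image, reference) for p, q in zip(row, ref)]
    return statistics.pstdev(errors)

rng = random.Random(0)
f = [[100.0] * 32 for _ in range(32)]
sigma = 25.0
single = noisy_copy(f, sigma, rng)
averaged = average_images([noisy_copy(f, sigma, rng) for _ in range(16)])
print(noise_std(single, f), noise_std(averaged, f))
```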

(14)

• Spatial Operations - Neighborhood operations

• S_xy: set of coordinates of a neighborhood centered on a point (x,y) in an image f.

• Neighborhood processing generates one corresponding pixel in the output image g at the same (x,y) coordinates.

• The value of that pixel in g is determined by an operation involving the pixels in S_xy.

(15)

OPERATIONS ON IMAGES

• Spatial Operations - Neighborhood operations

Example mask (5×5):

-1, -1, -1, -1, -1
-1, -1, -1, -1, -1
-1, -1, 24, -1, -1
-1, -1, -1, -1, -1
-1, -1, -1, -1, -1

Example: mean of S_xy, with S_xy of size m × n

$$g(x,y) = \frac{1}{m \cdot n} \sum_{(r,c) \in S_{xy}} f(r,c)$$

FIR filters…

(16)

OPERATIONS ON IMAGES

• Convolution

$$g(i,j) = \sum_{(k,l) \in H} h(k,l) \, f(i-k, j-l)$$

• g: output image

• h: convolution mask (impulse response)

• f: input image

g(x,y) = h(x,y)*f(x,y)

(two-dimensional convolution)

Figure: the mask h placed at position (i,j) of the input image f gives the value of the output image g at (i,j)
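A direct, unoptimized implementation of the two-dimensional convolution is sketched below. It assumes an odd-sized mask indexed from its center and zero values outside the image (one possible border condition among others); the function name is hypothetical. Convolving a unit impulse returns the mask itself, which is why h is called the impulse response.

```python
def convolve2d(f, h):
    """g(i,j) = sum over (k,l) of h(k,l) * f(i-k, j-l), with zeros outside f.

    h has odd dimensions and (k,l) is counted from its center.
    """
    rows, cols = len(f), len(f[0])
    kr, kc = len(h) // 2, len(h[0]) // 2
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(-kr, kr + 1):
                for l in range(-kc, kc + 1):
                    ii, jj = i - k, j - l
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += h[k + kr][l + kc] * f[ii][jj]
            g[i][j] = acc
    return g

impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
h = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve2d(impulse, h))  # same values as h
```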

(17)

IMAGES TRANSFORMATION

• Previous methods work in spatial domain

• In some cases, image processing tasks are best formulated in a transform domain.

• e.g. frequency → Fourier

→ Many transformations exist

f(x,y) (spatial domain) → Transform → T(u,v) (transform domain) → Operation R → R[T(u,v)] → Inverse transform → g(x,y) (spatial domain)

(18)

IMAGES TRANSFORMATION

• A particularly important class of 2D linear transforms can be expressed in the general form

$$T(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \, r(x,y,u,v)$$

• f(x,y): input image; r(x,y,u,v): forward transformation kernel; T(u,v): forward transform

$$f(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u,v) \, s(x,y,u,v)$$

• T(u,v): forward transform; s(x,y,u,v): inverse transformation kernel; f(x,y): recovered image

Separable kernel:

$$r(x,y,u,v) = r_1(x,u) \, r_2(y,v)$$

Symmetric kernel:

$$r(x,y,u,v) = r_1(x,u) \, r_1(y,v)$$

(19)

IMAGES TRANSFORMATION, DFT

• 2-D Discrete Fourier Transform (DFT)

Forward kernel:

$$r(x,y,u,v) = e^{-j 2\pi (ux/M + vy/N)}$$

Inverse kernel:

$$s(x,y,u,v) = \frac{1}{MN} e^{j 2\pi (ux/M + vy/N)}$$

$$T(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \, e^{-j 2\pi (ux/M + vy/N)}$$

$$f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u,v) \, e^{j 2\pi (ux/M + vy/N)}$$

Complex numbers…

Modulus and phase (angle)
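The two sums can be coded literally to see the forward transform, its modulus and phase, and the recovery of the image. This is a naive O((MN)²) sketch for tiny images only (a real application would use an FFT library); the function names are assumptions.

```python
import cmath

def dft2(f):
    """T(u,v) = sum_x sum_y f(x,y) * exp(-j 2 pi (ux/M + vy/N))."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)] for u in range(M)]

def idft2(T):
    """f(x,y) = 1/(MN) * sum_u sum_v T(u,v) * exp(+j 2 pi (ux/M + vy/N))."""
    M, N = len(T), len(T[0])
    return [[sum(T[u][v] * cmath.exp(2j * cmath.pi * (u * x / M + v * y / N))
                 for u in range(M) for v in range(N)) / (M * N)
             for y in range(N)] for x in range(M)]

f = [[1, 2], [3, 4]]
T = dft2(f)
modulus = [[abs(c) for c in row] for row in T]
phase = [[cmath.phase(c) for c in row] for row in T]
recovered = idft2(T)
print(modulus)
```

T(0,0) is the sum of all pixel values, and the inverse transform returns the original image up to rounding errors.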

(20)

Image example: 2-D Fourier transform

Figure: image, DFT modulus, DFT phase, and image recovered by the inverse DFT (DFT^-1)

(21)

Figure: high-frequency and low-frequency content in the frequency plane (axes f_i, f_j)

(22)

Figure: Gaussian image and grid image (weighted 2D Dirac comb)

(23)

Figure: DFT modulus and phase of two example images

(24)

TRANSFORMATIONS

• Spatial transformations

• Example

• Shrink image to half its size

$$(x', y') = T\{(x, y)\} = (x/2, y/2)$$

• Affine transform:

$$[x', y', 1] = [x, y, 1] \cdot \begin{bmatrix} t_{11} & t_{12} & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & 1 \end{bmatrix}$$

• Homographic transform

• Two cameras looking at points on a plane: x, y, … z!

• Higher order

$$[x', y', 1] = [x, y, x^2, y^2, xy, \ldots, 1] \cdot T$$

(25)

• Identity: x' = x, y' = y

$$T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Scaling: x' = c_x · x, y' = c_y · y

$$T = \begin{bmatrix} c_x & 0 & 0 \\ 0 & c_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Translation: x' = x + t_x, y' = y + t_y

$$T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & t_y & 1 \end{bmatrix}$$

• Shear (vertical): x' = x + s_x · y, y' = y

$$T = \begin{bmatrix} 1 & 0 & 0 \\ s_x & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Shear (horizontal): x' = x, y' = s_y · x + y

$$T = \begin{bmatrix} 1 & s_y & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Figure: scaling, translation and horizontal shearing shown in the (x', y') plane; horizontal shearing example: x' = x, y' = y + 0.5 x
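The affine matrices can be tried on a single point. The sketch below assumes the row-vector convention [x', y', 1] = [x, y, 1] · T, so the translation terms sit in the bottom row; the helper names are hypothetical.

```python
def apply_affine(point, T):
    """Compute [x', y', 1] = [x, y, 1] . T for a 3x3 matrix T (row-vector convention)."""
    x, y = point
    v = (x, y, 1.0)
    xp, yp, _ = (sum(v[k] * T[k][c] for k in range(3)) for c in range(3))
    return (xp, yp)

def scaling(cx, cy):
    return [[cx, 0.0, 0.0], [0.0, cy, 0.0], [0.0, 0.0, 1.0]]

def translation(tx, ty):
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [tx, ty, 1.0]]

def shear_horizontal(sy):
    """x' = x, y' = sy * x + y."""
    return [[1.0, sy, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

p = (2.0, 4.0)
print(apply_affine(p, scaling(0.5, 0.5)))     # (1.0, 2.0): shrink to half size
print(apply_affine(p, translation(3.0, -1.0)))  # (5.0, 3.0)
print(apply_affine(p, shear_horizontal(0.5)))   # (2.0, 5.0)
```

With the column-vector convention used by many libraries, the same transforms are obtained with the transposed matrices.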

(26)

• Affine transform: rotation

$$T = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Figure: 35° rotation (about the image center)

Remark: interpolation is necessary to avoid aliasing
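A rotated pixel rarely falls exactly on the input grid, hence the need for interpolation. The sketch below rotates an image about its center by inverse mapping with bilinear interpolation; it is one possible scheme among others, with hypothetical names, and positions that fall outside the input are set to 0.

```python
import math

def bilinear(img, x, y):
    """Interpolate img at real-valued (x, y); x is the column, y the row."""
    rows, cols = len(img), len(img[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return 0.0  # outside the input image
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def rotate(img, angle_deg):
    """Rotate about the image center: each output pixel looks up its source position."""
    rows, cols = len(img), len(img[0])
    cx, cy = (cols - 1) / 2, (rows - 1) / 2
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = [[0.0] * cols for _ in range(rows)]
    for yo in range(rows):
        for xo in range(cols):
            dx, dy = xo - cx, yo - cy
            # Inverse of x' = x cos(t) - y sin(t), y' = x sin(t) + y cos(t)
            xs = c * dx + s * dy + cx
            ys = -s * dx + c * dy + cy
            out[yo][xo] = bilinear(img, xs, ys)
    return out

img = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
rotated = rotate(img, 35)
print(bilinear(img, 0.5, 0.5))  # 20.0, mean of the 4 surrounding pixels
```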

(27)

• Higher order transforms

Applications :

Lens distortion correction, perspective

Orthoscopic projection

Barrel distortion

Pincushion distortion

(28)

A2 - APPLICATIONS : IMAGE FILTERING

Linear, non linear filtering Image restoration

(29)

NOISE CLEANING

• Many types of noise…

Gaussian Noise (sd 25)

Salt and pepper noise

(30)

LINEAR FILTERING

• Mean filter

$$g(i,j) = \sum_{(k,l) \in W} h(k,l) \, f(i-k, j-l)$$

W: 25 neighbors, H = (1/25)·I (I: 5×5 matrix of ones)

Figure: result on Gaussian noise; result on salt & pepper noise

(31)

NON LINEAR FILTERING

• Median filter

Result on salt & pepper noise Result on Gaussian noise W=25 W=9

(32)

Mean Filter width = 3

Median Filter

neighborhood = 3

• Median filter

• Advantage of median filtering over linear filtering: edges are preserved
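The difference between the two filters shows up on a tiny synthetic image: a vertical edge with one "salt" pixel. This is a sketch with a 3×3 neighborhood and border pixels left unchanged, both simplifying assumptions; the function name is hypothetical.

```python
import statistics

def filter3x3(img, reduce_fn):
    """Apply reduce_fn to each 3x3 neighborhood; border pixels are copied unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = [img[i + k][j + l] for k in (-1, 0, 1) for l in (-1, 0, 1)]
            out[i][j] = reduce_fn(window)
    return out

# Vertical edge between 10 and 200, plus one salt pixel at row 2, column 1
img = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
img[2][1] = 255

mean_img = filter3x3(img, statistics.mean)
median_img = filter3x3(img, statistics.median)
print(median_img[2])  # salt pixel removed, edge kept sharp
print(mean_img[2])    # salt pixel smeared, edge blurred
```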

(33)

(34)

Noisy image (S&P, 0.35)

(35)

(36)


Reference Noisy image (S&P, 0.35)

Median Filter (5x5) Adaptive Median (Smax9x9)

(37)


Reference image L, convolved with a shift filter f, plus noise n = corrupted image I

Given (I, f), how can L be recovered?

(38)

EXAMPLES

defocussed acquisition Moving object

(39)

HOW TO ‘DECONVOLVE’?

• Inverse filtering

• Without noise

• With noise

• We have to know N!

• What happens for small values of H(u,v)?

Solutions

Minimum Mean Square Error (Wiener) Filtering

Constrained Least Squares Filtering

When H is unknown: blind deconvolution

Many other approaches exist, e.g. the 'blind' ones

$$i(x,y) = f(x,y) * L(x,y) + n(x,y)$$
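The two inverse-filtering cases can be written out from the degradation model using the convolution theorem. This is a standard derivation given as a reminder, not a transcription of the slide; it assumes that capital letters denote Fourier transforms and that H(u,v) in the bullet above is the same degradation function as f(u,v).

```latex
% Degradation model in the Fourier domain
I(u,v) = f(u,v)\,L(u,v) + N(u,v)

% Without noise (N = 0): exact recovery wherever f(u,v) \neq 0
\hat{L}(u,v) = \frac{I(u,v)}{f(u,v)} = L(u,v)

% With noise: the estimate carries an extra term
\hat{L}(u,v) = \frac{I(u,v)}{f(u,v)} = L(u,v) + \frac{N(u,v)}{f(u,v)}
```

The term N(u,v)/f(u,v) can only be removed if N is known, and it grows without bound where f(u,v) is close to zero.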

(40)

WIENER (MINIMUM MSE) AND CLS

• Wiener (Fourier), Minimum MSE

• Constrained Least Square (Fourier)

f(u,v) = degradation function in the Fourier space

|f(u,v)|² = f*(u,v)·f(u,v), with f*(u,v) the complex conjugate of f(u,v)

S_n(u,v) = |N(u,v)|² = noise power spectrum

S_L(u,v) = |L(u,v)|² = power spectrum of the undegraded image

ɣ = parameter to tune
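The behaviour of these filters can be compared at a single frequency (u,v). The sketch below uses the textbook forms of the Wiener and constrained least squares filters, which are assumed to match the formulas of this slide; F stands for f(u,v), P is assumed to be the transfer function of a smoothness operator such as the Laplacian, and all numeric values are made up.

```python
def inverse_filter(I, F):
    """L_hat = I / F."""
    return I / F

def wiener_filter(I, F, S_n, S_L):
    """L_hat = conj(F) / (|F|^2 + S_n / S_L) * I."""
    return F.conjugate() / (abs(F) ** 2 + S_n / S_L) * I

def cls_filter(I, F, P, gamma):
    """L_hat = conj(F) / (|F|^2 + gamma * |P|^2) * I."""
    return F.conjugate() / (abs(F) ** 2 + gamma * abs(P) ** 2) * I

# One frequency where the degradation function is small
L_true = complex(1.0, 0.0)
F = complex(0.01, 0.0)
N = complex(0.1, 0.0)
I = F * L_true + N

inv = inverse_filter(I, F)
wien = wiener_filter(I, F, S_n=abs(N) ** 2, S_L=abs(L_true) ** 2)
print(abs(inv - L_true), abs(wien - L_true))  # the inverse filter amplifies the noise
```

With gamma = 0 the constrained least squares filter reduces to the inverse filter; larger values trade sharpness for noise suppression.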

(41)

(42)

A3 - APPLICATIONS : IMAGE SEGMENTATION

Definition From thresholding to region growing

(43)

DEFINITION : SEGMENTATION

• Process of partitioning a digital image into contiguous regions whose members share similar properties.

• Members: pixels or voxels … or samples

• Properties: features (intensity, texture, …)

• Partitioning: split, delineate, cut, …

Figure: input, segmentation overlay, label of regions

(44)

SEGMENTATION AIM? 2 POINTS OF VIEW

• Low-level processing

• Over segmentation of image

• Applied to image features

Simplify, compress the image content with regions or clusters

• High level processing (final process)

• Use prior knowledge

• Last process before region/object analysis

[Comaniciu 2000]

(45)

SEGMENTATION, USAGE?

• Visualization

• Counting

• Identification

• Measurements (shape, volumes …)

• Tracking

HDR mode

MRI, multiple sclerosis

[Comaniciu 2003]

(46)

3D Slicer – Quick Manual Segmentation with interpolation

https://www.youtube.com/watch?v=u93kI1MG6Ic

WHY MANUAL

SEGMENTATION IS EXHAUSTING

(47)

THRESHOLDING

If I(x,y) < s, then S(x,y) = 0; else S(x,y) = 1

Figure: input image I(x,y) and binary result S(x,y), with values 1 and 0

(48)

AUTOMATIC THRESHOLDING

• Find a threshold value that best splits the histogram

• Otsu (1979)

• Generalization: k-means, Expectation Maximisation (EM), …

• Otsu

• find a value that minimises the intra-class variance

• … or maximizes the inter-class variance

• (the algorithm optimization is iterative)

$$s^* = \arg\max_s \; w_a(s) \, w_b(s) \, [\mu_a(s) - \mu_b(s)]^2$$

• μ_a(s), μ_b(s): average intensity of regions a and b

• w_a(s), w_b(s): numbers of pixels of each region a and b (divided by the total number of pixels)
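The criterion can be evaluated for every candidate threshold from the histogram. This sketch uses an exhaustive search over s for 8-bit data, with region a holding the intensities below s and region b the others; the function name and pixel values are illustrative.

```python
def otsu_threshold(pixels, L=256):
    """Return the s maximizing w_a(s) * w_b(s) * (mu_a(s) - mu_b(s))**2."""
    hist = [0] * L
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_s, best_score = 0, -1.0
    count_a, sum_a = 0, 0
    for s in range(1, L):
        # Region a gains the pixels of intensity s - 1
        count_a += hist[s - 1]
        sum_a += (s - 1) * hist[s - 1]
        count_b = total - count_a
        if count_a == 0 or count_b == 0:
            continue
        w_a, w_b = count_a / total, count_b / total
        mu_a, mu_b = sum_a / count_a, (total_sum - sum_a) / count_b
        score = w_a * w_b * (mu_a - mu_b) ** 2
        if score > best_score:
            best_s, best_score = s, score
    return best_s

pixels = [10, 12, 11, 13, 200, 202, 198, 201]
s = otsu_threshold(pixels)
segmented = [0 if p < s else 1 for p in pixels]
print(s, segmented)
```

When several thresholds give the same score, as between the two clusters here, this version keeps the smallest one.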

(49)

Figure: automatic thresholding with 1 threshold (116) and with 2 thresholds (104; 192)

(50)

MATHEMATICAL MORPHOLOGY

• Def.: Tools for extracting image components that are useful to describe or represent shapes, boundaries or skeletons of regions.

• Structuring element

• Shape often circular or directional

• Binary value: taken / not taken

• Example

Initial region Dilated region

Structuring element

Pierre Soille's book "Morphological Image Analysis: Principles and Applications", Second Edition, Springer, 2003

(51)

BINARY OPERATIONS

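• Complement

$$X^c = \{ z \mid z \notin X \}$$

• Dilation

$$\delta_B(X) = X \oplus B = \{ z \mid B_z \cap X \neq \emptyset \}$$

• Erosion

$$\varepsilon_B(X) = X \ominus B = \{ z \mid B_z \subseteq X \}$$

• Opening

$$X \circ B = (X \ominus B) \oplus B$$

• Closing

$$X \bullet B = (X \oplus B) \ominus B$$

Figure: input X, result after 2 erosions, result after 6 erosions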

(52)

GRAYSCALE OPERATIONS

• Dilation

$$[\delta_B(f)](x) = \sup \{ f(y) \mid y \in B_x \}$$

• Erosion

$$[\varepsilon_B(f)](x) = \inf \{ f(y) \mid y \in B_x \}$$

• Opening

$$\gamma_B(f) = \delta_B(\varepsilon_B(f))$$

• Closing

$$\varphi_B(f) = \varepsilon_B(\delta_B(f))$$

• Derivative filter

$$G_B(f) = \delta_B(f) - \varepsilon_B(f)$$

Figure: input X, dilation δ_B(f(X)), erosion ε_B(f(X))

(53)

COMBINING OPERATIONS

• Distance map (binary image)

• successive erosions

• Accumulate differences

• Top hat

• Iterative filtering (openings γ and closings φ of increasing size n)

$$\Phi_1(f) = \varphi_1(\gamma_1(f)), \qquad \Phi_n(f) = \varphi_n(\gamma_n(\Phi_{n-1}(f)))$$

$$\Gamma_1(f) = \gamma_1(\varphi_1(f)), \qquad \Gamma_n(f) = \gamma_n(\varphi_n(\Gamma_{n-1}(f)))$$

Figure: distance map
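Binary morphology is easy to prototype with sets of pixel coordinates. The sketch below assumes a symmetric structuring element that contains the origin (a 3×3 cross), which lets erosion scan only the pixels of X; the distance map is built as described above, by successive erosions, recording at which erosion each pixel disappears. Function names are hypothetical.

```python
CROSS = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}

def dilate(X, B):
    """X dilated by B: every x + b (B symmetric, so no reflection is needed)."""
    return {(x[0] + b[0], x[1] + b[1]) for x in X for b in B}

def erode(X, B):
    """X eroded by B: positions z where B translated to z fits inside X."""
    return {z for z in X if all((z[0] + b[0], z[1] + b[1]) in X for b in B)}

def opening(X, B):
    return dilate(erode(X, B), B)

def closing(X, B):
    return erode(dilate(X, B), B)

def distance_map(X, B):
    """Label each pixel with the index of the erosion that removes it."""
    dist, current, n = {}, set(X), 0
    while current:
        n += 1
        eroded = erode(current, B)
        for z in current - eroded:
            dist[z] = n
        current = eroded
    return dist

square = {(i, j) for i in range(5) for j in range(5)}
dist = distance_map(square, CROSS)
print(dist[(0, 0)], dist[(1, 1)], dist[(2, 2)])  # 1 2 3
```

Opening the square with the cross removes its four corners, since the cross cannot fit there.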

(54)

• Principle

• Choose some seeds in the image → initial region

• Add to the region the neighboring pixels that verify the agglomeration criterion

• Repeat while the region evolves

[Zucker ’76]

Figure: initial region (seeds) R^[t], and the neighboring pixels added to obtain R^[t+1]

(55)

REGION GROWING

• Algorithm

• Determine the neighbors C of region R (spatial proximity)

• Add the neighbors similar to region R (resemblance)

• Iterate while R is growing

• Similarity …

• Mean of intensities of R: $\mu = E(I_R)$

• $d = |\mu - I(x)|$

• Criterion: x is similar to R if $d \leq$ threshold

Example
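The algorithm can be sketched with a queue of pixels whose neighbors remain to be visited. The code assumes 4-connectivity and a running mean of the region updated after each added pixel; both are implementation choices, and the names and image values are illustrative.

```python
from collections import deque

def region_growing(img, seed, threshold):
    """Grow a region from seed: add 4-neighbors x with |mu - I(x)| <= threshold."""
    rows, cols = len(img), len(img[0])
    region = {seed}
    total = float(img[seed[0]][seed[1]])  # sum of intensities in the region
    frontier = deque([seed])
    while frontier:  # iterate while the region is growing
        i, j = frontier.popleft()
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            n = (i + di, j + dj)
            if not (0 <= n[0] < rows and 0 <= n[1] < cols) or n in region:
                continue
            mu = total / len(region)  # mean of intensities of R
            if abs(mu - img[n[0]][n[1]]) <= threshold:
                region.add(n)
                total += img[n[0]][n[1]]
                frontier.append(n)
    return region

img = [[10, 11, 50],
       [12, 10, 52],
       [49, 51, 50]]
print(sorted(region_growing(img, seed=(0, 0), threshold=5)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```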

(56)

REGION GROWING, EXAMPLES

CT kidney MRI Brain

(57)

SEGMENTATION

• Think simple and don’t forget pre processing (filtering)

• Identify your real problem. Is it related to:

• Shape ?

• Numbers of objects?

• Identification of type of objects?

Recommended Tools : Photoshop, Fiji, Slicer, (ITK/OpenCV/scikit-image)

(58)

A4 - APPLICATIONS : IMAGE REGISTRATION

Basics Atlas segmentation

(59)

IMAGE REGISTRATION PRINCIPLE

• Optimize a geometrical image transformation T to maximize the similarity S between a reference image I_r and a moving image I_m (the transformed one).

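Figure: reference image I_r and moving image I_m

• Similarity measure, e.g.

$$S = \| I_r - T(I_m) \|^2$$

• Transformation, e.g. a rotation

$$T = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Try another T that improves S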
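The loop "transform, measure, try another transform" can be illustrated with the simplest possible setting: integer translations and a sum of squared differences. The sketch replaces the iterative optimizer with an exhaustive search over a few shifts, which is only practical for toy cases; all names are hypothetical.

```python
def translate(img, tx, ty, fill=0):
    """Shift the image by (tx, ty) pixels; uncovered pixels take the fill value."""
    rows, cols = len(img), len(img[0])
    return [[img[i - ty][j - tx] if 0 <= i - ty < rows and 0 <= j - tx < cols else fill
             for j in range(cols)] for i in range(rows)]

def ssd(a, b):
    """Sum of squared differences ||a - b||^2 (lower means more similar)."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def register_translation(I_r, I_m, max_shift=2):
    """Return the (tx, ty) that minimizes ||I_r - T(I_m)||^2."""
    best = None
    for ty in range(-max_shift, max_shift + 1):
        for tx in range(-max_shift, max_shift + 1):
            cost = ssd(I_r, translate(I_m, tx, ty))
            if best is None or cost < best[0]:
                best = (cost, tx, ty)
    return best[1], best[2]

I_r = [[0] * 5 for _ in range(5)]
I_r[2][2], I_r[2][3] = 9, 5
I_m = translate(I_r, 1, 0)           # moving image: reference shifted right by 1
print(register_translation(I_r, I_m))  # (-1, 0): shift back to the left
```

Real registration tools search continuous parameters with gradient-based or multi-resolution optimizers and interpolate the moving image at each step.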

(60)

ITERATIVE EVOLUTION

I

_r

I

_m

https://itk.org/Wiki/SimpleITK/Tutorials/MICCAI2015

(61)

Original

T=translation only T=Affine transformation

T=Affine

transformation (on previous image)

Original

(62)

24 months MRI (FLAIR) follow up

Multiple sclerosis

(63)

REGISTRATION, KEY POINTS

• Transformation : translation, rigid, affine, B-spline (deformable)

• Translation: 2 or 3 parameters;

• Rotation : 1 (2D) ; 3 (3D)

• Scale :1 for rigid; 2 (2D) and 3 (3D) for affine

• Similarity : Sum of Square Difference, Normalized Correlation, Mutual Information

Recommended tools : ANTS, Elastix, (ITK)

(64)

PART B - MACHINE LEARNING

As fast as image introduction Elements for ANN Two architectures : classification and segmentation

(65)

ARTIFICIAL INTELLIGENCE AND SOCIETY

Today: the leaders in personal data use it extensively

… Until a few years ago, only science fiction…

(66)

… BUT ALSO

• Why is it so exciting for value creation?

• Proposes solutions to (all) problems that have been partially solved or not

• Obtain better performances than conventional approaches (quality and/or use of resources) → 2012

• Intrinsically allows to have a usable solution after the 'research' and development phase

(67)

https://pjreddie.com/darknet/

(68)

MACHINE LEARNING APPROACHES

• Supervised approaches (non ANN)

• Bayes, kNN, Support Vector Machine, Random Forest, LDA, Logistic Regression, …

• Un-supervised approaches (non ANN)

• K-means, GMM, Hierarchical approach, DBSCAN, Laio-Rodriguez, …

→ ANN (Artificial Neural Network)

→ "DNN" (Deep Neural Network): MLP / CNN / RNN / BM

CNN

(69)

NOTATIONS : DATA

• Instance of all features values : x

• The corresponding predicted label : û

• Examples x : û

• Bank statement : solvency

• Flower picture : identification (rose, dandelion, iris…)

• Address, surface, year, … : apartment price

→ A dataset consists of N instances (x, u) → X, u

→ Training set, validation set

→ Test set (assessment)

(70)

CLASSIFICATION PROBLEM

• Objective: split the space in order to identify new incoming instances

scikit-learn.org

(71)

SCHEMA

model

input output

û u

L(u,û)

loss

(72)

BRIEFLY, ANN ARE

• Collection of connected neurons, structured by layers

https://cs231n.github.io/neural-networks-1/

weight: w_i

bias : b Activation function:

f()

(73)

BRIEFLY, ANN ARE

• Collection of connected neurons, structured by layers

https://cs231n.github.io/neural-networks-1/

Left network: 16 + 10 = 26 variables; right network: 16 + 20 + 5 = 41 variables

(74)

PLAYGROUND TENSORFLOW

• Demo

• One : simple example

• Two : XOR case

http://playground.tensorflow.org

(75)

TOWARD MATRIX/VECTOR MANIPULATION

• Neuron k with bias: weights w_k, bias b_k, input x, output z_k

• z_k = w_k^T x + b_k, with x of size K_{l-1}

• A layer l (without activation): W_l, b_l

• z_l = W_l x + b_l, with W_l of size K_l rows by K_{l-1} columns

• z_l = W⁺_l x⁺, where x⁺ = [1, x] and W⁺_l = [b_l, W_l]

• Activation layer

• y_l = f_l(z_l)
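The two ways of writing a layer, with a separate bias or with the augmented vectors x⁺ and W⁺, give the same result. The sketch below checks this on a small layer with 3 inputs and 2 neurons; the values and function names are made up, and ReLU is used as an example of f_l.

```python
def layer_forward(W, b, x):
    """z_l = W_l x + b_l, with W_l given as K_l rows of K_{l-1} weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def layer_forward_augmented(W_plus, x):
    """z_l = W+_l x+, where x+ = [1, x] and each row of W+ is [b_k, w_k]."""
    x_plus = [1.0] + list(x)
    return [sum(w * xi for w, xi in zip(row, x_plus)) for row in W_plus]

def relu_layer(z):
    """Activation layer y_l = f_l(z_l), here with f_l = ReLU."""
    return [max(0.0, v) for v in z]

W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, 1.0]]          # K_l = 2 rows, K_{l-1} = 3 columns
b = [0.5, -1.0]
x = [2.0, 1.0, 4.0]

z = layer_forward(W, b, x)
W_plus = [[bk] + row for row, bk in zip(W, b)]
z_aug = layer_forward_augmented(W_plus, x)
print(z, z_aug, relu_layer(z))  # [2.5, 4.0] [2.5, 4.0] [2.5, 4.0]
```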

(76)

ACTIVATION FUNCTIONS (LOCAL)

• Rectified Linear Unit (ReLU) and Leaky ReLU (one parameter)

• Exponential Linear Unit (ELU)

• Google activation (swish)

• Sigmoid
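For reference, the usual definitions of these local activation functions are sketched below. The default parameter values (alpha = 0.01 for Leaky ReLU, alpha = 1 for ELU, and swish written as x · sigmoid(x)) are common choices, not values taken from the slide.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but with a small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    """Smooth exponential branch for negative inputs, tending to -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    return x * sigmoid(x)

for name, fn in [("relu", relu), ("leaky_relu", leaky_relu), ("elu", elu),
                 ("sigmoid", sigmoid), ("swish", swish)]:
    print(name, [round(fn(v), 3) for v in (-2.0, 0.0, 2.0)])
```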

(77)

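ACTIVATION FUNCTIONS (LAYER)

softmax

Figure: layer l with inputs 1, x₁, x₂, …, x_{K,l-1} and weights w₀, w₁, w₂, …, w_{K,l-1} produces z₀, z₁, …, z_{K,l}; softmax maps them to the outputs y₀, y₁, …, y_{K,l}

Used when normalization of the output layer is needed

→ e.g. probability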
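Unlike the local activations, softmax needs all the z values of the layer to compute each output. A minimal sketch follows; subtracting the maximum before the exponential is a common numerical precaution and does not change the result.

```python
import math

def softmax(z):
    """y_k = exp(z_k) / sum_j exp(z_j): positive outputs that sum to 1."""
    m = max(z)                      # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

y = softmax([2.0, 1.0, 0.1])
print([round(v, 3) for v in y], sum(y))
```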

(78)

SCHEMA

Model

input output

û u

L(u,û)

loss

(79)

input output

û u

L(u,û)

loss

Back propagation

Forward propagation

(80)

OPTIMIZATION OF W : TRAINING

• The loss function must be minimized and will drive the evolution of W: gradient descent

• In the perceptron case, Novikoff (1962) proved convergence in at most (D/g)² iterations, under the conditions

• ||xⁱ||₂ < D

• yⁱ uᵀxⁱ ≥ g with ||u||₂ = 1

• D and g in R⁺_*

Learning rate

(81)

OPTIMIZATION W : DERIVATION?

• To update w^l_{q,k}, the weight from neuron q of layer l-1 to neuron k of layer l, we need to compute ∂L/∂w^l_{q,k}

• Usage of the chain rule

(82)

OPTIMIZATION W : DERIVATION?

• Matrix view of the chain rule

Hadamard product (element-wise multiplication)

Derivative computed for layer l+1 (with respect to its input x^l+1)

Matrix transpose

(83)

FORMULA…

• https://en.wikipedia.org/wiki/Matrix_calculus

Numerator or denominator layouts

(84)

BACKPROPAGATION EXAMPLE

• f(3,4) and

Aurélien Géron:

Hands-on
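A backpropagation pass on a tiny computation graph can be written by hand. The function is not given in the text above, so the sketch assumes f(x, y) = x²y + y + 2, evaluated at (3, 4); this choice is an assumption made for illustration only. The forward pass stores intermediate values, and the backward pass applies the chain rule from the output to the inputs.

```python
def forward_backward(x, y):
    """Forward then backward pass for the assumed function f(x, y) = x^2 * y + y + 2."""
    # Forward pass: evaluate and keep each intermediate node
    a = x * x          # a = x^2
    b = a * y          # b = x^2 * y
    c = y + 2.0        # c = y + 2
    f = b + c
    # Backward pass: chain rule, starting from df/df = 1
    df_db = 1.0
    df_dc = 1.0
    df_da = df_db * y                 # b = a * y
    df_dy = df_db * a + df_dc * 1.0   # y feeds both b and c
    df_dx = df_da * 2.0 * x           # a = x^2
    return f, df_dx, df_dy

print(forward_backward(3.0, 4.0))  # (42.0, 24.0, 10.0)
```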

(85)

GRADIENT DESCENT

• Iterative process

• Can be regularized (momentum, Adam)

• 3 ways of using data

• Use the whole dataset

• Long, small update

• Use one instance (Stochastic)

• Fast but noisy

• Use a few instances → mini-batch

• Quite fast, less noisy

Gradient = 0? → minimum
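Mini-batch gradient descent can be sketched on a one-variable linear model u ≈ w·x + b with a mean squared error loss. The data, learning rate, batch size and number of epochs below are arbitrary illustrative choices; setting batch_size to 1 gives the stochastic variant and batch_size to len(data) the full-dataset variant.

```python
import random

def minibatch_gd(data, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit u = w*x + b by mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new batches at every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the batch loss with respect to w and b
            grad_w = sum(2 * (w * x + b - u) * x for x, u in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - u) for x, u in batch) / len(batch)
            w -= lr * grad_w  # step against the gradient, scaled by the learning rate
            b -= lr * grad_b
    return w, b

# Noise-free samples of u = 2x + 1 on [-1, 1]
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b = minibatch_gd(data)
print(round(w, 3), round(b, 3))
```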

(86)

LOSSES EVOLUTION DURING TRAINING

• Loss on training set → learning the problem

• Loss on validation set → generalization ability

Figure: training loss and validation loss versus epochs or iterations, with the annotation "From here, the network performs worse on unseen data"

(87)

REAL EVOLUTION CURVES…

• https://www.tensorflow.org/tutorials/keras/overfit_and_underfit

Tiny : 16

Small : 16, 16

Medium:64,64,64

Large: 512, 512, 512, 512 Dense networks

(88)

MLP : CLASSIFICATION EXAMPLE

https://ml4a.github.io/ml4a/neural_networks/

(89)

MLP : CLASSIFICATION EXAMPLE

https://ml4a.github.io/ml4a/neural_networks/

784

(90)

MLP: LIMITATIONS

160M parameters for the first layer!

(91)

MLP LIMITATIONS AND SOLUTION!

Fully connected Not fully connected Sharing weights

(92)

B2 – CONVOLUTIONAL NEURAL NETWORK : ELEMENTS

Convolution and related dedicated layers

(93)

CNN, YANN LECUN

Figure: fully connected layer with inputs x₁, x₂, x₃, x₄, …, x_1000000 and 1000 hidden units

→ Too many dimensions!

We have to reduce the number of dimensions to limit its influence on the result

Use fully connected layers carefully

"Curse of dimensionality": the number of training examples must increase exponentially with the number of dimensions

2 properties of natural and numeric data (link to human)

→ Multiscale hierarchical organization

→ Symmetries

tinyurl.com/cours-Mallat

(94)

ARTIFICIAL…

https://neuwritesd.files.wordpress.com/2015/10/visual_stream_small.png

https://fr.mathworks.com/solutions/deep-learning/convolutional-neural-network.htm

… vs natural

(95)

TYPICAL CNN (CLASSIFICATION)

https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf

convolution

pooling flatten softmax

convolution

(96)

(97)

NETWORK

• Convolution

• Pad

• border condition

• Stride

• jump

• Atrous

• empty pixels in mask

http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html#zero-padding-unit-strides

(98)

CHANNELS

convolution

One input channel

Three filters → three output channels

Filters have the same number of channels as the inputs

(99)

PRE-PROCESSING ‘LAYERS’

Intensity values of input data are critical: how can we guarantee that the dynamic range and statistics are similar across all inputs? → pre-process all data!

• Mean subtraction

• Normalization

• Local contrast normalization

• (size adaptation and interpolation)

(100)

Figure panels:

• No zero padding, unit stride

• Half size padding ('same' size), unit stride

• Full padding, unit stride

• No zero padding, non-unit stride

• Half size padding ('same' size), non-unit stride

• Atrous or dilated filter
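The output size of a convolution along one axis follows from the input size i, kernel size k, padding p, stride s and dilation d as o = ⌊(i + 2p − d(k − 1) − 1) / s⌋ + 1. The helper below is a sketch with a hypothetical name, applied to the cases of the figure for a 5-pixel input and a 3-pixel kernel.

```python
def conv_output_size(i, k, p=0, s=1, d=1):
    """o = floor((i + 2p - d(k - 1) - 1) / s) + 1 along one axis."""
    return (i + 2 * p - d * (k - 1) - 1) // s + 1

print(conv_output_size(5, 3))            # 3: no zero padding, unit stride
print(conv_output_size(5, 3, p=1))       # 5: half ('same') padding, unit stride
print(conv_output_size(5, 3, p=2))       # 7: full padding, unit stride
print(conv_output_size(5, 3, s=2))       # 2: no zero padding, stride 2
print(conv_output_size(5, 3, p=1, s=2))  # 3: half padding, stride 2
print(conv_output_size(7, 3, d=2))       # 3: dilated (atrous) filter
```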

(101)

• Down sampling

• max,

• mean

• …

• Up sampling

• Nearest Neighbour,

• deconvolution – transposed convolution

https://cs231n.github.io/convolutional-networks/

Max pooling
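Max pooling and nearest-neighbour up-sampling on one channel can be sketched as follows, assuming a 2×2 window with stride 2 and an up-sampling factor of 2. The function names and the sample values are illustrative.

```python
def max_pool2x2(channel):
    """Keep the maximum of each non-overlapping 2x2 block (stride 2)."""
    rows, cols = len(channel), len(channel[0])
    return [[max(channel[i][j], channel[i][j + 1],
                 channel[i + 1][j], channel[i + 1][j + 1])
             for j in range(0, cols - 1, 2)]
            for i in range(0, rows - 1, 2)]

def upsample_nearest(channel, factor=2):
    """Repeat every pixel factor times along both axes."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out

x = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]
pooled = max_pool2x2(x)
print(pooled)                    # [[6, 8], [3, 4]]
print(upsample_nearest(pooled))  # back to 4x4, each value repeated in a 2x2 block
```

Pooling followed by up-sampling restores the size but not the discarded values.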

(102)

DON’T MISS IT

• Convolution works on all channels

• Generally 3 by 3, 5 by 5, 7 by 7 or 1 by 1

• At a given layer, many channels are obtained, one per convolution

• Pooling works channel by channel

• Generally the size is divided by 4 (2 by 2)

• (it can be increased)

• Receptive field

Input channels

Same number of channels

(103)

B3 - CLASSIFICATION AND SEGMENTATION

NETWORKS

(104)

1. CNN

• Classification, detection, regression

[Krizhevsky 12]

96 filters (ImageNet)

Gabor filters

(105)

2. U-NET (2015)

• Segmentation

• Filtering

Output

Same support as input
