LEARNING DYNAMIC SYSTEMS :

(1)

LEARNING DYNAMIC SYSTEMS : MARKOV MODELS

Markov Process and Markov Chains

Hidden Markov Models

Bayesian Filtering and Kalman Filter

(2)

Bayesian, Machine Learning, Frederic Pennerath

Example of application:

Navigation Systems and Data Fusion

GPS:

☺ No drift
☹ Medium accuracy
☹ Not responsive

Altimeter:

☺ No drift
☹ Altitude only
☹ Not accurate

Inertial System:

☺ Responsive
☹ Long-term drift

Navigation System:

Position (L, l, z)
Angles (yaw, pitch, roll)
Speed
Angular speed

Tracking example using CamShift and Kalman filtering

(3)

The whole picture:

[Diagram: closed control loop. Blocks: Real system (input $U$, real state $X$, output $Y$), Sensor (measure $Y_m$), State estimator (estimated state $\hat X$), Controller (control $U$), DAC. Regions: Environment, Calculator.]

The state $X$ is a hidden variable.

(4)

Controllable Markov models

Extend Markov models with a command (control input) $U_t$:

$$P(X_t \mid X_{t-1}, U_0, \dots, U_{t-1}, Y_1, \dots, Y_{t-1}) = P(X_t \mid X_{t-1}, U_{t-1})$$

and

$$P(Y_t \mid X_t, U_0, \dots, U_{t-1}, Y_1, \dots, Y_{t-1}) = P(Y_t \mid X_t)$$

[Graphical model: $X_0 \to X_1 \to X_2 \to \dots \to X_t$; each command $U_{t-1}$ points to $X_t$; each state $X_t$ emits the observation $Y_t$.]

$$P(X_2 \mid X_1, U_0, U_1, Y_1) = P(X_2 \mid X_1, U_1)$$

$$P(Y_2 \mid X_2, U_0, U_1, Y_1) = P(Y_2 \mid X_2)$$

(5)

Bayesian Filtering:

Prediction and update steps

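[Diagram: $X_{0|0} \to X_{1|0} \to X_{1|1} \to X_{2|1} \to X_{2|2} \to \dots \to X_{t|t-1} \to X_{t|t}$; the command $U_{t-1}$ drives each prediction $X_{t-1|t-1} \to X_{t|t-1}$.]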

Prediction step:

• Predicts the next state given past commands and observations:

$$X_{t|t-1} \stackrel{\text{def}}{=} X_t \mid U_0, \dots, U_{t-1}, Y_1, \dots, Y_{t-1}$$

• Model integration

• Uncertainty usually increases

Update step:

• Adjusts the current state given the new observation:

$$X_{t|t} \stackrel{\text{def}}{=} X_t \mid U_0, \dots, U_{t-1}, Y_1, \dots, Y_{t-1}, Y_t$$

• Model correction

• Uncertainty decreases
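When the state space is finite, the two steps can be written in a few lines (a histogram filter). This is a minimal sketch using NumPy, assuming a hypothetical toy world: 5 cells on a ring with doors in cells 0 and 2, a door detector that is right 90% of the time, and two commands (0 = stay, 1 = move right, succeeding with probability 0.8). None of these values come from the course.

```python
import numpy as np


def predict(belief, transitions, u):
    """Prediction step: P(X_t) = sum_x P(X_t | X_{t-1} = x, U_{t-1} = u) P(X_{t-1} = x)."""
    # transitions[u][i, j] = P(X_t = j | X_{t-1} = i, U_{t-1} = u)
    return belief @ transitions[u]


def update(belief, likelihood, y):
    """Update step: weight by P(Y_t = y | X_t) and renormalize."""
    # likelihood[i, y] = P(Y_t = y | X_t = i)
    posterior = belief * likelihood[:, y]
    return posterior / posterior.sum()


# Hypothetical toy world: 5 cells on a ring, doors in cells 0 and 2
n = 5
stay = np.eye(n)
move = 0.2 * np.eye(n) + 0.8 * np.roll(np.eye(n), 1, axis=1)  # i -> i+1 with prob. 0.8
transitions = {0: stay, 1: move}

door = np.array([1, 0, 1, 0, 0])
p_detect = np.where(door == 1, 0.9, 0.1)                 # P(Y = 1 | X)
likelihood = np.stack([1 - p_detect, p_detect], axis=1)  # columns: y = 0, y = 1

belief = np.full(n, 1 / n)                        # uniform prior
belief = update(belief, likelihood, y=1)          # a door is detected
after_first_update = belief.copy()
belief = predict(belief, transitions, u=1)        # command "move right"
belief = update(belief, likelihood, y=0)          # no door detected
print(np.round(belief, 3))
```

After the second observation the belief concentrates on cells 1 and 3, the only cells just to the right of a door that have no door themselves.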

(6)

Bayesian Filtering:

State Space Representation

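[Block diagram: the input $U_t$ and the state $X_t$ feed $f_t(X_t, U_t)$, which produces the next state $X_{t+\tau}$ looped back through a delay $\tau$; $g_t(X_t, U_t)$ produces the output $Y_t$.]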

• Continuous system $(f_t, g_t)$:

$$\begin{cases} \dfrac{dX_t}{dt} = f_t(X_t, U_t) \\ Y_t = g_t(X_t, U_t) \end{cases}$$

• Discrete system $(f_t, g_t)$:

$$\begin{cases} X_{t+\tau} = f_t(X_t, U_t) \\ Y_t = g_t(X_t, U_t) \end{cases}$$

• Stationary system: $f_t = f$, $g_t = g$

• Non-deterministic system:

$$\begin{cases} X_{t+\tau} = f_t(X_t, U_t) + \mathcal{E}_t^X \\ Y_t = g_t(X_t, U_t) + \mathcal{E}_t^Y \end{cases}$$

• Linear system $(A_t, B_t, C_t, D_t)$:

$$\begin{cases} X_{t+\tau} = A_t X_t + B_t U_t \\ Y_t = C_t X_t + D_t U_t \end{cases}$$

A state-space representation ensures the Markov property, provided the noise terms are independent over time.

(7)

A simple example:

localization of a wheeled vehicle

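$$X_t = \begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix}, \quad U_t = \begin{pmatrix} \mathcal{C}_t \\ \alpha_t \end{pmatrix}, \quad Y_t^{odo} = v_t^{odo}, \quad Y_t^{gps} = \begin{pmatrix} x_t^{gps} \\ y_t^{gps} \\ \theta_t \end{pmatrix}, \quad \mathcal{C}_t = \frac{1}{R_t}$$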

(8)

A simple non-linear example:

localization of a wheeled vehicle

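$$X_t = \begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix}, \quad U_t = \begin{pmatrix} \mathcal{C}_t \\ \alpha_t \end{pmatrix}, \quad \mathcal{C}_t = \frac{1}{R_t}$$

[Diagram: vehicle with heading $\theta_t$.]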

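Continuous model:

$$\frac{dX_t}{dt} = \frac{d}{dt}\begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix} = \begin{pmatrix} v_t \cos\theta_t \\ v_t \sin\theta_t \\ v_t\,\mathcal{C}_t \\ \frac{1}{M}\alpha_t - \frac{f}{M} v_t \end{pmatrix}$$

Discretized model (explicit Euler step of length $\tau$):

$$\Rightarrow X_{t+\tau} = X_t + \begin{pmatrix} v_t \cos\theta_t \\ v_t \sin\theta_t \\ v_t\,\mathcal{C}_t \\ \frac{1}{M}\alpha_t - \frac{f}{M} v_t \end{pmatrix} \tau$$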
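The discretized model is a single explicit Euler step. A minimal NumPy sketch, assuming $\alpha_t$ is a traction force and $f$ a viscous friction coefficient as in the speed equation; the numerical values (1000 kg vehicle, $f = 50$, $\tau = 0.1$ s) are hypothetical.

```python
import numpy as np


def vehicle_step(X, U, tau, M, f):
    """One explicit Euler step X_{t+tau} = X_t + F(X_t, U_t) * tau.

    X = (x, y, theta, v): position, heading, speed
    U = (C, alpha): trajectory curvature C = 1/R and traction force alpha
    M: vehicle mass, f: friction coefficient
    """
    x, y, theta, v = X
    curvature, alpha = U
    dX_dt = np.array([
        v * np.cos(theta),
        v * np.sin(theta),
        v * curvature,
        alpha / M - f / M * v,
    ])
    return X + dX_dt * tau


# Hypothetical case: 10 m/s along the x axis, straight line, no traction force
X0 = np.array([0.0, 0.0, 0.0, 10.0])
X1 = vehicle_step(X0, U=(0.0, 0.0), tau=0.1, M=1000.0, f=50.0)
print(X1)
```

The vehicle moves 1 m along $x$ and friction reduces its speed from 10 to 9.95 m/s.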

(9)

A simple example:

the problem of open loop control

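$$X_t = \begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix}, \quad U_t = \begin{pmatrix} \mathcal{C}_t \\ \alpha_t \end{pmatrix}, \quad \mathcal{C}_t = \frac{1}{R_t}$$

[Diagram: vehicle with heading $\theta_t$.]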

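$$\frac{dX_t}{dt} = \frac{d}{dt}\begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix} = \begin{pmatrix} v_t \times \cos\theta_t + \varepsilon_t^x \\ v_t \times \sin\theta_t + \varepsilon_t^y \\ v_t \times \mathcal{C}_t + \varepsilon_t^\theta \\ \frac{1}{M}\alpha_t - \frac{f}{M} v_t + \varepsilon_t^v \end{pmatrix}$$

Non-linearities: the terms $v_t \cos\theta_t$, $v_t \sin\theta_t$ and $v_t\,\mathcal{C}_t$.

Uncertainties (the $\varepsilon$ terms): wind, slope, collision, load, etc.; slippery road, skid.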

(10)

[Figure: Pure integration. Real vs. estimated speed (km/h) and distance (km) against t (s), from 0 to 100 s.]

A simpler linear example:

Problem: measuring the speed $v(t)$ and the travelled distance $l(t)$ of a vehicle.

Silly solution: pure model integration

$$\frac{dl}{dt}(t) = v(t) + \varepsilon_t^l, \qquad m\frac{dv}{dt}(t) = -f_0\,\mathrm{sign}(v(t)) - k_f\,v(t) + a(t) + \varepsilon_t^v$$

Better solution: using a sensor

Using an odometer counting wheel-turn pulses:

$$y_t = l_t + \varepsilon_t^y$$

How to improve it?

• What happens if the car slips?

• What happens if the odometer breaks?

• How to integrate GPS?

[Figure: Using odometer. Real vs. estimated speed (km/h) and distance (km) against t (s), from 0 to 100 s.]

Simulations with varying slope and wind

(11)

Linear State Space Representation

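State integration:

$$X_t = \begin{pmatrix} v_t \\ l_t \end{pmatrix}, \quad U_t = \begin{pmatrix} 1 \\ a_t \end{pmatrix}, \quad \mathcal{E}_t^X = \tau \begin{pmatrix} \varepsilon_t^v / m \\ \varepsilon_t^l \end{pmatrix}$$

$$X_{t+1} = X_t + \begin{pmatrix} \frac{1}{m}\left(-f_0 - k_f v_t + a_t + \varepsilon_t^v\right) \\ v_t + \varepsilon_t^l \end{pmatrix} \tau \iff X_{t+1} = A X_t + B U_t + \mathcal{E}_t^X$$

with

$$A = \begin{pmatrix} 1 - \frac{k_f}{m}\tau & 0 \\ \tau & 1 \end{pmatrix}, \quad B = \frac{\tau}{m}\begin{pmatrix} -f_0 & 1 \\ 0 & 0 \end{pmatrix}$$

(assuming $v_t > 0$, so that $\mathrm{sign}(v_t) = 1$).

Output equation:

$$Y_t = C X_t + D U_t + \mathcal{E}_t^Y$$

with

$$C = \begin{pmatrix} 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & 0 \end{pmatrix}, \quad \mathcal{E}_t^Y = \varepsilon_t^{odo}$$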
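The matrices can be checked against the Euler step they come from. A minimal NumPy sketch with hypothetical parameters ($m = 1000$ kg, $k_f = 20$ kg/s, $f_0 = 100$ N, $\tau = 0.1$ s); the traction force $a_t = 500$ N is chosen so that it exactly balances friction at 20 m/s, hence the speed stays constant while the distance grows by $v\tau = 2$ m.

```python
import numpy as np


def linear_vehicle_model(m, k_f, f0, tau):
    """Matrices (A, B, C, D) of the speed/distance model, valid while v_t > 0."""
    A = np.array([[1.0 - k_f * tau / m, 0.0],
                  [tau, 1.0]])
    B = tau / m * np.array([[-f0, 1.0],
                            [0.0, 0.0]])
    C = np.array([[0.0, 1.0]])   # the odometer measures the distance l_t
    D = np.array([[0.0, 0.0]])
    return A, B, C, D


# Hypothetical parameters
A, B, C, D = linear_vehicle_model(m=1000.0, k_f=20.0, f0=100.0, tau=0.1)
X = np.array([20.0, 0.0])        # v = 20 m/s, l = 0 m
U = np.array([1.0, 500.0])       # constant input 1 and traction force a_t = 500 N
X_next = A @ X + B @ U           # noise-free state integration
Y = C @ X_next + D @ U           # noise-free odometer output
print(X_next, Y)
```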

(12)

Bayesian Filtering:

Kalman filter

Hypotheses of Kalman filters:

• The state-space representation is linear.

• Every “input” of the model ($X_0$, $\mathcal{E}_t^X$, $\mathcal{E}_t^Y$, $U_t$) is normally distributed.

Consequences:

• Every state $X_t$ and every output $Y_t$, conditioned on $X_0$ and on the past inputs $\mathcal{U}_t$ and outputs $\mathcal{Y}_t$, is normally distributed.

• The state estimation problem is defined by:

– Model parameters: $A_t, B_t, C_t, D_t$

– Initial state distribution: $\hat X_0 = E[X_0]$, $P_0 = \mathrm{cov}(X_0)$

– State noise distribution: $Q_t = \mathrm{cov}(\mathcal{E}_t^X)$, $E[\mathcal{E}_t^X] = 0$

– Output noise distribution: $R_t = \mathrm{cov}(\mathcal{E}_t^Y)$, $E[\mathcal{E}_t^Y] = 0$

– Output samples: $y_t$

– Input samples: $u_t$

(13)


Multivariate normal distribution:

definition

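Generalization to $\mathbb{R}^m$:

$$X_m \sim \mathcal{N}(\mu_m, \Sigma_{mm}) \iff f_{X_m}(X) = \frac{1}{\sqrt{(2\pi)^m \det \Sigma_{mm}}} \exp\left(-\frac{1}{2}(X - \mu_m)^T \Sigma_{mm}^{-1} (X - \mu_m)\right)$$

Basic properties:

• $E[X_m] = \mu_m$

• $\mathrm{cov}(X_m) = E\left[(X - \mu_m)(X - \mu_m)^T\right] = \Sigma_{mm}$

Example:

– $\mu_2 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad \Sigma_{22} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$

[Figure: surface plot of the example density over the $(x, y)$ plane.]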
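The density formula translates directly into NumPy. A minimal sketch (the function name `mvn_pdf` is illustrative), evaluated on the example above: since $\det \Sigma_{22} = 1$, the peak value at $x = \mu_2$ is $1/(2\pi) \approx 0.159$.

```python
import numpy as np


def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written directly from the formula."""
    x, mu, sigma = np.asarray(x, float), np.asarray(mu, float), np.asarray(sigma, float)
    m = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse
    maha2 = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2 * np.pi) ** m * np.linalg.det(sigma))
    return np.exp(-0.5 * maha2) / norm


mu2 = np.array([2.0, 1.0])
sigma22 = np.array([[2.0, -1.0],
                    [-1.0, 1.0]])
peak = mvn_pdf(mu2, mu2, sigma22)   # det(Sigma_22) = 1, so the peak is 1 / (2 pi)
print(peak)
```

As a cross-check, `scipy.stats.multivariate_normal(mean=mu2, cov=sigma22).pdf(mu2)` should return the same value.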

(14)

Multivariate normal distribution:

fundamental properties

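Closed under linear transformation:

$$X \sim \mathcal{N}(\mu, \Sigma),\ \mu \in \mathbb{R}^m,\ \Sigma \in M_{mm},\ A \in M_{nm},\ B \in M_{n1} \implies AX + B \sim \mathcal{N}(A\mu + B,\ A\Sigma A^T)$$

Particular cases: given

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{pmatrix}\right)$$

• Addition: $X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\ \Sigma_{11} + \Sigma_{22} + \Sigma_{12} + \Sigma_{12}^T)$

• Marginalization: $X_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$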
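Both particular cases are instances of the general rule with a well-chosen $A$: $A = (I\ \ I)$ for the addition and $A = (I\ \ 0)$ for the marginalization. A minimal NumPy sketch reusing the 2-D example of the definition (scalar blocks $X_1$, $X_2$), with a sampling check of the addition rule.

```python
import numpy as np


def linear_transform(mu, sigma, A, B):
    """Parameters of A X + B when X ~ N(mu, sigma)."""
    return A @ mu + B, A @ sigma @ A.T


mu = np.array([2.0, 1.0])
sigma = np.array([[2.0, -1.0],
                  [-1.0, 1.0]])

# Addition X1 + X2: A = [1 1], B = 0
mean_sum, var_sum = linear_transform(mu, sigma, np.array([[1.0, 1.0]]), np.zeros(1))
# Marginalization X1: A = [1 0], B = 0
mean_x1, var_x1 = linear_transform(mu, sigma, np.array([[1.0, 0.0]]), np.zeros(1))

# Sampling check of the addition rule: 2 + 1 + 2 * (-1) = 1
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=100_000)
print(mean_sum, var_sum, samples.sum(axis=1).mean(), samples.sum(axis=1).var())
```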

(15)

Multivariate normal distribution:

fundamental properties

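Closed under conditioning:

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{pmatrix}\right) \implies (X_1 \mid X_2 = \vec{x}) \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\vec{x} - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}^T\right)$$

Closed under (pdf) multiplication, up to a normalization constant:

$$\mathcal{N}(\mu_A, \Sigma_A) \times \mathcal{N}(\mu_B, \Sigma_B) \propto \mathcal{N}\left((\Sigma_A^{-1} + \Sigma_B^{-1})^{-1}(\Sigma_A^{-1}\mu_A + \Sigma_B^{-1}\mu_B),\ (\Sigma_A^{-1} + \Sigma_B^{-1})^{-1}\right)$$

Particular case: conjugate prior $\mu \sim \mathcal{N}(\mu_0, \Sigma_0)$ of $X \sim \mathcal{N}(\mu, \Sigma)$:

$$\mu \mid X = \vec{x} \sim \mathcal{N}\left((\Sigma_0^{-1} + \Sigma^{-1})^{-1}(\Sigma_0^{-1}\mu_0 + \Sigma^{-1}\vec{x}),\ (\Sigma_0^{-1} + \Sigma^{-1})^{-1}\right)$$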
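Both properties reduce to a few matrix operations. A minimal NumPy sketch (function names are illustrative): conditioning the 2-D example of the definition on $X_2 = 2$ gives $\mathcal{N}(1, 1)$, and the normalized product of $\mathcal{N}(0, 1)$ and $\mathcal{N}(2, 1)$ is $\mathcal{N}(1, 0.5)$.

```python
import numpy as np


def condition(mu1, mu2, s11, s12, s22, x2):
    """Parameters of X1 | X2 = x2 for a joint Gaussian (X1, X2)."""
    gain = s12 @ np.linalg.inv(s22)
    return mu1 + gain @ (x2 - mu2), s11 - gain @ s12.T


def product(mu_a, s_a, mu_b, s_b):
    """Parameters of the normalized product N(mu_a, s_a) * N(mu_b, s_b)."""
    ia, ib = np.linalg.inv(s_a), np.linalg.inv(s_b)
    s = np.linalg.inv(ia + ib)
    return s @ (ia @ mu_a + ib @ mu_b), s


# Conditioning mu = (2, 1), Sigma = [[2, -1], [-1, 1]] on X2 = 2
m_c, s_c = condition(np.array([2.0]), np.array([1.0]),
                     np.array([[2.0]]), np.array([[-1.0]]), np.array([[1.0]]),
                     np.array([2.0]))
# Product of N(0, 1) and N(2, 1)
m_p, s_p = product(np.array([0.0]), np.eye(1), np.array([2.0]), np.eye(1))
print(m_c, s_c, m_p, s_p)
```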

(16)

Kalman Filter:

Prediction step

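Hypotheses:

• $X_{t|t}$ is assumed to be $\mathcal{N}(\hat X_{t|t}, P_{t|t})$

• $\mathcal{E}_t^X \sim \mathcal{N}(0, Q_t)$ is a white noise

Compute

$$P(X_{t+1|t}) \stackrel{\text{def}}{=} P(X_{t+1} \mid U_0 \dots U_t, Y_1 \dots Y_t) = \int P(X_{t+1} \mid U_t, X_t = x)\, P(X_{t|t} = x)\, dx$$

$$X_{t+1} = M \begin{pmatrix} X_t \\ \mathcal{E}_t^X \end{pmatrix} + V \quad\text{with}\quad \begin{cases} M = \begin{pmatrix} A_t & I \end{pmatrix} \\ V = B_t U_t \end{cases}$$

and

$$\begin{pmatrix} X_{t|t} \\ \mathcal{E}_t^X \end{pmatrix} \sim \mathcal{N}(\mu, \Sigma) \quad\text{with}\quad \mu = \begin{pmatrix} \hat X_{t|t} \\ 0 \end{pmatrix},\ \Sigma = \begin{pmatrix} P_{t|t} & 0 \\ 0 & Q_t \end{pmatrix} \implies X_{t+1|t} \sim \mathcal{N}(M\mu + V,\ M\Sigma M^T)$$

$$\implies X_{t+1|t} \sim \mathcal{N}(\hat X_{t+1|t}, P_{t+1|t}) \quad\text{with}\quad \begin{cases} \hat X_{t+1|t} = A_t \hat X_{t|t} + B_t U_t \\ P_{t+1|t} = A_t P_{t|t} A_t^T + Q_t \end{cases}$$

[Graphical model: $X_t \to X_{t+1} \leftarrow U_t$]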
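The derivation can be checked numerically: propagating the augmented Gaussian through $M = (A_t\ \ I)$ gives exactly the compact prediction equations. A minimal NumPy sketch with random, hypothetical matrices (3 states, 2 inputs).

```python
import numpy as np


def kf_predict(x_hat, P, A, B, u, Q):
    """Kalman prediction step: model integration and uncertainty increase."""
    return A @ x_hat + B @ u, A @ P @ A.T + Q


rng = np.random.default_rng(1)
n, k = 3, 2                                  # hypothetical state / input sizes
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, k))
u = rng.normal(size=k)
x_hat = rng.normal(size=n)
L = rng.normal(size=(n, n)); P = L @ L.T     # random covariance P_{t|t}
L = rng.normal(size=(n, n)); Q = L @ L.T     # random covariance Q_t

# Augmented form: X_{t+1} = M (X_t, E_t^X) + V with M = [A I], V = B u
M = np.hstack([A, np.eye(n)])
mu = np.concatenate([x_hat, np.zeros(n)])
Sigma = np.block([[P, np.zeros((n, n))],
                  [np.zeros((n, n)), Q]])
mean_aug, cov_aug = M @ mu + B @ u, M @ Sigma @ M.T

mean_kf, cov_kf = kf_predict(x_hat, P, A, B, u, Q)
print(np.allclose(mean_aug, mean_kf), np.allclose(cov_aug, cov_kf))
```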

(17)

Kalman Filter:

Output prediction (intermediate step)

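Hypotheses:

• $X_{t|t-1}$ is assumed to be $\mathcal{N}(\hat X_{t|t-1}, P_{t|t-1})$

• $\mathcal{E}_t^Y \sim \mathcal{N}(0, R_t)$ is a white noise

Define

$$Y_{t|t-1} \stackrel{\text{def}}{=} Y_t \mid U_0 \dots U_{t-1}, Y_1 \dots Y_{t-1} \sim \mathcal{N}(\hat Y_{t|t-1}, S_{t|t-1})$$

Compute

$$P(X_{t|t-1}, Y_{t|t-1}) \stackrel{\text{def}}{=} P(X_t, Y_t \mid U_0 \dots U_{t-1}, Y_1 \dots Y_{t-1}) = P(Y_t \mid X_t, U_{t-1})\, P(X_{t|t-1})$$

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix} = M \begin{pmatrix} X_t \\ \mathcal{E}_t^Y \end{pmatrix} + V \quad\text{with}\quad M = \begin{pmatrix} I & 0 \\ C_t & I \end{pmatrix},\ V = \begin{pmatrix} 0 \\ D_t U_t \end{pmatrix}$$

and

$$\begin{pmatrix} X_{t|t-1} \\ \mathcal{E}_t^Y \end{pmatrix} \sim \mathcal{N}(\mu, \Sigma) \quad\text{with}\quad \mu = \begin{pmatrix} \hat X_{t|t-1} \\ 0 \end{pmatrix},\ \Sigma = \begin{pmatrix} P_{t|t-1} & 0 \\ 0 & R_t \end{pmatrix} \implies \begin{pmatrix} X_{t|t-1} \\ Y_t \end{pmatrix} \sim \mathcal{N}(M\mu + V,\ M\Sigma M^T)$$

$$\implies \begin{pmatrix} X_{t|t-1} \\ Y_{t|t-1} \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \hat X_{t|t-1} \\ \hat Y_{t|t-1} \end{pmatrix}, \begin{pmatrix} P_{t|t-1} & P_{t|t-1} C_t^T \\ C_t P_{t|t-1} & S_{t|t-1} \end{pmatrix}\right) \quad\text{with}\quad \begin{cases} \hat Y_{t|t-1} = C_t \hat X_{t|t-1} + D_t U_t \\ S_{t|t-1} = C_t P_{t|t-1} C_t^T + R_t \end{cases}$$

[Graphical model: $X_{t-1} \to X_t \leftarrow U_{t-1}$; $X_t \to Y_t$]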

(18)

Kalman Filter:

Update step

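Hypotheses:

• $$\begin{pmatrix} X_{t|t-1} \\ Y_{t|t-1} \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \hat X_{t|t-1} \\ \hat Y_{t|t-1} \end{pmatrix}, \begin{pmatrix} P_{t|t-1} & P_{t|t-1} C_t^T \\ C_t P_{t|t-1} & S_{t|t-1} \end{pmatrix}\right)$$

• Observe $Y_t = y_t$

Compute

$$P(X_{t|t}) \stackrel{\text{def}}{=} P(X_t \mid U_0 \dots U_{t-1}, Y_1 \dots Y_{t-1}, Y_t = y_t)$$

Conditioning property:

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^T & \Sigma_{22} \end{pmatrix}\right) \implies (X_1 \mid X_2 = \vec{x}) \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(\vec{x} - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}^T\right)$$

$$\implies X_{t|t} \sim \mathcal{N}(\hat X_{t|t}, P_{t|t}) \quad\text{with}\quad \begin{cases} \hat X_{t|t} = \hat X_{t|t-1} + P_{t|t-1} C_t^T S_{t|t-1}^{-1}(y_t - \hat Y_{t|t-1}) \\ P_{t|t} = P_{t|t-1} - P_{t|t-1} C_t^T S_{t|t-1}^{-1} C_t P_{t|t-1} \end{cases}$$

[Graphical model: $X_{t-1} \to X_t \to Y_t$]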
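The output prediction and the update can be checked together: building the joint Gaussian of $(X_t, Y_t)$ and conditioning on $Y_t = y_t$ gives the same result as the innovation and gain form. A minimal NumPy sketch with random, hypothetical dimensions and matrices.

```python
import numpy as np


def update_by_conditioning(x_hat, P, C, D, u, R, y):
    """Update step obtained by building the joint (X, Y) and conditioning on Y = y."""
    n, p = P.shape[0], C.shape[0]
    # Joint Gaussian of (X_t, Y_t) = M (X_t, E_t^Y) + V
    M = np.block([[np.eye(n), np.zeros((n, p))],
                  [C, np.eye(p)]])
    V = np.concatenate([np.zeros(n), D @ u])
    mu = np.concatenate([x_hat, np.zeros(p)])
    Sigma = np.block([[P, np.zeros((n, p))],
                      [np.zeros((p, n)), R]])
    mean, cov = M @ mu + V, M @ Sigma @ M.T
    # Gaussian conditioning on the output block
    s11, s12, s22 = cov[:n, :n], cov[:n, n:], cov[n:, n:]
    gain = s12 @ np.linalg.inv(s22)
    return mean[:n] + gain @ (y - mean[n:]), s11 - gain @ s12.T


def update_gain_form(x_hat, P, C, D, u, R, y):
    """Same update written with the output covariance S and the Kalman gain K."""
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x_post = x_hat + K @ (y - (C @ x_hat + D @ u))
    P_post = (np.eye(len(x_hat)) - K @ C) @ P
    return x_post, P_post


rng = np.random.default_rng(2)
n, p, k = 3, 2, 1                                  # hypothetical dimensions
L = rng.normal(size=(n, n)); P = L @ L.T + np.eye(n)
L = rng.normal(size=(p, p)); R = L @ L.T + np.eye(p)
C, D = rng.normal(size=(p, n)), rng.normal(size=(p, k))
x_hat, u, y = rng.normal(size=n), rng.normal(size=k), rng.normal(size=p)

x1, P1 = update_by_conditioning(x_hat, P, C, D, u, R, y)
x2, P2 = update_gain_form(x_hat, P, C, D, u, R, y)
print(np.allclose(x1, x2), np.allclose(P1, P2))
```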

(19)

Kalman Filter:

Summary of equations & implementation

Initialisation step: $\hat X_{0|0} \leftarrow \hat X_0$, $P_{0|0} \leftarrow P_0$

Prediction step: run when system time is increased ($t \leftarrow t + 1$)

1. Predicted state: $\hat X_{t+1|t} \leftarrow A_t \hat X_{t|t} + B_t U_t$ (model integration)

2. Prediction covariance: $P_{t+1|t} \leftarrow A_t P_{t|t} A_t^T + Q_t$ (uncertainty increase)

Update step: run when observations are received

1. Output prediction: $\hat Y_{t|t-1} \leftarrow C_t \hat X_{t|t-1} + D_t U_t$ (mean of the predicted output)

2. Output covariance: $S_{t|t-1} \leftarrow C_t P_{t|t-1} C_t^T + R_t$ (add observation noise)

3. Innovation: $\tilde Y_t \leftarrow y_t - \hat Y_{t|t-1}$ (error on output prediction)

4. Kalman filter gain: $K_t \leftarrow P_{t|t-1} C_t^T S_{t|t-1}^{-1}$ (compromise between uncertainties)

5. Posterior state: $\hat X_{t|t} \leftarrow \hat X_{t|t-1} + K_t \tilde Y_t$ (state correction)

6. Posterior covariance: $P_{t|t} \leftarrow (I - K_t C_t) P_{t|t-1}$ (uncertainty reduction)
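The summary maps almost line by line onto code. A minimal NumPy sketch, assuming the filter is run on the speed/distance model of this section with hypothetical parameters ($m = 1000$ kg, $k_f = 30$ kg/s, $f_0 = 100$ N, constant $a_t = 800$ N, $\tau = 0.1$ s) and noise levels following the tuning of that example. The `KalmanFilter` class is illustrative and is not the course demo code.

```python
import numpy as np


class KalmanFilter:
    """Linear Kalman filter: prediction and update steps as summarized above."""

    def __init__(self, x0, P0):
        # Initialisation step: X_{0|0} <- X_0, P_{0|0} <- P_0
        self.x = np.asarray(x0, dtype=float)
        self.P = np.asarray(P0, dtype=float)

    def predict(self, A, B, u, Q):
        # Prediction step (t <- t + 1)
        self.x = A @ self.x + B @ u                # 1. model integration
        self.P = A @ self.P @ A.T + Q              # 2. uncertainty increase

    def update(self, C, D, u, R, y):
        # Update step, run when an observation y is received
        y_pred = C @ self.x + D @ u                # 1. output prediction
        S = C @ self.P @ C.T + R                   # 2. output covariance
        innovation = y - y_pred                    # 3. innovation
        K = self.P @ C.T @ np.linalg.inv(S)        # 4. Kalman gain
        self.x = self.x + K @ innovation           # 5. posterior state
        self.P = (np.eye(len(self.x)) - K @ C) @ self.P   # 6. posterior covariance


# Hypothetical speed/distance vehicle (linear model valid while v > 0)
m, k_f, f0, tau = 1000.0, 30.0, 100.0, 0.1
A = np.array([[1.0 - k_f * tau / m, 0.0], [tau, 1.0]])
B = tau / m * np.array([[-f0, 1.0], [0.0, 0.0]])
C = np.array([[0.0, 1.0]])
D = np.zeros((1, 2))
Q = tau**2 * np.diag([9.8**2, 1.0**2])            # force (1 g) and slip noise
R = np.array([[0.5**2 / 12]])                      # 50 cm odometer quantization
u = np.array([1.0, 800.0])                         # constant traction force a_t = 800 N

rng = np.random.default_rng(0)
x_true = np.array([0.0, 0.0])
kf = KalmanFilter(x0=[0.0, 0.0], P0=np.diag([1.0, 0.0]))
for _ in range(1000):
    # Simulate the real system with the same linear model plus noise
    x_true = A @ x_true + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    kf.predict(A, B, u, Q)
    kf.update(C, D, u, R, y)

print("true:", x_true, "estimated:", kf.x)
```

Calling `update` only when a measurement is available (for instance every k-th step) leaves the covariance growing through successive `predict` calls in between, which is the behaviour expected when measurements are missing or slow.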

(20)

Kalman filter tuning

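Initial state distribution:

$$X_0 = \begin{pmatrix} v_0 \\ l_0 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{v_0}^2 & 0 \\ 0 & 0 \end{pmatrix}\right) \quad\text{with}\quad \sigma_{v_0} = 1\ \mathrm{m/s}$$

State integration noise:

• Force uncertainty: $\varepsilon_t^v \sim \mathcal{N}(0, \sigma_v^2)$ with $\sigma_v = m \cdot 1 \cdot 9.8\ \mathrm{m/s^2}$

• Slip uncertainty: $\varepsilon_t^l \sim \mathcal{N}(0, \sigma_l^2)$ with $\sigma_l = 1\ \mathrm{m/s}$

$$\varepsilon_t^v \perp \varepsilon_t^l \implies \mathcal{E}_t^X \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, Q_t\right) \quad\text{with}\quad Q_t = \tau^2 \begin{pmatrix} \sigma_v^2 / m^2 & 0 \\ 0 & \sigma_l^2 \end{pmatrix}$$

Measurement noise:

$$\varepsilon_t^y \sim \mathcal{N}(0, \sigma_y^2) \quad\text{with}\quad \sigma_y = \sqrt{\frac{q^2}{12}},\ q = 50\ \mathrm{cm} \implies R_t = \sigma_y^2$$

($q^2/12$ is the variance of a uniform quantization error of step $q$, here the odometer resolution.)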
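These choices give concrete matrices once $m$ and $\tau$ are fixed. A minimal NumPy sketch, assuming a hypothetical mass of 1000 kg and step $\tau = 0.1$ s; since $\sigma_v = m \cdot 9.8$, the mass cancels out in $Q_t$.

```python
import numpy as np

g = 9.8
m = 1000.0                         # hypothetical vehicle mass (kg)
tau = 0.1                          # hypothetical integration step (s)

sigma_v0 = 1.0                     # initial speed std (m/s)
P0 = np.diag([sigma_v0**2, 0.0])   # the initial distance is known exactly

sigma_v = m * 1 * g                # force std: 1 g times the mass (N)
sigma_l = 1.0                      # slip std (m/s)
Q = tau**2 * np.diag([sigma_v**2 / m**2, sigma_l**2])

q = 0.5                            # odometer resolution (m)
sigma_y = np.sqrt(q**2 / 12)       # std of a uniform quantization error of step q
R = np.array([[sigma_y**2]])
print(Q, R)
```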

(21)

Kalman Filter:

Without measurements

[Figure: Kalman filter without measurements. Real and estimated speed (km/h) and distance (km), with confidence margin, against t (s) from 0 to 100 s.]

(22)

Kalman Filter:

With measurements

[Figure: Kalman filter with measurements. Real and estimated speed (km/h) and distance (km) against t (s) from 0 to 100 s.]

(23)

Kalman Filter:

Slowing down measurement time rate

[Figure: Kalman filter with a slower measurement rate. Speed (km/h) and distance (km) against t (s) from 0 to 18 s.]

(24)

Kalman Filter example:

Video surveillance of pedestrians

Problem:

Multiple cameras observe the image position of pedestrians:

• State is $X = (x, y, v_x, v_y)$

• Observation is the image position $y_i$ given by camera $i$

• Linear integration and observation: $y_i = C_i X + d_i$

Advantages:

• Data fusion: multiple cameras, other sensors, etc.

• Robust to camera occlusion

See code at “Demo/Bayes/8-Continuous Markov Models/Kalman”

[Diagram: pedestrian at position $X$ projected through the focus of camera $i$ onto screen $i$ at image position $y_i$.]
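Fusing several cameras amounts to applying one update per camera that currently sees the pedestrian, and skipping the update of an occluded camera. A minimal NumPy sketch, assuming a constant-velocity model and two hypothetical linear camera models $(C_i, d_i)$; the frame period, noise levels and camera matrices are illustrative and unrelated to the course demo.

```python
import numpy as np


def predict(x, P, A, Q):
    return A @ x, A @ P @ A.T + Q


def update(x, P, C, d, R, y):
    """Update with one camera whose observation model is y = C x + d + noise."""
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x = x + K @ (y - (C @ x + d))
    P = (np.eye(len(x)) - K @ C) @ P
    return x, P


tau = 0.04                                   # hypothetical frame period (s)
A = np.array([[1, 0, tau, 0],                # constant-velocity model on (x, y, vx, vy)
              [0, 1, 0, tau],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = 0.01 * np.eye(4)                         # hypothetical process noise
R = 4.0 * np.eye(2)                          # hypothetical pixel noise (2 px std)
cameras = [                                  # hypothetical linear camera models (C_i, d_i)
    (np.array([[50.0, 0, 0, 0], [0, 50.0, 0, 0]]), np.array([320.0, 240.0])),
    (np.array([[0, 40.0, 0, 0], [-40.0, 0, 0, 0]]), np.array([100.0, 200.0])),
]


def step(x, P, observations):
    """One time step: predict, then fuse every camera that sees the pedestrian."""
    x, P = predict(x, P, A, Q)
    for (C, d), y in zip(cameras, observations):
        if y is not None:                    # None means the camera is occluded
            x, P = update(x, P, C, d, R, y)
    return x, P


# Usage: a pedestrian standing still at (1, 2), noise-free images for simplicity
X_true = np.array([1.0, 2.0, 0.0, 0.0])
images = [C @ X_true + d for C, d in cameras]
x, P = np.zeros(4), 10.0 * np.eye(4)
for _ in range(50):
    x, P = step(x, P, images)                # both cameras see the pedestrian
both_cameras = x.copy()
for _ in range(10):
    x, P = step(x, P, [images[0], None])     # camera 2 occluded
print(both_cameras[:2], x[:2])
```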

(25)

Kalman Filter example:

Video surveillance of pedestrians

(26)

Kalman Filter Extensions for non-linear models:

Extended Kalman Filter (EKF)

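Linearize the non-linear state-space representation $(f_t, g_t)$ around the estimated state to update the covariance matrix $P_t$:

$$\hat X_{t+1|t} = f_t(\hat X_{t|t}, U_t), \quad \hat Y_{t|t-1} = g_t(\hat X_{t|t-1}, U_t), \quad A_t = \left.\frac{\partial f_t}{\partial X}\right|_{\hat X_{t|t}}, \quad C_t = \left.\frac{\partial g_t}{\partial X}\right|_{\hat X_{t|t-1}}$$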

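[Diagram: vehicle with heading $\theta_t$ and curvature $\mathcal{C}_t = \frac{1}{R_t}$.]

$$\frac{dX_t}{dt} = \frac{d}{dt}\begin{pmatrix} x_t \\ y_t \\ \theta_t \\ v_t \end{pmatrix} = \begin{pmatrix} v_t \times \cos\theta_t + \varepsilon_t^x \\ v_t \times \sin\theta_t + \varepsilon_t^y \\ v_t \times \mathcal{C}_t + \varepsilon_t^\theta \\ \frac{1}{M}\alpha_t - \frac{f}{M} v_t + \varepsilon_t^v \end{pmatrix} \implies \hat X_{t+1|t} = \hat X_{t|t} + \begin{pmatrix} v_t \times \cos\theta_t \\ v_t \times \sin\theta_t \\ v_t \times \mathcal{C}_t \\ \frac{1}{M}\alpha_t - \frac{f}{M} v_t \end{pmatrix} \times \tau$$

$$A_t = \left.\frac{\partial f}{\partial X}\right|_{\hat X_{t|t}} = \begin{pmatrix} 1 & 0 & -v_t \sin\theta_t\,\tau & \cos\theta_t\,\tau \\ 0 & 1 & v_t \cos\theta_t\,\tau & \sin\theta_t\,\tau \\ 0 & 0 & 1 & \mathcal{C}_t\,\tau \\ 0 & 0 & 0 & 1 - \frac{f}{M}\tau \end{pmatrix}$$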
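A hand-derived Jacobian is easy to get wrong, so it is worth checking it against finite differences before using it in the EKF prediction. A minimal NumPy sketch; the friction coefficient $f$ is renamed `fric` to avoid clashing with the model function `f`, and the operating point and parameters are hypothetical.

```python
import numpy as np


def f(X, U, tau, M, fric):
    """Discretized vehicle model X_{t+tau} = X_t + F(X_t, U_t) * tau."""
    x, y, theta, v = X
    curvature, alpha = U
    return X + tau * np.array([v * np.cos(theta),
                               v * np.sin(theta),
                               v * curvature,
                               alpha / M - fric / M * v])


def jacobian_f(X, U, tau, M, fric):
    """Analytic Jacobian A_t = df/dX evaluated at X."""
    _, _, theta, v = X
    curvature, _ = U
    return np.array([
        [1.0, 0.0, -v * np.sin(theta) * tau, np.cos(theta) * tau],
        [0.0, 1.0, v * np.cos(theta) * tau, np.sin(theta) * tau],
        [0.0, 0.0, 1.0, curvature * tau],
        [0.0, 0.0, 0.0, 1.0 - fric / M * tau],
    ])


def ekf_predict(x_hat, P, U, Q, tau, M, fric):
    """EKF prediction: mean through the non-linear f, covariance through A_t."""
    A = jacobian_f(x_hat, U, tau, M, fric)
    return f(x_hat, U, tau, M, fric), A @ P @ A.T + Q


def numerical_jacobian(func, X, eps=1e-6):
    """Central finite differences, used here only to check jacobian_f."""
    J = np.zeros((len(X), len(X)))
    for j in range(len(X)):
        dX = np.zeros(len(X)); dX[j] = eps
        J[:, j] = (func(X + dX) - func(X - dX)) / (2 * eps)
    return J


# Hypothetical operating point and parameters
X = np.array([1.0, 2.0, 0.3, 10.0])
U = (0.01, 500.0)
params = dict(tau=0.1, M=1000.0, fric=50.0)
A_analytic = jacobian_f(X, U, **params)
A_numeric = numerical_jacobian(lambda Z: f(Z, U, **params), X)
x_pred, P_pred = ekf_predict(X, np.eye(4), U, 0.01 * np.eye(4), **params)
print(np.max(np.abs(A_analytic - A_numeric)))
```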

(27)

Kalman Filter Extensions for non-linear models:

Unscented Kalman Filter (UKF)

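Take into account the curvature of the non-linear functions $(f_t, g_t)$ by observing the displacement of a few samples called sigma points $\hat\chi_i$.

Example for state integration:

1. Define $2d + 1$ sigma points $\hat\chi_i$ centered on $\hat X_{t|t}$ and scattered on lines passing through $\hat X_{t|t}$ and aligned with the eigenvectors $\hat u_i$ of $P$ (eigenvalues $\sigma_i^2$):

$$\{\hat\chi_i\}_{i \in 0\dots 2d} = \{\hat X_{t|t}\} \cup \{\hat X_{t|t} \pm k\,\sigma_i\,\hat u_i\}_{i \in 1\dots d}, \quad k = \sqrt{d + \lambda}$$

2. Compute their images: $\{f(\hat\chi_i)\}_{i \in 0\dots 2d}$

3. Compute the new mean state and covariance as weighted barycenters (weights depend on $d$ and $\lambda$):

$$\hat X_{t+1|t} = \sum_{i=0}^{2d} W_i^s\, f(\hat\chi_i), \qquad P_{t+1|t} = \sum_{i=0}^{2d} W_i^P \left(f(\hat\chi_i) - \hat X_{t+1|t}\right)\left(f(\hat\chi_i) - \hat X_{t+1|t}\right)^T$$

[Figure: sigma points $\hat\chi_0 = \hat X_{t|t}$, $\hat\chi_1, \dots, \hat\chi_4$ spread over the covariance ellipse $P_{t|t}$, and their images $f_t(\hat\chi_0), \dots, f_t(\hat\chi_4)$ from which $\hat X_{t+1|t}$ is computed.]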
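The three steps form the unscented transform. A minimal NumPy sketch, assuming the basic weights $W_0 = \lambda/(d+\lambda)$ and $W_i = 1/(2(d+\lambda))$, used for both the mean and the covariance (the slide only states that the weights depend on $d$ and $\lambda$; scaled variants with separate mean and covariance weights also exist). On a linear map the transform is exact, which makes a convenient check; on a polar-to-Cartesian map it captures the shrinking of the mean caused by the angular spread.

```python
import numpy as np


def sigma_points(x_hat, P, lam):
    """2d + 1 points: the mean, plus/minus k * sigma_i * u_i along each eigenvector u_i of P."""
    d = len(x_hat)
    eigvals, eigvecs = np.linalg.eigh(P)           # P = sum_i sigma_i^2 u_i u_i^T
    k = np.sqrt(d + lam)
    points = [x_hat]
    for sigma_i, u_i in zip(np.sqrt(np.clip(eigvals, 0.0, None)), eigvecs.T):
        points.append(x_hat + k * sigma_i * u_i)
        points.append(x_hat - k * sigma_i * u_i)
    return np.array(points)


def unscented_transform(func, x_hat, P, lam=1.0):
    """Mean and covariance of func(X) for X ~ N(x_hat, P), from the sigma points."""
    d = len(x_hat)
    chi = sigma_points(x_hat, P, lam)
    w = np.full(2 * d + 1, 1.0 / (2 * (d + lam)))  # assumed weights, same for mean and covariance
    w[0] = lam / (d + lam)
    images = np.array([func(c) for c in chi])
    mean = w @ images
    diff = images - mean
    cov = (w[:, None] * diff).T @ diff             # additive process noise Q_t would be added here
    return mean, cov


# Linear map: exact mean A x + b and covariance A P A^T
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
x_hat = np.array([2.0, 1.0])
P = np.array([[2.0, -1.0], [-1.0, 1.0]])
mean_lin, cov_lin = unscented_transform(lambda z: A @ z + b, x_hat, P)

# Non-linear map: polar (r, angle) -> Cartesian (x, y)
polar = lambda z: np.array([z[0] * np.cos(z[1]), z[0] * np.sin(z[1])])
mean_nl, cov_nl = unscented_transform(polar, np.array([10.0, 0.5]), np.diag([0.1, 0.04]))
print(mean_nl, polar(np.array([10.0, 0.5])))
```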

(28)

EKF & UKF example:

Cachalot tracking, again & again

Problem: localize a cachalot:

• State is $X = (x, y, z, v_x, v_y, v_z)$

• Observations are:

– Depth gauge: $z_g = z + \varepsilon_g$

– Inclinometer: $\tan\theta_i = \dfrac{v_z}{\sqrt{v_x^2 + v_y^2}} + \varepsilon_i$

– Compass: $\tan\phi_c = \dfrac{v_x}{v_y} + \varepsilon_c$

– Pitot tube: $v_p = \sqrt{v_x^2 + v_y^2 + v_z^2} + \varepsilon_p$

– GPS: $X_{gps} = (x, y, z) + \varepsilon_{GPS}$, only when $z > -1$

• Non-linear observation

Advantages:

• Data fusion: multiple heterogeneous sensors

• Robust to sensor failure

See code at “Demo/Bayes/8-Continuous Markov Models/EKF and UKF”

[Diagram: speed vector $V$, inclination $\theta$ and depth $z$ of the cachalot.]
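For an EKF, the observation model of this example is a function $g(X)$ whose Jacobian $C_t$ must be evaluated at each step, and whose size changes when the GPS becomes available. A minimal NumPy sketch, assuming noise-free outputs, square-root norms in the inclinometer and Pitot formulas, and central finite differences for $C_t$ (hand-derived derivatives would also work); the states used below are hypothetical.

```python
import numpy as np


def observe(X):
    """Noise-free observation vector g(X) for X = (x, y, z, vx, vy, vz)."""
    x, y, z, vx, vy, vz = X
    h = [
        z,                                   # depth gauge
        vz / np.hypot(vx, vy),               # inclinometer: tan(theta)
        vx / vy,                             # compass: tan(phi), undefined when vy = 0
        np.sqrt(vx**2 + vy**2 + vz**2),      # Pitot tube: speed
    ]
    if z > -1:                               # GPS fix only near the surface
        h += [x, y, z]
    return np.array(h)


def numerical_jacobian(h, X, eps=1e-6):
    """C_t = dg/dX by central finite differences."""
    h0 = h(X)
    J = np.zeros((len(h0), len(X)))
    for j in range(len(X)):
        dX = np.zeros(len(X)); dX[j] = eps
        J[:, j] = (h(X + dX) - h(X - dX)) / (2 * eps)
    return J


X_deep = np.array([0.0, 0.0, -10.0, 3.0, 4.0, 0.0])     # hypothetical state, diving
X_surface = np.array([0.0, 0.0, 0.0, 3.0, 4.0, 0.0])    # hypothetical state, at the surface
C_deep = numerical_jacobian(observe, X_deep)
print(len(observe(X_deep)), len(observe(X_surface)))
print(np.round(C_deep, 3))
```

Because the observation vector has 4 components under water and 7 at the surface, the matrices $C_t$ and $R_t$ have to be rebuilt at each update from the sensors actually available, which is also how a failed sensor can be dropped.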

(29)

Kalman Filter Extensions:

to go further

Algorithms:

– State smoothing

– State-space representation learning

Continuous time (Kalman-Bucy filter):

Let $\tau \to 0$: $\dfrac{d\hat X_{|t_0}}{dt}(t) = A_t \hat X_{|t_0}(t) + B_t U(t), \dots$

Hybrid filters:

Combine continuous-time integration with discrete-time updates

Switching Kalman Filters:

Combine HMM with Kalman filters

$$P(S_0, \dots, S_T, X_0, \dots, X_T, Y_1, \dots, Y_T) = P(S_0)\,P(X_0 \mid S_0) \prod_{t=1}^{T} P(S_t \mid S_{t-1})\,P(X_t \mid X_{t-1}, S_t)\,P(Y_t \mid S_t, X_t)$$

Extensions to non-linear models:

– Extended Kalman Filter

– Unscented Kalman Filter
