LEARNING DYNAMIC SYSTEMS : MARKOV MODELS
Markov Process and Markov Chains Hidden Markov Models
Bayesian Filtering and Kalman Filter
Bayesian, Machine Learning, Frederic Pennerath
Example of application:
Navigation Systems and Data Fusion
GPS:
☺ No drift
Medium accuracy
Not responsive Altimeter:
☺ No drift
Altitude only
Not accurate
Inertial System:
☺ Responsive
Long term drift Navigation System:
Position (L, l, z)
Angles (yaw, pitch, roll) Speed
Angular speed
Tracking example using camshift & Kalman filtering
The whole picture:
Measure Y m
Estimated state 𝑋 Control U
Sensor
Controller
State estimator U Real state 𝑋
Input U Real system Output Y
Environment Calculator DAC
State 𝑋 is a hidden variable
Bayesian, Machine Learning, Frederic Pennerath
Controllable Markov models
Update Markov models with a command 𝑼 𝒕 :
𝑃 𝑋 𝑡 𝑋 𝑡−1 , 𝑈 1 , … 𝑈 𝑡−1 , 𝑌 1 , … 𝑌 𝑡−1 = 𝑃 𝑋 𝑡 𝑋 𝑡−1 , 𝑈 𝑡−1 and
𝑃 𝑌 𝑡 𝑋 𝑡 , 𝑈 1 , … 𝑈 𝑡−1 , 𝑌 1 , … 𝑌 𝑡−1 = 𝑃 𝑌 𝑡 𝑋 𝑡
𝑋
0…
𝑈
0𝑋
1𝑌
1𝑈
1𝑋
2𝑌
2𝑈
𝑡−1𝑋
𝑡𝑌
𝑡𝑃 𝑋 2 𝑋 1 , 𝑈 0 , 𝑈 1 , 𝑌 1 = 𝑃 𝑋 2 𝑋 1 , 𝑈 1
𝑃 𝑌 2 𝑋 2 , 𝑈 0 , 𝑈 1 , 𝑌 1 = 𝑃 𝑌 2 𝑋 2
Bayesian Filtering:
Prediction and update steps
… 𝑋
0|0𝑈
0𝑋
1|0𝑋
1|1𝑈
1𝑋
2|1𝑋
2|2𝑈
𝑡−1𝑋
𝑡|𝑡−1𝑋
𝑡|𝑡Prediction step:
• Predicts future state given commands and observations:
X t|t−1 ≝ X 𝑡 |𝑈 0 … 𝑈 𝑡−1 𝑌 0 … 𝑌 𝑡−1
• Model integration
• Increase of uncertainty (usually)
Update step:
• Adjust current state given new observation:
X t|t ≝ X t |𝑈 0 … 𝑈 𝑡−1 𝑌 0 … 𝑌 𝑡−1 𝑌 𝑡
• Model correction
• Decrease of uncertainty
Bayesian, Machine Learning, Frederic Pennerath
Bayesian Filtering:
State Space Representation
g t (X t , U 𝑡 ) State 𝑋 𝑡
Input U 𝑡 Output Y t
f 𝑡 (X t , U t )
State 𝑋 𝑡+𝜏 Delay 𝜏
• Continuous system (𝑓 𝑡 , 𝑔 𝑡 ):
൞ 𝑑𝑋 𝑡
𝑑𝑡 = 𝑓 𝑡 𝑋 𝑡 , 𝑈 𝑡 𝑌 𝑡 = 𝑔 𝑡 𝑋 𝑡 , 𝑈 𝑡
• Discrete system (𝑓 𝑡 , 𝑔 𝑡 ):
ቊ 𝑋 𝑡+𝜏 = 𝑓 𝑡 𝑋 𝑡 , 𝑈 𝑡 𝑌 𝑡 = 𝑔 𝑡 𝑋 𝑡 , 𝑈 𝑡
• Stationary system:
𝑓 𝑡 = 𝑓, 𝑔 𝑡 = 𝑔
• Non deterministic system:
൝ 𝑋 𝑡+𝜏 = 𝑓 𝑡 𝑋 𝑡 , 𝑈 𝑡 + ℰ 𝑡 𝑋 𝑌 𝑡 = 𝑔 𝑡 𝑋 𝑡 , 𝑈 𝑡 + ℰ 𝑡 𝑌
• Linear system (A t , B t , C t , D t ):
ቊ 𝑋 𝑡+𝜏 = 𝐴 𝑡 𝑋 𝑡 + 𝐵 𝑡 𝑈 𝑡 𝑌 𝑡 = 𝐶 𝑡 𝑋 𝑡 + 𝐷 𝑡 𝑈 𝑡
State space representation ensures
Markov property.
A simple example:
localization of a wheeled vehicle
𝑋 𝑡 = 𝑥 𝑡 𝑦 𝑡 𝜃 𝑡 𝑣 𝑡
, 𝑈 𝑡 = 𝒞 𝑡
𝛼 𝑡 , 𝑌 𝑡 𝑜𝑑𝑜 = 𝑣 𝑡 𝑜𝑑𝑜 , 𝑌 𝑡 𝑔𝑝𝑠 = 𝑥 𝑡 𝑔𝑝𝑠 𝑦 𝑡 𝑔𝑝𝑠 𝜃 𝑡
𝒞 𝑡 = 1
𝑅 𝑡
Bayesian, Machine Learning, Frederic Pennerath
A simple non-linear example:
localization of a wheeled vehicle
𝑋 𝑡 = 𝑥 𝑡 𝑦 𝑡 𝜃 𝑡 𝑣 𝑡
, 𝑈 𝑡 = 𝒞 𝑡 𝛼 𝑡
𝜃
𝑡𝒞
𝑡= 1
𝑅
𝑡Continuous model: Discretized model:
𝑑𝑋 𝑡
𝑑𝑡 = 𝑑 𝑑𝑡
𝑥 𝑡 𝑦 𝑡 𝜃 𝑡 𝑣 𝑡
=
𝑣 𝑡 cos 𝜃 𝑡 𝑣 𝑡 s𝑖𝑛 𝜃 𝑡
𝑣 𝑡 𝒞 𝑡 1
𝑀 𝛼 𝑡 − 𝑓 𝑀 𝑣 𝑡
⇒ 𝑋 𝑡+𝜏 = 𝑋 𝑡 +
𝑣 𝑡 sin 𝜃 𝑡 𝑣 𝑡 cos 𝜃 𝑡
𝑣 𝑡 𝒞 𝑡 1
𝑀 𝛼 𝑡 − 𝑓 𝑀 𝑣 𝑡
⋅ 𝜏
A simple example:
the problem of open loop control
𝑋 𝑡 = 𝑥 𝑡 𝑦 𝑡 𝜃 𝑡 𝑣 𝑡
, 𝑈 𝑡 = 𝒞 𝑡 𝛼 𝑡
𝜃
𝑡𝒞
𝑡= 1
𝑅
𝑡𝑑𝑋 𝑡
𝑑𝑡 = 𝑑 𝑑𝑡
𝑥 𝑡 𝑦 𝑡 𝜃 𝑡 𝑣 𝑡
=
𝑣 𝑡 × cos 𝜃 𝑡 + 𝜀 𝑡 𝑥 𝑣 𝑡 × sin 𝜃 𝑡 + 𝜀 𝑡 𝑦
𝑣 𝑡 × 𝒞 𝑡 + 𝜀 𝑡 𝜃 1
𝑀 𝛼 𝑡 − 𝑓
𝑀 𝑣 𝑡 + 𝜀 𝑡 𝑣
Wind, slope, collision, load, etc Slippy road, skid
Non linearities Uncertainties
Bayesian, Machine Learning, Frederic Pennerath
0 10 20 30 40 50 60 70 80 90 100
0 50 100
t (s)
speed (km/h)
Pure integration
real estimated
0 10 20 30 40 50 60 70 80 90 100
0 1 2 3
t (s)
distance (km)
A simpler linear example:
Problem: measuring speed 𝑣(𝑡) and travelled distance l 𝑡 of a vehicle.
Silly solution: pure model integration
𝑑𝑙
𝑑𝑡 𝑡 = 𝑣 𝑡 + 𝜀 𝑡 𝑙 𝑚 𝑑𝑣
𝑑𝑡 𝑡 = −𝑓 0 sign 𝑣 𝑡 − 𝑘 𝑓 𝑣 𝑡 + 𝑎 𝑡 + 𝜀 𝑡 𝑣
Better solution: using sensor
Using odometer counting wheel turn pulses
𝑦 𝑡 = 𝑙 𝑡 + 𝜀 𝑡 𝑦 How to improve it?
• What happens if car slips?
• What happens if odometer breaks?
• How to integrate GPS?
0 10 20 30 40 50 60 70 80 90 100
0 50 100
t (s)
speed (km/h)
Using odometer
real estimated
0 10 20 30 40 50 60 70 80 90 100
0 0.5 1 1.5
t (s)
distance (km)
Simulations with varying
slope and wind
Linear State Space Representation
State integration:
𝑋 𝑡 = 𝑣 𝑡
𝑙 𝑡 , 𝑈 𝑡 = 1
𝑎 𝑡 , ℇ 𝑡 𝑋 = 𝜀 𝑡 𝑣 𝜀 𝑡 𝑙
𝑋 𝑡+1 = 𝑋 𝑡 + 1
𝑚 −𝑓 0 − 𝑘 𝑓 𝑣 𝑡 + 𝑎 𝑡 + 𝜀 𝑡 𝑣 𝑣 𝑡 + 𝜀 𝑡 𝑙
𝜏 ⇔ 𝑋 𝑡+1 = 𝐴 𝑋 𝑡 + 𝐵 𝑈 𝑡 + ℇ 𝑡 𝑋
with 𝐴 = 1 −
𝑘
𝑓𝑚 𝜏 0
𝜏 1
, 𝐵 = 𝜏
𝑚
−𝑓 0 1 0 0 , Output equation:
𝑌 𝑡 = 𝐶𝑋 𝑡 + 𝐷𝑈 𝑡 + ℇ 𝑡 𝑌
with C = 0 1 , D = 0 0 , ℇ 𝑡 𝑌 = 𝜀 𝑡 𝑜𝑑𝑜
Bayesian, Machine Learning, Frederic Pennerath
Bayesian Filtering:
Kalman filter
Hypothesis of Kalman filters:
• The state representation is linear.
• Every “input” of the model (X 0 , ℇ 𝑡 𝑋 , ℇ 𝑡 𝑌 , 𝑈 𝑡 ) is normally distributed Consequences:
• Every state X t |𝑋 0 , 𝒰 𝑡 , 𝒴 𝑡 , and output Y t |𝑋 0 , 𝒪 𝑡 is normally distributed
• State estimation problem is defined by:
– Model parameters: 𝐴
𝑡, 𝐵
𝑡, 𝐶
𝑡, 𝐷
𝑡– Initial state distribution: 𝑋
0= 𝐸 𝑋
0𝑃
0= 𝑐𝑜𝑣 𝑋
0– State noise distribution: 𝑅
𝑡= 𝑐𝑜𝑣 ℇ
𝑡𝑋𝐸 ℇ
𝑡𝑋= 0 – Output noise distribution: 𝑄
𝑡= 𝑐𝑜𝑣 ℇ
𝑡𝑌𝐸 ℇ
𝑡𝑌= 0
– Output samples: 𝑦
𝑡– Input samples: 𝑢
𝑡-1 0 1 2 3 4 5 6
Multivariate normal distribution:
definition
Generalization to ℝ 𝐦 :
𝑋 𝑚 ~𝒩 𝜇 𝑚 , Σ 𝑚𝑚 ⟺ 𝑓 𝑋 𝑚 𝑋 = 1
2𝜋 𝑚 Σ 𝑚𝑚 𝑒 − 1 2 𝑋−𝜇 𝑚 𝑇 Σ 𝑚𝑚 −1 𝑋−𝜇 𝑚 Basic properties:
• 𝐸 𝑋 𝑚 = 𝜇 𝑚
• 𝑐𝑜𝑣 𝑋 𝑚 = 𝐸 𝑋 − 𝜇 𝑚 𝑇 𝑋 − 𝜇 𝑚 = Σ 𝑚𝑚 Example:
– 𝜇 2 = 2
1 , Σ 22 = 2 −1
−1 1
-1 0 1 2 3 4 5
-4 -2 0 2 4 6
0 0.05 0.1 0.15 0.2
x y
Bayesian, Machine Learning, Frederic Pennerath
Multivariate normal distribution:
fundamental properties
Closed under linear transformation:
𝑋~𝒩 μ, Σ , μ ∈ ℝ m , Σ ∈ 𝑀 𝑚𝑚 , A ∈ M nm , B ∈ M n1 ⟹
𝐴𝑋 + 𝐵~𝒩 𝐴𝜇 + 𝐵, 𝐴Σ𝐴 𝑇 Particular cases:
Given 𝑋 1
𝑋 2 ~𝒩 𝜇 1
𝜇 2 , Σ 11 Σ 12 Σ 12 𝑇 Σ 22
• Addition: 𝑋 1 + 𝑋 2 ~𝒩 𝜇 1 + 𝜇 2 , Σ 11 + Σ 22 + Σ 12 + Σ 12 𝑇
• Marginalization: 𝑋 1 ~𝒩 𝜇 1 , Σ 11
Multivariate normal distribution:
fundamental properties
Closed under conditioning:
𝑋 1
𝑋 2 ~𝒩 𝜇 1
𝜇 2 , Σ 11 Σ 12
Σ 12 𝑇 Σ 22 ⟹
𝑃(𝑋 1 |𝑋 2 = Ԧ 𝑥)~𝒩 𝜇 1 + Σ 12 Σ 22 −1 𝑥 − 𝜇 Ԧ 2 , Σ 11 − Σ 12 Σ 22 −1 Σ 12 𝑇 Closed under (pdf) multiplication:
𝒩 𝜇 𝐴 , Σ 𝐴 × 𝒩 𝜇 𝐵 , Σ 𝐵 ≡ 𝒩 Σ A −1 + Σ B −1 −1 Σ A −1 𝜇 𝐴 + Σ B −1 𝜇 𝐵 , Σ A −1 + Σ B −1 −1
Particular case: conjugate prior 𝜇~𝒩 𝜇 0 , Σ 0 of 𝑋~𝒩 𝜇, Σ :
𝜇|𝑋 = Ԧ 𝑥~𝒩 Σ 0 −1 + Σ −1 −1 Σ 0 −1 𝜇 0 + Σ −1 𝑥 , Σ Ԧ 0 −1 + Σ −1 −1
Bayesian, Machine Learning, Frederic Pennerath
Kalman Filter:
Prediction step
Hypothesis:
• 𝑋 𝑡|𝑡 is assumed to be 𝒩 𝑋 𝑡|𝑡 , 𝑃 𝑡|𝑡
• ℇ 𝑡 𝑋 ~𝒩 0, 𝑄 𝑡 is a white noise
Compute 𝐏 𝑿 𝒕+𝟏|𝒕 ≝ 𝑷 𝑿 𝒕+𝟏 𝑼 𝟎 … 𝑼 𝒕 𝒀 𝟏 … 𝒀 𝒕 = 𝑷 𝑿 𝒕+𝟏 𝑼 𝒕 , 𝑿 𝒕 = 𝒙 𝐏 𝑿 𝒕|𝒕 = 𝒙 𝒅𝒙 𝑋 𝑡+1 = 𝑀 𝑋 𝑡
ℇ 𝑡 𝑋 + 𝑉 with ቊ 𝑀 = 𝐴 𝑡 𝐼 𝑉 = 𝐵 𝑡 𝑈 𝑡
and 𝑋 𝑡|𝑡
ℇ 𝑡 𝑋 ~𝒩 μ, Σ with μ = 𝑋 𝑡|𝑡
0 𝛴 = 𝑃 𝑡|𝑡 0 0 𝑄 𝑡 ⇒ 𝑋 𝑡+1|𝑡 ~𝒩 𝑀𝜇 + 𝑉, M𝛴𝑀 𝑇
⇒ 𝑋 𝑡+1|𝑡 ~𝒩 𝑋 𝑡+1|𝑡 , 𝑃 𝑡+1|𝑡 with ቐ 𝑋 𝑡+1|𝑡 = 𝐴 𝑡 𝑋 𝑡|𝑡 + 𝐵 𝑡 𝑈 𝑡 𝑃 𝑡+1|𝑡 = 𝐴 𝑡 𝑃 𝑡|𝑡 𝐴 𝑡 𝑇 + 𝑄 𝑡
𝑋
𝑡𝑈
𝑡𝑋
𝑡+1Kalman Filter:
Output prediction (intermediate step)
Hypothesis:
• 𝑋 𝑡|𝑡−1 is assumed to be 𝒩 𝑋 𝑡|𝑡−1 , 𝑃 𝑡|𝑡−1
• ℇ 𝑡 𝑌 ~𝒩 0, 𝑅 𝑡 is a white noise
Given 𝒀 𝒕|𝒕−𝟏 ≝ 𝒀 𝒕 |𝑼 𝟎 … 𝑼 𝒕−𝟏 𝒀 𝟏 … 𝒀 𝒕−𝟏 ~𝒩 𝑌 𝑡|𝑡−1 , 𝑆 𝑡|𝑡−1
Compute 𝑷 𝑿 𝒕|𝒕−𝟏 , 𝒀 𝒕|𝒕−𝟏 ≝ 𝑷(𝑿 𝒕 , 𝒀 𝒕 |𝑼 𝟎 … 𝑼 𝒕−𝟏 𝒀 𝟏 … 𝒀 𝒕−𝟏 ) = 𝑷 𝒀 𝒕 𝑿 𝒕 , 𝑼 𝒕−𝟏 𝐏 𝑿 𝒕|𝒕−𝟏 𝑋 𝑡
𝑌 𝑡 = 𝑀 𝑋 𝑡
ℇ 𝑡 𝑌 + 𝑉 with 𝑀 = 𝐼 0
𝐶 𝑡 𝐼 , 𝑉 = 0 𝐷 𝑡 𝑈 𝑡
and 𝑋 𝑡|𝑡−1
ℇ 𝑡 𝑌 ~𝒩 μ, Σ with μ = 𝑋 𝑡|𝑡−1
0 𝛴 = 𝑃 𝑡|𝑡−1 0 0 𝑅 𝑡 ⇒ 𝑋 𝑡|𝑡−1
𝑌 𝑡 ~𝒩 𝑀𝜇 + 𝑉, M𝛴𝑀 𝑇
𝑋 𝑋 𝑃 𝑃 𝐶 𝑇 𝑌 = 𝐶 𝑋 + 𝐷 𝑈
𝑋
𝑡−1𝑋
𝑡𝑌
𝑡𝑈
𝑡−1Bayesian, Machine Learning, Frederic Pennerath
Kalman Filter:
Update step
Hypothesis:
• 𝑋 𝑡|𝑡−1
𝑌 𝑡|𝑡−1 ~𝒩 𝑋 𝑡|𝑡−1
𝑌 𝑡|𝑡−1 , 𝑃 𝑡|𝑡−1 𝑃 𝑡|𝑡−1 𝐶 𝑡 𝑇 𝐶 𝑡 𝑃 𝑡|𝑡−1 𝑆 𝑡|𝑡−1
• Observe 𝑌 𝑡 = 𝑦 𝑡
Compute 𝐏 𝑿 𝐭|𝐭 ≝ 𝑷 𝑿 𝐭 𝑼 𝟎 … 𝑼 𝒕−𝟏 𝒀 𝟏 … 𝒀 𝒕−𝟏 , 𝒀 𝒕 = 𝒚 𝒕 𝑋 1
𝑋 2 ~𝒩 𝜇 1
𝜇 2 , Σ 11 Σ 12
Σ 12 𝑇 Σ 22 ⟹
𝑃 𝑋 1 𝑋 2 = Ԧ 𝑥 ~𝒩 𝜇 1 + Σ 12 Σ 22 −1 𝑥 − 𝜇 Ԧ 2 , Σ 11 − Σ 12 Σ 22 −1 Σ 12 𝑇
⇒ 𝑋 𝑡|𝑡 ~𝒩 𝑋 𝑡|𝑡 , 𝑃 𝑡|𝑡 with ቐ 𝑋 𝑡|𝑡 = 𝑋 𝑡|𝑡−1 + 𝑃 𝑡|𝑡−1 𝐶 𝑡 𝑇 𝑆 𝑡|𝑡−1 −1 (𝑦 𝑡 − 𝑌 𝑡|𝑡−1 ) 𝑃 𝑡|𝑡 = 𝑃 𝑡|𝑡−1 − 𝑃 𝑡|𝑡−1 𝐶 𝑡 𝑇 𝑆 𝑡|𝑡−1 −1 𝐶 𝑡 𝑃 𝑡|𝑡−1
𝑋
𝑡−1𝑋
𝑡𝑌
𝑡Kalman Filter:
Summary of equations & implementation
Initialisation step: 𝑋 0|0 ← 𝑋 0 , 𝑃 0|0 ← 𝑃 0
Prediction step: run when system time is increased (𝑡 ← 𝑡 + 1)
1. Predicted state: 𝑋 𝑡+1|𝑡 ← 𝐴 𝑡 𝑋 𝑡|𝑡 + 𝐵 𝑡 𝑈 𝑡 (model integration) 2. Prediction covariance: 𝑃 𝑡+1|𝑡 ← 𝐴 𝑡 𝑃 𝑡|𝑡 𝐴 𝑡 𝑇 + 𝑄 𝑡 (uncertainty increase) Estimation step: run when observations are received
1. Output prediction: 𝑌 𝑡|𝑡−1 ← 𝐶 𝑡 𝑋 𝑡|𝑡−1 + 𝐷 𝑡 𝑈 𝑡 (mean of output posterior) 2. Output variance: 𝑆 𝑡|𝑡−1 ← 𝐶 𝑡 𝑃 𝑡|𝑡−1 𝐶 𝑡 𝑇 + 𝑅 𝑡 (add observation noise) 3. Innovation: 𝑌 ෨ 𝑡 ← 𝑦 𝑡 − 𝑌 𝑡|𝑡−1 (error on output prediction) 4. Kalman filter gain 𝐾 𝑡 ← 𝑃 𝑡|𝑡−1 𝐶 𝑡 𝑇 𝑆 𝑡|𝑡−1 −1 (compromise of uncertainty) 5. Posterior state: 𝑋 𝑡|𝑡 ← 𝑋 𝑡|𝑡−1 + 𝐾 𝑡 𝑌 ෨ 𝑡 (state correction)
6. Posterior covariance: 𝑃 ← 𝐼 − 𝐾 𝐶 𝑃 (uncertainty reduction)
Bayesian, Machine Learning, Frederic Pennerath
Kalman filter tuning
Initial state distribution:
X 0 = 𝑣 0
𝑙 0 ~𝒩 0
0 , 𝜎 𝑣 2 0 0
0 0 with 𝜎 𝑣 0 = 1 𝑚/𝑠 State integration noise:
Force uncertainty: 𝜀 𝑡 𝑣 ~𝒩 0, 𝜎 𝑣 2 with 𝜎 𝑣 = 𝑚 ⋅ 1 ⋅ 9.8 𝑚/𝑠 2 Slip uncertainty: 𝜀 𝑡 𝑙 ~𝒩 0, 𝜎 𝑙 2 with 𝜎 𝑙 = 1 𝑚/𝑠
𝜀 𝑡 𝑣 ⊥ 𝜀 𝑡 𝑙 ⇒ ℇ 𝑡 𝑋 ~𝒩 0
0 , 𝑄 avec Q t = 𝜏 2
𝜎 𝑣 2
𝑚 2 0 0 𝜎 𝑙 2 Measurement noise:
𝜀 𝑡 𝑦 ~𝒩 0, 𝜎 𝑦 2 with 𝜎 𝑦 = 𝑞 2
12 , 𝑞 = 50 𝑐𝑚
⇒ R t = 𝜎 𝑦 2
Kalman Filter:
Without measures
0 10 20 30 40 50 60 70 80 90 100
-50 0 50 100 150
t (s)
speed (km/h)
Kalman
margin real estimated
0 10 20 30 40 50 60 70 80 90 100
-2000 -1000 0 1000 2000
t (s)
distance (km)
Bayesian, Machine Learning, Frederic Pennerath
Kalman Filter:
With measures
0 10 20 30 40 50 60 70 80 90 100
-50 0 50 100
t (s)
speed (km/h)
Kalman
margin real estimated
0 10 20 30 40 50 60 70 80 90 100
-0.5 0 0.5 1 1.5
t (s)
distance (km)
Kalman Filter:
Slowing down measurement time rate
0 2 4 6 8 10 12 14 16 18
0 20 40 60 80
t (s)
speed (km/h)
Kalman
margin real estimated
2 4 6 8 10 12 14 16 18
-2 0 2 4 6
distance (km)
Bayesian, Machine Learning, Frederic Pennerath
Kalman Filter example:
Video surveillance of pedestrians
Problem:
Multiple cameras observe image position of pedestrians:
• State is 𝑿 = 𝑥, 𝑦, 𝑣 𝑥 , 𝑣 𝑦
• Observation is image position 𝒚 𝒊
• Linear integration and observation:
𝒚 𝒊 = 𝑪 𝒊 𝑿 + 𝒅 𝒊 Advantages:
• Data fusion: multiple cameras, other sensors, etc
• Robust to camera occlusion
See code at “Demo/Bayes/8-Continuous Markov Models/Kalman”
𝑿
Screen 𝑖 focus
pedestrian
𝒚
𝒊Kalman Filter example:
Video surveillance of pedestrians
Bayesian, Machine Learning, Frederic Pennerath
Kalman Filter Extensions for non linear models:
Extended Kalman Filter (EKF)
Linearize non linear state space representation (𝑓 𝑡 , 𝑔 𝑡 ) around estimated state 𝑋 𝑡 to update covariance matrix 𝑃 𝑡 :
𝑋 𝑡+1|𝑡 = 𝑓 𝑡 𝑋 𝑡|𝑡 , 𝑈 𝑡 , 𝑌 𝑡|𝑡 = 𝑔 𝑡 𝑋 𝑡|𝑡 , 𝑈 𝑡 , 𝐴 𝑡 = 𝜕𝑓
𝜕𝑋 𝑋 𝑡|𝑡 , 𝐶 𝑡 = 𝜕𝑔
𝜕𝑋 𝑋 𝑡|𝑡
𝜃𝑡 𝒞𝑡= 1
𝑅𝑡
𝑑𝑋
𝑡𝑑𝑡 = 𝑑
𝑑𝑡 𝑥
𝑡𝑦
𝑡𝜃
𝑡𝑣
𝑡=
𝑣
𝑡× cos 𝜃
𝑡+ 𝜀
𝑡𝑥𝑣
𝑡× sin 𝜃
𝑡+ 𝜀
𝑡𝑦𝑣
𝑡× 𝒞
𝑡+ 𝜀
𝑡𝜃1
𝑀 𝛼
𝑡− 𝑓
𝑀 𝑣
𝑡+ 𝜀
𝑡𝑣⇒
𝑋
𝑡+1|𝑡= 𝑋
𝑡|𝑡+
𝑣
𝑡× cos 𝜃
𝑡𝑣
𝑡× sin 𝜃
𝑡𝑣
𝑡× 𝒞
𝑡1
𝑀 𝛼
𝑡− 𝑓 𝑀 𝑣
𝑡× 𝜏
𝐴
𝑡= 𝜕𝑓
𝜕𝑋 𝑋
𝑡|𝑡=
1 0 −𝑣
𝑡× sin 𝜃
𝑡× 𝜏 cos 𝜃
𝑡× 𝜏 0 1 𝑣
𝑡× cos 𝜃
𝑡× 𝜏 sin 𝜃
𝑡× 𝜏
0 0 1 𝒞
𝑡× 𝜏
0 0 0 1 − 𝑓
𝑀 𝑣
𝑡𝜏
Kalman Filter Extensions for non linear models:
Unscented Kalman Filter (UKF)
Take into account curvature of non linear functions(𝑓 𝑡 , 𝑔 𝑡 ) by observing displacement of few samples called sigma points 𝜒 Ƹ 𝑖
Example for state integration:
1 . Define 2𝑑 + 1 sigma points 𝜒 Ƹ
𝑖centered on 𝑋
𝑡|𝑡and scattered on lines passing through 𝑋
𝑡|𝑡and aligned with eigenvectors of 𝑃:
Ƹ
𝜒
𝑖 𝑖∈ 0…2𝑑= 𝑋
𝑡|𝑡∪ 𝑋
𝑡|𝑡± k 𝜎
𝑖𝑢 ො
𝑖 𝑖∈ 1…𝑑k = 𝑑 + 𝜆 2. Compute their images: 𝑓( Ƹ 𝜒
𝑖)
𝑖∈ 0…2𝑑3. Compute new mean state and covariance as weighted barycenters (weights depend on 𝑑 & 𝜆):
𝑋
𝑡+1|𝑡=
𝑖=0 2𝑑+1
𝑊
𝑖𝑠𝑓( Ƹ 𝜒
𝑖) 𝑃
𝑡+1|𝑡=
𝑖=0 2𝑑+1
𝑊
𝑖𝑃𝑓 Ƹ 𝜒
𝑖− 𝑋
𝑡+1|𝑡× 𝑓 Ƹ 𝜒
𝑖− 𝑋
𝑡+1|𝑡 T𝑋
𝑡|𝑡𝑓
𝑡𝜒 Ƹ
0𝑓
𝑡𝜒 Ƹ
2𝑓
𝑡𝜒 Ƹ
3𝑓
𝑡𝜒 Ƹ
4𝑃 𝑡|𝑡
Ƹ
𝜒
1Ƹ
𝜒
3Ƹ
𝜒
0=
𝑋 𝑡+1|𝑡
Bayesian, Machine Learning, Frederic Pennerath
EKF & UKF example:
Cachalot tracking, again & again
Problem: localize cachalot:
• State is 𝑿 = 𝑥, 𝑦, 𝑧, 𝑣 𝑥 , 𝑣 𝑦 , 𝑣 𝑧
• Observations are:
– Depth gauge: 𝑧
𝑔= 𝑧 + 𝜀
𝑔– Inclinometer: tan 𝜃
𝑖=
𝑣𝑧𝑣𝑥2+𝑣𝑦2
+ 𝜀
𝑖– Compass: tan 𝜙
𝑐=
𝑣𝑥𝑣𝑦
+ 𝜀
𝑐– Pitot tube: 𝑣
𝑝= 𝑣
𝑥2+ 𝑣
𝑦2+ 𝑣
𝑧2+ 𝜀
𝑝– GPS: 𝑿
𝑔𝑝𝑠= (𝑥, 𝑦, 𝑧) + 𝜺
𝐺𝑃𝑆only when 𝑧 > −1
• Non linear observation Advantages:
• Data fusion: multiple heterogeneous sensors
• Robust to sensor failure
See code at “Demo/Bayes/8-Continuous Markov Models/EKF and UKF”
𝑽
𝜽
𝒛
Kalman Filter Extensions:
to go further
Algorithms:
– State smoothing
– State space representation learning Continuous time (Kalman-Bucy filter):
Make 𝜏 → 0: 𝑑 𝑋
|𝑡𝑜𝑑𝑡 𝑡 = 𝐴 𝑡 𝑋 |𝑡
0𝑡 + 𝐵 𝑡 𝑈 𝑡 , … Hybrid filters:
Combine continuous time integration with discrete time updates Switching Kalman Filters:
Combine HMM with Kalman filters
𝑃 𝑆 0 … , 𝑆 𝑇 , 𝑋 0 … , 𝑋 𝑇 , 𝑌 1 … 𝑌 𝑇 = 𝑃 𝑆 0 𝑃(𝑋 0 |𝑆 0 ) ෑ
𝑡=1 𝑇
𝑃 𝑆 𝑡 𝑆 𝑡−1 𝑃 𝑋 𝑡 𝑋 𝑡−1 , 𝑆 𝑡 ) 𝑃 𝑌 𝑡 𝑆 𝑡 , 𝑋 𝑡
Extensions to non linear models:
– Extended Kalman Filter
– Unscented Kalman Filter
Bayesian, Machine Learning, Frederic Pennerath