Conclusion
• Tackled a challenging detonation-type classification task
• Proposed a methodology to properly train and evaluate classifiers
• Applied it to SVMs and DRBMs
• DRBMs are slightly superior overall, and less sensitive to the choice of preprocessing hyperparameters than SVMs
References
• Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, to appear.
• Cortes, C. and V. Vapnik (1995). Support-Vector Networks. Mach. Learn. 20(3), 273–297.
• Larochelle, H. and Y. Bengio (2008). Classification using discriminative restricted Boltzmann machines. In A. McCallum and S. Roweis (Eds.), Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pp. 536–543. Omnipress.
Sensitivity to Preprocessing Choice
• Performance can vary considerably depending on the parameters governing data preprocessing, e.g. the type of segmentation and the number and size of signal windows for the Fast Fourier Transform (see the sketch below)
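A minimal sketch of this kind of preprocessing, under stated assumptions: the signal is split into a given number of equal-length windows, each window is mapped to its FFT magnitude spectrum, and an optional log transform is applied. The function name and defaults are illustrative, not the authors' actual pipeline.

```python
# Hypothetical FFT-window feature extraction (illustrative, not the
# authors' code): split a signal into n_windows chunks, take each
# chunk's FFT magnitude spectrum, optionally log-scale, concatenate.
import numpy as np

def fft_window_features(signal, n_windows=4, log_scale=True):
    chunks = np.array_split(np.asarray(signal, dtype=float), n_windows)
    spectra = [np.abs(np.fft.rfft(c)) for c in chunks]
    features = np.concatenate(spectra)
    if log_scale:
        features = np.log1p(features)  # the "Log True" variant in the figure
    return features

# Example on a synthetic 10,000-sample signal
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
print(fft_window_features(x, n_windows=2, log_scale=False).shape)
```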
[Figure: "Effect of Preprocessing on Accuracy (SVMs, Realistic-by-Day)": lattice of accuracy density plots (accuracy 0.4 to 0.8), one panel per combination of log transform (False/True) and number of FFT windows (1, 2, 4, 6); points distinguish the ApPeakNoTrunc and ARLTruncated segmentations.]
Example of SVM accuracy distribution in the REALISTIC-BY-DAY setting when varying some preprocessing parameters.
• The accuracy residual measures the amount of variation in accuracy that is due to varying the preprocessing parameters (type of segmentation, number and size of windows for the Fast Fourier Transform, ...), while keeping the model hyperparameters fixed (see the sketch below)
• We plot the distribution of residuals for two sets of experiments: on the left, in the REALISTIC-BY-DAY setting and for a single type of segmentation (called ARLTruncated); on the right, in the REALISTIC-BY-RANGE setting and for all kinds of segmentations tried
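One natural way to compute such residuals is sketched below, under the assumption that a residual is a run's accuracy minus the mean accuracy of all runs sharing the same model hyperparameters (the exact formula is not spelled out here); the table contents are hypothetical.

```python
# Hedged sketch: per-group mean-centering of accuracies, so what
# remains is the variation attributable to preprocessing choices.
import pandas as pd

runs = pd.DataFrame({  # hypothetical results, one row per run
    "model_hparams": ["C=1", "C=1", "C=1", "C=10", "C=10", "C=10"],
    "preprocessing": ["fft1", "fft2", "fft4", "fft1", "fft2", "fft4"],
    "accuracy":      [0.74, 0.78, 0.71, 0.77, 0.79, 0.75],
})
group_mean = runs.groupby("model_hparams")["accuracy"].transform("mean")
runs["residual"] = runs["accuracy"] - group_mean
print(runs)
```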
[Figure: "Model Sensitivity to Preprocessing Variations (Realistic-by-Day, ARLTruncated)": density of accuracy residuals (-0.10 to 0.05), one panel per model type (SVM, DRBM).]
[Figure: "Model Sensitivity to Preprocessing Variations (Realistic-by-Day and -Range, all segmentations)": density of accuracy residuals (-0.1 to 0.1), one panel per model type (SVM, DRBM).]
• Residuals of SVMs exhibit a greater variance than those of DRBMs (the difference is statistically significant; a sketch of such a variance test follows)
• Lower variability ⇒ performance is more reliably estimated ⇒ DRBMs are particularly useful when little data is available
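The significance test actually used is not named here; Levene's test is one standard choice for comparing the spread of two residual samples, sketched on synthetic data.

```python
# Illustrative variance-equality check on synthetic residuals
# (not the authors' actual test or data).
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
svm_residuals = rng.normal(0.0, 0.04, size=200)   # hypothetical, wider spread
drbm_residuals = rng.normal(0.0, 0.02, size=200)  # hypothetical, tighter spread

stat, p_value = levene(svm_residuals, drbm_residuals)
print(f"Levene W = {stat:.2f}, p = {p_value:.3g}")  # small p => unequal variances
```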
Experimental Results: REALISTIC-BY-RANGE
[Figure: box plot of accuracy (0.70 to 0.80) by model type (SVM, DRBM).]
Box plot over statistically indistinguishable hyperparameter values for each model type. DRBMs are superior to SVMs both in mean performance (statistically significant) and in robustness (lower variance).
Experimental Results: REALISTIC-BY-DAY
[Figure: box plot of accuracy (0.75 to 0.80) by model type (SVM, DRBM).]
Box plot over statistically indistinguishable hyperparameter values for each model type. DRBMs are superior to SVMs both in mean performance (statistically significant) and in robustness (lower variance).
Experimental Setting
• 5 repetitions of 5-fold cross-validation w.r.t. the split constraints defined by the REALISTIC-BY-DAY and REALISTIC-BY-RANGE settings
• Computation of the normalized classification accuracy to compensate for class imbalance
• A wide range of hyperparameter values tried, reporting results for those leading to statistically indistinguishable performance compared to the best (a sketch of this protocol follows below)
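A minimal sketch of this protocol, with two stated deviations: scikit-learn's RepeatedStratifiedKFold does not enforce the day/range split constraints (grouped splitting over day or range identifiers would be closer in spirit), and "balanced_accuracy" stands in for the normalized classification accuracy; the data is synthetic.

```python
# Hedged sketch: 5x5-fold cross-validation with class-imbalance-
# compensated accuracy, on synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_classes=3, n_informative=6,
                           weights=[0.6, 0.3, 0.1], random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(SVC(C=1.0, kernel="rbf"), X, y,
                         scoring="balanced_accuracy", cv=cv)
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```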
Classification Algorithm 2: DRBMs
[Diagram: Restricted Boltzmann Machine with hidden units $h$ connected to the input $z$ through weights $W$ and to the one-hot target vector $\vec{y}$ through weights $U$, modeling the joint distribution of inputs $z$ and target class $y$.]
A Discriminative Restricted Boltzmann Machine (Larochelle and Bengio, 2008) combines generative and discriminative training criteria by minimizing
$$-\sum_{z^{(i)} \in \mathcal{T}} \log p\big(y^{(i)}, z^{(i)}\big) \;-\; \lambda \sum_{z^{(i)} \in \mathcal{T}} \log p\big(y^{(i)} \mid z^{(i)}\big)$$
where the model's joint probability is
$$p(y, z) \propto \exp\big(h'Wz + b'z + c'h + d'\vec{y} + h'U\vec{y}\big)$$
Training is performed by contrastive divergence for the generative part and stochastic gradient descent for the discriminative part.
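For the discriminative term, $p(y \mid z)$ has a closed form because the hidden units can be summed out analytically (Larochelle and Bengio, 2008); a NumPy sketch with illustrative shapes and names:

```python
# Exact DRBM class posterior p(y|z): a softmax over per-class
# "free energies" built from softplus terms (names are illustrative).
import numpy as np

def drbm_class_posterior(z, W, U, c, d):
    """W: (n_hidden, n_inputs), U: (n_hidden, n_classes),
    c: hidden biases, d: class biases."""
    activations = W @ z + c                    # (n_hidden,)
    log_p = np.array([
        d[y] + np.logaddexp(0.0, activations + U[:, y]).sum()  # softplus
        for y in range(U.shape[1])
    ])
    log_p -= log_p.max()                       # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

rng = np.random.default_rng(0)
H, D, K = 8, 5, 3                              # hidden, input, classes
print(drbm_class_posterior(rng.standard_normal(D), rng.standard_normal((H, D)),
                           rng.standard_normal((H, K)), rng.standard_normal(H),
                           rng.standard_normal(K)))
```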
Classification Algorithm 1: SVMs
Find the function $f$ in the reproducing kernel Hilbert space $\mathcal{H}_K$ associated to the kernel $K$ (Cortes and Vapnik, 1995) by:
$$\operatorname*{argmin}_{f \in \mathcal{H}_K}\; C \sum_{z^{(i)} \in \mathcal{T}} \underbrace{\big[\, 1 - y^{(i)} f\big(z^{(i)}\big) \big]_+}_{\text{hinge loss (bias)}} \;+\; \underbrace{\tfrac{1}{2}\, \|f\|^2}_{\text{margin (variance)}}$$