e-CHIMIOMETRIE 2021


2-3 February 2021

https://chemom2021.sciencesconf.org/


RoBoost-PLSR: robust PLS regression method inspired by boosting principles

M. Metz¹,², F. Abdelghafour¹,², J.-M. Roger¹,², M. Lesnoff²,³

1 ITAP, Univ Montpellier, INRAE, Institut Agro, Montpellier, France

2 ChemHouse Research Group, Montpellier, France

3 CIRAD, UMR SELMET, Montpellier, France

4 SELMET, Univ Montpellier, CIRAD, INRA, Institut Agro, Montpellier, France

Keywords: Robust-PLSR, outliers, calibration

1 Introduction

The calibration of Partial Least Squares regression (PLSR) models can be disturbed by outlying samples in the data. In such cases, the models can be unstable and their predictive performance can be degraded. To address this issue, a new method and algorithm is proposed to better handle the downweighting of outliers in the context of high-dimensional data. This novel robust PLSR algorithm, inspired by the principles of boosting, is called RoBoost-PLSR.

2 Theory

RoBoost-PLSR consists of fitting a series of K one-dimensional (one latent variable, LV) iteratively reweighted PLSR models [1]. The weighted PLSR algorithm used is the weighted NIPALS [2].
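For a single response y, one weighted NIPALS component can be sketched as below with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: y is univariate, the sample weights w are normalised to sum to one and also used for centring, and the function names `weighted_pls1_1lv` and `predict_1lv` are hypothetical. With a single response column, NIPALS converges in one pass, so the usual inner loop is omitted.

```python
import numpy as np

def weighted_pls1_1lv(X, y, w):
    """One-latent-variable PLS1 fitted with sample weights w (sketch)."""
    w = w / w.sum()                   # weights normalised to sum to 1
    xmean, ymean = w @ X, w @ y       # weighted means used for centring
    Xc, yc = X - xmean, y - ymean
    a = Xc.T @ (w * yc)               # X-weight vector: weighted covariance with y
    a = a / np.linalg.norm(a)         # unit norm
    t = Xc @ a                        # scores
    tt = w @ (t * t)                  # weighted sum of squared scores
    p = Xc.T @ (w * t) / tt           # X-loadings (used later for deflation)
    q = (w * t) @ yc / tt             # y-loading (a scalar for PLS1)
    return {"xmean": xmean, "ymean": ymean, "a": a, "p": p, "q": q}

def predict_1lv(model, X):
    """Predict y for samples X with a fitted one-LV model."""
    t = (X - model["xmean"]) @ model["a"]
    return model["ymean"] + model["q"] * t
```

Giving a sample a weight of zero removes it from the means, the weight vector and the loadings, which is the mechanism that iterative reweighting relies on to neutralise outliers.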

Model k+1 is calibrated on the X and Y residuals left by the previous k models. Within each model, sample weights are computed from a combination of X-residuals, Y-residuals and leverages: the more a sample deviates from the model, the lower its weight. The model is then refitted iteratively with the weights from the previous iteration until it converges to a stable solution.
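The outer boosting loop and the inner reweighting loop described above might be organised as in the sketch below. The weight function is an assumption: the abstract only states that weights combine X-residuals, Y-residuals and leverages, so here each criterion is divided by its median and passed through a Tukey biweight with a hypothetical tuning constant `c=4`, leverage is represented by the absolute score |t|, and the three weights are multiplied. All function names and parameters (`roboost_plsr`, `n_iter`, `tol`, ...) are hypothetical.

```python
import numpy as np

def weighted_pls1_1lv(X, y, w):
    """One-LV weighted PLS1 (a single NIPALS pass suffices for univariate y)."""
    w = w / w.sum()
    xmean, ymean = w @ X, w @ y
    Xc, yc = X - xmean, y - ymean
    a = Xc.T @ (w * yc)
    a = a / np.linalg.norm(a)
    t = Xc @ a
    tt = w @ (t * t)
    return {"xmean": xmean, "ymean": ymean, "a": a,
            "p": Xc.T @ (w * t) / tt, "q": (w * t) @ yc / tt}

def apply_1lv(m, X):
    """Scores, fitted X (including the mean) and fitted y of one component."""
    t = (X - m["xmean"]) @ m["a"]
    return t, m["xmean"] + np.outer(t, m["p"]), m["ymean"] + m["q"] * t

def tukey_weights(r, c=4.0):
    """Hypothetical weight function: Tukey biweight of r scaled by its median."""
    d = r / (np.median(r) + 1e-12)   # small constant avoids division by zero
    return np.where(d < c, (1.0 - (d / c) ** 2) ** 2, 0.0)

def roboost_plsr(X, y, K=3, n_iter=50, c=4.0, tol=1e-6):
    """Sketch: K one-LV models, each fitted on the residuals of the previous
    ones and iteratively reweighted until the weights stabilise."""
    Xr = np.asarray(X, dtype=float).copy()
    yr = np.asarray(y, dtype=float).copy()
    components = []
    for _ in range(K):
        w = np.ones(len(yr))
        for _ in range(n_iter):
            m = weighted_pls1_1lv(Xr, yr, w)
            t, Xhat, yhat = apply_1lv(m, Xr)
            w_new = (tukey_weights(np.linalg.norm(Xr - Xhat, axis=1), c)  # X-residuals
                     * tukey_weights(np.abs(yr - yhat), c)                # y-residuals
                     * tukey_weights(np.abs(t), c))                       # leverage proxy
            converged = np.max(np.abs(w_new - w)) < tol
            w = w_new
            if converged:
                break
        m = weighted_pls1_1lv(Xr, yr, w)       # final fit with the last weights
        components.append((m, w))
        _, Xhat, yhat = apply_1lv(m, Xr)
        Xr, yr = Xr - Xhat, yr - yhat          # residuals passed to model k+1
    return components

def roboost_predict(components, X):
    """Sum the component predictions, deflating X exactly as during training."""
    Xr = np.asarray(X, dtype=float).copy()
    y_pred = np.zeros(len(Xr))
    for m, _ in components:
        _, Xhat, yhat = apply_1lv(m, Xr)
        y_pred += yhat
        Xr = Xr - Xhat
    return y_pred
```

In this sketch K plays the role of the number of latent variables: each boosted component adds one LV, and the final prediction is the sum of the component predictions.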

3 Material and methods

RoBoost-PLSR was compared with standard PLSR calibrated with outliers and without outliers (the latter being the reference), and with Partial Robust M-regression (PRM) [3], an established robust method. The evaluation was carried out on one simulated dataset and one real dataset.

The simulated dataset was generated with the framework proposed in [4]. The aim of the simulation is to reproduce a contamination of some samples that leads to inconsistent spectral measurements.
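The simulation framework of [4] is not reproduced here. As a rough, hypothetical stand-in, the sketch below builds toy spectra from Gaussian bands and adds an extra band to a subset of samples without changing their reference values, so that their spectra become inconsistent with y. The wavelength grid, band positions and widths, noise level and the function names are arbitrary assumptions.

```python
import numpy as np

def gaussian_band(wl, center, width):
    """Gaussian absorption band evaluated on the wavelength grid wl."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

def simulate_contaminated(n=100, n_out=10, seed=0):
    """Toy data: y drives an analyte band, a nuisance z drives an
    interferent band, and n_out samples receive an extra contaminant
    band that is unrelated to y."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(1100, 2500, 200)            # hypothetical NIR grid, nm
    analyte = gaussian_band(wl, 1700, 60)        # band positions are arbitrary
    interferent = gaussian_band(wl, 2100, 80)
    contaminant = gaussian_band(wl, 1900, 40)
    y = rng.uniform(20, 50, n)                   # response, e.g. a content in %
    z = rng.uniform(0, 1, n)                     # nuisance concentration
    X = np.outer(y / 50, analyte) + np.outer(z, interferent)
    X += rng.normal(scale=0.005, size=X.shape)   # measurement noise
    is_out = np.zeros(n, dtype=bool)
    is_out[rng.choice(n, size=n_out, replace=False)] = True
    # Contaminated samples: the spectrum changes, the reference value y does not.
    X[is_out] += np.outer(rng.uniform(0.5, 1.0, n_out), contaminant)
    return wl, X, y, is_out
```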

The real dataset comes from an animal nutrition application: predicting the protein content of feed materials in the presence of incorrectly categorised samples. In this database, samples derived from animal bonemeal (denoted ANF) are the outliers contaminating the regular soyabean cake samples (denoted TTS).


4 Results and discussion

Figure 1: Root mean squared error of prediction (RMSEP) as a function of the number of latent variables for the reference PLSR (calibrated without outliers), PLSR calibrated with outliers, PRM and RoBoost-PLSR, on the simulated dataset (left) and the real dataset (right).

Figure 1 shows that, for these types of outliers, RoBoost-PLSR performs well and can reach the prediction performance of the reference PLSR calibrated without outliers.
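The abstract does not define RMSEP. Assuming the usual definition, for a model with A latent variables applied to n_test test samples with reference values y_i:

```latex
\mathrm{RMSEP}(A) = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( y_i - \hat{y}_i^{(A)} \right)^2}
```

where ŷ_i(A) is the prediction for test sample i by the model with A latent variables. For RoBoost-PLSR, A presumably corresponds to the number K of boosted one-LV models, since each model contributes one latent variable.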

5 Conclusion

RoBoost-PLSR proved resilient to the tested outliers and can achieve the performance of the reference PLSR calibrated without them.

6 References

[1] Cummins, David J., and C. Webster Andrews. "Iteratively Reweighted Partial Least Squares: A Performance Analysis by Monte Carlo Simulation." Journal of Chemometrics 9, no. 6 (1995): 489-507. https://doi.org/10.1002/cem.1180090607.

[2] Schaal, Stefan, Christopher G. Atkeson, and Sethu Vijayakumar. "Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning." Applied Intelligence 17, no. 1 (1 July 2002): 49-60. https://doi.org/10.1023/A:1015727715131.

[3] Serneels, Sven, Christophe Croux, Peter Filzmoser, and Pierre J. Van Espen. "Partial Robust M-Regression." Chemometrics and Intelligent Laboratory Systems 79, no. 1 (28 October 2005): 55-64. https://doi.org/10.1016/j.chemolab.2005.04.007.

[4] Metz, Maxime, Alessandra Biancolillo, Matthieu Lesnoff, and Jean-Michel Roger. "A Note on Spectral Data Simulation." Chemometrics and Intelligent Laboratory Systems 200 (15 May 2020): 103979. https://doi.org/10.1016/j.chemolab.2020.103979.