e-CHIMIOMETRIE 2021

Academic year: 2021

2-3 February 2021

https://chemom2021.sciencesconf.org/

RoBoost-PLSR: a robust PLS regression method inspired by boosting principles

M. Metz1,2, F. Abdelghafour1,2, JM. Roger1,2, M. Lesnoff2,3

1 ITAP, Univ Montpellier, INRAE, Institut Agro, Montpellier, France

2 ChemHouse Research Group, Montpellier, France

3 CIRAD, UMR SELMET, Montpellier, France

4 SELMET, Univ Montpellier, CIRAD, INRA, Institut Agro, Montpellier, France

Keywords: Robust-PLSR, outliers, calibration

1 Introduction

The calibration of Partial Least Squares regression (PLSR) models can be disturbed by outlying samples in the data. In such cases the models can become unstable and their predictive performance can degrade. To address this issue, we propose a new method and algorithm for downweighting outliers when processing high-dimensional data. This novel robust PLSR algorithm is inspired by the principles of boosting and is called RoBoost-PLSR.

2 Theory

RoBoost-PLSR consists of a series of K unidimensional (1 LV, i.e. one latent variable) iteratively reweighted PLSR [1] models. The weighted PLSR algorithm used is the weighted NIPALS [2].

Model k+1 is calibrated on the X and Y residuals left by the previous k models. Within each model, sample weights are computed from a combination of X-residuals, Y-residuals and leverages: the more a sample deviates from the model, the lower its weight. The model is then refitted with the weights obtained at the previous iteration, and this is repeated until the weights converge to a stable solution.
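The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the weight function or the scaling of the residuals, so the sketch assumes a Tukey bisquare function with tuning constant 4, median-scaled distances, a simple score distance as the leverage term, and a univariate response. The names `bisquare`, `scaled`, `weighted_pls1_lv`, `roboost_sketch` and `predict` are hypothetical.

```python
import numpy as np


def bisquare(u, c=4.0):
    """Tukey bisquare weight (assumed choice): 1 at u = 0, 0 for |u| >= c."""
    z = np.abs(u) / c
    return np.where(z < 1.0, (1.0 - z**2) ** 2, 0.0)


def scaled(v, eps=1e-12):
    """Scale non-negative distances by their median (assumed robust scale)."""
    return v / (np.median(v) + eps)


def weighted_pls1_lv(X, y, d):
    """One latent variable of weighted PLS1 with sample weights d (weighted NIPALS step)."""
    xmean = d @ X / d.sum()
    ymean = d @ y / d.sum()
    Xc, yc = X - xmean, y - ymean
    w = Xc.T @ (d * yc)               # weighted covariance direction
    w /= np.linalg.norm(w)
    t = Xc @ w                        # scores
    tt = t @ (d * t)
    p = Xc.T @ (d * t) / tt           # X loadings
    c = yc @ (d * t) / tt             # y loading
    model = {"xmean": xmean, "ymean": ymean, "w": w, "p": p, "c": c}
    return model, t, Xc - np.outer(t, p), yc - c * t


def roboost_sketch(X, y, n_lv, n_iter=30, tol=1e-6):
    """Fit n_lv successive 1-LV models, each one iteratively reweighted."""
    Xk, yk = np.asarray(X, float), np.asarray(y, float)
    models, weights = [], []
    for _ in range(n_lv):
        d = np.ones(len(yk))          # start each model with equal weights
        for _ in range(n_iter):
            _, t, Ex, ey = weighted_pls1_lv(Xk, yk, d)
            lev = np.abs(t - np.median(t))   # score distance used as leverage
            d_new = (bisquare(scaled(np.linalg.norm(Ex, axis=1)))
                     * bisquare(scaled(np.abs(ey)))
                     * bisquare(scaled(lev)))
            converged = np.max(np.abs(d_new - d)) < tol
            d = d_new
            if converged:
                break
        # Final fit with the retained weights; its residuals feed the next model.
        model, _, Xk, yk = weighted_pls1_lv(Xk, yk, d)
        models.append(model)
        weights.append(d)
    return models, weights


def predict(models, X):
    """Sum the contributions of the successive 1-LV models."""
    R = np.asarray(X, float)
    yhat = np.zeros(R.shape[0])
    for m in models:
        Rc = R - m["xmean"]
        t = Rc @ m["w"]
        yhat += m["ymean"] + m["c"] * t
        R = Rc - np.outer(t, m["p"])
    return yhat
```

Multiplying the three bisquare terms means a sample is discarded as soon as any one of the three criteria flags it strongly; other combinations are possible and the abstract does not say which one is used.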

3 Material and methods

RoBoost-PLSR was compared with PLSR calibrated with and without the outliers (the outlier-free calibration serving as the reference) and with Partial Robust M-regression (PRM) [3], an established robust method. The evaluation was conducted on a simulated dataset and a real dataset.

The simulated dataset was generated with the framework proposed in [4]. The simulation aims to reproduce a contamination of the samples that leads to inconsistent spectral measurements.

The real dataset comes from an animal nutrition application: predicting the protein content of feed materials and detecting incorrectly categorised samples. In this database, the samples of animal bone meal (denoted ANF) are the outliers polluting the regular soyabean cakes (denoted TTS).

4 Results and discussion

Figure 1: Evolution of the RMSEP as a function of the number of latent variables for the reference, PLSR with outliers, PRM and RoBoost-PLSR, on the simulated dataset (left) and the real dataset (right).

Figure 1 shows that, for this type of outlier, RoBoost-PLSR performs well and can reach the prediction performance of PLSR calibrated without outliers.
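For readers who want to reproduce a curve like those in Figure 1, RMSEP is taken here in its usual sense, the root mean square error of prediction on a test set; the abstract does not give the formula. A minimal helper, with the hypothetical name `rmsep`, could look like this:

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Evaluating it once per number of latent variables, on the same test set for every method, gives one curve per method.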

5 Conclusion

RoBoost-PLSR proves to be resilient to the tested outliers and can match the performance of the reference PLSR calibrated without these outliers.

6 References

[1] Cummins, David J., and C. Webster Andrews. "Iteratively Reweighted Partial Least Squares: A Performance Analysis by Monte Carlo Simulation". Journal of Chemometrics 9, no. 6 (1995): 489-507. https://doi.org/10.1002/cem.1180090607.

[2] Schaal, Stefan, Christopher G. Atkeson, and Sethu Vijayakumar. "Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning". Applied Intelligence 17, no. 1 (1 July 2002): 49-60. https://doi.org/10.1023/A:1015727715131.

[3] Serneels, Sven, Christophe Croux, Peter Filzmoser, and Pierre J. Van Espen. "Partial Robust M-Regression". Chemometrics and Intelligent Laboratory Systems 79, no. 1 (28 October 2005): 55-64. https://doi.org/10.1016/j.chemolab.2005.04.007.

[4] Metz, Maxime, Alessandra Biancolillo, Matthieu Lesnoff, and Jean-Michel Roger. "A Note on Spectral Data Simulation". Chemometrics and Intelligent Laboratory Systems 200 (15 May 2020): 103979. https://doi.org/10.1016/j.chemolab.2020.103979.
