Lidwine Grosmaire, Pedro Maldonado-Alvarado, Christelle Reynès, Robert
Sabatier, Dominique Dufour, Thierry Tran and Jean-Louis Delarbre
Joint Selection of Wavenumber Regions for MidIR and
Raman Spectra and Variables in PLS Regression using
Genetic Algorithms
The Context
Empirical and small-scale processing
Good breadmaking ability
Interest
Increased production and consumption of cassava
From the crop to the starch
2012 EFFoST Annual Meeting, 20-23 November 2012 • Montpellier, France
Varietal and process impacts on breadmaking ability
Standardize and scale up the process
Physicochemical parameters
Improve product quality
Industrial development of new gluten-free bread products
The Aim
The Data
Physicochemical parameters (block X1, 13 variables): amylose content and RVA parameters (12 variables)
Spectral data: mid-infrared spectra (block X2, 3351 variables) and Raman spectra (block X3, 4562 variables)
Response Y: breadmaking ability (expansion ability), predicted with PLS1
[Figure: data blocks X1, X2 and X3 (rows 1 to 52) with the response Y; example Raman spectrum, intensity (counts) vs. Raman shift (cm-1), 500 to 3500 cm-1; relevant RVA parameters; selGAmPLS]
How can the breadmaking ability be explained from our data using a statistical regression method while selecting variables of different types: individual variables and intervals?
Explanatory variables are organized in a multitable in which intervals and individual variables are selected in order to predict one variable of interest: the breadmaking capacity.
A Genetic Algorithm (GA) was developed in a context of discrimination, jointly with the PLS1 method: this method is called selGAmPLS.
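The idea of selecting individual variables and intervals jointly can be sketched as follows. This is only an illustrative toy, not the selGAmPLS implementation: the chromosome encoding (one bit per individual variable plus a fixed number of fixed-width intervals), the fitness (5-fold cross-validated R² of a 2-component PLS1 model), the GA operators, the synthetic data and all names (`decode`, `cv_r2`, `P1`, `P2`, `WIDTH`, ...) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
P1, P2 = 6, 60            # hypothetical sizes: individual variables, spectral points
N_INT, WIDTH = 2, 5       # assumed: 2 intervals of 5 points per chromosome
POP, GENS = 20, 15        # assumed GA settings

# Synthetic data: y depends on one individual variable and one spectral region.
n = 40
X = rng.normal(size=(n, P1 + P2))
y = 2.0 * X[:, 2] + X[:, P1 + 20:P1 + 25].sum(axis=1) + 0.1 * rng.normal(size=n)

def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS; returns regression vector and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = Xr.T @ yr
        if np.linalg.norm(w) < 1e-12:
            break
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p = Xr.T @ t / (t @ t)
        c = (yr @ t) / (t @ t)
        Xr, yr = Xr - np.outer(t, p), yr - c * t   # deflation
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean

def cv_r2(X, y, n_comp=2, n_folds=5):
    """Cross-validated R² with deterministic interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    pred = np.empty(len(y))
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        beta, xm, ym = pls1_fit(X[tr], y[tr], n_comp)
        pred[te] = (X[te] - xm) @ beta + ym
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def decode(chrom):
    """Turn (bits, interval starts) into a boolean column mask."""
    bits, starts = chrom
    mask = np.zeros(P1 + P2, dtype=bool)
    mask[:P1] = bits
    for s in starts:
        mask[P1 + s:P1 + s + WIDTH] = True
    return mask

def random_chrom():
    return rng.random(P1) < 0.3, rng.integers(0, P2 - WIDTH + 1, size=N_INT)

def fitness(chrom):
    return cv_r2(X[:, decode(chrom)], y)

def tournament(pop, scores):
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i] if scores[i] >= scores[j] else pop[j]

def crossover(a, b):
    """Uniform crossover on both parts of the chromosome."""
    bits = np.where(rng.random(P1) < 0.5, a[0], b[0])
    starts = np.where(rng.random(N_INT) < 0.5, a[1], b[1])
    return bits, starts

def mutate(chrom):
    """Flip some bits and shift interval starts by a few points."""
    bits = chrom[0] ^ (rng.random(P1) < 0.1)
    starts = np.clip(chrom[1] + rng.integers(-3, 4, size=N_INT), 0, P2 - WIDTH)
    return bits, starts

pop = [random_chrom() for _ in range(POP)]
history = []
for _ in range(GENS):
    scores = np.array([fitness(c) for c in pop])
    best = pop[int(np.argmax(scores))]
    history.append(float(scores.max()))
    children = [best]                      # elitism: keep the best chromosome
    while len(children) < POP:
        child = crossover(tournament(pop, scores), tournament(pop, scores))
        children.append(mutate(child))
    pop = children

selected = decode(best)
print("selected individual variables:", np.flatnonzero(selected[:P1]))
print("selected interval starts:", np.sort(best[1]))
print("best cross-validated R² per generation:", np.round(history, 3))
```

Because the best chromosome is carried over unchanged and the folds are deterministic, the best fitness cannot decrease from one generation to the next. Running several independent GA populations and comparing their final selections, as done for the 10 final populations on the poster, would require repeating this loop with different seeds.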
The Results
Fig 2: Characteristics of the final GA populations. Selected variables are indicated by black points.
The 10 final populations are quite close, indicating a global convergence of the GA.
Individual populations seem to have converged.
Table 1: Comparison with the results of other methods (number of selected variables, number of retained PLS components, R² and cross-validation R²).
Method       # var   # comp   R²       R²CV
PLS          7926    7        0.7836   0.6605
PLS + VIP    4       3        0.7210   0.6650
selGAmPLS    311     12       0.9936   0.8273
Selected variables
Physicochemical parameters: 4 RVA parameters (Peak Viscosity, Holding Strength, Breakdown, Relative Breakdown)
Spectroscopic variables: 4 spectral regions
Crystalline order of starch
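Table 1 reports both R² and cross-validation R². As a reminder of how the two differ, here is a minimal sketch on synthetic data. The fold scheme (5 interleaved folds) and the regression used are assumptions: an ordinary least-squares fit stands in for PLS1 to keep the example short, and the names `r2`, `r2_cv` and `fit_predict_ols` are hypothetical.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict_ols(X_train, y_train, X_test):
    """Stand-in regression (least squares with intercept), not PLS1."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r2_cv(X, y, fit_predict, n_folds=5):
    """R² computed from predictions made on held-out folds only."""
    folds = np.arange(len(y)) % n_folds
    y_hat = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        y_hat[test] = fit_predict(X[~test], y[~test], X[test])
    return r2(y, y_hat)

# Synthetic data with arbitrary sizes, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
y = X[:, 0] - 0.5 * X[:, 3] + 0.3 * rng.normal(size=50)

r2_fit = r2(y, fit_predict_ols(X, y, X))     # fitted on all samples
r2_xval = r2_cv(X, y, fit_predict_ols)       # each sample predicted when held out
print(f"R2 = {r2_fit:.4f}, R2CV = {r2_xval:.4f}")
```

The gap between the two values indicates how much of the fit depends on the samples used for calibration, which is why Table 1 lists R²CV alongside R².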
Conclusion
Genetic Algorithms provide a very adaptable and efficient solution when dealing with both several kinds of variable selection (individual variables vs. intervals) and multiway tables.
The results obtained are very promising for predictive use. In terms of interpretation, the method highlighted the importance of some physicochemical variables and selected a small number of short intervals in the spectroscopic data.
The selected variables are related to the water absorptivity and the crystalline state of starch, and play a key role in breadmaking ability.