HAL Id: hal-02470765
https://hal-cnam.archives-ouvertes.fr/hal-02470765
Submitted on 9 Feb 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Clusterwise analysis for multiblock component methods
Stéphanie Bougeard, Hervé Abdi, Gilbert Saporta, Ndèye Niang
To cite this version:
Stéphanie Bougeard, Hervé Abdi, Gilbert Saporta, Ndèye Niang. Clusterwise analysis for multiblock
component methods. Advances in Data Analysis and Classification, Springer Verlag, 2018, 12 (2),
pp.285-313. ⟨10.1007/s11634-017-0296-8⟩. ⟨hal-02470765⟩
Stéphanie Bougeard 1 · Hervé Abdi 2 · Gilbert Saporta 3 · Ndèye Niang 3
Received: 20 October 2016 / Revised: 10 October 2017 / Accepted: 16 October 2017
© Springer-Verlag GmbH Germany 2017
Abstract Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In the following, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables that is organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression in order to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters that have their own sets of regression coefficients. This combination of clustering and regres-
Stéphanie Bougeard [email protected]
Hervé Abdi [email protected]
Gilbert Saporta [email protected]
Ndèye Niang [email protected]
1 Department of Epidemiology, Anses (French agency for food, environmental and occupational health safety), 22440 Ploufragan, France
2