Modelisation and Information System Tools to Support the Discovery of Interactive Factors of Vulnerabilities in Life Courses


Thesis

Reference

Modelisation and Information System Tools to Support the Discovery of Interactive Factors of Vulnerabilities in Life Courses

ROUSSEAUX, Emmanuel

Abstract

In the past decades, the life course perspective and the vulnerability framework have grown in popularity for studying how risks spread across people's lives. Such studies involve complex longitudinal and network data as well as specific analysis methods. This thesis aims to help social scientists manage and analyze such data. To support the provided methodological contributions, the thesis starts by setting out a conceptual model of the diffusion of vulnerability along the life course. Then, the thesis develops several complementary strategies for exploring the set of variables describing vulnerability, with the aim of identifying interaction effects, such as when the effect of gender depends on age. The strategies rely on classification trees and focus specifically on unexpected interaction effects and data imbalance. In an illustrative application focusing on vulnerability to poverty, the proposed methods successfully identified an unexpected interaction effect between ego's and father's educational resources on ego's unemployment. Several of the contributions are made available to the scientific community [...]

ROUSSEAUX, Emmanuel. Modelisation and Information System Tools to Support the Discovery of Interactive Factors of Vulnerabilities in Life Courses. Doctoral thesis: Univ. Genève, 2018, no. SdS 107

DOI : 10.13097/archive-ouverte/unige:120604 URN : urn:nbn:ch:unige-1206042

Available at:

http://archive-ouverte.unige.ch/unige:120604

Disclaimer: layout of this document may differ from the published version.


Modelisation and Information System Tools to Support the Discovery of Interactive Factors of Vulnerabilities in Life Courses

THESIS

presented to the Faculty of Social Sciences of the University of Geneva

by

Emmanuel Rousseaux

under the supervision of

Prof. Giovanna Di Marzo

and

Prof. Gilbert Ritschard

for the degree of

Doctor of Social Sciences, specialization in information systems

Thesis jury members: Prof. Giovanna Di Marzo; Prof. Michel Oris, president of the jury

Prof. Gilbert Ritschard

Mr. Boris Wernli, FORS, Université de Lausanne, Switzerland

Mr. Djamel Zighed, Agence Universitaire de la Francophonie, Quebec, Canada & Paris, France; Université de Lyon 2, France

Thesis No. 107, Geneva, 12 December 2018

which are stated therein and for which the author alone is responsible.

Geneva, 30 January 2019.

The Dean

Bernard Debarbieux

Printed from the author's manuscript.


Abstract

The present Ph.D. thesis in information systems aims to provide social science research with a conceptual model and the associated analytical tools to support the discovery of interactive factors of vulnerability in life courses. The past decades have seen the emergence of several social changes, such as the individualization of society, the growth of inequalities due to social stratification, and the labor market's increased demand for flexibility and personal engagement. These social changes place individuals at risk of experiencing undesired situations such as social exclusion, poverty, or stress. The life course perspective and the vulnerability framework have recently grown in popularity for studying how such risks spread across people's lives. The life course perspective provides a conceptual structure for considering how past events in an individual's life history explain and influence future life. The vulnerability framework provides a conceptual structure for studying, as a dynamic process, how a system is exposed to perturbations and recovers from them. An effective strategy to overcome social vulnerability is to set up social policies targeting vulnerable people. To do so, a preliminary step is to understand which factors are connected with vulnerability outcomes. The thesis addresses this issue by focusing on the discovery of underlying factors of vulnerability in life courses. In this context, the term underlying factor refers to an independent variable that is not available in the data as a single variable. Instead, social science researchers have to build underlying factors by combining several available variables.

This operation often involves interaction effects between variables.

When addressing an information system issue, a first step is to take a holistic view of the activity the system is designed to support. Therefore, my first research question addresses the issue of integrating the life course and vulnerability frameworks in order to put forward, under some simplifying assumptions, a conceptual model of the process of vulnerability in life courses. Such a model is primarily expected to clarify what studying vulnerability means in practice, for example the difference between studying poverty and studying vulnerability to poverty. Such a model would then serve as a basis and a shared vocabulary for information systems researchers interested in proposing strategies to address vulnerability in life courses. My second research question focuses on the use of classification tree methods for identifying interaction effects. Because I postulate that, in modern societies, vulnerability situations lead to life situations that are infrequent compared to non-vulnerable situations, I focus on identifying interaction effects in a context of class imbalance.


After reviewing the literature in both the life course and vulnerability research fields, I answer the first research question by providing, under some simplifying assumptions, a conceptual model of the diffusion of vulnerability in life courses as a dynamic process. This approach uses the term vulnerability factor to bridge the life course and vulnerability frameworks. In this model, an underlying factor of vulnerability is a set of two or more individual or environmental resources for which there exists an interaction between their possible states that, possibly combined with an interaction effect with one or more stressors, involves a change in one or more vulnerability components: exposure to stressors, sensitivity to stressors, or resilience capacity. I assess this model by confronting it with the other frameworks and models of vulnerability in life courses reviewed earlier. The results show that the proposed model largely covers all the other approaches.
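The definition of an underlying factor above has a checkable structure: at least two interacting resources, optional stressors, and at least one affected vulnerability component. The following minimal sketch encodes that structure as a Python data structure that rejects instances violating the definition. The class, field, and enum names are hypothetical and are not taken from the thesis software.

```python
from dataclasses import dataclass
from enum import Enum


class Component(Enum):
    """The three vulnerability components named in the model."""
    EXPOSURE = "exposure to stressors"
    SENSITIVITY = "sensitivity to stressors"
    RESILIENCE = "resilience capacity"


@dataclass(frozen=True)
class UnderlyingFactor:
    """Hypothetical encoding of an underlying factor of vulnerability.

    resources:  two or more individual or environmental resources whose
                states interact.
    components: vulnerability components changed by the factor (at least one).
    stressors:  optional stressors the resource interaction may interact with.
    """
    resources: frozenset[str]
    components: frozenset[Component]
    stressors: frozenset[str] = frozenset()

    def __post_init__(self):
        # The definition requires an interaction between at least two resources.
        if len(self.resources) < 2:
            raise ValueError("an underlying factor needs two or more resources")
        # It must also change at least one vulnerability component.
        if not self.components:
            raise ValueError("an underlying factor must affect at least one component")
```

For instance, `UnderlyingFactor(frozenset({"ego_education", "father_education"}), frozenset({Component.EXPOSURE}))` is accepted. The choice of component in this example is purely illustrative and is not a result of the thesis.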

After reviewing the literature on growing classification trees with imbalanced data, I answer the second research question by providing two complementary methodologies for exploring the attribute space and identifying underlying interaction effects with imbalanced data. The first methodology focuses on the exploratory process, while the second focuses on the tree growing process. The first methodology uses a preliminary bivariate analysis step to distinguish strong from weak associations with the outcome variable. It then explores those associations by dynamically tuning the classification tree growing parameters through an interactive graphical interface.
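The preliminary bivariate step can be illustrated with a short screening routine. The association measure is not specified here, so this sketch assumes Cramér's V computed from the chi-squared statistic, with a hypothetical cut-off of 0.1 separating "strong" from "weak" associations with a categorical outcome. Neither the measure nor the threshold should be read as the thesis's actual settings.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    row, col = Counter(x), Counter(y)
    chi2 = 0.0
    # Pearson chi-squared statistic over all cells of the contingency table.
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return sqrt(chi2 / (n * k))


def screen(predictors, outcome, threshold=0.1):
    """Split predictor names into strong and weak associations with the outcome.

    predictors: dict mapping a variable name to its list of values.
    threshold:  hypothetical cut-off on Cramér's V.
    """
    strong, weak = [], []
    for name, values in predictors.items():
        target = strong if cramers_v(values, outcome) >= threshold else weak
        target.append(name)
    return strong, weak
```

Consider `outcome = [1, 1, 0, 0, 0, 0, 0, 0]`. A predictor identical to the outcome gets V = 1 and is classed as strong. A predictor equal to `[1, 0, 1, 0, 1, 0, 1, 0]` gets V = 0 and is classed as weak. A weak marginal association does not rule out a role in an interaction effect, so this sketch returns the weak group instead of discarding it.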

The second methodology aims at better separating vulnerability outcomes in an imbalanced data context. To do so, I put forward three classification tree growing measures designed for this purpose. I assess the first methodology by exploring underlying factors of vulnerability on real data.

The study focuses on vulnerability to poverty by targeting underlying factors of unemployment among young adult second-generation immigrants in Switzerland.

The methods successfully identified an unexpected interaction effect between ego's and father's educational resources on ego's unemployment. This result has been published in Guarin and Rousseaux (2017). I assess the second methodology by comparing the performance of the proposed tree growing measures with the main strategies for imbalanced data discussed in the literature review. To this end, I conduct experiments on datasets from the “UCI Machine Learning Repository” (Dheeru and Karra Taniskidou, 2017). The results tend to indicate that my contributions achieve the same performance as the other strategies while building shallower trees. Social scientists value this feature because, when using classification trees, they are more interested in the interpretability of results than in raw prediction performance.

In a quantitative research process, the data exploration stage comprises three steps: data understanding, data preparation, and exploratory analysis. Both research questions focus on the exploratory analysis step. To address the data exploration stage as a whole, I investigate what improvements can be put forward regarding data understanding and data preparation. I carry out this investigation by immersing myself in the role of a social scientist and working as a coauthor on three real studies, in health sociology (Cullati et al., 2014), labor sociology (Guarin and Rousseaux, 2017), and family sociology (work still in progress) respectively. During these immersions, I provided several methodological strategies that could improve access to data documentation within statistical software and help researchers pay better attention to data representativeness. These methodological strategies were not formally assessed but are available for testing purposes in the software this thesis releases.

This Ph.D. thesis brings several contributions. First, the thesis provides the human vulnerability research area with a conceptual model of the diffusion of vulnerability in life courses as a dynamic process. Second, the thesis provides social science researchers with two complementary methodologies that facilitate the exploration of interaction effects in a context of class imbalance and aim to provide them with sets of relevant underlying factors in a semi-automatic way. Combined, the conceptual and methodological contributions represent my response to the issue of supporting social science researchers in discovering vulnerability factors in the life course. In addition to these contributions, the thesis provides software contributions. The package Rsocialdata provides social science researchers with tools for handling cross-sectional survey data in R and provides generic data structures that can be extended to more complex survey data types. As a key feature, it allows users to store, document, and prepare survey data.

It provides tools for effectively exploring and recoding data, allowing researchers to proceed more quickly to analyses. The package Rsocialdata.panel extends the package Rsocialdata to provide ad-hoc tools for effectively preparing panel survey data. The Rsocialdata.network package also extends the Rsocialdata package to provide ad-hoc tools for effectively preparing egocentric network survey data. The package Trim provides an implementation of both the main off-centered entropy measures and the measures we introduced as new methodological contributions.
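Off-centered entropy measures move the point of maximal uncertainty away from the uniform distribution, which matters when one class is rare. This sketch assumes one common two-class formulation from the off-centered entropy literature. In that formulation, the minority-class proportion p is mapped piecewise-linearly so that a chosen reference proportion θ lands at 1/2, and Shannon entropy is then applied. The code is an illustration only. It does not use the Trim package's API and does not necessarily reproduce the exact measures Trim implements.

```python
from math import log2


def off_centered_entropy(p, theta):
    """Two-class off-centered Shannon entropy, maximal (1 bit) at p == theta.

    p:     proportion of the minority (positive) class in a tree node.
    theta: reference proportion where uncertainty should peak, for example
           the class frequency in the whole sample (an assumed choice).
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    # Piecewise-linear map: [0, theta] -> [0, 1/2] and [theta, 1] -> [1/2, 1].
    if p <= theta:
        pi = p / (2 * theta)
    else:
        pi = (p + 1 - 2 * theta) / (2 * (1 - theta))
    # Guard against floating-point round-off pushing pi slightly outside [0, 1].
    pi = min(max(pi, 0.0), 1.0)
    if pi in (0.0, 1.0):
        return 0.0
    # Ordinary binary Shannon entropy of the transformed proportion.
    return -(pi * log2(pi) + (1 - pi) * log2(1 - pi))
```

With θ = 0.1, a node containing 10% minority cases scores the maximal 1 bit, whereas ordinary Shannon entropy would give about 0.47 bits. A split that raises the minority share above the sample frequency is therefore credited with a larger reduction in uncertainty. With θ = 0.5, the measure reduces to ordinary Shannon entropy.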


Résumé

This doctoral thesis in information systems aims to provide social science research with a conceptual model and the associated analytical tools to support the discovery of interactions between vulnerability factors in life course studies. The past decades have seen the emergence of several social changes, such as the individualization of society, the growth of inequalities due to social stratification, and the labor market's increased demand for flexibility and personal engagement. These social changes expose individuals to undesired situations such as social isolation, poverty, or stress. The life course approach and the vulnerability framework have recently gained popularity for studying how these risks spread through individuals' lives. The life course approach provides a conceptual structure for examining how past events in an individual's life may explain and influence future life. The vulnerability framework provides a conceptual structure for studying, as a dynamic process, how a system is exposed to disturbances and adapts to them. An effective strategy to overcome social vulnerability is to implement social policies targeting vulnerable people. To do so, a preliminary step is to understand which factors are linked to manifestations of vulnerability.

This thesis addresses this problem by focusing on the discovery of underlying factors of vulnerability in life courses. In this context, the term underlying factor refers to an independent variable that is not available in the data as a single variable. Instead, social science researchers must build underlying factors by combining several variables available in the data. This operation often involves interaction effects between variables.

When addressing an information system problem, a first step is to model the activity the system must support. Thus, my first research question addresses the joint integration of the life course and vulnerability conceptual frameworks in order to propose, under some simplifying assumptions, a conceptual model of the vulnerability process in life courses. Such a model could serve as a common basis for information systems researchers interested in proposing strategies to address vulnerability in life courses. My second research question concerns the use of decision tree methods to explore interaction effects. Postulating that, in modern societies, vulnerability situations are less frequent than non-vulnerability situations, I focus on exploring interaction effects in a context of imbalanced two-class data.

After reviewing the literature on the life course approach and on vulnerability, I answer the first research question by providing, under some simplifying assumptions, a conceptual model of the diffusion of vulnerability in life courses as a dynamic process. This approach uses the term vulnerability factor to bridge the life course and vulnerability frameworks. In this model, an underlying vulnerability factor is a set of two or more individual or environmental resources for which there exists an interaction between their possible states that, possibly combined with an interaction effect with one or more stressors, involves a change in one or more vulnerability components: exposure to stressors, sensitivity to stressors, or resilience capacity. I assess this model by confronting it with the other frameworks and models of vulnerability in life courses discussed in the literature. The results show that the proposed model largely covers all the other proposed approaches.

After reviewing the literature on decision tree methods in an imbalanced data context, I answer the second research question by providing two complementary methodologies for exploring interaction effects in this context. The first methodology focuses on the exploratory process, while the second focuses on the tree growing process. The first method uses a preliminary bivariate analysis step to separate strong from weak associations with the dependent variable. It then explores these associations by dynamically tuning the decision tree growing parameters through an interactive graphical interface. The second methodology aims at better predicting vulnerability outcomes in an imbalanced data context. To do so, I propose three decision tree growing measures. I assess the first methodology by exploring underlying vulnerability factors on real data. The study focuses on vulnerability to poverty by targeting the underlying factors of unemployment in a population of young adult second-generation immigrants in Switzerland. The proposed methodology made it possible to extract an underlying vulnerability factor related to the educational resources of the individual and of their father. This result was published in Guarin and Rousseaux (2017). I assess the second methodology by comparing the performance of the proposed tree growing measures with the main strategies discussed in the literature review. To this end, I carry out experiments on several datasets from the “UCI Machine Learning Repository” (Dheeru and Karra Taniskidou, 2017). The results tend to indicate that my contributions can achieve the same performance as the other strategies while building shallower trees.

Social science researchers favor such a feature, since they are more interested in the interpretability of results than in raw performance when it comes to decision trees.

In a quantitative research process, the data exploration stage consists of three steps: data understanding, data preparation, and exploratory analysis. Both of my research questions focus on the exploratory analysis step. To address the data exploration stage as a whole, I investigated what improvements could be made regarding data understanding and data preparation.

To do so, I immersed myself in the role of a social science researcher by working as a coauthor on three real studies, in health sociology (Cullati et al., 2014), labor sociology (Guarin and Rousseaux, 2017), and family sociology (study still in progress) respectively. During these immersions, I provided several methodological strategies that could improve access to data documentation within statistical software and help researchers pay better attention to data representativeness. These methodological strategies were not formally assessed but are available for testing purposes in the software developed during this thesis.

This doctoral thesis brings several contributions. First, the thesis provides the human vulnerability research field with a conceptual model of the diffusion of vulnerability in life courses as a dynamic process. Second, the thesis provides social science researchers with two complementary methodologies. These methodologies facilitate the exploration of interaction effects in a context of imbalanced two-class data and the semi-automatic detection of sets of potential underlying vulnerability factors.

Together, these conceptual and methodological contributions form my response to the issue of supporting social science researchers in discovering vulnerability factors in life courses.

In addition to these contributions, this thesis also provides software contributions.

The Rsocialdata library provides social science researchers with tools for managing cross-sectional survey data in R and provides generic data structures that can be extended to more complex survey data types. It allows users to store, document, and prepare survey data. It also provides tools for effectively exploring and recoding data, which allows researchers to move on to analyses more quickly.

The Rsocialdata.panel library extends the Rsocialdata library to provide specific tools for effectively preparing panel survey data.

The Rsocialdata.network package also extends the Rsocialdata library to provide specific tools for effectively preparing egocentric network survey data. The Trim library provides an implementation of the main off-centered entropy measures as well as the measures introduced as new methodological contributions.


Acknowledgements

First of all, I would like to thank Gilbert Ritschard for giving me the opportunity to start this thesis and for co-supervising it. Gilbert, thank you for the guidance you provided throughout this project and for your unwavering and kind support. I also thank you for your meticulous and insightful proofreading, which often forced me to question my work and thereby allowed me to steer it in the right directions. Moreover, working alongside you, I came to appreciate the full meaning of rigor and precision, and I thank you for that.

I would also like to thank Giovanna Di Marzo for co-supervising this thesis. Giovanna, thank you for following my work throughout this project and particularly for your help in structuring it. I also thank you for patiently sharing your vision of information systems with me and for guiding me toward an understanding of what is meant by, and consequently what is expected from, a thesis in information systems.

I also extend my thanks to Michel Oris, who did me the honor of presiding over my thesis jury. Michel, thank you for your constructive criticism of my conceptual model, and in particular of my proposal for modeling vulnerability within life courses. I would also like to thank you for the various interdisciplinary environments you actively helped establish at the University of Geneva and in French-speaking Switzerland. The work I present here clearly grew in the fertile soil of this ecosystem.

My thanks also go to Djamel Zighed and Boris Wernli, who did me the honor of serving on the jury of this thesis. Djamel, through your master's course on classification methods, and in particular tree methods, you significantly inspired this thesis project, and I thank you for it. I also thank you for all the exchanges we had, through which you shared with me your broad knowledge of the various modeling and knowledge extraction methodologies. Boris, I sincerely thank you for your meticulous reading of my work and the enthusiasm you brought to it. I also thank you for your many wise pieces of advice and suggestions, which allowed me to clarify my argument and refine the scope of several of my proposals.

In addition, I thank the doctoral students and researchers of the NCCR LIVES and of the SES, SdS, and GSEM faculties of the University of Geneva for all the exchanges, the discussions, the questions asked during my presentations, and the enthusiasm they showed, which motivated me to move forward. In particular, I thank Stéphane Cullati, Andrés Guarin, and Myriam Girardin for the time they devoted to me and for the rich discussions we had, which actively contributed to shaping my research project.

I cannot conclude without warmly thanking my family and loved ones, who supported and encouraged me in and throughout this project. I especially thank my parents, Jean-Louis and Francette, and my brother, Jean-François, for nurturing my taste for learning and for making me want to keep deepening my knowledge. I also deeply thank my partner, Delphine, for accompanying and supporting me in recent years and for providing me with an environment conducive to completing this thesis.

This thesis has been an exceptional and extremely enriching experience.

To all those who made it possible, I extend my most sincere thanks.

This publication benefited from the support of the Swiss National Centre of Competence in Research LIVES, “Overcoming vulnerability: Life course perspectives” (IP214), financed by the Swiss National Science Foundation. The author is grateful to the Swiss National Science Foundation for its financial assistance.

To my parents and my brother.

Chapter 1

Introduction

1.1 Context

1.2 Research framework

1.2.1 Research issue

1.2.2 Research questions

1.2.2.1 Expliciting a model of vulnerability in life courses

1.2.2.2 Identifying interaction effects with infrequent outcomes

1.2.3 Immersions as a social scientist

1.3 Contributions

1.3.1 Conceptual model of the vulnerability in life courses

1.3.2 Methodological contribution

1.3.3 Software contribution


1.1 Context

Individual life is a dynamic process affected by a multitude of events. Some life events may have positive consequences for individuals, while others may have negative consequences. Life events also differ in terms of their impact on individual lives. Buying a new sofa or celebrating New Year's Eve are life events expected to have minor impacts on an individual's life path. By contrast, moving to a new country, becoming a parent, or falling into long-term unemployment are life events expected to have significant impacts on it.

Depending on an individual's resources, such as economic resources, social capital, physical health, psychological resources, or social support, the same life event will not have the same consequences. For example, a short-term loss of employment may be easier to face for a married couple with no children than for a single woman with two dependent children. Resources themselves shape the kinds of events individuals are likely to experience by moderating risks and opportunities. For example, one of the findings of the present thesis is that, depending on the individual's own educational level, the father's educational level may moderate the individual's ability to find employment in early adulthood (Guarin and Rousseaux, 2017).¹

In addition, the timing of an event has a significant influence: depending on when it occurs in the life path, a life event will not trigger the same consequences. For example, consider the impact of a leg fracture caused by a domestic accident.

It can be assumed that this event will not have significant consequences for young people, since a few weeks of rest are often sufficient to recover. For elderly people, however, recovering from such an accident is often harder. In the worst cases, this domestic accident can precipitate the individual's transition to a loss of independence.

The past four decades have been characterized as a period of growing uncertainty (Beck, 1992) in which new social risks emerged. These include (1) family discontinuities and the labor market's increased demand for flexibility and personal engagement, (2) the individualization of society, which places individuals under high and continuous pressure to make the right choices for their own lives, (3) the diffusion of stress across life domains and between related individuals in a context of contingent work life courses, (4) the growth of inequalities related to social stratification, such as the “working poor”, and (5) risks that disproportionately affect specific subpopulations such as young adults or female-headed households (Spini et al., 2013). According to these authors, vulnerability in these “risk” or “uncertain” societies is a growing concern for individuals, political leaders, and academics.

The Swiss National Centre of Competence in Research (NCCR) LIVES “Overcoming vulnerabilities: Life Course Perspectives” began in 2011. The NCCR LIVES aims to better understand the phenomenon of vulnerability using a longitudinal and comparative approach. It also focuses on the means to overcome vulnerability so as to contribute to the emergence of innovative social policy measures.

¹ This result has been shown on a population of second-generation immigrants in Switzerland. The term second-generation migrants refers to children of immigrants who were educated and socialized in their parents' host country. Please refer to Guarin and Rousseaux (2017) for detailed information.

This Ph.D. thesis in information systems is conducted within the NCCR LIVES. The NCCR LIVES is divided into several thematic teams, and I worked within the methodological individual project (IP) 14/214. The primary objective of this methodological IP is to provide social science researchers with effective tools for measuring vulnerability and for exploring, visualizing, and analyzing life course data. This IP is titled “Measuring vulnerability” and is led by Gilbert Ritschard.

Within this team, my thesis project investigates methodological strategies that could support social science researchers in discovering interactive factors of vulnerability in life courses. Having a background in both information systems and quantitative methods, my strategy was to draw on both approaches in this study. I also conceived this work from an interdisciplinary perspective deeply anchored in the human sciences. In this sense, the present work takes an enterprise architecture approach by integrating business strategic objectives and business processes into the modeling of the information system. More specifically, I use the five-layer information system model of Longépé (2009), described in Table 1.1.

1.2 Research framework

1.2.1 Research issue

The thesis focuses on the factors connected with a particular vulnerability, which I call here factors of vulnerability². Taken literally, the term factor of vulnerability refers to an observable situation (factor) that (a) increases or decreases the likelihood of experiencing a particular vulnerability or (b) changes the level of vulnerability of an already vulnerable individual. For example, the factor “type of employment contract” is a potential factor of vulnerability to unemployment, as one of its possible values, “fixed-term contract”, may increase the probability of experiencing unemployment.

As Spini et al. (2013) note, vulnerability pertains to the interaction of individual and contextual dimensions. For example, having a fixed-term contract in times of full employment does not per se make individuals vulnerable to unemployment. Therefore, a better factor of vulnerability would result from the interaction between the individual resource “type of employment contract” and the contextual resource “unemployment rate of the area”.
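The contract example can be made concrete with an interaction contrast on unemployment rates. The counts below are invented for illustration and are not data from the thesis or from Spini et al. (2013). The additive risk-difference scale is also an assumed choice; a logistic model would instead measure the interaction on the odds scale.

```python
# Hypothetical (unemployed, total) counts per (contract type, area) cell.
CELLS = {
    ("fixed-term", "low-unemployment area"): (4, 100),
    ("permanent", "low-unemployment area"): (3, 100),
    ("fixed-term", "high-unemployment area"): (20, 100),
    ("permanent", "high-unemployment area"): (6, 100),
}


def rate(cells, contract, area):
    """Unemployment rate observed in one (contract, area) cell."""
    unemployed, total = cells[(contract, area)]
    return unemployed / total


def contract_effect(cells, area):
    """Risk difference of fixed-term vs permanent contracts within one area type."""
    return rate(cells, "fixed-term", area) - rate(cells, "permanent", area)


def interaction_contrast(cells):
    """How much the contract effect changes between the two area contexts."""
    return (contract_effect(cells, "high-unemployment area")
            - contract_effect(cells, "low-unemployment area"))


if __name__ == "__main__":
    print(round(interaction_contrast(CELLS), 2))
```

With these made-up counts, a fixed-term contract adds 1 percentage point to the unemployment rate in low-unemployment areas but 14 points in high-unemployment areas. The interaction contrast is therefore 13 points. The effect of the contract depends on the context, which is the kind of pattern no single variable captures on its own.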

In addition, as Adger (2006) notes, social processes are complex, with many linkages that are difficult to pin down. Therefore, I postulate that most of the relevant factors of vulnerability lie in the interaction of several variables.

But factors of vulnerability resulting from interaction effects are more difficult for researchers in social sciences to identify, as they are not directly available as a single variable in the data. This is especially the case when factors of vulnerability result from higher-order interactions, such as those involving more than three variables. In addition, interaction effects may be buried under the main effects of some strong covariates. For these reasons, I refer to such factors as underlying factors of vulnerability.

² I formalize the definition of a factor of vulnerability in Section 3.1.1.

Table 1.1 – Information system five-layer model of Longépé (2009).

Layer Description

Strategic

The strategic layer defines the business objectives of the company. The business objectives concern both external objectives, such as new services to be provided or performance objectives to be achieved, and internal objectives, such as organizational changes or reducing operating costs.

Process

The process layer defines the different activities required to achieve the business objectives defined in the strategic layer, as well as the respective functions and skills needed to complete each activity. The activities are organized and orchestrated together by identifying both their sequence and sequencing conditions.

Functional

The functional layer defines the hierarchical structure of the different functions performing the activities defined in the process layer. For example, the responsibilities of the financial function include invoice and payment management, accounting management, and budget planning. This general function is broken down into more specific sub-functions related to each of the activities to be executed.

Applicative

The applicative layer defines the operations that have been automated by software units. Software units include the applications and services used by the functions of the company as well as the libraries that allow applications to work. The dependency relationships between software components, as well as data structures and data flows, are modeled.

Infrastructure

The infrastructure layer defines the set of hardware resources required for the proper execution of the software units. The hardware resources include data storage components, network components, and physical servers. The physical storage locations of each element are referenced, and the connections between components are modeled. Each software unit is linked to the hardware it requires.

The thesis addresses the issue of how researchers in social sciences can discover the underlying factors associated with vulnerability.

As a starting point, it is necessary to ask where this discovery stage takes place in the quantitative research process followed by researchers in social sciences. From a general point of view, all scientific research is an iterative process of observation, rationalization, and validation (Bhattacherjee, 2012). The observation phase consists of observing a natural or social phenomenon, event, or behavior that deserves consideration. The rationalization phase consists of logically connecting the different pieces of the observed puzzle and integrating them into an existing theory with the purpose of building a new one that includes additional hypotheses.

Finally, the validation phase consists of testing the new theory with a scientific method through a process of data analysis, and in doing so, possibly validating the new theory. The first two steps, observation and rationalization, call for both inductive and deductive reasoning. Inductive reasoning takes place when starting from some observations and then attempting to rationalize them. Deductive reasoning takes place when starting from an ex-ante rationalization or theory and attempting to integrate new hypotheses based on the observations. By conducting both stages in parallel or iteratively, the researcher ends up with a conceptual model. To validate the model empirically, data are required. To acquire the data needed for this stage, either new data are collected or previously collected data are retrieved. Especially when the study focuses on a large sample of individuals, such as a national or international population, as in demography and sociology, researchers opt for data collected by a professional or national survey institution. In this case, a first stage for the researcher is to understand the data in order to prepare them correctly (Wirth and Hipp, 2000). Once the data are prepared in the statistical software, one could go directly to confirmatory analyses. But in the context of looking for relevant factors of vulnerability, it is much more appropriate to go through an exploratory analysis. Here, exploratory analysis must not be confused with descriptive analysis. While descriptive statistics is a passive stage limited to extracting some indicators in order to gain a first understanding of the data or to compare them with other data, exploratory analysis is an active stage involving a series of predictive analyses with a bottom-up strategy to extract the factors that most strongly affect the variables of interest (Kuonen, 2015).

Although often under-used, exploratory analysis is a very important stage: as Tukey (1980) notes, new ideas come more often from previous explorations than from lightning strokes. Thanks to exploratory analysis, the researcher may end up with a refined model that goes further than the initial hypotheses. The confirmatory analysis is then performed on this final model.

This quantitative research process is illustrated in Figure 1.1. The issue of discovering factors of vulnerability clearly belongs to the earlier stages of this process. More precisely, I postulate that, to discover appropriate factors in connection with a particular vulnerability, a researcher in social sciences has to focus on steps A to C and E to G reported in the figure. Steps A to C refer to the observation and rationalization stages. Steps E to G refer to the data exploration stage. The thesis focuses on the data exploration stage, postulating that a more effective data exploration stage would facilitate the discovery of the underlying factors in connection with the vulnerability studied.

As shown by steps E to G in Figure 1.1, this stage involves understanding data, preparing data, and performing exploratory analyses.


Figure 1.1 – An example of a traditional quantitative research process based on Bhattacherjee (2012) and Wirth and Hipp (2000). Stages of concern in the context of discovering factors of vulnerability are reported in red.

The core of the thesis focuses on the exploratory step (G/G’) in the context of the discovery of underlying factors of vulnerability in life courses. I address this issue through two research questions. As information systems are designed to support processes, my first research question focuses on the possibility of formalizing a model of the process of vulnerability in life courses. To facilitate the design of information system strategies, the objective is to provide researchers with an operationalizable model of vulnerability in life courses. I introduce this first research question (Q1) in Section 1.2.2.1. Once such a model is set up, my second research question focuses on supporting researchers in social sciences in exploring variable interactions to identify potential factors of vulnerability. As I postulate that outcomes resulting from vulnerable situations may be infrequent in a general population, I focus on identifying interaction effects with infrequent outcomes. I introduce this second research question (Q2) in Section 1.2.2.2.

But to address the issue globally, I also aim to investigate the data understanding step (E) and the data preparation step (F). The NCCR LIVES brings together a large number of researchers in social sciences. Taking advantage of this opportunity, my strategy is to immerse myself in the role of a social scientist by collaborating on real social science studies and to observe what technical difficulties researchers experience when understanding and preparing data. In this second line of work, my aim is not to assess but to identify and explore some strategies that could be studied in future work. Therefore, these strategies are not discussed in the assessment section of the thesis (Section 6). Instead, I made some of them available to the scientific community by adding them to the software developed for the assessment of my conceptual model (Section 5). I introduce these immersions in Section 1.2.3.

1.2.2 Research questions

1.2.2.1 Explicating a model of vulnerability in life courses

The life course perspective, also known as the life course approach, is a theoretical model that looks at how chronological age, relationships, common life transitions, and social change shape people’s lives from birth to death (Hutchison, 2010). The life course approach especially focuses on individuals’ life histories to explain how early events influence future decisions and events, such as marriage and divorce (White and Klein, 2008, p. 122). In its simplest form, the life course of an individual can be defined as “a sequence of socially defined events and roles that the individual enacts over time” (Giele and Elder, 1998). In this context, examining the life course is about analyzing changes (Hendricks, 2012). In addition, a key point of the life course approach is its focus on the connection between individuals and the historical and socioeconomic context in which they lived (Elder et al., 2003). Indeed, although time and individual characteristics are both significant dimensions of human behavior, the environment in which the person lives also plays a part (Hutchison, 2010).

Everyone wishes to live the happiest life possible. However, life is rarely plain sailing, and most individuals encounter difficulties in their life course. These difficulties often occur when experiencing changes and transitions that are a source of vulnerability (Fisher and Hood, 1987). They can emerge from an unfortunate combination of circumstances related, for example, to an undesired interaction between historical time and one’s personal time, and/or to insufficient support within one’s microsocial environment. Through social policies, societies aim to provide resources both to protect individuals against the onset of negative events or transitions and to help individuals recover once they have fallen into a negative or undesired situation. To set up effective social policies, it is necessary to figure out which individuals are at risk and what kind of help they would need to stay on a happy trajectory.

To understand how to help individuals experiencing difficulties, a first approach is to analyze the negative state itself. For example, regarding poverty, a strategy could be to identify which elements differentiate poor from non-poor people. But to go further, note that some individuals may be at high risk of falling into poverty without poverty being observable yet. For example, a family with four children and a mortgage, whose two parents worked in the same company and are now both unemployed because the company was wound up the previous year, is at risk of financial hardship. They may be able to get by if they find a job quickly enough, but if the situation persists, their financial situation is likely to deteriorate. In such a configuration, the family may need assistance to overcome this difficult situation. This means that studying the observable state of poverty is not sufficient; the risk of falling into this state also has to be addressed.


In addition, when facing adversity, some individuals are able to find solutions for getting out of the undesired situation they fell into, while others remain stuck in this situation and still others fall further. This variability can be related to the different attitudes and behaviors that individuals adopt and to the individual resources they effectively mobilize. In the previous example, if both parents have a strong professional social network, they may be able to find employment again relatively quickly. The capacity for resilience to a negative situation should also be studied to understand who will be the most vulnerable, so that social policies can target them first.

Such a global consideration suggests that experiencing a negative life course event or transition should be seen as a dynamic process. Moser (1998) distinguishes poverty, a static concept, from vulnerability to poverty, which should be able to capture change processes. To capture and formalize this dynamic process, the theoretical framework of vulnerability has increasingly been used in various fields of social science research in recent years.

However, although some basic concepts are common to most conceptualizations of vulnerability, there is currently no consensus on a formal definition of the concept of vulnerability in life courses (Schröder-Butterfill and Marianti, 2006; Spini et al., 2013). In particular, the life course perspective and the framework of vulnerability each have their own set of concepts. On the one hand, the life course perspective refers to concepts such as roles, life events, transitions, trajectories, and life courses. On the other hand, the framework of vulnerability refers to concepts such as stressors, outcomes, resources, exposure, sensitivity, and resilience. Our first research question is to study the possibility of integrating both terminologies and to put forward a conceptual model of the process of vulnerability in life courses. Adopting an operational perspective, such a model is intended to clarify the practical considerations involved in studying vulnerability and to support researchers interested in proposing strategies to address vulnerability in life courses. However, integrating all the concepts and processes that vulnerability in the life course implies into a single, holistic system is an impossible exercise. Some simplifying assumptions have to be made to limit the scope of the study. In the present work, I limit the scope of the study to the following framework:

1. Vulnerability is defined with regard to a specific undesired life state (subsequently called outcome) instead of simultaneously considering the set of all possible undesired life states that could be associated with the study of human vulnerability in a broad sense. Therefore, the proposed model does not provide a holistic model of vulnerability in the life course. Three reasons lead me to consider vulnerability to a specific outcome: (a) I consider that providing a holistic model of vulnerability in the life course is a very complex task that is beyond the scope of a thesis in information systems; (b) the perspective I adopt with regard to human vulnerability is not so much to acquire a theoretical understanding as to acquire a practical understanding that enables social actors to act. As social actors are usually organized so that each targets a specific outcome (poverty, depression, etc.), it makes sense to adopt such a perspective in the present work; and (c) I consider that one can be vulnerable to some outcome without being vulnerable to another one. For example, one may be vulnerable to depression while not being vulnerable to poverty. Of course, both can be linked; in particular, depression can lead to losing one’s job. But my way of thinking about this connection is to consider that depression (or, formulated as a resource, the state of psychological health) is a factor of vulnerability to poverty.

2. Vulnerability is studied from an operational perspective. In particular, I aim to put forward a model that is relatively simple to instantiate for studying a particular vulnerability, even if such an objective requires some technical simplifications. As a result, the proposed model may be limited in its capacity to account for the various sociological models that address the study of human vulnerability.

3. As the present work targets the vulnerability in the life course, the outcome is assumed to belong to one of the possible states individuals can experience in their life course.

4. The present work considers the environmental dimensions of the life course, including temporal dimensions. Therefore, the present work adopts a linear perspective of vulnerability as proposed by Spini et al. (2013, 2017). Such a perspective contrasts with circular models of vulnerability, such as those used to model stress processes (Almeida, 2005).

5. Since the aim is to identify risk factors underlying those identified by conventional analyses, the present work places an emphasis on identifying interaction effects between two or more explanatory variables (manifest or latent). This research question is detailed in Section 1.2.2.2.

6. Vulnerability is considered from the perspective of risk. However, the risk is here considered from a three-dimensional perspective³. To make a clear distinction with a one-dimensional definition of risk, I use the term factor of vulnerability instead. This term will also allow me to incorporate the notion of interaction effects.

7. Vulnerability outcomes, such as poverty, exclusion, or mental disorder, are expected to be less frequent than other life situations. This assumption is detailed in Section 1.2.2.2.

To address this research question, I conducted a literature review on both the life course perspective and the framework of vulnerability. This literature review is presented in Section 2.1.

1.2.2.2 Identifying interaction effects with infrequent outcomes

In data analysis or statistics, when analyzing the effect of some explanatory factors, also called predictors or independent variables, on a variable of interest, also called dependent variable, response, or predicted variable, the term interaction refers to a situation in which the impact on the dependent variable of one independent variable is different depending on the values of another independent variable. Another possible formulation is that the effect of one independent variable on the dependent variable is not the same at all levels of the other independent variable.
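In a regression setting, this definition can be written out explicitly. The sketch below assumes a generalized linear model with two explanatory variables $x_1$ and $x_2$, coefficients $\beta_0, \dots, \beta_3$, and a link function $g$ (the identity for linear regression, the logit for logistic regression); this notation is introduced here for illustration only.

```latex
g\big(\mathbb{E}[Y \mid x_1, x_2]\big) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3\, x_1 x_2,
\qquad
\frac{\partial\, g\big(\mathbb{E}[Y \mid x_1, x_2]\big)}{\partial x_1} = \beta_1 + \beta_3\, x_2 .
```

The effect of $x_1$ on the transformed expected response is therefore $\beta_1 + \beta_3 x_2$: it depends on the value of $x_2$ whenever $\beta_3 \neq 0$, and reduces to the constant $\beta_1$ when there is no interaction. With a non-identity link, the absence of interaction on the scale of $g$ does not imply its absence on the probability scale.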

³ I formalize these dimensions in Section 3.1.1.


At the beginning of this chapter, I implicitly introduced an interaction effect when discussing the example of the leg fracture caused by a domestic accident.

For young people, it can be assumed that such an event will not have significant consequences for the individual: a few weeks of rest are often sufficient to recover.

But for elderly people, recovering from such an accident is often harder. In the worst cases, this domestic accident can precipitate the individual’s transition to a loss of independence. This means that there is an interaction between the variable “experiencing a leg fracture” and age.
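The leg fracture example can be made concrete with a small computation. The sketch below uses invented counts (they are not survey data) for the outcome “loss of independence” and measures interaction on the additive scale as the difference between the risk differences observed in each age group; both the numbers and the two age categories are hypothetical.

```python
# Invented counts for illustration only (not survey data):
# (age_group, fracture) -> (number losing independence, group size)
COUNTS = {
    ("young", False): (1, 100),
    ("young", True): (2, 100),
    ("elderly", False): (5, 100),
    ("elderly", True): (30, 100),
}


def risk(age_group: str, fracture: bool) -> float:
    """Proportion of the group that experiences the outcome."""
    cases, total = COUNTS[(age_group, fracture)]
    return cases / total


def fracture_effect(age_group: str) -> float:
    """Risk difference associated with a fracture within one age group."""
    return risk(age_group, True) - risk(age_group, False)


def interaction_contrast() -> float:
    """Difference between the two conditional effects (0 means no additive interaction)."""
    return fracture_effect("elderly") - fracture_effect("young")


for group in ("young", "elderly"):
    print(f"{group}: effect of a fracture = {fracture_effect(group):+.2f}")
print(f"interaction contrast = {interaction_contrast():+.2f}")
```

With these invented counts, a fracture raises the risk by 1 percentage point among young people and by 25 points among elderly people, so the interaction contrast is 24 points. On a multiplicative scale (risk ratios or odds ratios), the size of the interaction would generally differ.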

From a more general point of view, there are two reasons for which I expect the study of vulnerability in life courses to involve interaction effects.

Firstly, in the life course perspective, individuals are studied by taking into account various aspects of their lives, including biological, psychological, and social characteristics. In addition, individuals are not studied in isolation but as entities embedded in several contextual environments, including a microsocial environment, a macrosocial environment, a geographical environment, and several dimensions of time, including historical time, individual time, and social time. Such a holistic approach involves the use of a larger number of variables in the analysis. Mechanically, increasing the number of variables increases the number of possible interaction effects.

Secondly, as the title of the NCCR LIVES project “Overcoming vulnerability: life course perspective” suggests, the overall objective of studying vulnerability in life courses is not only to understand vulnerability better but also to find solutions to prevent or reduce it. At the data analysis level, a possible strategy to do this is to look for interaction effects between a previously identified factor of vulnerability and another covariate that neutralize or reduce the outcomes related to that factor of vulnerability.

In addition, it is relevant to note that outcomes resulting from vulnerable situations are possibly infrequent, and sometimes rare, in a general population. Indeed, modern societies are organized around a number of laws and public institutions that aim to protect individuals against a number of life hazards. For instance, labor laws provide individuals with protection against unexpectedly falling into unemployment. National and regional employment offices provide unemployed individuals with financial support and accompany them in the process of finding a new professional position. This organization of society minimizes, to some extent, the risk of falling into undesired life situations. As a hopeful consequence, outcomes resulting from vulnerability states are often experienced by a small proportion of the population. For instance, the unemployment rate of economically active youth living in Switzerland and aged 20 to 24 has been shown to be between 3% and 11% in 2000, depending on the country of origin (Fibbi et al., 2006). The share of the Swiss resident population in private households below the absolute poverty threshold in terms of disposable income (i.e., gross household income minus compulsory expenditure such as social insurance contributions, taxes, basic health insurance premiums, alimony, and other maintenance payments) has been estimated by the Federal Statistical Office to be about 6.6% in 2014 (Swiss Federal Statistical Office, 2016). In the United States, the probability of developing invasive cancer for cancer-free citizens aged 40 to 59 has been estimated by the American Cancer Society to be about 9% in 2012 (Siegel et al., 2012). These examples show that when studying vulnerability in the general population of an industrialized country, we have to expect the outcomes of interest to be underrepresented.

On an analytical level, this underrepresentation entails an imbalanced distribution of the dependent variable⁴. The imbalance among classes of the dependent variable often leads to a poor prediction rate for the minority class. This issue is well known in the literature as the imbalanced data issue. Yet the minority class is, in most cases, the class of interest (the class of vulnerable individuals), and for this reason, a high recall rate is desired for this class.
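The following minimal sketch illustrates why accuracy alone is misleading under class imbalance and why recall on the minority class matters. The sample of 100 individuals with 5 vulnerable cases and both sets of predictions are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of observations whose predicted class equals the true class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of observations of the positive (minority) class predicted as such."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_on_positives:
        return 0.0
    return sum(1 for p in preds_on_positives if p == positive) / len(preds_on_positives)


# Hypothetical sample: 5 vulnerable individuals (1) out of 100.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the majority class.
y_majority = [0] * 100
# A hypothetical model flagging 14 individuals, 4 of whom are truly vulnerable.
y_sensitive = [1] * 4 + [0] * 1 + [1] * 10 + [0] * 85

print(accuracy(y_true, y_majority), recall(y_true, y_majority))
print(accuracy(y_true, y_sensitive), recall(y_true, y_sensitive))
```

The trivial majority-class model reaches 95% accuracy while detecting none of the vulnerable individuals; the second hypothetical model is less accurate (89%) but recovers 4 of the 5 cases (recall 0.8).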

However, it should be noted that a vulnerability outcome is not always associated with underrepresentation in data. In particular, the individualisation of society that has been taking place in recent decades leads individuals to take personal responsibility for challenges and failures more often. In addition, difficulties faced by individuals in their lives have become “democratized”: rather than being differentially distributed between social classes, the emergence of stressors and observable outcomes varies according to periods of life (Leisering and Leibfried, 2001; Spini et al., 2013, 2017; Oris, 2017).

In addition, depending on the nature of the phenomenon studied, socio-economic parameters, and the targeted population, the magnitude of the imbalance could be strongly reduced in particular situations. For example, in past decades, Spanish unemployment rose dramatically. Observed at a reasonable level of 5 percent during the first half of the 1970s, Spanish unemployment successively increased to reach 24 percent during the 1990s (Dolado and Jimeno, 1997). In extreme situations, such as major epidemics, the proportion of people affected by a particular outcome can rise even higher. For example, during the Medieval Black Death, vulnerability to contracting the plague was, unfortunately, almost balanced in the population, as it is reported that up to 50–60 percent of the European population was killed by the disease between 1347 and 1351 (Benedictow, 2004, p. 383; DeWitte, 2014).

Nowadays, comparable rates of, for example, poverty or insecurity occur in specific areas such as impoverished cities like Detroit, Michigan (C. A. Wilson, 1992), or refugee camps like Sangatte, France (Schwenken, 2014).

In this context, I will ensure that the contributions I put forward for exploring variable interactions with infrequent outcomes work in both balanced and imbalanced contexts, in other words, that they are balance-insensitive.

Common methods used to identify interaction effects in classification include, among others, classification tree methods (Kass, 1980; Breiman et al., 1984), regression methods (McCullagh and John A Nelder, 1989; John Ashworth Nelder and Baker, 2004), Bayesian networks (Pearl, 1986), and association rules (Agrawal et al., 1993). In the present work, I focus on classification tree methods. I make this choice on the basis of two criteria: (1) the ability to identify interactions among a large number of variables and (2) the ability to represent the identified interactions in a way that allows practitioners to quickly diagnose their relevance in connection with their theoretical model.

Regression models are probably the most widely used tool in the social sciences and would be the natural choice for a thesis aimed at providing methodological tools for social scientists. Regression methods allow interaction effects to be highlighted, provided that continuous variables are centered to avoid multicollinearity issues (Hayes, 2017). However, in a regression model, interaction effects consume a large number of degrees of freedom. Therefore, to ensure model convergence, it is necessary to limit both the number of interaction effects tested and their order. For this reason, instead of using a model based on hypothesis testing, I recommend exploring the space of predictors with a method that does not rely on statistical tests. Moreover, as this thesis is concerned with situations of vulnerability, the expected target variables correspond to potentially less frequent life situations (poverty, stress, exclusion, etc.). A small number of observations in the class of interest aggravates the model convergence issue and therefore further limits the number of interactions that can be tested simultaneously. This effect is even more pronounced for predictor variables with a large number of categories: when observations are broken down into many classes, it is more difficult to obtain significant results than when classes are appropriately grouped together.

Classification trees provide a solution to this point: by performing a recursive, step-by-step partitioning of the population, they successively test a large number of possible groupings. Moreover, because the splits at the same level of the tree are built independently of each other, interaction effects can emerge.

Bayesian networks are an effective tool for identifying conditional dependencies among the set of descriptive variables and can thereby identify interaction effects. In particular, Bayesian networks can simultaneously take into account expert knowledge, by acting a priori on the structure of the graph, and the empirical evidence contained in the data (Heckerman et al., 1995). Association rules, for their part, identify associations among frequent sets of co-occurring categories according to different measures of interest. By comparing the rules with each other, especially when the search involves both positive and negative rules (Wu et al., 2004), it is possible to identify interaction effects. However, to quickly identify an interaction effect, several types of information have to be made available to the practitioner, including how the class distribution of the target variable changes according to the values taken by the predictor variables. The results must also be presented so as not to overwhelm the practitioner with too much information. Yet Bayesian networks and association rules produce outputs that are often difficult to interpret (Bayat et al., 2009; S. Kotsiantis and Kanellopoulos, 2006). Regression models are also more complex to read in a multinomial context.

⁴ Such an imbalance also affects the observation stage: as there are fewer occasions to observe such vulnerable outcomes, researchers have fewer empirical examples and less evidence with which to figure out which successions of events or transitions led to the outcome.
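To make the degrees-of-freedom argument about regression models concrete, the sketch below counts the coefficients of a model in which every predictor is categorical and dummy-coded (a predictor with L levels contributes L - 1 columns) and all interactions up to a given order are included. The coding scheme and the example with six four-level predictors are assumptions chosen for illustration.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def n_coefficients(levels: Sequence[int], max_order: int) -> int:
    """Number of coefficients of a dummy-coded model with all interactions up to max_order.

    A predictor with L levels contributes L - 1 dummy columns, and an interaction
    among a subset of predictors contributes the product of their (L - 1) values.
    The intercept counts as one coefficient.
    """
    total = 1  # intercept
    for order in range(1, max_order + 1):
        for subset in combinations(levels, order):
            total += prod(level - 1 for level in subset)
    return total


# Hypothetical example: six categorical predictors with four levels each.
levels = [4] * 6
for order in (1, 2, 3):
    print(f"interactions up to order {order}: {n_coefficients(levels, order)} coefficients")
```

With six hypothetical four-level predictors, the model grows from 19 coefficients (main effects only) to 154 with all two-way interactions and 694 with all three-way interactions, before even considering how few observations the minority class may contain.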

In contrast, decision trees can render both the splits and the distribution of the dependent variable within each node, making it easy for practitioners to assess changes at each level of the tree. Such an intuitive graphical representation of the results allows practitioners to easily identify interactions, even of higher orders.

Another concern when working with life course data is the ability to handle temporality. Decision trees are able to handle temporal data. Considering longitudinal data organized in successive waves, there are two common ways of representing data in tabular form: the wide format and the long format. In the wide format, each row refers to a unique individual, and the same variable measured at different times is stored as separate variables. In the long format, each row refers to a unique combination of individual and measurement time, while each variable is stored in a single column. For biographical data, coming for instance from a retrospective survey or life calendar, a long data format is often adopted. With data stored in a wide format, the classification tree treats each variable independently of the others.

Temporal links that exist between variables representing successive measures of the same item are not taken into account when growing the model. However, the tree is able to extract from the whole set of variables the time points that maximize classification quality. But to keep the tree at a reasonable size, the number of variables in play has to be limited. As a result, only the most significant associations emerge, and a number of other relevant associations may remain hidden. When using a classification tree on data stored in a long format, the emphasis is placed on the variables themselves, and the temporal information is used to assess whether time moderates the effect of a variable. Therefore, classification trees are able to handle longitudinal data but cannot take all information about temporality into account.
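The wide and long layouts can be illustrated with a small conversion from wide to long format in plain Python. The column-naming convention `<variable>_w<wave>`, the variable names, and the values are hypothetical; in practice, libraries such as pandas provide `melt` and `wide_to_long` for the same purpose.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical two-wave panel in wide format: one row per individual,
# one column per (variable, wave) pair, named "<variable>_w<wave>".
wide: List[Record] = [
    {"id": 1, "employed_w1": 1, "employed_w2": 0, "health_w1": 4, "health_w2": 3},
    {"id": 2, "employed_w1": 0, "employed_w2": 1, "health_w1": 2, "health_w2": 2},
]


def reshape_wide_to_long(rows: List[Record], id_col: str = "id", sep: str = "_w") -> List[Record]:
    """Reshape to long format: one row per (individual, wave), one column per variable."""
    long_rows: List[Record] = []
    for row in rows:
        per_wave: Dict[int, Record] = {}
        for col, value in row.items():
            if col == id_col:
                continue
            variable, wave = col.rsplit(sep, 1)  # "employed_w2" -> ("employed", "2")
            per_wave.setdefault(int(wave), {})[variable] = value
        for wave in sorted(per_wave):
            long_rows.append({id_col: row[id_col], "wave": wave, **per_wave[wave]})
    return long_rows


for record in reshape_wide_to_long(wide):
    print(record)
```

Each individual now contributes one row per wave, and each measured variable (`employed`, `health`) occupies a single column, with the `wave` column carrying the temporal information.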

Therefore, to address the second research question, I conducted a literature review on classification tree learning in the context of infrequent outcomes. This literature review is reported in Section 2.2.
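As a rough illustration of how a classification tree exposes an interaction through its splits and the class distribution in each node, the following sketch grows a small tree greedily, choosing at each node the predictor that most reduces the Gini impurity used by CART (Breiman et al., 1984). It is a toy implementation, not the method developed in this thesis: it splits on every category of a predictor (with binary predictors this is the same as a binary split), stops at depth 2 or when the impurity reduction falls below an arbitrary threshold `min_gain`, and runs on invented counts echoing the earlier example of employment contract and regional unemployment rate.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: List[Row], predictors: List[str], target: str) -> Optional[Tuple[str, float]]:
    """Return the predictor whose split most reduces weighted Gini impurity, with that reduction."""
    parent = gini([r[target] for r in rows])
    best: Optional[Tuple[str, float]] = None
    for var in predictors:
        groups: Dict[str, List[str]] = {}
        for r in rows:
            groups.setdefault(r[var], []).append(r[target])
        if len(groups) < 2:  # the predictor is constant in this node
            continue
        children = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        gain = parent - children
        if best is None or gain > best[1]:
            best = (var, gain)
    return best


def grow(rows: List[Row], predictors: List[str], target: str, label: str = "root",
         depth: int = 0, max_depth: int = 2, min_gain: float = 0.005) -> dict:
    """Recursively partition rows, printing the class distribution of every node."""
    dist = dict(sorted(Counter(r[target] for r in rows).items()))
    print("  " * depth + f"{label}: n={len(rows)} {dist}")
    node = {"label": label, "n": len(rows), "dist": dist, "children": []}
    if depth >= max_depth:
        return node
    split = best_split(rows, predictors, target)
    if split is None or split[1] < min_gain:
        return node
    var = split[0]
    remaining = [p for p in predictors if p != var]
    for value in sorted({r[var] for r in rows}):
        subset = [r for r in rows if r[var] == value]
        node["children"].append(
            grow(subset, remaining, target, f"{var}={value}", depth + 1, max_depth, min_gain))
    return node


def make_rows(contract: str, area: str, unemployed: int, employed: int) -> List[Row]:
    """Build identical records for one (contract, area) cell of the invented table."""
    base = {"contract": contract, "area": area}
    return ([{**base, "status": "unemployed"}] * unemployed
            + [{**base, "status": "employed"}] * employed)


# Invented counts (unemployed, employed) per cell; not survey data.
data = (make_rows("permanent", "low", 2, 48) + make_rows("permanent", "high", 4, 46)
        + make_rows("fixed-term", "low", 3, 47) + make_rows("fixed-term", "high", 20, 30))
tree = grow(data, ["contract", "area"], "status")
```

With these invented counts, the first split is on the area, and the contract type is selected only within the high-unemployment branch (20 of 50 fixed-term employees unemployed against 4 of 50 permanent ones), whereas the low-unemployment branch is left unsplit. Reading the node distributions level by level is what lets a practitioner spot such an interaction.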

1.2.3 Immersions as a social scientist

The research questions introduced in Sections 1.2.2.1 and 1.2.2.2 focus on the exploratory analysis step (G/G’) of the data exploration stage (steps E to G/G’) introduced in Figure 1.1. To address the data exploration stage globally, I also investigate what improvements can be put forward regarding the data understanding (E) and data preparation (F) steps.

As the thesis focuses on the discovery of factors of vulnerability in life courses, a particular point of interest concerns the use of life course data. Life course data are expected to be more complex than the cross-sectional data traditionally used in social sciences. As the life course perspective involves studying trajectories, life course data are expected to contain measures repeated over time. This feature makes databases larger and, as a result, more difficult to handle. Additionally, although repeated measures inherently share several characteristics, they may also differ in others, since the survey design is likely to change over time.

For example, the phrasing of some questions may change to make them compatible with the national survey of another country. For a similar reason, the rating scale of a variable may also change from, say, a 7-point scale to a 5-point scale.

Such changes make the variables not directly comparable. As a result, additional preprocessing operations may be required to make data ready for analysis. The life course perspective also involves studying the microsocial environment of individuals. The microsocial environment consists of many linkages, whose analysis requires egocentric network data. The macrosocial environment also plays an important role in the life course perspective. Taking the macrosocial environment into account in the analysis involves the use of administrative socio-economic data.
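As an illustration of the kind of preprocessing mentioned above, the following sketch maps scores collected on a 7-point scale onto the metric of a 5-point scale by linear rescaling. This is only one common harmonization option, assumed here for illustration. It treats categories as equally spaced, and the rescaled values are generally not integers, so they are not equivalent to answers actually collected on a 5-point scale. The function name `rescale` is hypothetical.

```python
def rescale(x: float, old_min: int = 1, old_max: int = 7,
            new_min: int = 1, new_max: int = 5) -> float:
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max].

    Assumes equally spaced response categories on both scales.
    """
    if not old_min <= x <= old_max:
        raise ValueError(f"score {x} outside [{old_min}, {old_max}]")
    return new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)


# Scores from a 7-point wave expressed on the 5-point metric of a later wave.
wave1_scores = [1, 2, 4, 7]
print([round(rescale(s), 2) for s in wave1_scores])  # [1.0, 1.67, 3.0, 5.0]
```

Alternatives such as collapsing adjacent categories preserve the ordinal nature of the answers but require a substantive decision about which categories to merge.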


Therefore, I expect that both the increase in volume and the use of differently structured data complicate the tasks of understanding and preparing data. Taking both an information system and a data analysis point of view, I propose to investigate the technical difficulties that researchers in social sciences experience because of statistical software limitations. Either a quantitative or a qualitative approach can be used to gain this understanding. A possible quantitative approach is, for example, to administer a survey. A possible qualitative approach is, for example, to participate in practitioners’ activities and make observations. I chose the latter option because it allows needs to be explored without prior assumptions and observation choices to be successively oriented by the results of the previous observation stages. In addition, one of the innovative strategies of the NCCR LIVES is to encourage interdisciplinarity. As the only IT researcher among a large number of researchers in social sciences, my strategy was therefore to immerse myself in the role of a social scientist during the first two years of this PhD thesis to better understand the issues practitioners face in their daily work.

To this end, I started three collaborations with researchers in social sciences in three different domains: health sociology, labour sociology, and family sociology.

These three collaborations share two features: each studies either a situation or a group of people considered vulnerable, and each follows a life course perspective.

Regarding data understanding, the observations led me to focus on access to data documentation within statistical software and on the use of sampling weights to better account for data representativeness. Regarding data preparation, the observations led me to focus on tools for panel data and network data. These immersions and the associated findings are introduced in Chapter 4.

1.3 Contributions

Researchers identify the factors that lead to experiencing vulnerability by confronting knowledge from the literature with empirical evidence obtained by means of software and data analysis methods. Software helps researchers handle data, and data analysis methods help them extract relevant information from data. However, understanding what actually happens to the studied population and assessing whether potential causal links are valid remain the researchers’ task. Most confirmatory analyses rely on regression techniques, which can describe relationships but provide no certainty about the underlying causal mechanism. Drawing conclusions about the validity of the tested hypotheses therefore belongs to the researcher. Bearing that in mind, even if the methodological contributions introduced in this research work aim to identify underlying factors of vulnerability, they only support the researcher in this identification. The responsibility for validating which factors actually underlie a particular vulnerability still belongs to researchers.