Computations of French Lifetables by Département


HAL Id: halshs-01955515

https://halshs.archives-ouvertes.fr/halshs-01955515

Preprint submitted on 14 Dec 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Florian Bonnet


WORKING PAPER N° 2018 – 57


PARIS-JOURDAN SCIENCES ECONOMIQUES

48, BD JOURDAN – E.N.S. – 75014 PARIS

TÉL. : 33(0) 1 80 52 16 00

www.pse.ens.fr

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE – ECOLE DES HAUTES ETUDES EN SCIENCES SOCIALES

ÉCOLE DES PONTS PARISTECH – ECOLE NORMALE SUPÉRIEURE


Computations of French Lifetables by Département, 1901–2014*

Florian Bonnet

November 26, 2018

Abstract

Debates over the territorial divide in France run deep. To contribute to this issue, I compute departmental lifetables since 1901, for both men and women. In this paper, I present the raw data collected to do so, namely yearly births and deaths by age, as well as population by age at each census carried out during the 20th century. I add statistics on military mortality and mortality in deportation to cover the periods of the two World Wars. I also present the methods I use to compute these lifetables, which come mainly from the Human Mortality Database protocol. I revise this protocol to take into account the specificities of French departmental data, mainly the few changes in French departmental boundaries, the underestimation of infant mortality and the lack of homogeneity in the raw data. This new database complements a still limited supply of long-term mortality statistics computed at the local level.

*This study received financial support from the ERC Grant "Demographic Uncertainty" led by Hippolyte d’Albis. The author would especially like to thank Magali Barbieri, Carl Boe and Hippolyte d’Albis for their extensive advice.


Part I

Article

1 Introduction

Life expectancy has risen sharply in France since the beginning of the 20th century. The lifetables calculated by Vallin and Meslé (2001) for the 19th and 20th centuries show that male life expectancy at birth was 33 years in 1806, 44.5 in 1898, 60 in 1946 and 74.5 in 1997. This increase at the national level says nothing about changes at the local level, and significant differences exist between French départements. Barbieri (2013) worked on departmental mortality and showed that male life expectancy at birth for the period 2006–2008 was 74.4 years in Nord, compared to 79.7 years in Hauts-de-Seine, a difference of more than 5 years. This situation explains why the debate on the territorial divide in health is important in France. Indeed, departmental differences cannot be justified from a public policy point of view: the State has to reduce these inequalities. In order to inform public decision-makers in their choices, it is important to know the history of these departmental differences.

Consequently, I compute in this paper yearly departmental lifetables by sex for all départements of metropolitan France between 1901 and 2014. The computation of these lifetables is based on the exhaustive collection of population flows (deaths by age and sex, births by sex) and population stocks at each census (population by age and sex). I exploit a characteristic unique to France: since 1789, the country has been divided into around 100 geographical units of similar size, the départements. This division has changed very little over two centuries, and statistics have been centralized at this geographical level. Moreover, in order to take into account the two World Wars that affected France in 1914–1918 and 1939–1945, I have collected from two original sources military deaths by age during the two wars, as well as deaths in deportation by age and sex during the Second World War. With these lifetables, I obtain life expectancies and mortality rates at each age for more than 100 years. In addition, I obtain populations by age and sex on each 1st January.

These subnational lifetables add to a still incomplete literature. Bonneuil (1997) worked on departmental mortality in the 19th century: he computed female lifetables by five-year period and five-year age group. He followed Van de Walle (1974), who computed similar lifetables with a different methodology. Neither author studied male mortality in the same way, because of strong fluctuations due to the wars that afflicted France at that time. Daguet (2006) compiled lifetables established at the departmental level from 1954 to 1999, but only for census years. Barbieri (2013) used departmental lifetables calculated by INSEE for the period 1975–2008; however, these data were provided on an exceptional basis. Vallin and Meslé (2005) used departmental life expectancies for the period 1906–1954, but neither the reconstruction methods nor the data have ever been published. Lastly, various mortality indicators are available in official publications, namely Statistique Annuelle du Mouvement de la Population.1 However, these indicators are relatively scarce: they cover only infant mortality rates and standardized mortality rates.

In addition, the lifetables I compute are based on a unified methodological protocol for the whole period 1901–2014, which is not the case for the papers cited above. This methodological protocol is available in Wilmoth et al. (2007). Many researchers use this protocol to compute national lifetables for a large number of countries. It is also used to compute lifetables at the local level in two OECD countries: the results for Canadian provinces for the period 1921–2011 are available in the Canadian Human Mortality Database2, and those for Japanese prefectures since 1975 are available in the Japan Mortality Database.3 This paper therefore complements a still limited supply of freely available local mortality data by adopting an internationally recognized protocol, which allows international comparisons without methodological bias.

The rest of this paper is organized as follows. In Section 2 I present the statistical sources used to compute departmental lifetables. The methods used are explained in Section 3 in which I distinguish the methods coming from the HMD protocol and the methods specific to this study. In Section 4 I illustrate some of the results available in this new database. Part 2 of this paper is the methodological appendix.

2 Sources

Computing departmental lifetables requires two types of data: population movement statistics (deaths and births by département of residence) and population censuses. The deaths collected are not limited to civilian deaths: military deaths during the two World Wars and deaths in deportation between 1939 and 1945 are also included.

2.1 Deaths

Civilian deaths for each département, sex and year over the period 1901–2014 have been retrieved from the population movement statistics published by Statistique Générale de la France (SGF) and then by Institut National de la Statistique et des Etudes Economiques (INSEE). I have retrieved deaths by age group recorded in the département of residence. Tables 7 and 8 in the Appendix (Part 2, Section 6.9) list the sources in which the raw statistics were found. In addition, I have collected from Vallin and Meslé (2001) civilian deaths by single year of age and sex at the national level for the same period.

I have retrieved military deaths during the two World Wars from the Defense Ministry’s website.4 They are available by year of birth at the departmental level, and by year of birth and year of death at the national level.

Individuals who died during deportation in the Second World War are not included in the civilian population movement statistics, although there were nearly 100,000 of them. I have decided to include them in my statistics, using data from the Memorialgenweb website.5 This database records deportees who left France and died in deportation, as published in the Journal Officiel, by département of birth if they were born in France and by country of birth otherwise. Table 1 presents the number of foreign-born deportees by country of birth; Poles were the most numerous. Although this database is not exhaustive, the large number of observations provides a sample close to the total number of deaths in deportation.

2 Computed by researchers at the Université de Montréal, www.demo.umontreal.ca/chmd/.

3 Computed by researchers at the National Institute of Population and Social Security Research, http://www.ipss.go.jp/p-toukei/JMD/index-en.asp.

4 http://www.memoiredeshommes.sga.defense.gouv.fr/


Table 1: SUMMARY OF FOREIGN-BORN DEPORTEES BY NATIONALITY

Country     Deportees   In % of foreign-born deportees
Poland      13,599      40.46%
Spain        5,075      15.10%
Russia       2,741       8.16%
Germany      2,425       7.21%
Romania      1,861       5.54%
Turkey       1,511       4.50%
Algeria      1,050       3.12%
Greece         939       2.79%
Italy          535       1.59%
Ukraine        534       1.59%

Notes: Deportees by country of birth in the Memorialgenweb’s database.

2.2 Births

I have retrieved births by year, sex and mother’s département of residence for the period 1901–2014. I have also retrieved stillbirths by mother’s département of residence and year (both males and females). Finally, I have retrieved births by year, sex and mother’s département of residence for the period 1853–1900.6

2.3 Censuses

Finally, I have collected populations by birth year, département of residence and sex for each census of the period 1901–1962 from hard-copy publications of SGF and INSEE. For the period 1968–2014, these statistics were found in online sources. These data are not available for every year because censuses were held at varying intervals. Between 1901 and 2014, censuses were conducted in 1901, 1906, 1911, 1921, 1926, 1931, 1936, 1946, 1954, 1962, 1968, 1975, 1982, 1990, 1999, 2008, 2013 and 2014.7

3 Methods

The protocol I use to compute departmental lifetables closely follows that of the Human Mortality Database (HMD), which gathers national lifetables computed with these methods. However, since my database is specific in both the small numbers observed in each département and the period covered (which includes the two World Wars), I have added specific methods.

6 Tables 9, 10 and 11 in the Appendix (Part 2, Section 6.9) give the sources in which the raw statistics were found.

7 Table 12 in the Appendix (Part 2, Section 6.9) gives the sources in which the raw statistics were found.

3.1 HMD Protocol Methods

3.1.1 Raw Data Adjustments

Adjusting the raw death data is the main issue, since deaths are aggregated into five-year age groups until 1967 and given by single year of age between 1968 and 2014. To get a 1 × 1 format (single age, year of death) for deaths between 1901 and 1967, I distribute deaths at unknown age among age groups and fit the curve of cumulative deaths with cubic splines. A cubic spline is a semi-parametric estimation method which joins the points of a cumulative distribution by third-degree polynomials. Let $Y(x) = \sum_{u=0}^{x-1} D_u$ be the cumulative number of deaths up to age x. Y(x) is known from the raw data for a limited collection of ages, including 1, 5, 10, etc. I also know Y(x) for both the highest age in the distribution (80, 90 or 100) and the age above which no further deaths are observed, set at 105. Equation (1) fits a cubic spline through these values (the indicator function I(.) equals one if the logical statement within parentheses is true and zero otherwise):

\[
Y(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \beta_1 (x - k_1)^3 I(x > k_1) + \dots + \beta_n (x - k_n)^3 I(x > k_n). \tag{1}
\]

I have to estimate the vector $(\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_1, \dots, \beta_n)$, which contains n + 4 coefficients, but I only know n + 2 values of Y(x), and therefore have n + 2 constraints. Two further constraints must be introduced to identify the model. First, I assume that there are no deaths at the upper bound, namely age 105. Second, I assume that deaths observed between ages 1 and 5 occurred between ages 1 and 2. The fitted values $\hat{Y}(x)$ are calculated for all ages, for each département, sex and year. Deaths at age x are then obtained as follows:

\[
\hat{D}(x) = \hat{Y}(x + 1) - \hat{Y}(x).
\]
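To make the spline step concrete, here is a minimal Python sketch, not the author's implementation. It assumes truncated cubic terms $(x - k_i)^3$ (the usual form of a cubic regression spline), knots at the interior ages where cumulative deaths are known, and two end-slope constraints: a zero slope at the upper bound (no deaths there) and a caller-supplied slope at the lowest age, which is only a stand-in for the paper's second constraint. The function names and the toy data (ages capped at 20) are hypothetical.

```python
def solve(A, y):
    """Solve the square linear system A c = y by Gaussian elimination with pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def basis(x, knots):
    """Terms of the spline: 1, x, x^2, x^3 and one truncated cube per knot."""
    return [1.0, x, x ** 2, x ** 3] + [max(x - k, 0.0) ** 3 for k in knots]

def dbasis(x, knots):
    """First derivative of each basis term, used for the two slope constraints."""
    return [0.0, 1.0, 2 * x, 3 * x ** 2] + [3 * max(x - k, 0.0) ** 2 for k in knots]

def fit_cumulative_spline(ages, cum_deaths, slope_low, slope_high=0.0):
    """Interpolate cumulative deaths; interior ages act as knots (assumption)."""
    knots = ages[1:-1]
    A = [basis(x, knots) for x in ages]
    A += [dbasis(ages[0], knots), dbasis(ages[-1], knots)]
    coef = solve(A, list(cum_deaths) + [slope_low, slope_high])
    return lambda x: sum(c * b for c, b in zip(coef, basis(x, knots)))

# Toy data: cumulative deaths known at a few ages only (illustrative numbers).
ages = [0, 1, 5, 10, 15, 20]
cum = [0.0, 40.0, 55.0, 60.0, 70.0, 100.0]
Y = fit_cumulative_spline(ages, cum, slope_low=40.0)
single_age_deaths = [Y(x + 1) - Y(x) for x in range(20)]  # D(x) = Y(x+1) - Y(x)
```

By construction the single-age deaths sum to the known total, since the differences telescope to Y(20) − Y(0); nothing in this sketch prevents individual values from being negative.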

Negative death counts may occur when the deaths in five-year age groups are extremely low.8 In that case, deaths are set to zero in the age groups where negative counts occur. To balance this, deaths in the adjacent age groups are reduced pro rata of their number of deaths. Let $D_{neg}$ be the sum of negative death counts for an observation, $D^*_s$ the deaths at age s after allocation of negative death counts, $D_s$ the estimated deaths at age s before allocation, and $x_1$ and $x_2$ the lower and upper limits of the interval in which the negative death counts are observed. Then:

\[
D^*_s =
\begin{cases}
0 & \text{for } s \in [x_1, x_2], \\[4pt]
D_s + D_{neg} \times \dfrac{D_s}{\sum_{i \in \Omega_1} D_i} & \text{for } s \in [x_1 - 5, x_1] \cup [x_2, x_2 + 5], \\[4pt]
D_s & \text{otherwise.}
\end{cases}
\tag{2}
\]
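The reallocation of negative counts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: deaths are held in a list indexed by single year of age, every count in the run from x1 to x2 is negative, and the "adjacent" ages are taken to be the five ages on each side of the run, excluding the run itself. The function name and the numbers are hypothetical.

```python
def reallocate_negative_deaths(deaths, x1, x2, width=5):
    """Zero the run x1..x2 and charge its (negative) sum to the neighbouring
    ages, pro rata of the neighbours' own death counts."""
    d_neg = sum(deaths[x1:x2 + 1])  # assumed to be a run of negative counts
    neighbours = [s for s in range(x1 - width, x2 + width + 1)
                  if 0 <= s < len(deaths) and not x1 <= s <= x2]
    base = sum(deaths[s] for s in neighbours)
    if base <= 0:
        raise ValueError("no neighbouring deaths to absorb the negative counts")
    out = list(deaths)
    for s in range(x1, x2 + 1):
        out[s] = 0.0
    for s in neighbours:
        out[s] = deaths[s] + d_neg * deaths[s] / base
    return out

# Toy example: two slightly negative spline estimates at ages 5 and 6.
raw = [10, 8, 6, 4, 2, -1, -1, 3, 5, 7, 9, 11]
adjusted = reallocate_negative_deaths(raw, 5, 6)
```

The adjustment leaves the total number of deaths unchanged, which is the property the pro rata rule is meant to guarantee.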

8 This usually happens at around age 30. I count only seven of these occurrences, but they need to be adjusted so as not to

Deaths estimated by cubic splines are too imprecise to be used at advanced ages, because the open-age interval of deaths starts too early (see Tables 7 and 8, Column 5). These deaths are adjusted by means of the Kannisto model, which assumes a survival curve of logistic form, with a zero asymptote at very old ages. I use this method for deaths in the open-age interval, whose lower bound varies across periods (I cap it at 95 so that estimates are not hampered by very small counts), and rely on the deaths observed at the ten ages below this bound. Thus, if the open-age interval begins at age 90, I use ages 80–89. Formally, I compute a fictitious survival curve S(80 + x):

\[
S(80 + x) = \frac{\sum_{u=80+x}^{105} D_u}{\sum_{u=80}^{105} D_u} \quad \text{for } x = 0, 1, 2, \dots, 9. \tag{3}
\]

This survival function conditional on reaching age 80 may be seen as tracking a “synthetic extinct cohort”, since it is based on annual deaths and not on deaths in the cohort itself. Assuming that this fictitious cohort displays survival probabilities that can be fitted by the Kannisto model, the survival function s(x) is:

\[
s(x) = \left( \frac{1 + a}{1 + a e^{b(x - 80)}} \right)^{1/b}. \tag{4}
\]

With estimated values for a and b, I compute $\hat{s}(x)$ and $d(x) = \hat{s}(x) - \hat{s}(x + 1)$. Finally, I obtain deaths at each age:

\[
D(x) = \sum_{u=90}^{105} D_u \times \frac{d(x)}{\hat{s}(90)}. \tag{5}
\]
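As an illustration of Equations (4) and (5), the following Python sketch spreads the deaths of an open-age interval over single ages with a Kannisto survival curve. The parameter values a = 0.1 and b = 0.1 are purely hypothetical (in the paper they are estimated from the fictitious survival curve), and treating the last age, 105, as absorbing all remaining deaths is an assumption made here so that the shares sum to one.

```python
import math

def kannisto_survival(x, a, b):
    """Equation (4): survival from age 80 to age x under the Kannisto model."""
    return ((1 + a) / (1 + a * math.exp(b * (x - 80)))) ** (1 / b)

def split_open_interval(open_deaths, a, b, start=90, last=105):
    """Equation (5): allocate the deaths at ages `start` and over to single ages."""
    s_start = kannisto_survival(start, a, b)
    out = {}
    for x in range(start, last + 1):
        s_x = kannisto_survival(x, a, b)
        # Assumption: nobody survives beyond the last age, so it takes the remainder.
        s_next = kannisto_survival(x + 1, a, b) if x < last else 0.0
        out[x] = open_deaths * (s_x - s_next) / s_start
    return out

# 1,000 deaths recorded at "90 and over", with illustrative parameters.
deaths_by_age = split_open_interval(1000.0, a=0.1, b=0.1)
```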

I finally proceed to a uniform adjustment so that the sum of the departmental deaths for each age, year, and sex corresponds to the national data.
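The uniform adjustment amounts to a proportional rescaling. A minimal Python sketch follows, assuming one dictionary of departmental deaths per age, year and sex cell; the function name and figures are illustrative only.

```python
def scale_to_national(dept_deaths, national_total):
    """Rescale departmental deaths so that they sum to the national figure."""
    total = sum(dept_deaths.values())
    if total == 0:
        return dict(dept_deaths)  # nothing to rescale
    factor = national_total / total
    return {dept: d * factor for dept, d in dept_deaths.items()}

# One age/year/sex cell: departmental estimates sum to 20, the national series says 22.
cell = scale_to_national({"Nord": 12.0, "Ain": 8.0}, national_total=22.0)
```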

For censuses, raw data are generally available by five-year birth cohorts. I use cubic splines in the same way to estimate populations by single year of birth.

3.1.2 Splitting Deaths into Lexis Triangles

Figure 1 presents deaths by year and age. Deaths at a given age in a given year may be split into two triangles, known as Lexis triangles. For individuals who died between ages 1 and 2 in 1903, one may distinguish two kinds of deaths: those of individuals born in 1901 (“a” in Figure 1, upper triangle) and those of individuals born in 1902 (“b” in Figure 1, lower triangle).

Figure 1: AN EXAMPLE OF LEXIS DIAGRAM

If deaths were evenly spread over time, one might think it sufficient to allocate half of the annual deaths at each age to the lower triangle and the other half to the upper triangle. It is not, for two main reasons. The first is that infant mortality, when high, is observed largely in the first days after birth, and must therefore be assigned to the lower triangle. The second concerns the relative size of cohorts, which also influences the distribution between triangles. When the flow of births varies greatly from one year to the next (e.g. during the two World Wars), an even split between triangles is strongly biased. The HMD protocol provides a sex-specific equation for distributing deaths between Lexis triangles. This equation takes into account the relative size of two successive cohorts, age, some historical events (e.g. the Spanish influenza), and the infant mortality rate. If we call x the age and t the year, these sex-specific equations are as follows (Equation (6) for women, Equation (7) for men):

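\[
\begin{aligned}
\hat{\pi}_d(x,t) = {} & 0.4710 + \hat{\alpha}_F + 0.7372\,[\pi_b(x,t) - 0.5] \\
& + 0.1025\, I_{t=1918} - 0.0237\, I_{t=1919} \\
& - 0.0112 \log IMR(t) - 0.0688 \log IMR(t)\, I_{x=0} + 0.0268 \log IMR(t)\, I_{x=1} \\
& + 0.1526\,[\log IMR(t) - \log(0.01)]\, I_{x=0}\, I_{IMR(t)<0.01};
\end{aligned}
\tag{6}
\]

\[
\begin{aligned}
\hat{\pi}_d(x,t) = {} & 0.4836 + \hat{\alpha}_H + 0.6992\,[\pi_b(x,t) - 0.5] \\
& + 0.0728\, I_{t=1918} - 0.0352\, I_{t=1919} \\
& - 0.0088 \log IMR(t) - 0.0745 \log IMR(t)\, I_{x=0} + 0.0259 \log IMR(t)\, I_{x=1} \\
& + 0.1673\,[\log IMR(t) - \log(0.01)]\, I_{x=0}\, I_{IMR(t)<0.01}.
\end{aligned}
\tag{7}
\]

$\hat{\pi}_d(x,t)$ is defined as the proportion of deaths of a given year and age allocated to the lower triangle. $\hat{\alpha}_F$ and $\hat{\alpha}_H$ are age-specific values coming from the HMD protocol.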

$\pi_b(x,t)$ is defined as the ratio of births between two successive cohorts and is calculated only once for both sexes:

\[
\pi_b(x,t) = \frac{B(t - x)}{B(t - x) + B(t - x - 1)}. \tag{8}
\]

Long historical series are required to calculate this ratio for all the cohorts tracked between 1901 and 2014. Take individuals aged 80 in 1901 as an example: calculating this ratio requires births in 1820 and 1821. I was unable to do so, since my birth records only go back to 1853. I therefore assume that births before 1853 were equal to births in 1853, which gives a birth ratio of 0.5.

IMR(t), the same for both sexes, is calculated as follows:

\[
IMR(t) = \frac{D(0,t)}{\frac{1}{3} B(t - 1) + \frac{2}{3} B(t)}. \tag{9}
\]

If births are not available for one of the two years, IMR(t) is calculated as follows9:

\[
IMR(t) = \frac{D(0,t)}{B(t^*)}, \tag{10}
\]

with $t^*$ the year for which births are available.10
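Equations (6) and (8) to (10) can be combined into a short Python sketch. This is a hedged illustration rather than the protocol's reference code: only the female coefficients are shown, the age-specific term is passed in as `alpha` (its values come from the HMD protocol and are not reproduced here), `log` is assumed to be the natural logarithm, and the clipping of shares below 0 is an addition of this sketch (the paper only caps shares above 1). All function names and the birth counts are hypothetical.

```python
import math

# Coefficients of Equation (6) (women); Equation (7) gives the male values.
FEMALE = {"const": 0.4710, "birth": 0.7372, "y1918": 0.1025, "y1919": -0.0237,
          "imr": -0.0112, "imr_x0": -0.0688, "imr_x1": 0.0268, "low_imr": 0.1526}

def birth_ratio(births, t, x):
    """Equation (8); falls back to 0.5 when a birth count is missing."""
    young, old = births.get(t - x), births.get(t - x - 1)
    if young is None or old is None:
        return 0.5
    return young / (young + old)

def infant_mortality_rate(d0, births, t):
    """Equations (9) and (10), floored at 1e-8 so that the log is defined."""
    prev, cur = births.get(t - 1), births.get(t)
    if prev is not None and cur is not None:
        denom = prev / 3 + 2 * cur / 3
    else:
        denom = cur if cur is not None else prev  # only one year available
    return max(d0 / denom, 1e-8)

def lower_triangle_share(x, t, imr, pi_b, alpha=0.0, c=FEMALE):
    """Share of deaths at age x in year t allocated to the lower Lexis triangle."""
    log_imr = math.log(imr)
    share = (c["const"] + alpha + c["birth"] * (pi_b - 0.5)
             + c["y1918"] * (t == 1918) + c["y1919"] * (t == 1919)
             + c["imr"] * log_imr
             + c["imr_x0"] * log_imr * (x == 0)
             + c["imr_x1"] * log_imr * (x == 1)
             + c["low_imr"] * (log_imr - math.log(0.01)) * (x == 0 and imr < 0.01))
    return min(1.0, max(0.0, share))  # shares above 1 are set to 1

# Toy example: infant deaths in 1903 for one département (illustrative counts).
births = {1901: 1000, 1902: 1100, 1903: 900}
imr = infant_mortality_rate(120, births, 1903)
share = lower_triangle_share(0, 1903, imr, birth_ratio(births, 1903, 0))
```

With high infant mortality and a smaller 1903 birth cohort, roughly 60% of infant deaths fall in the lower triangle in this toy case, rather than the 50% an even split would give.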

9 When IMR(t) is equal to zero because there are no infant deaths, I assume an IMR value of 0.00000001 so that log IMR(t) can be calculated.

10 I obtain proportions of deaths in the lower triangle greater than 1 for 28 female observations and 30 male observations, all in 1918 or 1919 and for deaths under age 1. This is due to the Spanish influenza epidemic, the high infant mortality rate and the size differences between the cohorts born in 1918 and 1919. To tackle this issue, the death proportions in the lower triangle are set at 1, leading to zero deaths in the upper triangle for these observations.

3.1.3 Computations of Populations by Age at 1st January of each Year

To calculate the mortality rates required for lifetables, I need populations by age at 1st January for each year from 1901 to 2014. I get populations by age in 2014 from official statistics, so I only have to calculate populations by age for the 1901–2013 period.11 Figure 2 shows the four methods used for the various periods and ages. Section 6.1 of the Appendix specifies each method in detail.

Figure 2: METHODS FOR COMPUTATIONS OF POPULATION AT 1st JANUARY

Notes: Methods used to compute populations by age at each 1st January. For more details, see Section 6.1 of the Appendix.

The “Intercensal Survival” method is used to estimate the population under age 80 from 1902 to 2013. Starting from one census (say, 1901) the population by age at the following census (1906) is estimated by subtracting from the population by age in 1901 the deaths that occurred from 1901 to 1906. The difference between estimated and recorded populations in 1906, due to measurement errors and migrations, is then attributed to the intercensal population figures.
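The logic of the intercensal method can be sketched for a single cohort in Python. The sketch assumes that the gap between the estimated and the recorded population at the second census is spread linearly over the intercensal period; the exact allocation rule is specified in the Appendix and may differ. Names and figures are illustrative.

```python
def intercensal_estimates(pop_first, pop_second, deaths_by_year):
    """Follow one cohort from a census to the next, subtracting deaths, then
    spread the closing gap (migration and errors) linearly over time."""
    survivors = [pop_first]
    for d in deaths_by_year:
        survivors.append(survivors[-1] - d)
    gap = pop_second - survivors[-1]  # recorded minus estimated at second census
    n = len(deaths_by_year)
    return [s + gap * i / n for i, s in enumerate(survivors)]

# A cohort of 1,000 in 1901, 10 deaths a year, and 960 people recorded in 1906.
series = intercensal_estimates(1000.0, 960.0, [10.0] * 5)
```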

Second, the “Precensal Survival” method is used to estimate the population under age 80 in 1901. Since “Intercensal Survival” estimates the population under age 80 on 1st January of each intercensal year, it cannot provide the population by age on 1st January of the year of the first census. To obtain it, I take the population on the day of the 1901 census and add the deaths occurring between 1st January 1901 and the census day. Since no second census is available, as there is with the “Intercensal Survival” method, I cannot correct for migrations and errors: I assume that these are minimal over so short a period.

With the “Extinct Cohorts” method, I estimate the population aged 80 and over in the cohorts that became extinct between 1901 and 2013. I assume that migrations after age 80 are small, and I compute the size of a cohort while it is still alive by summing its future observed deaths.
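In its simplest form, the extinct-cohort idea is a reverse cumulative sum of deaths. The Python sketch below ignores the Lexis-triangle bookkeeping needed to obtain populations exactly on 1st January, and its numbers are illustrative.

```python
def extinct_cohort_population(cohort_deaths, age):
    """Size of a cohort on reaching `age`: the sum of all its later deaths,
    assuming no migration after that age."""
    return sum(d for a, d in cohort_deaths.items() if a >= age)

# Deaths of one extinct cohort by age at death (illustrative figures).
cohort = {80: 30, 81: 25, 82: 20, 83: 10, 84: 5}
alive_at_80 = extinct_cohort_population(cohort, 80)
alive_at_82 = extinct_cohort_population(cohort, 82)
```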

Finally, I estimate the population aged 85 and over in 2014 with the “Survival Ratio” method. I assume that the survival ratio between two ages for the extinct cohorts can be applied to the still-living cohorts in order to estimate their size at the last census. The estimates are then adjusted to the population aged 85 and over recorded in 2013. After this adjustment, I compute the size of the intermediate populations located in the green quadrilateral by subtracting the observed deaths step by step.


3.1.4 Adjustment of Computed Mortality Rates

I can compute departmental mortality rates by age and sex with deaths in Lexis triangles and populations at each 1st January. Mortality rates are the ratio between the number of deaths and the number of individuals exposed to the risk12:

\[
M_{xt} = \frac{D_{xt}}{E_{xt}} = \frac{D_L(x,t) + D_U(x,t)}{\frac{1}{2}\,[P(x,t) + P(x,t+1)] + \frac{1}{6}\,[D_L(x,t) - D_U(x,t)]}. \tag{11}
\]

Note that I do not calculate populations for 2015, although these are needed for 2014. To estimate mortality rates for that year, I assume that the population at each age in 2015 is equal to that in 2014, and the formula becomes:

\[
M_{x,2014} = \frac{D_{x,2014}}{E_{x,2014}} = \frac{D_L(x,2014) + D_U(x,2014)}{P(x,2014) + \frac{1}{6}\,[D_L(x,2014) - D_U(x,2014)]}. \tag{12}
\]
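A small Python sketch of Equations (11) and (12) follows; the function name and the figures are illustrative, and omitting the end-of-year population reproduces the 2014 special case.

```python
def death_rate(d_lower, d_upper, pop_start, pop_end=None):
    """Equation (11); with pop_end omitted, Equation (12) (population assumed unchanged)."""
    if pop_end is None:
        pop_end = pop_start
    exposure = 0.5 * (pop_start + pop_end) + (d_lower - d_upper) / 6
    return (d_lower + d_upper) / exposure

# 6 deaths in the lower triangle, 4 in the upper, population 1,000 then 990.
m = death_rate(6, 4, 1000, 990)
m_last_year = death_rate(6, 4, 1000)
```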

Figure 3 presents the set of data needed to compute mortality rates.

Figure 3: MORTALITY RATES COMPUTATIONS

12 For the explanation of the presence of the difference between the two Lexis triangles at the denominator, please see HMD

These rates are not used directly to calculate lifetables. I smooth mortality rates beyond age 90 in order to avoid erratic fluctuations due to small numbers of deaths and of population at risk. The force of mortality (the instantaneous probability of dying) above age 80 in the Kannisto model can be expressed as follows (with a ≥ 0 and b ≥ 0):

\[
\mu_x(a, b) = \frac{a e^{b(x - 80)}}{1 + a e^{b(x - 80)}}. \tag{13}
\]

Mortality rates estimated with the Kannisto model, $M_x(a, b)$, are:

\[
M_x(a, b) = \mu_{x+0.5}(a, b). \tag{14}
\]

If $D_x \sim \text{Poisson}\left(E_x\, \mu_{x+0.5}(a, b)\right)$, then parameters a and b may be estimated by minimizing the following function:

\[
-\log L(a, b) = -\sum_{x=80}^{105} \left[ D_x \log \mu_{x+0.5}(a, b) - E_x\, \mu_{x+0.5}(a, b) \right]. \tag{15}
\]
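The estimation step can be illustrated with a short Python sketch. It evaluates the negative Poisson log-likelihood of Equation (15) and picks the best pair (a, b) on a coarse grid; the grid search is only a stand-in for a proper numerical optimizer, and the synthetic deaths, exposures and grid values are hypothetical.

```python
import math

def kannisto_hazard(x, a, b):
    """Equation (13): Kannisto force of mortality at exact age x."""
    z = a * math.exp(b * (x - 80))
    return z / (1 + z)

def neg_log_likelihood(a, b, deaths, exposures):
    """Equation (15), up to an additive constant that does not depend on a and b."""
    total = 0.0
    for x in deaths:
        mu = kannisto_hazard(x + 0.5, a, b)
        total -= deaths[x] * math.log(mu) - exposures[x] * mu
    return total

def fit_kannisto_grid(deaths, exposures, grid):
    """Crude fit: return the (a, b) pair of the grid with the smallest criterion."""
    return min(grid, key=lambda ab: neg_log_likelihood(ab[0], ab[1], deaths, exposures))

# Synthetic data generated from a = 0.08, b = 0.10 (expected deaths, no noise).
ages = range(80, 106)
exposures = {x: 1000.0 for x in ages}
deaths = {x: exposures[x] * kannisto_hazard(x + 0.5, 0.08, 0.10) for x in ages}
grid = [(a, b) for a in (0.04, 0.06, 0.08, 0.10, 0.12)
        for b in (0.06, 0.08, 0.10, 0.12, 0.14)]
a_hat, b_hat = fit_kannisto_grid(deaths, exposures, grid)
```

On noise-free data the criterion is minimized at the generating parameters, which gives a quick sanity check of the likelihood's sign conventions.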

I can then calculate $\hat{M}_x(\hat{a}, \hat{b})$ for all ages above 90, with estimated parameters $\hat{a}$ and $\hat{b}$. I assume that the population’s mortality rates are equal to the mortality rates in the lifetables ($m_x$):

\[
\begin{cases}
m_x = M_x & x \in [0, 89], \\
m_x = \hat{M}_x & x \in [90, 105].
\end{cases}
\tag{16}
\]

To convert the lifetable mortality rates into probabilities of dying, one must define $a_x$, the mean number of years lived between ages x and x + 1 by people dying in that interval. I assume that deaths are uniformly distributed within each year of age:

\[
\begin{cases}
a_x = 1/2 & x \in [1, 104], \\
a_x = \dfrac{1}{{}_{\infty}m_{105}} & x = 105+.
\end{cases}
\tag{17}
\]

For age 0, I follow Preston (2001), who relies on the lifetables of Coale and Demeny (1983). Thus:

\[
\begin{cases}
m_0 \geq 0.107: &
\begin{cases}
a_0 = 0.350 & \text{for women,} \\
a_0 = 0.330 & \text{for men,}
\end{cases} \\[10pt]
m_0 < 0.107: &
\begin{cases}
a_0 = 0.053 + 2.800\, m_0 & \text{for women,} \\
a_0 = 0.045 + 2.684\, m_0 & \text{for men.}
\end{cases}
\end{cases}
\tag{18}
\]

The probabilities of death may be calculated as follows:

\[
\begin{cases}
q_x = \dfrac{m_x}{1 + (1 - a_x)\, m_x} & x \in [0, 104], \\
q_x = 1 & x = 105+.
\end{cases}
\tag{19}
\]

With the values of $q_x$, I can compute each of the lifetable values for each age: the number of survivors ($l_x$), the number of deaths ($d_x$), and the life expectancies ($e_x$). Two complete lifetables are estimated: one in the (1 × 1) format, i.e. for each age and each year, and one in the (1 × 5) format, i.e. for each age and each group of 5 years. For the sake of readability, lifetables in the (5 × 1) and (5 × 5) formats are also estimated, giving values for the age groups [0, 1[, [1, 5[, [5, 10[, [10, 15[, etc., up to ages 105 and over. Section 6.2 of the Appendix reviews the computations made to estimate each of the remaining lifetable values in each specification.
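The chain from mortality rates to life expectancy can be sketched in Python. The sketch uses the standard conversion $q_x = m_x / (1 + (1 - a_x) m_x)$, a radix of 1, and the Coale and Demeny rule for $a_0$; for brevity the toy schedule stops at an open age of 5 rather than 105, and all names and rates are hypothetical.

```python
def a0_coale_demeny(m0, sex):
    """Mean years lived by infants dying before age 1 (sex is "F" or "M")."""
    if m0 >= 0.107:
        return 0.350 if sex == "F" else 0.330
    return 0.053 + 2.800 * m0 if sex == "F" else 0.045 + 2.684 * m0

def life_table(mx, sex):
    """Build q_x, l_x, d_x and e_x from rates; the last age is open-ended."""
    last = len(mx) - 1
    ax = [a0_coale_demeny(mx[0], sex)] + [0.5] * (last - 1) + [1 / mx[last]]
    qx = [m / (1 + (1 - a) * m) for m, a in zip(mx, ax)]
    qx[last] = 1.0  # everyone dies in the open interval
    lx = [1.0]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))
    dx = [l * q for l, q in zip(lx, qx)]
    # Person-years lived in each interval, then remaining life expectancy.
    Lx = [l - (1 - a) * d for l, a, d in zip(lx, ax, dx)]
    ex = [sum(Lx[i:]) / lx[i] for i in range(len(mx))]
    return {"qx": qx, "lx": lx, "dx": dx, "ex": ex}

# Toy female schedule for ages 0, 1, 2, 3, 4 and 5+ (illustrative rates).
table = life_table([0.02, 0.01, 0.01, 0.01, 0.01, 0.25], "F")
```

In the open interval, life expectancy reduces to the inverse of the death rate, which gives a quick consistency check on the last row of the table.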

3.2 Specific Departmental Methods for the Period 1901–2014

The methods presented previously come from the Human Mortality Database protocol. However, they are too general to be applied without correction to the case of French départements during the 20th century. These corrections are due to three main issues: the quality of the raw data, the two World Wars, and the territorial changes in my departmental classification.


3.2.1 Specific Methods Due to Data Quality

I include false stillbirths in births and in deaths before the first birthday, as Vallin and Meslé (2001) did for the national lifetables. They explain that before 1993, a child born alive who died before the birth was officially registered was counted as stillborn, which distorts both births and deaths before the first birthday. To reduce this bias, I have retrieved from official publications the false stillbirths by sex at the national level (Vallin and Meslé, 2001) and distributed them among départements pro rata of stillbirths. I then added them to births and to deaths before the first birthday.

Moreover, the data retrieved from censuses are not of uniform quality, so I make three adjustments. The first is to distribute individuals of unknown year of birth pro rata of the numbers in the age groups with known year of birth. Although this does not present any problem for most censuses, the 1901 census is an exception: these individuals were included in the open-age interval. The second is to split the open-age interval “80 years and over” in the 1906, 1921, 1926, 1931, 1936 and 1946 censuses. This open-age interval starts too early and generates some negative population figures. I split it into two age groups: ages 80 to 84, and 85 and over. For these first two adjustments, I use the particularly detailed 1911 census. The third concerns younger age groups, which were not always tabulated with the same variable: sometimes year of birth, sometimes age. I use linear interpolation to compute figures by year of birth. Section 6.3 of the Appendix presents these three adjustments in more detail.

3.2.2 Specific Methods due to the Two World Wars

The two World Wars had significant demographic effects at both the national and departmental levels. The first is due to internal migrations caused by the conflicts and by France’s division into occupied and unoccupied zones in 1940. The raw statistics give no direct indication on this question. The second concerns the heavy military losses, which had to be included in death statistics. On this particular point, this study is the first to integrate military and deportation deaths into lifetables at the subnational level.

Ideally, the statistics of military deaths would be available by age and year of the soldier’s death, as well as by his département of residence before the war. Since the sources used are incomplete, I combine two different matrices. The first provides total deaths by département and year of birth. It comes from the Defense Ministry’s database, which lists all the “Morts pour la France” (MPLF) of the two wars. The second provides total deaths at the national level by year of death and year of birth. It draws on the crowd-sourced indexing on the Mémoire des Hommes website: each contributor, drawing on personal research about a specific soldier, enters both his year of death and his year of birth. This work has been done for just over 20% of total deaths. To check whether this sample is representative of the distribution by year of death, I use the work of Pedroncini (1992), which gives total military deaths by year of death. Table 2 shows the distributions according to both sources. Even though discrepancies exist, I can use the sample coming from Mémoire des Hommes. Data by year of birth and year of death are therefore extracted from the Defense Ministry’s database.

By cross-referencing these two matrices, I get a matrix giving total deaths by département, year of birth and year of death. I assume that there is little variation between départements in the year of death according to the cohort.
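The cross-referencing of the two matrices can be sketched in Python as follows. Under the stated assumption that the distribution of years of death within a birth cohort is the same in every département, each departmental cohort total is split according to the national distribution for that cohort. The data structures, names and figures are illustrative.

```python
def cross_military_deaths(by_dept_cohort, national_by_cohort_year):
    """Split deaths by (département, birth year) across years of death, using
    the national year-of-death distribution of each birth cohort."""
    out = {}
    for (dept, cohort), deaths in by_dept_cohort.items():
        row = national_by_cohort_year[cohort]
        total = sum(row.values())
        for year, n in row.items():
            out[(dept, cohort, year)] = deaths * n / total
    return out

# Toy inputs: one birth cohort, two départements, three years of death.
by_dept = {("Ain", 1890): 100, ("Nord", 1890): 300}
national = {1890: {1914: 25, 1915: 50, 1916: 25}}
crossed = cross_military_deaths(by_dept, national)
```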

Table 2: DISTRIBUTION BY YEAR OF DEATH OF SOLDIERS

Source               Variable         1914      1915      1916      1917      1918      Total
Mémoire des hommes   Deaths           75,403    82,878    50,933    34,436    52,459    296,109
Mémoire des hommes   % of the total   25.46%    27.99%    17.20%    11.63%    17.72%    100%
Pedroncini (1992)    Deaths           301,000   349,000   252,000   164,000   235,000   1,301,000
Pedroncini (1992)    % of the total   23.14%    26.83%    19.37%    12.61%    18.06%    100%

This distribution of deaths is then adjusted to the total number of deaths estimated by researchers at the national level, so as to verify the overall consistency of the various sources. Prost (2008) makes an inventory of the statistical estimates of deaths during the First World War. He used the Marin report, followed by Hubert (1931) and Dupaquier (1988). Roure’s report, cited by Prost (2008), gave 1,357,800 military casualties, including deaths of foreigners. Hubert (1931) added 40,000 deaths: soldiers who died during the 6 months after the armistice, as well as sailors. Table 3 summarizes these numbers. Regarding the 28,600 deaths that occurred in the 6 months after the armistice, I assume that they were included in the 1919 deaths of the population movement statistics and do not take them into account. With regard to the 75,700 deaths of soldiers coming from settlements and abroad, since these populations were not registered in the French départements in 1911 and were surely recorded in the civilian deaths of their home country, I do not keep them in the total. Finally, I obtain 1,304,400 deaths.

Table 3: MILITARY DEATHS DURING THE FIRST WORLD WAR

Source   Variable                              Deaths
Roure    Total of French military deaths       1,282,100
Roure    Total foreign-born and settlements    75,700
Roure    Total Roure                           1,357,800
Hubert   Deaths 6 months after armistice       28,600
Hubert   Sailors                               11,400
         Final total                           1,397,800

The principle is the same for the Second World War. The two matrices combined come from the Defense Ministry’s database. The total number of deaths I use is 200,000, in line with Lagrou et al. (2002). Section 6.4 of the Appendix reviews the departmental classification problems in the Defense Ministry website, as well as the cubic splines used to distribute departmental deaths by single year of birth for the two World Wars.

Regarding deportation during the Second World War, deportees are classified in the database by place of birth, which differs from place of residence. I build cross-matrices between place of birth and place of residence for deportees born in France and for those born abroad. For that purpose I use two sources. The first is the 1936 census for the foreign-born, which provides their distribution among départements in France. The second is the 1946 census for the French-born, which provides their distribution by place of birth and place of residence at the departmental level. Finally, I adjust these figures to the total number of deportees estimated by researchers, namely 110,000, in line with Dupaquier (1988). Section 6.5 of the Appendix presents the computations of deportees by age, sex and département of residence.

3.2.3 Specific Methods Due to Territorial Changes

The main advantage of the French départements is their stability since the beginning of the 19th century. However, there were some changes during the last two centuries, especially with regard to the eastern borders and the Paris region. To take this into account, some adjustments are necessary. In this study, I use a departmental classification with 97 départements: the 95 départements of current metropolitan France (Corse counting as one), as well as Seine and Seine-et-Oise in their pre-1968 boundaries. Territorial discontinuities are of two kinds in this study: either departmental boundaries changed because of a territorial reorganization, or data are missing within the unified departmental classification that I use.

The departmental boundary changes are of two types for the period 1853–2014. The first concerns the pre-1901 period. Savoie and the Comté de Nice were attached to France following the plebiscite of April 22 and 23, 1860. Savoie and Haute-Savoie were created ex nihilo on June 14, 1860, while Alpes-Maritimes was created by adding a part of Var (the canton of Grasse) to the Comté. Moreover, following the war against Prussia in 1870, Meurthe and Moselle in their old form disappeared to form Moselle and Meurthe-et-Moselle.[13] In addition, the boundaries of Haut-Rhin,[14] Bas-Rhin and Vosges[15] changed. For this period, I distributed the births of the old-classification départements between the unified-classification départements. The second change concerns the 1901–2014 period. It follows the reorganization of Ile-de-France decided in 1964 and effective in 1968. This reorganization led to the dissolution of Seine and Seine-et-Oise. These départements were divided between Paris, Yvelines, Essonne, Hauts-de-Seine, Seine-Saint-Denis, Val-de-Marne and Val d’Oise.

The missing data in the unified departmental classification are also of two types. The first concerns the data missing because of the two World Wars: Aisne, Ardennes, Marne, Meurthe-et-Moselle, Meuse, Nord, Oise, Pas-de-Calais, Somme and Vosges for the 1914–1918 period, and Moselle, Bas-Rhin and Haut-Rhin for the 1939–1945 period. Corse is also concerned in 1943 and 1944. The second concerns départements temporarily under German control: this is the case of Bas-Rhin, Haut-Rhin and Moselle before 1919.

3.2.4 Specific Methods Due to Missing Data

Births of the missing départements during the period 1853–1900 are estimated first. Recall that these births allow the distribution of deaths according to Lexis triangles. I consider that changes in births were synchronized between each missing département and a neighboring département. For Var and Alpes-Maritimes, whose boundaries have been stable since 1861, I use the 1861 ratio of their births to births in Bouches-du-Rhône to deduce their births between 1853 and 1860. I proceed in the same way for Savoie and Haute-Savoie, for which I use Ain as reference. Regarding Vosges, Territoire de Belfort and Meurthe-et-Moselle, I use Haute-Saône as reference for the 1853–1869 period. As I know the values for Meurthe, Moselle, Haut-Rhin and Vosges (former départements), it is easy to deduce the values for Moselle and Haut-Rhin in their current boundaries. For the 1870–1900 period, births in Moselle, Bas-Rhin and Haut-Rhin are estimated using Haute-Saône as reference.
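The ratio-based back-casting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the function name `backcast_births`, its arguments and the figures in the example are hypothetical, and the sketch assumes that a single anchor-year ratio (for example 1861) is applied unchanged to every missing year.

```python
def backcast_births(target_anchor, reference_anchor, reference_births):
    """Estimate births of a département with missing years.

    target_anchor    -- births of the target département in the anchor year
    reference_anchor -- births of the reference département in the same year
    reference_births -- {year: births} of the reference département for the
                        years to be estimated
    The constant-ratio rule is an assumption of this sketch.
    """
    ratio = target_anchor / reference_anchor
    return {year: ratio * births for year, births in reference_births.items()}


# Made-up figures, for illustration only.
estimated = backcast_births(
    target_anchor=5000.0,
    reference_anchor=10000.0,
    reference_births={1853: 9000.0, 1860: 9800.0},
)
```

With these made-up inputs the target département is assigned half of the reference births in each missing year.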

Data from the population movement for missing départements during the two World Wars are also estimated. Even if the lifetables of these départements should be analyzed with caution, this allows an approximation of their mortality conditions during these periods. For that, I go further than the method used for births by endogenizing the choice of the reference département. For each pair of département and missing period, I choose a panel of geographically close départements whose data are available. Table 4 gives these candidates for each set of missing départements. I then calculate a score based on the synchronicity of demographic variations over the period surrounding the missing period. From this score, a reference département is defined for each département with missing data and used to estimate these values. This method is applied to both total births and deaths by age (the sum of civilian, military and deportation deaths). Section 6.6 of the Appendix goes into detail about the choice of reference département and the method used.

[13] Until 1870, two départements existed, namely Meurthe and Moselle. Together they covered the same area as Meurthe-et-Moselle and the new Moselle. The new Moselle includes the territories under German control in 1870, namely the districts of Château-Salins and Sarrebourg from the old Meurthe and Thionville, Metz, Forbach-Boulay Moselle and Sarreguemines from the old Moselle. In contrast, the new Meurthe-et-Moselle includes the territories that remained French at that time, i.e. the districts of Lunéville, Nancy and Toul from the old Meurthe and the canton of Briey from the old Moselle.

[14] In 1870, Haut-Rhin in its former boundaries was divided between Haut-Rhin as we know it today, which passed under German control until the end of the First World War, and Territoire de Belfort, which remained under French control.

[15] In 1870, the former cantons of Schirmeck and Saales (in Vosges) were attached to Bas-Rhin, which passed under German

Table 4: PANEL OF CANDIDATE REFERENCE DÉPARTEMENTS

Period    | Missing départements                                                                         | Panel of reference départements
1914–1919 | Aisne, Ardennes, Marne, Meurthe-et-Moselle, Meuse, Nord, Oise, Pas-de-Calais, Somme, Vosges | Aube, Eure, Haute-Marne, Haute-Saône, Seine-Inférieure, Seine-et-Marne, Seine-et-Oise
1939–1945 | Moselle, Bas-Rhin, Haut-Rhin                                                                 | Doubs, Meurthe-et-Moselle, Haute-Saône, Vosges
1943–1944 | Corse                                                                                        | Alpes-Maritimes, Bouches-du-Rhône, Gard, Hérault, Var
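To make the idea of a synchronicity score concrete, the sketch below ranks candidate reference départements by how closely their year-on-year growth rates follow those of the target over the surrounding years. The score actually used in the paper is defined in Section 6.6 of the Appendix and is not reproduced here; the Pearson correlation of growth rates is a stand-in chosen purely for illustration, and the function names and series are hypothetical.

```python
def growth_rates(series):
    """Year-on-year relative changes of an annual series."""
    return [b / a - 1.0 for a, b in zip(series, series[1:])]


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    sd_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    if sd_u == 0.0 or sd_v == 0.0:
        return 0.0
    return cov / (sd_u * sd_v)


def pick_reference(target, candidates):
    """Return the best-scoring candidate and all scores.

    Assumption of this sketch: the score is the correlation between the
    growth rates of the target series and of each candidate series.
    """
    target_growth = growth_rates(target)
    scores = {name: pearson(target_growth, growth_rates(series))
              for name, series in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up annual series observed around a missing period.
target_series = [100.0, 110.0, 99.0, 108.9]
candidate_series = {
    "candidate_1": [200.0, 220.0, 198.0, 217.8],   # moves with the target
    "candidate_2": [50.0, 45.0, 49.5, 44.55],      # moves against it
}
best_name, all_scores = pick_reference(target_series, candidate_series)
```

Any other measure of co-movement could be substituted for `pearson` without changing the selection logic.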

With the reorganization of Ile-de-France in 1968, I must differentiate the départements belonging to the old classification from those belonging to the new one. The former départements are followed over the 1901–1968 period, and the new ones between 1968 and 2014. As such, I make several adjustments. The first concerns the distribution of births before 1968 among the départements of the new classification, in order to distribute deaths in Lexis triangles. It is done in proportion to 1968 births. Then I estimate the 1968 populations by age for the départements of the old classification by using the “Intercensal Survival” method: I assume that the migratory profile of Ile-de-France was the same for Seine and Seine-et-Oise. Section 6.7 of the Appendix discusses these two adjustments.

Finally, computation periods vary by département. I distinguish four classes. Class 1 (C1) concerns all départements other than Moselle, Bas-Rhin, Haut-Rhin and those of Ile-de-France (except Seine-et-Marne). These 85 départements are tracked over the period 1901–2014. Computations of population on each 1st January are done as shown in Figure 2. Départements in class 2 (C2) are the former Ile-de-France départements, namely Seine (75) and Seine-et-Oise (78). Their lifetables are estimated over the period 1901–1968. Class 3 (C3) concerns the new Ile-de-France départements: Essonne (91), Hauts-de-Seine (92), Seine-Saint-Denis (93), Val-de-Marne (94), Val d’Oise (95), Paris (96) and Yvelines (97). Their lifetables are available for the period 1968–2014. Bas-Rhin, Haut-Rhin and Moselle are in class 4 (C4): their lifetables are estimated between 1921 and 2014. The figures in Section 6.8 of the Appendix illustrate the methods used to estimate the January 1st populations for each of these four classes. These are variants of Figure 2.

3.3 Reliability of the Data and Comparison with Other Studies

The raw data used in this study come from old statistical sources. I therefore verified that they could be used without introducing bias into future analyses.

First, I was interested in the consistency of departmental and national data. Vallin and Meslé (2001) calculated the national lifetables for the 19th and 20th centuries. Consequently, I verified that the departmental sums of deaths, births, false stillbirths and populations are equal to the national values. This was the case, which testifies to the quality of the raw data. My results are therefore consistent with the results established at the national level.

Second, I was interested in the coherence of my results with the work already done at the departmental level. To do so, I calculated the differences between the departmental life expectancies of my paper and those of Bonneuil (1997) and Daguet (2006). Results are presented in Table 5.

Bonneuil (1997) calculated the life expectancies of women in 1901–1905. I have calculated life expectancies for the same period as well. The comparison between these estimates shows that mine are on average higher: the median of the difference is 3.34%. In addition, 50% of départements have a difference between 0.49% and 6.05%, and 25% of them have a difference of more than 6.05%. The in-depth study of age-specific mortality rates reveals that these differences are largely explained by lower infant mortality rates (deaths under age 5) in my estimates. Nevertheless, since I cannot retrieve the death and population statistics of Bonneuil (1997), I do not know whether this difference comes from an underestimation of the number of deaths or an overestimation of the population at risk.

Daguet (2006) also reported the departmental life expectancies at birth at the date of each census between 1954 and 1999. I compute the differences for both men and women. Overall, the differences are much smaller. The median is around 0.2%, with no distinction between men and women and no temporal trend. In 1962, the differences for 50% of the départements fall between 0% and 0.7%; in 1999, the corresponding bounds for men are 0.22% and 0.73%. Although slight differences remain, one can conclude that my life expectancies are reliable, even if slightly overestimated.

Table 5: DIFFERENCES OF DEPARTMENTAL LIFE EXPECTANCIES AT BIRTH WITH OTHER STUDIES

          |              Men              |             Women
Period    | 1st Quart. | Med. | 3rd Quart. | 1st Quart. | Med. | 3rd Quart.
1901–1905 |            |      |            | 0.49       | 3.34 | 6.05
1954      | 0.18       | 0.65 | 1          | 0.54       | 0.84 | 1.34
1962      | 0          | 0.4  | 0.72       | -0.01      | 0.37 | 0.68
1968      | 0.17       | 0.38 | 0.73       | -0.02      | 0.33 | 0.78
1975      | -0.17      | 0.15 | 0.5        | -0.11      | 0.19 | 0.47
1982      | 0.01       | 0.27 | 0.59       | 0.04       | 0.21 | 0.5
1990      | 0.09       | 0.31 | 0.55       | 0.21       | 0.4  | 0.62
1999      | 0.22       | 0.49 | 0.73       | 0.47       | 0.66 | 0.99

Notes: Differences in % of my computations. Distribution of 90 or 95 departmental differences, according to the classification of the year.
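As an illustration of how a row of Table 5 can be produced, the sketch below computes differences in % of one set of life expectancies relative to another and summarizes them by quartiles. It is a sketch under assumptions: the function names and input values are hypothetical, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is a choice made here, since the paper does not state which convention it uses.

```python
from statistics import quantiles


def relative_differences(mine, other):
    """Differences in % of the first series: 100 * (mine - other) / mine."""
    return [100.0 * (m - o) / m for m, o in zip(mine, other)]


def quartile_summary(values):
    """First quartile, median and third quartile of a list of differences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Made-up life expectancies for five départements (illustration only).
my_e0 = [50.0, 50.0, 50.0, 50.0, 50.0]
other_e0 = [50.0, 49.5, 49.0, 48.5, 48.0]
summary = quartile_summary(relative_differences(my_e0, other_e0))
```

A positive difference means that the first series is higher than the second, which matches the sign convention stated in the notes to the table.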

4 Available Results and Discussion

4.1 Available Results

Results are available for the 97 metropolitan départements monitored over the period 1901–2014, namely the départements of the current classification (Corse counting as one) as well as the old Seine and Seine-et-Oise. Due to their additivity, results are also available at the regional level in the classification prior to January 2016 (22 regions). The variables available are the life expectancies at each age ($e_x$) as well as a set of lifetable variables from age 0 to age 105 and over (number of survivors, mortality rates, proportions of deaths). Yearly births and populations by age are also available.

Figure 4 shows the departmental life expectancies at birth relative to the metropolitan average, for women. I chose to present the results for women, but they are available for men too. The first map shows the results for 1901. One can see that the highest life expectancies were located on an axis connecting the South-West to the North-East, from Landes to Ardennes. Maximums were reached in Ardennes but also in Pays de la Loire (Loir-et-Cher, Indre, Indre-et-Loire, Deux-Sèvres, etc.) and Bourgogne (Côte d’Or, Yonne, Nièvre, etc.), with values 10 to 20% higher than the metropolitan average. In contrast, life expectancies at birth in the South-East, Seine and Bretagne were significantly lower than the metropolitan average (by 5 to 20% depending on the département). The second map presents these life expectancies at birth in the aftermath of the Second World War. At that time, maximums were reached in Loir-et-Cher, Creuse and Alpes-Maritimes, with life expectancies 5 to 10% higher than the metropolitan average: the Central-West was still a leading region, while Bretagne and Normandie were still lagging behind.

Figure 4: LIFE EXPECTANCY AT BIRTH FOR WOMEN (IN % OF THE METROPOLITAN MEAN): 1901 AND 1946

Notes: Sample includes 90 départements. Moselle, Bas-Rhin and Haut-Rhin values are not available in 1901 (départements under German administration).

Rather than analyzing synthetic indicators such as life expectancy, one can look at age-specific indicators. Since they strongly affected life expectancies at birth, Figure 5 presents infant mortality rates for women. Once again I chose to present the results for women, but they are available for men too. I represent the rates per thousand, and no longer relative to the metropolitan average. The landscape in 1901 was relatively similar to the map of life expectancy, since infant mortality rates were very high in 1901. One can see that in extreme cases (Seine-Inférieure, Ardèche), out of a thousand children under one year, between 180 and 210 died before their first birthday. Rates were generally high in the North and the South-East (between 120 and 150), while they were lower in a broad central band connecting Saône-et-Loire to Charente-Maritime and the Atlantic coast. Minimums (between 60 and 90) were reached in Creuse and Allier. The second map shows the same values in 1946. Infant mortality rates decreased between the two years: they were generally around 60 per thousand in 1946. An under-mortality zone was visible, from Eure-et-Loir to Isère via Nièvre. The Mediterranean coast presented diverse situations: early mortality was low in the East (Var, Alpes-Maritimes) and high in the West (Hérault, Gard, Pyrénées-Orientales).

Figure 5: INFANT MORTALITY RATES FOR WOMEN (PER THOUSAND): 1901 AND 1946

Notes: Sample includes 90 départements. Moselle, Bas-Rhin and Haut-Rhin values are not available in 1901 (départements under German administration).

Finally, one can analyze the changes in a single département over the 1901–2014 period. Figure 6 shows female survivors at each age for different dates in Morbihan. I have chosen this département since it was a place of high mortality in 1901. Indeed, there was high infant mortality at that time: there were only 850 survivors in the fictitious cohort. This infant mortality had almost completely disappeared by 1975. The survival curve shifted to the upper-right corner as mortality rates were declining overall. This displacement was important until 1975, mainly because of the drop in infant mortality. Subsequently, the curve moved mainly because of the decrease in mortality between 60 and 80 years, then beyond 80 years for the 1999–2014 period. This is in line with the literature on the rectangularization of the survival curve (see Wilmoth and Horiuchi (1999), Fries (2002), Cheung et al. (2005) for example): in 2014 this curve was very flat until age 60 (there are almost no deaths below this age). Beyond this age the curve decreases dramatically, especially beyond age 80.

Figure 6: EVOLUTION OF SURVIVORS AT EACH AGE IN MORBIHAN

4.2 Discussion

4.2.1 Censuses Reliability

With population censuses one knows the spatial distribution of the population by age and sex between the French départements throughout the 20th century. During this period, censuses served as a basis for some public choices. The first concerns local budgets: allocations coming from the central administration were based on the population of each territory. These censuses therefore affected the spatial distribution of public finance. The second concerns electoral divisions: in order to obtain a fair representation in local or national assemblies, electoral divisions are drawn so that each of them represents roughly the same share of the population. Censuses therefore had a very strong political impact. As a result, some regions sought to inflate their census populations in order to gain greater financial or electoral weight. Historians and statisticians have shown that Marseille’s population was overestimated in the 1930s.[16] This was also true in Corse in 1962: results of the exhaustive count were not published because of inconsistencies. These censuses are, however, the basis of the computations of population by age. Even though such ambiguous cases remained marginal over the period, they nevertheless existed.

4.2.2 Interdepartmental Migrations

The methods used in this study partly take into account the issue of migrations. At each census date, the difference between the estimated and the recorded population can be seen as an approximation of net migration flows at each age. These flows are then distributed in proportion to the time elapsed between the first census and January 1st of each year of the intercensal period. This approximation does not affect my results when the flows are weak or when they follow the approximation used. This is not the case in war periods. The May-June 1940 Exodus is an emblematic example. To escape the advance of German troops on French territory, the populations of the North-East migrated en masse towards the South and the West. I cannot take this exodus into account with the methodology used: the population of Ardennes on January 1st, 1941 is for example largely overestimated. This issue is raised on several occasions in the Statistique Annuelle du Mouvement de la Population between 1939 and 1942;[17] this publication suggested estimating the present population from the ration tickets distributed to the population. However, Alary et al. (2006) showed that these tickets were circumvented during the war, which calls into question their reliability for counting the present population. Bonnet (2018) tries to estimate these departmental populations, but only for females and for the total population.

Another issue relating to interdepartmental migrations concerns the placement with nurses of children born in urban départements. Newborns were sent to rural départements close to major urban centers. Thus, Seine has a lower infant mortality rate than it should have, because some of its infants were sent to the suburbs. To overcome this issue, official publications suggest[18] dividing the deaths of children under age 1 born in a département and living anywhere on the national territory by the total number of births in this département. I cannot do this because I did not find these raw data in official publications; this suggests that my infant mortality rates are slightly underestimated in urban départements.

4.2.3 Domiciliation of Deaths during the Two World Wars

The sources I use to estimate life expectancies during the two World Wars are incomplete: military and deportee deaths were recorded by birth département and not by home département. I build matrices linking birth département and home département before deportation; nevertheless, they rely on strong assumptions about how well pre- and post-war situations represent the phenomena that took place during the war. The few statistics kept for this period limit the possibilities of going further. Regarding military deaths, I assume that the home département was the same as the birth département for the “Morts pour la France”. Even if this hypothesis seems weaker than those made for deportees, it is not entirely satisfactory. Again, I lack reliable and available data to overcome this issue.

4.2.4 Small Département Figures

Estimating fertility or mortality rates is difficult when figures are small (namely close to 0). Some papers tackle this issue by using Bayesian estimation processes (Asunção et al. (2005) and Schmertmann et al. (2014) for fertility rates, Alexander et al. (2017) for mortality rates). The question arose of using these methods to supplement the HMD Protocol. However, the figures for French départements are not as small as those of the geographical units used in these studies: for example, the smallest population was reached in Territoire de Belfort in 1901 with 50,000 women, compared to 2,000 for some counties. Nevertheless, these estimation models may be applied in the future, particularly to compute confidence intervals around departmental life expectancies.

[17] See Statistique Annuelle du Mouvement de la Population, 1939–1942, pages 3–4, 47 and 55.

[18] See Statistique Annuelle du Mouvement de la Population, 1939–1942, page 55.

5 Conclusion

In this paper, I have presented the sources and methods used to estimate lifetables by sex for all French metropolitan départements from 1901 to 2014. To do so, I have collected vital records and census statistics at the departmental level since the beginning of the 20th century. Since the two World Wars afflicted France in 1914–1918 and 1939–1945, military deaths and deaths in deportation were of great importance in the lifetable estimates; these statistics have been collected at the departmental level from original sources, namely the “Mémoire des Hommes” and “MemorialGenWeb” databases.

To estimate departmental lifetables, I have referred to the methods used for a large number of countries by the researchers of the Human Mortality Database. These methods transform the collected raw data into homogeneous data. They include the use of cubic splines to estimate deaths by single age from age groups, the Kannisto model to extrapolate deaths at older ages, and a set of methods to estimate populations on 1st January of each year. The HMD protocol has been amended to take into account the specificities of French data. This concerns false stillbirths, which are reintroduced in the statistics of births and infant deaths, and territorial breaks such as those which affected the Paris region in 1968.

This work provides a new database on departmental mortality for the entire 20th century. Coupled with Bonneuil (1997)’s estimations for the 19th century, it provides an overview of the local trends in mortality since the French Revolution. As they have been calculated for each sex, these data offer new insights into the reasons explaining the differences in life expectancy between men and women. Moreover, beyond mortality statistics, this new database can be used to analyze all demographic fields at the local level: birth rates since it includes annual births, the spatial distribution of population since it provides yearly populations by age, and finally internal migrations. These fields of research are on my future agenda.

References

Alary, E., B. Vergez-Chaignon, and G. Gauvin (2006). Les Français au quotidien, 1939–1949. Perrin.

Barbieri, M. (2013). La mortalité départementale en France. Population 68(3), 433–479.

Bonnet, F. (2018). Beyond the Exodus of May-June 1940: Internal Migrations in France during the Second World War. mimeo.

Cheung, S. L. K., J.-M. Robine, E. J.-C. Tu, and G. Caselli (2005). Three Dimensions of the Survival Curve: Horizontalization, Verticalization, and Longevity Extension. Demography 42(2), 243–258.

Coale, A. J., P. Demeny, and B. Vaughan (2013). Regional Model Life Tables and Stable Populations: Studies in Population. Elsevier.

Daguet, F. (2006). Données de démographie régionale de 1954 à 1999. INSEE.

Dupâquier, J. and J.-P. Bardet (1988). Histoire de la population française, Volume 4. Presses universitaires de France.

Fries, J. F. (2002). Aging, Natural Death, and the Compression of Morbidity. Bulletin of the World Health Organization 80(3), 245–250.

Huber, M. (1931). La population de la France pendant la guerre. Presses Universitaires de France.

Pedroncini, G., A. Corvisier, and A. Blanchard (1992). Histoire militaire de la France. 3. De 1871 à 1940. Presses Universitaires de France.

Preston, S., P. Heuveline, and M. Guillot (2000). Demography: Measuring and Modeling Population Processes. Wiley-Blackwell.

Prost, A. (2008). Compter les vivants et les morts: l’évaluation des pertes françaises de 1914–1918. Le Mouvement Social (1), 41–60.

Vallin, J. and F. Meslé (2001). Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle. Éditions de l’Institut National d’Etudes Démographiques.

Vallin, J. and F. Meslé (2005). Convergences and Divergences: an Analytical Framework of National and Sub-National Trends in Life Expectancy. Genus 61(1), 83–124.

Van de Walle, É. (1974). The Female Population of France in the 19th Century: A Reconstruction of 82 Départements. University Press.

Wilmoth, J. R., K. Andreev, D. Jdanov, D. A. Glei, C. Boe, M. Bubenheim, D. Philipov, V. Shkolnikov, and P. Vachon (2007). Methods Protocol for the Human Mortality Database. University of California, Berkeley, and Max Planck Institute for Demographic Research, Rostock. URL: http://mortality.org [version 31/05/2007] 9, 10–11.

Wilmoth, J. R. and S. Horiuchi (1999). Rectangularization Revisited: Variability of Age at Death Within Human Populations. Demography 36(4), 475–495.

6 Appendices

6.1 Computations of Population on 1st January

6.1.1 Intercensal Survival

The first method used to compute populations on 1st January of each year is “Intercensal Survival”. With this method I can estimate the population by age for each intercensal period. Populations at the second census (e.g. 1906 for 1901–1906) are not estimated in the same way for all cohorts. Figure 7 presents the three types of cohorts which exist in this method: “Pre-existing cohorts” (born before the census year), the “Infant cohort” (born during the census year) and “Birth cohorts” (born after the census year). The gap between 1st January of the census year and the census date, expressed as a fraction of the year, is crucial. It is called $f_1$ for the first census and $f_2$ for the second.

Figure 7: CLASSIFICATION OF DIFFERENT COHORTS FOR INTERCENSAL SURVIVAL METHOD

I begin with “Pre-existing cohorts”. I estimate the population by age at the date of the second census. Let $t$ and $t + N$ be the first and last 1st January in the intercensal period, where $N$ is the number of full calendar years between the censuses. The dates of the two censuses are

$$t_1 = t - 1 + f_1, \qquad t_2 = t + N + f_2,$$

so that

$$t_2 - t_1 = N + 1 - f_1 + f_2.$$
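The quantities $f_1$, $f_2$ and $t_2 - t_1$ can be obtained directly from the census dates. The sketch below is an illustration under assumptions: the function names are hypothetical, $f$ is taken to be the share of the calendar year elapsed before the census, and while the first date (March 6th, 1901) is the one mentioned in this paper, the second date is only an illustrative value.

```python
from datetime import date


def year_fraction(census_date):
    """Share of the calendar year elapsed between 1st January and the census."""
    start = date(census_date.year, 1, 1)
    end = date(census_date.year + 1, 1, 1)
    return (census_date - start).days / (end - start).days


def intercensal_length(first_census, second_census):
    """Length t2 - t1 = N + 1 - f1 + f2, in years."""
    f1 = year_fraction(first_census)
    f2 = year_fraction(second_census)
    # N: full calendar years strictly between the two census years.
    n_full_years = second_census.year - first_census.year - 1
    return n_full_years + 1 - f1 + f2


f1_1901 = year_fraction(date(1901, 3, 6))
length = intercensal_length(date(1901, 3, 6), date(1906, 3, 4))  # second date illustrative
```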

The cohort tracked (Figure 7, in blue) was 1 or 2 years old at the time of the 1906 census and was born in 1904. Data are by year of birth and not by age, which simplifies the computations. I assume a uniform distribution of deaths in each Lexis triangle, so that for the cohort aged $x$ on 1st January of the year of the first census,

$$D_a = (1 - f_1^2) \times D_L(x, t-1),$$

$$D_b = (1 - f_1)^2 \times D_U(x-1, t-1),$$

$$D_c = f_2^2 \times D_L(x+N+1, t+N),$$

$$D_d = (2 f_2 - f_2^2) \times D_U(x+N, t+N).$$

This cohort’s estimated population at the second census, denoted $\hat{C}_2$, is calculated as follows:

$$\hat{C}_2 = C_1 - (D_a + D_b) - \sum_{i=0}^{N-1} \left[ D_U(x+i, t+i) + D_L(x+i+1, t+i) \right] - (D_c + D_d), \qquad (20)$$

The difference $\Delta_x = C_2 - \hat{C}_2$ between the population recorded at the second census and the estimated population comprises estimation errors and intercensal migrations within the cohort. To compute the population by age on 1st January of each intercensal year, this difference must be split between the intercensal years. I assume that these approximate migration flows are uniformly distributed over time. Population by age is calculated as follows:

$$P(x+n, t+n) = C_1 - (D_a + D_b) - \sum_{i=0}^{n-1} \left[ D_U(x+i, t+i) + D_L(x+i+1, t+i) \right] + \frac{1 - f_1 + n}{N + 1 - f_1 + f_2} \, \Delta_x. \qquad (21)$$
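The computation for a pre-existing cohort (equations 20 and 21) can be sketched as follows. This is a minimal illustration rather than the code used for the paper: the function and argument names are hypothetical, the boundary triangles are passed in as plain numbers, and the list `du[i]` is assumed to hold $D_U(x+i, t+i)$ and `dl[i]` to hold $D_L(x+i+1, t+i)$ for $i = 0, \ldots, N-1$.

```python
def preexisting_cohort_populations(c1, c2, f1, f2,
                                   dl_a, du_b, dl_c, du_d, du, dl):
    """Return (estimated census-2 population, [P(x+n, t+n) for n = 0..N]).

    c1, c2     -- cohort size recorded at the first and second census
    f1, f2     -- share of the year elapsed before each census
    dl_a, du_b -- Lexis triangles of the first census year
    dl_c, du_d -- Lexis triangles of the second census year
    du, dl     -- yearly upper/lower triangles of the N full years
    """
    n_years = len(du)
    # Deaths between the first census and the following 1st January.
    d_a = (1 - f1 ** 2) * dl_a
    d_b = (1 - f1) ** 2 * du_b
    # Deaths between the last 1st January and the second census.
    d_c = f2 ** 2 * dl_c
    d_d = (2 * f2 - f2 ** 2) * du_d
    yearly = [u + l for u, l in zip(du, dl)]

    c2_hat = c1 - (d_a + d_b) - sum(yearly) - (d_c + d_d)
    delta = c2 - c2_hat                 # migration and error residual
    span = n_years + 1 - f1 + f2        # t2 - t1

    populations = []
    for n in range(n_years + 1):
        survivors = c1 - (d_a + d_b) - sum(yearly[:n])
        populations.append(survivors + (1 - f1 + n) / span * delta)
    return c2_hat, populations


# Made-up inputs with N = 2 full calendar years.
c2_hat, pops = preexisting_cohort_populations(
    c1=1000.0, c2=950.0, f1=0.5, f2=0.5,
    dl_a=8.0, du_b=4.0, dl_c=8.0, du_d=8.0,
    du=[10.0, 10.0], dl=[10.0, 10.0],
)
```

In this example the residual is 5, and it is allocated to the three 1st January populations in proportion to the time elapsed since the first census.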

There is only one “Infant cohort” to track for each intercensal period (in Figure 7, the cohort born in 1906). For this cohort, $C_1 = C_{11} + C_{12}$, with $C_{11} = (1 - f_1) \times B_{t-1}$ and $C_{12}$ the population recorded as born during the year of the census. Thus,

$$\hat{C}_2 = C_1 - D_a - \sum_{i=0}^{N-1} \left[ D_U(i, t+i) + D_L(i+1, t+i) \right] - (D_c + D_d), \qquad (22)$$

and

$$P(n, t+n) = C_1 - D_a - \sum_{i=0}^{n-1} \left[ D_U(i, t+i) + D_L(i+1, t+i) \right] + \frac{\frac{1}{2}(1 - f_1^2) + n}{N + \frac{1}{2}(1 - f_1^2) + f_2} \, \Delta_0. \qquad (23)$$

Finally, since $N$ is the number of full calendar years during the intercensal interval, I track $N$ birth cohorts. A cohort born in year $t + j$ is aged $K = N - j - 1$ on 1st January of year $t + N$. The estimated population of this cohort may be expressed as:

$$\hat{C}_2 = B_{t+j} - D_L(0, t+j) - \sum_{i=1}^{K} \left[ D_U(i-1, t+j+i) + D_L(i, t+j+i) \right] - (D_c + D_d). \qquad (24)$$

Note that the number of intermediate populations produced by each cohort depends on $K$. For $k = 0, \ldots, K$, the intermediate populations of each cohort are computed as follows:

$$P(k, t+j+k+1) = B_{t+j} - D_L(0, t+j) - \sum_{i=1}^{k} \left[ D_U(i-1, t+j+i) + D_L(i, t+j+i) \right] + \frac{2k + 1}{2K + 1 + 2 f_2} \, \Delta_{t+j}. \qquad (25)$$

6.1.2 Precensal Survival Method

The second method I use is “Precensal Survival”, which computes populations for the first 1st January of the whole period. Figure 8 presents the computations for the population of age 1 in 1901. To do so, I must add $D'_a$ and $D'_b$ to the population born in 1901 and recorded on March 6th, 1901. If $t_1$ is the first 1st January of the intercensal period, then:

$$P(x - 1, t_1 - 1) = C_1 + D'_a + D'_b. \qquad (26)$$

Figure 8: PRECENSAL SURVIVAL METHOD

6.1.3 Extinct Cohorts Method

The third method I use is “Extinct Cohorts”, which calculates the population by age for the cohorts extinct in 2013. Since the maximum age in my database is 105, a cohort is considered to be extinct if it reached 105 or over in 2013. Figure 9 shows that my data comprise two kinds of extinct cohorts. The first are “Full cohorts” (Figure 9, in red), which can be tracked from ages 80 to 105 in 1901–2013. Thus, the 80-year-old population in 1903 equals the sum of the cohort’s Lexis triangles between ages 80 and 105. The others are “Truncated cohorts” (Figure 9, in blue), those over age 80 in 1901. Thus, the 95-year-old population in 1901 equals the sum of the cohort’s Lexis triangles between 95 and 105. More generally, the population of age $x$ in year $t$ can be calculated as follows:

$$P(x, t) = \sum_{i \geq 0} \left[ D_U(x+i, t+i) + D_L(x+i+1, t+i) \right].$$
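The extinct-cohort sum can be sketched as a walk along the cohort's diagonal of the Lexis diagram. The sketch assumes the indexing convention of equation (20), in which the deaths in year $t$ of a cohort aged $x$ on 1st January of that year are $D_U(x, t)$ and $D_L(x+1, t)$. The function name, the dictionary layout keyed by `(age, year)` and the numbers are hypothetical.

```python
def extinct_cohort_population(du, dl, x, t, max_age=105):
    """Population aged x on 1st January of year t for an extinct cohort.

    du, dl  -- dicts {(age, year): deaths} for upper and lower Lexis triangles
    max_age -- highest age recorded in the data (105 in this paper)
    Missing triangles are treated as zero deaths (assumption of this sketch).
    """
    total = 0.0
    for i in range(max_age - x + 1):
        total += du.get((x + i, t + i), 0.0)       # deaths before the birthday
        total += dl.get((x + i + 1, t + i), 0.0)   # deaths after the birthday
    return total


# Made-up triangles for a cohort aged 103 on 1st January 2000.
upper = {(103, 2000): 3.0, (104, 2001): 2.0, (105, 2002): 1.0}
lower = {(104, 2000): 2.0, (105, 2001): 1.0}
p_103 = extinct_cohort_population(upper, lower, x=103, t=2000)
p_104 = extinct_cohort_population(upper, lower, x=104, t=2001)
```

The population at each age is simply the number of deaths still to occur in the cohort, which is why the method only applies once the cohort has died out.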

Figure 9: EXTINCT COHORTS METHOD

6.1.4 Survivor Ratio Method

The last method I use is “Survivor Ratio”, which calculates the populations of non-extinct cohorts aged 85 and over in 2013. Figure 10 presents the computations for the cohort aged 104 in 2013. The survivor ratio $R$ is defined as the number of individuals alive at age $x$ on 1st January of year $t$, divided by the number of individuals in the same cohort alive $k$ years previously. Formally:

$$R = \frac{P(x, t)}{P(x - k, t - k)}.$$

I assume that there is no migration at these ages. $R$ may then also be expressed as

$$R = \frac{P(x, t)}{P(x, t) + \dot{D}},$$

where $\dot{D} = \sum_{i=1}^{k} \left[ D_U(x-i, t-i) + D_L(x-i+1, t-i) \right]$. Finally, $P(x, t)$ may be expressed as a function of $R$:

$$P(x, t) = \frac{R}{1 - R} \, \dot{D}.$$

Figure 10: SURVIVOR RATIO METHOD

Since the survivor ratio cannot be directly observed for a non-extinct cohort, I use preceding cohorts whose populations by age have been calculated by the “Extinct Cohorts” method. I assume that the survivor ratio has roughly the same value in the studied cohort and in the preceding ones. The mean ratio $R^*$ of the preceding $m$ cohorts is calculated as follows:

$$R^*(x, 2013, k, m) = \frac{\sum_{i=1}^{m} P(x, 2013 - i)}{\sum_{i=1}^{m} P(x - k, 2013 - k - i)}.$$

I may then estimate $\tilde{P}(x, 2013)$:

$$\tilde{P}(x, 2013) = \frac{R^*}{1 - R^*} \, \dot{D}.$$

Subsequently, I track the cohort back in time and estimate $\tilde{P}(x-1, 2012)$, $\tilde{P}(x-2, 2011)$, ... by adding the cohort’s deaths step by step. I apply this method to every non-extinct cohort in 2013. For my estimations I follow the guidelines of the HMD Protocol, with $k = m = 5$.
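The survivor-ratio estimate can be sketched as below. This is an illustration under assumptions, not the paper's implementation: the function name and the `(age, year)` dictionary layout are hypothetical, the populations of the preceding cohorts are assumed to be already available (for example from the extinct-cohort computation), and $\dot{D}$ is passed in as a number.

```python
def survivor_ratio_population(pop, d_dot, x, year=2013, k=5, m=5):
    """Estimate the population aged x on 1st January of `year`.

    pop   -- dict {(age, year): population} for the m preceding cohorts
    d_dot -- deaths of the studied cohort over the k preceding years
    k, m  -- 5 and 5, following the HMD Protocol as stated in the text
    """
    numerator = sum(pop[(x, year - i)] for i in range(1, m + 1))
    denominator = sum(pop[(x - k, year - k - i)] for i in range(1, m + 1))
    r_star = numerator / denominator
    if r_star >= 1.0:
        raise ValueError("survivor ratio must be below 1")
    return r_star / (1.0 - r_star) * d_dot


# Made-up populations: 20 survivors at age 100 out of 100 at age 95
# in each of the five preceding cohorts.
populations = {}
for i in range(1, 6):
    populations[(100, 2013 - i)] = 20.0
    populations[(95, 2008 - i)] = 100.0
p_tilde = survivor_ratio_population(populations, d_dot=80.0, x=100)
```

With a mean ratio of 0.2 and 80 deaths over the last five years, the sketch returns about 20 survivors, so that 20 / (20 + 80) reproduces the ratio.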

The assumption of a constant survivor ratio over time is strong, and I therefore constrain the estimates with the recorded population on 1st January 2013. I compare the 85-and-over population on 1st January 2013 retrieved from the census of that year (called $P^{Rec}_{85+}$) with the 85-and-over population on the same date as calculated by the Survivor Ratio method (called $P^{SR}_{85+}$). Populations at each age in 2013 are then computed as follows:

$$\hat{P}(x, 2013) = c \, \tilde{P}(x, 2013) = c \, \frac{R^*}{1 - R^*} \, \dot{D}, \qquad \text{where } c = \frac{P^{Rec}_{85+}}{P^{SR}_{85+}}.$$

As before, each cohort is back-followed: I make estimates for $\hat{P}(x-1, 2012)$, $\hat{P}(x-2, 2011)$, ...

6.2 Set of Different Lifetables

Concerning the computation of (1 × 1) and (1 × 5) lifetables, I start from the values of $q_x$. With these values I compute $p_x = 1 - q_x$, the probability of staying alive between $x$ and $x + 1$. Then I compute the number of survivors at each age per 100,000 births,

$$l_x = l_0 \prod_{i=0}^{x-1} p_i \quad \text{with } l_0 = 100{,}000,$$

the deaths at each age ($d_x$),

$$d_x = \begin{cases} l_x q_x & 0 \leq x \leq 104 \\ l_x & x = 105 \end{cases},$$

the number of years lived between $x$ and $x + 1$,

$$\begin{cases} L_x = l_x - (1 - a_x) d_x & 0 \leq x \leq 104 \\ L^{\infty}_{105} = l_x a_x & x = 105 \end{cases},$$

and the number of life years remaining to live,

$$T_x = \begin{cases} \sum_{i=x}^{104} L_i + L^{\infty}_{105} & 0 \leq x \leq 104 \\ L^{\infty}_{105} & x = 105 \end{cases}.$$

Finally, life expectancy at age $x$ is computed as follows:

$$e_x = \frac{T_x}{l_x}.$$
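The chain of formulas above translates almost line by line into code. The following is a minimal sketch, not the program used for the paper: the function name is hypothetical, the last element of each input list is taken to be the open age interval (105 and over in the paper), and `ax` for that interval is interpreted as the mean number of years lived after reaching it. The three-age example is made up to keep the arithmetic easy to follow.

```python
def complete_lifetable(qx, ax, radix=100_000.0):
    """Build l_x, d_x, L_x, T_x and e_x from q_x and a_x.

    qx, ax -- lists indexed by age; the last entry is the open age interval
    radix  -- l_0, the size of the fictitious cohort
    """
    n = len(qx)
    # Survivors: l_x = l_0 * prod(p_i), with p_i = 1 - q_i.
    lx = [radix]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1.0 - q))
    # Deaths: l_x * q_x below the open interval, all survivors within it.
    dx = [l * q for l, q in zip(lx[:-1], qx[:-1])] + [lx[-1]]
    # Years lived: l_x - (1 - a_x) d_x, and l_x * a_x in the open interval.
    Lx = [l - (1.0 - a) * d for l, a, d in zip(lx[:-1], ax[:-1], dx[:-1])]
    Lx.append(lx[-1] * ax[-1])
    # Years remaining: reverse cumulative sum of L_x.
    Tx = [0.0] * n
    running = 0.0
    for age in range(n - 1, -1, -1):
        running += Lx[age]
        Tx[age] = running
    # Life expectancy (0.0 if nobody survives to that age).
    ex = [T / l if l > 0 else 0.0 for T, l in zip(Tx, lx)]
    return {"lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}


# Toy table with three ages, the last one being the open interval.
table = complete_lifetable(qx=[0.1, 0.5, 1.0], ax=[0.5, 0.5, 2.0])
```

In this toy table 90,000 of the 100,000 newborns reach age 1 and 45,000 reach the open interval, giving a life expectancy at birth of 2.525 years.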

Methods are much the same for (1 × 5) lifetables, which I compute for quinquennial periods: 1901–1905, 1906–1910, 1911–1915, etc. Values in abridged (5 × 1) and (5 × 5) lifetables are computed from the previous variables. ${}_5e_x$, ${}_5l_x$ and ${}_5T_x$ are directly retrieved from the complete lifetables. Finally, ${}_5d_x = l_x - l_{x+5}$, ${}_5q_x = {}_5d_x / l_x$ and ${}_5L_x = T_x - T_{x+5}$. One can also find ${}_5a_x$ and ${}_5m_x$ from the basic formulas linking all these variables.
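The chain from $q_x$ to $e_x$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the last element of the input lists plays the role of the open age interval (age 105 in the paper), and the toy table in the usage example has three ages instead of 106.

```python
def complete_lifetable(qx, ax, radix=100_000.0):
    """Build l_x, d_x, L_x, T_x and e_x from q_x and a_x.

    The last index is treated as the open age interval, where d_x = l_x
    and L_x = l_x * a_x.
    """
    last = len(qx) - 1
    lx = [radix]
    for x in range(last):
        lx.append(lx[-1] * (1.0 - qx[x]))        # l_{x+1} = l_x * p_x
    dx = [lx[x] * qx[x] for x in range(last)] + [lx[last]]
    Lx = [lx[x] - (1.0 - ax[x]) * dx[x] for x in range(last)]
    Lx.append(lx[last] * ax[last])
    Tx = [0.0] * (last + 1)
    running = 0.0
    for x in range(last, -1, -1):                # cumulate L_x from the oldest age down
        running += Lx[x]
        Tx[x] = running
    ex = [Tx[x] / lx[x] for x in range(last + 1)]
    return {"lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}


def abridge(lx, Tx, n=5):
    """Abridged values n_d_x, n_q_x and n_L_x derived from a complete lifetable."""
    rows = []
    for x in range(0, len(lx), n):
        nxt = x + n
        l_next = lx[nxt] if nxt < len(lx) else 0.0
        T_next = Tx[nxt] if nxt < len(Tx) else 0.0
        ndx = lx[x] - l_next
        rows.append({"x": x, "ndx": ndx, "nqx": ndx / lx[x], "nLx": Tx[x] - T_next})
    return rows


# Toy example with three ages; the last one is the open interval
table = complete_lifetable(qx=[0.5, 0.5, 1.0], ax=[0.5, 0.5, 2.0])
grouped = abridge(table["lx"], table["Tx"], n=2)
```

In the toy example, half of each cohort dies at ages 0 and 1, and survivors to the open interval live two more years on average, which gives a life expectancy at birth of 1.625 years.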

6.3 Census Adjustments

For my purposes it is simpler to compute population figures by birth year. Census data are given by single age from 1968 onward. I gather populations by five-year age groups between ages 15 and 89, before taking the open-age interval 90 and over. The cubic splines adjustment takes into account that populations were given by age and not by birth year. Thus, taking the 1968 census as an example, I isolate the populations born between 01/01/1968 and the date of the census.¹⁹ Before 1968, data are given by birth year. Nevertheless, some specific adjustments are needed.

¹⁹ Note that the estimates of the population born in the census year are important because they are used to calculate populations


6.3.1 Distribution of Deaths of Unknown Age in 1901 Census

For the 1901 census, individuals whose birth year is unknown are put together in the open-age interval. To allocate them I use the 1911 census, which has a useful degree of detail. The process follows three steps. The first is based on the calculation of the quotient of individuals aged 95 and over by individuals aged 80 and over for each département i and each sex j in 1911:

$$R^{1911}_{95ij} = \frac{\sum_{s=95}^{105} P^{1911}_{sij}}{\sum_{s=80}^{105} P^{1911}_{sij}}. \tag{28}$$

These quotients are then applied to the 1901 census to compute the number of individuals aged 95 and over from the number of individuals aged 80 and over:

$$\sum_{s=95}^{105} P^{1901}_{sij} = R^{1911}_{95ij} \times \sum_{s=80}^{105} P^{1901}_{sij}. \tag{29}$$

By subtraction, I finally deduce the number of individuals of unknown year of birth for each département and sex.
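The three steps can be sketched as follows for one département and one sex. This is a hedged Python illustration: the function name and the figures are hypothetical, and the way the 1901 total for ages 80 and over is built is not specified here, so it is simply passed in as an input.

```python
def allocate_unknown_1901(p95_plus_1911, p80_plus_1911,
                          p80_plus_1901, open_interval_1901):
    """Separate true 95+ individuals from those of unknown birth year in 1901.

    Step 1: quotient of the 95+ by the 80+ population in the 1911 census.
    Step 2: apply this quotient to the 1901 population aged 80 and over.
    Step 3: subtract the result from the recorded 1901 open-age interval.
    """
    ratio = p95_plus_1911 / p80_plus_1911
    estimated_95_plus = ratio * p80_plus_1901
    unknown_birth_year = open_interval_1901 - estimated_95_plus
    return ratio, estimated_95_plus, unknown_birth_year


# Hypothetical figures for one département and one sex
ratio, p95_1901, unknown = allocate_unknown_1901(
    p95_plus_1911=20.0, p80_plus_1911=2000.0,
    p80_plus_1901=1500.0, open_interval_1901=40.0,
)
```

In this hypothetical case, 15 of the 40 individuals in the 1901 open-age interval are estimated to be truly aged 95 and over, and the remaining 25 are treated as being of unknown birth year.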

6.3.2 Addition of Age Group for Pre-1946 Censuses

The 1906, 1921, 1926, 1931, 1936 and 1946 censuses did not use the same methodology for populations in the first three age groups. Some groups have to be combined or split (Table 6, in italics). For that purpose I assume that births were spread uniformly over time.

Table 6: Classification and Availability of Populations Born Two Years before the Census

| Census | 1st class | 2nd class | 3rd class |
|---|---|---|---|
| 1901 | Born from 01/01/01 to 04/03/01 | Born in 1900 | Born in 1899 |
| 1906 | Born from 01/01/06 to 03/06/06 | Born in 1905 | Born in 1904 |
| 1911 | Born from 01/01/11 to 03/05/11 | Born in 1910 | Born in 1909 |
| 1921 | Born from 01/01/21 to 03/05/21 | Born from 03/06/20 to 12/31/20 | Born from 01/01/20 to 03/05/20 |
| 1926 | Born from 01/01/26 to 03/07/26 | Born from 03/08/25 to 12/31/25 | Born from 01/01/25 to 03/07/25 |
| 1931 | Born from 01/01/31 to 03/07/31 | Born from 03/08/30 to 12/31/30 | Born from 01/01/30 to 03/07/30 |
| 1936 | Born from 01/01/36 to 03/07/36 | Born from 03/08/35 to 12/31/35 | Born from 01/01/35 to 03/07/35 |
| 1946 | Born from 03/10/45 to 03/09/46 | Born from 01/01/44 to 03/09/45 | Born in 1943 |

Notes: Periods in italics in the table have to be combined or split to get populations by year of birth. 01/01/01 means 01/01/1901.
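Under the assumption that births are spread uniformly over time, splitting a group reduces to a proportional allocation by number of days. The Python sketch below is a hypothetical illustration: the function name and the population figure are invented, and the dates are read as month/day/year (so 03/09/45 is taken to be 9 March 1945).

```python
from datetime import date


def split_uniform(population, start, end, cut):
    """Split a group born in [start, end] into [start, cut) and [cut, end].

    Births are assumed to be uniform over days, so each part receives a
    share proportional to the number of days it covers.
    """
    total_days = (end - start).days + 1
    first_days = (cut - start).days
    first = population * first_days / total_days
    return first, population - first


# Second class of the 1946 census: born from 01/01/44 to 03/09/45.
# Hypothetical population of 434, chosen to equal the number of days covered.
born_1944, born_early_1945 = split_uniform(
    434.0, start=date(1944, 1, 1), end=date(1945, 3, 9), cut=date(1945, 1, 1)
)
```

The leap year 1944 accounts for 366 of the 434 days, so it receives the corresponding share of the group. Combining two groups into a single birth year is the reverse operation and is a simple sum.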

Finally, the 1911 census is rather different because it provides data for each year of birth and not by five-year group. However, these numbers fluctuate considerably. There were two possible methods: either use the numbers as given, or combine them into five-year groups as for the other censuses and apply cubic splines. Although the first method provides more information, it includes inconsistent fluctuations at adult ages. Since I need to maintain consistency, I choose the second method. Raw data in 1911 therefore have to be thoroughly reprocessed: I keep the first fifteen single birth-year groups, and then combine the remaining ones into five-year groups (1891–1895, 1886–1890, etc.) plus the open-age interval “1820 and earlier”.

6.3.3 Adjustment of Censuses by Cubic Splines

To get populations by single year of birth rather than by five-year group, I adjust census populations by cubic splines, as I do for civilian and military deaths. The cubic splines are fitted to the cumulative curve of the population born before 1st January of the census year. For example, for the 1901 census, I consider the population born before 1st January 1901. The population born between 1st January 1901 and the day of the census provides no further information and would involve fractional knots.
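The idea of fitting a spline to the cumulative curve and differencing it can be sketched in pure Python. Note the assumptions: this uses a plain natural cubic spline with no monotonicity constraint, which is a simplification and not necessarily the exact spline specification used here or in the HMD Protocol; the function names and the counts are hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)              # second derivatives; zero at both ends
    if n >= 2:
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 1)]
        rhs = [6.0 * ((ys[j + 2] - ys[j + 1]) / h[j + 1]
                      - (ys[j + 1] - ys[j]) / h[j]) for j in range(n - 1)]
        for j in range(1, n - 1):    # forward elimination (tridiagonal system)
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for j in range(n - 3, -1, -1):   # back substitution
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def spline(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return spline


def single_year_counts(boundaries, group_counts):
    """Split grouped counts into single birth years via the cumulative curve."""
    cumulative = [0.0]
    for count in group_counts:
        cumulative.append(cumulative[-1] + count)
    s = natural_cubic_spline(boundaries, cumulative)
    return {y: s(y + 1) - s(y) for y in range(boundaries[0], boundaries[-1])}


# Hypothetical counts for birth cohorts 1880-1884, 1885-1889 and 1890-1894
by_year = single_year_counts([1880, 1885, 1890, 1895], [90.0, 120.0, 150.0])
flat = single_year_counts([1880, 1885, 1890, 1895], [100.0, 100.0, 100.0])
```

Because the spline passes through the cumulative totals at every knot, the single-year estimates add up to the original group totals. An unconstrained spline can produce negative counts where the cumulative curve is nearly flat, which is one reason a constrained variant may be preferable in practice.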

6.4 Estimates of Military Deaths during the Two World Wars

The classification of départements from the “Mémoire des Hommes” website is modified to fit the classification used for civilian deaths. Problems concern Corse (two départements counting as one) and the old départements of Seine and Seine-et-Oise. For these last two, deaths are given for the new départements. To allocate deaths between Seine and Seine-et-Oise I first sum all deaths in Ile-de-France (without Seine-et-Marne), then I allocate these military deaths in proportion to the population in the cohorts born from 1880 to 1896. These cohorts account for 83% of total military deaths in the First World War. Concerning the distribution of deaths in the Parisian départements between Seine and Seine-et-Oise for the Second World War, I allocate them in proportion to the populations born between 1905 and 1921 (70% of total deaths during the Second World War). Seine’s deaths are equal to 78.6% of the total.
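The pro rata allocation can be sketched in Python as follows. The function name and the population figures are hypothetical; the populations are chosen only so that the resulting share for Seine matches the 78.6% quoted for the Second World War.

```python
def allocate_pro_rata(total_deaths, populations):
    """Allocate a total number of deaths in proportion to reference populations.

    populations -- dict {département: population in the reference cohorts}
    """
    total_population = sum(populations.values())
    return {name: total_deaths * pop / total_population
            for name, pop in populations.items()}


# Hypothetical reference populations giving Seine a 78.6% share
shares = allocate_pro_rata(1000.0, {"Seine": 786.0, "Seine-et-Oise": 214.0})
```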

Moreover, to ease the collection of data from the website, military deaths have been retrieved by year of birth for the youngest (born after 1889), then by five-year group for those born in 1889 and earlier. These deaths must be split by year of birth, which is done by cubic splines. The two assumptions made are (1) no deaths under age 16 and (2) no deaths over age 60.

6.5 Estimates of Deportees

The deportee database is nominative (one line for each deportee). Sex, birth département (or country of birth if born abroad), day-month-year of birth and day-month-year of death were extracted. The age at death in days, months and years follows. For dates of birth and death, data are kept as long as the year was available. Thus, if only the year was available, the date chosen was January 1st. Likewise, if only the month and year of birth were available, the full date of birth was set to the first day of the month. If a date was considered irrelevant (namely, if the date of birth was after the date of death), the date is erased. For individuals whose year of death was after 1946 (about forty individuals), I consider the date of death to be unknown. 93% of the deceased have well-informed data for the four variables (sex, date of death, age, place of birth). For those with two or three variables missing, data were not used; this corresponds to 6.5% of the database. I did not use deportees with one variable missing either, since they represented only 0.5% of the total. From these nominative data, I thus extract matrices crossing the age at death, the year of death (1940–1946), the place of birth and the sex.
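The date-cleaning rules can be summarised in a short Python sketch. The record layout (tuples of year, month and day with `None` for missing parts) and the function names are hypothetical. Two points are assumptions: the first-of-month rule is applied to both dates, and both dates are erased when the birth date falls after the death date, since the text does not say which one is dropped.

```python
from datetime import date


def impute_date(year, month=None, day=None):
    """Complete a partial date: 1st January if only the year is known,
    first day of the month if only month and year are known."""
    if year is None:
        return None
    if month is None:
        return date(year, 1, 1)
    return date(year, month, day if day is not None else 1)


def clean_record(birth, death):
    """Apply the cleaning rules to one deportee's (year, month, day) tuples."""
    birth_date = impute_date(*birth)
    death_date = impute_date(*death)
    if death_date is not None and death_date.year > 1946:
        death_date = None            # deaths after 1946 are treated as unknown
    if birth_date and death_date and birth_date > death_date:
        birth_date, death_date = None, None   # assumption: erase both dates
    return birth_date, death_date


example = clean_record((1900, None, None), (1944, 6, None))
```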

One of the variables available in the deportee database is the place of birth. This variable has to be distinguished from the place of residence before deportation, which is where the deceased should be located in my lifetables. Since a 40-year-old has a non-zero probability of having migrated to a département other than the one where he was born, I need to infer the home département before deportation. Similarly, deportees born abroad must be located in a French département.

