class: center, middle, inverse, title-slide, clear # Modelling and quantifying mortality and longevity risk ## Tutorial session 1 <html> <div style="float:left"></div> <hr align='center' color='#106EB6' size=1px width=97%> </html> ### Katrien Antonio, Michel Vellekoop, Torsten Kleinow, Frank van Berkum, and Jens Robben ### <a href="https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/">35th International Summer School of the Swiss Association of Actuaries </a> | June 3-7, 2024 ### <a> </a> --- class: inverse, center, middle name: prologue # Prologue <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- name: introduction # Introduction ### Workshop <img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/ The workshop's landing page on GitFront, where we upload slide decks, R code and data sets. ### Us <img align="left" src="img/Katrien.jpg" width="160"/ hspace="0"> <img align="left" src="img/Michel.jpg" width="160" hspace="45"/> <img align="left" src="img/Torsten.jpg" width="160" hspace="0"/> <img align="left" src="img/Frank.jpg" width="160" hspace="45"/> <img align="left" src="img/Jens.png" width="160" hspace="0"/> <img align="left" src="img/Empty.png" width="100" height="160" hspace="0"/> [Katrien Antonio](https://katrienantonio.github.io) [
](mailto:katrien.antonio@kuleuven.be) [Michel Vellekoop](https://www.uva.nl/profiel/v/e/m.h.vellekoop/m.h.vellekoop.html) [
](mailto:m.h.vellekoop@uva.nl) [Torsten Kleinow](https://www.uva.nl/profiel/k/l/t.kleinow/t.kleinow.html) [
](mailto:t.kleinow@uva.nl) [Frank van Berkum](https://www.uva.nl/profiel/b/e/f.vanberkum/f.vanberkum.html) [
](mailto:F.vanBerkum@uva.nl) [Jens Robben](https://jensrobben.github.io) [
](mailto:jens.robben@kuleuven.be) --- name: checklist # Checklist
Do you have a fairly recent version of R on your machine (see [here](https://cran.r-project.org/bin/windows/base/))? ```r version$version.string ```
Do you have a fairly recent version of the RStudio IDE on your machine (see [here](https://posit.co/downloads/))? ```r RStudio.Version()$version ## Requires an interactive session but should return something like "[1] ‘1.2.5001’" ```
Do you have the R packages listed in the software requirements installed on your machine?
(Alternatively) Do you have an account on [posit cloud](https://login.posit.cloud/) and access to the workspace dedicated to the Summer School, if you prefer to run the tutorial instructions in the cloud? --- class: inverse, center, middle name: goals # Goals of tutorial session 1 <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Goals of tutorial session 1 .pull-left[ In this tutorial session you will learn to: * download and visualize mortality data sets and life tables obtained from the Human Mortality Database, see [HMD](https://mortality.org/) * perform some basic calculations with period life tables * calibrate (selected) single population mortality models, using the Poisson distributional assumption for death counts * perform some checks to inspect the in-sample fit of the calibrated model. ] .pull-right[ Outline of the tutorial: * [Prologue](#prologue) * [Life table calculations and visualizations](#lifetable) * [Download and import mortality data](#data) * [Fit single population mortality models by maximising a Poisson likelihood](#POI) * [Model selection and goodness-of-fit](#modelfit) ] --- class: inverse, center, middle name: lifetable # Life table calculations and visualizations <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Import a life table stored as .txt file .pull-left[ We downloaded (on 18 April 2024) life tables from the [HMD](https://www.mortality.org) and stored it as .txt file on the GitHub repo. Now you can import it in R: ```r CHE_female <- read.table( file = "../data/hmd/CHE_female_life_table_1x1.txt", skip = 2, header = TRUE) CHE_male <- read.table( file = "../data/hmd/CHE_male_life_table_1x1.txt", skip = 2, header = TRUE) ``` We convert the age variable into an integer variable: ```r CHE_male$Age <- parse_number(CHE_male$Age) %>% as.integer() CHE_female$Age <- parse_number(CHE_female$Age) %>% as.integer() ``` Mind the use of `parse_number()` to handle the entry '110+' in the `Age` column of the original object. ] .pull-right[ We extract the 2022 data using the function `filter` from the {dplyr} package: ```r CHE_male_2022 <- CHE_male %>% dplyr::filter(Year == 2022) CHE_female_2022 <- CHE_female %>% dplyr::filter(Year == 2022) ``` We display the structure of the R object `CHE_female_2022` using `str(.)` ```r str(CHE_female_2022) ## 'data.frame': 111 obs. of 10 variables: ## $ Year: int 2022 2022 2022 2022 2022 2022 2022 2022 2022 2022 ... ## $ Age : int 0 1 2 3 4 5 6 7 8 9 ... ## $ mx : num 0.00323 0.0003 0.00017 0.00012 0.00016 0.00016 0.00007 0.00002 0.00005 0.00002 ... ## $ qx : num 0.00322 0.0003 0.00017 0.00012 0.00016 0.00016 0.00007 0.00002 0.00005 0.00002 ... ## $ ax : num 0.14 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ... ## $ lx : int 100000 99678 99648 99631 99620 99604 99588 99581 99579 99574 ... ## $ dx : int 322 30 17 12 16 16 7 2 5 2 ... ## $ Lx : int 99724 99663 99640 99625 99612 99596 99584 99580 99576 99573 ... ## $ Tx : int 8537621 8437897 8338234 8238595 8138969 8039358 7939762 7840177 7740598 7641021 ... ## $ ex : num 85.4 84.7 83.7 82.7 81.7 ... ``` ] --- class: clear Let's take a closer look at the period life table for males in Switzerland in the year 2022: ```r head(CHE_male_2022) ``` <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> Year </th> <th style="text-align:right;"> Age </th> <th style="text-align:right;"> mx </th> <th style="text-align:right;"> qx </th> <th style="text-align:right;"> ax </th> <th style="text-align:right;"> lx </th> <th style="text-align:right;"> dx </th> <th style="text-align:right;"> Lx </th> <th style="text-align:right;"> Tx </th> <th style="text-align:right;"> ex </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0.00419 </td> <td style="text-align:right;"> 0.00417 </td> <td style="text-align:right;"> 0.14 </td> <td style="text-align:right;"> 100000 </td> <td style="text-align:right;"> 417 </td> <td style="text-align:right;"> 99641 </td> <td style="text-align:right;"> 8161285 </td> <td style="text-align:right;"> 81.61 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.00016 </td> <td style="text-align:right;"> 0.00016 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 99583 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 99575 </td> <td style="text-align:right;"> 8061644 </td> <td style="text-align:right;"> 80.95 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.00020 </td> <td style="text-align:right;"> 0.00020 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 99567 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 99557 </td> <td style="text-align:right;"> 7962069 </td> <td style="text-align:right;"> 79.97 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 99547 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 99542 </td> <td style="text-align:right;"> 7862512 </td> <td style="text-align:right;"> 78.98 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 99536 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 99531 </td> <td style="text-align:right;"> 7762970 </td> <td style="text-align:right;"> 77.99 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.00011 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 99525 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 99520 </td> <td style="text-align:right;"> 7663439 </td> <td style="text-align:right;"> 77.00 </td> </tr> </tbody> </table> .pull-left[ `mx`: death rate at age `\(x\)`, calculated as observed deaths divided by observed exposures at age `\(x\)`. `qx`: mortality rate, i.e., the probability that an `\(x\)`-year old dies within a year. `lx`: number of survivors at age `\(x\)` in the life table population. ] .pull-right[ `dx`: number of deaths at age `\(x\)` in the life table population. `Tx`: remaining person-years for all individuals of age `\(x\)`. `ex`: period life expectancy at age `\(x\)`. ] --- # Visualize a period life table .pull-left[ We start with visualizing the `\(q_{x,t}\)` for M/F data in Switzerland, year 2022. Using `ggplot` instructions: ```r g_male <- ggplot(CHE_male_2022, aes(Age, log(qx))) + geom_point(col = RCLRbg) + geom_line(col = RCLRbg) + theme_bw(base_size = 15) + ggtitle("Switzerland - males, 2022") + labs(y = bquote(ln(q[x]))) g_fem <- ggplot(CHE_female_2022, aes(Age, log(qx))) + geom_point(col = RCLRbg) + geom_line(col = RCLRbg) + theme_bw(base_size = 15) + ggtitle("Switzerland - females, 2022") + labs(y = bquote(ln(q[x]))) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize a period life table .pull-left[ We start with visualizing the `\(q_{x,t}\)` for M/F data in Switzerland, year 2022. Using `ggplot` instructions: ```r g_male <- ggplot(CHE_male_2022, aes(Age, log(qx))) + geom_point(col = RCLRbg) + geom_line(col = RCLRbg) + theme_bw(base_size = 15) + ggtitle("Switzerland - males, 2022") + labs(y = bquote(ln(q[x]))) g_fem <- ggplot(CHE_female_2022, aes(Age, log(qx))) + geom_point(col = RCLRbg) + geom_line(col = RCLRbg) + theme_bw(base_size = 15) + ggtitle("Switzerland - females, 2022") + labs(y = bquote(ln(q[x]))) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: What is causing the sudden spike in the male and female mortality rates at age 110? Reconstruct the plot with a maximum age of 90. ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: In the previous code, the `Age` variable was transformed into an integer variable. This approach was not entirely correct. The raw life table reveals this: <table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> Year </th> <th style="text-align:left;"> Age </th> <th style="text-align:right;"> mx </th> <th style="text-align:right;"> qx </th> <th style="text-align:right;"> ax </th> <th style="text-align:right;"> lx </th> <th style="text-align:right;"> dx </th> <th style="text-align:right;"> Lx </th> <th style="text-align:right;"> Tx </th> <th style="text-align:right;"> ex </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 105 </td> <td style="text-align:right;"> 0.65922 </td> <td style="text-align:right;"> 0.49580 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 179 </td> <td style="text-align:right;"> 89 </td> <td style="text-align:right;"> 135 </td> <td style="text-align:right;"> 260 </td> <td style="text-align:right;"> 1.45 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 106 </td> <td style="text-align:right;"> 0.69559 </td> <td style="text-align:right;"> 0.51610 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 67 </td> <td style="text-align:right;"> 125 </td> <td style="text-align:right;"> 1.39 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 107 </td> <td style="text-align:right;"> 0.72967 </td> <td style="text-align:right;"> 0.53462 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 44 </td> <td style="text-align:right;"> 23 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 58 </td> <td style="text-align:right;"> 1.33 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 108 </td> <td style="text-align:right;"> 0.76125 </td> <td style="text-align:right;"> 0.55138 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 26 </td> <td style="text-align:right;"> 1.28 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 109 </td> <td style="text-align:right;"> 0.79019 </td> <td style="text-align:right;"> 0.56641 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 1.25 </td> </tr> <tr> <td style="text-align:right;"> 2022 </td> <td style="text-align:left;"> 110+ </td> <td style="text-align:right;"> 0.81648 </td> <td style="text-align:right;"> 1.00000 </td> <td style="text-align:right;"> 1.22 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1.22 </td> </tr> </tbody> </table> Hence, the age 110 in fact corresponds to the age group 110+, which causes the spike at age 110 in the previous plots. We can limit the age range (e.g., to a maximum 90 years) by filtering the data argument in the `ggplot`command using the `filter` function from {dplyr}. ] --- class: clear .pull-left[ ```r g_fem_lim <- ggplot(`CHE_female_2022 %>% dplyr::filter(Age <= 90)`, aes(Age, log(qx))) + geom_point(col = RCLRbg) + geom_line(col = RCLRbg) + theme_bw(base_size = 15) + ggtitle("Switzerland - females, 2022") + labs(y = bquote(ln(q[x]))) g_fem_lim ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize the mortality rates over time .pull-left[ We visualize the `\(q_{x,t}\)` for male and female data in Switzerland, ages 0-90 and (selected) years 1880, 1910, 1940, 1970, 2000, 2022. Using `ggplot` instructions: ```r years <- c(1880, 1910, 1940, 1970, 2000, 2022) dfM <- CHE_male %>% dplyr::filter(Year %in% years, Age <= 90) ggplot(dfM, aes(x = Age, y = log(qx), group = Year, colour = factor(Year))) + geom_line(linewidth = 1.1) + ggtitle('Swiss male mortality rates') + xlab('Age (x)') + ylab(bquote(ln(q[x]))) + scale_color_manual(values = c(red_pink, turquoise, orange, blue, green, purple), name = 'Year') + scale_x_continuous(breaks = seq(0, 90, 10)) + theme_bw(base_size = 15) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize the mortality rates over time .pull-left[ We visualize the `\(q_{x,t}\)` for male and female data in Switzerland, ages 0-90 and (selected) years 1880, 1910, 1940, 1970, 2000, 2022. Using `ggplot` instructions: ```r years <- c(1880, 1910, 1940, 1970, 2000, 2022) dfF <- CHE_female %>% dplyr::filter(Year %in% years, Age <= 90) ggplot(dfF, aes(x = Age, y = log(qx), group = Year, colour = factor(Year))) + geom_line(linewidth = 1.1) + ggtitle('Swiss female mortality rates') + xlab('Age (x)') + ylab(bquote(ln(q[x]))) + scale_color_manual(values = c(red_pink, turquoise, orange, blue, green, purple), name = 'Year') + scale_x_continuous(breaks = seq(0, 90, 10)) + theme_bw(base_size = 15) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-20-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Complete the following `ggplot` instructions to visualize the mortality rate curves for the years 1880-2022 as a function of age. You can use either Swiss male or female data. ```r years <- ... ages <- ... dfX <- ... %>% dplyr::filter(Year %in% ..., Age %in% ...) ggplot(dfX, aes(x = ..., y = ..., group = ...)) + geom_line(aes(colour = ...), linewidth = 0.8) + scale_colour_gradient2(low = blue, mid = turquoise, midpoint = 1950, high = red_pink) + scale_x_continuous( breaks = seq(ages[1], tail(ages, 1) + 1, 10)) + theme_bw(base_size = 15) + ggtitle('Swiss ... mortality rates') + ylab(expression("log" ~ q[x])) + xlab("Age (x)") ``` ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Complete the following `ggplot` instructions to visualize the mortality rate curves for the years 1880-2022 as a function of age. You can use either Swiss male or female data. ```r years <- `1880:2022` ages <- `0:90` dfX <- `CHE_male` %>% dplyr::filter(Year %in% `years`, Age %in% `ages`) ggplot(dfX, aes(x = `Age`, y = `log(qx)`, group = `Year`)) + geom_line(aes(colour = `Year`), linewidth = 0.8) + scale_colour_gradient2(low = blue, mid = turquoise, midpoint = 1950, high = red_pink) + scale_x_continuous( breaks = seq(ages[1], tail(ages, 1) + 1, 10)) + theme_bw(base_size = 15) + ggtitle('Swiss `male` mortality rates') + ylab(expression("log" ~ q[x])) + xlab("Age (x)") ``` ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Complete the following `ggplot` instructions to visualize the mortality rate curves for the years 1880-2022 as a function of age. You can use either Swiss male or female data. <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-23-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize the age-specific mortality trend .pull-left[ We visualize the evolution of `\(q_{x,t}\)` over time for male and female data in Switzerland, for selected ages 20, 30, 40, and 50. Using `ggplot` instructions: ```r ages <- c(20, 30, 40, 50) dfM <- CHE_male %>% dplyr::filter(Age %in% ages) ggplot(dfM, aes(x = Year, y = qx, group = Age, colour = factor(Age))) + geom_line(linewidth = 1.1) + ggtitle('Evolution of Swiss male mortality rates') + xlab('Time (t)') + ylab(bquote(q['x,t'])) + scale_color_manual('age', values = c(red_pink, turquoise, orange, blue)) + coord_cartesian(ylim = c(0,0.025)) + theme_bw(base_size = 15) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-25-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize the age-specific mortality trend .pull-left[ We visualize the evolution of `\(q_{x,t}\)` over time for male and female data in Switzerland, for selected ages 20, 30, 40, and 50. Using `ggplot` instructions: ```r ages <- c(20, 30, 40, 50) dfF <- CHE_female %>% dplyr::filter(Age %in% ages) ggplot(dfF, aes(x = Year, y = qx, group = Age, colour = factor(Age))) + geom_line(linewidth = 1.1) + ggtitle('Evolution of Swiss female mortality rates') + xlab('Time (t)') + ylab(bquote(q['x,t'])) + scale_color_manual('age', values = c(red_pink, turquoise, orange, blue)) + coord_cartesian(ylim = c(0,0.025)) + theme_bw(base_size = 15) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-27-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: What is causing the sudden spike in the mortality rates in the year 1918? Remove this year from the plot for visualization purposes. ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: The Spanish flu pandemic of 1918-1919 severely hit Switzerland, causing about 25 000 deaths and infecting half the population. The worst wave hit in October 1918. Using the `filter` function from the `dplyr` package we can remove the year 1918 from the data. <br> <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-28-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ ```r years <- c(1918) ages <- c(20, 30, 40, 50) dfM <- CHE_male %>% dplyr::filter(`!Year %in% years`, Age %in% ages) g <- ggplot(dfM, aes(x = Year, y = qx, group = Age, colour = factor(Age))) + geom_line(linewidth = 1.1) + ggtitle('Evolution of Swiss male mortality rates') + xlab('Time (t)') + ylab(bquote(q['x,t'])) + scale_color_manual('Age', values = c(red_pink, turquoise, orange, blue)) + theme_bw(base_size = 15) g ``` Read more about the Spanish flu affecting Switzerland in 1918, for instance [here](https://www.swissinfo.ch/eng/business/devastation_when-spanish-flu-hit-switzerland/44352910). ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-30-1.svg" style="display: block; margin: auto;" /> ] --- # Visualize the survival curve .pull-left[ We put focus on `\(S_{0,t}(.)\)`, the survival curve for a newborn, born in period (or year) `\(t\)`. We calculate the survival probabilities `\(S_{0,t}(x)\)` for a Swiss male newborn, for selected time period `\(t\)`. Hereby we let `\(x\)` run from 0 to 100. ```r *xmax <- 100 *xmin <- 0 *years <- c(1880, 1910, 1940, 1970, 2000, 2022) surv_rate <- matrix(0, xmax-xmin+1, length(years)) surv_rate[1,] <- 1 for(t in 1:length(years)){ for(x in 1:(xmax-xmin)){ df.t <- CHE_male %>% dplyr::filter(Year == years[t]) px <- 1 - df.t[x,'qx'] surv_rate[x+1,t] <- surv_rate[x,t]*px } } ``` ] .pull-right[ We consider a minimum age `\(x\)` of 0, a maximum age of 100, and the years of interest. ] --- # Visualize the survival curve .pull-left[ We put focus on `\(S_{0,t}(.)\)`, the survival curve for a newborn, born in period (or year) `\(t\)`. We calculate the survival probabilities `\(S_{0,t}(x)\)` for a Swiss male newborn, for selected time period `\(t\)`. Hereby we let `\(x\)` run from 0 to 100. ```r xmax <- 100 xmin <- 0 years <- c(1880, 1910, 1940, 1970, 2000, 2022) *surv_rate <- matrix(0, xmax-xmin+1, length(years)) surv_rate[1,] <- 1 for(t in 1:length(years)){ for(x in 1:(xmax-xmin)){ df.t <- CHE_male %>% dplyr::filter(Year == years[t]) px <- 1 - df.t[x,'qx'] surv_rate[x+1,t] <- surv_rate[x,t]*px } } ``` ] .pull-right[ We consider a minimum age `\(x\)` of 0, a maximum age of 100, and the years of interest. We create an empty matrix `surv_rate` of dimension `\(101 \times 6\)` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022. ] --- # Visualize the survival curve .pull-left[ We put focus on `\(S_{0,t}(.)\)`, the survival curve for a newborn, born in period (or year) `\(t\)`. We calculate the survival probabilities `\(S_{0,t}(x)\)` for a Swiss male newborn, for selected time period `\(t\)`. Hereby we let `\(x\)` run from 0 to 100. ```r xmax <- 100 xmin <- 0 years <- c(1880, 1910, 1940, 1970, 2000, 2022) surv_rate <- matrix(0, xmax-xmin+1, length(years)) *surv_rate[1,] <- 1 for(t in 1:length(years)){ for(x in 1:(xmax-xmin)){ df.t <- CHE_male %>% dplyr::filter(Year == years[t]) px <- 1 - df.t[x,'qx'] surv_rate[x+1,t] <- surv_rate[x,t]*px } } ``` ] .pull-right[ We consider a minimum age `\(x\)` of 0, a maximum age of 100, and the years of interest. We create an empty matrix `surv_rate` of dimension `\(101 \times 6\)` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022. The probability of surviving zero years equals one, i.e. `\(S_{0,t}(0)=1\)`. ] --- # Visualize the survival curve .pull-left[ We put focus on `\(S_{0,t}(.)\)`, the survival curve for a newborn, born in period (or year) `\(t\)`. We calculate the survival probabilities `\(S_{0,t}(x)\)` for a Swiss male newborn, for selected time period `\(t\)`. Hereby we let `\(x\)` run from 0 to 100. ```r xmax <- 100 xmin <- 0 years <- c(1880, 1910, 1940, 1970, 2000, 2022) surv_rate <- matrix(0, xmax-xmin+1, length(years)) surv_rate[1,] <- 1 *for(t in 1:length(years)){ * for(x in 1:(xmax-xmin)){ * df.t <- CHE_male %>% * dplyr::filter(Year == years[t]) * px <- 1 - df.t[x,'qx'] * surv_rate[x+1,t] <- surv_rate[x,t]*px * } *} ``` ] .pull-right[ We consider a minimum age `\(x\)` of 0, a maximum age of 100, and the years of interest. We create an empty matrix `surv_rate` of dimension `\(101\times 6\)` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022. The probability of surviving zero years equals one, i.e. `\(S_{0,t}(0)=1\)`. For every year `\(t\)` under consideration, the one-year survival probability of `\((x)\)` in year `\(t\)` is `\(S_{x,t}(1) = p_{x,t} = 1-q_{x,t}\)`. We evaluate the survival probabilities recursively, as follows: `$$S_{0,t}(x+1) = S_{0,t}(x) \cdot p_{x,t}.$$` Hereby, we follow the period approach (and keep `\(t\)` fixed)! ] --- class:clear .pull-left[ We plot these survival probabilities calculated for a male Swiss newborn in the years 1880, 1910, 1940, 1970, 2000, and 2022. ```r df <- data.frame('Age' = rep(0:100, times = length(years)), 'Year' = rep(years, each = length(0:100)), 'Surv' = as.numeric(surv_rate)) ggplot(df, aes(x = Age, y = Surv, group = Year, colour = as.factor(Year))) + geom_line(linewidth = 1.1) + ggtitle('Swiss male survival curve') + xlab('Age (x)') + ylab(bquote(S[0]*'(x)')) + scale_color_manual(values = c(red_pink, turquoise, orange, blue, green, purple), name = 'Year') + scale_x_continuous(breaks = seq(0, 100, 10)) + theme_bw(base_size = 15) ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-36-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Can you adjust the instructions for the calculation and visualization of `\(S_{0,t}(.)\)` to the calculation and visualization of `\(f_{0,t}(.)\)`, the pdf of r.v. `\(T_{0,t}\)`? Recall from Module 1: `\(f_{0,t}(x)=S_{0,t}(x) \cdot \mu_{x,t}\)`. <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-37-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Can you adjust the instructions for the calculation and visualization of `\(S_{0,t}(.)\)` to the calculation and visualization of `\(f_{0,t}(.)\)`, the pdf of r.v. `\(T_{0,t}\)`? Recall from Module 1: `\(f_{0,t}(x)=S_{0,t}(x) \cdot \mu_{x,t}\)`. The limit on the y-axis of the plot can be adjusted with the layer `coord_cartesian(ylim = c(.,.))`. <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-38-1.svg" style="display: block; margin: auto;" /> ] --- class: inverse, center, middle name: data # Download and import mortality data <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Download mortality data with {demography} .pull-left[ We do a .hi-pink[live] download from the [Human Mortality Database](https://www.mortality.org) with the {demography} package in R. To get started with the package, first install it: ```r install.packages("demography") ``` Then, load the package: ```r library(demography) ``` Next, we use the `hmd.mx` function from {demography} to read the "Mx" (1 x 1) data from the HMD: ```r ? hmd.mx User = "summerschool.rclr2024@outlook.com" pw = "Test1234." Df = hmd.mx("CHE", User , pw , "Switzerland") ``` ] .pull-right[ We specify the country code (`CHE`), the user name and password, along with the character string containing the country name from which the data is retrieved. To get smooth access to the data, we created a username and password for this workshop. We store the downloaded data in the object `Df`. If you encounter issues with the live download, we have pre-downloaded this mortality data set, and made it available in the GitHub repository. You can load this R object as follows: ```r Df <- readRDS(file = "../data/hmd/Df_CHE_hmd_mx.rds") ``` ] --- class: clear .pull-left[ Inspect the `Df` object (a list in R) with ```r names(Df) ## [1] "type" "label" "lambda" "year" "age" "pop" "rate" names(Df$pop) ## [1] "female" "male" "total" names(Df$rate) ## [1] "female" "male" "total" ``` Verify that, e.g., ```r str(Df$rate$female) ## num [1:111, 1:147] 0.22666 0.03892 0.01801 0.01241 0.00917 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:111] "0" "1" "2" "3" ... ## ..$ : chr [1:147] "1876" "1877" "1878" "1879" ... ``` stores the `\(m_{x,t}\)` observed central death rates for Swiss females, where age `\(x\)` ranges from 0 to 110+ across the rows and year `\(t\)` spans from 1876 to 2022 across the columns. ] .pull-right[ What is stored in the data set? ```r Df$type ## [1] "mortality" ``` ```r Df$label ## [1] "Switzerland" ``` ```r min(Df$year) ## [1] 1876 max(Df$year) ## [1] 2022 ``` ```r min(Df$age) ## [1] 0 max(Df$age) ## [1] 110 ``` ] --- class: clear .pull-left[ The mortality data set `Df` contains the observed central death rates `\(m_{x,t}\)` (`$rates`) for both males (`$male`) and females (`$female`). These rates correspond to the `mx` column in the life tables: ```r head(CHE_female_2022$mx) ## [1] 0.00323 0.00030 0.00017 0.00012 0.00016 0.00016 head(Df$rate$female[,'2022']) ## 0 1 2 3 4 5 ## 0.003227 0.000305 0.000166 0.000116 0.000161 0.000160 ``` The `$pop` attribute in `Df` stores the exposure-to-risk `\(E_{x,t}\)`, representing the total amount of person-years lived by individuals aged `\(x\)` in year `\(t\)`. ```r head(Df$pop$female[,'2022']) ## 0 1 2 3 4 5 ## 40590.46 42632.60 42251.21 43074.56 43470.10 43836.41 ``` ] .pull-right[ Death counts can be calculated by multiplying the death rates with the exposure-to-risk, i.e., `\(d_{x,t} = E_{x,t} \cdot m_{x,t}\)`. In R we do: ```r round(head(Df$pop$female[,'2022'] * Df$rate$female[,'2022'])) ## 0 1 2 3 4 5 ## 131 13 7 5 7 7 ``` These computed death counts match the death counts from the Human Mortality Database (HMD) (see [here](https://www.mortality.org/File/GetDocument/hmd.v6/CHE/STATS/Deaths_1x1.txt)). ] --- class: inverse, middle, center name: POI # Fit single population mortality models by maximizing a Poisson likelihood <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- class: top # Poisson likelihood and Lee-Carter specification Along the ideas explained in Module 2, we now put focus on calibrating a Lee-Carter model on a data set with `\((d_{x,t}, E_{x,t}|x \in \{1,\ldots,X\},\ t\in\{1,\ldots,T\})\)`. We work with the following Poisson assumption for the death counts `\(D_{x,t}\)`: `\begin{align*} D_{x,t} &\sim \text{POI}(E_{x,t} \cdot \mu_{x,t}) \\ \mu_{x,t} &= \exp{(\beta_x^{(1)}+\beta_x^{(2)} \cdot \kappa_t^{(2)})}. \end{align*}` The log-likelihood then follows: `$$L(\beta^{(1)},\beta^{(2)},\kappa^{(2)}|\boldsymbol{d},\boldsymbol{e}) = \sum_{t} \sum_{x} \left[d_{x,t}(\beta_x^{(1)}+\beta_x^{(2)}\cdot \kappa^{(2)}_t)-E_{x,t}\exp{(\beta_x^{(1)}+\beta_x^{(2)}\cdot \kappa^{(2)}_t)}\right]+c.$$` Here, `\(d_{x,t}\)` and `\(E_{x,t}\)` are the observed deaths and exposures, respectively. `\(c\)` is a constant that does not depend on the Lee-Carter parameters. --- class: clear .pull-left[ ```r library(demography) years <- `1970:max(Df$year)` ages <- `0:90` ``` ```r Df <- demography::extract.years( Df, years = years) Df <- demography::extract.ages( Df, ages = ages, combine.upper = FALSE) min(Df$year) ## [1] 1970 max(Df$year) ## [1] 2022 max(Df$age) ## [1] 90 ``` ] .pull-right[ To construct the Poisson likelihood, we need observations on `\(d_{x,t}\)` and `\(E_{x,t}\)`. We specify the calibration period in `years`. We specify the age range in `ages`. We use `extract.years` and `extract.ages` from the {demography} package to subset `Df` accordingly. Recall that `Df` is the "Mx" (1 x 1) data, extracted from the HMD. ] --- class: clear .pull-left[ ```r etx <- t(Df$pop$male) dim(etx) ## [1] 53 91 str(etx) ## num [1:53, 1:91] 49109 49131 47433 44772 42661 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:53] "1970" "1971" "1972" "1973" ... ## ..$ : chr [1:91] "0" "1" "2" "3" ... dtx <- round(etx * t(Df$rate$male)) dim(dtx) ## [1] 53 91 str(dtx) ## num [1:53, 1:91] 876 821 718 681 626 500 459 395 376 358 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : chr [1:53] "1970" "1971" "1972" "1973" ... ## ..$ : chr [1:91] "0" "1" "2" "3" ... ``` ] .pull-right[ We extract the exposures and put these in `etx`. Here, years are in the rows and ages in the columns of the list. `t(.)` is for matrix transpose. The deaths are not directly stored in `Df`, but follow from `\(m_{x,t} \cdot E_{x,t}\)`. ] --- class: clear We discuss today two frameworks for calibrating the Lee-Carter parameters by optimizing the Poisson log-likelihood. Alternative optimization strategies exist! .pull-left[ <center> <font size="+2"> <b> .hi-pink[Method 1] </b> </font> </center> The `fit701(.)` function from the `fitModels.R` script. This method is written by Andrew Cairns, see [LifeMetrics software](https://www.macs.hw.ac.uk/~andrewc/lifemetrics/), and uses univariate Newton-Raphson (NR) steps to optimize the log-likelihood. `fit701(.)` calibrates Lee Carter, `fit702(.)` fits the Renshaw Haberman model, `fit703(.)` the Age-Period-Cohort (APC) model, `fit705(.)` the CBD model, `fit706(.)` the CBD model with cohort effect, and so on. ] .pull-right[ <center> <font size="+2"> <b> .hi-pink[Method 2] </b> </font> </center> The R-package {StMoMo} for stochastic mortality modelling. This method is implemented by Andres Villegas, Pietro Millossovich, and Vladimir Kaishev, see [StMoMo vignette](https://cran.r-project.org/web/packages/StMoMo/vignettes/StMoMoVignette.pdf), and relies on estimation routines for so-called Generalized Non-linear Models (GNM). The R-package {StMoMo} provides a unified framework for fitting stochastic mortality models of the so-called generalized age-period-cohort (GAPC) family. ] --- name: Method 1: `fit701` # Method 1: {fit701} As discussed in Module 2, the following types of iterative steps are crucial when optimizing the likelihood with (univariate) Newton-Raphson steps: `$$\hat{\kappa}_t^{(2)(k+1)} = \hat{\kappa}_t^{(2)(k)}-\frac{\sum_{x}\left[d_{x,t}-E_{x,t}\exp{(\hat{\beta}_x^{(1)(k+1)}+\hat{\beta}_x^{(2)(k)}\hat{\kappa}_t^{(2)(k)})}\right]\hat{\beta}_x^{(2)(k)}}{-\sum_{x}E_{x,t}\exp{(\hat{\beta}_x^{(1)(k+1)}+\hat{\beta}_x^{(2)(k)}\hat{\kappa}_t^{(2)(k)})}\left(\hat{\beta}_x^{(2)(k)}\right)^2}$$` Similar expressions for the iterative steps of the parameters `\(\beta_x^{(1)}\)` and `\(\beta_x^{(2)}\)` were given during today's lectures. For example, in `fit701(.)` the function `llmax2D(.)` implements the NR step for the period effect: ```r thetat = b2*ev*exp(b1+b3*g3) s1 = sum(dv*b2*wv) f0 = sum((exp(k20*b2)*thetat)*wv)-s1 df0 = sum((exp(k20*b2)*b2*thetat)*wv) k21 = k20-f0/df0 ``` Here (cfr. the `fitModels.R` script) `wv` refers to weights used in the likelihood, and `b3*g3` is fixed at zero in the Lee-Carter specification. --- class: clear .pull-left[ ```r source('../scripts/fitModels.R') LCfit701 <- `fit701(ages, years, etx, dtx,` `matrix(1, length(years),` `length(ages)))` names(LCfit701) ``` ``` ## -176085992 -> -108584292 ->-85457.18 ## -65657.75 -> -41655.23 ->-33660.76 ## -23267.72 -> -22962.99 ->-21072.62 ## -20137.5 -> -20134.21 ->-19905.81 ## -19818.15 -> -19817.8 ->-19796.79 ## -19790.53 -> -19790.5 ->-19789 ## -19788.6 -> -19788.59 ->-19788.49 ## -19788.46 -> -19788.46 ->-19788.46 ## -19788.45 -> -19788.45 ->-19788.45 ## -19788.45 -> -19788.45 ->-19788.45 ## -19788.45 -> -19788.45 ->-19788.45 ``` ``` ## [1] "beta1" "beta2" "beta3" "kappa2" "gamma3" ## [6] "x" "y" "cy" "wa" "epsilon" ## [11] "mhat" "ll" "BIC" "npar" "mtxLastYear" ``` ] .pull-right[ We call `fit701(.)` from the `fitModels.R` script. We store the results in the object `LCfit701`. Useful attributes in this object are: - `beta1` the estimates for `\(\beta_x^{(1)}\)` - `beta2` the estimates for `\(\beta_x^{(2)}\)` - `kappa2` the estimates for `\(\kappa_t^{(2)}\)` - the BIC - `epsilon` the residuals - `mhat` the fitted `\(\hat{\mu}_{x,t}\)`. The residuals `epsilon` are Pearson residuals in a Poisson setting: `$$\epsilon_{x,t} = \frac{d_{x,t}-\hat{\mu}_{x,t} \cdot E_{x,t}}{\sqrt{\hat{\mu}_{x,t} \cdot E_{x,t} }}.$$` ] --- class: clear <img src="Tutorial-1_files/figure-html/unnamed-chunk-61-1.svg" style="display: block; margin: auto;" /> --- name: Method 2: {StMoMo} # Method 2: {StMoMo} In the {StMoMo}-package, stochastic mortality models are formulated as a generalised linear or non-linear model. Parameters are calibrated using the optimization strategies available in the `glm(.)` routine (from R's {stats} library) or the library `{gnm}`. More details are in the vignette [here](https://cran.r-project.org/web/packages/StMoMo/vignettes/StMoMoVignette.pdf) or the 2018 paper in the Journal of Statistical Software (see [here](https://www.jstatsoft.org/article/view/v084i03)). As an example, the Poisson Lee-Carter model is formulated as: `$$\log \mathbb{E}[D_{x,t}] = \log E_{x,t} + \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)},$$` where `\(\log E_{x,t}\)` is an offset. We install and load the package before using it: ```r install.packages("StMoMo") library(StMoMo) ``` --- class: clear .pull-left[ ```r constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){ c1 <- mean(kt[1, ], na.rm = TRUE) c2 <- sum(bx[, 1], na.rm = TRUE) list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1)) } LC <- StMoMo(link = "log", staticAgeFun = TRUE, periodAgeFun = "NP", constFun = constLC) st_Data <- StMoMoData(data = Df, series = "male", type = "central") LCfit <- fit(object = LC, data = st_Data) names(LCfit) ``` ``` ## [1] "model" "ax" "bx" "kt" "b0x" ## [6] "gc" "data" "Dxt" "Ext" "oxt" ## [11] "wxt" "ages" "years" "cohorts" "fittingModel" ## [16] "loglik" "deviance" "npar" "nobs" "conv" ## [21] "fail" "call" ``` ] .pull-right[ The function `constLC` implements the identifiability constraints. As discussed in Module 2, `\begin{align*} \tilde{\beta}_x^{(1)} = \beta_x^{(1)} + c_1 \beta_x^{(2)}, \:\: \tilde{\beta}_x^{(2)} = \frac{\beta_x^{(2)}}{c_2}, \:\: \tilde{\kappa}^{(2)}_t = c_2 \cdot (\kappa_t^{(2)} - c_1), \end{align*}` determine equivalent parametrizations (for `\(c_1\)` and `\(c_2\neq 0\)`). Hence, Lee & Carter suggest the following set of constraints: `\begin{align*} \displaystyle \sum_x \beta_x^{(2)} = 1, \:\:\: \sum_t \kappa^{(2)}_t = 0, \end{align*}` which can be imposed by choosing (cfr. the R code) `\begin{align*} c_2=\sum_x \beta_x^{(2)} = \beta^{(2)}_{\bullet} \ \ \ c_1 = \frac{1}{T} \sum_t \kappa^{(2)}_t = \bar{\kappa} \end{align*}` We call `StMoMo(.)` to initialise a StMoMo object, representing a mortality model from the GAPC family. ] --- class: clear .pull-left[ ```r constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){ c1 <- mean(kt[1, ], na.rm = TRUE) c2 <- sum(bx[, 1], na.rm = TRUE) list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1)) } LC <- StMoMo(`link = "log"`, `staticAgeFun = TRUE`, `periodAgeFun = "NP"`, `constFun = constLC`) st_Data <- StMoMoData(data = Df, series = "male", type = "central") LCfit <- fit(object = LC, data = st_Data) names(LCfit) ``` ``` ## [1] "model" "ax" "bx" "kt" "b0x" ## [6] "gc" "data" "Dxt" "Ext" "oxt" ## [11] "wxt" "ages" "years" "cohorts" "fittingModel" ## [16] "loglik" "deviance" "npar" "nobs" "conv" ## [21] "fail" "call" ``` ] .pull-right[ The argument `link` specifies the link function and distributional assumption. With `"log"`, we assume Poisson distributed deaths with a log link for `\(\mu_{xt}\)`. Alternatively, `"logit"` assumes a binomial distribution and a logit link for `\(q_{xt}\)`. The argument `staticAgeFun` indicates whether a static age function `\(\beta_x^{(1)}\)` should be included, or not. The argument `periodAgeFun` indicates the number and type of period-age effects in the model. A single `"NP"` means that a non-parametric age effect `\(\beta_x^{(2)}\)` is used in the predictor. The argument `constFun` provides the user-defined identifiability constraints, as specified in `constLC`. ] --- class: clear .pull-left[ ```r constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){ c1 <- mean(kt[1, ], na.rm = TRUE) c2 <- sum(bx[, 1], na.rm = TRUE) list(ax = ax + c1 * bx, bx = bx / c2, kt = c2 * (kt - c1)) } LC <- StMoMo(link = "log", staticAgeFun = TRUE, periodAgeFun = "NP", constFun = constLC) st_Data <- `StMoMoData(data = Df, series = "male",` `type = "central")` LCfit <- `fit(object = LC, data = st_Data)` names(LCfit) ``` ``` ## [1] "model" "ax" "bx" "kt" "b0x" ## [6] "gc" "data" "Dxt" "Ext" "oxt" ## [11] "wxt" "ages" "years" "cohorts" "fittingModel" ## [16] "loglik" "deviance" "npar" "nobs" "conv" ## [21] "fail" "call" ``` ] .pull-right[ We create an StMoMoData object from the demography data `Df`, and fit the Lee-Carter model. Useful attributes of `LCfit` are: - `ax` the estimates for `\(\beta_x^{(1)}\)` - `bx` the estimates for `\(\beta_x^{(2)}\)` - `kt` the estimates for `\(\kappa_t^{(2)}\)` - `loglik` the log-likelihood of the model. The person residuals `\(\epsilon_{x,t}\)` and `\(\hat{\mu}_{x,t}\)` can be calculated from the model's fit. Let's have a look at the Lee-Carter parameter estimates obtained with this method! ] --- class: clear <img src="Tutorial-1_files/figure-html/unnamed-chunk-72-1.svg" style="display: block; margin: auto;" /> --- # Estimated mortality rates Using the calibrated Lee-Carter parameters `\(\hat{\beta}_x^{(1)}\)`, `\(\hat{\beta}_x^{(2)}\)`, and `\(\hat{\kappa}_t^{(2)}\)`, we estimate the force of mortality as follows: `\begin{align*} \hat{\mu}_{x,t} = \exp\left(\hat{\beta}_x^{(1)} + \hat{\beta}_x^{(2)} \cdot \hat{\kappa}_t^{(2)}\right). \end{align*}` The estimated force of mortality can be retrieved from `LCfit701` using the attribute `$mhat`. Using the Lee-Carter model fit from the {StMoMo} package, we compute the estimated force of mortality as: ```r mhat <- matrix(NA, nrow = length(years), ncol = length(ages), dimnames = list(years, ages)) for(t in 1:length(years)){ mhat[t,] <- exp(LCfit$ax + LCfit$bx * LCfit$kt[t]) } ``` The estimated mortality rates are obtained as follows: `\begin{align*} \hat{q}_{x,t} = 1 - \exp\left(- \hat{\mu}_{x,t}\right). \end{align*}` --- class: clear .pull-left[ We compute the observed and estimated mortality rates from the `LCfit701` object: ```r qobs <- 1 - exp(-dtx/etx) qhat <- 1 - exp(-LCfit701$mhat) ``` We construct a data frame that stores the observed and estimated mortality rates at age 25: ```r age <- `'25'` df.age <- data.frame(Year = years, obs = qobs[,age], fit = qhat[,age]) ``` We visualize these using {ggplot}. ```r ggplot(df.age) + geom_point(aes(x = Year, y = obs)) + geom_line(aes(Year, fit), col = RCLRbg, linewidth = 1) + ggtitle("Switzerland - males, 1970 - 2022") + ylab(bquote(hat(q)['t,`25`'])) + theme_bw(base_size = 15) ``` ] .pull-right[ <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-78-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ We compute the observed and estimated mortality rates from the `LCfit701` object: ```r qobs <- 1 - exp(-dtx/etx) qhat <- 1 - exp(-LCfit701$mhat) ``` We construct a data frame that stores the observed and estimated mortality rates at age 45: ```r age <- `'45'` df.age <- data.frame(Year = years, obs = qobs[,age], fit = qhat[,age]) ``` We visualize these using {ggplot}. ```r ggplot(df.age) + geom_point(aes(x = Year, y = obs)) + geom_line(aes(Year, fit), col = RCLRbg, linewidth = 1) + ggtitle("Switzerland - males, 1970 - 2022") + ylab(bquote(hat(q)['t,`45`'])) + theme_bw(base_size = 15) ``` ] .pull-right[ <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-83-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ We compute the observed and estimated mortality rates from the `LCfit701` object: ```r qobs <- 1 - exp(-dtx/etx) qhat <- 1 - exp(-LCfit701$mhat) ``` We construct a data frame that stores the observed and estimated mortality rates at age 65: ```r age <- `'65'` df.age <- data.frame(Year = years, obs = qobs[,age], fit = qhat[,age]) ``` We visualize these using {ggplot}. ```r ggplot(df.age) + geom_point(aes(x = Year, y = obs)) + geom_line(aes(Year, fit), col = RCLRbg, linewidth = 1) + ggtitle("Switzerland - males, 1970 - 2022") + ylab(bquote(hat(q)['t,`65`'])) + theme_bw(base_size = 15) ``` ] .pull-right[ <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-88-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ We compute the observed and estimated mortality rates from the `LCfit701` object: ```r qobs <- 1 - exp(-dtx/etx) qhat <- 1 - exp(-LCfit701$mhat) ``` We construct a data frame that stores the observed and estimated mortality rates at age 85: ```r age <- `'85'` df.age <- data.frame(Year = years, obs = qobs[,age], fit = qhat[,age]) ``` We visualize these using {ggplot}. ```r ggplot(df.age) + geom_point(aes(x = Year, y = obs)) + geom_line(aes(Year, fit), col = RCLRbg, linewidth = 1) + ggtitle("Switzerland - males, 1970 - 2022") + ylab(bquote(hat(q)['t,`85`'])) + theme_bw(base_size = 15) ``` ] .pull-right[ <br> <img src="Tutorial-1_files/figure-html/unnamed-chunk-93-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: The Cairns-Blake-Dowd (CBD) model is specified as follows: `\begin{align*} D_{x,t} &\sim \text{POI}(E_{x,t} \cdot \mu_{x,t}) \\ \log \mu_{x,t} &= \kappa_t^{(2)} + (x - \overline{x}) \cdot \kappa_{t}^{(3)}. \end{align*}` Here, `\(\overline{x}\)` represents the average age within the specified age range. Which identifiability constraints are necessary for the CBD model (if any)? Use the {StMoMo} package to calibrate the CBD model. Plot the parameter estimates using {ggplot}. Use the below instructions to plot. **Hint.** You can visualize the calibrated period effects as follows: ```r data_period <- tibble(year = years, k2 = ...) ggplot(...) + geom_point(aes(..., ...)) + geom_line(aes(..., ...), col = "black") + theme_bw(base_size = 15) + ggtitle("Switzerland - males, 1970 - 2022, CBD, Poisson") + labs(y = bquote(hat(kappa)[t]^"(2)")) ``` ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: The CBD model does not have identiability issues. We create a StMoMo object with `\(\beta_x^{(2)} = 1\)` for the second age-period effect and a parametric age function, defined through `b3`, for the third age-period effect `\(\beta_x^{(3)}\)`. The CBD model does not include a static age effect `\(\beta_x^{(1)}\)`. ```r b3 <- function(x, ages) x - mean(ages) CBD <- StMoMo(link = "log", staticAgeFun = FALSE, periodAgeFun = c("1", b3)) ``` Next, we fit the mortality model on the data object: ```r CBDfit <- fit(object = CBD, data = st_Data, verbose = FALSE) ``` Let's have a look at the CBD parameter estimates! ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-97-1.svg" style="display: block; margin: auto;" /> ] --- class: inverse, middle, center name: modelfit # Model selection and goodness-of-fit <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Model selection Various mortality model specifications have been proposed in the literature, cfr. Module 2. How to select an appropriate mortality model? Here are a few tools; more discussion in Modules 2-4. - .hi-pink[Goodness-of-Fit]: The model should accurately fit the historical, observed data. This can be evaluated using Pearson residuals: `\begin{align*} \epsilon_{x,t} = \frac{d_{x,t}-\hat{\mu}_{x,t} \cdot E_{x,t}}{\sqrt{\hat{\mu}_{x,t} \cdot E_{x,t}}}. \end{align*}` - .hi-pink[Predictive Performance]: The model should perform well on both in-sample statistical measures as well as out-of-time back-tests. - .hi-pink[Parsimony]: The model should have a good balance between complexity and goodness-of-fit. Compare the AIC/BIC values of different calibrated models to assess this balance. - .hi-pink[Reasonable]: The chosen model should lead to mortality rates (and their evolutions) that are biologically and demographically reasonable. - ... --- # Pearson residuals .pull-left[ We construct a {ggplot2} heatmap of the residuals produced by `fit701(.)`. ```r grid <- expand.grid(period = years, age = ages) grid$res <- as.vector(LCfit701$epsilon) names(grid) <- c("Year", "Age", "Residual") ggplot(grid, aes(x = Year, y = Age)) + geom_tile(aes(fill = Residual)) + scale_fill_gradientn(colours = topo.colors(7), breaks = c(-4,-2,0,2,4,6)) + theme_bw(base_size = 15) + theme(legend.position = "bottom") ``` ```r head(grid)[1:3,] ## Year Age Residual ## 1 1970 0 7.956282 ## 2 1971 0 6.009977 ## 3 1972 0 4.508311 ``` ] .pull-right[ <img src="Tutorial-1_files/figure-html/unnamed-chunk-101-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ Diagonal patterns in heatmaps of residuals might hint to the inclusion of a cohort effect in a stochastic mortality model. For a more in depth discussion of .hi-pink[cohort effects], we refer to Cairns et al. (2009) on [A quantitative comparison of stochastic mortality models using data from England and Wales and the United States](http://www.macs.hw.ac.uk/~andrewc/papers/naaj2009.pdf). In the longer version of this paper (available [online](http://www.macs.hw.ac.uk/~andrewc/papers/ajgc50.pdf)) the authors discuss the plot shown on the right (Figure 3 on page 7). This is often used as the rationale for including cohort effects in mortality forecasting models. However, calibrating and forecasting such cohort effects typically comes with a lot of difficulties! ] .pull-right[ .center[ <img src="img/cohort_effect_Cairns.png" width="100%" style="display: block; margin: auto;" /> ] This Figure shows .hi-pink[improvement rates] in mortality for England & Wales by calendar year and age relative to mortality rates at the same age in the previous year. Red cells imply that mortality is deteriorating; green small rates of improvement, and blue and white strong rates of improvement. The black diagonal line follows the .hi-pink[progress of the 1930 cohort]. ] --- # Thanks! <img src="img/xaringan.png" class="title-hex" width="50" height="50" align="right"> <br> <br> <br> <br> Slides created with the R package [xaringan](https://github.com/yihui/xaringan). <br> <br> <br> Course material available via <br> <img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/ <style type="text/css"> .pull-right ~ * { clear: unset; } .pull-right + * { clear: both; } </style> <style type="text/css"> .inverse .remark-slide-number { display: none; } </style>