class: center, middle, inverse, title-slide, clear # Modelling and quantifying mortality and longevity risk ## Tutorial session 2 <html> <div style="float:left"></div> <hr align='center' color='#106EB6' size=1px width=97%> </html> ### Katrien Antonio, Torsten Kleinow, Frank van Berkum, Michel Vellekoop and Jens Robben ### <a href="https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/">35th International Summer School of the Swiss Association of Actuaries </a> | June 3-7, 2024 ### <a> </a> --- class: inverse, center, middle name: prologue # Prologue <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- name: introduction # Introduction ### Workshop <img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/ The workshop's landing page on GitFront, where we upload slide decks, R code and data sets. ### Us <img align="left" src="img/Katrien.jpg" width="160"/ hspace="0"> <img align="left" src="img/Michel.jpg" width="160" hspace="45"/> <img align="left" src="img/Torsten.jpg" width="160" hspace="0"/> <img align="left" src="img/Frank.jpg" width="160" hspace="45"/> <img align="left" src="img/Jens.png" width="160" hspace="0"/> <img align="left" src="img/Empty.png" width="100" height="160" hspace="0"/> [Katrien Antonio](https://katrienantonio.github.io) [
](mailto:katrien.antonio@kuleuven.be) [Michel Vellekoop](https://www.uva.nl/profiel/v/e/m.h.vellekoop/m.h.vellekoop.html) [
](mailto:m.h.vellekoop@uva.nl) [Torsten Kleinow](https://www.uva.nl/profiel/k/l/t.kleinow/t.kleinow.html) [
](mailto:t.kleinow@uva.nl) [Frank van Berkum](https://www.uva.nl/profiel/b/e/f.vanberkum/f.vanberkum.html) [
](mailto:F.vanBerkum@uva.nl) [Jens Robben](https://jensrobben.github.io) [
](mailto:jens.robben@kuleuven.be) --- class: inverse, center, middle name: goals # Goals of tutorial session 2 <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Goals of tutorial session 2 .pull-left[ In this tutorial session you will learn to: * download, import, and visualize multi-population mortality data * calibrate a multi-population mortality model of type Li-Lee * generate scenarios for future mortality rates from single and multi-population mortality models ] .pull-right[ Outline of the workshop: * [Download and import multi-population mortality data](#data) * [Multi-population mortality model calibration](#fitlilee) * [Generating scenarios for future mortality rates](#forecasting) ] --- class: inverse, center, middle name: data # Download and import multi-population mortality data <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Data collection .pull-left[ ```r countries <- c('AT', 'BE', 'CH', 'DE', 'DK', 'FI', 'FR','IE', 'IS', 'LU', 'NL', 'NO', 'SE', 'UK') ``` ] .pull-right[ We construct a vector `countries` with selected ISO country codes. As motivated in Module 2, we opt for a set of countries with GDP per capita exceeding the average of the EUR zone. ] .pull-left[ ```r shapef <- read_sf( '../data/shapefile/NUTS_RG_01M_2021_4326.shp') shapef <- shapef %>% dplyr::filter(LEVL_CODE == 0) shapef[,'Select'] <- shapef$CNTR_CODE %in% countries ``` ] .pull-right[ We pre-downloaded a shape file from Eurostat, see [here](https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics). Read more about the shapefile format [here](https://en.wikipedia.org/wiki/Shapefile). We read in the shape file containing geometric data for European regions and filter it on country level (`LEVL_CODE == 0`) using the `filter` function from {dplyr}. We create a new Boolean variable `Select` to denote the inclusion of countries in the multi-population mortality data set (`TRUE` or `FALSE`). ] --- class: clear .pull-left[ We plot the selected countries using `ggplot` instructions, in particular the layer `geom_sf` is useful to visualize spatial data. The argument `geometry` specifies the shapes of the countries. Let's have a look at the result! ```r ggplot(shapef) + geom_sf(aes(fill = Select, group = CNTR_CODE, geometry = geometry), show.legend = FALSE) + coord_sf(xlim = c(-22,32), ylim = c(37,70)) + ggtitle('Multi-population data set') + theme_bw(base_size = 15) + scale_fill_manual(values = c('white', RCLRblue)) ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> ] --- # Life expectancy In the R script [life_exp_data.R](https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/raw/scripts/life_exp_data.R), we explain how to retrieve the country-specific period life expectancy at birth from the HMD using the `hmd.e0` function of the {demography} package. We ran this script in advance and stored the resulting data set in the GitHub repository, enabling direct access through: ```r Data_e0 <- readRDS("../data/hmd/life_exp.rds") ``` The countries stored in this data set are shown below. We print the first observations for Switzerland, as an example: ```r Data_e0$Country %>% unique() ## [1] "AT" "BE" "CH" "DE" "DK" "FI" "FR" "IE" "IS" "LU" "NL" "NO" "SE" "UK" ``` ```r head(Data_e0 %>% dplyr::filter(Country == 'CH'), n = 3) ## Year Country Female Male Total ## 1 1876 CH 41.93 38.53 40.19 ## 2 1877 CH 41.52 38.52 39.99 ## 3 1878 CH 41.88 39.30 40.57 ``` Our goal is to visualize the evolution in period life expectancy across the selected countries, similar to some of the graphs shown in Module 1 and 3. --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a male newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and the Netherlands in blue. ```r df <- Data_e0 %>% dplyr::filter(... >= 1920) colours <- rep('gray90', length(countries)) colours[which(countries %in% ...)] <- orange colours[which(countries %in% ...)] <- RCLRbg ggplot(df) + geom_line(aes(x = ..., y = ..., group = ..., colour = ...), linewidth = 1.1, show.legend = FALSE) + theme_bw(base_size = 15) + ggtitle("Male life expectancy") + ylab(bquote(e['0,t'])) + scale_color_manual(values = ...) + coord_cartesian(ylim = c(38, 85)) ``` ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a male newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and the Netherlands in blue. ```r df <- Data_e0 %>% dplyr::filter(`Year` >= 1920) colours <- rep('gray90', length(countries)) colours[which(countries %in% `c('CH')`)] <- orange colours[which(countries %in% `c('NL')`)] <- RCLRbg ggplot(df) + geom_line(aes(x = `Year`, y = `Male`, group = `Country`, colour = `Country`), linewidth = 1.1, show.legend = FALSE) + theme_bw(base_size = 15) + ggtitle("Male life expectancy") + ylab(bquote(e['0,t'])) + scale_color_manual(values = `colours`) + coord_cartesian(ylim = c(38, 85)) ``` ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a male newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and of Netherlands in blue. <br> <img src="Tutorial-2_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a female newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and of the Netherlands in blue. <br> <img src="Tutorial-2_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> ] --- # Download mortality data In the R-script [multi_pop_mort_data.R](https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/raw/scripts/multi_pop_mort_data.R), we explain how to construct a multi-population mortality data set, including deaths and exposures for each considered country, by downloading mortality data from the HMD using `hmd.mx` function ({demography}). We stored the data set in the GitHub repository and access it through: ```r Data_M <- readRDS("../data/hmd/data_M.rds") Data_F <- readRDS("../data/hmd/data_F.rds") names(Data_M) ## [1] "UNI" "ALL" ``` The data objects `Data_M` (for males) and `Data_F` (for females) consist of two entries: `$UNI` contains the .hi-pink[country-specific] mortality data sets and `$ALL` the combined mortality data set, where we .hi-pink[aggregated] over all countries. The country-specific mortality data sets include the following countries: ```r names(Data_M$UNI) ## [1] "AT" "BE" "CH" "DE" "DK" "FI" "FR" "IE" "IS" "LU" "NL" "NO" "SE" "UK" ``` and consists of deaths (`$dtx`), exposures (`$etx`), and weights (`$wa`) for the years 1970-2019 and ages 0-90: ```r names(Data_M$UNI$CH) ## [1] "dtx" "etx" "wa" ``` --- class: clear .pull-left[ ```r *mat_etx <- sapply(countries, function(c) * Data_M$UNI[[c]][['etx']]["2019",]) df_etx <- reshape2::melt(mat_etx) colnames(df_etx) <- c('Age','Country','Value') order <- c('IS','LU', 'IE', 'NO','DK','FI','CH', 'AT', 'SE','BE','NL','FR','UK','DE') df_etx$Country <- factor(df_etx$Country, levels = order) ggplot(df_etx) + geom_bar(aes(x = Age, y = Value, group = Country, fill = Country, alpha = Country), stat = 'identity', width = 1) + theme_bw(base_size = 15) + ylab('Stacked exposure') + scale_fill_manual(values = pal) + scale_alpha_manual(values = ifelse( order=='CH',1,0.25)) ``` ] .pull-right[ We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`. The resulting `mat_etx` has dimension `\(91 \times 14\)`, with ages 0-90 and 14 countries under consideration. ] --- class: clear .pull-left[ ```r mat_etx <- sapply(countries, function(c) Data_M$UNI[[c]][['etx']]["2019",]) *df_etx <- reshape2::melt(mat_etx) *colnames(df_etx) <- c('Age','Country','Value') order <- c('IS','LU', 'IE', 'NO','DK','FI','CH', 'AT', 'SE','BE','NL','FR','UK','DE') df_etx$Country <- factor(df_etx$Country, levels = order) ggplot(df_etx) + geom_bar(aes(x = Age, y = Value, group = Country, fill = Country, alpha = Country), stat = 'identity', width = 1) + theme_bw(base_size = 15) + ylab('Stacked exposure') + scale_fill_manual(values = pal) + scale_alpha_manual(values = ifelse( order=='CH',1,0.25)) ``` ] .pull-right[ We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`. We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`. ] --- class: clear .pull-left[ ```r mat_etx <- sapply(countries, function(c) Data_M$UNI[[c]][['etx']]["2019",]) df_etx <- reshape2::melt(mat_etx) colnames(df_etx) <- c('Age','Country','Value') *order <- c('IS','LU', 'IE', 'NO','DK','FI','CH', * 'AT', 'SE','BE','NL','FR','UK','DE') *df_etx$Country <- factor(df_etx$Country, * levels = order) ggplot(df_etx) + geom_bar(aes(x = Age, y = Value, group = Country, fill = Country, alpha = Country), stat = 'identity', width = 1) + theme_bw(base_size = 15) + ylab('Stacked exposure') + scale_fill_manual(values = pal) + scale_alpha_manual(values = ifelse( order=='CH',1,0.25)) ``` ] .pull-right[ We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`. We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`. We fix the order in which countries should be displayed on the plot. ] --- class: clear .pull-left[ ```r mat_etx <- sapply(countries, function(c) Data_M$UNI[[c]][['etx']]["2019",]) df_etx <- reshape2::melt(mat_etx) colnames(df_etx) <- c('Age','Country','Value') order <- c('IS','LU', 'IE', 'NO','DK','FI','CH', 'AT', 'SE','BE','NL','FR','UK','DE') df_etx$Country <- factor(df_etx$Country, levels = order) *ggplot(df_etx) + * geom_bar(aes(x = Age, y = Value, * group = Country, * fill = Country, * alpha = Country), * stat = 'identity', width = 1) + * theme_bw(base_size = 15) + * ylab('Stacked exposure') + * scale_fill_manual(values = pal) + * scale_alpha_manual(values = ifelse( * order=='CH',1,0.25)) ``` ] .pull-right[ We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`. We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`. We fix the order in which countries should be displayed on the plot. We create a stacked exposure plot showing exposures for each country at each individual age. We highlight the Swiss exposures in red and apply shading to differentiate with other countries. In {ggplot} alpha refers to opacity. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors. ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-21-1.svg" style="display: block; margin: auto;" /> --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Create a similar plot that stacks the country-specific death counts in 2019 by age. Highlight the Swiss death counts in red. ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Create a similar plot that stacks the country-specific death counts in 2019 by age. Highlight the Swiss death counts in red. ```r mat_dtx <- sapply(countries, function(c) Data_M$UNI[[c]][['dtx']]["2019",]) df_dtx <- reshape2::melt(mat_dtx) colnames(df_dtx) <- c('Age','Country','Value') order <- c('IS','LU', 'IE', 'NO','DK','FI','CH', 'AT', 'SE','BE','NL','FR','UK','DE') df_dtx$Country <- factor(df_dtx$Country, levels = order) ggplot(df_dtx) + geom_bar(aes(x = Age, y = Value, group = Country, fill = Country, alpha = Country), stat = 'identity', width = 1) + theme_bw(base_size = 15) + ylab('Stacked death counts') + scale_fill_manual(values = pal) + scale_alpha_manual(values = ifelse( order=='CH', 1, 0.25)) ``` ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-23-1.svg" style="display: block; margin: auto;" /> --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: What is the reason behind the sudden decrease in the stacked exposure at the age of 73? <img src="Tutorial-2_files/figure-html/unnamed-chunk-24-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">A</span>**: Recall that our plots were constructed for data from the year 2019. People aged 73 in the year 2019 were born in 1946, the start of the so-called baby boom, with many newborns in 1946 and also 1947. An increase in exposure is noticeable when moving from the right (ages > 73) to the left in the exposure plot. ] --- class: inverse, center, middle name: fitlilee # Multi-population mortality model calibration <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # A multi-population model of type Li-Lee .pull-left[ We put focus on the Poisson multi-population mortality model of type Li-Lee: `\begin{align*} D_{x,t}^{(c)} &\sim \text{POI}\left(E_{x,t}^{(c)} \cdot \mu_{x,t}^{(c)} \right) \\ \log \mu_{x,t}^{(c)} &= \log \mu_{x,t}^{(EU)} + \log \tilde{\mu}_{x,t}^{(c)}, \end{align*}` where `\((c)\)` refers to the country of interest, e.g., NL or CH. ] .pull-right[ In this tutorial, we calibrate the Li-Lee model in two steps and build (for each step) on what we learned in Tutorial 1: 1. calibration of the .hi-pink[common] European mortality trend: `\begin{align*} D_{x,t}^{(EU)} &\sim \text{POI}\left(E_{x,t}^{(EU)} \cdot \mu_{x,t}^{(EU)} \right) \\ \log \mu_{x,t}^{(EU)} &= A_x + B_x K_t. \end{align*}` on the European (aggregated) death counts and exposures 2. calibration of the .hi-pink[country-specific deviation] from the common trend: `\begin{align*} D_{x,t}^{(c)} &\sim \text{POI}\left(E_{x,t}^{(c)} \cdot \mu_{x,t}^{(c)} \right) \\ \log \mu_{x,t}^{(c)} &= (\hat{A}_x + \hat{B}_x \hat{K}_t) + (\alpha_x + \beta_x \kappa_t) \\ &= \log \hat{\mu}_{x,t}^{(EU)} + (\alpha_x + \beta_x \kappa_t) \end{align*}` on the country-specific death counts and exposures. ] --- # Calibration step 1: EU mortality trend .pull-left[ ```r years <- 1970:2019 ages <- 0:90 etxALL <- Data_M$ALL$etx dtxALL <- Data_M$ALL$dtx waALL <- Data_M$ALL$wa ``` ] .pull-right[ We specify the calibration period as `years` and the age range as `ages`. We extract the (aggregated) European exposures, deaths, and weights and put these in `etxALL`, `dtxALL`, and `waALL` respectively. ] .pull-left[ ```r source('../scripts/fitModels.R') LC1 <- fit701(ages, years, etxALL, dtxALL, waALL) ``` ] .pull-right[ We fit the Lee-Carter model on the European data using the `fit701` function (as discussed in Tutorial 1). ] .pull-left[ ```r A.x <- LC1$beta1 B.x <- LC1$beta2 K.t <- LC1$kappa2 sum(B.x); sum(K.t) ## [1] 1 ## [1] 5.484502e-14 ``` ] .pull-right[ We store the calibrated parameters `\(\hat{A}_x\)`, `\(\hat{B}_x\)`, and `\(\hat{K}_t\)` in `A.x`, `B.x`, and `K.t`, respectively, and verify the identifiability constraints. ] .pull-left[ ```r mutxALL <- LC1$mhat ``` ] .pull-right[ We extract the calibrated force of mortality at EU level `\(\hat{\mu}_{x,t}^{(EU)}\)` and store it in `mutxALL`. ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-29-1.svg" style="display: block; margin: auto;" /> --- # Calibration step 2: Swiss mortality deviation .pull-left[ ```r years <- 1970:2019 ages <- 0:90 etxCH <- Data_M$UNI$CH$etx dtxCH <- Data_M$UNI$CH$dtx waCH <- Data_M$UNI$CH$wa ``` ```r LC2 <- fit701(ages, years, etxCH * mutxALL, dtxCH, waCH) ``` ```r a.x <- LC2$beta1 b.x <- LC2$beta2 k.t <- LC2$kappa2 ``` ```r sum(b.x); sum(k.t) ## [1] 1 ## [1] -9.048318e-15 ``` ```r mutxCH <- mutxALL * LC2$mhat ``` ] .pull-right[ We now focus on calibrating the country-specific deviation from the EU mortality trend: `\begin{align*} D_{x,t}^{(c)} &\sim \text{POI}\left((E_{x,t}^{(c)} \cdot \hat{\mu}_{x,t}^{(EU)}) \cdot \tilde{\mu}_{x,t}^{(c)} \right) \\ \log \tilde{\mu}_{x,t}^{(c)} &= \alpha_x + \beta_x \kappa_t. \end{align*}` As such, we calibrate the Swiss mortality deviation from the death counts `\(d_{x,t}^{(CH)}\)` and adjusted exposures `\(E_{x,t}^{(CH)} \times \hat{\mu}_{x,t}^{(EU)}\)`. We store the calibrated parameters `\(\hat{\alpha}_x\)`, `\(\hat{\beta}_x\)`, and `\(\hat{\kappa}_t\)` in `a.x`, `b.x`, and `k.t`, respectively. We verify the identifiability constraints. We calculate the estimated Swiss force of mortality `\(\hat{\mu}_{x,t}^{(CH)}\)` and store it in `mutxCH`. ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-35-1.svg" style="display: block; margin: auto;" /> --- name: Goodness-of-fit # Goodness-of-fit .pull-left[ We construct a {ggplot2} heatmap of the residuals. ```r grid <- expand.grid(period = years, age = ages) grid$res <- as.vector(LC2$epsilon) names(grid) <- c("Year", "Age", "Residual") p <- ggplot(grid, aes(x = Year, y = Age)) + geom_tile(aes(fill = Residual)) + scale_fill_gradientn(colours = topo.colors(7), breaks = c(-4,-2,0,2,4,6)) + theme_bw(base_size = 15) + theme(legend.position = "bottom") p ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-37-1.svg" style="display: block; margin: auto;" /> ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-39-1.svg" style="display: block; margin: auto;" /> --- class: clear .left-column[ <br>
## Your turn ] .right-column[ **<span style="color:red">Q</span>**: Calibrate the Li-Lee on Swiss female data. Visualize the parameter estimates and perform some goodness-of-fit tests. As a challenge you may expand the multi-population data set with pandemic data points (2020 and 2021) as well as the year 2022 observations. Recalibrate the Li-Lee models and discuss your findings. ] --- class: inverse, center, middle name: forecasting # Generating scenarios for future mortality rates <html><div style='float:left'></div><hr color='#FAFAFA' size=1px width=796px></html> --- # Time dynamics in the Li-Lee model As discussed in Modules 2-3, we use the ARIMA toolbox to find a suitable time series specification for the fitted period effects. .pull-left[ We use the .hi-pink[Random Walk with Drift (RWD)], or ARIMA(0,1,0), for the period effect in the Lee-Carter model and for the common period effect in the Li-Lee model. We denote the calibrated period effects by `\(\hat{K}_t\)`. This implies: `\begin{align*} \hat{K}_{t} &= \hat{K}_{t-1}+\theta+\epsilon_{t} \\ \epsilon_t & \sim N(0,\sigma_{\epsilon}^2), \end{align*}` where `\(\theta\)` is the drift parameter and `\(\epsilon_t\)` are i.i.d. error terms. We have to estimate the unknown parameters `\(\theta\)` and `\(\sigma_{\epsilon}^2\)`. Explain intuitively how you would do this. ] .pull-right[ We use the .hi-pink[autoregressive process of order 1], or ARIMA(1,0,0), for the country-specific period effect in the Li-Lee model. We denote the calibrated period effects by `\(\hat{\kappa}_t\)`. This implies: `\begin{align*} \hat{\kappa}_{t} &= c+a \hat{\kappa}_{t-1}+\delta_t \\ \delta_t & \sim N(0,\sigma_{\delta}^2), \end{align*}` where `\(a\)` is the AR(1) parameter, `\(c\)` the intercept, and `\(\delta_t\)` are i.i.d. error terms. We have to estimate the unknown parameters `\(a\)`, `\(c\)`, and `\(\sigma_{\delta}^2\)`. ] --- # Time dynamics in the Li-Lee model (cont'd) .pull-left[ As explained in Module 3, AG2022 multi-population mortality model in fact uses a multivariate time series for `\((K_t^m,\: K_t^v,\: \kappa_t^m,\: \kappa_t^v)\)`, both in the calibration as well as simulation steps. In this tutorial we take a *simplified* approach. We focus on one gender, and calibrate the time series models for `\(K_t\)` and `\(\kappa_t\)` independent from each other. ] .pull-right[ Some hints and references for the calibration of the 4-dimensional time series model outlined in Module 3: - assume that the four-dimensional vector of error terms `\((\epsilon_t^m, \epsilon_t^v, \delta_t^m, \delta_t^v)\)` is time-independent and follows a multivariate Gaussian distribution with mean `\((0,0,0,0)\)` and covariance matrix `\(\boldsymbol{C}\)`. - based on this distributional assumption, you can set up the Gaussian log-likelihood function using the observations `\((K_t^m, K_t^v, \kappa_t^m, \kappa_t^v)\)`. This log-likelihood function can then be implemented in R. - next, optimize the log-likelihood with respect to the parameters `\(\theta^m\)`, `\(\theta^v\)`, `\(a^m\)`, `\(a^v\)`, `\(c^m\)`, `\(c^v\)`, and `\(\boldsymbol{C}\)`. You can use standard multi-parameter optimization algorithms, such as the `optim` or `nlminb` functions available in the R package {stats}. ] --- class: clear .pull-left[ ```r library(forecast) df_K.t <- data.frame(Year = years, K.t = K.t) ts_RWD <- Arima(df_K.t$K.t, order = c(0, 1, 0), include.drift = TRUE) ts_RWD ## Series: df_K.t$K.t ## ARIMA(0,1,0) with drift ## ## Coefficients: ## drift ## -2.0014 ## s.e. 0.2199 ## ## sigma^2 = 2.418: log likelihood = -90.65 ## AIC=185.3 AICc=185.56 BIC=189.09 ``` ] .pull-right[ We continue with the Li-Lee parameters calibrated earlier on in this tutorial. We store the calibrated common period effect `\(\hat{K}_t\)` in the data frame `df_K.t`. We use the `Arima(.)` function from the {forecast} package to fit the RWD to the calibrated `\(\hat{K}_t\)`. We store the calibrated model in `ts_RWD`. The argument `order` specifies the ARIMA model: `c(0,1,0)` refers to the random walk. More general, in the `Arima(.)` function the components `(p, d, q)` are the AR order, the degree of differencing and the MA order. We put `include.drift = TRUE` to include the drift term (denoted as `\(\theta\)` on the previous sheet). ] --- class: clear .pull-left[ We call the `simulate` function form the {forecast} package to generate 1 000 scenarios or paths for the random walk with drift. We append each generated path with the calibrated parameter estimate in the last observed year, i.e., `\(\hat{K}_{2019}\)`. We simulate 51 years ahead in the future, from 2020 to 2070. ```r K.tmax <- K.t[length(years)] sim_RWD <- replicate(1000, c(K.tmax, simulate(ts_RWD, nsim = 51))) ``` We store the generated scenarios in long format, in a data frame, using the `melt` function from the {reshape2} package, suitable for visualizing using `ggplot` instructions. We also keep track of the year to which the simulated period effect corresponds. ```r df_sim_RWD <- reshape2::melt(sim_RWD, varnames = c('Year','Sim')) df_sim_RWD$Year <- 2019 + df_sim_RWD$Year - 1 ``` ] .pull-right[ ```r head(df_sim_RWD) ## Year Sim value ## 1 2019 1 -53.46803 ## 2 2020 1 -54.52742 ## 3 2021 1 -56.30656 ## 4 2022 1 -61.05451 ## 5 2023 1 -62.04191 ## 6 2024 1 -63.97753 ``` ```r tail(df_sim_RWD) ## Year Sim value ## 51995 2065 1000 -152.3061 ## 51996 2066 1000 -153.4829 ## 51997 2067 1000 -158.0877 ## 51998 2068 1000 -159.3650 ## 51999 2069 1000 -160.1182 ## 52000 2070 1000 -161.2104 ``` ] --- class: clear .pull-left[ We visualize simulation 1: ```r ggplot() + geom_line(aes(x = Year, y = K.t), data = df_K.t) + geom_point(aes(x = Year, y = K.t), data = df_K.t) + geom_line(aes(x = Year, y = value), data = df_sim_RWD %>% dplyr::filter(Sim %in% c(1)), col = RCLRbg) + theme_bw(base_size = 15) + xlab('Year') + ylab(bquote(K[t])) + xlim(1969, 2071) ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-46-1.svg" style="display: block; margin: auto;" /> ] --- class: clear We visualize simulations 1 - 2: <img src="Tutorial-2_files/figure-html/unnamed-chunk-47-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 3: <img src="Tutorial-2_files/figure-html/unnamed-chunk-48-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 100: <img src="Tutorial-2_files/figure-html/unnamed-chunk-49-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 1000: <img src="Tutorial-2_files/figure-html/unnamed-chunk-50-1.svg" style="display: block; margin: auto;" /> --- class: clear .pull-left[ The best-estimate for the future value `\(K_{t_{\max} + h}\)`, `\(h\)` years from `\(t_{\max}\)`, is `\begin{align*} K_t + h \theta \end{align*}` The 95% bounds for the prediction intervals are `\begin{align*} \pm 1.96 \sqrt{h} \sigma_{\epsilon}, \end{align*}` on top of the best-estimate value. We calculate the best-estimate and the prediction intervals using the `forecast` function from the {forecast} package. We store the results in `df_cb.Kt` ```r fcast <- forecast(ts_RWD, h = 51, level = 95) df_cb_K.t <- data.frame( 'Year' = 2019:2070, 'BE' = c(K.tmax, fcast$mean), 'Q1' = c(K.tmax, fcast$lower[,1]), 'Q2' = c(K.tmax, fcast$upper[,1])) ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-52-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ Alternatively, we can calculate the empirical, point-wise prediction intervals by taking the 2.5%, 50%, and 97.5% quantile of the 1 000 simulated values for `\(K_{t_{\max} +h}\)`, at every `\(h>0\)`. ```r df_cb_K.t <- df_sim_RWD %>% group_by(Year) %>% reframe('MD' = quantile(value, 0.5), 'Q1' = quantile(value, 0.025), 'Q2' = quantile(value, 0.975)) ``` Let us have a look at the prediction intervals produced with this approach! ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-54-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ ```r library(forecast) df_k.t <- data.frame(Year = years, k.t = k.t) ts_AR1 <- Arima(df_k.t$k.t, order = c(1, 0, 0), include.mean = TRUE, method = "ML") c <- ts_AR1$coef[2]*(1-ts_AR1$coef[1]) c ## intercept ## -0.06202683 lt_be <- as.numeric(c/(1-ts_AR1$coef[1])) lt_be ## [1] -3.550983 ``` ] .pull-right[ We continue with the AR(1) specification for the Swiss-specific period effect, stored in `k.t`. We store the calibrated Swiss period effect `\(\hat{\kappa}_t\)` in the data frame `df_k.t`. We use the `Arima(.)` function from the {forecast} package to fit the AR(1)-process with intercept to the calibrated `\(\hat{\kappa}_t\)`. The argument `order` specifies the ARIMA model: `c(1,0,0)` refers to the AR(1)-process. We set `include.mean = TRUE` to include an intercept, as explained in the section *Understanding constants in R* in [this source](https://otexts.com/fpp2/arima-r.html). The connection between the fitted mean `\(\mu\)` and the intercept `\(c\)` in our AR(1) specification is: `\(c = \mu (1-a)\)`, with `\(a\)` the AR(1) coefficient. As explained in Module 3, in the long term the `\(\kappa_t\)` series converges to a constant, stored in `lt_be`. ] --- class: clear .pull-left[ We call the `simulate` function form the {forecast} package to generate 1 000 scenarios for the AR(1) process. We append each forecast with the calibrated parameter estimate in the last year, i.e., `\(\hat{\kappa}_{2019}\)`. ```r k.tmax <- k.t[length(years)] sim_AR1 <- replicate(1000, c(k.tmax, simulate(ts_AR1, nsim = 51))) ``` We put the generated scenarios in a long data frame using the `melt` function from the {reshape2} package, suitable for visualizing using `ggplot` instructions. We also keep track of the year to which the simulated period effect corresponds. ```r df_sim_AR1 <- reshape2::melt(sim_AR1, varnames = c('Year','Sim')) df_sim_AR1$Year <- 2019 + df_sim_AR1$Year - 1 ``` ] .pull-right[ ```r head(df_sim_AR1) ## Year Sim value ## 1 2019 1 -10.50163 ## 2 2020 1 -10.25245 ## 3 2021 1 -12.46729 ## 4 2022 1 -13.19151 ## 5 2023 1 -14.41084 ## 6 2024 1 -13.40085 ``` ```r tail(df_sim_AR1) ## Year Sim value ## 51995 2065 1000 -2.083091 ## 51996 2066 1000 -3.044011 ## 51997 2067 1000 -3.461855 ## 51998 2068 1000 -2.407353 ## 51999 2069 1000 -2.207797 ## 52000 2070 1000 -1.618796 ``` ] --- class: clear .pull-left[ We visualize simulation 1: ```r ggplot() + geom_line(aes(x = Year, y = k.t), data = df_k.t) + geom_point(aes(x = Year, y = k.t), data = df_k.t) + geom_line(aes(x = Year, y = value), data = df_sim_AR1 %>% dplyr::filter(Sim %in% c(1)), col = pal[1]) + theme_bw(base_size = 15) + xlab('Year') + ylab(bquote(kappa[t])) + xlim(1969, 2071) ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-61-1.svg" style="display: block; margin: auto;" /> ] --- class: clear We visualize simulations 1 - 2: <img src="Tutorial-2_files/figure-html/unnamed-chunk-62-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 3: <img src="Tutorial-2_files/figure-html/unnamed-chunk-63-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 100: <img src="Tutorial-2_files/figure-html/unnamed-chunk-64-1.svg" style="display: block; margin: auto;" /> --- class: clear We visualize simulations 1 - 1 000: <img src="Tutorial-2_files/figure-html/unnamed-chunk-65-1.svg" style="display: block; margin: auto;" /> --- class: clear .pull-left[ The best-estimate for the future value `\(\kappa_{t_{\max} + h}\)`, `\(h\)` years from `\(t_{\max}\)`, is `\begin{align*} \frac{c}{1-a} + \left(\kappa_{t_{\max}} - \frac{c}{1-a} \right) a^h \end{align*}` The 95% confidence bounds are calculated as `\begin{align*} \pm 1.96\times \sigma_{\delta} \times \frac{\sqrt{1-a^{2h}}}{\sqrt{1-a^2}}, \end{align*}` on top of the best-estimate value. We calculate the best-estimate and the confidence bounds using the `forecast` function from the {forecast} package. We store the results in `df_cb.kt` ```r fcast <- forecast(ts_AR1, h = 51, level = 95) df_cb_k.t <- data.frame( 'Year' = 2019:2070, 'BE' = c(k.tmax, fcast$mean), 'Q1' = c(k.tmax, fcast$lower[,1]), 'Q2' = c(k.tmax, fcast$upper[,1])) ``` ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-67-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .pull-left[ Alternatively, we can calculate the empirical, point-wise prediction intervals by taking the 2.5%, 50%, and 97.5% quantile of the 1 000 simulated values for `\(\kappa_{t_{\max} +h}\)`, at every `\(h>0\)`. ```r df_cb_k.t <- df_sim_AR1 %>% group_by(Year) %>% reframe('MD' = quantile(value, 0.5), 'Q1' = quantile(value, 0.025), 'Q2' = quantile(value, 0.975)) ``` Let us have a look at the prediction intervals produced with this approach! ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-69-1.svg" style="display: block; margin: auto;" /> ] --- class: clear .left-column[ <br>
## Your turn ] .right-column[ Empirically examine the long term convergence of the `\(\kappa_t\)` series for Switzerland. As explained in Module 3, the series `$$\kappa_t = c+ a \cdot \kappa_{t-1}+\delta_t,$$` with i.i.d. Gaussian noise terms `\(\delta_t \sim N(0,\sigma^2_{\delta})\)`, converges to `\(\frac{c}{1-a}\)` in the long run. We already calculated this constant and stored the result in `lt_be`. ```r lt_be ## [1] -3.550983 ``` Do you expect this convergence to go slow or rather fast? Why? ] --- class: clear .pull-left[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-71-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ <img src="Tutorial-2_files/figure-html/unnamed-chunk-72-1.svg" style="display: block; margin: auto;" /> ] --- # Scenarios for future mortality rates .pull-left[ We generate paths for future mortality rates by combining the calibrated age effects with the generated scenarios for the period effects. For each scenario `\(i\)`, we calculate `\begin{align*} \hat{\mu}_{x,t,i} &= \exp\left(\hat{A}_x + \hat{B}_x \hat{K}_{t,i} + \hat{\alpha}_x + \hat{\beta}_x \hat{\kappa}_{t,i} \right) \\ \hat{q}_{x,t,i} &= 1 - \exp\left(- \hat{\mu}_{x,t,i}\right). \end{align*}` where - `\(\hat{K}_{t,i}\)` is the `\(i\)`-th generated scenario of the common period effect at time `\(t\)`, - `\(\hat{\kappa}_{t,i}\)` is the `\(i\)`-th generated scenario of the Swiss period effect at time `\(t\)`. ] .pull-right[ We obtain the generated mortality rates as ```r sim.qtxi <- array(NA, dim = c(52, length(ages), 1000), dimnames = list(2019:2070, ages, 1:1000)) for(x in ages){ mu.txi <- exp(A.x[x+1] + B.x[x+1] * sim_RWD + a.x[x+1] + b.x[x+1] * sim_AR1) sim.qtxi[,x+1,] <- 1 - exp(-mu.txi) } ``` ] --- class: clear <img src="Tutorial-2_files/figure-html/unnamed-chunk-74-1.svg" style="display: block; margin: auto;" /> --- # Thanks! <img src="img/xaringan.png" class="title-hex" width="50" height="50" align="right"> <br> <br> <br> <br> Slides created with the R package [xaringan](https://github.com/yihui/xaringan). <br> <br> <br> Course material available via <br> <img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/ <style type="text/css"> .pull-right ~ * { clear: unset; } .pull-right + * { clear: both; } </style> <style type="text/css"> .inverse .remark-slide-number { display: none; } </style>