Tutorial-2.knit

# Modelling and quantifying mortality and longevity risk 
## Tutorial session 2

### Katrien Antonio, Torsten Kleinow, Frank van Berkum, Michel Vellekoop and Jens Robben
### <a href="https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/">35th International Summer School of the Swiss Association of Actuaries </a> | June 3-7, 2024
###  <a> </a>

---

# Prologue

---

# Introduction

### Workshop

<img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/

The workshop's landing page on GitFront, where we upload slide decks, R code and data sets.

### Us

[Katrien Antonio](https://katrienantonio.github.io) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:katrien.antonio@kuleuven.be) &nbsp; &nbsp; &nbsp; [Michel Vellekoop](https://www.uva.nl/profiel/v/e/m.h.vellekoop/m.h.vellekoop.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:m.h.vellekoop@uva.nl) &nbsp; &nbsp;  [Torsten Kleinow](https://www.uva.nl/profiel/k/l/t.kleinow/t.kleinow.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:t.kleinow@uva.nl) &nbsp; &nbsp; &nbsp; [Frank van Berkum](https://www.uva.nl/profiel/b/e/f.vanberkum/f.vanberkum.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:F.vanBerkum@uva.nl) &nbsp; &nbsp; [Jens Robben](https://jensrobben.github.io) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:jens.robben@kuleuven.be)

---

# Goals of tutorial session 2

---

# Goals of tutorial session 2

In this tutorial session you will learn to:

* download, import, and visualize multi-population mortality data

* calibrate a multi-population mortality model of type Li-Lee

* generate scenarios for future mortality rates from single and multi-population mortality models

]

* [Download and import multi-population mortality data](#data)

* [Multi-population mortality model calibration](#fitlilee)

* [Generating scenarios for future mortality rates](#forecasting)

]
---

# Download and import multi-population mortality data

---

# Data collection

```r
countries <- c('AT', 'BE', 'CH', 'DE', 'DK', 
               'FI', 'FR','IE', 'IS', 'LU', 
               'NL', 'NO', 'SE', 'UK')
```

]

We construct a vector `countries` with selected ISO country codes.

As motivated in Module 2, we opt for a set of countries with GDP per capita exceeding the average of the EUR zone.

]

```r
shapef <- read_sf(
  '../data/shapefile/NUTS_RG_01M_2021_4326.shp')

shapef <- shapef %>% dplyr::filter(LEVL_CODE == 0)

shapef[,'Select'] <- shapef$CNTR_CODE %in% countries
```

]

We pre-downloaded a shape file from Eurostat, see [here](https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics). Read more about the shapefile format [here](https://en.wikipedia.org/wiki/Shapefile).

We read in the shape file containing geometric data for European regions and filter it on country level (`LEVL_CODE == 0`) using the `filter` function from {dplyr}.

We create a new Boolean variable `Select` to denote the inclusion of countries in the multi-population mortality data set (`TRUE` or `FALSE`).

]

---
class: clear

We plot the selected countries using `ggplot` instructions, in particular the layer `geom_sf` is useful to visualize spatial data. The argument `geometry` specifies the shapes of the countries.

Let's have a look at the result!

```r
ggplot(shapef) + 
  geom_sf(aes(fill = Select, group = CNTR_CODE,
              geometry = geometry),
          show.legend = FALSE) + 
  coord_sf(xlim = c(-22,32), ylim = c(37,70)) +
  ggtitle('Multi-population data set') + 
  theme_bw(base_size = 15) +
  scale_fill_manual(values = 
                      c('white', RCLRblue))
```

]

<img src="Tutorial-2_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]
---

# Life expectancy

In the R script [life_exp_data.R](https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/raw/scripts/life_exp_data.R), we explain how to retrieve the country-specific period life expectancy at birth from the HMD using the `hmd.e0` function of the {demography} package.

We ran this script in advance and stored the resulting data set in the GitHub repository, enabling direct access through:

```r
Data_e0 <- readRDS("../data/hmd/life_exp.rds")  
```

The countries stored in this data set are shown below. We print the first observations for Switzerland, as an example:

```r
Data_e0$Country %>% unique()
##  [1] "AT" "BE" "CH" "DE" "DK" "FI" "FR" "IE" "IS" "LU" "NL" "NO" "SE" "UK"
```

```r
head(Data_e0 %>% dplyr::filter(Country == 'CH'), n = 3)
##   Year Country Female  Male Total
## 1 1876      CH  41.93 38.53 40.19
## 2 1877      CH  41.52 38.52 39.99
## 3 1878      CH  41.88 39.30 40.57
```

Our goal is to visualize the evolution in period life expectancy across the selected countries, similar to some of the graphs shown in Module 1 and 3.

---
class: clear

.left-column[
<br>
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:2.5em;width:2.5em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:black;overflow:visible;position:relative;"><path d="M373.1 24.97C401.2-3.147 446.8-3.147 474.9 24.97L487 37.09C515.1 65.21 515.1 110.8 487 138.9L289.8 336.2C281.1 344.8 270.4 351.1 258.6 354.5L158.6 383.1C150.2 385.5 141.2 383.1 135 376.1C128.9 370.8 126.5 361.8 128.9 353.4L157.5 253.4C160.9 241.6 167.2 230.9 175.8 222.2L373.1 24.97zM440.1 58.91C431.6 49.54 416.4 49.54 407 58.91L377.9 88L424 134.1L453.1 104.1C462.5 95.6 462.5 80.4 453.1 71.03L440.1 58.91zM203.7 266.6L186.9 325.1L245.4 308.3C249.4 307.2 252.9 305.1 255.8 302.2L390.1 168L344 121.9L209.8 256.2C206.9 259.1 204.8 262.6 203.7 266.6zM200 64C213.3 64 224 74.75 224 88C224 101.3 213.3 112 200 112H88C65.91 112 48 129.9 48 152V424C48 446.1 65.91 464 88 464H360C382.1 464 400 446.1 400 424V312C400 298.7 410.7 288 424 288C437.3 288 448 298.7 448 312V424C448 472.6 408.6 512 360 512H88C39.4 512 0 472.6 0 424V152C0 103.4 39.4 64 88 64H200z"/></svg>
## Your turn

]

.right-column[
**<span style="color:red">Q</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a male newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and the Netherlands in blue.

```r
df <- Data_e0 %>% 
  dplyr::filter(... >= 1920)

colours <- rep('gray90', length(countries))
colours[which(countries %in% ...)] <- orange
colours[which(countries %in% ...)] <- RCLRbg

ggplot(df) + 
  geom_line(aes(x = ..., y = ..., 
                group = ..., colour = ...),
            linewidth = 1.1,
            show.legend = FALSE) + 
  theme_bw(base_size = 15) + 
  ggtitle("Male life expectancy") +
  ylab(bquote(e['0,t'])) + 
  scale_color_manual(values = ...) + 
  coord_cartesian(ylim = c(38, 85))
```
]
---

]

.right-column[
**<span style="color:red">A</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a male newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and the Netherlands in blue.

```r
df <- Data_e0 %>% 
  dplyr::filter(`Year` >= 1920)

colours <- rep('gray90', length(countries))
colours[which(countries %in% `c('CH')`)] <- orange
colours[which(countries %in% `c('NL')`)] <- RCLRbg

ggplot(df) + 
  geom_line(aes(x = `Year`, y = `Male`, 
                group = `Country`, colour = `Country`),
            linewidth = 1.1,
            show.legend = FALSE) + 
  theme_bw(base_size = 15) + 
  ggtitle("Male life expectancy") +
  ylab(bquote(e['0,t'])) + 
  scale_color_manual(values = `colours`) + 
  coord_cartesian(ylim = c(38, 85))
```
]
---

]

<br>

<img src="Tutorial-2_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />
]
---

]

.right-column[
**<span style="color:red">A</span>**: Complete the following `ggplot` instructions to plot the country-specific life expectancy of a female newborn as a function of the year (from 1920 onwards). Plot the life expectancy curve of Switzerland in orange and of the Netherlands in blue.

<br>

<img src="Tutorial-2_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />
]
---

# Download mortality data

In the R-script [multi_pop_mort_data.R](https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/raw/scripts/multi_pop_mort_data.R), we explain how to construct a multi-population mortality data set, including deaths and exposures for each considered country, by downloading mortality data from the HMD using `hmd.mx` function ({demography}). We stored the data set in the GitHub repository and access it through:

```r
Data_M <- readRDS("../data/hmd/data_M.rds")  
Data_F <- readRDS("../data/hmd/data_F.rds")
names(Data_M)
## [1] "UNI" "ALL"
```

The data objects `Data_M` (for males) and `Data_F` (for females) consist of two entries: `$UNI` contains the .hi-pink[country-specific] mortality data sets and `$ALL` the combined mortality data set, where we .hi-pink[aggregated] over all countries.

The country-specific mortality data sets include the following countries:

```r
names(Data_M$UNI)
##  [1] "AT" "BE" "CH" "DE" "DK" "FI" "FR" "IE" "IS" "LU" "NL" "NO" "SE" "UK"
```

and consists of deaths (`$dtx`), exposures (`$etx`), and weights (`$wa`) for the years 1970-2019 and ages 0-90:

```r
names(Data_M$UNI$CH)
## [1] "dtx" "etx" "wa"
```

---

```r
*mat_etx <- sapply(countries, function(c) 
* Data_M$UNI[[c]][['etx']]["2019",])

df_etx <- reshape2::melt(mat_etx)
colnames(df_etx) <- c('Age','Country','Value')

order <- c('IS','LU', 'IE', 'NO','DK','FI','CH',
           'AT', 'SE','BE','NL','FR','UK','DE')
df_etx$Country <- factor(df_etx$Country, 
                         levels = order)

ggplot(df_etx) + 
  geom_bar(aes(x = Age, y = Value, 
               group = Country, 
               fill = Country,
               alpha = Country), 
           stat = 'identity', width = 1) + 
  theme_bw(base_size = 15) + 
  ylab('Stacked exposure') + 
  scale_fill_manual(values = pal) + 
  scale_alpha_manual(values = ifelse(
    order=='CH',1,0.25))
```
]

.pull-right[
We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`. The resulting `mat_etx` has dimension `$91 \times 14$`, with ages 0-90 and 14 countries under consideration.

]

---
class: clear

```r
mat_etx <- sapply(countries, function(c) 
  Data_M$UNI[[c]][['etx']]["2019",])

*df_etx <- reshape2::melt(mat_etx)
*colnames(df_etx) <- c('Age','Country','Value')

order <- c('IS','LU', 'IE', 'NO','DK','FI','CH',
           'AT', 'SE','BE','NL','FR','UK','DE')
df_etx$Country <- factor(df_etx$Country, 
                         levels = order)

.pull-right[
We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`.

We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`.
]

---
class: clear

```r
mat_etx <- sapply(countries, function(c) 
  Data_M$UNI[[c]][['etx']]["2019",])

df_etx <- reshape2::melt(mat_etx)
colnames(df_etx) <- c('Age','Country','Value')

*order <- c('IS','LU', 'IE', 'NO','DK','FI','CH',
*          'AT', 'SE','BE','NL','FR','UK','DE')
*df_etx$Country <- factor(df_etx$Country, 
*                        levels = order)

.pull-right[
We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`.

We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`.

We fix the order in which countries should be displayed on the plot.
]

---
class: clear

```r
mat_etx <- sapply(countries, function(c) 
  Data_M$UNI[[c]][['etx']]["2019",])

df_etx <- reshape2::melt(mat_etx)
colnames(df_etx) <- c('Age','Country','Value')

order <- c('IS','LU', 'IE', 'NO','DK','FI','CH',
           'AT', 'SE','BE','NL','FR','UK','DE')
df_etx$Country <- factor(df_etx$Country, 
                         levels = order)

*ggplot(df_etx) + 
* geom_bar(aes(x = Age, y = Value, 
*              group = Country, 
*              fill = Country,
*              alpha = Country), 
*          stat = 'identity', width = 1) + 
* theme_bw(base_size = 15) + 
* ylab('Stacked exposure') + 
* scale_fill_manual(values = pal) + 
* scale_alpha_manual(values = ifelse(
*   order=='CH',1,0.25))
```
]

.pull-right[
We retrieve the male exposure data `etx` for all selected countries in the year 2019 from `Data_M`.

We transform the obtained matrix into a long data format (i.e. a data frame), suitable for plotting with `ggplot`.

We fix the order in which countries should be displayed on the plot.

We create a stacked exposure plot showing exposures for each country at each individual age. We highlight the Swiss exposures in red and apply shading to differentiate with other countries.

In {ggplot} alpha refers to opacity. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors.
]

---
class: clear

---

]

.right-column[
**<span style="color:red">Q</span>**: Create a similar plot that stacks the country-specific death counts in 2019 by age. Highlight the Swiss death counts in red.
]

---

]

.right-column[
**<span style="color:red">A</span>**: Create a similar plot that stacks the country-specific death counts in 2019 by age. Highlight the Swiss death counts in red.

```r
mat_dtx <- sapply(countries, function(c) 
  Data_M$UNI[[c]][['dtx']]["2019",])

df_dtx <- reshape2::melt(mat_dtx)
colnames(df_dtx) <- c('Age','Country','Value')

order <- c('IS','LU', 'IE', 'NO','DK','FI','CH',
           'AT', 'SE','BE','NL','FR','UK','DE')
df_dtx$Country <- factor(df_dtx$Country, 
                         levels = order)

ggplot(df_dtx) + 
  geom_bar(aes(x = Age, y = Value, 
               group = Country, 
               fill = Country,
               alpha = Country), 
           stat = 'identity', width = 1) + 
  theme_bw(base_size = 15) + 
  ylab('Stacked death counts') + 
  scale_fill_manual(values = pal) + 
  scale_alpha_manual(values = ifelse(
    order=='CH', 1, 0.25))
```
]

---

---

]

.right-column[
**<span style="color:red">Q</span>**: What is the reason behind the sudden decrease in the stacked exposure at the age of 73?

<img src="Tutorial-2_files/figure-html/unnamed-chunk-24-1.svg" style="display: block; margin: auto;" />
]

---
class: clear

]

.right-column[
**<span style="color:red">A</span>**: Recall that our plots were constructed for data from the year 2019. People aged 73 in the year 2019 were born in 1946, the start of the so-called baby boom, with many newborns in 1946 and also 1947.

An increase in exposure is noticeable when moving from the right (ages > 73) to the left in the exposure plot.

]

---

# Multi-population mortality model calibration

---

# A multi-population model of type Li-Lee

.pull-left[
We put focus on the Poisson multi-population mortality model of type Li-Lee:
`\begin{align*}
D_{x,t}^{(c)} &\sim \text{POI}\left(E_{x,t}^{(c)} \cdot \mu_{x,t}^{(c)} \right) \\
\log \mu_{x,t}^{(c)} &= \log \mu_{x,t}^{(EU)} + \log \tilde{\mu}_{x,t}^{(c)},
\end{align*}`
where `$(c)$` refers to the country of interest, e.g., NL or CH.
]

.pull-right[
In this tutorial, we calibrate the Li-Lee model in two steps and build (for each step) on what we learned in Tutorial 1: 
1. calibration of the .hi-pink[common] European mortality trend:
`\begin{align*}
D_{x,t}^{(EU)} &\sim \text{POI}\left(E_{x,t}^{(EU)} \cdot \mu_{x,t}^{(EU)} \right) \\
\log \mu_{x,t}^{(EU)} &= A_x + B_x K_t.
\end{align*}`
on the European (aggregated) death counts and exposures

2. calibration of the .hi-pink[country-specific deviation] from the common trend:
`\begin{align*}
D_{x,t}^{(c)} &\sim \text{POI}\left(E_{x,t}^{(c)} \cdot \mu_{x,t}^{(c)} \right) \\
\log \mu_{x,t}^{(c)} &= (\hat{A}_x + \hat{B}_x \hat{K}_t) + (\alpha_x + \beta_x \kappa_t) \\
&= \log \hat{\mu}_{x,t}^{(EU)} + (\alpha_x + \beta_x \kappa_t)
\end{align*}`
on the country-specific death counts and exposures.
]

---

# Calibration step 1: EU mortality trend

```r
years  <- 1970:2019
ages   <- 0:90
etxALL <- Data_M$ALL$etx
dtxALL <- Data_M$ALL$dtx
waALL  <- Data_M$ALL$wa
```
]

.pull-right[
We specify the calibration period as `years` and the age range as `ages`. We extract the (aggregated) European exposures, deaths, and weights and put these in `etxALL`, `dtxALL`, and `waALL` respectively.
]

```r
source('../scripts/fitModels.R')
LC1 <- fit701(ages, years, etxALL, dtxALL, waALL)
```
]

.pull-right[
We fit the Lee-Carter model on the European data using the `fit701` function (as discussed in Tutorial 1).
]

```r
A.x  <- LC1$beta1
B.x  <- LC1$beta2		
K.t  <- LC1$kappa2
sum(B.x); sum(K.t)
## [1] 1
## [1] 5.484502e-14
```
]

.pull-right[
We store the calibrated parameters `$\hat{A}_x$`, `$\hat{B}_x$`, and `$\hat{K}_t$` in `A.x`, `B.x`, and `K.t`, respectively, and verify the identifiability constraints.

]

```r
mutxALL <- LC1$mhat
```
]

.pull-right[
We extract the calibrated force of mortality at EU level `$\hat{\mu}_{x,t}^{(EU)}$` and store it in `mutxALL`.
]

---

---

# Calibration step 2: Swiss mortality deviation

```r
years <- 1970:2019
ages  <- 0:90
etxCH <- Data_M$UNI$CH$etx
dtxCH <- Data_M$UNI$CH$dtx
waCH  <- Data_M$UNI$CH$wa
```

```r
LC2 <- fit701(ages, years, etxCH * mutxALL, 
              dtxCH, waCH)
```

```r
a.x  <- LC2$beta1
b.x  <- LC2$beta2		
k.t  <- LC2$kappa2
```

```r
sum(b.x); sum(k.t)
## [1] 1
## [1] -9.048318e-15
```

```r
mutxCH <- mutxALL * LC2$mhat
```

]

.pull-right[
We now focus on calibrating the country-specific deviation from the EU mortality trend:
`\begin{align*}
D_{x,t}^{(c)} &\sim \text{POI}\left((E_{x,t}^{(c)} \cdot \hat{\mu}_{x,t}^{(EU)}) \cdot \tilde{\mu}_{x,t}^{(c)} \right) \\
\log \tilde{\mu}_{x,t}^{(c)} &= \alpha_x + \beta_x \kappa_t.
\end{align*}`

As such, we calibrate the Swiss mortality deviation from the death counts `$d_{x,t}^{(CH)}$` and adjusted exposures `$E_{x,t}^{(CH)} \times  \hat{\mu}_{x,t}^{(EU)}$`.

We store the calibrated parameters `$\hat{\alpha}_x$`, `$\hat{\beta}_x$`, and `$\hat{\kappa}_t$` in `a.x`, `b.x`, and `k.t`, respectively.

We verify the identifiability constraints.

We calculate the estimated Swiss force of mortality `$\hat{\mu}_{x,t}^{(CH)}$` and store it in `mutxCH`.
]

---

---

# Goodness-of-fit

We construct a {ggplot2} heatmap of the residuals.

```r
grid <- expand.grid(period = years, age = ages)
grid$res <- as.vector(LC2$epsilon)
names(grid) <- c("Year", "Age", "Residual")

p <- ggplot(grid, aes(x = Year, y = Age)) + 
     geom_tile(aes(fill = Residual)) +
     scale_fill_gradientn(colours = topo.colors(7), 
                       breaks = c(-4,-2,0,2,4,6)) +
     theme_bw(base_size = 15) + 
     theme(legend.position = "bottom")
p
```

]

<img src="Tutorial-2_files/figure-html/unnamed-chunk-37-1.svg" style="display: block; margin: auto;" />
]

---

---
class: clear

]

.right-column[
**<span style="color:red">Q</span>**: Calibrate the Li-Lee on Swiss female data. Visualize the parameter estimates and perform some goodness-of-fit tests.

As a challenge you may expand the multi-population data set with pandemic data points (2020 and 2021) as well as the year 2022 observations. Recalibrate the Li-Lee models and discuss your findings.
]

---

# Generating scenarios for future mortality rates

---

# Time dynamics in the Li-Lee model

As discussed in Modules 2-3, we use the ARIMA toolbox to find a suitable time series specification for the fitted period effects.

We use the .hi-pink[Random Walk with Drift (RWD)], or ARIMA(0,1,0), for the period effect in the Lee-Carter model and for the common period effect in the Li-Lee model.

We denote the calibrated period effects by `$\hat{K}_t$`.

This implies:

`\begin{align*}
\hat{K}_{t} &= \hat{K}_{t-1}+\theta+\epsilon_{t} \\
\epsilon_t & \sim  N(0,\sigma_{\epsilon}^2),
\end{align*}`

where `$\theta$` is the drift parameter and `$\epsilon_t$` are i.i.d. error terms.

We have to estimate the unknown parameters `$\theta$` and `$\sigma_{\epsilon}^2$`. Explain intuitively how you would do this.

]

We use the .hi-pink[autoregressive process of order 1], or ARIMA(1,0,0), for the country-specific period effect in the Li-Lee model. We denote the calibrated period effects by `$\hat{\kappa}_t$`.

This implies:

`\begin{align*}
\hat{\kappa}_{t} &= c+a \hat{\kappa}_{t-1}+\delta_t \\
\delta_t & \sim  N(0,\sigma_{\delta}^2),
\end{align*}`

where `$a$` is the AR(1) parameter, `$c$` the intercept, and `$\delta_t$` are i.i.d. error terms.

We have to estimate the unknown parameters `$a$`, `$c$`, and `$\sigma_{\delta}^2$`.

]

---

# Time dynamics in the Li-Lee model (cont'd)

As explained in Module 3, AG2022 multi-population mortality model in fact uses a multivariate time series for `$(K_t^m,\: K_t^v,\: \kappa_t^m,\: \kappa_t^v)$`, both in the calibration as well as simulation steps.

In this tutorial we take a *simplified* approach. We focus on one gender, and calibrate the time series models for `$K_t$` and `$\kappa_t$` independent from each other.

]

Some hints and references for the calibration of the 4-dimensional time series model outlined in Module 3:

-  assume that the four-dimensional vector of error terms `$(\epsilon_t^m, \epsilon_t^v, \delta_t^m, \delta_t^v)$` is time-independent and follows a multivariate Gaussian distribution with mean `$(0,0,0,0)$` and covariance matrix `$\boldsymbol{C}$`.

- based on this distributional assumption, you can set up the Gaussian log-likelihood function using the observations `$(K_t^m, K_t^v, \kappa_t^m, \kappa_t^v)$`. This log-likelihood function can then be implemented in R.

- next, optimize the log-likelihood with respect to the parameters `$\theta^m$`, `$\theta^v$`, `$a^m$`, `$a^v$`, `$c^m$`, `$c^v$`, and `$\boldsymbol{C}$`. You can use standard multi-parameter optimization algorithms, such as the `optim` or `nlminb` functions available in the R package {stats}.

]

---

```r
library(forecast)

df_K.t <- data.frame(Year = years, K.t = K.t)

ts_RWD <- Arima(df_K.t$K.t, 
                order = c(0, 1, 0),
                include.drift = TRUE)

ts_RWD
## Series: df_K.t$K.t 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##         drift
##       -2.0014
## s.e.   0.2199
## 
## sigma^2 = 2.418:  log likelihood = -90.65
## AIC=185.3   AICc=185.56   BIC=189.09
```

]

We continue with the Li-Lee parameters calibrated earlier on in this tutorial.

We store the calibrated common period effect `$\hat{K}_t$` in the data frame `df_K.t`.

We use the `Arima(.)` function from the {forecast} package to fit the RWD to the calibrated `$\hat{K}_t$`. We store the calibrated model in `ts_RWD`.

The argument `order` specifies the ARIMA model: `c(0,1,0)` refers to the random walk.

More general, in the `Arima(.)` function the components `(p, d, q)` are the AR order, the degree of differencing and the MA order.

We put `include.drift = TRUE` to include the drift term (denoted as `$\theta$` on the previous sheet).

]

---

.pull-left[
We call the `simulate` function form the {forecast} package to generate 1 000 scenarios or paths for the random walk with drift. We append each generated path with the calibrated parameter estimate in the last observed year, i.e., `$\hat{K}_{2019}$`. We simulate 51 years ahead in the future, from 2020 to 2070.

```r
K.tmax   <- K.t[length(years)]
sim_RWD <- 
  replicate(1000, c(K.tmax, 
                    simulate(ts_RWD, nsim = 51)))
```

We store the generated scenarios in long format, in a data frame, using the `melt` function from the {reshape2} package, suitable for visualizing using `ggplot` instructions. We also keep track of the year to which the simulated period effect corresponds.

```r
df_sim_RWD <- reshape2::melt(sim_RWD, 
                             varnames = 
                               c('Year','Sim'))
df_sim_RWD$Year <- 2019 + df_sim_RWD$Year - 1
```
]

```r
head(df_sim_RWD)
##   Year Sim     value
## 1 2019   1 -53.46803
## 2 2020   1 -54.52742
## 3 2021   1 -56.30656
## 4 2022   1 -61.05451
## 5 2023   1 -62.04191
## 6 2024   1 -63.97753
```

```r
tail(df_sim_RWD)
##       Year  Sim     value
## 51995 2065 1000 -152.3061
## 51996 2066 1000 -153.4829
## 51997 2067 1000 -158.0877
## 51998 2068 1000 -159.3650
## 51999 2069 1000 -160.1182
## 52000 2070 1000 -161.2104
```

]

---

We visualize simulation 1:

```r
ggplot() + 
  geom_line(aes(x = Year, y = K.t), data = df_K.t) + 
  geom_point(aes(x = Year, y = K.t), data = df_K.t) + 
  geom_line(aes(x = Year, y = value), 
      data = df_sim_RWD %>% dplyr::filter(Sim %in% c(1)),
      col = RCLRbg) + 
  theme_bw(base_size = 15) +
  xlab('Year') + ylab(bquote(K[t])) +
  xlim(1969, 2071)
```
]

.pull-right[
<img src="Tutorial-2_files/figure-html/unnamed-chunk-46-1.svg" style="display: block; margin: auto;" />
]

---

We visualize simulations 1 - 2:

---

We visualize simulations 1 - 3:

---

We visualize simulations 1 - 100:

---

We visualize simulations 1 - 1000:

---

The best-estimate for the future value `$K_{t_{\max} + h}$`, `$h$` years from `$t_{\max}$`, is
`\begin{align*}
K_t + h \theta
\end{align*}`
The 95% bounds for the prediction intervals are 
`\begin{align*}
\pm 1.96 \sqrt{h} \sigma_{\epsilon},
\end{align*}`
on top of the best-estimate value.

We calculate the best-estimate and the prediction intervals using the `forecast` function from the {forecast} package. We store the results in `df_cb.Kt`

```r
fcast <- forecast(ts_RWD, h = 51, level = 95)
df_cb_K.t <- data.frame(
                'Year' = 2019:2070, 
                'BE' = c(K.tmax, fcast$mean), 
                'Q1' = c(K.tmax, fcast$lower[,1]),
                'Q2' = c(K.tmax, fcast$upper[,1]))
```

]

]

---
class: clear

Alternatively, we can calculate the empirical, point-wise prediction intervals by taking the 2.5%, 50%, and 97.5% quantile of the 1 000 simulated values for `$K_{t_{\max} +h}$`, at every `$h>0$`.

```r
df_cb_K.t <- df_sim_RWD %>% 
  group_by(Year) %>%
  reframe('MD' = quantile(value, 0.5),
          'Q1' = quantile(value, 0.025),
          'Q2' = quantile(value, 0.975))
```

Let us have a look at the prediction intervals produced with this approach!

]

]

---

```r
library(forecast)

df_k.t <- data.frame(Year = years, k.t = k.t)

ts_AR1 <- Arima(df_k.t$k.t, 
                order = c(1, 0, 0),
                include.mean = TRUE,
                method = "ML")

c <- ts_AR1$coef[2]*(1-ts_AR1$coef[1])
c
##   intercept 
## -0.06202683

lt_be <- as.numeric(c/(1-ts_AR1$coef[1]))
lt_be
## [1] -3.550983
```

]

We continue with the AR(1) specification for the Swiss-specific period effect, stored in `k.t`.

We store the calibrated Swiss period effect `$\hat{\kappa}_t$` in the data frame `df_k.t`.

We use the `Arima(.)` function from the {forecast} package to fit the AR(1)-process with intercept to the calibrated `$\hat{\kappa}_t$`.

The argument `order` specifies the ARIMA model: `c(1,0,0)` refers to the AR(1)-process.

We set `include.mean = TRUE` to include an intercept, as explained in the section *Understanding constants in R* in [this source](https://otexts.com/fpp2/arima-r.html).

The connection between the fitted mean `$\mu$` and the intercept `$c$` in our AR(1) specification is: `$c = \mu (1-a)$`, with `$a$` the AR(1) coefficient. As explained in Module 3, in the long term the `$\kappa_t$` series converges to a constant, stored in `lt_be`.

]

---

.pull-left[
We call the `simulate` function form the {forecast} package to generate 1 000 scenarios for the AR(1) process. We append each forecast with the calibrated parameter estimate in the last year, i.e., `$\hat{\kappa}_{2019}$`.

```r
k.tmax   <- k.t[length(years)]
sim_AR1 <- 
  replicate(1000, c(k.tmax, 
                    simulate(ts_AR1, nsim = 51)))
```

We put the generated scenarios in a long data frame using the `melt` function from the {reshape2} package, suitable for visualizing using `ggplot` instructions. We also keep track of the year to which the simulated period effect corresponds.

```r
df_sim_AR1 <- reshape2::melt(sim_AR1, 
                             varnames = 
                               c('Year','Sim'))
df_sim_AR1$Year <- 2019 + df_sim_AR1$Year - 1
```
]

```r
head(df_sim_AR1)
##   Year Sim     value
## 1 2019   1 -10.50163
## 2 2020   1 -10.25245
## 3 2021   1 -12.46729
## 4 2022   1 -13.19151
## 5 2023   1 -14.41084
## 6 2024   1 -13.40085
```

```r
tail(df_sim_AR1)
##       Year  Sim     value
## 51995 2065 1000 -2.083091
## 51996 2066 1000 -3.044011
## 51997 2067 1000 -3.461855
## 51998 2068 1000 -2.407353
## 51999 2069 1000 -2.207797
## 52000 2070 1000 -1.618796
```

]

---

We visualize simulation 1:

```r
ggplot() + 
  geom_line(aes(x = Year, y = k.t), data = df_k.t) + 
  geom_point(aes(x = Year, y = k.t), data = df_k.t) + 
  geom_line(aes(x = Year, y = value), 
            data = df_sim_AR1 %>% dplyr::filter(Sim %in% c(1)),
            col = pal[1]) + 
  theme_bw(base_size = 15) +
  xlab('Year') + ylab(bquote(kappa[t])) + 
  xlim(1969, 2071)
```
]

]

---

We visualize simulations 1 - 2:

---

We visualize simulations 1 - 3:

---

We visualize simulations 1 - 100:

---

We visualize simulations 1 - 1 000:

---

The best-estimate for the future value `$\kappa_{t_{\max} + h}$`, `$h$` years from `$t_{\max}$`, is
`\begin{align*}
\frac{c}{1-a} + \left(\kappa_{t_{\max}} - \frac{c}{1-a} \right) a^h
\end{align*}`
The 95% confidence bounds are calculated as
`\begin{align*}
\pm 1.96\times \sigma_{\delta} \times \frac{\sqrt{1-a^{2h}}}{\sqrt{1-a^2}},
\end{align*}`
on top of the best-estimate value.

We calculate the best-estimate and the confidence bounds using the `forecast` function from the {forecast} package. We store the results in `df_cb.kt`

```r
fcast <- forecast(ts_AR1, h = 51, level = 95)
df_cb_k.t <- data.frame(
              'Year' = 2019:2070, 
              'BE' = c(k.tmax, fcast$mean), 
              'Q1' = c(k.tmax, fcast$lower[,1]),
              'Q2' = c(k.tmax, fcast$upper[,1]))
```

]

]

---
class: clear

Alternatively, we can calculate the empirical, point-wise prediction intervals by taking the 2.5%, 50%, and 97.5% quantile of the 1 000 simulated values for `$\kappa_{t_{\max} +h}$`, at every `$h>0$`.

```r
df_cb_k.t <- df_sim_AR1 %>% 
  group_by(Year) %>%
  reframe('MD' = quantile(value, 0.5),
          'Q1' = quantile(value, 0.025),
          'Q2' = quantile(value, 0.975))
```

Let us have a look at the prediction intervals produced with this approach!

]

]
---

]

Empirically examine the long term convergence of the `$\kappa_t$` series for Switzerland.

As explained in Module 3, the series

`$$\kappa_t = c+ a \cdot \kappa_{t-1}+\delta_t,$$`

with i.i.d. Gaussian noise terms `$\delta_t \sim N(0,\sigma^2_{\delta})$`, converges to `$\frac{c}{1-a}$` in the long run.

We already calculated this constant and stored the result in `lt_be`.

```r
lt_be
## [1] -3.550983
```

Do you expect this convergence to go slow or rather fast? Why?

]

---

]

]

---

# Scenarios for future mortality rates

.pull-left[
We generate paths for future mortality rates by combining the calibrated age effects with the generated scenarios for the period effects.

For each scenario `$i$`, we calculate
`\begin{align*}
\hat{\mu}_{x,t,i} &= \exp\left(\hat{A}_x + \hat{B}_x \hat{K}_{t,i} + \hat{\alpha}_x + \hat{\beta}_x \hat{\kappa}_{t,i} \right) \\
\hat{q}_{x,t,i} &= 1 - \exp\left(- \hat{\mu}_{x,t,i}\right).
\end{align*}`
where

- `$\hat{K}_{t,i}$` is the `$i$`-th generated scenario of the common period effect at time `$t$`,
- `$\hat{\kappa}_{t,i}$` is the `$i$`-th generated scenario of the Swiss period effect at time `$t$`.

]

We obtain the generated mortality rates as

```r
sim.qtxi <- array(NA, 
              dim = c(52, length(ages), 1000),  
              dimnames = list(2019:2070, ages, 1:1000))

for(x in ages){
  mu.txi <- exp(A.x[x+1] + B.x[x+1] * sim_RWD + 
                  a.x[x+1] + b.x[x+1] * sim_AR1)
  sim.qtxi[,x+1,] <- 1 - exp(-mu.txi)
}
```
]

---
class: clear

---

# Thanks!   <img src="img/xaringan.png" class="title-hex" width="50" height="50" align="right">

Slides created with the R package [xaringan](https://github.com/yihui/xaringan).
<br> <br> <br>
Course material available via 
<br>

<img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/