Tutorial-1.knit

# Modelling and quantifying mortality and longevity risk 
## Tutorial session 1

### Katrien Antonio, Michel Vellekoop, Torsten Kleinow, Frank van Berkum, and Jens Robben
### <a href="https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/">35th International Summer School of the Swiss Association of Actuaries </a> | June 3-7, 2024
###  <a> </a>

---

# Prologue

---

# Introduction

### Workshop

<img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/

The workshop's landing page on GitFront, where we upload slide decks, R code and data sets.

### Us

[Katrien Antonio](https://katrienantonio.github.io) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:katrien.antonio@kuleuven.be) &nbsp; &nbsp; &nbsp; [Michel Vellekoop](https://www.uva.nl/profiel/v/e/m.h.vellekoop/m.h.vellekoop.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:m.h.vellekoop@uva.nl) &nbsp; &nbsp;  [Torsten Kleinow](https://www.uva.nl/profiel/k/l/t.kleinow/t.kleinow.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:t.kleinow@uva.nl) &nbsp; &nbsp; &nbsp; [Frank van Berkum](https://www.uva.nl/profiel/b/e/f.vanberkum/f.vanberkum.html) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:F.vanBerkum@uva.nl) &nbsp; &nbsp; [Jens Robben](https://jensrobben.github.io) [<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#106EB6;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg>](mailto:jens.robben@kuleuven.be)

---

# Checklist

# Goals of tutorial session 1

---

# Goals of tutorial session 1

In this tutorial session you will learn to:

* download and visualize mortality data sets and life tables obtained from the Human Mortality Database, see [HMD](https://mortality.org/)

* perform some basic calculations with period life tables

* calibrate (selected) single population mortality models, using the Poisson distributional assumption for death counts

* perform some checks to inspect the in-sample fit of the calibrated model.

]

* [Prologue](#prologue)

* [Life table calculations and visualizations](#lifetable)

* [Download and import mortality data](#data)

* [Fit single population mortality models by maximising a Poisson likelihood](#POI)

* [Model selection and goodness-of-fit](#modelfit)

]
---

# Life table calculations and visualizations

---

# Import a life table stored as .txt file

.pull-left[
We downloaded (on 18 April 2024) life tables from the [HMD](https://www.mortality.org) and stored it as .txt file on the GitHub repo. Now you can import it in R:

```r
CHE_female <- read.table(
  file = "../data/hmd/CHE_female_life_table_1x1.txt", 
  skip = 2, header = TRUE)
CHE_male   <- read.table(
  file = "../data/hmd/CHE_male_life_table_1x1.txt", 
  skip = 2, header = TRUE)
```

We convert the age variable into an integer variable:

```r
CHE_male$Age   <- parse_number(CHE_male$Age)   %>% 
  as.integer()
CHE_female$Age <- parse_number(CHE_female$Age) %>% 
  as.integer()
```

Mind the use of `parse_number()` to handle the entry '110+' in the `Age` column of the original object.

]

We extract the 2022 data using the function `filter` from the {dplyr} package:

```r
CHE_male_2022   <- CHE_male   %>% 
  dplyr::filter(Year == 2022)
CHE_female_2022 <- CHE_female %>% 
  dplyr::filter(Year == 2022)
```

We display the structure of the R object `CHE_female_2022` using `str(.)`

```r
str(CHE_female_2022)
## 'data.frame':	111 obs. of  10 variables:
##  $ Year: int  2022 2022 2022 2022 2022 2022 2022 2022 2022 2022 ...
##  $ Age : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ mx  : num  0.00323 0.0003 0.00017 0.00012 0.00016 0.00016 0.00007 0.00002 0.00005 0.00002 ...
##  $ qx  : num  0.00322 0.0003 0.00017 0.00012 0.00016 0.00016 0.00007 0.00002 0.00005 0.00002 ...
##  $ ax  : num  0.14 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##  $ lx  : int  100000 99678 99648 99631 99620 99604 99588 99581 99579 99574 ...
##  $ dx  : int  322 30 17 12 16 16 7 2 5 2 ...
##  $ Lx  : int  99724 99663 99640 99625 99612 99596 99584 99580 99576 99573 ...
##  $ Tx  : int  8537621 8437897 8338234 8238595 8138969 8039358 7939762 7840177 7740598 7641021 ...
##  $ ex  : num  85.4 84.7 83.7 82.7 81.7 ...
```

]

---
class: clear

Let's take a closer look at the period life table for males in Switzerland in the year 2022:

```r
head(CHE_male_2022)
```

.pull-left[
`mx`: death rate at age `$x$`, calculated as observed deaths divided by observed exposures at age `$x$`.

`qx`: mortality rate, i.e., the probability that an `$x$`-year old dies within a year.

`lx`: number of survivors at age `$x$` in the life table population.
]

`Tx`: remaining person-years for all individuals of age `$x$`.

`ex`: period life expectancy at age `$x$`.
]

---

# Visualize a period life table

We start with visualizing the `$q_{x,t}$` for M/F data in Switzerland, year 2022.

Using `ggplot` instructions:

```r
g_male <- 
  ggplot(CHE_male_2022, aes(Age, log(qx))) + 
  geom_point(col = RCLRbg) + 
  geom_line(col = RCLRbg) + 
  theme_bw(base_size = 15) +
  ggtitle("Switzerland - males, 2022") + 
  labs(y = bquote(ln(q[x])))

g_fem <- 
  ggplot(CHE_female_2022, aes(Age, log(qx))) + 
  geom_point(col = RCLRbg) + 
  geom_line(col = RCLRbg) + 
  theme_bw(base_size = 15) +
  ggtitle("Switzerland - females, 2022") + 
  labs(y = bquote(ln(q[x])))
```

]

]

---

# Visualize a period life table

We start with visualizing the `$q_{x,t}$` for M/F data in Switzerland, year 2022.

Using `ggplot` instructions:

]

]

---
class: clear

.left-column[
<br>
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:2.5em;width:2.5em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:black;overflow:visible;position:relative;"><path d="M373.1 24.97C401.2-3.147 446.8-3.147 474.9 24.97L487 37.09C515.1 65.21 515.1 110.8 487 138.9L289.8 336.2C281.1 344.8 270.4 351.1 258.6 354.5L158.6 383.1C150.2 385.5 141.2 383.1 135 376.1C128.9 370.8 126.5 361.8 128.9 353.4L157.5 253.4C160.9 241.6 167.2 230.9 175.8 222.2L373.1 24.97zM440.1 58.91C431.6 49.54 416.4 49.54 407 58.91L377.9 88L424 134.1L453.1 104.1C462.5 95.6 462.5 80.4 453.1 71.03L440.1 58.91zM203.7 266.6L186.9 325.1L245.4 308.3C249.4 307.2 252.9 305.1 255.8 302.2L390.1 168L344 121.9L209.8 256.2C206.9 259.1 204.8 262.6 203.7 266.6zM200 64C213.3 64 224 74.75 224 88C224 101.3 213.3 112 200 112H88C65.91 112 48 129.9 48 152V424C48 446.1 65.91 464 88 464H360C382.1 464 400 446.1 400 424V312C400 298.7 410.7 288 424 288C437.3 288 448 298.7 448 312V424C448 472.6 408.6 512 360 512H88C39.4 512 0 472.6 0 424V152C0 103.4 39.4 64 88 64H200z"/></svg>
## Your turn

]

.right-column[
**<span style="color:red">Q</span>**: What is causing the sudden spike in the male and female mortality rates at age 110? Reconstruct the plot with a maximum age of 90.

]

---

]

**<span style="color:red">A</span>**: In the previous code, the `Age` variable was transformed into an integer variable. This approach was not entirely correct. The raw life table reveals this:

Hence, the age 110 in fact corresponds to the age group 110+, which causes the spike at age 110 in the previous plots.

We can limit the age range (e.g., to a maximum 90 years) by filtering the data argument in the `ggplot`command using the `filter` function from {dplyr}.

]

---

```r
g_fem_lim <- 
  ggplot(`CHE_female_2022 %>% dplyr::filter(Age <= 90)`, 
         aes(Age, log(qx))) + 
  geom_point(col = RCLRbg) + 
  geom_line(col = RCLRbg) + 
  theme_bw(base_size = 15) +
  ggtitle("Switzerland - females, 2022") + 
  labs(y = bquote(ln(q[x])))
g_fem_lim
```
]

<img src="Tutorial-1_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />
]

---
# Visualize the mortality rates over time

We visualize the `$q_{x,t}$` for male and female data in Switzerland, ages 0-90 and (selected) years 1880, 1910, 1940, 1970, 2000, 2022.

Using `ggplot` instructions:

```r
years <- c(1880, 1910, 1940, 1970, 2000, 2022)
dfM   <- CHE_male %>% 
  dplyr::filter(Year %in% years,
                Age <= 90)

ggplot(dfM, aes(x = Age, y = log(qx), 
               group = Year, colour = factor(Year))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Swiss male mortality rates') + 
  xlab('Age (x)') + ylab(bquote(ln(q[x]))) +
  scale_color_manual(values = c(red_pink, turquoise,
                                orange, blue, green,
                                purple), name = 'Year') + 
  scale_x_continuous(breaks = seq(0, 90, 10)) + 
  theme_bw(base_size = 15)
```
]

]

---

# Visualize the mortality rates over time

We visualize the `$q_{x,t}$` for male and female data in Switzerland, ages 0-90 and (selected) years 1880, 1910, 1940, 1970, 2000, 2022.

Using `ggplot` instructions:

```r
years <- c(1880, 1910, 1940, 1970, 2000, 2022)
dfF   <- CHE_female %>% 
  dplyr::filter(Year %in% years,
                Age <= 90)

ggplot(dfF, aes(x = Age, y = log(qx), 
               group = Year, colour = factor(Year))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Swiss female mortality rates') + 
  xlab('Age (x)') + ylab(bquote(ln(q[x]))) +
  scale_color_manual(values = c(red_pink, turquoise,
                                orange, blue, green,
                                purple), name = 'Year') + 
  scale_x_continuous(breaks = seq(0, 90, 10)) + 
  theme_bw(base_size = 15)
```
]

]

---

]

.right-column[
**<span style="color:red">Q</span>**: Complete the following `ggplot` instructions to visualize the mortality rate curves for the years 1880-2022 as a function of age. You can use either Swiss male or female data.

```r
years <- ...
ages  <- ...

dfX   <- ... %>% 
  dplyr::filter(Year %in% ...,
                Age  %in% ...)

ggplot(dfX, 
       aes(x = ..., y = ..., group = ...)) +
  geom_line(aes(colour = ...), linewidth = 0.8) +
  scale_colour_gradient2(low = blue, mid = turquoise, 
                         midpoint = 1950, 
                         high = red_pink) +
  scale_x_continuous(
    breaks = seq(ages[1], tail(ages, 1) + 1, 10)) +
  theme_bw(base_size = 15) + 
  ggtitle('Swiss ... mortality rates') + 
  ylab(expression("log" ~ q[x])) + 
  xlab("Age (x)") 
```

]

---

]

.right-column[
**<span style="color:red">A</span>**: Complete the following `ggplot` instructions to visualize the mortality rate curves for the years 1880-2022 as a function of age. You can use either Swiss male or female data.

```r
years <- `1880:2022`
ages  <- `0:90`

dfX   <- `CHE_male` %>% 
  dplyr::filter(Year %in% `years`,
                Age  %in% `ages`)

ggplot(dfX, 
       aes(x = `Age`, y = `log(qx)`, group = `Year`)) +
  geom_line(aes(colour = `Year`), linewidth = 0.8) +
  scale_colour_gradient2(low = blue, mid = turquoise, 
                         midpoint = 1950, 
                         high = red_pink) +
  scale_x_continuous(
    breaks = seq(ages[1], tail(ages, 1) + 1, 10)) +
  theme_bw(base_size = 15) + 
  ggtitle('Swiss `male` mortality rates') + 
  ylab(expression("log" ~ q[x])) + 
  xlab("Age (x)") 
```

]

---

]

<br>

]

---
# Visualize the age-specific mortality trend

We visualize the evolution of `$q_{x,t}$` over time for male and female data in Switzerland, for selected ages 20, 30, 40, and 50.

Using `ggplot` instructions:

```r
ages <- c(20, 30, 40, 50)
dfM  <- CHE_male %>% 
  dplyr::filter(Age %in% ages)

ggplot(dfM, aes(x = Year, y = qx, 
               group = Age, colour = factor(Age))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Evolution of Swiss male mortality rates') + 
  xlab('Time (t)') + ylab(bquote(q['x,t'])) +
  scale_color_manual('age', values = c(red_pink, turquoise,
                                       orange, blue)) + 
  coord_cartesian(ylim = c(0,0.025)) + 
  theme_bw(base_size = 15)
```
]

]

---

# Visualize the age-specific mortality trend

We visualize the evolution of `$q_{x,t}$` over time for male and female data in Switzerland, for selected ages 20, 30, 40, and 50.

Using `ggplot` instructions:

```r
ages <- c(20, 30, 40, 50)
dfF  <- CHE_female %>% 
  dplyr::filter(Age %in% ages)

ggplot(dfF, aes(x = Year, y = qx, 
               group = Age, colour = factor(Age))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Evolution of Swiss female mortality rates') + 
  xlab('Time (t)') + ylab(bquote(q['x,t'])) +
  scale_color_manual('age', values = c(red_pink, turquoise,
                                       orange, blue)) + 
  coord_cartesian(ylim = c(0,0.025)) + 
  theme_bw(base_size = 15)
```
]

]

---

]

.right-column[
**<span style="color:red">Q</span>**: What is causing the sudden spike in the mortality rates in the year 1918? Remove this year from the plot for visualization purposes.

]

---

]

.right-column[
**<span style="color:red">A</span>**: The Spanish flu pandemic of 1918-1919 severely hit Switzerland, causing about 25 000 deaths and infecting half the population. The worst wave hit in October 1918. Using the `filter` function from the `dplyr` package we can remove the year 1918 from the data.
<br>
<br>

<img src="Tutorial-1_files/figure-html/unnamed-chunk-28-1.svg" style="display: block; margin: auto;" />
]

---

```r
years <- c(1918)
ages <- c(20, 30, 40, 50)

dfM  <- CHE_male %>% 
  dplyr::filter(`!Year %in% years`, Age %in% ages)

g <- ggplot(dfM, aes(x = Year, y = qx, 
           group = Age, colour = factor(Age))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Evolution of Swiss male mortality rates') + 
  xlab('Time (t)') + ylab(bquote(q['x,t'])) +
  scale_color_manual('Age', values = c(red_pink, turquoise,
                                orange, blue)) + 
  theme_bw(base_size = 15)
g
```

Read more about the Spanish flu affecting Switzerland in 1918, for instance [here](https://www.swissinfo.ch/eng/business/devastation_when-spanish-flu-hit-switzerland/44352910).

]

]

---

# Visualize the survival curve

We put focus on `$S_{0,t}(.)$`, the survival curve for a newborn, born in period (or year) `$t$`.

We calculate the survival probabilities `$S_{0,t}(x)$` for a Swiss male newborn, for selected time period `$t$`. Hereby we let `$x$` run from 0 to 100.

```r
*xmax  <- 100
*xmin  <- 0
*years <- c(1880, 1910, 1940, 1970, 2000, 2022)
surv_rate <- matrix(0, xmax-xmin+1, length(years))
surv_rate[1,] <- 1

for(t in 1:length(years)){
  for(x in 1:(xmax-xmin)){
    df.t <- CHE_male %>% 
      dplyr::filter(Year == years[t])
    px <- 1 - df.t[x,'qx']
    surv_rate[x+1,t] <- surv_rate[x,t]*px
  }
}
```

]

.pull-right[
We consider a minimum age `$x$` of 0, a maximum age of 100, and the years of interest.
]

---

# Visualize the survival curve

.pull-left[
We put focus on `$S_{0,t}(.)$`, the survival curve for a newborn, born in period (or year) `$t$`.

We calculate the survival probabilities `$S_{0,t}(x)$` for a Swiss male newborn, for selected time period `$t$`. Hereby we let `$x$` run from 0 to 100.

```r
xmax  <- 100
xmin  <- 0
years <- c(1880, 1910, 1940, 1970, 2000, 2022)
*surv_rate <- matrix(0, xmax-xmin+1, length(years))
surv_rate[1,] <- 1

for(t in 1:length(years)){
  for(x in 1:(xmax-xmin)){
    df.t <- CHE_male %>% 
      dplyr::filter(Year == years[t])
    px <- 1 - df.t[x,'qx']
    surv_rate[x+1,t] <- surv_rate[x,t]*px
  }
}
```

]

We create an empty matrix `surv_rate` of dimension `$101 \times 6$` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022.
]

---

# Visualize the survival curve

.pull-left[
We put focus on `$S_{0,t}(.)$`, the survival curve for a newborn, born in period (or year) `$t$`.

We calculate the survival probabilities `$S_{0,t}(x)$` for a Swiss male newborn, for selected time period `$t$`. Hereby we let `$x$` run from 0 to 100.

```r
xmax  <- 100
xmin  <- 0
years <- c(1880, 1910, 1940, 1970, 2000, 2022)
surv_rate <- matrix(0, xmax-xmin+1, length(years))
*surv_rate[1,] <- 1

for(t in 1:length(years)){
  for(x in 1:(xmax-xmin)){
    df.t <- CHE_male %>% 
      dplyr::filter(Year == years[t])
    px <- 1 - df.t[x,'qx']
    surv_rate[x+1,t] <- surv_rate[x,t]*px
  }
}
```

]

We create an empty matrix `surv_rate` of dimension `$101 \times 6$` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022.

The probability of surviving zero years equals one, i.e. `$S_{0,t}(0)=1$`.
]

---

# Visualize the survival curve

.pull-left[
We put focus on `$S_{0,t}(.)$`, the survival curve for a newborn, born in period (or year) `$t$`.

We calculate the survival probabilities `$S_{0,t}(x)$` for a Swiss male newborn, for selected time period `$t$`. Hereby we let `$x$` run from 0 to 100.

```r
xmax  <- 100
xmin  <- 0
years <- c(1880, 1910, 1940, 1970, 2000, 2022)
surv_rate <- matrix(0, xmax-xmin+1, length(years))
surv_rate[1,] <- 1

*for(t in 1:length(years)){
* for(x in 1:(xmax-xmin)){
*   df.t <- CHE_male %>% 
*     dplyr::filter(Year == years[t])
*   px <- 1 - df.t[x,'qx']
*   surv_rate[x+1,t] <- surv_rate[x,t]*px
* }
*}
```

]

We create an empty matrix `surv_rate` of dimension `$101\times 6$` to store the calculated survival probabilities in the years 1880, 1910, 1940, 1970, 2000, and 2022.

The probability of surviving zero years equals one, i.e. `$S_{0,t}(0)=1$`.

For every year `$t$` under consideration, the one-year survival probability of `$(x)$` in year `$t$` is `$S_{x,t}(1) = p_{x,t} = 1-q_{x,t}$`. We evaluate the survival probabilities recursively, as follows:

`$$S_{0,t}(x+1) =  S_{0,t}(x) \cdot p_{x,t}.$$`
Hereby, we follow the period approach (and keep `$t$` fixed)!
]

---
class:clear

.pull-left[
We plot these survival probabilities calculated for a male Swiss newborn in the years 1880, 1910, 1940, 1970, 2000, and 2022.

```r
df <- data.frame('Age' = rep(0:100, times = length(years)),
                 'Year' = rep(years, each = length(0:100)),
                 'Surv' = as.numeric(surv_rate))

ggplot(df, aes(x = Age, y = Surv, 
               group = Year, colour = as.factor(Year))) + 
  geom_line(linewidth = 1.1)  + 
  ggtitle('Swiss male survival curve') + 
  xlab('Age (x)') + ylab(bquote(S[0]*'(x)')) +
  scale_color_manual(values = c(red_pink, turquoise,
                                orange, blue, green,
                                purple), name = 'Year') + 
  scale_x_continuous(breaks = seq(0, 100, 10)) + 
  theme_bw(base_size = 15)
```

]

.pull-right[
<img src="Tutorial-1_files/figure-html/unnamed-chunk-36-1.svg" style="display: block; margin: auto;" />
]
---

]

.right-column[
**<span style="color:red">Q</span>**:  Can you adjust the instructions for the calculation and visualization of `$S_{0,t}(.)$` to the calculation and visualization of `$f_{0,t}(.)$`, the pdf of r.v. `$T_{0,t}$`?

Recall from Module 1: `$f_{0,t}(x)=S_{0,t}(x) \cdot \mu_{x,t}$`.

<br>

]

---

]

Recall from Module 1: `$f_{0,t}(x)=S_{0,t}(x) \cdot \mu_{x,t}$`.

The limit on the y-axis of the plot can be adjusted with the layer `coord_cartesian(ylim = c(.,.))`.

<br>

]

---

# Download and import mortality data

---

# Download mortality data with {demography}

.pull-left[
We do a .hi-pink[live] download from the [Human Mortality Database](https://www.mortality.org) with the {demography} package in R.

To get started with the package, first install it:

```r
install.packages("demography")
```

Then, load the package:

```r
library(demography)
```

Next, we use the `hmd.mx` function from {demography} to read the "Mx" (1 x 1) data from the HMD:

```r
? hmd.mx
User = "summerschool.rclr2024@outlook.com"
pw   = "Test1234."
Df   = hmd.mx("CHE", User , pw , "Switzerland")
```

]

.pull-right[
We specify the country code (`CHE`), the user name and password, along with the character string containing the country name from which the data is retrieved.

To get smooth access to the data, we created a username and password for this workshop.

We store the downloaded data in the object `Df`.

If you encounter issues with the live download, we have pre-downloaded this mortality data set, and made it available in the GitHub repository. You can load this R object as follows:

```r
Df <- readRDS(file = "../data/hmd/Df_CHE_hmd_mx.rds") 
```
]

---

Inspect the `Df` object (a list in R) with

```r
names(Df)
## [1] "type"   "label"  "lambda" "year"   "age"    "pop"    "rate"
names(Df$pop)
## [1] "female" "male"   "total"
names(Df$rate)
## [1] "female" "male"   "total"
```

Verify that, e.g.,

```r
str(Df$rate$female)
##  num [1:111, 1:147] 0.22666 0.03892 0.01801 0.01241 0.00917 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:111] "0" "1" "2" "3" ...
##   ..$ : chr [1:147] "1876" "1877" "1878" "1879" ...
```

stores the `$m_{x,t}$` observed central death rates for Swiss females, where age `$x$` ranges from 0 to 110+ across the rows and year `$t$` spans from 1876 to 2022 across the columns.

]

What is stored in the data set?

```r
Df$type
## [1] "mortality"
```

```r
Df$label
## [1] "Switzerland"
```

```r
min(Df$year)
## [1] 1876
max(Df$year)
## [1] 2022
```

```r
min(Df$age)
## [1] 0
max(Df$age)
## [1] 110
```

]

---
class: clear
.pull-left[
The mortality data set `Df` contains the observed central death rates `$m_{x,t}$` (`$rates`) for both males (`$male`) and females (`$female`). These rates correspond to the `mx` column in the life tables:

```r
head(CHE_female_2022$mx)
## [1] 0.00323 0.00030 0.00017 0.00012 0.00016 0.00016
head(Df$rate$female[,'2022'])
##        0        1        2        3        4        5 
## 0.003227 0.000305 0.000166 0.000116 0.000161 0.000160
```

The `$pop` attribute in `Df` stores the exposure-to-risk `$E_{x,t}$`, representing the total amount of person-years lived by individuals aged `$x$` in year `$t$`.

```r
head(Df$pop$female[,'2022'])
##        0        1        2        3        4        5 
## 40590.46 42632.60 42251.21 43074.56 43470.10 43836.41
```
]

.pull-right[
Death counts can be calculated by multiplying the death rates with the exposure-to-risk, i.e., `$d_{x,t} = E_{x,t} \cdot m_{x,t}$`. In R we do:

```r
round(head(Df$pop$female[,'2022'] * Df$rate$female[,'2022']))
##   0   1   2   3   4   5 
## 131  13   7   5   7   7
```

These computed death counts match the death counts from the Human Mortality Database (HMD) (see [here](https://www.mortality.org/File/GetDocument/hmd.v6/CHE/STATS/Deaths_1x1.txt)).
]
---

# Fit single population mortality models by maximizing a Poisson likelihood

---

#  Poisson likelihood and Lee-Carter specification

Along the ideas explained in Module 2, we now put focus on calibrating a Lee-Carter model on a data set with `$(d_{x,t}, E_{x,t}|x \in \{1,\ldots,X\},\ t\in\{1,\ldots,T\})$`.

We work with the following Poisson assumption for the death counts `$D_{x,t}$`:

`\begin{align*}
D_{x,t} &\sim \text{POI}(E_{x,t} \cdot \mu_{x,t}) \\
\mu_{x,t} &= \exp{(\beta_x^{(1)}+\beta_x^{(2)} \cdot \kappa_t^{(2)})}.
\end{align*}`

The log-likelihood then follows:

`$$L(\beta^{(1)},\beta^{(2)},\kappa^{(2)}|\boldsymbol{d},\boldsymbol{e}) = \sum_{t} \sum_{x} \left[d_{x,t}(\beta_x^{(1)}+\beta_x^{(2)}\cdot \kappa^{(2)}_t)-E_{x,t}\exp{(\beta_x^{(1)}+\beta_x^{(2)}\cdot \kappa^{(2)}_t)}\right]+c.$$`

Here, `$d_{x,t}$` and `$E_{x,t}$` are the observed deaths and exposures, respectively.

`$c$` is a constant that does not depend on the Lee-Carter parameters.

---

```r
library(demography)
years   <- `1970:max(Df$year)`
ages    <- `0:90`
```

```r
Df  <- demography::extract.years(
                      Df, years = years)
Df  <- demography::extract.ages(
                      Df, ages = ages, 
                      combine.upper = FALSE)

min(Df$year)
## [1] 1970
max(Df$year)
## [1] 2022
max(Df$age)
## [1] 90
```

]

To construct the Poisson likelihood, we need observations on `$d_{x,t}$` and `$E_{x,t}$`.

We specify the calibration period in `years`.

We specify the age range in `ages`.

We use `extract.years` and `extract.ages` from the {demography} package to subset `Df` accordingly.

Recall that `Df` is the "Mx" (1 x 1) data, extracted from the HMD.

]

---

```r
etx <- t(Df$pop$male)
dim(etx)
## [1] 53 91
str(etx)
##  num [1:53, 1:91] 49109 49131 47433 44772 42661 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:53] "1970" "1971" "1972" "1973" ...
##   ..$ : chr [1:91] "0" "1" "2" "3" ...

dtx <- round(etx * t(Df$rate$male))
dim(dtx)
## [1] 53 91
str(dtx)
##  num [1:53, 1:91] 876 821 718 681 626 500 459 395 376 358 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:53] "1970" "1971" "1972" "1973" ...
##   ..$ : chr [1:91] "0" "1" "2" "3" ...
```
]

We extract the exposures and put these in `etx`.

Here, years are in the rows and ages in the columns of the list. `t(.)` is for matrix transpose.

The deaths are not directly stored in `Df`, but follow from `$m_{x,t} \cdot E_{x,t}$`.
]

---

We discuss today two frameworks for calibrating the Lee-Carter parameters by optimizing the Poisson log-likelihood. Alternative optimization strategies exist!

<center>
<font size="+2"> 
<b> .hi-pink[Method 1] </b>
</font>
</center>

The `fit701(.)` function from the `fitModels.R` script.

This method is written by Andrew Cairns, see [LifeMetrics software](https://www.macs.hw.ac.uk/~andrewc/lifemetrics/), and uses univariate Newton-Raphson (NR) steps to optimize the log-likelihood.

`fit701(.)` calibrates Lee Carter, `fit702(.)` fits the Renshaw Haberman model, `fit703(.)` the Age-Period-Cohort (APC) model, `fit705(.)` the CBD model, `fit706(.)` the CBD model with cohort effect, and so on.

]

The R-package {StMoMo} for stochastic mortality modelling.

This method is implemented by Andres Villegas, Pietro Millossovich, and Vladimir Kaishev, see  [StMoMo vignette](https://cran.r-project.org/web/packages/StMoMo/vignettes/StMoMoVignette.pdf), and relies on estimation routines for so-called Generalized Non-linear Models (GNM).

The R-package {StMoMo} provides a unified framework for fitting stochastic mortality models of the so-called generalized age-period-cohort (GAPC) family.
]

---

# Method 1: {fit701}

As discussed in Module 2, the following types of iterative steps are crucial when optimizing the likelihood with (univariate) Newton-Raphson steps:

`$$\hat{\kappa}_t^{(2)(k+1)} =  \hat{\kappa}_t^{(2)(k)}-\frac{\sum_{x}\left[d_{x,t}-E_{x,t}\exp{(\hat{\beta}_x^{(1)(k+1)}+\hat{\beta}_x^{(2)(k)}\hat{\kappa}_t^{(2)(k)})}\right]\hat{\beta}_x^{(2)(k)}}{-\sum_{x}E_{x,t}\exp{(\hat{\beta}_x^{(1)(k+1)}+\hat{\beta}_x^{(2)(k)}\hat{\kappa}_t^{(2)(k)})}\left(\hat{\beta}_x^{(2)(k)}\right)^2}$$`

Similar expressions for the iterative steps of the parameters `$\beta_x^{(1)}$` and `$\beta_x^{(2)}$` were given during today's lectures.

For example, in `fit701(.)` the function `llmax2D(.)` implements the NR step for the period effect:

```r
thetat = b2*ev*exp(b1+b3*g3)
s1     = sum(dv*b2*wv)
f0     = sum((exp(k20*b2)*thetat)*wv)-s1
df0    = sum((exp(k20*b2)*b2*thetat)*wv)
k21    = k20-f0/df0
```

Here (cfr. the `fitModels.R` script) `wv` refers to weights used in the likelihood, and `b3*g3` is fixed at zero in the Lee-Carter specification.

---

```r
source('../scripts/fitModels.R')

LCfit701 <- `fit701(ages, years, etx, dtx,` 
                    `matrix(1, length(years),`
                              `length(ages)))`

names(LCfit701)
```

```
## -176085992 -> -108584292  ->-85457.18 
## -65657.75 -> -41655.23  ->-33660.76 
## -23267.72 -> -22962.99  ->-21072.62 
## -20137.5 -> -20134.21  ->-19905.81 
## -19818.15 -> -19817.8  ->-19796.79 
## -19790.53 -> -19790.5  ->-19789 
## -19788.6 -> -19788.59  ->-19788.49 
## -19788.46 -> -19788.46  ->-19788.46 
## -19788.45 -> -19788.45  ->-19788.45 
## -19788.45 -> -19788.45  ->-19788.45 
## -19788.45 -> -19788.45  ->-19788.45
```

```
##  [1] "beta1"       "beta2"       "beta3"       "kappa2"      "gamma3"     
##  [6] "x"           "y"           "cy"          "wa"          "epsilon"    
## [11] "mhat"        "ll"          "BIC"         "npar"        "mtxLastYear"
```

]

We call `fit701(.)` from the `fitModels.R` script.

We store the results in the object `LCfit701`.

Useful attributes in this object are:

- `beta1` the estimates for `$\beta_x^{(1)}$` 
- `beta2` the estimates for `$\beta_x^{(2)}$` 
- `kappa2` the estimates for `$\kappa_t^{(2)}$` 
- the BIC
- `epsilon` the residuals 
- `mhat` the fitted `$\hat{\mu}_{x,t}$`.

The residuals `epsilon` are Pearson residuals in a Poisson setting:

`$$\epsilon_{x,t} = \frac{d_{x,t}-\hat{\mu}_{x,t} \cdot E_{x,t}}{\sqrt{\hat{\mu}_{x,t} \cdot E_{x,t} }}.$$`
]

---

---

# Method 2: {StMoMo}

In the {StMoMo}-package, stochastic mortality models are formulated as a generalised linear or non-linear model. Parameters are calibrated using the optimization strategies available in the `glm(.)` routine (from R's {stats} library) or the library `{gnm}`.

More details are in the vignette [here](https://cran.r-project.org/web/packages/StMoMo/vignettes/StMoMoVignette.pdf) or the 2018 paper in the Journal of Statistical Software (see [here](https://www.jstatsoft.org/article/view/v084i03)).

As an example, the Poisson Lee-Carter model is formulated as:

`$$\log \mathbb{E}[D_{x,t}] = \log E_{x,t} + \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)},$$`
where `$\log E_{x,t}$` is an offset.

We install and load the package before using it:

```r
install.packages("StMoMo")
library(StMoMo)
```

---

```r
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
  c1 <- mean(kt[1, ], na.rm = TRUE)
  c2 <- sum(bx[, 1], na.rm = TRUE)
  list(ax = ax + c1 * bx, bx = bx / c2, 
       kt = c2 * (kt - c1))
}

LC <- StMoMo(link = "log", staticAgeFun = TRUE, 
             periodAgeFun = "NP", constFun = constLC)

st_Data <- StMoMoData(data = Df, series = "male", 
                      type = "central")

LCfit <- fit(object = LC, data = st_Data)

names(LCfit)
```

```
##  [1] "model"        "ax"           "bx"           "kt"           "b0x"         
##  [6] "gc"           "data"         "Dxt"          "Ext"          "oxt"         
## [11] "wxt"          "ages"         "years"        "cohorts"      "fittingModel"
## [16] "loglik"       "deviance"     "npar"         "nobs"         "conv"        
## [21] "fail"         "call"
```

]

The function `constLC` implements the identifiability constraints.

As discussed in Module 2, 
`\begin{align*}
\tilde{\beta}_x^{(1)} = \beta_x^{(1)} + c_1 \beta_x^{(2)}, \:\:
\tilde{\beta}_x^{(2)} = \frac{\beta_x^{(2)}}{c_2}, \:\:
\tilde{\kappa}^{(2)}_t = c_2 \cdot (\kappa_t^{(2)} - c_1),
\end{align*}`
determine equivalent parametrizations (for `$c_1$` and `$c_2\neq 0$`). Hence, Lee & Carter suggest the following set of constraints:
`\begin{align*}
\displaystyle \sum_x \beta_x^{(2)} = 1, \:\:\: \sum_t \kappa^{(2)}_t = 0,
\end{align*}`
which can be imposed by choosing (cfr. the R code)
`\begin{align*}
c_2=\sum_x \beta_x^{(2)} = \beta^{(2)}_{\bullet} \ \ \ c_1 = \frac{1}{T} \sum_t \kappa^{(2)}_t = \bar{\kappa}
\end{align*}`

We call `StMoMo(.)` to initialise a StMoMo object, representing a mortality model from the GAPC family.
]

---
class: clear

```r
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
  c1 <- mean(kt[1, ], na.rm = TRUE)
  c2 <- sum(bx[, 1], na.rm = TRUE)
  list(ax = ax + c1 * bx, bx = bx / c2, 
       kt = c2 * (kt - c1))
}

LC <- StMoMo(`link = "log"`, `staticAgeFun = TRUE`, 
             `periodAgeFun = "NP"`, `constFun = constLC`)

st_Data <- StMoMoData(data = Df, series = "male", 
                      type = "central")

LCfit <- fit(object = LC, data = st_Data)

names(LCfit)
```

]

.pull-right[
The argument `link` specifies the link function and distributional assumption. With `"log"`, we assume Poisson distributed deaths with a log link for `$\mu_{xt}$`. Alternatively, `"logit"` assumes a binomial distribution and a logit link for `$q_{xt}$`.

The argument `staticAgeFun` indicates whether a static age function `$\beta_x^{(1)}$` should be included, or not.

The argument `periodAgeFun` indicates the number and type of period-age effects in the model. A single `"NP"` means that a non-parametric age effect `$\beta_x^{(2)}$` is used in the predictor.

The argument `constFun` provides the user-defined identifiability constraints, as specified in `constLC`.
]
---

```r
constLC <- function(ax, bx, kt, b0x, gc, wxt, ages){
  c1 <- mean(kt[1, ], na.rm = TRUE)
  c2 <- sum(bx[, 1], na.rm = TRUE)
  list(ax = ax + c1 * bx, bx = bx / c2, 
       kt = c2 * (kt - c1))
}

LC <- StMoMo(link = "log", staticAgeFun = TRUE, 
             periodAgeFun = "NP", constFun = constLC)

st_Data <- `StMoMoData(data = Df, series = "male",` 
                      `type = "central")`

LCfit <- `fit(object = LC, data = st_Data)`

names(LCfit)
```

]

We create an StMoMoData object from the demography data `Df`, and fit the Lee-Carter model.

Useful attributes of `LCfit` are:

- `ax` the estimates for `$\beta_x^{(1)}$` 
- `bx` the estimates for `$\beta_x^{(2)}$` 
- `kt` the estimates for `$\kappa_t^{(2)}$` 
- `loglik` the log-likelihood of the model.

The person residuals `$\epsilon_{x,t}$` and `$\hat{\mu}_{x,t}$` can be calculated from the model's fit.

Let's have a look at the Lee-Carter parameter estimates obtained with this method!
]

---
class: clear

---

# Estimated mortality rates

Using the calibrated Lee-Carter parameters `$\hat{\beta}_x^{(1)}$`, `$\hat{\beta}_x^{(2)}$`, and `$\hat{\kappa}_t^{(2)}$`, we estimate the force of mortality as follows:
`\begin{align*}
\hat{\mu}_{x,t} = \exp\left(\hat{\beta}_x^{(1)} + \hat{\beta}_x^{(2)} \cdot \hat{\kappa}_t^{(2)}\right).
\end{align*}`
The estimated force of mortality can be retrieved from `LCfit701` using the attribute `$mhat`.

Using the Lee-Carter model fit from the {StMoMo} package, we compute the estimated force of mortality as:

```r
mhat <- matrix(NA, nrow = length(years), ncol = length(ages),
               dimnames = list(years, ages))

for(t in 1:length(years)){
  mhat[t,] <- exp(LCfit$ax + LCfit$bx * LCfit$kt[t])
}
```

The estimated mortality rates are obtained as follows:
`\begin{align*}
\hat{q}_{x,t} = 1 - \exp\left(- \hat{\mu}_{x,t}\right).
\end{align*}`

---

class: clear
.pull-left[
We compute the observed and estimated mortality rates from the `LCfit701` object:

```r
qobs <- 1 - exp(-dtx/etx)
qhat <- 1 - exp(-LCfit701$mhat)
```

We construct a data frame that stores the observed and estimated mortality rates at age 25:

```r
age    <- `'25'`
df.age <- data.frame(Year = years, obs = qobs[,age],
                     fit = qhat[,age])
```

We visualize these using {ggplot}.

```r
ggplot(df.age) + 
  geom_point(aes(x = Year, y = obs)) +
  geom_line(aes(Year, fit), col = RCLRbg, 
            linewidth = 1) +
  ggtitle("Switzerland - males, 1970 - 2022") + 
  ylab(bquote(hat(q)['t,`25`'])) + 
  theme_bw(base_size = 15)
```
]

<img src="Tutorial-1_files/figure-html/unnamed-chunk-78-1.svg" style="display: block; margin: auto;" />
]

---
class: clear

```r
qobs <- 1 - exp(-dtx/etx)
qhat <- 1 - exp(-LCfit701$mhat)
```

We construct a data frame that stores the observed and estimated mortality rates at age 45:

```r
age    <- `'45'`
df.age <- data.frame(Year = years, obs = qobs[,age],
                     fit = qhat[,age])
```

We visualize these using {ggplot}.

<img src="Tutorial-1_files/figure-html/unnamed-chunk-83-1.svg" style="display: block; margin: auto;" />
]

---
class: clear

```r
qobs <- 1 - exp(-dtx/etx)
qhat <- 1 - exp(-LCfit701$mhat)
```

We construct a data frame that stores the observed and estimated mortality rates at age 65:

```r
age    <- `'65'`
df.age <- data.frame(Year = years, obs = qobs[,age],
                     fit = qhat[,age])
```

We visualize these using {ggplot}.

<img src="Tutorial-1_files/figure-html/unnamed-chunk-88-1.svg" style="display: block; margin: auto;" />
]

---
class: clear

```r
qobs <- 1 - exp(-dtx/etx)
qhat <- 1 - exp(-LCfit701$mhat)
```

We construct a data frame that stores the observed and estimated mortality rates at age 85:

```r
age    <- `'85'`
df.age <- data.frame(Year = years, obs = qobs[,age],
                     fit = qhat[,age])
```

We visualize these using {ggplot}.

<img src="Tutorial-1_files/figure-html/unnamed-chunk-93-1.svg" style="display: block; margin: auto;" />
]

---

]

.right-column[
**<span style="color:red">Q</span>**: The Cairns-Blake-Dowd (CBD) model is specified as follows:
`\begin{align*}
D_{x,t} &\sim \text{POI}(E_{x,t} \cdot \mu_{x,t}) \\
\log \mu_{x,t} &= \kappa_t^{(2)} + (x - \overline{x}) \cdot \kappa_{t}^{(3)}.
\end{align*}`
Here, `$\overline{x}$` represents the average age within the specified age range.

Which identifiability constraints are necessary for the CBD model (if any)?

Use the {StMoMo} package to calibrate the CBD model.

Plot the parameter estimates using {ggplot}. Use the below instructions to plot.

**Hint.** You can visualize the calibrated period effects as follows:

```r
data_period <- tibble(year = years, k2 = ...)

ggplot(...) + 
  geom_point(aes(..., ...)) + 
  geom_line(aes(..., ...), col = "black") +
  theme_bw(base_size = 15) +
  ggtitle("Switzerland - males, 1970 - 2022, CBD, Poisson") + 
  labs(y = bquote(hat(kappa)[t]^"(2)")) 
```

]

---

]

.right-column[
**<span style="color:red">A</span>**: The CBD model does not have identiability issues.

We create a StMoMo object with `$\beta_x^{(2)} = 1$` for the second age-period effect and a parametric age function, defined through `b3`, for the third age-period effect `$\beta_x^{(3)}$`. The CBD model does not include a static age effect `$\beta_x^{(1)}$`.

```r
b3 <- function(x, ages) x - mean(ages)
CBD <- StMoMo(link = "log", staticAgeFun = FALSE, 
             periodAgeFun = c("1", b3))
```

Next, we fit the mortality model on the data object:

```r
CBDfit <- fit(object = CBD, data = st_Data, verbose = FALSE)
```

Let's have a look at the CBD parameter estimates!
]
---
class: clear

]

]

---
class: inverse, middle, center
name: modelfit

# Model selection and goodness-of-fit

---

# Model selection

Various mortality model specifications have been proposed in the literature, cfr. Module 2.

How to select an appropriate mortality model? Here are a few tools; more discussion in Modules 2-4.

- .hi-pink[Goodness-of-Fit]: The model should accurately fit the historical, observed data. This can be evaluated using Pearson residuals:
`\begin{align*}
\epsilon_{x,t} = \frac{d_{x,t}-\hat{\mu}_{x,t} \cdot E_{x,t}}{\sqrt{\hat{\mu}_{x,t} \cdot E_{x,t}}}.
\end{align*}`

- .hi-pink[Predictive Performance]: The model should perform well on both in-sample statistical measures as well as out-of-time back-tests.

- .hi-pink[Parsimony]: The model should have a good balance between complexity and goodness-of-fit. Compare the AIC/BIC values of different calibrated models to assess this balance.

- .hi-pink[Reasonable]: The chosen model should lead to mortality rates (and their evolutions) that are biologically and demographically reasonable.

- ...

---
# Pearson residuals

We construct a {ggplot2} heatmap of the residuals produced by `fit701(.)`.

```r
grid <- expand.grid(period = years, age = ages)
grid$res <- as.vector(LCfit701$epsilon)
names(grid) <- c("Year", "Age", "Residual")

ggplot(grid, aes(x = Year, y = Age)) + 
  geom_tile(aes(fill = Residual)) +
  scale_fill_gradientn(colours = topo.colors(7), 
                       breaks = c(-4,-2,0,2,4,6)) +
  theme_bw(base_size = 15) + 
  theme(legend.position = "bottom")
```

```r
head(grid)[1:3,]
##   Year Age Residual
## 1 1970   0 7.956282
## 2 1971   0 6.009977
## 3 1972   0 4.508311
```

]

<img src="Tutorial-1_files/figure-html/unnamed-chunk-101-1.svg" style="display: block; margin: auto;" />
]

---

Diagonal patterns in heatmaps of residuals might hint to the inclusion of a cohort effect in a stochastic mortality model.

For a more in depth discussion of .hi-pink[cohort effects], we refer to Cairns et al. (2009) on [A quantitative comparison of stochastic mortality models using data from England and Wales and the United States](http://www.macs.hw.ac.uk/~andrewc/papers/naaj2009.pdf).

In the longer version of this paper (available [online](http://www.macs.hw.ac.uk/~andrewc/papers/ajgc50.pdf)) the authors discuss the plot shown on the right (Figure 3 on page 7).

This is often used as the rationale for including cohort effects in mortality forecasting models.

However, calibrating and forecasting such cohort effects typically comes with a lot of difficulties!

]

.center[
<img src="img/cohort_effect_Cairns.png" width="100%" style="display: block; margin: auto;" />
]

This Figure shows .hi-pink[improvement rates] in mortality for England & Wales by calendar year and age relative to mortality rates at the same age in the previous year. Red cells
imply that mortality is deteriorating; green small rates of improvement, and blue
and white strong rates of improvement.

The black diagonal line follows the .hi-pink[progress of the 1930 cohort].

]

---

# Thanks!   <img src="img/xaringan.png" class="title-hex" width="50" height="50" align="right">

Slides created with the R package [xaringan](https://github.com/yihui/xaringan).
<br> <br> <br>
Course material available via 
<br>

<img align="left" src="img/gitfront.png" width="35" hspace="10"/> https://gitfront.io/r/jensrobben/mUZ8CcDz5x1L/Lausanne-summer-school/