# Chapter 8 Temporal Models

## 8.1 Introduction

The analysis of time series refers to the analysis of data collected
sequentially over time. Time can be indexed over a discrete domain (e.g.,
years) or a continuous one. In this section we will consider models to analyze
both types of temporal data. The discrete case will be tackled with some of
the autoregressive models covered in Chapter 3. The analysis of
temporal data over a continuous domain will be done with smoothing methods
described in Chapter 9. The analysis of space-time data will
also be considered in the last part of this chapter. `INLA`

provides a number
of options to model data collected in space and time. Krainski et al. (2018)
provides examples on space-time models for geostatistical data and point
patterns, that we have omitted here.

This chapter starts with an introduction to the analysis of time series using autoregressive models in Section 8.2. Non-Gaussian time series are discussed in Section 8.3. Forecasting of time series is described in Section 8.4. Space-state models are in Section 8.5. Finally, Section 8.6 covers different types of space-time models.

## 8.2 Autoregressive models

When time is indexed over a discrete domain autoregressive models are a convenient way to model the data. Autoregressive models are described in Section 3.3 and in this chapter we will focus on providing some applications to temporal data.

As a first example of time series we will consider the climate reconstruction dataset analyzed in Fahrmeir and Kneib (2011). The original data is described in Moberg et al. (2005) and values represent the temperature anomalies (in Celsius degrees) from the Northern Hemisphere annual mean temperature 1961-90 average. Climate reconstruction is often based on indirect measurements from tree rings and other sources.

The variables in the dataset are available in Table 8.1. The original file has been obtained from the associated website of Fahrmeir and Kneib (2011). See Preface for details.

Variable |
Description |
---|---|

`year` |
Year of measurement (from AD 1 to 1979). |

`temp` |
Temperature anomaly. |

```
read.table(file = "data/moberg2005.raw.txt", header = TRUE)
climate <-summary(climate)
```

```
## year temp
## Min. : 1 Min. :-1.263
## 1st Qu.: 496 1st Qu.:-0.489
## Median : 990 Median :-0.353
## Mean : 990 Mean :-0.354
## 3rd Qu.:1484 3rd Qu.:-0.208
## Max. :1979 Max. : 0.372
```

The analysis will focus on estimating the main trend in the climate reconstruction time series. The model to be fit is an AR(1) on the year:

\[ y_t = \alpha + u_t + \varepsilon_t,\ t = 1, \ldots, 1979 \]

\[ u_1 \sim N(0, \tau_u (1-\rho^2)^{-1}) \]

\[ u_i = \rho u_{t-1} +\epsilon_t, \ t = 2, \ldots, 1979 \]

\[ \varepsilon_t \sim N(0, \tau_{\varepsilon}, t = 1, \ldots 1979 \]

Here, \(\alpha\) is an intercept, \(\rho\) a temporal correlation term (with \(|\rho| < 1\)) and \(\epsilon_t\) is a Gaussian error term with zero mean and precision \(\tau_u\). Note that variable \(y_t\) can be regarded as a Gaussian response with mean \(\alpha + u_t\) and a fixed large precision so that the error term is close to zero.

First of all, an AR(1) model will be fit:

```
inla(temp ~ 1 + f(year, model = "ar1"), data = climate,
climate.ar1 <-control.predictor = list(compute = TRUE)
)summary(climate.ar1)
```

```
##
## Call:
## c("inla(formula = temp ~ 1 + f(year, model = \"ar1\"), data =
## climate, ", " control.predictor = list(compute = TRUE))")
## Time used:
## Pre = 0.482, Running = 1.72, Post = 0.293, Total = 2.49
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.352 0.023 -0.397 -0.352 -0.308 -0.353 0
##
## Random effects:
## Name Model
## year AR1 model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for the Gaussian observations 6.66e+04 2.29e+04 3.10e+04
## Precision for year 2.07e+01 1.98e+00 1.72e+01
## Rho for year 9.09e-01 9.00e-03 8.89e-01
## 0.5quant 0.975quant mode
## Precision for the Gaussian observations 6.36e+04 1.22e+05 5.78e+04
## Precision for year 2.06e+01 2.51e+01 2.02e+01
## Rho for year 9.09e-01 9.24e-01 9.11e-01
##
## Expected number of effective parameters(stdev): 1971.33(2.78)
## Number of equivalent replicates : 1.00
##
## Marginal log-Likelihood: 1896.03
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

This model does not provide a good fit because it fits the observed data
exactly, i.e., it overfits the data. For this reason, it is necessary to
provide some prior information about what we expect the variation of
the process to be. In the next example a prior Gamma with parameters 10 and
100 is used, so that the precision is centered at 0.1 and has a small variance
of \(0.001\). Note that this prior is closer to the scale of the actual
data than the default prior as the variance of `temp`

is
0.0484. The prior is set by using the `hyper`

argument inside the
`f()`

function where the `ar1`

latent effect is defined and in the
`control.family`

argument (for the precision of the likelihood).

```
inla(temp ~ 1 + f(year, model = "ar1",
climate.ar1 <-hyper = list(prec = list(param = c(10, 100)))), data = climate,
control.predictor = list(compute = TRUE),
control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
control.family = list(hyper = list(prec = list(param = c(10, 100))))
)summary(climate.ar1)
```

```
##
## Call:
## c("inla(formula = temp ~ 1 + f(year, model = \"ar1\", hyper =
## list(prec = list(param = c(10, ", " 100)))), data = climate,
## control.compute = list(dic = TRUE, ", " waic = TRUE, cpo = TRUE),
## control.predictor = list(compute = TRUE), ", " control.family =
## list(hyper = list(prec = list(param = c(10, ", " 100)))))")
## Time used:
## Pre = 0.224, Running = 2.76, Post = 0.141, Total = 3.12
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.288 2.965 -6.161 -0.288 5.575 -0.288 0
##
## Random effects:
## Name Model
## year AR1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 8.033 0.259 7.535 8.029
## Precision for year 0.122 0.033 0.068 0.118
## Rho for year 1.000 0.000 1.000 1.000
## 0.975quant mode
## Precision for the Gaussian observations 8.555 8.022
## Precision for year 0.199 0.111
## Rho for year 1.000 1.000
##
## Expected number of effective parameters(stdev): 42.04(6.69)
## Number of equivalent replicates : 47.08
##
## Deviance Information Criterion (DIC) ...............: -50.99
## Deviance Information Criterion (DIC, saturated) ....: 434.63
## Effective number of parameters .....................: 42.51
##
## Watanabe-Akaike information criterion (WAIC) ...: -84.97
## Effective number of parameters .................: 8.38
##
## Marginal log-Likelihood: -757.94
## CPO and PIT are computed
##
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

The next model to fit is a random walk of order 1:

\[ y_t = \alpha + u_t + \varepsilon_t,\ t = 1, \ldots 1979 \]

\[ u_t - u_{t-1} \sim N(0,\tau_u), \ t=2, \ldots, 1979 \]

\[ \varepsilon_t \sim N(0, \tau_{\varepsilon}), t = 1, \ldots 1979 \]

The model is fit similar to the `ar1`

models and we fix the precision
of the Gaussian likelihood:

```
inla(temp ~ 1 + f(year, model = "rw1", constr = FALSE,
climate.rw1 <-hyper = list(prec = list(param = c(10, 100)))), data = climate,
control.predictor = list(compute = TRUE),
control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
control.family = list(hyper = list(prec = list(param = c(10, 100))))
)summary(climate.rw1)
```

```
##
## Call:
## c("inla(formula = temp ~ 1 + f(year, model = \"rw1\", constr =
## FALSE, ", " hyper = list(prec = list(param = c(10, 100)))), data =
## climate, ", " control.compute = list(dic = TRUE, waic = TRUE, cpo
## = TRUE), ", " control.predictor = list(compute = TRUE),
## control.family = list(hyper = list(prec = list(param = c(10, ", "
## 100)))))")
## Time used:
## Pre = 0.201, Running = 4.09, Post = 0.308, Total = 4.6
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 0.003 603.3 -1258 -0.032 1258 0.001 0
##
## Random effects:
## Name Model
## year RW1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 4.74 0.091 4.55 4.74
## Precision for year 4.00 0.065 3.85 4.00
## 0.975quant mode
## Precision for the Gaussian observations 4.91 4.75
## Precision for year 4.10 4.04
##
## Expected number of effective parameters(stdev): 946.31(7.85)
## Number of equivalent replicates : 2.09
##
## Deviance Information Criterion (DIC) ...............: 2422.54
## Deviance Information Criterion (DIC, saturated) ....: 1872.36
## Effective number of parameters .....................: 928.67
##
## Watanabe-Akaike information criterion (WAIC) ...: 1782.52
## Effective number of parameters .................: 233.11
##
## Marginal log-Likelihood: -2104.74
## CPO and PIT are computed
##
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

The model summary shows a strong temporal correlation, with a posterior mean of the parameter of 1.

Figure 8.2 shows the posterior means and 95% credible
intervals of the autoregressive and random walks models fit to the temperature
data. The `ar1`

model produces smoother estimates than the `rw1`

model, which
seems to overfit the data. Although both models use the same priors on the
model precisions it is difficult to compare them directly unless the latent
effects are scaled (see Section 5.6).

For both models the AIC and WAIC have been computed, and these clearly favor
the `ar1`

model (probably because the `rw1`

overfits the data). Similarly, the CPO has been computed and they can be compared
(see Section 2.4):

```
# ar1
- sum(log(climate.ar1$cpo$cpo))
```

`## [1] -42.48`

```
# rw1
- sum(log(climate.rw1$cpo$cpo))
```

`## [1] 923.9`

Again, this criterion shows a preference for the `ar1`

model.

## 8.3 Non-Gaussian data

Similar models can be fit to non-Gaussian data. The next example uses the number
of yearly major earthquakes from 1900 to 2006 (Zucchini, MacDonald, and Langrock 2016). A major
earthquake is one with a magnitude of 7 or greater. These are available in the
`earthquake`

dataset provided by the `MixtureInf`

(Li, Chen, and Li 2016) package.

Variable |
Description |
---|---|

`number` |
Number of major earthquakes (magnitude 7 or greater). |

The `earthquake`

data can be loaded and summarized as follows
(norte that a new column with the year has been added as well):

```
library("MixtureInf")
data(earthquake)
#Add year
$year <- 1900:2006
earthquake
#Summary of the data
summary(earthquake)
```

```
## number year
## Min. : 6.0 Min. :1900
## 1st Qu.:15.0 1st Qu.:1926
## Median :18.0 Median :1953
## Mean :19.4 Mean :1953
## 3rd Qu.:23.0 3rd Qu.:1980
## Max. :41.0 Max. :2006
```

Figure 8.3 shows the time series of the number of major earthquakes from 1900 to 2006.

The first model to be fit to the `earthquake`

data is an `ar1`

:

\[ y_t \sim Po(\mu_t),\ t=1900, \ldots, 2006 \]

\[ \log(\mu_t) = \alpha + u_t,\ t=1900, \ldots, 2006 \]

\[ u_1 \sim N(0, \tau_u (1-\rho^2)^{-1}) \]

\[ u_i = \rho u_{t-1} +\epsilon_t, \ t = 2, \ldots, 1979 \]

\[ \epsilon_t \sim N(0, \tau_{\varepsilon}), \ t = 2, \ldots, 1979 \]

Note that now the distribution of the observed number of earthquakes is modeled using a Poisson distribution and that the linear predictor is linked to the mean using the natural logarithm.

```
inla(number ~ 1 + f(year, model = "ar1"), data = earthquake,
quake.ar1 <-family = "poisson", control.predictor = list(compute = TRUE))
summary(quake.ar1)
```

```
##
## Call:
## c("inla(formula = number ~ 1 + f(year, model = \"ar1\"), family =
## \"poisson\", ", " data = earthquake, control.predictor =
## list(compute = TRUE))" )
## Time used:
## Pre = 0.207, Running = 0.296, Post = 0.025, Total = 0.528
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 2.863 0.19 2.436 2.877 3.191 2.891 0
##
## Random effects:
## Name Model
## year AR1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for year 12.055 4.67 4.870 11.449 22.951 10.116
## Rho for year 0.889 0.05 0.773 0.896 0.966 0.914
##
## Expected number of effective parameters(stdev): 31.30(5.98)
## Number of equivalent replicates : 3.42
##
## Marginal log-Likelihood: -343.45
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

Similar to the climate example, a random walk of order one can be fit to the data:

\[ y_t \sim Po(\mu_t),\ t=1900, \ldots, 2006 \]

\[ \log(\mu_t) = \alpha + u_t \]

\[ u_t - u_{t-1} \sim N(0,\tau_u), \ t=2, \ldots, 1979 \]

This model is fit using the code below:

```
inla(number ~ 1 + f(year, model = "rw1"), data = earthquake,
quake.rw1 <-family = "poisson", control.predictor = list(compute = TRUE))
summary(quake.rw1)
```

```
##
## Call:
## c("inla(formula = number ~ 1 + f(year, model = \"rw1\"), family =
## \"poisson\", ", " data = earthquake, control.predictor =
## list(compute = TRUE))" )
## Time used:
## Pre = 0.202, Running = 0.132, Post = 0.026, Total = 0.361
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 2.923 0.023 2.877 2.923 2.968 2.923 0
##
## Random effects:
## Name Model
## year RW1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for year 82.39 33.92 36.22 75.83 166.93 64.67
##
## Expected number of effective parameters(stdev): 26.73(4.89)
## Number of equivalent replicates : 4.00
##
## Marginal log-Likelihood: -342.67
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

Figure 8.4 shows the models fit to the `earthquake`

dataset.
This includes the posterior means and 95% credible intervals. Apparently,
both models provide a similar fit to the data.

## 8.4 Forecasting

Time series forecasting in Bayesian inference can be regarded as fitting a
model with some missing observations in the response, which requires computing
the predictive distribution (see Section 12.3) at future times.
If the model being fit includes covariates these must also be available as well
for the years to be predicted. We will illustrate forecasting in time series
with the `earthquake`

dataset to try to estimate the future number of major
earthquakes and the uncertainty about these estimates.

First of all, we use a few new lines (from year 2007 to 2020) to forecast the
number of major earthquakes. These lines include the new time points (i.e.,
years) at which the prediction will be made and the number of earthquakes is
set to `NA`

for these new time points:

```
rbind(earthquake,
quake.pred <-data.frame(number = rep(NA, 14), year = 2007:2020))
```

Next, we fit the `ar1`

model to the new dataset so that the predictive
distributions are computed:

```
inla(number ~ 1 + f(year, model = "ar1"), data = quake.pred,
quake.ar1.pred <-family = "poisson", control.predictor = list(compute = TRUE, link = 1))
summary(quake.ar1.pred)
```

```
##
## Call:
## c("inla(formula = number ~ 1 + f(year, model = \"ar1\"), family =
## \"poisson\", ", " data = quake.pred, control.predictor =
## list(compute = TRUE, ", " link = 1))")
## Time used:
## Pre = 0.212, Running = 0.321, Post = 0.0253, Total = 0.559
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 2.863 0.19 2.436 2.877 3.191 2.891 0
##
## Random effects:
## Name Model
## year AR1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for year 12.055 4.67 4.870 11.449 22.951 10.116
## Rho for year 0.889 0.05 0.773 0.896 0.966 0.914
##
## Expected number of effective parameters(stdev): 31.30(5.98)
## Number of equivalent replicates : 3.42
##
## Marginal log-Likelihood: -343.45
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

In the previous code, the use of `link = 1`

is required so that predicted
values are in the scale of the response, i.e., they are transformed taking the
inverse of the link function. See 4.6 for details about the
use of `link`

in non-Gaussian models.

The mode fit is the same as the one obtained without the missing observations. Figure 8.5 shows the fitted and predicted values together with 95% credible intervals. Note how these get wider in the period from 2007 to 2020 as time progresses from the last year with observed data (i.e., 2006).

## 8.5 Space-state models

A class of temporal models where coefficients and other latent effects are allowed to change in time is that of dynamic models (Ruiz-Cárdenas, Krainski, and Rue 2012). These models can be summarized as follows:

\[ y_t = \beta_t x_t + \epsilon_t,\ \epsilon_t \sim N(0, \tau_t),\ t=1,\ldots, n \]

\[ x_t = \gamma_t x_{t-1} + \omega_t,\ \omega_t \sim N(0, \nu_t),\ t=2,\ldots, n \]

Here, \(y_t\) is a temporal observation, \(x_t\) is a state latent parameter, \(\beta_t\) and \(\gamma_t\) are temporally varying coefficients, and \(\epsilon_t\) and \(\omega_t\) are temporal errors or perturbations which follow a Gaussian distribution with 0 mean and precisions \(\tau_t\) and \(\nu_t\), respectively.

Latent effects \(\mathbf{x} = (x_1, \ldots, x_n))\) represent different states
which need to be estimated. Note that they are not assigned a stochastic
distribution because they are constants to be estimated. This will
be simulated by modeling them using an `iid`

distribution with a very large
variance to conform to the unconstrained nature of these values.

Ruiz-Cárdenas, Krainski, and Rue (2012) describe how to use `INLA`

to fit some classes of
dynamic models when the data are fully observed. We will illustrate model fitting using a simple model.

\[ y_t = x_t + \nu_t,\ \nu_t \sim N(0, V),\ t=1,\ldots,n \]

\[ x_t = x_{t-1} + \omega_t,\ \omega_t \sim N(0,w),\ t = 2, \ldots, n \]

Note that in this case the temporal coefficients are constant and equal to one. Also, \((x_1,\ldots,x_n)\) is distributed as a random walk of order 1, which simplifies model fitting because the second equation above can be rewritten as:

\[ x_t - x_{t-1} = \omega_t,\ \omega_t \sim N(0,w),\ t = 2, \ldots, n \]

which is the definition of a first order random walk. Hence, model fitting here reduces to a model with a Gaussian response and random walk of order 1 in the linear predictor. However, we will describe the new approach developed in Ruiz-Cárdenas, Krainski, and Rue (2012) as this provides a more general framework to fit space-state models.

The second line in the previous equation can be rewritten as follows:

\[ 0 = x_t - x_{t-1} - \omega_t,\ \omega_t \sim N(0,w),\ t = 2, \ldots, n \]

This can be regarded as a submodel with “faked zero observations” which depends on the vector of latent effects \(\mathbf{x}\) and error terms \(\mathbf{\omega}\).

Hence, this model can be fit using two likelihoods (as explained in Section
6.4): one to fit the model that explains \(\mathbf{y}\) and the
second one to fit the model that explains the “fake zero observations”. Given
that both submodels depend on the same latent effects \(\mathbf{x}\), they can be
shared between the two models using the `copy`

feature described in Section
6.5.

### 8.5.1 Example: Alcohol consumption in Finland

As an example of the analysis of non-Gaussian data we will consider the
analysis of alcohol related deaths in Finland in the period 1969-2013 in the
age group between 30 and 39 years. This is available as dataset `alcohol`

in
package `KFAS`

(Helske 2017), which is a time series object.

Variable |
Description |
---|---|

`death at age 30-39` |
Number of alcohol related deaths (age group 30-39). |

`death at age 40-49` |
Number of alcohol related deaths (age group 40-49). |

`death at age 50-59` |
Number of alcohol related deaths (age group 50-59). |

`death at age 60-69` |
Number of alcohol related deaths (age group 60-69). |

`population by age 30-39` |
Population (age group 30-39) divided by 100000. |

`population by age 40-49` |
Population (age group 40-49) divided by 100000. |

`population by age 50-59` |
Population (age group 50-59) divided by 100000. |

`population by age 60-69` |
Population (age group 60-69) divided by 100000. |

The `alcohol`

dataset can be loaded with:

```
library("KFAS")
data(alcohol)
```

In order to fit a space-state model data need to be put in a two-column matrix
as the model will contain two likelihoods, as explained in Section
6.4. In the first column, we will include the number of
deaths in the age group 30-39, and in the second column the “fake zero”
observations.
Note that we have to remove the last row because it contains `NA`

’s.

```
nrow(alcohol) - 1 #There is an NA
n <- matrix(NA, ncol = 2, nrow = n + (n - 1))
Y <-1:n, 1] <- alcohol[1:n, 1]
Y[-c(1:n), 2] <- 0 Y[
```

The model will contain an offset with the population in the age group 30-39 for the data in the first column only:

```
#offset
c(alcohol[1:n, 5], rep(NA, n - 1)) oset <-
```

Next, a number of indices and weights for the latent effect need to be created
as well. These indices and weights are described in the formulation of the model
above. Note that all indices have length \(n\) + \(n+1\) because the model
will be made of two likelihoods. The first one is a Poisson likelihood
to model the response, and the second one is a Gaussian likelihood to
model the latent “fake zeroes” than define variable \(x_t\).
When an index only works on one likelihood, then the corresponding values
for the other likelihood will be set to `NA`

.

```
#x_t
c(1:n, 2:n)
i <-#x_(t-1) 2:n
c(rep(NA, n), 1:(n - 1))
j <-# Weight to have -1 * x_(t-1)
c(rep(NA, n), rep(-1, n - 1))
w1 <-#x_(t-1), 2:n
c(rep(NA, n), 2:n)
l <-# Weight to have * omega_(t-1)
c(rep(NA, n), rep(-1, n - 1)) w2 <-
```

The final model is fit as follows:

```
list(prec = list(param = c(0.001, 0.001)))
prec.prior <- inla(Y ~ 0 + offset(log(oset)) +
alc.inla <- f(i, model = "iid",
hyper = list(prec = list(initial = -10, fixed = TRUE))) +
f(j, w1, copy = "i") + f(l, w2, model = "iid"),
data = list(Y = Y, oset = oset), family = c("poisson", "gaussian"),
control.family = list(list(),
list(hyper = list(prec = list(initial = 10, fixed = TRUE)))),
control.predictor = list(compute = TRUE)
)
```

Note that the previous model does not impose any distribution of the values
of \(\mathbf{x}\) and they can take any values. For this reason, they
have been included in the linear predictor using an `iid`

model with
a fixed tiny precision equal to \(\exp(-10)\), which is equivalent to having
a large variance of \(\exp(10)\). That is, in practice effects in \(\mathbf{x}\)
can take any value. Furthermore, `control.family`

takes a list of two
values (because of the two likelihoods) but only the second one is set
because the first likelihoods do not require any extra constraints.
The second argument is to fix the precision of the Gaussian likelihood.

The estimates and model fitting are summarized here:

`summary(alc.inla)`

```
##
## Call:
## c("inla(formula = Y ~ 0 + offset(log(oset)) + f(i, model =
## \"iid\", ", " hyper = list(prec = list(initial = -10, fixed =
## TRUE))) + ", " f(j, w1, copy = \"i\") + f(l, w2, model = \"iid\"),
## family = c(\"poisson\", ", " \"gaussian\"), data = list(Y = Y,
## oset = oset), control.predictor = list(compute = TRUE), ", "
## control.family = list(list(), list(hyper = list(prec =
## list(initial = 10, ", " fixed = TRUE)))))")
## Time used:
## Pre = 0.321, Running = 0.129, Post = 0.0246, Total = 0.475
## Random effects:
## Name Model
## i IID model
## l IID model
## j Copy
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for l 111.08 53.31 43.38 99.63 245.91 81.26
##
## Expected number of effective parameters(stdev): 63.52(3.50)
## Number of equivalent replicates : 1.37
##
## Marginal log-Likelihood: -457.33
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

Figure 8.6 shows the fitted values together with 95% credible intervals (shaded area). Note how the space-state model provides smoothed estimates.

## 8.6 Spatio-temporal models

Spatial models have already been covered in Chapter 7. When time is also available, it is possible to build spatio-temporal models that include spatial and temporal random effects, as well as interaction effects between space and time.

A spatio-temporal model is said to be *separable* when the space-time
covariance structure can be decomposed as a spatial and a temporal term. This
often means that the spatio-temporal covariance can be written as a Kronecker
product of a spatial and a temporal covariance (Fuentes, Chen, and Davis 2007)
*Non-separable* covariance structures do not allow for a separate modeling of
the spatial and temporal covariances (Wikle, Berliner, and Cressie 1998).

### 8.6.1 Separable models

Including separable spatial and temporal terms in the linear predictor
can be done with ease by using the `f()`

function to define two separate
effects on space and time. Spatial effects are usually of the classes
presented in Chapter 7, while temporal effects can
be those presented in this chapter. However, latent effects used for
spatial modeling can be used for temporal effects when the adjacency
structure used reflects dependence between different temporal points.

The `brainNM`

dataset from the `DClusterm`

package (Gómez-Rubio, Moraga, et al. 2019)
provides observed and expected counts of cases of brain cancer in the counties
of New Mexico from 1973 to 1991 in a spatio-temporal `STFDF`

object from the
`spacetime`

package (Pebesma 2012). The original data have been analyzed in
Kulldorff et al. (1998). See the Preface for more information about the original
data sources.

Table 8.4 shows the variables in the dataset, that can be loaded as follows:

```
library("DClusterm")
data(brainNM)
```

Variable |
Description |
---|---|

`Observed` |
Observed number of cases. |

`Expected` |
Expected number of cases. |

`SMR` |
Standardized Mortality Ratio. |

`Year` |
Year. |

`FIPS` |
County FIPS code. |

`ID` |
County ID (from 1 to 32). |

`IDLANL` |
Inverse distance to Los Alamos National Laboratory. |

`IDLANLre` |
Re-scaled value of `IDLANL` by dividing by its mean. |

A simple estimate of risk is the standardized mortality ratio (SMR) which is a simple estimate of the relative risk which is equal to the observed number of cases divided by the expected cases. Figure 8.7 shows the SMR for each county and year.

Before fitting the spatio-temporal model the adjacency structure of the
counties in New Mexico is computed using function `poly2nb()`

on the
spatial object of the `brainNM`

dataset.

```
poly2nb(brainst@sp)
nm.adj <- as(nb2mat(nm.adj, style = "B"), "Matrix") adj.mat <-
```

Next, the model is fit considering a separable model. In particular, the
spatial effect is modeled using an ICAR model and the temporal trend using a
`rw1`

latent effect. We will use vague priors (as in Chapter 3)
to avoid overfitting.

```
# Prior of precision
list(prec = list(param = c(0.001, 0.001)))
prec.prior <-
inla(Observed ~ 1 + f(Year, model = "rw1",
brain.st <-hyper = prec.prior) +
f(as.numeric(ID), model = "besag", graph = adj.mat,
hyper = prec.prior),
data = brainst@data, E = Expected, family = "poisson",
control.predictor = list(compute = TRUE, link = 1))
summary(brain.st)
```

```
##
## Call:
## c("inla(formula = Observed ~ 1 + f(Year, model = \"rw1\", hyper =
## prec.prior) + ", " f(as.numeric(ID), model = \"besag\", graph =
## adj.mat, hyper = prec.prior), ", " family = \"poisson\", data =
## brainst@data, E = Expected, control.predictor = list(compute =
## TRUE, ", " link = 1))")
## Time used:
## Pre = 0.305, Running = 0.557, Post = 0.0526, Total = 0.914
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.048 0.038 -0.126 -0.047 0.025 -0.045 0
##
## Random effects:
## Name Model
## Year RW1 model
## as.numeric(ID) Besags ICAR model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant
## Precision for Year 484.88 525.71 55.10 328.80 1856.82
## Precision for as.numeric(ID) 65.37 88.71 7.51 39.03 285.08
## mode
## Precision for Year 145.71
## Precision for as.numeric(ID) 17.88
##
## Expected number of effective parameters(stdev): 10.35(3.60)
## Number of equivalent replicates : 58.75
##
## Marginal log-Likelihood: -824.58
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

Figure 8.8 shows the estimated relative risk. Note how several counties in the center of the state show an increased risk throughout the whole period.

This model assumes that the variation in each county is the sum of the spatial random effect and the overall temporal trend, i.e., there is no way to account for county-specific patterns. This may cause poor model fitting in some of the counties for certain years. Non-separable models may include space-time terms that account for this area-time interaction (Knorr-Held 2000).

### 8.6.2 Separable models with the `group`

option

The `f()`

function provides the `group`

argument to set an index
for the temporal structure of the data. This index will be used
to define different types of temporal dependence, whose definition
will be done via the `control.group`

argument in the `f()`

function.

Available models are

`names(inla.models()$group)`

```
## [1] "exchangeable" "exchangeablepos" "ar1" "ar"
## [5] "rw1" "rw2" "besag" "iid"
```

See Table 8.6 for a short summary of them. The options available
to define the type of grouped effect and other required parameters is available
in Table 8.5. Note that most of these options are only
required if the latent effect defined in `model`

needs them.

The resulting latent effect is a GMRF with zero mean and covariance matrix (Simpson 2016):

\[ \tau \Sigma_{b} \otimes \Sigma_{w} \]

where \(\Sigma_{b}\) is the structure of the covariance between the different groups and \(\Sigma_{w}\) the structure of the covariance matrix within each group.

The within-group effect is the one defined in the main `f()`

function,
while the between effect is the one defined using the `control.group`

argument and it is of `exchangeable`

type by default.

Argument |
Default |
Description |
---|---|---|

`model` |
`"exchangeable"` |
Type of model. |

`order` |
`NULL` |
Order of the model. |

`cyclic` |
`FALSE` |
Whether this is a cyclic effect. |

`graph` |
`NULL` |
Graph for spatial models. |

`scale.model` |
`TRUE` |
Whether to scale the model. |

`adjust.for.con.comp` |
`TRUE` |
Adjustment for non-connected components. |

`hyper` |
`NULL` |
Definition of hyperprior. |

Effect |
Description |
---|---|

`exchangeable` |
Exchangeable effect. |

`exchangeablepos` |
Exchangeable effect. |

`ar1` |
Autoregressive model of order 1. |

`ar` |
Autoregressive model of order \(p\). |

`rw1` |
Random walk of order 1. |

`rw2` |
Random walk of order 2. |

`besag` |
Besag spatial model. |

`iid` |
Independent and identically distributed random effects. |

A similar model can be fit using the `group`

argument. In this case,
variable `ID.Year`

is created to index the year (from 1 to 19) used
to define the `group`

. Argument `control.group`

takes a list with the model
(`ar1`

) and other arguments to define this effect could be added.
Index `ID2`

is created as a copy of the `ID`

index so that it can be used
in another latent effect.

```
@data$ID.Year <- brainst@data$Year - 1973 + 1
brainst@data$ID2 <- brainst@data$ID brainst
```

The model is then defined as in the code below. Note that the model includes only the spatio-temporal random effects. Note that this and the previous models are not exactly the same and that estimates may differ.

```
inla(Observed ~ 1 +
brain.st2 <- f(as.numeric(ID2), model = "besag", graph = adj.mat,
group = ID.Year, control.group = list(model = "ar1"),
hyper = prec.prior),
data = brainst@data, E = Expected, family = "poisson",
control.predictor = list(compute = TRUE, link = 1))
summary(brain.st2)
```

```
##
## Call:
## c("inla(formula = Observed ~ 1 + f(as.numeric(ID2), model =
## \"besag\", ", " graph = adj.mat, group = ID.Year, control.group =
## list(model = \"ar1\"), ", " hyper = prec.prior), family =
## \"poisson\", data = brainst@data, ", " E = Expected,
## control.predictor = list(compute = TRUE, link = 1))" )
## Time used:
## Pre = 0.252, Running = 11.4, Post = 0.0526, Total = 11.7
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.027 0.038 -0.105 -0.026 0.042 -0.022 0
##
## Random effects:
## Name Model
## as.numeric(ID2) Besags ICAR model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant
## Precision for as.numeric(ID2) 54.752 70.094 7.45 33.825 230.546
## GroupRho for as.numeric(ID2) 0.822 0.229 0.13 0.908 0.996
## mode
## Precision for as.numeric(ID2) 16.814
## GroupRho for as.numeric(ID2) 0.992
##
## Expected number of effective parameters(stdev): 9.40(7.14)
## Number of equivalent replicates : 64.64
##
## Marginal log-Likelihood: -1196.66
## Posterior marginals for the linear predictor and
## the fitted values are computed
```

Figure 8.9 show the posterior means of the relative risk.
Compared to the previous model this has a higher degree of smoothing. Note
that this may be due to the fact that in the previous model the two effects were
additive while now there is a single spatio-temporal effect. Furthermore, the
estimate of the correlation of the `ar1`

model is large, which means that the
model is assuming a high correlation (i.e., similarities) between consecutive
years and very similar estimates from year to another are obtained.

Non-separable models include an interaction term between space and time that
cannot be decomposed as the product of two terms. These models can be fit with
`INLA`

but are harder to define. The specification of these models is not
straightforward as it often requires imposing additional constraints on the
space-time random effects (Knorr-Held 2000). Function `inla.knmodels()`

can be
used to fit these models with `INLA`

. Goicoa et al. (2018) discuss the effect of
imposing constraints on the space-time random effects and their impact on the
estimates. Blangiardo and Cameletti (2015) cover some non-separable spatio-temporal
models in Chapter 7.

## 8.7 Final remarks

Several types of spatial and spatio-temporal models that `INLA`

can fit have
been described in this chapter. Some of the latent effects mentioned here are also described in
Chapter 3 and Chapter 9. For other models not
implemented in `INLA`

, the tools described in Chapter 11 can
be used to implement new latent effects. Krainski et al. (2018) also
propose a number of spatio-temporal models in the context of continuous
processes and point processes.

Other spatio-temporal models that could be fit with `INLA`

include those
proposed in Knorr-Held (2000), which can be fit using function `inla.knmodels()`

.
Note that these models require imposing further constraints on the latent
effects (see Section 6.6). Goicoa et al. (2018) discuss the effect of
imposing constraints on the space-time random effects and their impact on the
estimates.

Fitting non-separable spatio-temporal models with `INLA`

is more difficult.
Blangiardo and Cameletti (2015) cover some non-separable spatio-temporal models in
Chapter 7. New models could be implemented using the methods described in
Chapter 11 and, in particular, the `rgeneric`

approach.
Palmí-Perales, Gómez-Rubio, and Martínez-Beneito (2019) describe how to fit multivariate spatial latent effects with highly
structured precision matrices that are implemented using `rgeneric`

. This could
be a guide on how to implement non-separable models because they often are
based on specific parameterization of the spatio-temporal precision matrix.

### References

Blangiardo, Marta, and Michela Cameletti. 2015. *Spatial and Spatio-Temporal Bayesian Models with R-INLA*. Chichester, UK: John Wiley & Sons Ltd.

Fahrmeir, Ludwig, and Thomas Kneib. 2011. *Bayesian Smoothing Regression for Longitudinal, Spatial and Event History Data*. New York: Oxford University Press.

Fuentes, M., L. Chen, and J. M. Davis. 2007. “A Class of Nonseparable and Nonstationary Spatial Temporal Covariance Functions.” *Environmetrics* 19 (5): 487–507. https://doi.org/doi:10.1002/env.891.

Goicoa, T., A. Adin, M. D. Ugarte, and J. S. Hodges. 2018. “In Spatio-Temporal Disease Mapping Models, Identifiability Constraints Affect PQL and INLA Results.” *Stochastic Environmental Research and Risk Assessment* 3 (32): 749–70.

Gómez-Rubio, Virgilio, Paula Moraga, John Molitor, and Barry Rowlingson. 2019. “DClusterm: Model-Based Detection of Disease Clusters.” *Journal of Statistical Software, Articles* 90 (14): 1–26. https://doi.org/10.18637/jss.v090.i14.

Helske, Jouni. 2017. “KFAS: Exponential Family State Space Models in R.” *Journal of Statistical Software* 78 (10): 1–39. https://doi.org/10.18637/jss.v078.i10.

Knorr-Held, Leonhard. 2000. “Bayesian Modelling of Inseparable Space-Time Variation in Disease Risk.” *Statistics in Medicine* 19 (17‐18): 2555–67.

Krainski, Elias T., Virgilio Gómez-Rubio, Haakon Bakka, Amanda Lenzi, Daniela Castro-Camilo, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2018. *Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA*. Boca Raton, FL: Chapman & Hall/CRC.

Kulldorff, M., W. F. Athas, E. J. Feurer, B. A. Miller, and C. R. Key. 1998. “Evaluating Cluster Alarms: A Space-Time Scan Statistic and Brain Cancer in Los Alamos, New Mexico.” *American Journal of Public Health* 88: 1377–80.

Li, Shaoting, Jiahua Chen, and Pengfei Li. 2016. *MixtureInf: Inference for Finite Mixture Models*. https://CRAN.R-project.org/package=MixtureInf.

Moberg, Anders, Dmitry M. Sonechkin, Karin Holmgren, Nina M. Datsenko, and Wibjörn Karlén. 2005. “Highly Variable Northern Hemisphere Temperatures Reconstructed from Low- and High-Resolution Proxy Data.” *Nature* 443: 613–17.

Palmí-Perales, Francisco, Virgilio Gómez-Rubio, and Miguel A. Martínez-Beneito. 2019. “Bayesian Multivariate Spatial Models for Lattice Data with INLA.” *arXiv E-Prints*, September, arXiv:1909.10804. http://arxiv.org/abs/1909.10804.

Pebesma, Edzer. 2012. “spacetime: Spatio-Temporal Data in R.” *Journal of Statistical Software* 51 (7): 1–30. http://www.jstatsoft.org/v51/i07/.

Ruiz-Cárdenas, R., E. T. Krainski, and H. Rue. 2012. “Direct Fitting of Dynamic Models Using Integrated Nested Laplace Approximations — INLA.” *Computational Statistics & Data Analysis* 56 (6): 1808–28. https://doi.org/http://dx.doi.org/10.1016/j.csda.2011.10.024.

Simpson, Daniel. 2016. “INLA for Spatial Statistics. Grouped Models.” http://faculty.washington.edu/jonno/SISMIDmaterial/8-Groupedmodels.pdf.

Wikle, C. K., L. M. Berliner, and N. Cressie. 1998. “Hierarchical Bayesian Space-Time Models.” *Environmental and Ecological Statistics* 5: 117–54.

Zucchini, W., I. L. MacDonald, and R. Langrock. 2016. *Hidden Markov Models for Time Series: An Introduction Using R*. 2nd ed. CRC Press.