Chapter 7 Space-time models

In this chapter we detail how to fit separable space-time models. A separable space-time model is defined as an SPDE model for the spatial domain and an autoregressive model of order 1, i.e., AR(\(1\)), for the time dimension. The precision matrix of the separable space-time model is the Kronecker product of the precision matrices of the temporal and spatial random effects. Additional information about separable space-time models can be found in Cameletti et al. (2013).
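To make the Kronecker structure concrete, the following is a minimal sketch in plain Python (it is not the R-INLA code used in this book). The helper names `ar1_precision` and `kron`, the toy \(2 \times 2\) spatial precision matrix and the value \(\rho = 0.7\) are assumptions made only for this illustration. The sketch also assumes that the spatial index runs fastest within each time point.

```python
def ar1_precision(k, rho):
    """Precision matrix of a stationary AR(1) process with unit marginal
    variance, for k >= 2 time points (dense, as a list of rows)."""
    s = 1.0 / (1.0 - rho ** 2)
    Q = [[0.0] * k for _ in range(k)]
    for t in range(k):
        # End points have 1 on the diagonal, interior points 1 + rho^2.
        Q[t][t] = s * (1.0 if t in (0, k - 1) else 1.0 + rho ** 2)
        if t + 1 < k:
            Q[t][t + 1] = Q[t + 1][t] = -s * rho
    return Q


def kron(A, B):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


Q_time = ar1_precision(3, 0.7)           # 3 time points (hypothetical)
Q_space = [[2.0, -1.0], [-1.0, 2.0]]     # toy spatial precision (hypothetical)
Q_st = kron(Q_time, Q_space)             # space-time precision, 6 x 6
print(len(Q_st), len(Q_st[0]))           # 6 6
```

The resulting matrix is sparse whenever both factors are sparse: entries that link time points more than one step apart are exactly zero, which is what makes the separable model computationally convenient.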

In this chapter we start by showing two different ways to implement space-time models. The first one uses a discrete time domain and the second one considers continuous time and discretizes it over a set of knots. The main difference in the model fitting process is that when we use continuous time, we need to choose time knots and to adjust the projector matrix to use these knots. Neither approach requires the measurement locations to be the same over time.

In this chapter we focus on basic code examples, together with information on how to structure the models for faster computation. In the following chapter we will provide several advanced examples.

7.1 Discrete time domain

In this section we show how to fit a space-time separable model, as in Cameletti et al. (2013). Additionally, we show how to include a categorical covariate.

7.1.1 Data simulation

The study region considered in this example is the border of Paraná state, available in package INLA, as in Section 2.8. This boundary will be used as the domain of the spatial process and it can be loaded as:

The first step is to define the spatial model. To be able to fit the model quickly, we use the low resolution mesh for Paraná state border created in Section 2.6.

There are two options to simulate from the model proposed in Cameletti et al. (2013). The first one is based on the joint distribution of the latent field and the second one is based on the conditional simulation at each time. This last option is easy to compute because each time point is simulated conditionally on the previous one, giving linear run time for long temporal simulations.

First, the time dimension is set to \(k=12\):

The locations of the points are the same as in dataset PRprec, but considered in a randomized order:

In the following simulation step we will use the book.rspde() function available in the file spde-book-functions.R. The \(k\) independent realizations of the spatial model can be generated as follows:

The number of space-time observations is the number of rows of x.k and it can be checked with:

Now, the autoregressive parameter \(\rho\) for the temporal effect is defined:

Next, temporal correlation is introduced:

Here, the \(\sqrt{1-\rho^2}\) term is added to make the process stationary in time, see Rue and Held (2005) and Cameletti et al. (2013). Figure 7.1 shows the realization of the space-time process.
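The role of the \(\sqrt{1-\rho^2}\) term can be checked with a short sketch in plain Python (an illustration only, not the R code of this example). It assumes the usual AR(1) recursion \(x_j = \rho x_{j-1} + \sqrt{1-\rho^2}\, w_j\), where \(w_j\) are the independent spatial realizations. The function names and the toy inputs are hypothetical.

```python
import math


def simulate_ar1_fields(innovations, rho):
    """Turn k independent spatial realizations into an AR(1)-correlated
    sequence; innovations is a list of k lists, one per time point."""
    scale = math.sqrt(1.0 - rho ** 2)
    fields = [list(innovations[0])]
    for j in range(1, len(innovations)):
        prev = fields[-1]
        fields.append([rho * p + scale * w
                       for p, w in zip(prev, innovations[j])])
    return fields


def marginal_variances(k, rho, sigma2=1.0):
    """Marginal variance at each time point implied by the recursion,
    when every innovation has variance sigma2."""
    v = [sigma2]
    for _ in range(1, k):
        v.append(rho ** 2 * v[-1] + (1.0 - rho ** 2) * sigma2)
    return v


variances = marginal_variances(12, 0.7)
fields = simulate_ar1_fields([[1.0, -2.0], [0.0, 0.0]], 0.7)
```

Because \(\rho^2 \sigma^2 + (1-\rho^2)\sigma^2 = \sigma^2\), every entry of `variances` equals the innovation variance, so the marginal variance of the field does not drift over time.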

Figure 7.1: Realization of the space-time random field.

In this example, a categorical covariate will be included in the model. We simulate a categorical covariate with three levels (labeled A, B and C):

The distribution of values of this categorical covariate is:

The regression coefficients for the three levels are:

The response variable will be computed by adding the fixed effect on the categorical covariate, the spatio-temporal random effect and some random white noise (with standard deviation 0.1):

The average values of the response for each level of the categorical covariate are:

To show that we can use different locations at different times, some of the observations will be dropped. In particular, only half of the simulated data will be kept. This can be done by creating an index for the selected observations, as follows:

These data are then put together in a data.frame:

In real applications there may be completely misaligned locations across different times. The code we provide in this example will work in that situation.

7.1.2 Data stack preparation

We use the PC-priors derived in Fuglstad et al. (2018) for the model parameters range and marginal standard deviation. These are set when defining the SPDE, as follows:

Now, additional data preparation is required to build the space-time model. The index set is made taking into account the number of mesh points in the SPDE model and the number of groups, as:

Note that the index set for the latent field does not depend on the data set locations. It only depends on the SPDE model size and on the time dimension. The projection matrix is defined using the coordinates of the observed data. To build the projector matrix, we pass the time index to the group argument of the inla.spde.make.A() function:
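The structure of such an index set can be sketched in plain Python (an illustration only, not R-INLA code). The helpers `make_index` and `latent_column` are hypothetical, and the sketch assumes that the mesh node index runs fastest within each time group, which matches the Kronecker ordering of a separable model.

```python
def make_index(n_spde, n_group):
    """1-based (node, group) indices for a field with n_spde mesh nodes
    replicated over n_group time points; node index runs fastest."""
    node = [i for _ in range(n_group) for i in range(1, n_spde + 1)]
    group = [g for g in range(1, n_group + 1) for _ in range(n_spde)]
    return node, group


def latent_column(node, group, n_spde):
    """0-based position in the latent vector of a 1-based (node, group) pair."""
    return (group - 1) * n_spde + (node - 1)


node, group = make_index(3, 2)
print(node)    # [1, 2, 3, 1, 2, 3]
print(group)   # [1, 1, 1, 2, 2, 2]
```

Only the mesh size and the number of time points enter this construction. The observation locations and their time indices are used later, when each row of the projector matrix places its weights in the block of columns that belongs to the observation's time point.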

The effects in the stack are a list with two elements: the first one is the index set and the second one is the categorical covariate. The stack data is defined as:

7.1.3 Fitting the model and some results

In this example, a PC-prior (see Section 1.6.5) is also used for the temporal autoregressive parameter, i.e., the autocorrelation parameter. In particular, this prior considers that \(P(cor>0)=0.9\) and it is defined as follows:

To deal with the categorical covariate we need to use expand.factor.strategy = 'inla' in the control.fixed argument list to get an intuitive result. Hence, model fitting is done as follows:

A summary of the three intercepts, together with the observed mean for each covariate level, is:

The posterior marginal distributions for the random field parameters and the marginal distribution for the temporal correlation are displayed in Figure 7.2.

Figure 7.2: Marginal posterior distribution for the precision of the Gaussian likelihood (top-left), the practical range (top-right), standard deviation of the field (bottom-left) and the temporal correlation (bottom-right). The red vertical lines are placed at the true values of the parameters.

7.1.4 A look at the posterior random field

The random field posterior distribution can be compared to the realized random field by means of the posterior mean, median, mode or any other quantile.

Before we get to this point, we need the index for the random field at the data locations:

The correlation between the simulated data response and the posterior mean of the predicted values can be computed as follows:

The correlation is almost one because the observation noise is small (standard deviation 0.1) compared with the variability of the simulated signal.

We now compute predictions for each time point. First, a grid is defined in the same way as in the rainfall example in Section 2.8:

Then, the prediction for each time can be done as follows:

Next, we subset to the points of the grid inside the boundaries of Paraná state, and set the points of the grid out of the Paraná border to NA:

We visualize the result in Figure 7.3.

Figure 7.3: Visualization of the posterior mean of the space-time random field. Time flows from top to bottom and left to right.

7.1.5 Validation

The inference results we just showed are based only on part of the simulated data. The other part of the simulated data can now be used for validation. Therefore, another data stack is required to compute posterior distributions for the validation data:

We compute a projection matrix and the data stack for the validation data:

Next, we join these two stacks together into a full stack and re-fit the model. We use the estimates of the hyperparameters obtained with the previous model to speed up computations:

Predicted values versus observed values have been plotted in Figure 7.4 to assess goodness of fit; they are in close agreement. The indices of the fitted values to be used when extracting the results from the inla object for plotting have been obtained with:

Figure 7.4: Validation: Observed values versus posterior means from the fitted model.

7.2 Continuous time domain

We now eliminate the assumption that the observations have been collected over discrete time points. This is the case for, e.g., fishing data, and space-time point processes in general. Similarly to how we use the Finite Element Method approach for space, we use a set of time knots to set up piecewise linear basis functions over time.
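The piecewise linear basis over time knots can be illustrated with a minimal sketch in plain Python (not R-INLA code). The function name `hat_weights` and the three knots used below are assumptions made for this illustration.

```python
def hat_weights(t, knots):
    """Weights of the piecewise linear (hat) basis functions at time t.

    knots must be sorted increasingly and t must lie in
    [knots[0], knots[-1]]. At most two weights are nonzero and they
    sum to one.
    """
    if not knots[0] <= t <= knots[-1]:
        raise ValueError("t lies outside the range of the knots")
    w = [0.0] * len(knots)
    for j in range(len(knots) - 1):
        left, right = knots[j], knots[j + 1]
        if left <= t <= right:
            frac = (t - left) / (right - left)
            w[j] = 1.0 - frac       # weight on the knot to the left
            w[j + 1] = frac         # weight on the knot to the right
            break
    return w


knots = [0.0, 0.5, 1.0]             # hypothetical time knots
print(hat_weights(0.25, knots))     # [0.5, 0.5, 0.0]
```

An observation taken at an arbitrary time is thus represented as a linear interpolation between the values of the process at the two neighbouring knots, in the same way that a spatial location is interpolated from the vertices of its mesh triangle.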

7.2.1 Data simulation

First, we set the spatial locations and sample time points from a continuous interval:

To sample from the current model, we define a space-time separable covariance function. We use a Matérn covariance in space and an exponentially decaying covariance function over time:

Function local.stcov() will be used to compute the covariance function at the simulated space-time points and to sample from the model:

7.2.2 Data stack preparation

To fit the space-time continuous model we must first define the time knots and the temporal mesh. For this, we define a one-dimensional mesh with 10 knots:

The knots in the resulting temporal mesh are the following:

We continue using the low resolution mesh for the border of Paraná state created in Section 2.6. This means that we can also re-use the SPDE model defined in the previous example.

The index set for the spatio-temporal model can be defined as:

The projection matrix considers both the spatial and temporal projection. Hence, it needs the spatial mesh and the spatial locations, the time points and the temporal mesh. These are passed to function inla.spde.make.A as follows:
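Conceptually, each row of a separable space-time projector combines the temporal interpolation weights with the spatial ones. The following plain Python sketch shows this tensor-product structure under the assumption that the spatial node index runs fastest within each time knot. It is not a description of the internals of inla.spde.make.A, and the function name and the weights are hypothetical.

```python
def space_time_row(w_time, w_space):
    """One row of a space-time projector as the Kronecker product of the
    temporal weights and the spatial weights (spatial index fastest)."""
    return [wt * ws for wt in w_time for ws in w_space]


w_time = [0.5, 0.5, 0.0]          # observation halfway between knots 1 and 2
w_space = [0.2, 0.3, 0.5, 0.0]    # barycentric weights on a 4-node toy mesh
row = space_time_row(w_time, w_space)
nonzero = sum(1 for v in row if v != 0.0)
print(len(row), nonzero)          # 12 6
```

With two active time knots and three active mesh vertices, the row has six nonzero entries, and the weights still sum to one.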

The effects in the data stack are a list with two elements: the index set for the spatial effect and the categorical covariate. The stack data is defined as:

7.2.3 Fitting the model and some results

An exponential correlation function is used for time, with \(\kappa\) as the inverse range parameter, so that two time points a distance \(\delta\) apart have correlation \(\exp(-\kappa\delta)\). It gives a correlation between adjacent time knots equal to:
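As a numerical illustration in plain Python (not the R code of this example), the sketch below evaluates \(\exp(-\kappa\delta)\) for equally spaced knots. The value \(\kappa = 5\) and the placement of 10 knots on the interval \([0, 1]\) are assumptions chosen only for this sketch.

```python
import math


def knot_correlation(kappa, knots):
    """Correlation between adjacent, equally spaced knots under the
    exponential correlation function exp(-kappa * distance)."""
    delta = knots[1] - knots[0]
    return math.exp(-kappa * delta)


kappa = 5.0                              # hypothetical inverse range
knots = [j / 9 for j in range(10)]       # 10 equally spaced knots on [0, 1]
rho = knot_correlation(kappa, knots)
print(round(rho, 4))                     # 0.5738
```

Since \(\exp(-\kappa \cdot 2\delta) = \rho^2\), the exponential correlation evaluated at equally spaced knots coincides with the correlation of an AR(\(1\)) process with parameter \(\rho\), which motivates using an AR(\(1\)) model over the knots.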

We fit the model using an AR(\(1\)) temporal correlation over the time knots as follows:

We summarize the posterior marginal distributions for the likelihood precision and the random field parameters:

The posterior marginal distributions of these parameters are shown in Figure 7.5, which includes the marginal distribution for the intercept, error precision, spatial range, standard deviation and temporal correlation in the space-time field.

Figure 7.5: Marginal posterior distribution for the intercept, likelihood precision and the parameters in the space-time process.

7.3 Lowering the resolution of a spatio-temporal model

Model fitting can be challenging when dealing with large data sets. In this section we show techniques for lowering the resolution of the representation of the space-time random effect, to make model fitting faster.

First, we build the spatial mesh and the SPDE model using the rainfall data in Paraná state with the following code:

7.3.1 Data temporal aggregation

In this subsection we set up the data that we will use for the example in the next subsection. The reader may treat the dataframe df as the original binomial dataset; the details of how it was constructed are not important for what follows.

The data we analyze are composed of 616 location points observed over 365 days in Paraná state (Brazil). The response variable is daily rainfall. The dimension of the data.frame with this dataset and the first 7 variables from the first two rows are:

In this example the aim is to analyze the probability of rain. Therefore we now convert this continuous dataset of rainfall amount into occurrence of rain. The response variable is whether rainfall was higher than \(0.1\) or not.

To reduce the size of the dataset, we will aggregate by summing over five consecutive days. We would model the original dataset with a Bernoulli likelihood, therefore the aggregated dataset is modeled with a binomial likelihood (because a sum of independent Bernoulli variables with a common probability is binomially distributed). There will be many 5-day blocks with fewer than 5 observations, because of missing values, and these will give binomials with fewer than 5 trials.

First, a new index is created to group the days in groups of five days:

The number of raining days is obtained with:

Next, the number of days with observed data in each group of 5 days, i.e. the trials in our binomial likelihood, is computed as:

Now, the aggregated data has 73 time points.

From the table above, it can be seen that there are 3563 periods of five days with no data recorded. A first approach to dealing with these missing values could be to remove such pairs of data, both \(y\) and \(n\). If they are not removed, the value NA has to be assigned to \(y\) when \(n=0\). However, \(n\) then needs to be assigned a positive value (e.g., five). This is done as follows:
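The aggregation steps described above can be sketched in plain Python for a single station (an illustration only, not the R code used for this analysis). The function names and the short rainfall series are hypothetical, and missing values are represented by `None`.

```python
def rain_occurrence(amount, threshold=0.1):
    """1 if rainfall exceeds the threshold, 0 otherwise, None if missing."""
    return None if amount is None else int(amount > threshold)


def aggregate_blocks(daily, block=5, fill_trials=5):
    """Aggregate daily 0/1/None indicators into binomial counts per block.

    Returns (y, n): y is the number of rainy days and n the number of
    observed days in each block. For a block without data, y is None
    and n is set to fill_trials so that n stays positive.
    """
    y, n = [], []
    for start in range(0, len(daily), block):
        obs = [d for d in daily[start:start + block] if d is not None]
        if obs:
            y.append(sum(obs))
            n.append(len(obs))
        else:
            y.append(None)
            n.append(fill_trials)
    return y, n


rainfall = [0.0, 2.3, None, 0.05, 7.1,
            None, None, None, None, None,
            0.4, 0.0, 0.0]
y, n = aggregate_blocks([rain_occurrence(r) for r in rainfall])
print(y, n)   # [2, None, 1] [4, 5, 3]
```

Applied to a full year, `aggregate_blocks` returns 73 blocks for 365 daily values, which matches the number of time points of the aggregated data.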

We set up all the variables in a dataframe:

7.3.2 Reducing the temporal resolution

This approach can be seen in the template code in Section 3.2 in Lindgren and Rue (2015) and has also been considered in the last example in Blangiardo and Cameletti (2015). The main idea is to place some knots over the time window and define the model at such knots. Then, the projection is defined from the time knots, similarly as is done for the spatial case with the mesh.

The knots are placed at every 6 time points of the temporally aggregated data, which has 73 time points altogether. So, in the end there are only 12 knots over time.

The model dimension is then 1152, i.e., the number of spatial mesh nodes multiplied by the 12 time knots.

Then, when the projection matrix is computed, it is necessary to consider the temporal mesh and the group index in the scale of the data to be analyzed. The projection matrix with the definition of the spatial and temporal meshes used above can be obtained as follows:

The index set and the data stack are built as usual:

Note that in the previous code the values of the altitude have been rescaled by dividing them by 1000. In general, we need to rescale the covariates to get stable numerical inference.

The formula is also the usual for a separable spatio-temporal model:

In order to reduce computational time in this example, a number of options will be set in the call to inla(). In particular, the adaptive approximation (strategy = 'adaptive') and the Empirical Bayes integration strategy over the hyperparameters (int.strategy = 'eb') will be used. These options are passed in the control.inla argument of inla(). Furthermore, we start the optimizer at the initial values init:

The fitted spatial effect can be plotted for each temporal knot, overlaid with the proportion of raining days computed from the data closest to each time knot. First, a grid to make the projection is required:

Next, the projection of the posterior mean fitted at each time knot is computed:

These projections are shown in Figure 7.6. We add the location points with point size proportional to the rain occurrence in each period.

Figure 7.6: Spatial effect at each time knot obtained with the spatio-temporal model fitted to the number of raining days in Paraná state (Brazil).

7.4 Conditional simulation: Combining two meshes

7.4.1 Motivation

There are a number of prediction problems that require modeling and prediction of the spatial, or spatio-temporal phenomenon over an extensive region, such as a country. However, in many situations, observations are only available in a limited part of the region. In this section we will discuss how to deal with this in a computationally efficient manner. Although practical computational issues are more common when dealing with spatio-temporal data, the example developed here focuses on spatial data to simplify the presentation. The same principle can be, however, easily extended to the spatio-temporal case.

We illustrate the case where a process is only observed in part of the study region by using locations from Paraná state in Brazil. We use the boundary of Paraná state and assume that we only have data from the western half of the state, while spatial prediction is desired for the entire state. Additionally, we consider fitting the model using a mesh around the available data and predicting using a mesh over the entire area of interest. The data have been obtained with the following code, and the resulting dataset is plotted in Figure 7.7.

Figure 7.7: Problem setting: available data over half of the domain.

One way to obtain such predictions is to fit the model using a mesh that contains both the locations where data are observed and the prediction locations (denoted by mesh2). Here we show a more efficient way, where the model is fitted using a mesh only around where the observations are placed (denoted by mesh1), and then conditional simulations are used to predict at the nodes of mesh2. This is achieved by taking advantage of numerical methods for sparse matrices applied to conditional simulation of GMRFs, and the result is a considerable speedup in the computations compared with fitting the model directly using mesh2. After conditioning, the predictions at the data locations will be exactly the same values obtained from the fit using mesh1. In the geostatistics literature, this can be achieved with conditioning by kriging. The basis of the conditioning approach is to use the same covariance function for both the unconditional simulations and the predictions.

This is equivalent to the problem of sampling from a GMRF under the linear constraint

\[\begin{align} \mathbf{Ax} = \mathbf{b}, \end{align}\]

where \(\mathbf{A}\) is an \(n_1 \times n_2\) matrix, with \(n_1\) and \(n_2\) being the number of nodes in mesh1 and mesh2, respectively. The vector \(\mathbf{b}\) is the vector of constraints of length \(n_1\) and corresponds to the predicted latent field from the fit using mesh1.

A way to obtain the correct conditional distribution of \(\mathbf{x}^{*}\) is to sample from the unconstrained GMRF \(\mathbf{x} \sim \textrm{N}(\boldsymbol{\mu}, \mathbf{Q}^{-1})\) and then compute

\[\begin{align} \tag{7.1} \mathbf{x}^{*} = \mathbf{x} - \mathbf{Q}^{-1}\mathbf{A}^{T}(\mathbf{AQ}^{-1}\mathbf{A}^{T})^{-1}(\mathbf{Ax}-\mathbf{b}), \end{align}\]

where \(\mathbf{Q}\) is an \(n_2 \times n_2\) precision matrix and \(\mathbf{AQ}^{-1}\mathbf{A}^{T}\) is a dense matrix with dimensions equal to the number of constraints, that is, \(n_1 \times n_1\). To factorize \(\mathbf{AQ}^{-1}\mathbf{A}^{T}\), we can take advantage of the sparse structure of \(\mathbf{Q}\) and obtain fast computations for \(n_1 \ll n_2\).
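Equation (7.1) can be verified on a toy problem with the following plain Python sketch. It is a dense illustration only: it works directly with a small covariance matrix \(\boldsymbol{\Sigma} = \mathbf{Q}^{-1}\) and a simple Gaussian elimination, instead of the sparse Cholesky factorization of \(\mathbf{Q}\) that makes the method fast in practice. All function names and numerical values are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(M, r):
    """Solve M z = r by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [r[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def condition_by_kriging(x, Sigma, A, b):
    """x* = x - Sigma A^T (A Sigma A^T)^{-1} (A x - b), with Sigma = Q^{-1}."""
    SAt = matmul(Sigma, transpose(A))      # n2 x n1
    ASAt = matmul(A, SAt)                  # n1 x n1, dense
    resid = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    correction = matvec(SAt, solve(ASAt, resid))
    return [xi - ci for xi, ci in zip(x, correction)]


Sigma = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
A = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]     # constrain the first and third node
x = [0.3, -0.2, 1.1]                       # an unconstrained sample (fixed here)
b = [1.0, 0.0]                             # values to be matched
x_star = condition_by_kriging(x, Sigma, A, b)
```

After the correction, \(\mathbf{Ax}^{*} = \mathbf{b}\) holds up to rounding error, while the unconstrained node is shifted according to its covariance with the constrained ones.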

7.4.2 Paraná state example

At each location \(i\), we suppose the following distribution for the data \(y_i\):

\[\begin{align} y_i \sim \textrm{N}(\eta_i, \sigma_{\epsilon}^2), \end{align}\]

with \(\sigma_{\epsilon}^2\) being the variance of the iid Gaussian noise and \(\eta_i\) being the linear predictor, defined as:

\[\begin{align} \eta_i = \beta_0 + u_i, \end{align}\]

where \(\beta_0\) is the intercept and \(u_i\) is a realization of a spatial random Gaussian field with Matérn covariance at the data locations \(i\).

We use simulated data to show how to obtain predictions in mesh2 after fitting the model using mesh1. To illustrate the ability of our method to predict using a different mesh, we assume that the data comes from the random field based on mesh2, namely \(u_i\).

We build mesh1 considering only the data from the western half of Paraná state:

Next, we create mesh2 such that all the nodes from mesh1 are also nodes of mesh2. In addition, we use the border of the Paraná state to define the high resolution interior of the mesh2. In order to implement both these restrictions we first create an auxiliary mesh mesh2a considering the border of the Paraná state. Then we use the locations from mesh1 and from the auxiliary mesh to create mesh2.

As a result, mesh1 has 379 nodes and mesh2 has 1477 nodes. We can see these two meshes in Figure 7.8.

Figure 7.8: The first mesh, together with the data locations in blue (left). Mesh mesh2, which will be used for predictions, and the points of the first mesh represented as red points (right). The inner blue polygon shows the Paraná state border.

To simulate data, we need to fix the range and standard deviation and then define the SPDE models for both meshes as follows:

The precision matrices for both SPDE models are built with:

Simulation of the random field at the nodes of mesh2 can be performed as follows:

We complete the data simulation by projecting the mesh nodes into the observed data points and adding an iid noise. We also build the projection matrix for mesh1, which will be used for fitting the model:

We now sample the spatial field and iid Gaussian noise at the observation locations:

The stack data includes the intercept and the SPDE model defined at mesh1:

7.4.4 Obtaining predictions

Before proceeding to the actual prediction, we need to sample from the posterior distribution using the fitted model. We draw 100 samples from the posterior considering the internal parametrization for the hyperparameters with:

We can find the indices for the spatial random effect \(i\) in the following way:

For each sample from the posterior distribution, the following code produces predictions of the latent field \(\mathbf{u}\) at the nodes of mesh2, constrained so that the values at the nodes of mesh1 are equal to the values of the latent field in the posterior samples generated from fitting the model with mesh1. This code is based on Equation (7.1), but with additional complications to achieve greater computational speed:

Notice that the code above computes the Cholesky factorization of the precision matrix \(\mathbf{Q}\) rapidly by taking the sparsity of this matrix into account. This makes it possible to obtain fast predictions in a large number of locations. Another possibility is to fit the model directly using mesh2 instead of mesh1. However, this would require a lot more computational time and results would be similar to the procedure shown here.

To produce maps of the predicted random field in a fine grid, we compute a projection matrix for a grid of points over a square that contains the locations of the border of the Paraná state.

Then, we can obtain the projection of the simulated random field and compare with the projection of the posterior mean of the predictions. Missing values are assigned to the grid points that are outside of the Paraná state:

Figure 7.9: The simulated field (top), the estimated posterior mean (middle) and the posterior marginal standard deviation (bottom).

References

Cameletti, M., F. Lindgren, D. Simpson, and H. Rue. 2013. “Spatio-Temporal Modeling of Particulate Matter Concentration Through the SPDE Approach.” Advances in Statistical Analysis 97 (2): 109–31.

Rue, H., and L. Held. 2005. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics & Applied Probability. Boca Raton, FL: Chapman & Hall.

Fuglstad, G-A., D. Simpson, F. Lindgren, and H. Rue. 2018. “Constructing Priors That Penalize the Complexity of Gaussian Random Fields.” Journal of the American Statistical Association to appear. Taylor & Francis. https://doi.org/10.1080/01621459.2017.1415907.

Lindgren, F., and H. Rue. 2015. “Bayesian Spatial and Spatio-Temporal Modelling with R-INLA.” Journal of Statistical Software 63 (19).

Blangiardo, M., and M. Cameletti. 2015. Spatial and SpatioTemporal Bayesian Models with R-INLA. Chichester, UK: John Wiley & Sons, Ltd.