Chapter 8 Space-time applications

In this chapter we generalize some of the examples presented in the book so far to space-time. In particular, we consider space-time coregionalization models, dynamic regression models, space-time point processes, and space-time Hurdle models.

8.1 Space-time coregionalization model

In this section we generalize the coregionalization model found in Section 3.1 to a space-time model. This model is very similar, but much more computationally demanding. Because of this, we use a cruder mesh in our example than what we would usually recommend.

8.1.1 Model and parametrization

The model is similar to the spatial model:

\[ y_1(s,t) = \alpha_1 + z_1(s,t) + e_1(s,t) \]

\[ y_2(s,t) = \alpha_2 + \lambda_1 z_1(s,t) + z_2(s,t) + e_2(s,t) \]

\[ y_3(s,t) = \alpha_3 + \lambda_2 z_1(s,t) + \lambda_3 z_2(s,t) + z_3(s,t) + e_3(s,t) \]

Here, \(z_{k}(s,t)\) are space-time effects and \(e_{k}(s,t)\) are uncorrelated error terms, \(k=1,2,3\).

8.1.2 Data simulation

First of all, these are the values of the parameters that will be used to simulate the data:

We use the same spatial and temporal locations for all response variables. Note that in Section 3.1 we use different spatial locations, and that it is also possible to use different time points (when using a temporal mesh).

The locations are simulated as follows:

Then, the book.rMatern() function defined in Section 2.1.4 will be used to simulate independent random field realizations for each time:

The temporal dependency is modeled as an autoregressive first order process, the same as was used in Chapter 7.

We use the constants \(\sqrt{(1-\rho_j^2)},\ j=1,2,3\) to ensure that the samples are taken from the stationary distribution.
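The effect of this scaling constant can be checked numerically. The following is a minimal sketch in base R, not the code used for the example: it replaces the Matérn realizations with independent standard normal draws (an assumption made only to keep the sketch self-contained), and the object names and sizes are illustrative.

```r
set.seed(1)
n <- 5000   # number of locations (illustrative)
k <- 6      # number of time points (illustrative)
rho <- 0.7  # temporal correlation of the first field

# independent innovations with unit variance, one column per time point;
# these stand in for the independent random field realizations
x0 <- matrix(rnorm(n * k), n, k)

# AR(1) over time; sqrt(1 - rho^2) keeps the marginal variance equal to 1
x <- x0
for (j in 2:k)
  x[, j] <- rho * x[, j - 1] + sqrt(1 - rho^2) * x0[, j]

round(apply(x, 2, var), 2)     # close to 1 at every time point
round(cor(x[, 1], x[, 2]), 2)  # close to rho
```

Without the scaling constant, the variance would grow over the first time points until it reached the stationary value \(1/(1-\rho^2)\).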

Then the response variables are sampled:
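As a hedged illustration of this step, the sketch below builds the three responses from the equations in Section 8.1.1, using the values in the "True" column of Table 8.1. The three fields are replaced by independent normal draws (an assumption for illustration only), and all object names are hypothetical.

```r
set.seed(2)
n <- 50; k <- 4                # illustrative numbers of locations and times
alpha  <- c(-5, 3, 10)         # intercepts
lambda <- c(0.7, 0.5, -0.5)    # lambda_1, lambda_2, lambda_3
e.sd   <- c(0.3, 0.2, 0.15)    # error st. dev. (precisions 11.11, 25, 44.44)

# stand-ins for the three space-time fields, stacked over time
z1 <- rnorm(n * k); z2 <- rnorm(n * k); z3 <- rnorm(n * k)

# each response adds its own field to scaled copies of the previous ones
y1 <- alpha[1] + z1 + rnorm(n * k, 0, e.sd[1])
y2 <- alpha[2] + lambda[1] * z1 + z2 + rnorm(n * k, 0, e.sd[2])
y3 <- alpha[3] + lambda[2] * z1 + lambda[3] * z2 + z3 +
  rnorm(n * k, 0, e.sd[3])
```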

8.1.3 Model fitting

We define a crude mesh to save computational time:

As in previous examples, the SPDE model uses the PC-priors derived in Fuglstad et al. (2018) for the range, \(\sqrt{8\nu}/\kappa\), and the marginal standard deviation. These are set when defining the SPDE latent effect:

Indices for the space-time fields and for the copies need to be defined as well. As the same mesh is considered in all effects, these indices are the same for all the effects:

The prior on \(\rho_j\) is chosen as a Penalized Complexity prior (Simpson et al. 2017) as well:

The prior above is chosen so that \(P(\rho_j > 0)=0.9\).

Priors for each of the copy parameters are Gaussian with zero mean and precision 10:
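In R-INLA, priors of this kind are passed as nested lists. The following sketch shows how the two priors described above could be written, assuming the R-INLA prior names 'pccor1' and 'normal'; the object names are illustrative.

```r
# PC-prior on the AR(1) correlation: P(rho > 0) = 0.9
rho1p <- list(theta = list(prior = 'pccor1', param = c(0, 0.9)))

# definition of the temporal (group) model carrying that prior
ctr.g <- list(model = 'ar1', hyper = rho1p)

# Gaussian prior with zero mean and precision 10 for a copy parameter
hc1 <- list(theta = list(prior = 'normal', param = c(0, 10)))
```

Lists like these would typically be supplied through the control.group and hyper arguments of f().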

The formula, which includes all the terms in the model and the priors previously defined, is:

The projector matrix is defined as:

Note that in this example the projector matrices (the \(\mathbf{A}\)-matrix) are all equal for the different time points because all points have the same coordinates at different times, but the projector matrix can be different when observations at different times are at different locations.

Then data are organized in three data stacks, which are joined:

Another PC-prior is considered for the precision of the errors (Simpson et al. 2017) in the three likelihoods in the model:

This model has 15 hyperparameters. To make the optimization process fast, the parameter values used in the simulation will be used as the initial values:
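Initial values must be given on the internal scale of each hyperparameter. The sketch below transforms the values in the "True" column of Table 8.1, assuming the usual R-INLA internal parametrizations: log precision, log range, log standard deviation, \(\log((1+\rho)/(1-\rho))\) for the AR(1) correlation, and the identity for the copy parameters. The order in which they are passed must follow the order of the hyperparameters in the fitted model, which is not shown here.

```r
# values used in the simulation
e.prec <- c(11.11, 25, 44.44)   # error precisions
rho    <- c(0.7, 0.8, 0.9)      # temporal correlations
range  <- c(0.2, 0.3, 0.4)      # spatial ranges
sigma  <- c(0.5, 0.6, 0.7)      # marginal standard deviations
lambda <- c(0.7, 0.5, -0.5)     # copy parameters

# internal (unconstrained) scale
theta.prec  <- log(e.prec)
theta.range <- log(range)
theta.sigma <- log(sigma)
theta.rho   <- log((1 + rho) / (1 - rho))
theta.copy  <- lambda

# total number of hyperparameters
length(c(theta.prec, theta.range, theta.sigma, theta.rho, theta.copy))
```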

Then, the model is fitted with:

Computation time for this model in seconds is:

##     Pre Running    Post   Total 
##  5.3354 52.4424  0.2659 58.0437

Table 8.1 summarizes the posterior marginal distributions of the parameters in the model. These include the intercepts, precisions of the errors, temporal correlations, copy parameters, and range and standard deviations of the random fields.

Table 8.1: Summary of the posterior distributions of the parameters in the model.
Parameter         True    Mean      St. Dev.  2.5% quant.  97.5% quant.
intercept1        -5.00   -5.0261   0.1344    -5.2901      -4.7624
intercept2        3.00    3.1673    0.2071    2.7607       3.5736
intercept3        10.00   9.7655    0.2802    9.2154       10.3152
e1                11.11   16.0295   2.3083    11.9426      21.0135
e2                25.00   14.6702   2.4464    10.4295      20.0336
e3                44.44   15.2115   2.2840    11.1578      20.1285
GroupRho for s1   0.70    0.8737    0.0438    0.7707       0.9411
GroupRho for s2   0.80    0.9040    0.0355    0.8192       0.9569
GroupRho for s3   0.90    0.9829    0.0101    0.9577       0.9961
Beta for s12      0.70    0.6629    0.1294    0.4100       0.9184
Beta for s13      0.50    0.5157    0.1214    0.2790       0.7557
Beta for s23      -0.50   -0.5119   0.1357    -0.7823      -0.2479
Range for s1      0.20    0.1515    0.0401    0.0849       0.2416
Range for s2      0.30    0.2462    0.0622    0.1441       0.3872
Range for s3      0.40    0.2389    0.0612    0.1409       0.3802
Stdev for s1      0.50    0.7366    0.1246    0.5285       1.0161
Stdev for s2      0.60    0.6778    0.0918    0.5168       0.8767
Stdev for s3      0.70    0.9177    0.1224    0.7035       1.1832

The posterior mean of each random field is projected to the observation locations and plotted against the corresponding simulated field in Figure 8.1.

Figure 8.1: True and fitted random field values.

Remember that the crude mesh leads to a crude approximation for the spatial covariance. This is not recommended when fitting a model in practice. However, this setting can be considered to obtain initial results, and for illustrative code examples. In this particular case, it seems that the method provided reasonable estimates of the model parameters.

8.2 Dynamic regression example

There is a large literature on dynamic models, which includes books such as West and Harrison (1997) and Petris, Petrone, and Campagnoli (2009). These models define a hierarchical framework for a class of time series models. A particular case is the dynamic regression model, where the regression coefficients are modeled as time series, that is, the regression coefficients vary smoothly over time.

8.2.1 Dynamic space-time regression

The specific class of models for spatially structured time series was proposed in Gelfand et al. (2003), where the regression coefficients vary smoothly over time and space. For the areal data case, the use of proper Gaussian Markov random fields (PGMRF) over space has been proposed in Vivar and Ferreira (2009). There exists a particular class of such models called "spatially varying coefficient models", in which the regression coefficients vary over space. See, for example, Assunção, Gamerman, and Assunção (1999), Assunção, Potter, and Cavenaghi (2002) and Gamerman, Moreira, and Rue (2003).

In Gelfand et al. (2003), the Gibbs sampler was used for inference and it was claimed that a better algorithm is needed due to strong autocorrelations. In Vivar and Ferreira (2009), the use of forward information filtering and backward sampling (FIFBS) recursions was proposed. Both MCMC algorithms are computationally expensive.

The FIFBS algorithm can be avoided because of the relation between the Kalman filter and the Cholesky factorization established in Knorr-Held and Rue (2002). The Cholesky factorization is more general and performs better when using sparse matrix methods (Rue and Held 2005, p. 149). Additionally, the restriction that the prior for the latent field has to be proper can be avoided.

When the likelihood is Gaussian, no approximation is needed in the inference process, since the distribution of the latent field given the data and the hyperparameters is Gaussian. So, the main task is to perform inference for the hyperparameters in the model. For this, the mode and the curvature around it can be found without any sampling method. For the class of models in Vivar and Ferreira (2009) it is natural to use INLA, as shown in Ruiz-Cárdenas, Krainski, and Rue (2012), and for the models in Gelfand et al. (2003), the SPDE approach can be used when considering the Matérn covariance for the spatial part.

In this example, it will be shown how to fit the space-time dynamic regression model as discussed in Gelfand et al. (2003), considering the Matérn spatial covariance and the AR(1) model for time, which corresponds to the exponential correlation function. This particular covariance choice corresponds to the model in Cameletti et al. (2013), where only the intercept is dynamic. Here, the considered case is that of a dynamic intercept and a dynamic regression coefficient for a harmonic over time.

8.2.2 Simulation from the model

In order to simulate some data to fit the model, the spatial locations are sampled first, as follows:

To sample from a random field at a set of locations, the book.rMatern() function defined in Section 2.1.4 will be used to simulate independent random field realizations for each time.

\(k\) (number of time points) samples will be drawn from the random field. Then, they are temporally correlated considering the time autoregression:

Here, the \((1-\rho_j^2)\) term appears because it is part of the parametrization of the AR(\(1\)) model in INLA.

To get the response, the harmonic is defined as a function over time, and then the mean and the error terms are added up:
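A minimal sketch of this step follows, using the true values reported in Table 8.2. The sine harmonic, the normal stand-ins for the two dynamic coefficients, the sizes and the object names are all assumptions made for illustration.

```r
set.seed(3)
n <- 100; k <- 10               # illustrative numbers of locations and times
mu.beta <- c(-5, 1)             # means of the two dynamic coefficients
e.sd <- sqrt(1 / 20)            # error st. dev. for a precision of 20

tt <- rep(1:k, each = n)        # time index, locations running fastest
hh <- sin(2 * pi * tt / k)      # hypothetical harmonic over time

# stand-ins for the two space-time fields evaluated at the data
beta0 <- rnorm(n * k, 0, 0.5)
beta1 <- rnorm(n * k, 0, 0.5)

# dynamic intercept + dynamic coefficient times harmonic + error
y <- (mu.beta[1] + beta0) + (mu.beta[2] + beta1) * hh +
  rnorm(n * k, 0, e.sd)
```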

8.2.3 Fitting the model

There are two space-time terms in the model, each one with three hyperparameters: precision, spatial scale and temporal scale (or temporal correlation). So, considering the likelihood precision, there are \(7\) hyperparameters in total. To perform fast inference, a crude mesh with a small number of vertices is chosen:

This mesh has 195 points.

As in previous examples, the SPDE model uses the PC-priors derived in Fuglstad et al. (2018) for the practical range, \(\sqrt{8\nu}/\kappa\), and the marginal standard deviation:

A different index is needed for each call to the f() function, even if they are the same, so:

In the SPDE approach, the space-time model is defined at a set of mesh nodes. As continuous time is being considered, it is also defined on a set of time knots. So, it is necessary to deal with the projection from the model domain (nodes, knots) to the space-time data locations. For the intercept, this is done in the same way as in previous examples. For the regression coefficients, all that is required is to multiply each column of the projector matrix, element by element, by the covariate vector. This can be seen from the following structure of the linear predictor \(\boldsymbol{\eta}\):

\[ \begin{array}{rcl} \boldsymbol{\eta} & = & \mu_{\beta_0} + \mu_{\beta_1}\mathbf{h} + \mathbf{A} \boldsymbol{\beta}_0 + (\mathbf{A} \boldsymbol{\beta}_1) \mathbf{h} \\ & = & \mu_{\beta_0} + \mu_{\beta_1}\mathbf{h} + \mathbf{A} \boldsymbol{\beta}_0 + (\mathbf{A} \oplus (\mathbf{h}\mathbf{1}^{\top}))\boldsymbol{\beta}_1 \end{array} \tag{8.1} \]

Here, \((\mathbf{A} \boldsymbol{\beta}_1) \mathbf{h}\) is an element-wise product, and \(\mathbf{A} \oplus (\mathbf{h} \mathbf{1}^{\top})\) denotes the row-wise Kronecker product between \(\mathbf{A}\) and the vector \(\mathbf{h}\) (with length equal to the number of rows in \(\mathbf{A}\)), that is, row \(i\) of \(\mathbf{A}\) multiplied by \(h_i\). This operation can be performed using the inla.row.kron() function and is done internally in the function inla.spde.make.A() when supplying a vector in the weights argument.
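The equality of the two lines in Equation (8.1) can be checked on a small example. The sketch below uses base R only, with a dense random matrix standing in for the projector matrix; all names and sizes are illustrative.

```r
set.seed(4)
n <- 5; m <- 3
A <- matrix(runif(n * m), n, m)  # stand-in for a projector matrix
h <- (1:n) / n                   # covariate, one value per row of A
beta1 <- rnorm(m)                # coefficients at the mesh nodes

# multiplying a matrix by a vector of length nrow(A) scales row i by h[i]
Ah <- A * h

lhs <- drop(A %*% beta1) * h     # project first, then multiply by h
rhs <- drop(Ah %*% beta1)        # weighted projector applied to beta1
all.equal(lhs, rhs)              # TRUE
```

Folding the covariate into the projector matrix keeps the linear predictor linear in the latent field, which is what allows the term to be passed to the model as an ordinary projected effect.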

The space-time projector matrix \(\mb{A}\) is defined as follows:

The data stack is as follows:

Here, i0 is similar to i1, and the variables mu1 and h in the second element of the effects data.frame are for the fixed effects \(\mu_{\beta_0}\) and \(\mu_{\beta_1}\).

The formula considered in this model takes the following effects into account:

As the model considers a Gaussian likelihood, there is no approximation in the fitting process. The first step of the INLA algorithm is the optimization to find the mode of the \(7\) hyperparameters in the model. By choosing good starting values, fewer iterations will be needed in this optimization process. Below, starting values are defined for the hyperparameters in the internal scale considering the values used to simulate the data:

The integration step when using the CCD strategy will integrate over 79 hyperparameter configurations, as we have \(7\) hyperparameters. For complex models, model fitting may take a few minutes. A bigger tolerance value in control.inla can be set to reduce the number of posterior evaluations, which will also reduce computational time. However, in the following inla() call we avoid the integration by using an Empirical Bayes strategy.

Finally, model fitting considering the initial values defined above will be done as follows:

The time required to fit this model has been:

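Summaries of the posterior marginals of the two fixed effects, labeled \(\mu_{\beta_1}\) (mean of the dynamic intercept) and \(\mu_{\beta_2}\) (mean of the dynamic coefficient of the harmonic) in the table, and of the likelihood precision (i.e., \(1/\sigma^2_e\)) are given in Table 8.2.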

Table 8.2: Summary of the posterior distributions of the parameters in the model.
Parameter True Mean St. Dev. 2.5% quant. 97.5% quant.
\(\mu_{\beta_1}\) -5 -4.7789 0.2022 -5.1759 -4.382
\(\mu_{\beta_2}\) 1 0.9303 0.0587 0.8151 1.046
\(1/\sigma^2_e\) 20 10.8342 0.4940 9.9004 11.841

The posterior marginal distributions for the range, standard deviation and autocorrelation parameter for each spatio-temporal process are in Figure 8.2.

Figure 8.2: Posterior marginal distributions for the hyperparameters of the space-time fields. Red lines represent the true values of the parameters.

In order to look deeper into the posterior means of the dynamic coefficients, the correlation between the simulated values and the corresponding posterior means has been computed:

8.3 Space-time point process: Burkitt example

In this section a model for space-time point processes is developed and applied to a real dataset.

8.3.1 The dataset

The model developed in this section will be applied to the burkitt dataset from the splancs package (Rowlingson and Diggle 1993). This dataset records cases of Burkitt’s lymphoma in the Western Nile district of Uganda during the period 1960-1975 (see Bailey and Gatrell 1995, Chapter 3). It contains the five columns described in Table 8.3.

Table 8.3: Description of the burkitt dataset, which records cases of Burkitt’s lymphoma in Uganda.
Variable Description
x Easting
y Northing
t Day of onset, counted from 1/1/1960
age Age of child patient
dates Day, as string yy-mm-dd

This dataset can be loaded as follows:

The spatial coordinates and time values can be summarized as follows:

A set of knots over time needs to be defined in order to fit an SPDE spatio-temporal model. It is then used to build a temporal mesh, as follows:

Figure 8.3 shows the temporal mesh as well as the times at which the events occurred.

Figure 8.3: Time when each event occurred (black) and knots used for inference (blue).

The spatial mesh can be created using the polygon of the region as a boundary. The domain polygon can be converted into a SpatialPolygons class with:

This boundary is then used to compute the mesh:

Again, the SPDE model is defined to use the PC-priors derived in Fuglstad et al. (2018) for the range and the marginal standard deviation. These are defined now:

The spatio-temporal projection matrix is built from both the spatial and temporal locations and both the spatial and temporal meshes, as follows:

The dimension of the resulting projector matrix is:

Internally, the inla.spde.make.A() function computes a row-wise Kronecker product (see the manual page of function inla.row.kron()) between the spatial projector matrix and the group projector matrix (the temporal dimension, in our case). The number of columns of the resulting matrix is equal to the number of nodes in the mesh times the number of groups.

The index set is made considering the group feature:

The data stack can be built following the ideas for the purely spatial model. So, it is necessary to consider the expected number of cases at both the integration points and the data locations. For the integration points, it is the space-time volume computed for each mesh node and time knot, that is, the spatial area of the dual mesh polygons, as in Chapter 4, times the length of the time window at each time knot. For the data locations, it is zero, because a single point has zero volume, as in the likelihood approximation proposed by Simpson et al. (2016).

The dual mesh is extracted considering function book.mesh.dual(), available in file spde-book-functions.R, as follows:

Then, the intersection of the domain with each polygon from the dual mesh is computed using function gIntersection(), from the rgeos package, as:

The sum of all the weights is equal to \(1.1035\times 10^{4}\). This is the same as the domain area:

The spatio-temporal volume is the product of these values and the time window length of each time knot. It is computed here:
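The computation can be sketched in base R as follows. The spatial weights and the time knots below are made-up values, and the sketch assumes that each knot is assigned half of each adjacent time interval, as happens with piecewise-linear basis functions over time; the object names are illustrative.

```r
# hypothetical spatial weights (areas of the dual mesh polygons
# inside the domain) and hypothetical equally spaced time knots
w <- c(2.0, 3.5, 1.5, 3.0)
knots <- seq(0, 12, length.out = 5)
k <- length(knots)

# window of each knot: half the distance to each neighbouring knot, so
# the two boundary knots get half the width of the interior ones
mid <- (knots[-1] + knots[-k]) / 2
t.width <- diff(c(knots[1], mid, knots[k]))

# one volume per (mesh node, time knot) pair, node index running fastest
st.vol <- rep(w, times = k) * rep(t.width, each = length(w))

sum(st.vol)                   # total space-time volume
sum(w) * diff(range(knots))   # domain area times time length
```

The two sums agree, which is a useful sanity check before the volumes are passed to the model as the expected number of cases.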

The data stack is built using the following lines of R code:

Finally, model fitting will be done using the cruder Gaussian approximation:

The exponential of the intercept plus the random effect at each space-time integration point is the relative risk at each of these points. This relative risk times the space-time volume will give the expected number of points (E(n)) at each one of these space-time locations. Summing over them will give a value that approaches the number of observations:

The posterior marginal distributions for the intercept and the other parameters in the model have been plotted in Figure 8.4.

Figure 8.4: Intercept and random field parameters posterior marginal distributions.

The projection over a grid for each time knot can be computed as:

The fitted latent field at each time knot has been displayed in Figure 8.5. A similar plot could be produced for the standard deviation.

Figure 8.5: Fitted latent field at each time knot overlaid with the points closest in time.

8.4 Large point process dataset

In this section an approach to fit a spatio-temporal log-Gaussian Cox point process model for a large dataset is shown using a simulated dataset.

8.4.1 Simulated dataset

The dataset will be simulated by drawing samples from a separable space-time intensity function. We assume that the logarithm of the intensity function is a Gaussian process. This space-time point process can be sampled in two steps. First, a sample from a separable space-time Gaussian process is drawn. Second, the point process is sampled conditional on this realization.

The separable space-time covariance assumed considers a Matérn covariance for space and the Exponential for time. In this case the temporal correlation at lag \(\delta t\) is \(\textrm{e}^{-\theta \delta t}\). Considering this continuous process sampled at equally spaced intervals \(t_1, t_2, ...\), with \(t_2-t_1=\delta t\), then the correlation can be expressed as \(\rho=\textrm{e}^{-\theta \delta t}\). If \(\delta t=1\) we have \(\rho=\textrm{e}^{-\theta}\). This establishes a link with the first order autoregression that we will consider in the fitting process, where \(\rho\) is the lag one correlation parameter.

The sample is drawn using the lgcp package (Taylor et al. 2013). We have to specify the parameters for the Gaussian process. These are the marginal standard deviation \(\sigma\), the spatial correlation parameter \(\phi\) (which gives us \(\sqrt{\phi}\) as the spatial range in our parametrization) and the temporal correlation parameter \(\theta=-\log(\rho)\). These parameters are passed to the lgcpSim() function considering the lgcppars() function.

There are two additional parameters for the lgcpSim() function, which are related to the mean of the Gaussian latent process: the intercept \(\mu\), and \(\beta\), which is used when covariates are included. We can increase \(\mu\) in order to increase the intensity function and hence the number of points in the sample. The expected number of points in the sample depends on the mean and the variance of the latent field, the size of the spatial domain and the length of the time window as \(\textrm{E}(N) = \exp(\mu + \sigma^2/2) \times V\), where \(V\) is the area of the spatial domain times the time length.
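The two relations above can be used to pick parameter values before simulating. The sketch below uses the domain area (50.2655) and time length (12) reported later in this section. The standard deviation, the correlation and the target number of points are hypothetical values chosen for illustration, not those used in the example.

```r
area  <- 50.2655     # area of the spatial domain
t.len <- 12          # length of the time window
V <- area * t.len    # space-time volume

sigma <- 1           # hypothetical marginal standard deviation
rho   <- 0.7         # hypothetical lag-one temporal correlation
theta <- -log(rho)   # temporal parameter for unit time steps

# intercept giving a chosen expected number of points
N.target <- 5000
mu <- log(N.target / V) - sigma^2 / 2

exp(mu + sigma^2 / 2) * V   # recovers N.target
exp(-theta)                 # recovers rho
```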

First, the spatial domain is defined as follows:

Then, it is converted into an object of the SpatialPolygons class:

The area can be computed as:

We can now define the model parameters:

Then we use the lgcpSim() function to sample the points:

In the previous code we have used the require() function to check whether the lgcp package can be loaded. The lgcp package depends on the rpanel package (Bowman et al. 2010), which in turn depends on the TCL/TK widget library BWidget. This is a system dependency, which cannot be installed from R and may not be available on all systems by default. If the BWidget library is not installed locally, the lgcp package will fail to install and the code above cannot be run, but the simulated data can be downloaded from the book website in order to run the examples below.

In order to fit the model, a discretization over space and over time needs to be defined. For the temporal domain, a temporal mesh based on a number of time knots will be used:

To keep computations fast, we lower the mesh resolution. However, the resolution has to be tuned to the range of the spatial process, because a mesh that is too coarse may not be able to represent the Matérn field. One way to deal with this is to start with a coarse mesh, look at the estimated range, and then refine the mesh if necessary. It is better to avoid having the spatial range smaller than the mesh edge length.

The spatial mesh is defined using the domain polygon:

Figure 8.6 shows a plot of a sample of the data over time and the time knots, as well as a plot of the data over space and the spatial mesh.

Figure 8.6: Time for a sample of the events (black), time knots (blue) in the upper plot. Spatial locations of another sample on the spatial domain (bottom plot).

8.4.2 Space-time aggregation

For large datasets it can be computationally demanding to fit the model. The problem is that the dimension of the model will be \(n + m * k\), where \(n\) is the number of data points, \(m\) is the number of nodes in the mesh and \(k\) is the number of time knots. In this section the approach chosen to deal with the model is to aggregate the data in a way that the problem is reduced to one of dimension \(2*m*k\). So, this approach really makes sense when \(n\gg m*k\).

Data will be aggregated according to the integration points to make the fitting process easier. Dual mesh polygons will also be considered, as shown in Chapter 4.

So, the first step is to find the Voronoi polygons for the mesh nodes:

Then, these are converted into a SpatialPolygons object, as follows:

The next step is to find to which polygon each data point belongs:

Similarly, it is necessary to find to which part of the time mesh each data point belongs:

The distribution of data points on the time knots is summarized here:

Then, both identification index sets are used to aggregate the data:
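A sketch of the aggregation in base R follows. The polygon index of each event is drawn at random here, as a stand-in for the result of the point-in-polygon step, and the knots, sizes and object names are assumptions for illustration.

```r
set.seed(5)
m <- 4                                # number of mesh nodes (polygons)
knots <- seq(0, 12, length.out = 5)   # hypothetical time knots
k <- length(knots)
n <- 200                              # number of events

tt <- runif(n, 0, 12)                 # event times
ip <- sample(1:m, n, replace = TRUE)  # stand-in for the polygon index

# index of the nearest time knot for each event
mid <- (knots[-1] + knots[-k]) / 2
it <- findInterval(tt, mid) + 1

# counts for every (polygon, knot) pair, keeping the empty cells
agg <- as.data.frame(table(area = factor(ip, levels = 1:m),
                           time = factor(it, levels = 1:k)))
nrow(agg)       # m * k rows
sum(agg$Freq)   # all n events are accounted for
```

Using factors with explicit levels matters here: cells with no events must appear with a count of zero, since they still contribute to the likelihood.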

The resulting data.frame contains the area, time span and frequency of the aggregated data:

The expected number of cases needs to be defined as (at least) proportional to the area of the polygons times the width of the time window of each knot. The intersection area of each polygon with the domain is computed as follows:

A summary of the areas of the polygons is:

The total sum of the weights is \(50.2655\) and the area of the spatial domain is:

The time length (domain) is 12 and the width of each knot is:

Here, the knots at the boundaries of the time period have a lower width than the internal ones.

The intensity function is the expected number of cases per unit of space-time volume. With \(n\) cases, the intensity varies around the average number of cases per unit volume, that is, \(n\) divided by the total volume. This quantity is related to the intercept in the model: its logarithm is an estimate of the intercept in a model without the space-time effect. See below:

The space-time volume (area unit per time unit) at each polygon and time knot is:

8.4.3 Model fitting

The projector matrix, SPDE model object and the space-time index set definition are computed as follows:

The data stack is defined as:

The formula to fit the model considers the intercept, spatial effect and temporal effect:

Finally, model fitting is carried out:

The value of \(\mu\) and the intercept summary can be obtained as follows:

The expected number of cases at each integration point can be used to compute the total expected number of cases (Est. N below), as:

A summary for the hyperparameters can be obtained with this R code:

The spatial surface at each time knot can be computed as well:

Figure 8.7 shows the predicted surfaces at each time knot.

Figure 8.7: Spatial surface fitted at each time knot overlaid with the point pattern formed by the points nearest to each time knot.

8.5 Accumulated rainfall: Hurdle Gamma model

For some applications it is possible to have the outcome be zero or a positive number. Common examples are fish biomass and accumulated rainfall. In this case one can build a model that accommodates the zero and positive outcome considering a combination of two likelihoods, one to model the occurrence and another to model the amount. One case is when considering the Bernoulli distribution for the occurrence and the Gamma for the amount. The advantage of having this two-part model is that we can model the probability of rain and the rainfall amount separately. It may be the case that some terms in each part can be shared.

8.5.1 The model

We will consider the daily rainfall data considered in Section 2.8. Let

\[\begin{align} z_{i,t} = \begin{cases} 1, & \text{ if it has rained at location } \textbf{s}_i \text{ and time } t \\ \\ 0, & \text{otherwise} \end{cases} \end{align}\]

and the rainfall amount as

\[\begin{align} y_{i,t} = \begin{cases} \text{NA}, &\text{if it did not rain at}\\ & \text{location } \textbf{s}_i \text{ and time } t\\ \text{rainfall amount at location } \textbf{s}_i \text{ and time } t, & \text{otherwise} \end{cases} \end{align}\]

We then define a likelihood for each outcome. We choose a Bernoulli distribution for \(z_{i,t}\) and a Gamma distribution for \(y_{i,t}\):

\[\begin{align} z_{i,t} \sim \text{Binomial}(\pi_{i,t},n_{i,t}=1) \;\;\;\text{and}\;\;\;\; y_{i,t} \sim \text{Gamma}(a_{i,t}, b_{i,t}). \end{align}\]

This setting is equivalent to considering a Hurdle-Gamma model, where the expected value of the rainfall is \(\pi_{i,t}\mu_{i,t}\), with \(\mu_{i,t}\) being the expected value of the Gamma part, because the amount is zero with probability \(1-\pi_{i,t}\). Next, we define the model for \(\pi_{i,t}\) and \(\mu_{i,t}\).

For the occurrence, the linear predictor is defined on the logit of the probability, as usual for the Bernoulli distribution:

\[\begin{equation} \tag{8.2} \text{logit}(\pi_{i,t}) = \alpha^z + \xi_{i,t} \end{equation}\]

with \(\alpha^z\) being the intercept and \(\xi_{i,t}\) coming from a space-time random effect, i.e., a GF modeled through the SPDE approach.

The parameterization of the Gamma distribution in R-INLA considers that E\((y) = \mu = a/b\) and Var\((y) = a/b^2=1/\tau\), where \(\tau\) is the precision parameter. The linear predictor is defined on \(\log(\mu)\) and we have

\[\begin{equation} \tag{8.3} \log(\mu_{i,t}) = \alpha^y + \beta \xi_{i,t} + u_{i,t} \end{equation}\]

where \(\alpha^y\) is the intercept and \(\beta\) the scaling parameter for \(\xi_{i,t}\), which is the space-time effect considered for the occurrence probability, which is being shared in the model for the rainfall amount. The linear predictor affects both the E\((y)\) and Var\((y)\) because \(a=b\mu\) and then \(a/b^2=\mu/b\).
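The relation between the mean, the rate and the variance can be checked by simulation. This is a base R sketch with made-up values for \(\mu\) and \(b\), using the shape and rate arguments of rgamma() for \(a\) and \(b\).

```r
set.seed(6)
mu <- 4        # expected rainfall amount (illustrative)
b  <- 2        # rate parameter (illustrative)
a  <- b * mu   # shape implied by E(y) = a / b

y <- rgamma(1e5, shape = a, rate = b)
mean(y)   # close to mu = a / b
var(y)    # close to a / b^2 = mu / b
```

With \(b\) held fixed, doubling \(\mu\) doubles the variance as well, which is the sense in which the linear predictor affects both moments.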

Notice that \(\xi_{i,t}\) will be computed as the corresponding element of \(\mathbf{A}\xi_0\), where \(\xi_0\) is the space-time process at the mesh nodes and time points and \(\mathbf{A}\) is the corresponding space-time projector matrix. The same applies to \(u_{i,t}\).

We will consider the model in Cameletti et al. (2013) for both \(\mathbf{\xi}\) and \(\mathbf{u}\). However, we will consider a PC-prior for each one of the three parameters. Thus, for the marginal standard deviation and the spatial range we consider the prior proposed in Fuglstad et al. (2018). We set it such that the median of the standard deviation is 0.5, i.e., \(P(\sigma>0.5)=0.5\).

For the practical range we consider the size of the Paraná state. First we load the data, which also loads the Paraná border.

We consider that the coordinates are in longitude and latitude and project them into UTM with units in kilometers as follows:

The Paraná state is around 663.8711 kilometers wide and 464.7481 kilometers high. The PC-prior for the practical range is built considering the probability of the practical range being less than a chosen distance. We set the prior so that the median is 100 kilometers.
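The two prior statements can be translated into rate parameters and checked in base R. The sketch assumes the closed forms of the PC-prior in Fuglstad et al. (2018) for a two-dimensional domain: an exponential distribution on \(\sigma\) and an exponential distribution on the inverse of the range. The object names are illustrative.

```r
# standard deviation: P(sigma > 0.5) = 0.5
sigma0 <- 0.5; alpha.s <- 0.5
lam.s <- -log(alpha.s) / sigma0    # rate of the exponential on sigma
qexp(0.5, rate = lam.s)            # prior median of sigma: 0.5

# practical range in dimension 2: P(range < 100 km) = 0.5
range0 <- 100; alpha.r <- 0.5
lam.r <- -log(alpha.r) * range0    # rate of the exponential on 1 / range
1 / qexp(0.5, rate = lam.r)        # prior median of the range: 100
```

In R-INLA these two statements would typically be supplied as prior.sigma = c(0.5, 0.5) and prior.range = c(100, 0.5) when calling inla.spde2.pcmatern().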

For the temporal correlation parameter, the first order autoregression parameter, we also consider the PC-prior framework, as in Simpson et al. (2016). We choose correlation equal to one as the base model and set the prior considering P(\(\rho>0.5\))=0.7 as follows:

For the shared parameter \(\beta\) we can set a prior based on some knowledge about the correlation between the rain occurrence and the rainfall amount. We assume a N(0, 1) prior for this parameter as follows:

We also have a likelihood parameter that needs a prior and, again, we consider the PC-prior framework. Thus, we set the prior on the precision by choosing a value for \(\lambda\). We set it equal to one as follows:

8.5.2 Rainfall data in Paraná

In this section we consider the rainfall data introduced in Section 2.8. In this dataset we have the longitude in the first column, the latitude in the second, the altitude in the third and, from the fourth column onwards, the data for each day, as we can see below:

We will consider the first 8 days of data for illustration. The two response variables \(z_{i,t}\) and \(y_{i,t}\) are defined as follows. First we define the occurrence variable:

The rainfall amount is then defined as:
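The construction of both variables can be sketched on a toy matrix. The values below are made up, and the layout (rows for gauges, columns for days) is an assumption that mirrors the description of the dataset.

```r
# toy stand-in for the daily columns: rows are gauges, columns are days
rain <- matrix(c(0,   2.4, NA,
                 1.1, 0,   0,
                 0,   0,   7.9), nrow = 3, byrow = TRUE)

# occurrence: 1 if it rained, 0 otherwise, NA if the record is missing
z <- (rain > 0) + 0

# amount: observed value on wet days, NA otherwise
y <- ifelse(rain > 0, rain, NA)
```

Setting the amount to NA on dry days means that those records contribute only to the occurrence likelihood.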

8.5.3 Fitting the model

We have to build a mesh in order to define the SPDE model. We consider all the gauges’ locations in the following code:

And we have the resulting mesh in Figure 8.8.

Figure 8.8: Mesh for the Paraná state with 138 nodes. Black points denote the 616 rain gauges.

The SPDE model is defined through

and the corresponding space-time projector matrix is given by

We need to define the space-time indices for \(\mathbf{\xi}\) in both linear predictors.

The next step is to organize the data into stacks. First, we create a data stack for the occurrence data bearing in mind that we have the amount data. So, we have the occurrence in the first column of a two-column matrix and the amount of rainfall in the second column:
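The layout of the two-column response can be illustrated in base R with toy vectors. This only shows the structure that results once the stacks for the two outcomes are joined; the names are hypothetical.

```r
z <- c(1, 0, 1, NA)        # toy occurrence values
y <- c(3.2, NA, 0.4, NA)   # toy rainfall amounts

Y.z <- cbind(z, NA)   # occurrence block: second column is all NA
Y.y <- cbind(NA, y)   # amount block: first column is all NA

# each row carries at most one observed value, so each row is
# assigned to exactly one of the two likelihoods
Y <- rbind(Y.z, Y.y)
dim(Y)
```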

It is useful to have a stack for prediction at the mesh nodes so it will be easy to map the predicted values later:

We join all the data stacks:

We now set some parameters to supply for the inla() function. The prior for the precision parameter of the Gamma likelihood will go in a list for control family arguments.

Note that the empty list above, i.e., list(), is required and it could be used to pass additional arguments to the Binomial likelihood in the model.

For having a fast approximation of the marginals we use the adaptive approximation strategy (by setting strategy = 'adaptive' below). This strategy mostly uses the Gaussian approximation to avoid the second Laplace approximation in the INLA algorithm, but applies the default strategy for fixed effects and random effects with a length \(\leq\) adaptive.max (see ?control.inla). Additionally we choose to not integrate over the hyperparameters by choosing the Empirical Bayes estimation as int.strategy = 'eb'. These options will be passed in argument control.inla to function inla() when fitting the model and are defined now:

We also choose not to return the marginal distributions of the latent field. Thus we set
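A sketch of this setting, assuming it is passed to inla() through the control.results argument (the name `cres` is only a label):

```r
# Skip the marginals of the random effects and of the linear predictor;
# their summaries (mean, sd, quantiles) are still returned
cres <- list(return.marginals.predictor = FALSE,
             return.marginals.random = FALSE)
```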

We can now define the formula for the model specified. We use the spde object to define the model of the space-time component, together with the prior for the AR(\(1\)) temporal dynamics. For the \(\beta\) parameter of the shared space-time component we set fixed = FALSE, so that \(\beta\) is estimated, and supply its prior. To reduce the number of iterations needed when optimizing the posterior of the hyperparameters, we set initial values close to the optimum, taken from a previous run of this model, and restart the optimization from there. The joint model with the shared space-time component is then fitted as follows:
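The structure of such a formula can be sketched without fitting anything. All names here (`x`, `xc`, `u`, the intercepts, `cg`, `hbeta`), the prior parameters and the initial values are assumptions made for illustration. In particular, `theta.ini` is a placeholder that would be replaced by the hyperparameter modes of an earlier fit.

```r
# AR(1) dynamics across time groups, with a PC prior on the correlation
# (illustrative prior parameters)
cg <- list(model = "ar1",
           hyper = list(rho = list(prior = "pc.cor1", param = c(0.5, 0.7))))

# Prior for the scaling parameter beta of the copied (shared) field
hbeta <- list(beta = list(prior = "normal", param = c(0, 10)))

# x: space-time field of the occurrence predictor
# xc: copy of x, scaled by beta, entering the amount predictor
# u: space-time field specific to the amount predictor
formula.joint <- Y ~ -1 + z.intercept + y.intercept +
  f(x, model = spde, group = x.group, control.group = cg) +
  f(xc, copy = "x", group = xc.group, fixed = FALSE, hyper = hbeta) +
  f(u, model = spde, group = u.group, control.group = cg)

# Starting values for the 8 hyperparameters (Gamma precision; range, standard
# deviation and correlation of each field; beta), on INLA's internal scale.
# Placeholder values only.
theta.ini <- rep(0, 8)
cmode <- list(theta = theta.ini, restart = TRUE)
```

The formula and `cmode` would then be supplied to inla() as its formula and control.mode arguments, with family = c("binomial", "gamma").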

The model without the shared space-time component is fitted as follows:

Alternatively, the model with only the shared component is fitted as follows:

Sometimes, the CPO is not computed automatically for all the observations. In this case we can use the inla.cpo() function to manually compute it.
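One way to do this is sketched below with a hypothetical helper, `recompute_cpo()`. It assumes that the fitted object flags problematic observations in `cpo$failure`, and it calls inla.cpo() only when at least one observation is flagged. The mock object stands in for a fitted model.

```r
# Hypothetical helper: recompute CPO only if some values are flagged as failed
recompute_cpo <- function(res) {
  n.fail <- sum(res$cpo$failure > 0, na.rm = TRUE)
  if (n.fail > 0) {
    # inla.cpo() recomputes the CPO for the flagged observations
    res <- INLA::inla.cpo(res)
  }
  res
}

# Mock result with no failures: the object is returned unchanged
mock <- list(cpo = list(cpo = c(0.4, 0.7), failure = c(0, 0)))
identical(recompute_cpo(mock), mock)
```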

We can now perform a model comparison. This can be done using the marginal likelihood, DIC, WAIC or CPO. Because we have two outcomes, we need to account for this with care. The DIC, WAIC and CPO are computed for each observation. Thus, we can sum them for each outcome as follows:
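A sketch of such a summary is shown next. It assumes that the fitted object stores the local contributions in `dic$local.dic`, `waic$local.waic` and `cpo$cpo`, and the likelihood index of each observation in `dic$family`. The helper name `getfit()` and the mock object are illustrative.

```r
# Sum the local DIC and WAIC, and the negative log CPO, within each likelihood
getfit <- function(r) {
  fam <- r$dic$family
  data.frame(
    dic = tapply(r$dic$local.dic, fam, sum, na.rm = TRUE),
    waic = tapply(r$waic$local.waic, fam, sum, na.rm = TRUE),
    nlcpo = tapply(r$cpo$cpo, fam, function(p) -sum(log(p), na.rm = TRUE)))
}

# Mock result: two observations per likelihood
mock <- list(
  dic = list(family = c(1, 1, 2, 2), local.dic = c(1, 2, 3, 4)),
  waic = list(local.waic = c(2, 2, 5, 5)),
  cpo = list(cpo = c(0.5, 0.5, 0.25, NA)))
getfit(mock)
```

Each row of the output corresponds to one likelihood (here, occurrence and amount). Smaller values indicate a better fit for all three criteria.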

Comparing these sums, we can see that the separate model, i.e., the one without the shared component, fits slightly better.

8.5.4 Visualizing some results

We extract the useful indices for later use, one for each outcome at the observation locations and one for each outcome at the mesh locations, for all time points:

It may be useful to show maps of the space-time effect at each time, or of the probability of rain, or of the expected value of rainfall. To compute them we need a projector from the mesh nodes to a fine grid:
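A minimal sketch of such a projector follows, on a toy mesh over the unit square. The locations, mesh settings and grid dimensions are assumptions made for illustration, and `field` is a stand-in for values at the mesh nodes, such as posterior means.

```r
library(INLA)

set.seed(2)
loc <- cbind(runif(50), runif(50))        # made-up locations
mesh <- inla.mesh.2d(loc = loc, max.edge = c(0.3, 0.6), cutoff = 0.1)

# Projector from the mesh nodes to a regular 40 x 40 grid
pgrid <- inla.mesh.projector(mesh, xlim = c(0, 1), ylim = c(0, 1),
                             dims = c(40, 40))

# Project a vector with one value per mesh node onto the grid
field <- rnorm(mesh$n)
pmat <- inla.mesh.project(pgrid, field = field)
dim(pmat)
```

The grid coordinates are available in `pgrid$x` and `pgrid$y`, and the coordinates of every pixel in `pgrid$lattice$loc`.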

It is better to discard the values interpolated outside the border. Thus we identify those pixels which are outside of the Paraná border:

Figure 8.9 shows the posterior mean of the probability of rain at each time knot. It has been produced with the following code:
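The key step is to rearrange the fitted values at the mesh nodes so that each time knot can be mapped separately. The sketch below assumes that the fitted values are ordered with the mesh node index running fastest within each time knot, which is the ordering produced by inla.spde.make.index(). The vector `p.mean` is a made-up stand-in for the posterior means.

```r
n.spde <- 5   # number of mesh nodes (toy value)
m <- 3        # number of time knots (toy value)

# Stand-in for the posterior mean probabilities at the mesh nodes
p.mean <- seq(0.1, 0.8, length.out = n.spde * m)

# One column per time knot: stpred[, j] is the field at time knot j
stpred <- matrix(p.mean, nrow = n.spde, ncol = m)
stpred[, 2]
```

Each column can then be projected to the grid, masked outside the border, and drawn as one panel of the figure.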

Figure 8.9: Posterior mean of the probability of rain at each time knot. Time flows from top to bottom and left to right.

References

Fuglstad, G-A., D. Simpson, F. Lindgren, and H. Rue. 2018. “Constructing Priors That Penalize the Complexity of Gaussian Random Fields.” Journal of the American Statistical Association to appear. Taylor & Francis. https://doi.org/10.1080/01621459.2017.1415907.

Simpson, D. P., H. Rue, A. Riebler, T. G. Martins, and S. H. Sørbye. 2017. “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” Statistical Science 32 (1): 1–28.

West, M., and J. Harrison. 1997. Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag.

Petris, G., S. Petroni, and P. Campagnoli. 2009. Dynamic Linear Models with R. New York: Springer-Verlag.

Gelfand, A. E., H-J Kim, C. F. Sirmans, and S. Banerjee. 2003. “Spatial Modeling with Spatially Varying Coefficient Processes.” Journal of the American Statistical Association 98 (462): 387–96.

Vivar, J. C., and M. A. R. Ferreira. 2009. “Spatiotemporal Models for Gaussian Areal Data.” Journal of Computational and Graphical Statistics 18 (3): 658–74.

Assunção, J. J., D. Gamerman, and R. M. Assunção. 1999. “Regional Differences in Factor Productivities of Brazilian Agriculture: A Space-Varying Parameter Approach.” Universidade Federal do Rio de Janeiro, Statistical Laboratory.

Assunção, R. M., J. E. Potter, and S. M. Cavenaghi. 2002. “A Bayesian Space Varying Parameter Model Applied to Estimating Fertility Schedules.” Statistics in Medicine 21: 2057–75.

Gamerman, D., A. R. B. Moreira, and H. Rue. 2003. “Space-Varying Regression Models: Specifications and Simulation.” Computational Statistics & Data Analysis - Special Issue: Computational Econometrics 42 (3): 513–33.

Knorr-Held, L., and H. Rue. 2002. “On Block Updating in Markov Random Field Models for Disease Mapping.” Scandinavian Journal of Statistics 29 (4): 597–614.

Rue, H., and L. Held. 2005. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics & Applied Probability. Boca Raton, FL: Chapman & Hall.

Ruiz-Cárdenas, R., E. T. Krainski, and H. Rue. 2012. “Direct Fitting of Dynamic Models Using Integrated Nested Laplace Approximations — INLA.” Computational Statistics & Data Analysis 56 (6): 1808–28. https://doi.org/10.1016/j.csda.2011.10.024.

Cameletti, M., F. Lindgren, D. Simpson, and H. Rue. 2013. “Spatio-Temporal Modeling of Particulate Matter Concentration Through the SPDE Approach.” Advances in Statistical Analysis 97 (2): 109–31.

Rowlingson, B. S., and P. J. Diggle. 1993. “Splancs: Spatial Point Pattern Analysis Code in S-Plus.” Computers & Geosciences 19 (5): 627–55. https://doi.org/10.1016/0098-3004(93)90099-Q.

Bailey, T. C., and A. C. Gatrell. 1995. Interactive Spatial Data Analysis. Harlow, UK: Longman Scientific & Technical.

Simpson, D. P., J. B. Illian, F. Lindgren, S. H. Sørbye, and H. Rue. 2016. “Going Off Grid: Computationally Efficient Inference for Log-Gaussian Cox Processes.” Biometrika 103 (1): 49–70.

Taylor, B. M., T. M. Davies, B. S. Rowlingson, and P. J. Diggle. 2013. “lgcp: An R Package for Inference with Spatial and Spatio-Temporal Log-Gaussian Cox Processes.” Journal of Statistical Software 52 (4): 1–40. http://www.jstatsoft.org/v52/i04/.

Bowman, A. W., I. Gibson, E. M. Scott, and E. Crawford. 2010. “Interactive Teaching Tools for Spatial Sampling.” Journal of Statistical Software 36 (13): 1–17. http://www.jstatsoft.org/v36/i13/.