Chapter 11 Implementing New Latent Models
11.1 Introduction
In previous chapters we have covered several latent effects implemented
in the INLA
package. These include a number of important and popular
latent effects that are widely used. However, it may be the case that
an unimplemented latent effect is required in our model. Given that
most of the latent effects are implemented in a mix of C and R code, it
is difficult to add new latent effects to the INLA
package directly.
The INLA methodology can handle many more latent effects than those included in
the INLA
package. For this reason, we will describe now how to add new latent
effects. In particular, the spatial autoregressive model described in Section
11.2 will be considered. The implementation of this new
latent effect will be carried out using three different approaches. In Section
11.3 we cover the rgeneric
latent effect, which allows
new latent effects to be implemented in R
code. In Section 11.4 we
describe a novel approach to fit unimplemented latent effects by means of
Bayesian model averaging (BMA, Hoeting et al. 1999; Gómez-Rubio, Bivand, and Rue 2019). Finally, in Section
11.5 we describe how to combine INLA and MCMC
(Gómez-Rubio and Rue 2018) to increase the number of models that can be fitted using
INLA.
11.2 Spatial latent effects
As a case study, we will show how to implement a spatial model. In particular, we will consider the conditionally autoregressive (CAR) specification (Banerjee, Carlin, and Gelfand 2014). Now, the latent effects have a Gaussian distribution with zero mean and precision matrix
\[ Q = \tau (I - \rho W) \]
Here, \(\tau\) is a precision hyperparameter, \(\rho\) is a spatial autocorrelation parameter and \(W\) is a binary adjacency matrix. \(Q\) is a very sparse matrix given that both \(I\) and \(W\) are very sparse matrices.
Note that matrix \(W\) must be provided. This can be easily obtained
by loading some spatial data into R
. For example, we will consider
the North Carolina SIDS dataset that was analyzed in Section 2.3.2.
However, a spatial analysis will be considered now. First of all, a shapefile
with the data will be loaded and the expected number of cases, standardized
mortality ratios and proportion of non-white births per county in the period
will be computed:
library("spdep")
library("rgdal")
readOGR(system.file("shapes/sids.shp", package="spData")[1])
SIDS <-proj4string(SIDS) <- CRS("+proj=longlat +ellps=clrk66")
#Expected cases
$EXP74 <- SIDS$BIR74 * sum(SIDS$SID74) / sum(SIDS$BIR74)
SIDS#Standardised Mortality Ratio
$SMR74 <- SIDS$SID74 / SIDS$EXP74
SIDS#Proportion of non-white births
$PNWB74 <- SIDS$NWBIR74 / SIDS$BIR74 SIDS
Figure 11.1 shows the standardized mortality ratio (i.e., observed divided by expected cases) in each county in North Carolina.
Next, a binary adjacency matrix needs to be computed using function
poly2nb():

#Adjacency matrix
adj <- poly2nb(SIDS)
W <- as(nb2mat(adj, style = "B"), "sparseMatrix")
This model is well defined when \(\rho\) is in the range \((1/\lambda_{min}, 1/\lambda_{max})\), where \(\lambda_{min}\) and \(\lambda_{max}\) are the minimum and maximum eigenvalues of \(W\), respectively. Hence, we will compute this first:
e.values <- eigen(W)$values
rho.min <- min(e.values)
rho.max <- max(e.values)
In order to have the spatial autocorrelation parameter constrained to be lower than 1 (see, for example, Haining 2003 for details), the adjacency matrix will be re-scaled by dividing by \(\lambda_{max}\):
#Re-scale adjacency matrix
W <- W / rho.max
Usually, the range of the spatial autocorrelation parameter will be considered to be \((0, 1)\), so that it can be interpreted similarly to a correlation parameter.
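As a quick check (not part of the original analysis), the largest eigenvalue of the re-scaled adjacency matrix should now be equal to one, so that the upper bound of the valid range of \(\rho\) is indeed 1:

#Check: after re-scaling, the largest eigenvalue of W is 1
max(eigen(W)$values)
## [1] 1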
11.3 R implementation with rgeneric
The rgeneric
latent effect is a mechanism implemented in the INLA
package
to allow users to implement latent effects in R
. This is possible because R
can be embedded into other front-end programs in a way that allows R
code to
be run from C
(R Core Team 2019b). A detailed description of this feature can be accessed by
typing inla.doc('rgeneric').
The definition of a new latent effect requires its specification as a GMRF. This means that the mean and precision matrix of the GMRF need to be defined, as well as the hyperparameters \(\theta\) involved and their prior distributions.
Specifying the latent effect as a GMRF also requires a binary representation of
the precision matrix to exploit conditional independence properties, i.e., a
‘graph’. This is simply a matrix of the same dimension as the precision matrix
with entries equal to zero where the precision matrix is zero and one otherwise.
When passing this graph matrix to define an rgeneric
latent effect, the
precision matrix can be used as well regardless of whether it is binary or not.
To sum up, in order to define a new latent effect, the following need to be
defined using several R
functions:
The mean of the latent effects \(\mu(\theta)\).
The precision of the latent effects \(Q(\theta)\).
A ‘graph’, with a binary representation of the precision matrix.
The initial values of \(\theta\).
A log-normalizing constant.
The log-prior of \(\theta\).
Before starting to code all of this, it is important to provide an internal representation of the hyperparameters that makes numerical optimization easier. As a rule, it is good to reparameterize so that the internal parameters are unbounded, as this will simplify computations. In particular, we will use \(\theta_1 = \log(\tau)\) and \(\theta_2 = \textrm{logit}(\rho)\).
INLA
will internally work with the parameters \((\theta_1, \theta_2)\). For this
reason, we first need to define the following function to convert the
hyperparameters from the internal scale to the model scale,
i.e., to obtain parameters \((\tau, \rho)\) from \((\theta_1, \theta_2)\):
\[ \tau = \exp(\theta_1) \]
\[ \rho = \frac{1}{1 + \exp(-\theta_2)} \]
A variable theta
is defined by INLA
in the code to store \(\theta = (\theta_1, \theta_2)\). So, the following function interpret.theta
will take
the parameters in the internal scale and return the precision and spatial
autocorrelation parameters:
interpret.theta <- function() {
  return(
    list(prec = exp(theta[1L]),
      rho = 1 / (1 + exp(-theta[2L])))
  )
}
Next, we will define the ‘graph’, which essentially defines what entries of
the precision matrix are non-zero. Note that this is a matrix of the same size
as the precision matrix and that \(W\) must be passed as a sparse matrix (as
defined in package Matrix
) and the returned matrix must be sparse too.
graph <- function(){
  require(Matrix)
  return(Diagonal(nrow(W), x = 1) + W)
}
The precision matrix is defined in a similar way:
Q <- function() {
  require(Matrix)
  param <- interpret.theta()

  return(param$prec * (Diagonal(nrow(W), x = 1) - param$rho * W))
}
The mean can be defined very easily now because it is zero:
mu <- function() {
  return(numeric(0))
}
The logarithm of the normalizing constant in this case is the typical normalizing constant of a multivariate Gaussian distribution:
log.norm.const <- function() {
  param <- interpret.theta()
  n <- nrow(W)

  Q <- param$prec * (Diagonal(nrow(W), x = 1) - param$rho * W)

  #Take the modulus of the log-determinant returned by determinant()
  res <- n * (-0.5 * log(2 * pi)) +
    0.5 * Matrix::determinant(Q, logarithm = TRUE)$modulus

  return(res)
}
Note that INLA
will compute this normalizing constant if numeric(0)
is
returned, so this can be omitted.
The log-prior must be specified in another function. For precision \(\tau\), we will be using a gamma distribution with parameters \(1\) and \(0.00005\), and for \(\rho\) a uniform distribution on \((0,1)\):
log.prior <- function() {
  param <- interpret.theta()

  res <- dgamma(param$prec, 1, 5e-05, log = TRUE) + log(param$prec) +
    log(1) + log(param$rho) + log(1 - param$rho)

  return(res)
}
Note that the extra terms that appear in the definition of the log-density
of the prior are due to the change of variable involved. INLA
works
with \((\theta_1,\theta_2)\) internally, but the prior is set on \((\tau, \rho)\).
See Chapter 5 for details on how to set the prior properly.
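In this particular case, the change of variable theorem makes the extra terms explicit: since \(\tau = \exp(\theta_1)\) and \(\rho = \textrm{logit}^{-1}(\theta_2)\), the Jacobians are \(d\tau / d\theta_1 = \tau\) and \(d\rho / d\theta_2 = \rho(1 - \rho)\), so that
\[ \log \pi(\theta_1) = \log \pi(\tau) + \log(\tau) \]
\[ \log \pi(\theta_2) = \log \pi(\rho) + \log(\rho) + \log(1 - \rho) \]
These are precisely the terms log(param$prec) and log(param$rho) + log(1 - param$rho) that appear in log.prior above.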
Finally, a function to set the initial values of the parameters in the internal scale must be provided:
initial <- function() {
  return(c(0, 0))
}
This implies that the initial values of \(\tau\) and \(\rho\) are \(1\) and \(0.5\), respectively.
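This can be checked by applying the transformations above to \(\theta = (0, 0)\):

#Initial values in the model scale
exp(0) #tau
## [1] 1
1 / (1 + exp(0)) #rho
## [1] 0.5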
A quit()
function is called when all computations are finished before
exiting the C
code. In this case, we will simply return nothing.
quit <- function() {
  return(invisible())
}
The actual definition of the latent effect is done via
function inla.rgeneric.define(). This function takes as its first argument
a single function that wraps all of the functions defined above, plus any
extra arguments needed to evaluate them:
'inla.rgeneric.CAR.model' <- function(
  cmd = c("graph", "Q", "mu", "initial", "log.norm.const",
    "log.prior", "quit"),
  theta = NULL) {
  #Internal function
  interpret.theta <- function() {
    return(
      list(prec = exp(theta[1L]),
        rho = 1 / (1 + exp(-theta[2L])))
    )
  }

  graph <- function(){
    require(Matrix)
    return(Diagonal(nrow(W), x = 1) + W)
  }

  Q <- function() {
    require(Matrix)
    param <- interpret.theta()
    return(param$prec * (Diagonal(nrow(W), x = 1) - param$rho * W))
  }

  mu <- function() {
    return(numeric(0))
  }

  log.norm.const <- function() {
    return(numeric(0))
  }

  log.prior <- function() {
    param <- interpret.theta()
    res <- dgamma(param$prec, 1, 5e-05, log = TRUE) + log(param$prec) +
      log(1) + log(param$rho) + log(1 - param$rho)
    return(res)
  }

  initial <- function() {
    return(c(0, 0))
  }

  quit <- function() {
    return(invisible())
  }

  res <- do.call(match.arg(cmd), args = list())
  return(res)
}
Then, the model is defined to be used by INLA
using function
inla.rgeneric.define
as follows:
library("INLA")
inla.rgeneric.define(inla.rgeneric.CAR.model, W = W) CAR.model <-
Note that inla.rgeneric.define()
takes as first argument the function that
defines the latent effect followed by a sequence of named arguments with
variables that are required in the computation of the latent effect. In this
case, matrix W
is the adjacency matrix required by the CAR latent effect but
more arguments could follow when needed.
The model can now be fitted as follows:
SIDS$idx <- 1:nrow(SIDS)

f.car <- SID74 ~ 1 + f(idx, model = CAR.model)

m.car <- inla(f.car, data = as.data.frame(SIDS), family = "poisson",
  E = SIDS$EXP74)
A summary of the model can be obtained as usual:
summary(m.car)
##
## Call:
## c("inla(formula = f.car, family = \"poisson\", data =
## as.data.frame(SIDS), ", " E = SIDS$EXP74)")
## Time used:
## Pre = 0.405, Running = 2.59, Post = 0.0271, Total = 3.02
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 0.034 0.109 -0.172 0.03 0.26 0.022 0
##
## Random effects:
## Name Model
## idx RGeneric2
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Theta1 for idx 2.20 0.331 1.562 2.19 2.86 2.16
## Theta2 for idx 2.30 1.163 0.108 2.27 4.67 2.15
##
## Expected number of effective parameters(stdev): 36.73(6.24)
## Number of equivalent replicates : 2.72
##
## Marginal log-Likelihood: -241.04
The marginals of the hyperparameters are in the internal scale and they need to be transformed:
marg.prec <- inla.tmarginal(exp, m.car$marginals.hyperpar[[1]])
marg.rho <- inla.tmarginal(function(x) { 1 / (1 + exp(-x)) },
  m.car$marginals.hyperpar[[2]])
Figure 11.2 shows the posterior marginals of \(\tau\) and \(\rho\).
These show a strong spatial autocorrelation. Finally, summary statistics on
these parameters can be obtained with inla.zmarginal
(see Section 2.6):
inla.zmarginal(marg.prec, FALSE)
## Mean 9.49098
## Stdev 3.25664
## Quantile 0.025 4.78851
## Quantile 0.25 7.16016
## Quantile 0.5 8.91091
## Quantile 0.75 11.1926
## Quantile 0.975 17.4427
inla.zmarginal(marg.rho, FALSE)
## Mean 0.867327
## Stdev 0.122387
## Quantile 0.025 0.52924
## Quantile 0.25 0.818303
## Quantile 0.5 0.906084
## Quantile 0.75 0.955611
## Quantile 0.975 0.990365
Posterior means of the random effects are plotted in a map in Figure 11.3, which shows the underlying spatial pattern of the risk from SIDS. It should be mentioned that the spatial pattern of the proportion of non-white births is very similar to the one in Figure 11.3. This is why, when this covariate is included in the model (as discussed in Section 2.3.2), it accounts for most of the overdispersion in the data and the random effects are not required any more.
library("viridis") #Needed for the inferno() palette

SIDS$CAR <- m.car$summary.random$idx[, "mean"]
spplot(SIDS, "CAR", col.regions = rev(inferno(16)))
11.4 Bayesian model averaging
Bivand, Gómez-Rubio, and Rue (2014) describe a novel approach to fit new models with INLA.
Their approach is based on conditioning on one of the hyperparameters in the
model and fitting conditional models for several values of this parameter. Then,
all models are combined using Bayesian model averaging (BMA, Hoeting et al. 1999)
to obtain the posterior marginals of all the parameters in the model.
Bivand, Gómez-Rubio, and Rue (2014) use this approach to fit a number of spatial econometrics
models, which are available in package INLABMA
(Bivand, Gómez-Rubio, and Rue 2015).
Recently, Gómez-Rubio, Bivand, and Rue (2019) have extended the work in Bivand, Gómez-Rubio, and Rue (2014)
to consider a wider family of spatial econometrics models.
In the case of the CAR specification, it is worth noting that, given \(\rho\),
the resulting model can be implemented using a generic0
latent effect because
the structure of the precision matrix is known and only hyperparameter \(\tau\)
needs to be estimated. By giving values to \(\rho\) in a bounded interval the
posterior marginal of \(\rho\) can be computed as well by Bayesian model
averaging.
First of all, we will create a function that will take a value of \(\rho\) and the adjacency matrix \(W\) and fit the CAR model conditional on \(\rho\):
car.inla <- function(formula, d, rho, W, ...) {
  #Create structure of precision matrix I - rho * W
  IrhoW <- Matrix::Diagonal(nrow(W), x = 1) - rho * W

  #Add CAR index
  d$CARidx <- 1:nrow(d)

  formula <- update(formula, . ~ . + f(CARidx, model = "generic0",
    Cmatrix = IrhoW))

  res <- inla(formula, data = d, ..., control.compute = list(mlik = TRUE),
    control.inla = list(tolerance = 1e-20, h = 1e-4))

  #Update mlik
  logdet <- determinant(IrhoW)$modulus
  res$mlik <- res$mlik + logdet / 2

  return(res)
}
Note that the log-marginal likelihood needs to be corrected to account for the
structure of the precision matrix and this is why term logdet / 2
is added to
the value of the marginal likelihood. By default, INLA
will not compute it in
this case because it assumes that the structure of the precision matrix is
known. Hence, its determinant is constant and does not need to be computed and
only the precision hyperparameter \(\tau\) is accounted for when computing the
marginal likelihood. Furthermore, control.inla
has been used to set the
values of tolerance
and h
to obtain accurate estimates of the marginal
likelihood.
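To see where this correction comes from, note that the log-determinant of the full precision matrix \(Q = \tau(I - \rho W)\) splits into two terms:
\[ \log |Q| = n \log(\tau) + \log |I - \rho W| \]
The generic0 latent effect only accounts for the \(n\log(\tau)\) part, since Cmatrix is treated as a fixed, known structure. Hence, the remaining contribution to the GMRF log-density, \(\frac{1}{2}\log |I - \rho W|\) (logdet / 2 in the code above), must be added by hand.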
Next, a regular grid of points in the domain of \(\rho\) at which to evaluate the model is required. In this case, the domain is the interval \((0, 1)\) and the grid points will be denoted by \(\{\rho^{(i)}\}_{i=1}^N\), where \(N\) is the number of points. For this particular case, as we already know that the region of high posterior density of \(\rho\) is very close to one, the grid will be made of 100 points from \(0.5\) to \(1\):
rho.val <- seq(0.5, 1, length.out = 100)
It should be noted that values very close to 1 may cause computational problems because the precision matrix becomes almost singular.
Next, we evaluate the model at all these points. Although
this involves fitting a large number of models, these can be fitted in parallel
using function mclapply
. In the next lines, the parallel
library
will be loaded and set up to use 4 cores before fitting the conditional
models:
library("parallel")
options(mc.cores = 4)
Model fitting will be done in parallel, with each parallel INLA
process set to use a single core (by setting num.threads
to 1):

car.models <- mclapply(rho.val, function(rho) {
  car.inla(SID74 ~ 1, as.data.frame(SIDS), rho, W,
    family = "poisson", E = SIDS$EXP74,
    num.threads = 1)
})
When INLA
is used to fit several models in parallel on the same machine it is
convenient to set the number of threads used by each INLA
process to a value
that ensures that no more cores than available are used. For example, in the
previous example, the mclapply
loop will be using 4 cores (as set in
mc.cores
), which means that 4 INLA
processes will be run. Each one of these
processes will be using 1 core at a time (as set in num.threads
), so that
altogether 4 cores are used. Similarly, if the total number of cores is 8,
then num.threads
could be set to 2 so that each INLA
process takes
2 cores and the total number of cores used is 8.
Then, we have a number of models conditional on \(\rho\) that we can combine to obtain the desired model. These models will be combined using Bayesian model averaging, weighting each conditional model according to its marginal likelihood.
Note that, given any particular value of \(\rho\), the fitted conditional model
with INLA
will provide the conditional marginals of the parameters,
\(\pi(\cdot \mid \rho)\), and the conditional marginal likelihood \(\pi(\mathbf{y} \mid \rho)\).
The posterior marginal of \(\rho\) can be obtained as:
\[ \pi(\rho \mid \mathbf{y}) = \frac{\pi(\mathbf{y} \mid \rho)\pi(\rho)}{\pi(\mathbf{y})} \propto \pi(\mathbf{y} \mid \rho)\pi(\rho) \]
This can be computed from the previous output because \(\pi(\mathbf{y} \mid\rho)\) is the conditional marginal likelihood for a conditional model on \(\rho\) and \(\pi(\rho)\) is the value of the prior on \(\rho\) (which is known and equal to 1 in this case). \(\pi(\mathbf{y})\) is a constant that can be ignored as the posterior marginal can be re-scaled to integrate 1.
In practice, we will estimate the value of the posterior marginals at the
points of the grid, fit a smooth function and then re-scale the resulting
function to integrate 1, so that we have a proper density function. Here, this
is done by function INLABMA
but function inla.merge()
could have also been
used (as shown in the next section):
library("INLABMA")
INLABMA(car.models, rho.val, log(1)) car.bma <-
The posterior marginal of \(\rho\) estimated with this method can be seen in Figure 11.5, which also includes two other estimates using different approaches explained below.
If required, the log-marginal likelihoods (conditional on \(\rho\)) of the fitted models can be easily obtained with:
#Obtain log-marginal likelihoods
mliks <- unlist(lapply(car.models, function(X) { X$mlik[1, 1] }))
The posterior marginals for the remainder of latent effects and hyperparameters in the model can be computed by integrating \(\rho\) out. For example, for a hyperparameter \(\theta_t\) (and similarly for any latent effect) this would mean:
\[ \pi(\theta_t \mid \mathbf{y}) = \int \pi(\theta_t , \rho \mid \mathbf{y}) d\rho = \int \pi(\theta_t \mid \mathbf{y}, \rho) \pi(\rho \mid \mathbf{y}) d\rho \]
Taking a set of values of \(\rho\) in a grid \(\{\rho^{(i)}\}_{i=1}^N\), this can be approximated by the following expression (Bivand, Gómez-Rubio, and Rue 2014):
\[ \pi(\theta_t \mid \mathbf{y}) \simeq \sum_{i=1}^N \pi(\theta_t \mid \mathbf{y}, \rho^{(i)}) w_i \]
with weights \(w_i\) equal to
\[ w_i = \frac{\pi(\mathbf{y} \mid \rho^{(i)}) \pi(\rho^{(i)}) } {\sum_{i=1}^N \pi(\mathbf{y}\mid \rho^{(i)}) \pi(\rho^{(i)})} \]
Note the approximation to \(\pi(\theta_t \mid \mathbf{y})\) can be regarded as Bayesian model averaging (Hoeting et al. 1999), as it is a weighted sum of conditional posterior marginals from different models.
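For illustration, the weights \(w_i\) can be computed directly from the log-marginal likelihoods obtained above (a minimal sketch; the uniform prior on \(\rho\) cancels out in the ratio):

#Compute BMA weights from the conditional log-marginal likelihoods
#Subtract the maximum to avoid numerical overflow when exponentiating
w <- exp(mliks - max(mliks))
w <- w / sum(w)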
Function INLABMA() computes this model average to obtain the marginals of the
other hyperparameters and latent effects in the model. See Bivand, Gómez-Rubio, and Rue (2015)
for details. The summaries of the coefficients of the fixed effects and the
hyperparameters can be obtained as follows:
# Fixed effects
car.bma$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) 0.0336 0.1011 -0.1717 0.03573 0.227 0.03983 6.082e-12
It is worth mentioning that function inla.merge()
can also compute the average of the posterior marginals from different models
(see the next section). In this case, each model has a different weight \(w_i\)
that must be passed to the inla.merge() function in order to obtain the
correct averaged model.
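For example, a call along the following lines could be used (a sketch, not the code used in this chapter; it uses the weights w computed above and assumes that inla.merge() takes the list of fitted models followed by their weights):

#Sketch: model averaging with inla.merge() and weights w
m.merged <- inla.merge(car.models, w)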
11.5 INLA within MCMC
Finally, the third approach that will be discussed to use INLA
to fit models with unimplemented latent effects is based on Gómez-Rubio and Rue (2018).
They point out that the approach in Bivand, Gómez-Rubio, and Rue (2014) is useful for a very small
number of hyperparameters with bounded support. However, when the
hyperparameters do not have bounded support or there are many of them,
simple numerical approximations will be difficult to implement.
For this reason, Gómez-Rubio and Rue (2018) propose the use of INLA within
the Metropolis-Hastings algorithm so that the hyperparameter space is
conveniently
explored. The main idea is to split the set of hyperparameters into two
groups, so that conditional on one of the subsets the resulting model can be
easily fitted with INLA
. This is essentially what we have done in
Section 11.4 when fitting models conditional on \(\rho\).
Function INLAMH
, in package INLABMA
, implements the INLA within
Metropolis-Hastings algorithm. It requires a function to fit the model
conditional on some of the hyperparameters, two functions (that implement the
proposal distribution) to sample new values of the hyperparameters and compute
its density, and the prior of the hyperparameters.
Similarly as in Section 11.4, the models fitted with INLA
will be conditional on \(\rho\), and the marginal distribution of \(\rho\) will
be obtained from the samples from the Metropolis-Hastings algorithm.
The function to fit a model conditional on \(\rho\) is the following:
fit.inla <- function(d, x){
  #Rho
  rho <- x

  res <- car.inla(SID74 ~ 1, d, rho, W,
    family = "poisson", E = d$EXP74)

  return(list(model.sim = res, mlik = res$mlik[1, 1]))
}
This function returns a list with the fitted model (model.sim) and the
estimate of the marginal likelihood (mlik).
The function used to propose new values of \(\rho\), in the logit scale, will be
a Gaussian centered at the current value with a small standard deviation. For this,
function rlogitnorm
from package logitnorm
(Wutzler 2018) will be used.
By using this function we avoid dealing with changes in the scale
of the parameters (see Chapter 5 for details).
library("logitnorm")
#Sample values of rho
function(x, sigma = 0.15) {
rq.rho <-rlogitnorm(1, mu = logit(x), sigma = sigma)
}
#Log-density of proposal distribution
function(x, y, sigma = 0.15, log = TRUE) {
dq.rho <-dlogitnorm(y, mu = logit(x), sigma = sigma, log = log)
}
Here, x
represents the current value of \(\rho\), y
the proposed value
and sigma
is the standard deviation of the proposal distribution. Note also
that the density is returned in the log-scale.
Finally, we need to set the prior on \(\rho\):
#Log-density of prior
function(rho, log = TRUE) {
prior.rho <-return(dunif(rho, log = log))
}
Once all the required functions have been defined, the INLAMH
function can be called to run the INLA within MCMC algorithm. In the next
lines, the starting point for \(\rho\) is set to 0.95:

inlamh.res <- INLAMH(as.data.frame(SIDS), fit.inla, 0.95, rq.rho, dq.rho,
  prior.rho, n.sim = 100, n.burnin = 100, n.thin = 5, verbose = TRUE)
The resulting object inlamh.res
is a list with three elements:

acc.sim indicates whether each proposed value of \(\rho\) was accepted or not, and it is useful to compute the acceptance rate.
model.sim is a list with the conditional models fitted with INLA.
b.sim is a list with the values of \(\rho\). In this case, each element in the list is a vector of length one, but this can take many other forms.
A simple way to put all the samples from \(\rho\) together is the following:
rho.sim <- unlist(inlamh.res$b.sim)
Figure 11.4 displays the samples from the posterior of \(\rho\) drawn by INLA within MCMC and a kernel density estimate of the posterior marginal.
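A plot along these lines can be obtained from the samples (a minimal sketch):

#Trace plot and kernel density estimate of the samples of rho
oldpar <- par(mfrow = c(1, 2))
plot(rho.sim, type = "l", xlab = "Iteration", ylab = expression(rho))
plot(density(rho.sim), main = "", xlab = expression(rho))
par(oldpar)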
In addition, summary statistics can be obtained from these samples:
#Posterior mean
mean(rho.sim)
## [1] 0.8956
#Posterior st. dev.
sd(rho.sim)
## [1] 0.07529
#Quantiles
quantile(rho.sim)
## 0% 25% 50% 75% 100%
## 0.6886 0.8432 0.9157 0.9560 0.9930
Finally, the posterior marginals of all the other parameters of interest can be obtained from the output generated by the INLA within MCMC algorithm. Bayesian model averaging can be used on the conditional marginals and all fitted models are assumed to have the same weight, as these are associated with independent samples from the posterior distribution of \(\rho\).
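For example, an equal-weight average of the conditional models could be computed with inla.merge() (a sketch; it assumes that each element of inlamh.res$model.sim is the list returned by fit.inla above, with the fitted model in component model.sim):

#Sketch: equal-weight averaging of the conditional models
models <- lapply(inlamh.res$model.sim, function(X) { X$model.sim })
m.mcmc <- inla.merge(models, rep(1, length(models)))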
11.6 Comparison of results
In the previous sections three different methods to implement new
latent effects have been described. Figure 11.5 displays the
different
posterior marginals of \(\rho\) computed using these three different approaches.
Marginals obtained with rgeneric
and INLA within MCMC are very close,
and the one obtained by BMA is not far from those. In all cases,
marginals seem to favor values of \(\rho\) very close to the upper bound
of its support. This is a well known behavior for the proper CAR model when
data have a strong spatial pattern that is not accounted for by the covariates.
Having estimates close to the boundaries of the support of the parameter can be problematic, and BMA in particular may struggle. The likely reason is that, when the parameter is close to the boundary of its support, INLA finds it difficult to obtain accurate estimates of the marginal likelihood required to combine all the conditional models together.
11.7 Final remarks
In this chapter we have considered three different options to use INLA
to fit models with unimplemented latent effects. The rgeneric
latent effect
provides an adequate way of making computations completely within the
INLA
package. The other two options, based on BMA and MCMC, are alternative
approaches that can be more flexible as to the types of variables that they can
handle.
For example, INLA assumes that all hyperparameters are continuous in some interval. BMA and INLA within MCMC can be used to fit models in which the hyperparameters are discrete. This is discussed, for example, in Chapter 13 where fitting mixture models with INLA is considered and Chapter 12, where the imputation of missing covariates is discussed.
In addition, INLA within MCMC can be regarded as a novel way of implementing
MCMC algorithms. As seen in the previous sections, it makes implementing
MCMC algorithms easy, as it focuses on a small number of hyperparameters
while all the others are integrated out by INLA.
Finally, the rgeneric
framework can be used to develop multivariate latent
random effects. This requires a model with different likelihoods and a
definition of the latent effect that works on data in more than one likelihood
of the model. Palmí-Perales, Gómez-Rubio, and Martínez-Beneito (2019) show how to implement multivariate latent
spatial effects for multivariate spatial data and provide different
applications in disease mapping. Similarly, Gómez-Rubio, Cameletti, and Blangiardo (2019)
describe how to implement new latent effects to deal with missing values
in the covariates and impute them.
References
Banerjee, S., B. P. Carlin, and A. E. Gelfand. 2014. Hierarchical Modeling and Analysis for Spatial Data. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.
Bivand, Roger S., Virgilio Gómez-Rubio, and Håvard Rue. 2014. “Approximate Bayesian Inference for Spatial Econometrics Models.” Spatial Statistics 9: 146–65.
Bivand, Roger S., Virgilio Gómez-Rubio, and Håvard Rue. 2015. “Spatial Data Analysis with R-INLA with Some Extensions.” Journal of Statistical Software 63 (20): 1–31. http://www.jstatsoft.org/v63/i20/.
Gómez-Rubio, Virgilio, Roger S. Bivand, and Håvard Rue. 2019. “Bayesian Model Averaging with the Integrated Nested Laplace Approximation.” arXiv E-Prints, November, arXiv:1911.00797. http://arxiv.org/abs/1911.00797.
Gómez-Rubio, Virgilio, Michela Cameletti, and Marta Blangiardo. 2019. “Missing Data Analysis and Imputation via Latent Gaussian Markov Random Fields.” arXiv E-Prints, December, arXiv:1912.10981. http://arxiv.org/abs/1912.10981.
Gómez-Rubio, Virgilio, and Håvard Rue. 2018. “Markov Chain Monte Carlo with the Integrated Nested Laplace Approximation.” Statistics and Computing 28 (5): 1033–51.
Haining, Robert. 2003. Spatial Data Analysis: Theory and Practice. Cambridge University Press.
Hoeting, Jennifer, David Madigan, Adrian Raftery, and Chris Volinsky. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14: 382–401.
Palmí-Perales, Francisco, Virgilio Gómez-Rubio, and Miguel A. Martínez-Beneito. 2019. “Bayesian Multivariate Spatial Models for Lattice Data with INLA.” arXiv E-Prints, September, arXiv:1909.10804. http://arxiv.org/abs/1909.10804.
R Core Team. 2019b. Writing R Extensions. https://cran.r-project.org/doc/manuals/R-exts.html.
Wutzler, Thomas. 2018. Logitnorm: Functions for the Logitnormal Distribution. https://CRAN.R-project.org/package=logitnorm.