Chapter 9 Smoothing
9.1 Introduction
Many applications in statistics require that the model is flexible in a way that makes the relationship between the response and the covariates non-linear. For example, for Gaussian data observations \(y_i\) can be modeled as
\[ y_i = f(x_i) + \epsilon_i\ i=1,\ldots, n \]
Here, \(\epsilon_i\) represent i.i.d. Gaussian errors with zero mean and precision
\(\tau\). The shape of function \(f()\) is typically not known and it is often
estimated using semi-parametric or non-parametric methods, which are flexible
enough to cope with non-linear effects. Ruppert, Wand, and Carroll (2003) provide a nice
introduction to semiparametric regression, and we will use some of the datasets
in this book to illustrate some examples. Fahrmeir and Kneib (2011) give a thorough
description of Bayesian smoothing methods. Finally, Wang, Ryan, and Faraway (2018), Chapter 7,
cover smoothing with INLA
in detail, including some approaches not mentioned
here.
In this chapter we will introduce some non-parametric estimation methods that
are useful to model complex relationships between the explanatory variables and
the response. As described in Chapter 3, INLA
implements some
latent effects that can be used to smooth effects. The spatial models described
in Chapter 7 can also be regarded as smoothers in two
dimensions.
Splines are covered in Section 9.2. Other types of smooth terms
with INLA
are discussed in Section 9.3. Including smooth
terms using SPDE’s in one dimension is tackled in Section
9.4. Finally, smoothing in non-Gaussian models is described
in Section 9.5.
9.2 Splines
In order to illustrate why smoothing may be required, we will consider the
lidar
dataset available in the package SemiPar
(Wand 2018). Original data
have been analyzed in Holst et al. (1996) and Fahrmeir and Kneib (2011). LIDAR (LIght
Detection And Ranging) is a remote-sensing technique widely used to obtain
measurements of the distribution of atmospheric species.
library("SemiPar")
data(lidar)
Table 9.1 shows the variables in the lidar
dataset, and Figure
9.1 displays the two variables in the dataset in a scatterplot.
These measurements were obtained during a study on the concentration of
atmospheric atomic mercury in an Italian geothermal field (see Holst et al. 1996 for details). There is a clear non-linear dependence between both variables,
which shows the need for a smooth term in the linear predictor.
Variable | Description |
---|---|
range |
Distance traveled before the light is reflected back to its source. |
logratio |
Logarithm of the ratio of received light from two laser sources. |
A first approach to developing a smooth function to fit the lidar
dataset is
by using linear combinations of smooth functions on the covariates. The set
of functions is called a basis of functions. For example, polynomials up to a
certain degree can be easily introduced into the linear predictor by using the
I()
function in the formula. Alternatively, the poly()
function can be
used to compute orthogonal polynomials on a covariate up to a certain degree.
Before fitting the model we will create a synthetic dataset with values of
range
between 390 and 720 (with a step of 5) and NA
values, so that these
are predicted with INLA
(see Section 12.3 for details):
seq(390, 720, by = 5)
xx <-# Add data for prediction
cbind(range = xx, logratio = NA)
new.data <- rbind(lidar, new.data) new.data <-
This new dataset will allow us to obtain estimates of the fitted curve
at regular intervals. Next, a model with a 3-degree polynomial on range
will be fit:
inla(logratio ~ 1 + range + I(range^2) + I(range^3),
m.poly <-data = new.data, control.predictor = list(compute = TRUE))
The fitted values using these models are shown in Figure 9.2. Although the 3-degree polynomial does a good job of capturing the general variation of the curve, polynomials are not a good choice in general because they produce smooth terms that are fit using all the data because they lack a compact support. Hence, polynomials are too rigid to obtain good local smoothing and the smooth term produced will fit different parts of the dataset with different accuracy. This behavior can be observed in Figure 9.2 at the left edge of the plot where the polynomial does not follow the trend of the data but the polynomial itself.
A better fit could be achieved by using piece-wise functions that vary locally (Ruppert, Wand, and Carroll 2003). A simple approach is to consider broken stick functions. These functions are defined on a number of points or knots so that they have two different slopes that join at the knots. For example, taking knot \(\kappa\), a function can be defined such as
\[ (x - \kappa)_{+} = \left\{ \begin{array}{rl} 0 & x < \kappa\\ x - \kappa & x \geq \kappa \end{array} \right. \]
Note that given a set of knots this definition will produce a particular basis that can be used to model the LIDAR data. Note that \(f(x)\) will be defined by a linear combination of the basis functions and that their coefficients will be estimated as part of the model. This basis is called a linear spline basis.
Hence, given a basis function \(\{(x - \kappa_k)\}_{k=1}^K\) defined by a set of knots \(\{\kappa_k\}_{k=1}^K\), the \(f(\cdot)\) function can be represented as
\[ f(x) = \beta_0 + \beta_ 1 x + \sum_{k=1}^K b_k (x - \kappa_k)_{+} \]
Note that the basis here is made of functions
\[ \{1, x, (x - \kappa_1)_{+}, \ldots, (x- \kappa_K)_{+}\} \]
A truncated power basis of degree \(p\) can be obtained by taking the powers of degree \(p\) of functions \((x - \kappa_k)\), i.e., \((x - \kappa_k)^p_{+}\). This will produce smoother functions with continuous \(p\) derivatives in all the domain but \(\kappa_k\). In this case, function \(f(x)\) is defined as
\[ f(x) = 1 + x + \ldots + x^p + \sum_{k=1}^K b_k (x - \kappa_k)^p_{+} \]
Although this is a first approach to producing local smoothing, broken stick functions may not be smooth enough and other type of functions may be required. For example, instead of piece-wise linear functions, other simple non-linear functions could be used.
In particular, splines (Ruppert, Wand, and Carroll 2003) are a popular option when it comes to producing smooth terms with local smoothing. Splines are smooth functions that are defined using piece-wise functions that join at particular points called knots.
Local smoothing is determined by the number and location of the knots, and they
must be chosen with care (Ruppert, Wand, and Carroll 2003). Package splines
(R Core Team 2019a)
includes several functions for the computation of different types of splines
that can be easily combined with INLA
to include smooth terms in the model.
For example, function bs()
will return a B-spline basis matrix for a
polynomial of a given degree. B-splines are preferred to splines because they
are computationally more stable. B-splines have the property that they are
equivalent to the truncated spline basis of the same degree, i.e., both bases
are equivalent and produce the same values in the linear predictor. This means
that they will produce the same values of \(f(x)\) but with different
coefficients, the B-splines coefficient estimation being more numerically
stable (Ruppert, Wand, and Carroll 2003).
In the next example, knots are placed between 400 and 700 at regular
intervals using a step of 10. This ensures that the range of values of
variable range
is covered. The B-spline spline is of degree 3 (i.e.,
the default).
library("splines")
seq(400, 700, by = 10)
knots <- inla(logratio ~ 1 + bs(range, knots = knots),
m.bs3 <-data = new.data, control.predictor = list(compute = TRUE))
Note that this model will estimate one coefficient per knot and that, in this case, there are 31 and the model summary is not shown here.
As an alternative to the knots, it is possible to specify the degrees of freedom (Ruppert, Wand, and Carroll 2003) of the spline. Degrees of freedom are related to how flexible a smooth term is in order to capture the the variability in the data.
Natural cubic splines are implemented in function ns()
. These types of splines
are cubic splines with the added constraint that they are linear at the tail
beyond the boundary knots by imposing that the second and third derivatives at
the boundary knots (i.e., the ones at the extremes) are equal to zero. In the
next example a natural cubic spline with 10 degrees of freedom is used to fit
the data (by setting parameter df
to 10).
inla(logratio ~ 1 + ns(range, df = 10),
m.ns3 <-data = new.data, control.predictor = list(compute = TRUE))
summary(m.ns3)
##
## Call:
## c("inla(formula = logratio ~ 1 + ns(range, df = 10), data =
## new.data, ", " control.predictor = list(compute = TRUE))")
## Time used:
## Pre = 0.274, Running = 0.253, Post = 0.0491, Total = 0.577
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.050 0.032 -0.113 -0.050 0.014 -0.050 0
## 1 -0.010 0.042 -0.093 -0.010 0.073 -0.010 0
## 2 -0.007 0.053 -0.111 -0.007 0.098 -0.007 0
## 3 -0.008 0.048 -0.102 -0.008 0.085 -0.008 0
## 4 0.026 0.051 -0.074 0.026 0.126 0.026 0
## 5 -0.326 0.049 -0.422 -0.326 -0.230 -0.326 0
## 6 -0.554 0.050 -0.652 -0.554 -0.456 -0.554 0
## 7 -0.537 0.049 -0.635 -0.537 -0.440 -0.537 0
## 8 -0.687 0.042 -0.769 -0.687 -0.605 -0.687 0
## 9 -0.656 0.083 -0.819 -0.656 -0.494 -0.656 0
## 10 -0.652 0.038 -0.726 -0.652 -0.578 -0.652 0
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 159.78 15.51 130.83 159.27
## 0.975quant mode
## Precision for the Gaussian observations 191.62 158.25
##
## Expected number of effective parameters(stdev): 11.21(0.021)
## Number of equivalent replicates : 19.72
##
## Marginal log-Likelihood: 173.09
## Posterior marginals for the linear predictor and
## the fitted values are computed
In this case we have summarized the output because the number of
coefficients is simply 10, in accordance
with the degrees of freedom set in function ns()
above. Note that
there is an extra parameter in the linear predictor which is due to the
presence of the intercept.
Figure 9.3 shows both smooth terms estimated using splines. It suggests that it may not be necessary to use as many as 31 when building the spline. The knots used by the second spline are computed using equally spaced quantiles by default:
attr(ns(lidar$range, df = 10), "knots")
## 10% 20% 30% 40% 50% 60% 70% 80% 90%
## 423 456 489 522 555 588 621 654 687
Hence, it is clear that using too many knots will make the smooth term overfit the data, while using too few knots will most likely produce biased estimates (Ruppert, Wand, and Carroll 2003).
The need for more or fewer knots can be assessed by looking at the estimates of the coefficients of the spline. Figure 9.4 shows the posterior means and 95% credible intervals on the 3-degree B-spline used above.
It is clear from that plot that most of the terms in the linear predictor can probably be removed without compromising model fitting. In addition, the values of the coefficients seem to follow the shape of the scatterplot of the data, with values close to zero in the regions where the data are more or less constant. Coefficients are away from zero in the regions where data change shape and show more variability.
Hence, it seems necessary to find a way to constrain or penalize the knots and coefficients of the spline so that it is not unnecessarily complex. This can be achieved by resorting to penalized splines or P-splines (Ruppert, Wand, and Carroll 2003; Hastie, Tibshirani, and Friedman 2009), described in the next section.
9.3 Smooth terms with INLA
As seen above, splines provide a good way to include smooth terms in the linear predictor. However, splines have the problem of the selection of the number and position of the knots. Depending on the number and location, splines may overfit the data. As described in Ruppert, Wand, and Carroll (2003), it may be necessary to constrain the influence of the knots by imposing further constraints on the spline coefficients.
Typically, this can be re-formulated as an optimization problem
\[ \min\ \sum_{i=1}^n(y_i - f(x_i))^2 \]
subject to
\[ \sum_{k=1}^p b_k^2 < C \] with \(C\) being a positive constant that controls the degree of shrinkage of the coefficients. Small values of \(C\) will produce small estimates of the coefficients, while very large values will produce estimates that are close to the maximum likelihood estimates (Hastie, Tibshirani, and Friedman 2009; James et al. 2013). Note that only coefficients \(\{b_k\}_{k=1}^K\) are used in the penalty term.
This is equivalent to minimizing the term
\[ \min \sum_{i=1}^n(y_i - f(x_i))^2 + \lambda (\sum_{k=1}^p b_k^2) \] for a fixed parameter \(\lambda >0\). This parameter can be set using cross-validation (Ruppert, Wand, and Carroll 2003; Hastie, Tibshirani, and Friedman 2009; James et al. 2013).
In a Bayesian framework including the penalty term is equivalent to setting a specific prior on the coefficients of the covariates (see, for example, Fahrmeir and Kneib 2011).
For equidistant knots \(\{(\kappa_0, \ldots, \kappa_K)\}\) the prior is on the differences:
\[ f(\kappa_{i + 1}) - f(\kappa_i) \sim N(0, \tau),\ i = 1,\ldots, K-1 \]
which is equivalent to setting a rw1
prior on \(f(\kappa_i)\).
Alternatively, when the prior is on the second order differences
\[ f(\kappa_{i + 1}) - 2 f(\kappa_i) + f(\kappa_{i-1}) \sim N(0, \tau),\ i=2,\ldots,K \]
This is equivalent to setting a rw2
prior on the coefficients.
Hence, latent effects rw1
and rw2
can be used to include smooth terms on
the covariates. The elements of the latent effect represent the values of the
smooth term.
In the next example, a rw1
latent effect is used. Note that constr
is set
to FALSE
and that, for this reason, the intercept is not included in the
linear predictor.
library("INLA")
inla(logratio ~ -1 + f(range, model = "rw1", constr = FALSE),
m.rw1 <-data = lidar, control.predictor = list(compute = TRUE))
summary(m.rw1)
##
## Call:
## c("inla(formula = logratio ~ -1 + f(range, model = \"rw1\", constr
## = FALSE), ", " data = lidar, control.predictor = list(compute =
## TRUE))" )
## Time used:
## Pre = 0.21, Running = 0.783, Post = 0.0299, Total = 1.02
## Random effects:
## Name Model
## range RW1 model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for the Gaussian observations 166.64 17.20 135.01
## Precision for range 4408.48 1358.58 2287.49
## 0.5quant 0.975quant mode
## Precision for the Gaussian observations 165.90 202.69 164.59
## Precision for range 4230.93 7572.85 3894.04
##
## Expected number of effective parameters(stdev): 27.75(4.67)
## Number of equivalent replicates : 7.96
##
## Marginal log-Likelihood: 251.74
## Posterior marginals for the linear predictor and
## the fitted values are computed
Similarly, a rw2
model can be used to fit a spline to the data:
inla(logratio ~ -1 + f(range, model = "rw2", constr = FALSE),
m.rw2 <-data = lidar, control.predictor = list(compute = TRUE))
summary(m.rw2)
##
## Call:
## c("inla(formula = logratio ~ -1 + f(range, model = \"rw2\", constr
## = FALSE), ", " data = lidar, control.predictor = list(compute =
## TRUE))" )
## Time used:
## Pre = 0.195, Running = 0.764, Post = 0.0297, Total = 0.988
## Random effects:
## Name Model
## range RW2 model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for the Gaussian observations 159.28 15.80 130.07
## Precision for range 165927.62 51964.52 84509.71
## 0.5quant 0.975quant mode
## Precision for the Gaussian observations 158.65 192.22 157.59
## Precision for range 159291.21 286249.23 146560.77
##
## Expected number of effective parameters(stdev): 20.30(1.68)
## Number of equivalent replicates : 10.89
##
## Marginal log-Likelihood: 318.99
## Posterior marginals for the linear predictor and
## the fitted values are computed
Figure 9.5 shows the lidar
dataset together with the
fitted values under a rw1
and rw2
latent effects. Predictions seem very
similar but, as expected, using a rw2
smooth term provides a smoother
function.
Note that this assumes that all the knots are regularly spaced. For the case of
irregularly placed data, it is possible to use function inla.group()
to bin
data into groups according to the values of the covariate. The way in which
values are grouped can be set with the method
argument, that can be either
"cut"
(the default) or "quantile"
. Option "cut"
splits the data using
equal length intervals, while option "quantile"
uses equidistant quantiles in
the probability space.
For example, to split the data into 20 groups using equally spaced quantiles the code is:
$range.grp <- inla.group(lidar$range, n = 20, method = "quantile")
lidarsummary(lidar$range.grp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 394 477 546 555 633 716
The binned data can be see in Figure 9.6. Note that this does not guarantee that the new values of range are equally spaced, but it will help to reduce the number of knots used in the estimation of the smoothing effect.
The newly created variable range.grp
can then be used to define the
smooth term in the call to inla()
inside the f()
function. In the next
models, this is employed to fit rw1
and rw2
models to the data:
inla(logratio ~ -1 + f(range.grp, model = "rw1", constr = FALSE),
m.grp.rw1 <-data = lidar, control.predictor = list(compute = TRUE))
summary(m.grp.rw1)
##
## Call:
## c("inla(formula = logratio ~ -1 + f(range.grp, model = \"rw1\",
## constr = FALSE), ", " data = lidar, control.predictor =
## list(compute = TRUE))" )
## Time used:
## Pre = 0.196, Running = 0.418, Post = 0.0269, Total = 0.641
## Random effects:
## Name Model
## range.grp RW1 model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for the Gaussian observations 155.87 15.30 127.16
## Precision for range.grp 4194.49 1403.65 2041.25
## 0.5quant 0.975quant mode
## Precision for the Gaussian observations 155.49 187.13 155.13
## Precision for range.grp 3999.05 7506.97 3627.23
##
## Expected number of effective parameters(stdev): 16.30(0.976)
## Number of equivalent replicates : 13.56
##
## Marginal log-Likelihood: 238.13
## Posterior marginals for the linear predictor and
## the fitted values are computed
inla(logratio ~ -1 + f(range.grp, model = "rw2", constr = FALSE),
m.grp.rw2 <-data = lidar, control.predictor = list(compute = TRUE))
summary(m.grp.rw2)
##
## Call:
## c("inla(formula = logratio ~ -1 + f(range.grp, model = \"rw2\",
## constr = FALSE), ", " data = lidar, control.predictor =
## list(compute = TRUE))" )
## Time used:
## Pre = 0.197, Running = 0.496, Post = 0.027, Total = 0.72
## Random effects:
## Name Model
## range.grp RW2 model
##
## Model hyperparameters:
## mean sd 0.025quant
## Precision for the Gaussian observations 154.60 15.25 126.37
## Precision for range.grp 170344.21 54708.73 83859.33
## 0.5quant 0.975quant mode
## Precision for the Gaussian observations 154.02 186.34 153.05
## Precision for range.grp 163722.02 296925.58 150611.60
##
## Expected number of effective parameters(stdev): 18.45(0.498)
## Number of equivalent replicates : 11.98
##
## Marginal log-Likelihood: 273.06
## Posterior marginals for the linear predictor and
## the fitted values are computed
Figure 9.7 shows the smooth terms estimated with the previous two models. They give very similar fits. The plot has also included the original data (in black) and the binned data (in gray). As compared to the previous models seen in Figure 9.5, the models fit to the binned data seem to be less smooth than the previous ones.
9.4 Smoothing with SPDE
For the case of irregularly spaced data or knots, it is possible to
build smooth terms using other latent effects in INLA
. In particular,
the spde
latent effect described in Chapter 7 can also
be used in one dimension. In this case, the effect models a Matérn
process in one dimension with correlation matrix
\[ Cov(d_{ij}) = \frac{1}{\tau}\frac{2^{1-\nu}}{\Gamma(\nu)}(\kappa d_{ij})^{\nu}K_{\nu}(\kappa d_{ij}) \]
Here, \(d_{ij}\) is the distance between observations \(i\) and \(j\), \(\nu > 0\) a smoothness parameter, \(\kappa >0\) a scale parameter, \(\Gamma(\cdot)\) is the Gamma function and \(K_{\nu}(\cdot)\) is the modified Bessel function of the second kind.
This is described in detail in Krainski et al. (2018), Chapter 2. Usually, the smoothness parameter is fixed and only the precision and the effective range, which is \(\sqrt{8\nu} / \kappa\), are estimated. We describe now how to fit this model, for which we will rely on some of the contents described in Chapter 7 to define the model. Note that the focus now is in modeling in one dimension instead of in spatial models (that are defined in two dimensions). Furthermore, being familiar with Krainski et al. (2018) will be helpful now.
First of all, a set of knots to define the spde
model is defined
to create a mesh in one dimension and compute the projector matrix:
inla.mesh.1d(seq(390, 720, by = 20))
mesh1d <- inla.spde.make.A(mesh1d, lidar$range) A1 <-
Next, the spde
model is defined.
inla.spde2.matern(mesh1d, constr = FALSE)
spde1 <- inla.spde.make.index("x", n.spde = spde1$n.spde) spde1.idx <-
In order to fit the model and predict the curve at some points, two data stacks (see Section 7.3) need to be created. The first one includes the data to fit the model:
inla.stack(data = list(y = lidar$logratio),
stack <-A = list(1, A1),
effects = list(Intercept = rep(1, nrow(lidar)), spde1.idx),
tag = "est")
The second stack refers to the points at which the function needs to
be estimated. In this case, the points span the values of range
with a
step of 5 units between each two consecutive values:
# Predict at a finer grid
seq(390, 720, by = 5)
xx <- inla.spde.make.A(mesh1d, xx)
A.xx <- inla.stack(data = list(y = NA),
stack.pred <-A = list(1, A.xx),
effects = list(Intercept = rep(1, length(xx)), spde1.idx),
tag = "pred")
Both data stacks must be put together using function inla.stack()
.
inla.stack(stack, stack.pred) joint.stack <-
The model is defined now. The formula does not include an intercept because it has been included within the data stack in order to make prediction easier to compute:
y ~ -1 + f(x, model = spde1) formula <-
Then, the inla()
function is called to fit the model:
inla(formula, data = inla.stack.data(joint.stack),
m.spde <-control.predictor = list(A = inla.stack.A(joint.stack), compute = TRUE)
)
Figure 9.8 shows the smooth term fitted to the data using the
SPDE model. The fitted curve seems very similar to the ones obtained with
rw1
and rw2
models. However, 95% credible intervals seem to be narrower.
This shows that the hypothesis of constant variance does not hold for
this data. Hence, a family that does not assume a constant variance could
also be used to model this data.
9.5 Non-Gaussian models
So far, models for a Gaussian response have been considered. However, the models presented in this chapter can be easily extended to consider the non-Gaussian case. In particular, generalized linear models with smooth terms can be considered. This means that the mean of the non-Gaussian distribution is linked to the linear predictor using an appropriate function. As an example, we will consider an example based on a binomial likelihood using dose-response data.
Dose-response data are a typical example in which the use of a smooth term to explain the relationship between the dose and the number of events is required. Hence, non-parametric methods may be useful to learn the shape of the function that governs this relationship.
The H.virescens
dataset (Venables and Ripley 2002) in the drc
package
(Ritz et al. 2015) contains data from a dose-response experiment in which moths of the
tobacco budworm (Heliothis virescens) were exposed for three days to
different doses of the pyrethroid trans-cypermethrin (an insecticide). Table
9.2 describes the variables in the dataset.
Variable | Description |
---|---|
dose |
Dose values (in \(\mu\)g). |
numdead |
Number of dead or knock-down moths. |
total |
Total number of moths. |
sex |
Sex (M for male, F for female). |
Data can be loaded and summarized as follows:
library("drc")
data(H.virescens)
levels(H.virescens$sex) <- c("Female", "Male")
summary(H.virescens)
## dose numdead total sex
## Min. : 1.0 Min. : 0.00 Min. :20 Female:6
## 1st Qu.: 2.0 1st Qu.: 3.50 1st Qu.:20 Male :6
## Median : 6.0 Median : 9.50 Median :20
## Mean :10.5 Mean : 9.25 Mean :20
## 3rd Qu.:16.0 3rd Qu.:13.75 3rd Qu.:20
## Max. :32.0 Max. :20.00 Max. :20
Dose-response data are typically non-linear because of the effect of the
drug administered to the subjects in the study. Furthermore, this data
are typically analyzed using a binomial distribution. In the next model,
sex
is included as control variable and sex
is modeled using a
rw1
latent effect:
inla(numdead ~ -1 + f(dose, model = "rw1", constr = FALSE) + sex,
vir.rw1 <-data = H.virescens, family ="binomial", Ntrial = total,
control.predictor = list(compute = TRUE))
summary(vir.rw1)
##
## Call:
## c("inla(formula = numdead ~ -1 + f(dose, model = \"rw1\", constr =
## FALSE) + ", " sex, family = \"binomial\", data = H.virescens,
## Ntrials = total, ", " control.predictor = list(compute = TRUE))")
## Time used:
## Pre = 0.224, Running = 0.0778, Post = 0.0206, Total = 0.322
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## sexFemale -0.532 22.36 -44.43 -0.532 43.34 -0.532 0
## sexMale 0.532 22.36 -43.37 0.531 44.40 0.532 0
##
## Random effects:
## Name Model
## dose RW1 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for dose 4.07 3.07 0.645 3.27 12.09 1.96
##
## Expected number of effective parameters(stdev): 5.75(0.447)
## Number of equivalent replicates : 2.09
##
## Marginal log-Likelihood: -36.49
## Posterior marginals for the linear predictor and
## the fitted values are computed
Another model using a rw2
model is fit:
inla(numdead ~ -1 + f(dose, model = "rw2", constr = FALSE) + sex,
vir.rw2 <-data = H.virescens, family = "binomial", Ntrial = total,
control.predictor = list(compute = TRUE))
summary(vir.rw2)
##
## Call:
## c("inla(formula = numdead ~ -1 + f(dose, model = \"rw2\", constr =
## FALSE) + ", " sex, family = \"binomial\", data = H.virescens,
## Ntrials = total, ", " control.predictor = list(compute = TRUE))")
## Time used:
## Pre = 0.217, Running = 0.0725, Post = 0.0216, Total = 0.311
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## sexFemale -0.527 22.36 -44.43 -0.528 43.34 -0.527 0
## sexMale 0.527 22.36 -43.38 0.527 44.39 0.527 0
##
## Random effects:
## Name Model
## dose RW2 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for dose 3608.70 8982.06 39.97 783.39 28970.09 81.64
##
## Expected number of effective parameters(stdev): 4.19(0.646)
## Number of equivalent replicates : 2.87
##
## Marginal log-Likelihood: -25.38
## Posterior marginals for the linear predictor and
## the fitted values are computed
The models fit to the data are displayed in Figure 9.9. Both models provide very similar estimates of the dose-response effect. Note that the fitted values are shown in Figure 9.10.
9.6 Final remarks
This chapter has covered different models for including smooth terms on the covariates in the linear predictor. Smoothing in two dimensions is described in Chapter 7, and some space-time models are also covered in Chapter 8. Wang, Ryan, and Faraway (2018) describe in detail other types of smoothing functions not covered here, such as thin plate splines.
INLA
is flexible enough regarding the types of latent effects that can be
included. Other types of smooth functions can be implemented using the
different frameworks to extend the latent effects implemented in INLA
described in Chapter 11.
References
Fahrmeir, Ludwig, and Thomas Kneib. 2011. Bayesian Smoothing Regression for Longitudinal, Spatial and Event History Data. New York: Oxford University Press.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning. 2nd ed. New York: Springer.
Holst, Ulla, Ola Hössjer, Claes Björklund, Pär Ragnarson, and Hans Edner. 1996. “Locally Weighted Least Squares Kernel Regression and Statistical Evaluation of LIDAR Measurements.” Environmetrics 7 (4): 401–16. https://doi.org/10.1002/(SICI)1099-095X(199607)7:4<401::AID-ENV221>3.0.CO;2-D.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. New York: Springer.
Krainski, Elias T., Virgilio Gómez-Rubio, Haakon Bakka, Amanda Lenzi, Daniela Castro-Camilo, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2018. Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA. Boca Raton, FL: Chapman & Hall/CRC.
R Core Team. 2019a. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ritz, C., F. Baty, J. C. Streibig, and D. Gerhard. 2015. “Dose-Response Analysis Using R.” PLOS ONE 10 (e0146021, 12). http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146021.
Ruppert, David, M. P. Wand, and R. J. Carroll. 2003. Semiparametric Regression. New York: Cambridge University Press.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer.
Wand, Matt. 2018. SemiPar: Semiparametic Regression. https://CRAN.R-project.org/package=SemiPar.
Wang, Xiaofeng, Yu Yue Ryan, and Julian J. Faraway. 2018. Bayesian Regression Modeling with Inla. Boca Raton, FL: Chapman & Hall/CRC.