In this example we demonstrate how to set correlation structures for latent variables and random effects in gllvm. The species abundances are often correlated in time or spatially in community ecology. In gllvm, such correlation structures can now be set either for latent variables or comunity level random row effects.
Denote the abundance of the \(j\)th species (\(j\in \{1,\dots, p\}\)) at the \(i\)th site (\(i\in \{1,\dots, n\}\)) as \(y_{ij}\). A set of k environmental variables, or experimental treatments, may also be recorded at each site and stored in the vector \(\boldsymbol{x}_i = (x_{i1}, ..., x_{ik})\). A common form for the GLLVM is then defined for the mean abundance \(\mu_{ij}\):
\[g(\mu_{ij}) = \eta_{ij} = r_{s_a(i)} + \beta_{0j} + \boldsymbol{x}_i'\boldsymbol{\beta}_j + \boldsymbol{u}_{s_u(i)}'\boldsymbol{\theta}_j,\]
Now we can specify a correlation structure for latent variables \(\boldsymbol{u}_{s_u(i)} = ({u}_{1,s_u(i)}, \dots,
{u}_{d,s_u(i)})'\) (\(d=\)
number of latent variables, defined by num.lv
in
gllvm) and/or community level row effects \(r_{s_r(i)}\). The subscripts \(s_u(i)\) defines the structure of the
latent variables. For instance, assume that we have a hierarchical
sampling design with \(n_s<n\) sites
and sites are sampled \(n_t\) times in
consecutive years. Thus we could want to include a site level latent
variables such that \(s_u(i) = k\) if
sampling unit \(i\) is from site \(k\) (\(k\in
\{1,\dots, n_s\}\)). Similarly, we can set a structure for the
row effects defining subscript \(s_r(i)\), eg. for time \(s_r(i) = t\) if sampling unit \(i\) is from sampling year \(t\) (\(t\in
\{1,\dots, n_t\}\)). Then we can define the correlation
structures for latent variables and/or row effects: denote \(\boldsymbol{u}_{q.} = (u_{q1}, ...,
u_{qn_s})\) \[
{u}_{q.} \sim N(\boldsymbol{0}, \Sigma_q), \, q=1,\dots, d.
\] The covariance matrix \(\Sigma_q\) defines the correlation
structure and has unit variances on diagonal. The options are identity
(independent), AR(1) correlation, exponentially decaying correlation
structure and Matern correlation. Currently we can only use the same
structure for all latent variables \(q=1,\dots,d\), but the parameters defining
the strength of the correlation is estimated uniquely for each latent
variable.
Similarly, we can define the correlation structure for random row effect, denote \(\boldsymbol{r} = (r_{1}, ..., r_{n_t})\), and set \[ {r} \sim N(\boldsymbol{0}, \Sigma_r), \] where the covariance matrix \(\Sigma_r\) is a \(n_t \times n_t\) matrix and defines the correlation structure for random row effect. For row effects we have the same options as we have for latent variables. The structure for the row effects can be different from the structure of the latent variables and this is illustrated in the case study below.
As an example, we use time series of sessile algal and invertebrate species from permanent transects (Arkema et al. 2009, Reed and Miller (2023)). Data consists of percent cover of 132 species from 11 sites which each have 2-8 transects, yearly recorded from 2000 to 2021. It also includes measurements of the environment; the number of stripes of giant kelp and percent rock. In total, data has 836 sampling units.
Species percent cover matrix is in Yabund
, for the
analyses we consider only sessile invertebrates and include species
observed at least 10 times. The rarest species are modeled as a sum.
## AB AL AMZO ANSP AR ARUD AS ATM AU BA BAEL BCAL BF BLD BN
## 1 0 0 0 0.0125 0 0 0.1125 0 0.0125 0.0000 0 0 0.0000 0 0.0125
## 2 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## 3 0 0 0 0.0000 0 0 0.0750 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 4 0 0 0 0.0000 0 0 0.0125 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## 5 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 6 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 7 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 8 0 0 0 0.0000 0 0 0.0000 0 0.0000 0.0000 0 0 0.0000 0 0.0000
## 9 0 0 0 0.0000 0 0 0.0500 0 0.0000 0.0125 0 0 0.0000 0 0.0000
## 10 0 0 0 0.0000 0 0 0.0250 0 0.0000 0.0000 0 0 0.0125 0 0.0000
## BO BOW BPSE BR BRA
## 1 0 0 0 0.0250 0.0375
## 2 0 0 0 0.0250 0.0125
## 3 0 0 0 0.0375 0.0000
## 4 0 0 0 0.0000 0.0000
## 5 0 0 0 0.0125 0.0000
## 6 0 0 0 0.0000 0.0125
## 7 0 0 0 0.0000 0.0000
## 8 0 0 0 0.0000 0.0000
## 9 0 0 0 0.1000 0.0000
## 10 0 0 0 0.1000 0.0875
Yinvert <- Yabund[SPinfo$GROUP == "INVERT"]
Yinvert10 = Yinvert[,colSums(Yinvert>0, na.rm = TRUE)>9]
# Sum species that were observed less than 9 times,
Yinvert10$Sum_inv = rowSums(Yinvert[,(colSums(Yinvert>0, na.rm = TRUE)<=9)], na.rm = TRUE)
There is information about the species:
## SP_CODE GROUP COMMON_NAME SCIENTIFIC_NAME TAXON_KINGDOM
## 1 AB INVERT Coarse Sea Fir Hydroid Abietinaria spp. Animalia
## 2 AL INVERT Aggregating cup Coral Astrangia haimei Animalia
## 3 AMZO ALGAE <NA> Amphiroa beauvoisii Plantae
## 4 ANSP INVERT Aggregating anemone Anthopleura spp. Animalia
## 5 AR INVERT Sand Tunicate Eudistoma psammion Animalia
## 6 ARUD INVERT Octocoral Discophyton rudyi Animalia
## TAXON_PHYLUM TAXON_CLASS TAXON_ORDER TAXON_FAMILY TAXON_GENUS
## 1 Cnidaria Hydrozoa Leptothecata Sertulariidae Abietinaria
## 2 Cnidaria Anthozoa Scleractinia Rhizangiidae Astrangia
## 3 Rhodophyta Florideophyceae Corallinales Lithophyllaceae Amphiroa
## 4 Cnidaria Anthozoa Actiniaria Actiniidae Anthopleura
## 5 Chordata Ascidiacea Aplousobranchia Polycitoridae Eudistoma
## 6 Cnidaria Anthozoa Alcyonacea Alcyoniidae Discophyton
SPinfo10 = SPinfo[SPinfo$SP_CODE %in% colnames(Yinvert10),]
SPinfo10 = rbind(SPinfo10,c("Sum_inv","INVERT", rep(NA, ncol(SPinfo10)-2)))
And finally, information about the study design (site name, year, transect index) and environmental coefficients (average number of kelp fronds, percent rock). The distribution of kelp fronds is very skewed to the right so a log of the number of kelp fronds is used as a covariate instead. In addition, covariates are scaled.
## SITE YEAR TRANSECT KELP_FRONDS PERCENT_ROCKY
## 1 ABUR 2001 1 13.8750 35.0
## 2 ABUR 2001 2 0.0000 37.5
## 3 ABUR 2002 1 8.5125 10.0
## 4 ABUR 2002 2 0.3875 2.5
## 5 ABUR 2003 1 0.6000 0.0
## 6 ABUR 2003 2 0.0000 0.0
par(mfrow=c(1,2))
hist(Xenv$KELP_FRONDS, main = "KELP FRONDS")
Xenv$logKELP_FRONDS = log(Xenv$KELP_FRONDS+1)
Xenv$logKELP_FRONDSsc = scale(Xenv$logKELP_FRONDS)
Xenv$PERCENT_ROCKYsc = scale(Xenv$PERCENT_ROCKY)
hist(Xenv$logKELP_FRONDSsc, main = "log of KELP FRONDS")
For modeling percent cover data, has three options, beta for values (0,1), ordered beta for [0,1] and beta hurdle model for [0,1). Here we use beta-hurdle response model.