The puspose of this note is to make a connection between R functions and the mathematical notation of linear and (non)linear mixed models. One goal is to produce a table that maps R functions with notation for these models.
This document is a work in progress.
A linear model can be expressed as
\[ y = X \beta + \epsilon \]
Typically, we assume that the errors are normally distributed, independent and they have constant variance.
\[ \epsilon \sim N(0, \sigma^2) \]
Note: The covariance of the errors \(Cov(\mathbf{\epsilon})\) where \(\mathbf{\epsilon}\) is an \(n \times 1\) vector is \(\sigma^2 \mathbf{I}_n\).
In R we can fit the following model
In order to extract the estimate for \(\beta\), this is \(\hat{\beta}\), we can use the function coef. In order to extract the estiamte for \(\sigma\) (again, strictly \(\hat{\sigma}\)), we can use the function sigma. In order to extract the covariance of \(\hat{\beta}\), this is \(Cov(\hat{\beta})\), we can use the function vcov.
The covariance for \(\hat{\beta}\) is defined as
\[ Cov(\hat{\beta}) = \hat{\sigma}^2 (\mathbf{X'X})^{-1} \]
Obtaining this matrix is important for simulation as it provides information on the variances for the parameters in the model as well as the covariances.
## (Intercept) x
## 0.8059534 2.1388860
## [1] 0.4212005
## (Intercept) x
## (Intercept) 0.009551574 0.002697391
## x 0.002697391 0.010682887
There are other functions in R which are also useful. The function fitted extracts the fitted values or \(\hat{y} = \mathbf{X}\hat{\beta}\), model.matrix extracts \(\mathbf{X}\) and residuals extracts \(\hat{e}\) (or resid).
In addition, in the nlraa package I have the function ‘var_cov’ which extracts the complete variance of the residuals. In this case it is a simple diagonal matrix multiplied by \(\hat{\sigma}^2\).
## Extract the variance covariance of the residuals (which is also of the response)
lm.vc <- var_cov(fit)
## Print just the first 5 observations
round(lm.vc[1:5,1:5],2)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.18 0.00 0.00 0.00 0.00
## [2,] 0.00 0.18 0.00 0.00 0.00
## [3,] 0.00 0.00 0.18 0.00 0.00
## [4,] 0.00 0.00 0.00 0.18 0.00
## [5,] 0.00 0.00 0.00 0.00 0.18
## Visualize the matrix. This is a 20 x 20 matrix.
## It is easier to visualize in the log-scale
## Note: log(0) becomes -Inf, but not shown here
image(log(lm.vc[,ncol(lm.vc):1]))
In the previous image, each orange square represents the estimate of the residual variance (\(\sigma^2\)) and there are 20 squares because there are 20 observations.
It might seem silly for a model such as this one to extract a matrix which is simlpy a diagonal identity matrix multiplied by the scalar \(\hat{\sigma}^2\), but this will become more clear as we move into more complex models.
So far,
r.function | beta | cov.beta | sigma | X | y.hat | e | cov.e |
---|---|---|---|---|---|---|---|
lm | coef | vcov | sigma | model.matrix | fitted | residuals | var_cov |
Note: The models that can be fitted using gls should not be confused with the ones fitted using glm, which are generalized linear models.
The function gls in the nlme package fits linear models, but in which the covariance matrix of the residuals (also called errors) is more flexible.
The model is still
\[ y = X \beta + \epsilon \]
But now the errors have a more flexible covariance structure.
\[ \epsilon \sim N(0, \Sigma) \]
How do we extract \(\Sigma\) from a fitted model?
We will again use the function var_cov (there is a function in nlme called getVarCov, but there are many cases which it does not handle).
For the next example I will use the ChickWeight dataset.
## ChickWeight example
data(ChickWeight)
ggplot(data = ChickWeight, aes(Time, weight)) + geom_point()
Clearly, the variance increases as the weight increases and it would be a good approach to consider this in the modeling process. The code below fits the variance of the residuals (or the response) as a function of the fitted values. We could choose and evaluate different variance functions and determine which one is better, but for the purpose of illustrating the connection between R functions and mathematical notation this is enough.
## One possible model is
fit.gls <- gls(weight ~ Time, data = ChickWeight,
weights = varPower())
fit.gls.vc <- var_cov(fit.gls)
## Note: the function getVarCov fails for this object
## Visualize the first few observations
fit.gls.vc[1:10,1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 3.530572 0.00000 0.00000 0.0000 0.0000 0.0000 0.000 0.000
## [2,] 0.000000 15.12026 0.00000 0.0000 0.0000 0.0000 0.000 0.000
## [3,] 0.000000 0.00000 47.53714 0.0000 0.0000 0.0000 0.000 0.000
## [4,] 0.000000 0.00000 0.00000 122.3036 0.0000 0.0000 0.000 0.000
## [5,] 0.000000 0.00000 0.00000 0.0000 273.3773 0.0000 0.000 0.000
## [6,] 0.000000 0.00000 0.00000 0.0000 0.0000 550.6203 0.000 0.000
## [7,] 0.000000 0.00000 0.00000 0.0000 0.0000 0.0000 1023.508 0.000
## [8,] 0.000000 0.00000 0.00000 0.0000 0.0000 0.0000 0.000 1785.058
## [9,] 0.000000 0.00000 0.00000 0.0000 0.0000 0.0000 0.000 0.000
## [10,] 0.000000 0.00000 0.00000 0.0000 0.0000 0.0000 0.000 0.000
## [,9] [,10]
## [1,] 0.000 0.000
## [2,] 0.000 0.000
## [3,] 0.000 0.000
## [4,] 0.000 0.000
## [5,] 0.000 0.000
## [6,] 0.000 0.000
## [7,] 0.000 0.000
## [8,] 0.000 0.000
## [9,] 2955.968 0.000
## [10,] 0.000 4688.938
## Store for visualization
## only the first 12 observations
vc2 <- fit.gls.vc[1:12,1:12]
round(vc2,0)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 4 0 0 0 0 0 0 0 0 0 0 0
## [2,] 0 15 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 48 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 122 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 273 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 551 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 1024 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 1785 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 2956 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0 4689 0 0
## [11,] 0 0 0 0 0 0 0 0 0 0 7173 0
## [12,] 0 0 0 0 0 0 0 0 0 0 0 8767
## The variance increases as weight increases
## Visualize the variance-covariance of the errors
## This is only for the first 12 observations
## It is easier to visualize in the log scale
## Note: log(0) becomes -Inf, but not shown here
image(log(vc2[,ncol(vc2):1]),
main = "First 12 observations, Cov(resid)")
## For all observations
image(log(fit.gls.vc[,ncol(fit.gls.vc):1]),
main = "All observations, Cov(resid)")
Since not only the variance is increasing, but also the data comes from different chick and is thus correlated, we could fit the following model.
## Adding the correlation
fit.gls2 <- gls(weight ~ Time, data = ChickWeight,
weights = varPower(),
correlation = corCAR1(form = ~ Time | Chick))
## Extract the variance-covariance of the residuals
## Note: getVarCov fails for this object
fit.gls2.vc <- var_cov(fit.gls2)
## Visualize the first few observations
round(fit.gls2.vc[1:13,1:13],0)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 89 129 178 236 302 377 462 555 658 770 890 954 0
## [2,] 129 194 267 353 452 565 692 832 985 1152 1333 1428 0
## [3,] 178 267 379 501 642 803 982 1181 1400 1637 1893 2029 0
## [4,] 236 353 501 684 877 1096 1341 1613 1911 2235 2584 2769 0
## [5,] 302 452 642 877 1160 1449 1774 2133 2527 2956 3418 3663 0
## [6,] 377 565 803 1096 1449 1869 2287 2750 3259 3811 4408 4723 0
## [7,] 462 692 982 1341 1774 2287 2888 3473 4115 4813 5566 5964 0
## [8,] 555 832 1181 1613 2133 2750 3473 4310 5106 5972 6907 7400 0
## [9,] 658 985 1400 1911 2527 3259 4115 5106 6241 7300 8443 9046 0
## [10,] 770 1152 1637 2235 2956 3811 4813 5972 7300 8809 10188 10916 0
## [11,] 890 1333 1893 2584 3418 4408 5566 6907 8443 10188 12158 13026 0
## [12,] 954 1428 2029 2769 3663 4723 5964 7400 9046 10916 13026 14177 0
## [13,] 0 0 0 0 0 0 0 0 0 0 0 0 89
## Visualize the variance-covariance of the errors
## On the log-scale
## Reorder and select
vc2.36 <- fit.gls2.vc[1:36,1:36]
image(log(vc2.36[,ncol(vc2.36):1]),
main = "Covariance matrix of residuals \n for the first three Chicks (log-scale)")
Let’s plot the data, by Chick. this shows that there is a high temporal correlation as we would expect. Chicks which have a higher weight (than others) at given time, tend to still be higher at a later time.