The purpose of this vignette is to provide users with a step-by-step guide for performing and interpreting the results of a time varying mediation analysis with 2 treatment (exposure) groups and a continuous outcome. Please note, that this package has been built considering the structure of panel data, where each subject/participant has repeated responses (collected over time) for the outcome and mediator variables. However, we do not address dynamic treatment regimens in this package. Therefore, we assume the scenario where the treatment (exposure) is time-invariant (i.e., does not change over time). For more details, please refer to Cai et al., 2022.
For illustration, we rely on an example dataset, simulated based on
the Wisconsin Smokers’ Health Study 2 data (Baker et. al., 2016) which
includes 1086 individuals assigned to one of three treatment conditions.
One-third of participants received only a nicotine patch
;
another one-third received varenicline
, and the final third
of participants received a
combination nicotine replacement therapy (NRT)
which
included nicotine patch + nicotine mini-lozenge. The outcome of interest
is cessation fatigue
; that is, how tired a participant felt
of trying to quit smoking (7-point Likert scale). In addition, mediator
variables were measured by asking participants if they felt a
negative mood in the last fifteen minutes
, and whether they
wanted to smoke in the last 15 minutes
, also recorded on a
7-point Likert scale. Both the outcome and mediator variables were
assessed two times per day for the first two weeks post-quit day
(rendering 30 time points of response since assessments start on day 0
post target quit day), and every other day (2x per day) for weeks 3 and
4 (rendering 14 time points of response).
A traditional approach to analyzing this type of data would be to use mediation analysis in which the effects are assumed to not vary as a function of time. First, a single (i.e., time-invariant) direct effect would be calculated by regressing the outcome on the treatment condition and mediator. Next, a time-invariant indirect effect would be computed by multiplying the effect of treatment condition on the mediator by the effect of the mediator on the outcome. However, this method potentially misses important information about the dynamic effect that a mediator may have over time. Specifically, we hypothesize that mood changes across and within days and thus, its mediating effect on one’s success of quitting smoking is likely to vary over time. We therefore propose a time varying mediation analysis which estimates the mediation effect as a function that varies over time.
The tvma
function was developed for estimating the
mediation effect of two treatment (exposure) groups. To illustrate the
analysis, we have considered the scenario of varenicline
vs. not varenicline
i.e., the subjects taking
nicotine patch
or combination NRT
were
combined into a single group.
To use the time varying mediation analysis package in R, you must
first install the package and load it. Before that, make sure you have
R version 4.0.3
or greater. There are two ways to install
the package from the CRAN (Comprehensive R Archive Network) repository,
by using install.packages
or the devtools
function.
install.packages("tvmediation", dependencies = TRUE)
library(tvmediation)
The equivalent code using devtools
is:
::install_cran("tvmediation", dependencies = TRUE)
devtoolslibrary(tvmediation)
If you do not have devtools
installed and loaded, you
can do so using the following code:
install.packages("devtools", dependencies = TRUE)
library(devtools)
Alternatively, if you want to install the package directly from the GitHub repository to access new or revised functions in development, use the following code:
::install_github("dcoffman/tvmediation", dependencies = TRUE)
devtoolslibrary(tvmediation)
tvma
functionOnce installed, you can type ?tvmediation
in the console
to view the package documentation, as well as links to the important
functions and data included in the package. The time-varying mediation
analysis for continuous outcomes and two exposure groups, relies on two
user functions, tvma
and LongToWide
, as well
as a number of internal functions of the tvmediation
package.
The tvma
function has four required and five optional
arguments.
treatment
A binary vector indicating the treatment
groupt.seq
A numeric vector of the time sequence of the
measuresmediator
The matrix of mediator values in wide
formatoutcome
The matrix of outcome values in wide
formatThe optional arguments are:
t.est
The time sequence for estimation. This is by
default equal to t.seq
.plot
TRUE or FALSE for plotting the mediation effect.
The default value is “FALSE”.CI
“none” or “boot” method of deriving confidence
intervals (CIs). The default value is “boot”.replicates
Number of replicates for bootstrapping CIs.
The default value is 1000.verbose
TRUE or FALSE for printing results to screen.
The default value is “FALSE”.The dataset we will use for our illustration is named
smoker
and is included in the package.
To load the simulated dataset smoker.rda
, type:
data(smoker)
As discussed earlier, the tvma
function only supports
two treatment groups. A separate function tvma_3trt
for
three treatment groups is also available.
The smoker
data is organized in long
format with SubjectID
repeating over multiple rows
for each participant. The tvma
function requires that the
data be in wide format to estimate the time varying
coefficients. The tvmediation
package includes a useful
function LongToWide
to help users properly format their
data for analysis.
LongToWide
has three required arguments and a fourth
optional argument.
subject.id
specifies the column of subject
identifiers.time.sequences
specifies the column of measurement
times.outcome
specifies the column for the variable (either
the outcome or the mediator) that will be transposed.verbose
is an optional argument that if “TRUE” prints
the output of LongToWide
to the console. The default value
is “FALSE”.The output of LongToWide
is a matrix of data in wide
format where columns represent the subjects and rows represent the time
sequence. Thus, each cell contains the j-th subject’s response at the
i-th time point.
The tvma
function requires two matrices, one for the
mediator, and one for the outcome. Thus, we use the
LongToWide
function twice as illustrated below:
<- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$NegMoodLst15min)
mediator <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$cessFatig) outcome
class(mediator)
## [1] "matrix" "array"
1:16, 1:10] mediator[
## 1 2 3 4 5 6 7 8 9 10
## 0 7 1 1 1 1 NA 4 1 1 3
## 0.5 2 1 1 6 1 NA NA 1 1 1
## 1 1 1 2 NA NA 1 3 1 1 1
## 1.5 1 1 1 1 1 1 2 1 NA 1
## 2 1 1 1 1 NA 1 1 1 1 1
## 2.5 3 1 1 1 1 3 1 1 1 1
## 3 1 6 NA 1 1 1 1 2 NA 1
## 3.5 1 1 2 6 3 7 1 6 1 1
## 4 1 1 1 1 1 1 1 1 2 4
## 4.5 1 1 1 5 1 5 2 NA 1 3
## 5 4 1 2 1 NA 5 2 1 1 1
## 5.5 1 1 2 1 1 2 NA 1 1 2
## 6 2 1 1 4 1 7 1 NA 2 3
## 6.5 1 1 1 1 1 1 1 5 3 1
## 7 1 2 1 NA 1 1 NA 1 1 4
## 7.5 2 2 1 1 1 NA 1 1 1 6
class(outcome)
## [1] "matrix" "array"
1:16, 1:10] outcome[
## 1 2 3 4 5 6 7 8 9 10
## 0 1 6 1 2 1 NA 1 6 7 3
## 0.5 1 3 1 1 1 NA NA 1 7 1
## 1 1 1 1 NA NA 1 1 7 7 1
## 1.5 6 7 6 1 1 6 6 4 NA 7
## 2 7 7 1 1 NA 1 7 1 1 1
## 2.5 1 2 1 7 1 6 1 2 1 7
## 3 4 1 NA 1 7 2 1 4 NA 5
## 3.5 5 3 7 7 6 1 1 7 5 1
## 4 6 3 5 4 7 2 7 5 7 5
## 4.5 7 7 1 7 7 1 7 NA 4 1
## 5 7 7 1 1 NA 1 1 7 1 1
## 5.5 6 5 1 1 1 3 NA 5 7 1
## 6 1 1 1 7 5 1 7 NA 2 1
## 6.5 5 7 5 7 7 1 1 1 7 7
## 7 1 1 1 NA 7 1 NA 1 7 7
## 7.5 5 1 2 1 1 NA 1 1 1 7
If your data are already in wide format, there is no need to use the
LongToWide
function and you can simply subset your dataset.
However, mediator
and outcome
must be of class
matrix
; hence make sure you convert the class of the
subsetted mediator
and outcome
objects to
matrix
before proceeding. This can be done using the R
function as.matrix
.
The tvma
function requires two more variables that we
have not yet created:
treatment
A binary numeric vector indicating the
treatment groupt.seq
A numeric vector of the time sequence of the
measuresWe can create these variables as follows. Because
treatment
is time-invariant, we need only one instance of
each subject’s response for varenicline
. We then convert it
to a numeric value and subtract 1 to yield a vector of zeros and ones,
as shown below.
# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving varenicline. The data is still in dataframe format.
<- unique(smoker[ , c("SubjectID","varenicline")])
trt1 1:10,] trt1[
## SubjectID varenicline
## 1 1 No
## 2 2 Yes
## 3 3 Yes
## 4 4 No
## 5 5 Yes
## 6 6 Yes
## 7 7 Yes
## 8 8 Yes
## 9 9 Yes
## 10 10 Yes
# Step 2: `2` is assigned to those subjects who received varenicline and `1` to the rest. The data is now in vector format.
<- as.numeric(trt1[ , 2])
trt2 1:10] trt2[
## [1] 2 1 1 2 1 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses to procure a vector of zeros and ones
<- trt2 -1
treatment 1:10] treatment[
## [1] 1 0 0 1 0 0 0 0 0 0
This steps can alternatively be collated into a single step and written as follows:
<- as.numeric(unique(smoker[ , c("SubjectID","varenicline")])[, 2])-1
treatment 1:10] treatment[
## [1] 1 0 0 1 0 0 0 0 0 0
As discussed earlier, the goal in this example is estimating the time
varying effect of varenicline
compared to
not varenicline
on cessation fatigue
, as
mediated via negative mood in the last fifteen minutes
. We
could also look at the effect of varenicline
vs. nicotine patch only
or combination NRT
vs. varenicline
with an additional step of excluding the
subjects belonging to the treatment group that is not of interest, and
then following the steps described above. Please refer to the vignette
for the tvmb
function for an illustration of the steps
involved when considering varenicline
vs. nicotine patch only
or combination NRT
vs. varenicline
or combination NRT
vs. nicotine patch only
.
To generate t.seq
we found the unique instance of each
time point and then sorted from smallest to largest. There are 44 unique
time points in the dataset where 0
after the decimal
indicates the morning measurement and 5
after the decimal
indicates the evening measurement recorded for that day.
<- sort(unique(smoker$timeseq))
t.seq t.seq
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## [16] 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
## [31] 16.0 16.5 18.0 18.5 20.0 20.5 22.0 22.5 24.0 24.5 26.0 26.5 28.0 28.5
We are now ready to perform the time varying mediation analysis.
tvma
functionAs discussed earlier, the tvma
function has four
required and five optional arguments.
treatment
A binary vector indicating the treatment
groupt.seq
A numeric vector of the time sequence of the
measuresmediator
The matrix of mediator values in wide
formatoutcome
The matrix of outcome values in wide
formatThe optional arguments are:
t.est
The time sequence for estimation. This is by
default equal to t.seq
.plot
TRUE or FALSE for plotting the mediation effect.
The default value is “FALSE”.CI
“none” or “boot” method of deriving CIs. The default
value is “boot”.replicates
Number of replicates for bootstrapping CIs.
The default value is 1000.verbose
TRUE or FALSE for printing results to screen.
The default value is “FALSE”.We will call the function with the optional arguments
plot=TRUE
and replicates = 250
. We decreased
the number of bootstrap replicates so that this vignette compiles faster
but we suggest using at least 500 replicates in an actual analysis. The
remaining optional arguments are left to their respective default
values.
<- tvma(treatment, t.seq, mediator, outcome,
results plot = TRUE, replicates = 250)
The tvma
function returns a list of results that
include:
hat.alpha
estimated time-varying treatment effect on
the mediatorCI.lower.alpha
CI lower limit for
hat.alpha
CI.upper.alpha
CI upper limit for
hat.alpha
hat.gamma
estimated time-varying direct effect of the
treatment on the outcomeCI.lower.gamma
CI lower limit for
hat.gamma
CI.upper.gamma
CI upper limit for
hat.gamma
hat.beta
estimated time-varying effect of the mediator
on the outcomeCI.lower.beta
CI lower limit for
hat.beta
CI.upper.beta
CI upper limit for
hat.beta
hat.tau
estimated time-varying total treatment effect
on the outcomeCI.lower.tau
CI lower limit for
hat.tau
CI.upper.tau
CI upper limit for
hat.tau
est.M
the time-varying mediation effect of treatment on
the outcomeOptional returns based on argument CI = "boot"
include:
boot.se.m
estimated standard error for the time-varying
mediation effect, est.M
CI.lower
CI lower limit for the time-varying mediation
effect, est.M
CI.upper
CI upper limit for the time-varying mediation
effect, est.M
The above estimates are compiled in a single dataframe which can be
accessed using nameOfStoredResultsObj$Estimates
. For
example, the following line of code displays the estimates for the first
6 time-points.
head(results$Estimates)
## t.est hat.alpha CI.lower.alpha CI.upper.alpha hat.gamma CI.lower.gamma
## 1 0.0 -0.5750034 -0.8311188 -0.2301463 0.3803577 0.01533095
## 2 0.5 -0.5819098 -0.7778734 -0.3247245 0.4592414 0.19035349
## 3 1.0 -0.5880155 -0.7368865 -0.4147780 0.5237820 0.34849485
## 4 1.5 -0.5932360 -0.6993930 -0.4665377 0.5710753 0.43064433
## 5 2.0 -0.5976067 -0.6943369 -0.4981212 0.6003423 0.48483612
## 6 2.5 -0.6012658 -0.6787651 -0.5235710 0.6125481 0.51131341
## CI.upper.gamma hat.beta CI.lower.beta CI.upper.beta hat.tau CI.lower.tau
## 1 0.7054048 0.2880372 0.1696814 0.3728144 -0.17768293 -0.59720557
## 2 0.7240859 0.3033509 0.2336375 0.3667325 -0.10302673 -0.37840782
## 3 0.7051765 0.3128918 0.2636811 0.3646710 -0.04859090 -0.23784019
## 4 0.7230081 0.3192837 0.2778582 0.3633242 -0.01368227 -0.14998674
## 5 0.7281596 0.3243138 0.2907910 0.3625444 0.00357346 -0.11639059
## 6 0.7308975 0.3291172 0.2956149 0.3649070 0.00584576 -0.09965062
## CI.upper.tau est.M boot.se.m CI.lower CI.upper
## 1 0.1159318 -0.1656224 0.04352769 -0.2533944 -0.07832414
## 2 0.1359095 -0.1765229 0.03326558 -0.2425085 -0.10510537
## 3 0.1413164 -0.1839852 0.02562934 -0.2375850 -0.13141895
## 4 0.1637977 -0.1894106 0.02016548 -0.2335141 -0.15143941
## 5 0.1511959 -0.1938121 0.01662509 -0.2234565 -0.16143810
## 6 0.1432878 -0.1978869 0.01459388 -0.2299856 -0.16978784
At each time point of interest, t.est
, which in this
case is equal to t.seq
, the time-varying effects of the
treatment on the mediator, the treatment on the outcome (adjusted and
not adjusted for mediator), and the mediator on the outcome are
estimated along with the respective 95% CIs. The CIs are computed via a
non-parametric bootstrap method (Efron and Tibshirani, 1986), drawing
samples of size 1086 from the original sample with replacement,
estimating the sample mean and then applying the percentile method to
compute the 95% CIs. Note that the CIs for the alpha
,
gamma
, beta
and tau
coefficients
(hat.alpha, hat.gamma, hat.beta, hat.tau)
are computed
regardless of the value of CI
argument in the function.
est.M
is the estimated mediation effect of
varenicline
compared to the other two treatment groups,
that varies over t.est
. For CI = "boot"
(which
is the default option unless the user chooses otherwise) the standard
error of the estimated mediation effect and 95% CI is estimated via a
similar bootstrapping technique described earlier for the other
coefficients.
If plot = TRUE
, the results will also include the
following figures:
Alpha_CI
plot for hat.alpha
with 95% CIs
across timeseq
Gamma_CI
plot for hat.gamma
with 95% CIs
across timeseq
Tau_CI
plot for hat.tau
with 95% CIs
across timeseq
Beta_CI
plot for hat.beta
with 95% CIs
across timeseq
MedEff
plot for est.M
across
timeseq
MedEff_CI
plot for est.M
with 95% CIs
across timeseq
We recommend using the plots to interpret the estimates as it may be
difficult to derive meaning from the numerical values alone. To display
the plots, use nameOfStoredResultsObj$
followed by the name
of the plot to access the required plot accordingly. For example:
$Alpha_CI results
In the above plot, the effect of varenicline
on
subjects’ feeling of
negative mood in the last fifteen minutes
is negative
compared to the other two treatment groups and mostly stable with a
slight decreasing trend over time.
$Gamma_CI results
The above plot shows the time-varying direct effect of
varenicline
on the outcome cessation fatigue
that is positive in the varenicline
group compared to the
other two treatment groups. That is, those assigned to
varenicline
have greater cessation fatigue
than the other two treatment groups. The estimated 95% CI does not cover
0 over time, indicating that the effect is statistically
significant.
$Beta_CI results
In the above plot, the effect of subjects’
negative mood in the last fifteen minutes
on the outcome
cessation fatigue
is increases initially, albeit with a
slight decrease around day 6-7 (end of week 1), and then decreases
beginning around day 20-21 (end of week 3). The effect is positive such
that an increase in negative mood is related to an increase in cessation
fatigue.
$Tau_CI results
The above plot shows the time-varying total effect of
varenicline
on the outcome cessation fatigue
.
The estimated 95% CI covers 0 (no effect) throughout the time period and
thus, the effect is not statistically significant.
$MedEff results
$MedEff_CI results
In the above plot, the effect of varenicline
on
cessation fatigue
that is mediated by
negative mood in the last fifteen minutes
is negative
compared to the patch only or combination NRT group. That is,
varenicline
reduces cessation fatigue due to a decrease in
negative mood. This effect decreases (i.e., becomes stronger) during the
first week and then stabilizes after day 6. Beyond week 2 the effect
begins to decrease again, and then increases somewhat after week 3.
Given that the CIs do not include zero, we can conclude that this effect
is significant.
As discussed earlier, tvma
accepts additional input
arguments that allow the user to modify the analyses. For example,
specifying t.est
can be specified to select only certain
time points at which to make the estimations. The following code will
produce estimates at time points 0.2, 0.4, 0.6, and 0.8 using 250
bootstrap replicates.
<- tvma(treatment, t.seq, mediator, outcome,
results_new t.est = c(0.2, 0.4, 0.6, 0.8), replicates = 250)
results_new
## $Estimates
## t.est hat.alpha CI.lower.alpha CI.upper.alpha hat.gamma CI.lower.gamma
## 1 0.2 -0.5778515 -0.7880755 -0.2547578 0.4133649 0.1274307
## 2 0.4 -0.5805877 -0.7629962 -0.2845173 0.4444837 0.1903534
## 3 0.6 -0.5831995 -0.7478090 -0.3237740 0.4734197 0.2505234
## 4 0.8 -0.5856775 -0.7305131 -0.3588519 0.4999199 0.2955084
## CI.upper.gamma hat.beta CI.lower.beta CI.upper.beta hat.tau CI.lower.tau
## 1 0.7008829 0.2950568 0.2117492 0.3801440 -0.14540866 -0.4905056
## 2 0.7114407 0.3008484 0.2338847 0.3790128 -0.11634281 -0.4079986
## 3 0.7147230 0.3056249 0.2487568 0.3746814 -0.09052425 -0.3633269
## 4 0.7093594 0.3095812 0.2583465 0.3720364 -0.06795276 -0.3160366
## CI.upper.tau est.M boot.se.m CI.lower CI.upper
## 1 0.1191578 -0.1704990 0.04003727 -0.2476922 -0.09160858
## 2 0.1278759 -0.1746689 0.03590878 -0.2458093 -0.10125726
## 3 0.1361883 -0.1782403 0.03225378 -0.2421073 -0.11475191
## 4 0.1425049 -0.1813147 0.02903224 -0.2407598 -0.12509123
The tvma
function computes bootstrap confidence
intervals by default. Therefore, if the user decides to not bootstrap
CIs for the mediation effect by specifying CI = "none"
, but
by mistake also specifies replicates = 250
, the function
will not display an error, but will simply execute without computing the
CIs for mediation effect. Note that the CIs for the effects of the
treatment on the mediator and the mediator on the outcome, and for the
direct and total effects are computed even if the user passes the
argument CI = "none"
.
The tvmediation
package provides a set of functions for
estimating mediation effects that vary over time for both binary and
continuous time-varying outcomes. Currently, the package only allows for
a time-invariant treatment. The mediator and outcome are assumed to be
time-varying, such as intensive longitudinal measurements obtained via
Ecological Momentary Assessment or via wearable and mobile devices. The
development of this tool has widespread application for use in human
behavior research, clinical trials, addiction research, and others by
allowing specification of mediation effects that vary as a function of
time.
Cai X, Coffman DL, Piper ME, Li R. Estimation and inference for the mediation effect in a time-varying mediation model. BMC Med Res Methodol. 2022;22(1):1-12.
Baker TB, Piper ME, Stein JH, et al. Effects of Nicotine Patch vs Varenicline vs Combination Nicotine Replacement Therapy on Smoking Cessation at 26 Weeks: A Randomized Clinical Trial. JAMA. 2016;315(4):371.
B. Efron, R. Tibshirani. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical Science. 1986;1(1):54-75.