Modmarg Usage

2020-11-21

Why Margins?

For many types of regression, the coefficients in the model alone are not sufficient to determine the marginal relationship between a covariate and the outcome (or the error in that estimate). In the simplest case, say you run the following model in glm: wages ~ age + age^2.

Because the output will include coefficients for both age and age squared, it’s not immediately apparent what the total marginal relationship is between a change in age and wages. Moreover, computing the error in that estimate is a non-trivial problem.
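
To make this concrete, here is a hypothetical sketch (wages and df are placeholders, not part of this package; note also that in R formula syntax the squared term must be written as I(age^2) or with poly()):

g <- glm(wages ~ age + I(age^2), data = df)

# The marginal effect of age is not any single coefficient: differentiating
# gives d(wages)/d(age) = b_age + 2 * b_agesq * age, which varies with age
b <- coef(g)
marg_age <- function(age) b["age"] + 2 * b["I(age^2)"] * age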

This non-obviousness of marginal relationships is a problem even for very simple regressions whose functional form puts the coefficients on a different scale than the outcome. For example, the coefficients of a logistic regression are log odds ratios, so even a simple model is not immediately interpretable on the probability scale.

This package reproduces the margins command from Stata, which allows for easy and quick estimation of marginal relationships and the associated error. The error is computed using the delta method, which we detail in a separate vignette.
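
In brief: for a transformation g(b) of the estimated coefficients b with variance-covariance matrix V, the delta method approximates the variance of the transformed estimate with a first-order Taylor expansion,

Var(g(b)) ≈ J V J'

where J is the Jacobian of g evaluated at b.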

In this vignette, we detail a few possible use cases of the modmarg package.1

Use Case 1: Treatment Effects and Subgroup Effects

We want to ascertain the marginal effect of treatment on y while controlling for age. In this first example we’ll use a binned age variable.

library(modmarg)
data(margex)

g <- glm(y ~ as.factor(agegroup) * as.factor(treatment), data = margex)
summary(g)
## 
## Call:
## glm(formula = y ~ as.factor(agegroup) * as.factor(treatment), 
##     data = margex)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -68.490  -13.922    0.178   14.042   71.678  
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                     68.6901     0.8921  77.001
## as.factor(agegroup)30-39                        -1.4677     1.3448  -1.091
## as.factor(agegroup)40+                          -9.8679     1.2384  -7.968
## as.factor(treatment)1                           14.6799     1.6316   8.997
## as.factor(agegroup)30-39:as.factor(treatment)1  -2.1209     2.2438  -0.945
## as.factor(agegroup)40+:as.factor(treatment)1    -2.1073     1.9565  -1.077
##                                                Pr(>|t|)    
## (Intercept)                                     < 2e-16 ***
## as.factor(agegroup)30-39                          0.275    
## as.factor(agegroup)40+                         2.27e-15 ***
## as.factor(treatment)1                           < 2e-16 ***
## as.factor(agegroup)30-39:as.factor(treatment)1    0.345    
## as.factor(agegroup)40+:as.factor(treatment)1      0.282    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 416.1914)
## 
##     Null deviance: 1391433  on 2999  degrees of freedom
## Residual deviance: 1246077  on 2994  degrees of freedom
## AIC: 26615
## 
## Number of Fisher Scoring iterations: 2

It’s not at all obvious from these coefficients what the total marginal relationship is between treatment and y.

We can get the predicted margin of y at the various levels of treatment.

modmarg::marg(mod = g, var_interest = "treatment", type = 'levels')
## [[1]]
##           Label   Margin Standard.Error Test.Stat P.Value Lower CI (95%)
## 1 treatment = 0 63.28363      0.5484021  115.3964       0       62.20835
## 2 treatment = 1 76.37699      0.5526036  138.2130       0       75.29347
##   Upper CI (95%)
## 1       64.35892
## 2       77.46051

Or the effect of a unit change in treatment.

modmarg::marg(mod = g, var_interest = "treatment", type = "effects")
## [[1]]
##           Label   Margin Standard.Error Test.Stat      P.Value Lower CI (95%)
## 1 treatment = 0  0.00000      0.0000000       NaN          NaN        0.00000
## 2 treatment = 1 13.09336      0.7785343  16.81796 1.013688e-60       11.56684
##   Upper CI (95%)
## 1        0.00000
## 2       14.61988

Or maybe we want to get the treatment effect at several different levels of age. Let’s re-run the regression with a cubic polynomial in continuous age (maybe we’re looking at the severity of a disease that’s most prevalent among the young and the old).

Note that you have to use raw = T when using poly(). Otherwise marg will try to create multiple orthogonal vectors from a constant, which doesn’t work so well.
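
To see why, note that marg holds the variable at a fixed value when predicting, and orthogonal polynomials cannot be constructed from a constant vector (a minimal illustration):

try(poly(rep(40, 10), 3))  # errors: 'degree' must be less than number of unique points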

g <- glm(y ~ poly(age, 3, raw = T) * as.factor(treatment), data = margex)
summary(g)
## 
## Call:
## glm(formula = y ~ poly(age, 3, raw = T) * as.factor(treatment), 
##     data = margex)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -70.52  -13.86   -0.37   13.91   73.13  
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                   1.118e+02  2.371e+01   4.716
## poly(age, 3, raw = T)1                       -3.741e+00  2.012e+00  -1.860
## poly(age, 3, raw = T)2                        1.077e-01  5.415e-02   1.990
## poly(age, 3, raw = T)3                       -1.083e-03  4.649e-04  -2.329
## as.factor(treatment)1                        -4.282e+01  3.677e+01  -1.165
## poly(age, 3, raw = T)1:as.factor(treatment)1  4.973e+00  2.997e+00   1.659
## poly(age, 3, raw = T)2:as.factor(treatment)1 -1.392e-01  7.788e-02  -1.788
## poly(age, 3, raw = T)3:as.factor(treatment)1  1.244e-03  6.486e-04   1.918
##                                              Pr(>|t|)    
## (Intercept)                                  2.52e-06 ***
## poly(age, 3, raw = T)1                         0.0630 .  
## poly(age, 3, raw = T)2                         0.0467 *  
## poly(age, 3, raw = T)3                         0.0199 *  
## as.factor(treatment)1                          0.2442    
## poly(age, 3, raw = T)1:as.factor(treatment)1   0.0972 .  
## poly(age, 3, raw = T)2:as.factor(treatment)1   0.0739 .  
## poly(age, 3, raw = T)3:as.factor(treatment)1   0.0552 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 404.2444)
## 
##     Null deviance: 1391433  on 2999  degrees of freedom
## Residual deviance: 1209499  on 2992  degrees of freedom
## AIC: 26530
## 
## Number of Fisher Scoring iterations: 2
modmarg::marg(mod = g, var_interest = "treatment", type = "effects",
              at = list("age" = c(20, 40, 60)))
## $`age = 20`
##           Label   Margin Standard.Error Test.Stat      P.Value Lower CI (95%)
## 1 treatment = 0  0.00000       0.000000       NaN          NaN       0.000000
## 2 treatment = 1 10.90186       3.278277  3.325484 0.0008932985       4.473953
##   Upper CI (95%)
## 1        0.00000
## 2       17.32976
## 
## $`age = 40`
##           Label   Margin Standard.Error Test.Stat      P.Value Lower CI (95%)
## 1 treatment = 0  0.00000       0.000000       NaN          NaN        0.00000
## 2 treatment = 1 12.96123       1.128904  11.48125 6.862963e-30       10.74772
##   Upper CI (95%)
## 1        0.00000
## 2       15.17474
## 
## $`age = 60`
##           Label   Margin Standard.Error Test.Stat      P.Value Lower CI (95%)
## 1 treatment = 0  0.00000       0.000000       NaN          NaN        0.00000
## 2 treatment = 1 23.06926       3.492633  6.605119 4.683001e-11       16.22105
##   Upper CI (95%)
## 1        0.00000
## 2       29.91746

Use Case 2: Logistic Regression

Let’s say we want to figure out how much treatment increased the likelihood of a binary outcome.

g <- glm(outcome ~ as.factor(treatment), data = margex, family = binomial)
summary(g)
## 
## Call:
## glm(formula = outcome ~ as.factor(treatment), family = binomial, 
##     data = margex)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7754  -0.7754  -0.4069  -0.4069   2.2507  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -2.44999    0.09554  -25.64   <2e-16 ***
## as.factor(treatment)1  1.40222    0.11221   12.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2732.1  on 2999  degrees of freedom
## Residual deviance: 2551.5  on 2998  degrees of freedom
## AIC: 2555.5
## 
## Number of Fisher Scoring iterations: 5

Those coefficients are log odds ratios. It’s really unclear what the marginal relationship is.
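
Exponentiating the coefficients moves them to the odds-ratio scale, which is somewhat more familiar but still not a change in predicted probability:

exp(coef(g))  # odds ratios, still not on the probability scale

marg gives results directly on the probability scale.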

marg(mod = g, var_interest = "treatment", type = 'levels')
## [[1]]
##           Label     Margin Standard.Error Test.Stat       P.Value
## 1 treatment = 0 0.07943925    0.006986952  11.36966  5.921906e-30
## 2 treatment = 1 0.25965379    0.011313052  22.95170 1.416955e-116
##   Lower CI (95%) Upper CI (95%)
## 1     0.06574508     0.09313343
## 2     0.23748062     0.28182697
marg(mod = g, var_interest = "treatment", type = "effects")
## [[1]]
##           Label    Margin Standard.Error Test.Stat      P.Value Lower CI (95%)
## 1 treatment = 0 0.0000000     0.00000000       NaN          NaN      0.0000000
## 2 treatment = 1 0.1802145     0.01329672  13.55331 7.573387e-42      0.1541535
##   Upper CI (95%)
## 1      0.0000000
## 2      0.2062756

Aha! Treatment is associated with an 18 percentage point increase in the likelihood of a positive outcome, and the effect is highly statistically significant. (Note that the effect is simply the difference between the two predicted levels: 0.25965 - 0.07944 ≈ 0.18021.) Much more interpretable.

Use Case 3: Getting Margins At Specific Values of Variable of Interest

Let’s say we want to know the marginal predicted level at only a couple of age groups while controlling for distance. marg makes that simple.

Note that unlike the at option, which takes a named list of values, at_var_interest takes just an unnamed vector.

g <- glm(y ~ poly(distance, 2, raw = T) * as.factor(agegroup), data = margex)
summary(g)
## 
## Call:
## glm(formula = y ~ poly(distance, 2, raw = T) * as.factor(agegroup), 
##     data = margex)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -72.507  -14.911    0.076   14.434   73.228  
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                           7.269e+01  9.098e-01
## poly(distance, 2, raw = T)1                           8.555e-02  1.112e-01
## poly(distance, 2, raw = T)2                          -1.175e-04  1.552e-04
## as.factor(agegroup)30-39                             -4.524e-01  1.386e+00
## as.factor(agegroup)40+                               -6.516e+00  1.239e+00
## poly(distance, 2, raw = T)1:as.factor(agegroup)30-39  6.289e-02  1.520e-01
## poly(distance, 2, raw = T)2:as.factor(agegroup)30-39 -1.053e-04  2.124e-04
## poly(distance, 2, raw = T)1:as.factor(agegroup)40+    1.008e-02  1.243e-01
## poly(distance, 2, raw = T)2:as.factor(agegroup)40+   -2.796e-05  1.732e-04
##                                                      t value Pr(>|t|)    
## (Intercept)                                           79.888  < 2e-16 ***
## poly(distance, 2, raw = T)1                            0.770    0.442    
## poly(distance, 2, raw = T)2                           -0.757    0.449    
## as.factor(agegroup)30-39                              -0.326    0.744    
## as.factor(agegroup)40+                                -5.260 1.54e-07 ***
## poly(distance, 2, raw = T)1:as.factor(agegroup)30-39   0.414    0.679    
## poly(distance, 2, raw = T)2:as.factor(agegroup)30-39  -0.496    0.620    
## poly(distance, 2, raw = T)1:as.factor(agegroup)40+     0.081    0.935    
## poly(distance, 2, raw = T)2:as.factor(agegroup)40+    -0.161    0.872    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 452.6186)
## 
##     Null deviance: 1391433  on 2999  degrees of freedom
## Residual deviance: 1353782  on 2991  degrees of freedom
## AIC: 26870
## 
## Number of Fisher Scoring iterations: 2
unique(margex$agegroup)
## [1] "40+"   "20-29" "30-39"
marg(mod = g, var_interest = "agegroup", type = 'levels',
          at_var_interest = c("20-29"))
## [[1]]
##              Label   Margin Standard.Error Test.Stat P.Value Lower CI (95%)
## 1 agegroup = 20-29 73.42933      0.9075403  80.91028       0       71.64987
##   Upper CI (95%)
## 1        75.2088

Use Case 4: Getting Margins with Different Variance-Covariance Matrices

Normally we assume that the amount of variation in our outcome of interest (conditional on covariates) is constant across our sample. When this assumption is violated, we can use a different variance-covariance matrix to represent the heterogeneity in variance (heteroskedasticity). Creating these variance-covariance matrices is beyond the scope of this package, but they can be passed to marg to correct the standard errors and p values of predicted levels or effects.

Let’s say we want to get the predicted levels of an outcome for different treatments, but we want to cluster our standard errors by the arm variable. We estimate the model, and then create the “clustered” variance-covariance matrix separately (see this script for one method to do this). This code and example replicate the vce(cluster arm) option in Stata.

We can use the vcov_mat option to pass a custom variance-covariance matrix to modmarg. Because this is an OLS model, the degrees of freedom for the t-test must also be corrected. Here we use Stata’s default correction of ngroups - 1, where ngroups is the number of unique values of the clustering variable. Notice that the standard errors and p values increase substantially.

data(cvcov)
g <- glm(outcome ~ treatment + distance, data = margex, family = 'gaussian')
summary(g)
## 
## Call:
## glm(formula = outcome ~ treatment + distance, family = "gaussian", 
##     data = margex)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.27237  -0.26920  -0.09337  -0.09123   0.91224  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.378e-02  9.625e-03   9.743  < 2e-16 ***
## treatment    1.786e-01  1.323e-02  13.508  < 2e-16 ***
## distance    -2.314e-04  3.646e-05  -6.346 2.55e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1311316)
## 
##     Null deviance: 422.64  on 2999  degrees of freedom
## Residual deviance: 393.00  on 2997  degrees of freedom
## AIC: 2424
## 
## Number of Fisher Scoring iterations: 2
v <- cvcov$ols$clust
print(v)
##               (Intercept)    treatment1      distance
## (Intercept)  2.970431e-03 -4.945889e-04 -5.960823e-06
## treatment1  -4.945889e-04  7.952403e-04  5.335865e-07
## distance    -5.960823e-06  5.335865e-07  1.225712e-08
d <- cvcov$ols$stata_dof
print(d)
## [1] 2
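
As an aside: v and d come precomputed in the cvcov example data. If you need to construct them yourself, one possibility is the sandwich package. The following is a sketch, not necessarily how cvcov was built, and the result may differ slightly from cvcov$ols$clust:

library(sandwich)

# Cluster-robust vcov by trial arm; the HC1-type small-sample adjustment
# roughly mirrors Stata's vce(cluster arm)
v_alt <- vcovCL(g, cluster = margex$arm, type = "HC1")

# Stata's default dof correction for clustered errors: ngroups - 1
d_alt <- length(unique(margex$arm)) - 1
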
# Without clustering
marg(mod = g, var_interest = "treatment", type = "levels")
## [[1]]
##           Label    Margin Standard.Error Test.Stat       P.Value Lower CI (95%)
## 1 treatment = 0 0.0802249    0.009356981  8.573802  1.579014e-17     0.06187815
## 2 treatment = 1 0.2588702    0.009344512 27.702918 1.362609e-150     0.24054793
##   Upper CI (95%)
## 1     0.09857166
## 2     0.27719254
# With clustering
marg(mod = g, var_interest = "treatment", type = "levels",
          vcov_mat = v, dof = d)
## [[1]]
##           Label    Margin Standard.Error Test.Stat    P.Value Lower CI (95%)
## 1 treatment = 0 0.0802249     0.04810472  1.667714 0.23730671    -0.12675299
## 2 treatment = 1 0.2588702     0.04671881  5.541028 0.03106062     0.05785542
##   Upper CI (95%)
## 1      0.2872028
## 2      0.4598851

  1. Modmarg is short for model margins.