# P-value functions: Tutorial using the `pvaluefunctions` package

## Accompanying paper

We published an accompanying paper to illustrate the use of p-value functions:

Infanger D, Schmidt-Trucksäss A. (2019): P value functions: An underused method to present research results and to promote quantitative reasoning. Statistics in Medicine, 1-9. doi: 10.1002/sim.8293.

### Recreation of the graphics in the paper

The code and instructions to reproduce all graphics in our paper can be found in the following GitHub repository: https://github.com/DInfanger/pvalue_functions

## Overview

This vignette shows how to use the `pvaluefunctions` package wich contains an R function to create graphics of p-value functions, confidence distributions, confidence densities, or the Surprisal value (S-value) (Greenland 2019).

## Installation

Install (`install.packages("pvaluefunctions")`) and load the package from CRAN.

``library(pvaluefunctions)``

## Dependencies

The function depends on the following R packages:

• ggplot2
• scales
• zipfR

### Important information!

The newest version of ggplot2 (3.1.1) has a bug in `sec_axis` that will lead to the secondary y-axis being labelled wrongly.

It is therefore recommended that you install the developmental version of ggplot2 until the bug has been fixed. You can install the developmental version using the following command (after installing the `devtools` package): `devtools::install_github("tidyverse/ggplot2")`

## Usage

There is only one function needed to create the plots: `conf_dist()`. The function has the following arguments:

• `estimate`: Numerical vector containing the estimate(s).
• `n`: Numerical vector containing the sample size(s). Required for correlations, variances, proportions and differences between proportions. Must be equal the number of estimates.
• `df`: Numerical vector containing the degrees of freedom. Required for statistics based on the t-distribution (e.g. linear regression) and t-tests. Must be equal the number of estimates.
• `stderr`: Numerical vector containing the standard error(s) of the estimate(s). Required for statistics based on the t-distribution (e.g. linear regression) and the normal distribution (e.g. logistic regression). Must be equal the number of estimate(s).
• `tstat`: Numerical vector containing the t-statistic(s). Required for t-tests (means and mean differences). Must be equal the number of estimates.
• `type`: String indicating the type of the estimate. Must be one of the following: `ttest`, `linreg`, `gammareg`, `general_t`, `logreg`, `poisreg`, `coxreg`, `general_z`, `pearson`, `spearman`, `kendall`, `var`, `prop`, `propdiff`.
• `plot_type`: String indicating the type of plot. Must be one of the following: `cdf` (confidence distribution), `pdf` (confidence density), `p_val` (p-value function, the default), `s_val` (Surprisal). For differences between independent proportions, only p-value functions and Surprisal value functions are available.
• `n_values` (optional): Integer indicating the number of points that are used to generate the graphics. The higher this number, the higher the computation time and resolution.
• `est_names` (optional): String vector indicating the names of the estimate(s). Must be equal the number of estimates.
• `conf_level` (optional): Numerical vector indicating the confidence level(s). Bust be between 0 and 1.
• `null_values` (optional): Numerical vector indicating the null value(s) in the plot on the untransformed scale. For example: If you want to plot odds ratios from logistic regressions, the `null_values` have to be given on the log-odds scale. If x limits are specified with `xlim`, all null values outside of the specified x limits are ignored for plotting and a message is printed.
• `trans` (optional): String indicating the transformation function that will be applied to the estimates and confidence curves. For example: `"exp"` for an exponential transformation of the log-odds in logistic regression.
• `alternative`: String indicating if the confidence level(s) are two-sided or one-sided. Must be one of the following: `two_sided`, `one_sided`.
• `log_yaxis`: Logical. Indicating if a portion of the y-axis should be displayed on the logarithmic scale.
• `cut_logyaxis`: Numerical value indicating the threshold below which the y-axis will be displayed logarithmically. Must lie between 0 and 1.
• `xlim` (optional) Optional numerical vector of length 2 (x1, x2) indicating the limits of the x-axis on the untransformed scale. For example: If you want to plot p-value functions for odds ratios from logistic regressions, the limits have to be given on the log-odds scale. Note that x1 > x2 is allowed but then x2 will be the left limit and x1 the right limit (i.e. the limits are sorted before plotting). Null values (specified in `null_values`) that are outside of the specified limits are ignored and a message is printed.
• `together`: Logical. Indicating if graphics for multiple estimates should be displayed together or on separate plots.
• `nrow`: (optional) Integer greater than 0 indicating the number of rows when `together = FALSE` is specified for multiple estimates. Used in `facet_wrap` in ggplot2.
• `ncol`: (optional) Integer greater than 0 indicating the number of columns when `together = FALSE` is specified for multiple estimates. Used in `facet_wrap` in ggplot2.
• `plot_p_limit`: Numerical value indicating the lower limit of the y-axis. Must be greater than 0 for a logarithmic scale (i.e. `log_yaxis = TRUE`). The default is to omit plotting p-values smaller than 1 - 0.999 = 0.001.
• `plot_counternull`: Logical. Indicating if the counternull should be plotted as a point. Only available for -value functions and s-value functions. Counternull values that are outside of the plotted functions are not shown.
• `title` (optional): String containing a title for the plot.
• `xlab` (optional): String indicating the label of the x-axis.
• `ylab` (optional): String containing a title for the primary (left) y-axis.
• `ylab_sec` (optional): String containing a title for the secondary (right) y-axis.
• `inverted` Logical. Indicating the orientation of the P-value function (`p_val`), S-value function (`s_val`) and confidence distribution (`cdf`). By default (i.e. `inverted = FALSE`) small P-values are plotted at the bottom and large ones at the top. By setting `inverted = TRUE`, the y-axis is invertedd. Ignored for confidence densities.
• `x_scale` String indicating the scaling of the x-axis. The default is to scale the x-axis logarithmically if the transformation specified in `trans` is “exp” (exponential) and linearly otherwise. The option `linear` (can be abbreviated) forces a linear scaling and the option `logarithm` (can be abbreviated) forces a logarithmic scaling.

### Required arguments for different estimate types

• t-tests: `estimate`, `df`, `tstat`.
• Linear regression, Gamma regression, general estimates with inference based on the t-distribution: `estimate`, `df`, `stderr`.
• Logistic regression, Poisson regression, Cox regression, general estimates with inference based on the normal distribution: `estimate`, `stderr`.
• Correlation coefficients (Pearson, Spearman, Kendall), proportions, difference between proportions, variances: `estimate`, `n`.

### Returned values

The main function `conf_dist()` returns five objects in a list:

• `res_frame`: A data frame containing the values used to construct the plot.
• `conf_frame`: A data frame containing the confidence intervals for the specified confidence levels for all estimates.
• `counternull_frame`: A data frame containing the counternull values for the specified null values (see Rosenthal & Rubin (1994) for more information about the counternull).
• `point_est`: A data frame containing the point estimates for all estimates. The point estimates correspond to the mean, median or mode of the confidence density (see Xie & Singh (2013) for more information). Estimates are produced using numerical procedures: Increase the number of points `n_values` for higher numerical precision.
• `plot`: A ggplot2 plot object.

## Examples

### Two-sample t-test with unequal variances (Welch-Test)

``````#-----------------------------------------------------------------------------
# T-Test
#-----------------------------------------------------------------------------

with(sleep, mean(extra[group == 1])) - with(sleep, mean(extra[group == 2]))
#> [1] -1.58
t.test(extra ~ group, data = sleep, var.equal = FALSE)
#>
#>  Welch Two Sample t-test
#>
#> data:  extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.3654832  0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2
#>            0.75            2.33

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
estimate = c(-1.58)
, df = c(17.77647)
, tstat = c(-1.860813)
, type = "ttest"
, plot_type = "p_val"
, n_values = 1e4L
# , est_names = c("")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "two_sided"
, log_yaxis = FALSE
, cut_logyaxis = 0.05
, xlab = "Mean difference (group 1 - group 2)"
, together = FALSE
, plot_p_limit = 1 - 0.999
, plot_counternull = TRUE
, x_scale = "line"
)``````

### Single coefficient from a linear regression model

#### P-value function

Because it’s difficult to see very small p-values in the graph, you can set the option `log_yaxis = TRUE` so that p-values (i.e. the y-axes) below the value set in `cut_logyaxis` will be plotted on a logarithmic scale. This will make it much easier to see small p-values but has the disadvantage of creating a “kink” in the p-value function which is a pure artifact and puts an undue emphasis on the specified cutoff.

``````#-----------------------------------------------------------------------------
# Model
#-----------------------------------------------------------------------------

mod <- lm(Infant.Mortality~Agriculture + Fertility + Examination, data = swiss)

summary(mod)
#>
#> Call:
#> lm(formula = Infant.Mortality ~ Agriculture + Fertility + Examination,
#>     data = swiss)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -8.5375 -1.4021 -0.0066  1.7381  5.9150
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 11.01896    4.47291   2.463  0.01784 *
#> Agriculture -0.02143    0.02394  -0.895  0.37569
#> Fertility    0.13115    0.04145   3.164  0.00285 **
#> Examination  0.04913    0.08351   0.588  0.55942
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.645 on 43 degrees of freedom
#> Multiple R-squared:  0.2291, Adjusted R-squared:  0.1753
#> F-statistic:  4.26 on 3 and 43 DF,  p-value: 0.01014

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
estimate = c(-0.02143)
, df = c(43)
, stderr = (0.02394)
, type = "linreg"
, plot_type = "p_val"
, n_values = 1e4L
# , est_names = c("")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "two_sided"
, log_yaxis = TRUE
, cut_logyaxis = 0.05
, xlab = "Coefficient Agriculture"
, xlim = c(-0.12, 0.065)
, together = FALSE
, plot_p_limit = 1 - 0.999
, plot_counternull = FALSE
)``````

#### Confidence distribution

``````res <- conf_dist(
estimate = c(-0.02143)
, df = c(43)
, stderr = (0.02394)
, type = "linreg"
, plot_type = "cdf"
, n_values = 1e4L
# , est_names = c("")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "two_sided"
# , log_yaxis = TRUE
# , cut_logyaxis = 0.05
, xlab = "Coefficient Agriculture"
, xlim = c(-0.12, 0.065)
, together = FALSE
# , plot_p_limit = 1 - 0.999
, plot_counternull = FALSE
)``````

The point where the confidence distribution is \(0.5\) is the median unbiased estimator (see Xie & Singh (2013) for a review and proofs).

### Multiple coefficients from a linear regression model

#### P-value functions

``````res <- conf_dist(
estimate = c(0.13115, 0.04913)
, df = c(43, 43)
, stderr = c(0.04145, 0.08351)
, type = "linreg"
, plot_type = "p_val"
, n_values = 1e4L
, est_names = c("Fertility", "Examination")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "two_sided"
, log_yaxis = FALSE
, cut_logyaxis = 0.05
, xlab = "Regression coefficients"
, together = TRUE
, plot_p_limit = 1 - 0.999
, plot_counternull = FALSE
, inverted = FALSE
)``````

#### Surprisal values

``````res <- conf_dist(
estimate = c(0.13115, 0.04913)
, df = c(43, 43)
, stderr = c(0.04145, 0.08351)
, type = "linreg"
, plot_type = "s_val"
, n_values = 1e4L
, est_names = c("Fertility", "Examination")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "two_sided"
# , log_yaxis = TRUE
# , cut_logyaxis = 0.05
, xlab = "Regression coefficients"
, together = TRUE
, plot_p_limit = 1 - 0.999
, plot_counternull = TRUE
)``````

### Pearson correlation coefficient (one-sided)

``````#-----------------------------------------------------------------------------
# Calculate Pearson's correlation coefficient
#-----------------------------------------------------------------------------

cor.test(swiss\$Fertility, swiss\$Agriculture, alternative = "two.sided", method = "pearson")
#>
#>  Pearson's product-moment correlation
#>
#> data:  swiss\$Fertility and swiss\$Agriculture
#> t = 2.5316, df = 45, p-value = 0.01492
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.07334947 0.58130587
#> sample estimates:
#>       cor
#> 0.3530792

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
estimate = c(0.3530792)
, n = 47
, type = "pearson"
, plot_type = "p_val"
, n_values = 1e4L
# , est_names = c("")
, conf_level = c(0.95, 0.90, 0.80)
, null_values = c(0)
, trans = "identity"
, alternative = "one_sided"
, log_yaxis = FALSE
, cut_logyaxis = 0.05
, xlab = "Pearson correlation"
, together = TRUE
, plot_p_limit = 1 - 0.999
, plot_counternull = FALSE
)``````