SimplifyStats Vignette

Zachary Colburn

2020-04-08

SimplifyStats provides functions that simplify two common tasks: 1) generating descriptive statistics for the numeric variables of multiple groups and 2) performing hypothesis tests between all pairwise combinations of those groups.

Generate group-wise descriptive statistics

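The function group_summarize generates descriptive statistics for every group, where each group is one unique combination of the values in the grouping variables (group_cols). The statistics are computed separately for each numeric variable named in var_cols.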

library(SimplifyStats)

# Generate data.
df <- iris

# Modify df to demonstrate additional functionality.
## Add an NA.
df$Sepal.Length[1] <- NA
## Add another grouping variable.
df$Condition <- rep(c("untreated","treated"), 75)

# Generate descriptive statistics.
group_summarize(
  df, 
  group_cols = c("Species","Condition"), 
  var_cols = c("Sepal.Length","Sepal.Width"), 
  na.rm = TRUE
)
#> # A tibble: 12 x 17
#>    Variable Species Condition     N  Mean StdDev StdErr   Min Quartile1 Median
#>    <chr>    <fct>   <chr>     <int> <dbl>  <dbl>  <dbl> <dbl>     <dbl>  <dbl>
#>  1 Sepal.L~ setosa  untreated    24  5.02  0.399 0.0814   4.4      4.77    5  
#>  2 Sepal.L~ setosa  treated      25  4.99  0.317 0.0633   4.3      4.8     5  
#>  3 Sepal.L~ versic~ untreated    25  5.99  0.556 0.111    5        5.6     5.9
#>  4 Sepal.L~ versic~ treated      25  5.88  0.478 0.0956   4.9      5.6     5.9
#>  5 Sepal.L~ virgin~ untreated    25  6.50  0.603 0.121    4.9      6.2     6.5
#>  6 Sepal.L~ virgin~ treated      25  6.67  0.669 0.134    5.6      6.3     6.5
#>  7 Sepal.W~ setosa  untreated    25  3.48  0.325 0.0651   2.9      3.2     3.5
#>  8 Sepal.W~ setosa  treated      25  3.38  0.426 0.0853   2.3      3.1     3.4
#>  9 Sepal.W~ versic~ untreated    25  2.78  0.336 0.0672   2        2.6     2.9
#> 10 Sepal.W~ versic~ treated      25  2.76  0.297 0.0594   2.3      2.5     2.8
#> 11 Sepal.W~ virgin~ untreated    25  2.94  0.287 0.0574   2.5      2.8     3  
#> 12 Sepal.W~ virgin~ treated      25  3.01  0.356 0.0713   2.2      2.8     3  
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> #   Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> #   `Shapiro-Wilk_p.value` <dbl>
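
To make the first few columns of this output concrete, the following is a minimal base R sketch that recomputes N, Mean, StdDev, and StdErr for Sepal.Length in each Species/Condition group. It is an illustration only and is not taken from the package source. The helper summarize_one and the object manual_summary are hypothetical names introduced here, and the sketch assumes that N counts non-missing observations and that StdErr is the standard deviation divided by the square root of N, which is consistent with the values printed above.

```r
# Rebuild the example data so this block runs on its own.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# Hypothetical helper (not part of SimplifyStats): summarize one numeric
# vector after dropping missing values, mirroring na.rm = TRUE.
summarize_one <- function(x) {
  x_obs <- x[!is.na(x)]
  n <- length(x_obs)
  c(
    N = n,
    Mean = mean(x_obs),
    StdDev = sd(x_obs),
    StdErr = sd(x_obs) / sqrt(n)  # assumed definition of the standard error
  )
}

# Split Sepal.Length by each unique Species/Condition combination.
# The list names have the form "setosa.untreated", "setosa.treated", ...
groups <- split(df$Sepal.Length, list(df$Species, df$Condition), drop = TRUE)

# One row per group, one column per statistic.
manual_summary <- t(sapply(groups, summarize_one))
manual_summary
```

The rows of manual_summary should line up with the Sepal.Length rows of the tibble above, although the row order differs because split sorts the Condition labels alphabetically.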

Perform pairwise hypothesis testing

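Similarly, the function pairwise_stats applies a statistical test, here t.test, to every pair of groups, where each group is one unique combination of the values in the grouping variables. The test is run separately for each variable named in var_cols. In this example, six groups yield 15 pairs, and two variables give 30 comparisons in total.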

# Perform pairwise t-tests between all groups.
pairwise_stats(
  df, 
  group_cols = c("Species","Condition"), 
  var_cols = c("Sepal.Length", "Sepal.Width"),
  t.test
)
#> # A tibble: 30 x 15
#>    Variable A.Species A.Condition B.Species B.Condition estimate estimate1
#>    <chr>    <fct>     <chr>       <fct>     <chr>          <dbl>     <dbl>
#>  1 Sepal.L~ setosa    untreated   setosa    treated       0.0328      5.02
#>  2 Sepal.L~ setosa    untreated   versicol~ untreated    -0.971       5.02
#>  3 Sepal.L~ setosa    untreated   versicol~ treated      -0.859       5.02
#>  4 Sepal.L~ setosa    untreated   virginica untreated    -1.48        5.02
#>  5 Sepal.L~ setosa    untreated   virginica treated      -1.65        5.02
#>  6 Sepal.L~ setosa    treated     versicol~ untreated    -1.00        4.99
#>  7 Sepal.L~ setosa    treated     versicol~ treated      -0.892       4.99
#>  8 Sepal.L~ setosa    treated     virginica untreated    -1.52        4.99
#>  9 Sepal.L~ setosa    treated     virginica treated      -1.68        4.99
#> 10 Sepal.L~ versicol~ untreated   versicol~ treated       0.112       5.99
#> # ... with 20 more rows, and 8 more variables: estimate2 <dbl>,
#> #   statistic <dbl>, p.value <dbl>, parameter <dbl>, conf.low <dbl>,
#> #   conf.high <dbl>, method <chr>, alternative <chr>
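
The pairing logic can be sketched in base R for a single variable. This is a rough illustration under stated assumptions, not the package's implementation: the objects group_pairs and manual_tests are hypothetical names introduced here, and the sketch assumes that each unordered pair of groups is tested once, which matches the 15 comparisons per variable shown above.

```r
# Rebuild the example data so this block runs on its own.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# Split Sepal.Length by each unique Species/Condition combination.
groups <- split(df$Sepal.Length, list(df$Species, df$Condition), drop = TRUE)

# Enumerate every unordered pair of group labels: choose(6, 2) = 15 pairs.
group_pairs <- combn(names(groups), 2, simplify = FALSE)

# Run Welch's t-test (the t.test default) on each pair and collect results.
# t.test drops missing values from each sample before testing.
manual_tests <- do.call(rbind, lapply(group_pairs, function(pair) {
  res <- t.test(groups[[pair[1]]], groups[[pair[2]]])
  data.frame(
    A = pair[1],
    B = pair[2],
    estimate1 = unname(res$estimate[1]),  # mean of group A
    estimate2 = unname(res$estimate[2]),  # mean of group B
    statistic = unname(res$statistic),
    p.value = res$p.value
  )
}))
manual_tests
```

Repeating the same loop for Sepal.Width would give the remaining 15 rows. With this many comparisons, the raw p-values are unadjusted, so they should be read with that in mind.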