`{statsExpressions}`: Tidy dataframes and expressions with statistical details


Introduction <img src="man/figures/logo.png" alt="statsExpressions package logo" align="right" width="240" />

The {statsExpressions} package has two key aims:

- to provide a consistent syntax for statistical analysis with tidy data (in a pipe-friendly manner), and
- to provide statistical expressions (pre-formatted in-text statistical results) for plotting functions.

Statistical packages exhibit substantial diversity in terms of their syntax and expected input type. This can make it difficult to switch from one statistical approach to another. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, different functions might expect data to be in wide or long format. Some functions can internally omit missing values, while other functions error in their presence. Furthermore, if someone wishes to utilize the objects returned by these packages downstream in their workflow, this is not straightforward either because even functions from the same package can return a list, a matrix, an array, a dataframe, etc., depending on the function.
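As a small illustration of the inconsistency described above, the following sketch uses only base R. The dataset (`mtcars`) and the particular functions are our own choice for illustration and are not taken from the package. The same two-group comparison can be requested through a vector interface or a formula interface, and the returned objects differ in type from one function to the next.

```r
# Base R only; no additional packages are assumed.

# The same Welch t-test, requested through two input conventions:
# (a) two numeric vectors
by_vector <- t.test(mtcars$wt[mtcars$am == 0], mtcars$wt[mtcars$am == 1])
# (b) a formula plus a data frame
by_formula <- t.test(wt ~ am, data = mtcars)

# Both calls compute the same test statistic
all.equal(unname(by_vector$statistic), unname(by_formula$statistic))

# Return types differ from function to function
htest_obj   <- by_formula                     # a list of class "htest"
cor_matrix  <- cor(mtcars[, c("wt", "mpg")])  # a numeric matrix
count_table <- table(mtcars$am, mtcars$cyl)   # an object of class "table"

class(htest_obj)
class(cor_matrix)
class(count_table)
```

None of these objects is a dataframe, so each needs its own handling before it can be combined with other results.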

This is where {statsExpressions} comes in: It can be thought of as a unified portal through which most of the functionality in these underlying packages can be accessed, with a simpler interface and no requirement to change data format.

This package forms the statistical processing backend for the {ggstatsplot} package.

For more documentation, see the dedicated website.

Installation

| Type | Command |
|:------------|:----------------------------------------------|
| Release | `install.packages("statsExpressions")` |
| Development | `pak::pak("IndrajeetPatil/statsExpressions")` |

On Linux, {statsExpressions} installation may require additional system dependencies, which can be checked using:

pak::pkg_sysreqs("statsExpressions")

Citation

The package can be cited as:

citation("statsExpressions")
To cite package 'statsExpressions' in publications use:

  Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes
  and Expressions with Statistical Details. Journal of Open Source
  Software, 6(61), 3236, https://doi.org/10.21105/joss.03236

A BibTeX entry for LaTeX users is

  @Article{,
    doi = {10.21105/joss.03236},
    year = {2021},
    publisher = {{The Open Journal}},
    volume = {6},
    number = {61},
    pages = {3236},
    author = {Indrajeet Patil},
    title = {{statsExpressions: {R} Package for Tidy Dataframes and Expressions with Statistical Details}},
    journal = {{Journal of Open Source Software}},
  }

General Workflow

Summary of functionality

Summary of available analyses

| Test | Function |
|:---------------------------|:-------------------------|
| one-sample t-test | `one_sample_test()` |
| two-sample t-test | `two_sample_test()` |
| one-way ANOVA | `oneway_anova()` |
| correlation analysis | `corr_test()` |
| contingency table analysis | `contingency_table()` |
| meta-analysis | `meta_analysis()` |
| pairwise comparisons | `pairwise_comparisons()` |
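A minimal sketch of how several of these functions might be called, assuming the package is installed. The datasets (`mtcars`, `ToothGrowth`, `iris`) and the variable pairings are our own choices for illustration, and the `(data, x, y)` argument order follows the examples elsewhere in this README. `meta_analysis()` and `pairwise_comparisons()` are left out because they need specially prepared input or additional packages.

```r
library(statsExpressions)

# for reproducibility
set.seed(123)

# one-sample test: is the mean weight different from 3?
res_one <- one_sample_test(mtcars, wt, test.value = 3)

# two-sample test: tooth length by supplement type
res_two <- two_sample_test(ToothGrowth, supp, len)

# one-way ANOVA: sepal length across species
res_aov <- oneway_anova(iris, Species, Sepal.Length)

# correlation between weight and fuel efficiency
res_corr <- corr_test(mtcars, wt, mpg)

# association between two categorical variables
res_tab <- contingency_table(mtcars, am, vs)

# each result is a dataframe that carries an `expression` column
all_results <- list(res_one, res_two, res_aov, res_corr, res_tab)
has_expression <- sapply(all_results, function(res) "expression" %in% names(res))
has_expression
```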

Summary of details available for analyses

| Analysis | Hypothesis testing | Effect size estimation |
|:--------------------------------|:-------------------|:-----------------------|
| (one/two-sample) t-test | ✅ | ✅ |
| one-way ANOVA | ✅ | ✅ |
| correlation | ✅ | ✅ |
| (one/two-way) contingency table | ✅ | ✅ |
| random-effects meta-analysis | ✅ | ✅ |

Summary of supported statistical approaches

| Description | Parametric | Non-parametric | Robust | Bayesian |
|:---|:---|:---|:---|:---|
| Between group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
| Within group/condition comparisons | ✅ | ✅ | ✅ | ✅ |
| Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
| Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
| Association between categorical variables | ✅ | ✅ | ❌ | ✅ |
| Equal proportions for categorical variable levels | ✅ | ✅ | ❌ | ✅ |
| Random-effects meta-analysis | ✅ | ❌ | ✅ | ✅ |
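The four approaches are selected through the `type` argument. The sketch below, which assumes the package and its dependencies are installed, runs the same correlation analysis under each approach. The `"nonparametric"` and `"robust"` values appear elsewhere in this README; `"parametric"` and `"bayes"` are the values we understand the argument to accept for the other two columns.

```r
library(statsExpressions)

# for reproducibility
set.seed(123)

# one value of `type` per column in the table above
approaches <- c("parametric", "nonparametric", "robust", "bayes")

# same data and variables each time; only `type` changes
results <- lapply(approaches, function(approach) {
  corr_test(mtcars, wt, mpg, type = approach)
})
names(results) <- approaches

# which method was used under each approach
vapply(results, function(res) res$method[[1]], character(1))
```

Only the `type` value changes between calls; the data and variable arguments stay the same.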

Tidy dataframes from statistical analysis

To illustrate the simplicity of this syntax, let’s say we want to run a one-way ANOVA. If we first run a non-parametric ANOVA and then decide to run a robust ANOVA instead, the syntax remains the same and the statistical approach can be modified by changing a single argument:

# load the package before running the examples below
library(statsExpressions)

mtcars %>% oneway_anova(cyl, wt, type = "nonparametric")
#> # A tibble: 1 × 15
#>   parameter1 parameter2 statistic df.error   p.value
#>   <chr>      <chr>          <dbl>    <int>     <dbl>
#> 1 wt         cyl             22.8        2 0.0000112
#>   method                       effectsize      estimate conf.level conf.low
#>   <chr>                        <chr>              <dbl>      <dbl>    <dbl>
#> 1 Kruskal-Wallis rank sum test Epsilon2 (rank)    0.736       0.95    0.624
#>   conf.high conf.method          conf.iterations n.obs expression
#>       <dbl> <chr>                          <int> <int> <list>    
#> 1         1 percentile bootstrap             100    32 <language>

mtcars %>% oneway_anova(cyl, wt, type = "robust")
#> # A tibble: 1 × 12
#>   statistic    df df.error p.value
#>       <dbl> <dbl>    <dbl>   <dbl>
#> 1      12.7     2     12.2 0.00102
#>   method                                           
#>   <chr>                                            
#> 1 A heteroscedastic one-way ANOVA for trimmed means
#>   effectsize                         estimate conf.level conf.low conf.high
#>   <chr>                                 <dbl>      <dbl>    <dbl>     <dbl>
#> 1 Explanatory measure of effect size     1.05       0.95    0.843      1.50
#>   n.obs expression
#>   <int> <list>    
#> 1    32 <language>

All possible output dataframes from functions are tabulated here: https://indrajeetpatil.github.io/statsExpressions/articles/web_only/dataframe_outputs.html

Because the output is a dataframe, it can also be passed to `knitr::kable()` to generate a table:

set.seed(123)

# one-sample robust t-test
# we will leave `expression` column out; it's not needed for using only the dataframe
mtcars %>%
  one_sample_test(wt, test.value = 3, type = "robust") %>%
  dplyr::select(-expression) %>%
  knitr::kable()

| statistic | p.value | n.obs | method | effectsize | estimate | conf.level | conf.low | conf.high |
|---:|---:|---:|:---|:---|---:|---:|---:|---:|
| 1.179181 | 0.275 | 32 | Bootstrap-t method for one-sample test | Trimmed mean | 3.197 | 0.95 | 2.854246 | 3.539754 |

These functions are also compatible with other popular data manipulation packages.

For example, let’s say we want to run a one-sample t-test for all levels of a certain grouping variable. We can use dplyr to do so:

# for reproducibility
set.seed(123)
library(dplyr)

# grouped operation
# running one-sample test for all levels of grouping variable `cyl`
mtcars %>%
  group_by(cyl) %>%
  group_modify(~ one_sample_test(.x, wt, test.value = 3), .keep = TRUE) %>%
  ungroup()
#> # A tibble: 3 × 16
#>     cyl    mu statistic df.error  p.value method            alternative
#>   <dbl> <dbl>     <dbl>    <dbl>    <dbl> <chr>             <chr>      
#> 1     4     3    -4.16        10 0.00195  One Sample t-test two.sided  
#> 2     6     3     0.870        6 0.418    One Sample t-test two.sided  
#> 3     8     3     4.92        13 0.000278 One Sample t-test two.sided  
#>   effectsize estimate conf.level conf.low conf.high conf.method
#>   <chr>         <dbl>      <dbl>    <dbl>     <dbl> <chr>      
#> 1 Hedges' g    -1.16        0.95   -1.88     -0.402 ncp        
#> 2 Hedges' g     0.286       0.95   -0.388     0.937 ncp        
#> 3 Hedges' g     1.24        0.95    0.544     1.91  ncp        
#>   conf.distribution n.obs expression
#>   <chr>             <int> <list>    
#> 1 t                    11 <language>
#> 2 t                     7 <language>
#> 3 t                    14 <language>

Using expressions in custom plots

Note that *expression* here means a pre-formatted in-text statistical result. In addition to the other details contained in the dataframe, there is a column titled `expression`, which contains an expression with statistical details that can be displayed in a plot.
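A minimal sketch of displaying such an expression in a plot, assuming {ggplot2} is installed. The dataset (`iris`), the boxplot geometry, and the plot title are our own choices for illustration; the only package-specific step is extracting the first element of the `expression` list column.

```r
library(statsExpressions)
library(ggplot2)

# for reproducibility
set.seed(123)

# run the analysis; the result includes an `expression` list column
res <- oneway_anova(iris, Species, Sepal.Length)

# extract the expression and use it as the plot subtitle
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(
    title = "Sepal length by species",
    subtitle = res$expression[[1]]
  )

p
```

Because the column is a list, `[[1]]` is needed to pull out the language object itself rather than a one-element list.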

For all statistical test expressions, the default template att
