Plotastic
Streamlining statistical analysis by using plotting keywords in Python.
plotastic: Bridging Plotting and Statistics
📦 Installation
Install from PyPI:
pip install plotastic
Install from GitHub (experimental, check CHANGELOG.md):
pip install git+https://github.com/markur4/plotastic.git
Requirements
- Python >= 3.11 (not tested with earlier versions)
- pandas == 1.5.3 (pingouin needs this)
- seaborn <= 0.12.2 (later versions reworked hue)
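To check whether an existing environment already satisfies these pins, a small standard-library sketch like the one below may help. The helpers `parse_version`, `meets_requirements` and `installed_version` are hypothetical and not part of plotastic, and the version parsing is deliberately naive: it reads only the leading numeric components of a version string.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Naively read the leading numeric components, e.g. '0.12.2' -> (0, 12, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_requirements(pandas_version: str, seaborn_version: str) -> Dict[str, bool]:
    """Compare two version strings against the pins listed above."""
    return {
        "pandas == 1.5.3": parse_version(pandas_version) == (1, 5, 3),
        "seaborn <= 0.12.2": parse_version(seaborn_version) <= (0, 12, 2),
    }


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print what is currently installed (None means the package is missing)
    for name in ("pandas", "seaborn"):
        print(name, installed_version(name))
```

If either check fails, installing plotastic into a fresh virtual environment is usually simpler than downgrading packages in place.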
📷 Example Gallery
<details><summary> <b><i> (click to unfold) </i></b> </summary> <blockquote> <hr> <h1 align="center"> <hr> 🐁 Click on Images for Code! 🐁 <hr><a href=https://github.com/markur4/plotastic/blob/main/EXAMPLES/qpcr.ipynb>
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/qpcr1.png"
alt="qpcr1">
</a>
<a href=https://github.com/markur4/plotastic/blob/main/EXAMPLES/fmri.ipynb>
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/fmri2.png"
width="500px"
alt="fmri2">
</a>
<a href="https://github.com/markur4/plotastic/blob/main/EXAMPLES/attention.ipynb">
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/attention1.png"
width="250px"
alt="attention">
</a>
<a href=https://github.com/markur4/plotastic/blob/main/EXAMPLES/tips.ipynb>
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/tips1.png"
width="350px" alt="tips1">
</a>
<a href=https://github.com/markur4/plotastic/blob/main/EXAMPLES/iris.ipynb>
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/iris1.png"
width="400px" alt="iris1">
</a>
<a href="https://github.com/markur4/plotastic/blob/main/EXAMPLES/cars.ipynb">
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/cars1.png"
alt="cars1">
</a>
<a href="https://github.com/markur4/plotastic/blob/main/EXAMPLES/diamonds.ipynb">
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/diamonds1.png"
alt="diamonds1">
<img
src="https://raw.githubusercontent.com/markur4/plotastic/main/EXAMPLES/diamonds2.png"
alt="diamonds2">
</a>
🧑‍🏫 About plotastic
<details><summary> 🤔<b><i> Summary </i></b> </summary>
<blockquote>
<hr>
plotastic addresses the challenges of transitioning from exploratory
data analysis to hypothesis testing in Python's data science ecosystem.
Bridging the gap between seaborn and pingouin, this library offers a
unified environment for plotting and statistical analysis. It simplifies
the workflow with a user-friendly syntax and seamless integration with
familiar seaborn parameters (y, x, hue, row, col). Inspired by
seaborn's consistency, plotastic utilizes a DataAnalysis object to
intelligently pass parameters to pingouin statistical functions. The
library systematically groups the data according to the needs of
statistical tests and plots, performs visualisation and analysis, and
supports extensive customization options. In essence, plotastic
establishes a protocol for configuring statistical analyses through
plotting parameters. This approach streamlines the process by translating
seaborn parameters into statistical terms, providing researchers and
data scientists with a cohesive and user-friendly solution in Python.
Workflow:
- 🧮 Import & Prepare your pandas DataFrame
  - We require a long-format pandas DataFrame with categorical columns
  - If it works with seaborn, it works with plotastic!
- 🔀 Make a DataAnalysis Object
  - DataAnalysis(DataFrame, dims={x, y, hue, row, col})
  - Check for empty data groups, differing sample sizes, NaN-count, etc. automatically
- ✅ Explore Data
  - Check data integrity, unequal sample sizes, empty groups, etc.
  - Quick preliminary plotting with e.g. DataAnalysis.catplot()
- 🔨 Adapt Data
  - Categorize multiple columns at once
  - Transform dependent variable
  - Each step warns you if you unknowingly introduced NaNs!
  - etc.
- ✨ Perform Statistical Tests ✨
  - Check Normality, Homoscedasticity, Sphericity
  - Perform Omnibus tests (ANOVA, RMANOVA, Kruskal-Wallis, Friedman)
  - Perform PostHoc tests (Tukey, Dunn, Wilcoxon, etc.) based on pg.pairwise_tests()
- 📊 Plot figure
  - Use pre-defined and optimized multi-layered plots with one line (e.g. strip over box)!
  - Annotate statistical results (p-values as *, **, ***, etc.) with full control over which data to include or exclude!
- 💿 Save all results at once!
  - One DataAnalysis object holds:
    - One DataFrame in self.data
    - One Figure in self.fig, self.axes
    - Multiple statistical results: self.results
  - Use DataAnalysis.save_statistics() to save all results to different sheets collected in one .xlsx file (one sheet per test)
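The integrity checks named in the workflow (empty groups, unequal sample sizes, NaN counts) can be illustrated without plotastic. The following is a minimal standard-library sketch of the idea, not plotastic's implementation; the function `summarize_groups` and the toy records are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def summarize_groups(rows, factors, dv):
    """Count samples per level combination and report common data problems.

    rows:    list of dicts, one dict per observation (long format)
    factors: column names that define the groups, e.g. ["day", "smoker"]
    dv:      name of the dependent variable
    """
    # Collect the levels of each factor in order of first appearance
    levels = {f: list(dict.fromkeys(row[f] for row in rows)) for f in factors}

    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    # Count valid (non-NaN) observations per group
    counts = Counter(
        tuple(row[f] for f in factors) for row in rows if not is_nan(row[dv])
    )
    nan_count = sum(is_nan(row[dv]) for row in rows)

    # Every possible level combination, including those without any data
    all_groups = list(product(*(levels[f] for f in factors)))
    sizes = {group: counts[group] for group in all_groups}
    return {
        "sizes": sizes,
        "empty_groups": [g for g in all_groups if sizes[g] == 0],
        "equal_sample_sizes": len(set(sizes.values())) == 1,
        "nan_count": nan_count,
    }


# Made-up observations in long format
rows = [
    {"day": "Thur", "smoker": "yes", "tip": 2.0},
    {"day": "Thur", "smoker": "no", "tip": 3.5},
    {"day": "Fri", "smoker": "yes", "tip": float("nan")},
    {"day": "Fri", "smoker": "yes", "tip": 1.5},
]
report = summarize_groups(rows, factors=["day", "smoker"], dv="tip")
print(report["empty_groups"])        # level combinations without valid data
print(report["equal_sample_sizes"])  # whether all groups have the same size
print(report["nan_count"])           # number of missing dependent values
```

Running such a check before any test matters because an empty or very small group can silently change which statistical tests are applicable.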
In Principle:
- Categorical data is separable into seaborn's categorization parameters: x, y, hue, row, col. We call those "dimensions".
- These dimensions are assigned to statistical terms:
  - y is the dependent variable (DV)
  - x and hue are independent variables (IV) and are treated as within/between factors (categorical variables)
  - row and col are grouping variables (categorical variables)
  - A subject may be specified for within/paired study designs (categorical variable)
- For each level of row or col (or for each combination of row and col levels), statistical tests will be performed with regard to the two factors x and hue
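The assignment of dimensions to statistical terms can be written down as a small translation step. This is an illustrative sketch only: `to_statistical_terms` is a hypothetical helper that does not reflect plotastic's internals, and the rule "a subject implies a within design" is a simplifying assumption that ignores mixed designs.

```python
def to_statistical_terms(dims, subject=None):
    """Translate seaborn-style dimensions into statistical roles.

    dims:    dict with key "y" and any of "x", "hue", "row", "col"
    subject: optional column identifying subjects (within/paired designs)
    """
    terms = {
        "dependent_variable": dims["y"],
        # x and hue act as the (up to two) factors of a test
        "factors": [dims[k] for k in ("x", "hue") if k in dims],
        # row and col split the data; one test per level combination
        "grouping": [dims[k] for k in ("row", "col") if k in dims],
        # Simplifying assumption: a subject column implies a within design
        "design": "within" if subject is not None else "between",
    }
    if subject is not None:
        terms["subject"] = subject
    return terms


dims = dict(y="tip", x="day", hue="gender", col="smoker", row="age-group")
terms = to_statistical_terms(dims)
print(terms["factors"])   # ['day', 'gender']
print(terms["grouping"])  # ['age-group', 'smoker']
print(terms["design"])    # between
```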
Example with ANOVA:
- Imagine this example data:
  - Each day, you record the tips given by a group of people.
  - For each tip, you note down the day, gender, age-group and whether the person smokes or not.
  - Hence, this data has 4 categorical dimensions, each with 2 or more levels:
    - day: 4 levels (monday, tuesday, wednesday, thursday)
    - gender: 2 levels (male, female)
    - smoker: 2 levels (yes, no)
    - age-group: 2 levels (young, old)
- Each category is assigned to a place in a plot, and when calling
statistical tests, we assign them to statistical terms (in comments):
  ```python
  # dims is short for dimensions
  dims = dict(            # STATISTICAL TERM:
      y = "tip",          # y-axis, dependent variable
      x = "day",          # x-axis, independent variable (factor)
      hue = "gender",     # color,  independent variable (factor)
      col = "smoker",     # axes,   grouping variable
      row = "age-group",  # axes,   grouping variable
  )
  ```
- We perform statistical testing groupwise:
  - For each combination of smoker and age-group levels, a two-way ANOVA will be performed (with day and gender as between factors for each data group):
    - 1st ANOVA assesses data points where smoker=yes AND age-group=young
    - 2nd ANOVA assesses data points where smoker=yes AND age-group=old
    - 3rd ANOVA assesses data points where smoker=no AND age-group=young
    - 4th ANOVA assesses data points where smoker=no AND age-group=old
  - Three-way ANOVAs are not possible (yet), since that would require setting e.g. col as the third factor, or implementing another dimension (e.g. hue2).
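The groupwise procedure above amounts to splitting the data by every combination of smoker and age-group levels and running one test per subset. A minimal standard-library sketch of that split could look like the following; the records are made up for illustration, and the ANOVA itself is left out.

```python
from itertools import product

# Levels of the two grouping variables from the example
smoker_levels = ["yes", "no"]
age_levels = ["young", "old"]

# Made-up observations in long format (one dict per tip)
rows = [
    {"tip": 2.0, "day": "monday", "gender": "male", "smoker": "yes", "age-group": "young"},
    {"tip": 3.0, "day": "tuesday", "gender": "female", "smoker": "yes", "age-group": "old"},
    {"tip": 1.5, "day": "monday", "gender": "female", "smoker": "no", "age-group": "young"},
    {"tip": 4.0, "day": "thursday", "gender": "male", "smoker": "no", "age-group": "old"},
    {"tip": 2.5, "day": "wednesday", "gender": "male", "smoker": "no", "age-group": "old"},
]

# One subset per level combination; each subset would feed one two-way ANOVA
# with day and gender as factors
subsets = {
    (smoker, age): [
        r for r in rows if r["smoker"] == smoker and r["age-group"] == age
    ]
    for smoker, age in product(smoker_levels, age_levels)
}

for i, ((smoker, age), subset) in enumerate(subsets.items(), start=1):
    print(f"ANOVA {i}: smoker={smoker} AND age-group={age} ({len(subset)} rows)")
```

The iteration order of `product` reproduces the order of the four ANOVAs listed above, and every observation ends up in exactly one subset.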
This software was inspired by ...
- ... Intuitive Biostatistics - Fourth Edition (2017); Harvey Motulsky
- ... An Introduction to Statistical Learning with Applications in Python - First Edition (2023); Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor
- ... talking to other scientists struggling with statistics
✅ plotastic can help you with...
- ... gaining some practical experience when learning statistics
- ... quickly gaining statistical insights into your data without switching to other software
- ... taking first steps towards a full statistical analysis
- ... plotting publication-grade figures (check statistics results with other software)
- ... publication-grade statistical analysis IF you really know what you're doing OR you have had your results checked by a professional statistician
- ... quickly testing data transformations (log)
🚫 plotastic can NOT ...
- ... replace a professional statistician
- ... teac
