Pingouin

Statistical package in Python based on Pandas

.. -*- mode: rst -*-

|

.. image:: https://badge.fury.io/py/pingouin.svg
    :target: https://badge.fury.io/py/pingouin

.. image:: https://img.shields.io/conda/vn/conda-forge/pingouin.svg
    :target: https://anaconda.org/conda-forge/pingouin

.. image:: https://img.shields.io/github/license/raphaelvallat/pingouin.svg
    :target: https://github.com/raphaelvallat/pingouin/blob/master/LICENSE

.. image:: https://github.com/raphaelvallat/pingouin/actions/workflows/pytest.yml/badge.svg
    :target: https://github.com/raphaelvallat/pingouin/actions

.. image:: https://codecov.io/gh/raphaelvallat/pingouin/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/raphaelvallat/pingouin

.. image:: https://pepy.tech/badge/pingouin/month
    :target: https://pepy.tech/badge/pingouin/month

.. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
    :target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244

.. image:: https://pingouin-stats.org/_images/logo_pingouin.png
    :align: center

Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. For a full list of available functions, please refer to the `API documentation <https://pingouin-stats.org/api.html>`_.

  1. ANOVAs: N-way, repeated measures, mixed, ANCOVA

  2. Pairwise post-hoc tests (parametric and non-parametric) and pairwise correlations

  3. Robust, partial, distance and repeated measures correlations

  4. Linear/logistic regression and mediation analysis

  5. Bayes Factors

  6. Multivariate tests

  7. Reliability and consistency

  8. Effect sizes and power analysis

  9. Parametric/bootstrapped confidence intervals around an effect size or a correlation coefficient

  10. Circular statistics

  11. Chi-squared tests

  12. Plotting: Bland-Altman plot, Q-Q plot, paired plot, robust correlation...

Pingouin is designed for users who want simple yet exhaustive statistical functions.

For example, the :code:`ttest_ind` function of SciPy returns only the T-value and the p-value. By contrast, the :code:`ttest` function of Pingouin returns the T-value, the p-value, the degrees of freedom, the effect size (Cohen's d), the 95% confidence intervals of the difference in means, the statistical power and the Bayes Factor (BF10) of the test.
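Several of those extra quantities follow from textbook formulas. As a rough illustration, the sketch below computes the pooled-variance T-value, the degrees of freedom and a signed Cohen's d for two independent samples using only the Python standard library. The helper name ``student_t_and_cohen_d`` is hypothetical, and this is not Pingouin's implementation.

.. code-block:: python

    from math import sqrt
    from statistics import mean, variance


    def student_t_and_cohen_d(x, y):
        """Return (t, dof, d) for two independent samples, assuming equal variances."""
        nx, ny = len(x), len(y)
        dof = nx + ny - 2
        # Pooled standard deviation from the two unbiased sample variances
        pooled_sd = sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / dof)
        diff = mean(x) - mean(y)
        # Student's t statistic and signed Cohen's d share the same numerator
        t = diff / (pooled_sd * sqrt(1 / nx + 1 / ny))
        d = diff / pooled_sd
        return t, dof, d


    t, dof, d = student_t_and_cohen_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(f"T = {t:.3f}, dof = {dof}, Cohen's d = {d:.3f}")

With equal group sizes ``n``, the two statistics are linked by ``t = d * sqrt(n / 2)``, which is a quick sanity check on any reported pair of values.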

Documentation

  • `Link to documentation <https://pingouin-stats.org/index.html>`_

Chat

If you have questions, please ask them in `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_.

Installation

Dependencies

The main dependencies of Pingouin are:

  • `NumPy <https://numpy.org/>`_ >= 1.22.4
  • `SciPy <https://www.scipy.org/>`_ >= 1.8.0
  • `Pandas <https://pandas.pydata.org/>`_ >= 2.1.1
  • `Pandas-flavor <https://github.com/Zsailer/pandas_flavor>`_
  • `Statsmodels <https://www.statsmodels.org/>`_ >= 0.14.1
  • `Matplotlib <https://matplotlib.org/>`_
  • `Seaborn <https://seaborn.pydata.org/>`_
  • `Scikit-learn <https://scikit-learn.org/>`_ >= 1.2.2
  • `Tabulate <https://github.com/astanin/python-tabulate>`_

Some functions additionally require:

  • `Mpmath <http://mpmath.org/>`_

Pingouin is a Python 3 package and is currently tested for Python 3.10+.
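To see whether an existing environment already meets the minimum versions listed above, a small standard-library check along the following lines may help. This is only a sketch: the helper names ``as_tuple`` and ``check`` are hypothetical, the distribution names are assumed to match those on PyPI, and the version parsing is deliberately simplistic (it keeps only the leading digits of the first three components).

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version
    from itertools import takewhile

    # Minimum versions copied from the dependency list; packages listed
    # without a minimum version are left out of this check.
    MINIMUMS = {
        "numpy": "1.22.4",
        "scipy": "1.8.0",
        "pandas": "2.1.1",
        "statsmodels": "0.14.1",
        "scikit-learn": "1.2.2",
    }


    def as_tuple(ver):
        """Turn '1.22.4' into (1, 22, 4), dropping suffixes such as 'rc1'."""
        parts = []
        for piece in ver.split(".")[:3]:
            digits = "".join(takewhile(str.isdigit, piece))
            parts.append(int(digits) if digits else 0)
        return tuple(parts)


    def check(minimums=MINIMUMS):
        """Return {package: status string} for each minimum requirement."""
        report = {}
        for name, minimum in minimums.items():
            try:
                installed = version(name)
            except PackageNotFoundError:
                report[name] = "not installed"
                continue
            ok = as_tuple(installed) >= as_tuple(minimum)
            report[name] = f"{installed} ({'ok' if ok else 'older than ' + minimum})"
        return report


    for name, status in check().items():
        print(f"{name}: {status}")

In practice the installer resolves these constraints automatically, so a check like this is mainly useful when debugging an environment that was assembled by hand.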

User installation

Pingouin can be easily installed using `uv <https://docs.astral.sh/uv/>`_

.. code-block:: shell

    uv pip install pingouin

pip

.. code-block:: shell

    pip install pingouin

or conda

.. code-block:: shell

    conda install -c conda-forge pingouin

New releases are frequent, so always make sure that you have the latest version:

.. code-block:: shell

    uv pip install --upgrade pingouin

Development

To build and install from source, clone this repository and install in editable mode with `uv <https://docs.astral.sh/uv/>`_

.. code-block:: shell

    git clone https://github.com/raphaelvallat/pingouin.git
    cd pingouin
    uv pip install --group=dev --editable .

    # test the package
    pytest --verbose

Quick start

Click on the link below and navigate to the notebooks/ folder to run a collection of interactive Jupyter notebooks showing the main functionalities of Pingouin. You do not need to install Pingouin beforehand, because the notebooks run in a Binder environment.

.. image:: https://mybinder.org/badge.svg
    :target: https://mybinder.org/v2/gh/raphaelvallat/pingouin/develop

10 minutes to Pingouin

1. T-test
#########

.. code-block:: python

    import numpy as np
    import pingouin as pg

    np.random.seed(123)
    mean, cov, n = [4, 5], [(1, .6), (.6, 1)], 30
    x, y = np.random.multivariate_normal(mean, cov, n).T

    # T-test
    pg.ttest(x, y)

.. table:: Output
    :widths: auto

    ======  =====  =============  =======  =============  =========  ======  =======
    T       dof    alternative    p_val    CI95           cohen_d    BF10    power
    ======  =====  =============  =======  =============  =========  ======  =======
    -3.401  58     two-sided      0.001    [-1.68 -0.43]  0.878      26.155  0.917
    ======  =====  =============  =======  =============  =========  ======  =======


2. Pearson's correlation
########################

.. code-block:: python

    pg.corr(x, y)

.. table:: Output
    :widths: auto

    ===  =====  ===========  =======  ======  =======
    n    r      CI95         p_val    BF10    power
    ===  =====  ===========  =======  ======  =======
    30   0.595  [0.3 0.79]   0.001    69.723  0.950
    ===  =====  ===========  =======  ======  =======
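A standard way to obtain a parametric confidence interval for a Pearson correlation is the Fisher z-transform. The sketch below applies it to the rounded ``r`` and ``n`` from the output above, using only the standard library. The function name ``fisher_ci`` is hypothetical, and whether Pingouin computes ``CI95`` in exactly this way is an assumption to be checked against its API documentation.

.. code-block:: python

    from math import atanh, sqrt, tanh
    from statistics import NormalDist


    def fisher_ci(r, n, confidence=0.95):
        """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
        z = atanh(r)                # map r onto an approximately normal scale
        se = 1 / sqrt(n - 3)        # standard error of z
        crit = NormalDist().inv_cdf(0.5 + confidence / 2)
        # Build the interval on the z scale, then map it back to the r scale
        return tanh(z - crit * se), tanh(z + crit * se)


    low, high = fisher_ci(0.595, 30)
    print(f"[{low:.2f}, {high:.2f}]")

The resulting bounds should land close to the interval reported in the table, with small differences expected because the input ``r`` is already rounded.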


3. Robust correlation
#####################

.. code-block:: python

    # Introduce an outlier
    x[5] = 18

    # Use the robust biweight midcorrelation
    pg.corr(x, y, method="bicor")

.. table:: Output
    :widths: auto

    ===  =====  ===========  =======  =======
    n    r      CI95         p_val    power
    ===  =====  ===========  =======  =======
    30   0.576  [0.27 0.78]  0.001    0.933
    ===  =====  ===========  =======  =======


4. Test the normality of the data
#################################

The :code:`pingouin.normality` function works with lists, arrays, or pandas DataFrames in wide or long format.

.. code-block:: python

    print(pg.normality(x))                                     # Univariate normality
    print(pg.multivariate_normality(np.column_stack((x, y))))  # Multivariate normality

.. table:: Output
    :widths: auto

    =====  ======  ========
    W      pval    normal
    =====  ======  ========
    0.615  0.000   False
    =====  ======  ========

.. parsed-literal::

    (False, 0.00018)


5. One-way ANOVA using a pandas DataFrame
#########################################

.. code-block:: python

    # Read an example dataset
    df = pg.read_dataset('mixed_anova')

    # Run the ANOVA
    aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
    print(aov)

.. table:: Output
    :widths: auto

    ========  =======  ====  =====  =======  =======  =======
    Source    SS       DF    MS     F        p_unc    np2
    ========  =======  ====  =====  =======  =======  =======
    Group     5.460    1     5.460  5.244    0.023    0.029
    Within    185.343  178   1.041  nan      nan      nan
    ========  =======  ====  =====  =======  =======  =======
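The columns of a detailed one-way ANOVA table are tied together by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, ``F`` is the ratio of the two mean squares, and partial eta-squared (``np2``) is the share of the effect in the effect-plus-error sum of squares. The sketch below recomputes ``F`` and ``np2`` from the rounded ``SS`` and ``DF`` values shown above. The helper name ``f_and_np2`` is hypothetical.

.. code-block:: python

    def f_and_np2(ss_effect, df_effect, ss_error, df_error):
        """Return (F, partial eta-squared) from sums of squares and degrees of freedom."""
        ms_effect = ss_effect / df_effect   # mean square of the effect
        ms_error = ss_error / df_error      # mean square of the error term
        f_value = ms_effect / ms_error
        np2 = ss_effect / (ss_effect + ss_error)
        return f_value, np2


    # SS and DF taken from the "Group" and "Within" rows of the table
    f_value, np2 = f_and_np2(5.460, 1, 185.343, 178)
    print(f"F = {f_value:.2f}, np2 = {np2:.3f}")

Because the inputs are rounded to three decimals, the recomputed values agree with the table only up to rounding.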


6. Repeated measures ANOVA
##########################

.. code-block:: python

    pg.rm_anova(data=df, dv='Scores', within='Time', subject='Subject', detailed=True)

.. table:: Output
    :widths: auto

    ========  =======  ====  =====  =======  =======  =======  =======
    Source    SS       DF    MS     F        p_unc    ng2      eps
    ========  =======  ====  =====  =======  =======  =======  =======
    Time      7.628    2     3.814  3.913    0.023    0.04     0.999
    Error     115.027  118   0.975  nan      nan      nan      nan
    ========  =======  ====  =====  =======  =======  =======  =======


7. Post-hoc tests corrected for multiple comparisons
####################################################

.. code-block:: python

    # FDR-corrected post hocs with Hedges' g effect size
    posthoc = pg.pairwise_tests(data=df, dv='Scores', within='Time', subject='Subject',
                                parametric=True, padjust='fdr_bh', effsize='hedges')

    # Pretty printing of table
    pg.print_table(posthoc, floatfmt='.3f')

.. table:: Output
    :widths: auto

    ==========  =======  =======  ========  ============  ======  ======  =============  =======  ========  ==========  ======  ========
    Contrast    A        B        Paired    Parametric    T       dof     alternative    p_unc    p_corr    p_adjust    BF10    hedges
    ==========  =======  =======  ========  ============  ======  ======  =============  =======  ========  ==========  ======  ========
    Time        August   January  True      True          -1.740  59.000  two-sided      0.087    0.131     fdr_bh      0.582   -0.328
    Time        August   June     True      True          -2.743  59.000  two-sided      0.008    0.024     fdr_bh      4.232   -0.483
    Time        January  June     True      True          -1.024  59.000  two-sided      0.310    0.310     fdr_bh      0.232   -0.170
    ==========  =======  =======  ========  ============  ======  ======  =============  =======  ========  ==========  ======  ========
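The ``fdr_bh`` label refers to the Benjamini-Hochberg false discovery rate procedure. To make the link between the ``p_unc`` and ``p_corr`` columns concrete, the sketch below implements the textbook version of that adjustment in plain Python and applies it to the three rounded uncorrected p-values above. The function name ``fdr_bh`` is reused here only for illustration, and this is not Pingouin's own code.

.. code-block:: python

    def fdr_bh(pvals):
        """Benjamini-Hochberg adjusted p-values, returned in the input order."""
        m = len(pvals)
        # Indices of the p-values sorted from smallest to largest
        order = sorted(range(m), key=lambda i: pvals[i])
        adjusted = [0.0] * m
        running_min = 1.0
        # Walk from the largest p-value down so adjusted values stay monotonic
        for rank in range(m, 0, -1):
            idx = order[rank - 1]
            running_min = min(running_min, pvals[idx] * m / rank)
            adjusted[idx] = running_min
        return adjusted


    p_unc = [0.087, 0.008, 0.310]   # August-January, August-June, January-June
    p_corr = fdr_bh(p_unc)
    print([f"{p:.4f}" for p in p_corr])

The adjusted values should match the ``p_corr`` column up to rounding, since the inputs used here are already rounded to three decimals.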


8. Two-way mixed ANOVA
######################

.. code-block:: python

    # Compute the two-way mixed ANOVA
    aov = pg.mixed_anova(data=df, dv='Scores', between='Group', within='Time',
                         subject='Subject', correction=False, effsize="np2")
    pg.print_table(aov)

.. table:: Output
    :widths: auto

    ===========  =====  =====  =====  =====  =====  =======  =====  =======
    Source       SS     DF1    DF2    MS     F      p_unc    np2    eps
    ===========  =====  =====  =====  =====  =====  =======  =====  =======
    Group        5.460  1      58     5.460  5.052  0.028
    ===========  =====  =====  =====  =====  =====  =======  =====  =======
