Factoextra
Extract and Visualize the Results of Multivariate Data Analyses
factoextra: Extract and Visualize the Results of Multivariate Data Analyses
factoextra is an R package that makes it easy to extract and visualize the output of exploratory multivariate data analyses, including:
- Principal Component Analysis (PCA), which summarizes the information contained in continuous (i.e., quantitative) multivariate data by reducing the dimensionality of the data without losing important information.
- Correspondence Analysis (CA), an extension of principal component analysis suited to analyzing a large contingency table formed by two qualitative (categorical) variables.
- Multiple Correspondence Analysis (MCA), an adaptation of CA to a data table containing more than two categorical variables.
- Multiple Factor Analysis (MFA), dedicated to datasets where variables are organized into groups (qualitative and/or quantitative variables).
- Hierarchical Multiple Factor Analysis (HMFA), an extension of MFA to situations where the data are organized into a hierarchical structure.
- Factor Analysis of Mixed Data (FAMD), a particular case of MFA, dedicated to analyzing a data set containing both quantitative and qualitative variables.
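The following is a minimal sketch of the CA case from the list above. It assumes FactoMineR and factoextra are installed and uses the `housetasks` contingency table that ships with factoextra. FactoMineR's `CA()` computes the analysis, and factoextra's `get_ca_row()` and `fviz_ca_biplot()` extract and plot the results.

```r
library("FactoMineR")
library("factoextra")

# housetasks: a 13 x 4 contingency table (household tasks x who performs them)
data("housetasks", package = "factoextra")

# Compute the correspondence analysis without FactoMineR's base graphics
res.ca <- CA(housetasks, graph = FALSE)

# Eigenvalues/variances of the dimensions
eig <- get_eigenvalue(res.ca)
print(eig)

# Coordinates, cos2 and contributions of the row categories
row <- get_ca_row(res.ca)
head(row$contrib)

# Symmetric biplot of rows and columns (ggplot2-based)
fviz_ca_biplot(res.ca, repel = TRUE)
```

With 4 columns, CA can extract at most 4 - 1 = 3 dimensions, so the eigenvalue table has three rows.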
A number of R packages implement principal component methods, including FactoMineR, ade4, stats, ca, MASS and ExPosition.
However, each package presents its results differently. To help interpret and visualize multivariate analyses, such as cluster analysis and dimensionality reduction, we developed an easy-to-use R package named factoextra.
- The R package factoextra has flexible, easy-to-use methods to quickly extract the analysis results from the different packages mentioned above in a standard, human-readable data format.
- It produces elegant ggplot2-based data visualizations with less typing.
- It also contains many functions that facilitate clustering analysis and visualization.
We’ll use (i) the FactoMineR package (Sebastien Le et al., 2008) to compute PCA, (M)CA, FAMD, MFA and HCPC, and (ii) the factoextra package to extract and visualize the results.
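The following is a minimal sketch of this two-step workflow. The four numeric columns of R's built-in `iris` data serve as illustrative input; this dataset choice is an assumption and does not come from this README. FactoMineR's `PCA()` computes the analysis, and factoextra then extracts and plots the results.

```r
library("FactoMineR")
library("factoextra")

# Step 1: compute a standardized PCA with FactoMineR (no base graphics)
res.pca <- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Step 2a: extract the eigenvalues/variances as a standard table
eig <- get_eigenvalue(res.pca)
print(eig)

# Step 2b: visualize with ggplot2-based graphs
fviz_eig(res.pca, addlabels = TRUE)           # scree plot
fviz_pca_ind(res.pca, geom.ind = "point",
             col.ind = iris$Species,          # colour individuals by species
             addEllipses = TRUE, legend.title = "Species")
```

Because the variables are standardized, the eigenvalues sum to the number of variables (4 here).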
The figure below shows the methods whose outputs can be visualized using the factoextra package. The official online documentation is available at: https://rpkgs.datanovia.com/factoextra/index.html.
<figure> <img src="tools/factoextra-r-package.png" alt="factoextra R package" /> <figcaption aria-hidden="true">factoextra R package</figcaption> </figure>
Why use factoextra?
- The factoextra R package can handle the results of PCA, CA, MCA, MFA, FAMD and HMFA from several packages, extracting and visualizing the most important information contained in your data.
- After PCA, CA, MCA, MFA, FAMD and HMFA, the most important row/column elements can be highlighted using:
  - their cos2 values, which correspond to their quality of representation on the factor map;
  - their contributions to the definition of the principal dimensions.
<span class="success">If you want to do this, the factoextra package provides a convenient solution.</span>
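The following is a minimal sketch of highlighting elements by cos2 and contribution after a PCA. It uses the active part of the `decathlon2` data shipped with factoextra (the first 23 athletes and first 10 events). The dataset and the colour values are illustrative choices. `get_pca_var()`, `fviz_cos2()`, `fviz_contrib()` and `fviz_pca_var()` are factoextra functions.

```r
library("FactoMineR")
library("factoextra")

data("decathlon2", package = "factoextra")
decathlon2.active <- decathlon2[1:23, 1:10]   # active individuals and variables

res.pca <- PCA(decathlon2.active, graph = FALSE)

# Coordinates, cos2 (quality of representation) and contributions of variables
var <- get_pca_var(res.pca)
head(var$cos2)
head(var$contrib)

# Bar plot of cos2 on the plane Dim.1-Dim.2
fviz_cos2(res.pca, choice = "var", axes = 1:2)

# Bar plot of the variable contributions to Dim.1
fviz_contrib(res.pca, choice = "var", axes = 1, top = 10)

# Correlation circle coloured by cos2
fviz_pca_var(res.pca, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)
```

Contributions are percentages, so within each dimension they sum to 100.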
- PCA and (M)CA are sometimes used for prediction problems: one can predict the coordinates of new supplementary variables (quantitative and qualitative) and supplementary individuals using the information provided by a previously performed PCA or (M)CA. This can be done easily using the FactoMineR package.
<span class="success">If you want to make predictions with PCA/MCA and visualize the position of the supplementary variables/individuals on the factor map using ggplot2, factoextra can help you. It’s quick: write less and do more.</span>
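The following is a minimal sketch of prediction with supplementary elements. It follows a common layout of the `decathlon2` data shipped with factoextra, in which rows 24 to 27 are supplementary individuals, columns 11 to 12 (Rank, Points) are supplementary quantitative variables, and column 13 (Competition) is a supplementary qualitative variable. The index choices are illustrative. FactoMineR's `PCA()` projects the supplementary elements, and factoextra plots them alongside the active ones.

```r
library("FactoMineR")
library("factoextra")

data("decathlon2", package = "factoextra")

# Supplementary elements do not influence the axes; they are only projected
res.pca <- PCA(decathlon2,
               ind.sup    = 24:27,   # supplementary individuals
               quanti.sup = 11:12,   # supplementary quantitative variables
               quali.sup  = 13,      # supplementary qualitative variable
               graph = FALSE)

# Predicted coordinates of the supplementary individuals/variables
res.pca$ind.sup$coord
res.pca$quanti.sup$coord

# factoextra draws supplementary elements in a distinct colour by default
fviz_pca_ind(res.pca, repel = TRUE)
fviz_pca_var(res.pca, repel = TRUE)
```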
- Several functions from different packages (FactoMineR, ade4, ExPosition, stats) are available in R for performing PCA, CA or MCA. However, the components of the output vary from package to package.
<span class="success">No matter which package you decide to use, factoextra can give you human-readable output.</span>
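As an illustration of this package independence, the following minimal sketch computes a PCA with base R's `stats::prcomp()` instead of FactoMineR. The same factoextra extractors and plots are then applied. The `iris` data are again only illustrative.

```r
library("factoextra")

# PCA from base R (centered and scaled), not from FactoMineR
res.prcomp <- prcomp(iris[, 1:4], scale. = TRUE)

# Same standardized extractors as for FactoMineR::PCA() results
eig <- get_eigenvalue(res.prcomp)
var <- get_pca_var(res.prcomp)
head(var$contrib)

# Same ggplot2-based graphs
fviz_eig(res.prcomp)
fviz_pca_var(res.prcomp, col.var = "contrib")
```

For a `prcomp` object, the eigenvalues reported by `get_eigenvalue()` are the squared standard deviations of the components (`res.prcomp$sdev^2`).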
Installing FactoMineR
The FactoMineR package can be installed and loaded as follows:
```r
# Install
install.packages("FactoMineR")
# Load
library("FactoMineR")
```
Installing and loading factoextra
- factoextra can be installed from CRAN as follows:
```r
install.packages("factoextra")
```
- Or install the latest version from GitHub:
```r
# Install remotes only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("kassambara/factoextra")
```
The current maintenance baseline targets:
- R >= 4.1.0
- ggplot2 >= 3.5.2
- ggpubr >= 0.6.3 (CRAN)
- FactoMineR >= 2.13
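To check an existing installation against this baseline, the following small sketch uses base R's `getRversion()` and `utils::packageVersion()`. The `baseline` vector and the checking code are illustrative helpers; they are not part of factoextra.

```r
# Hypothetical helper data: minimum versions from the baseline above
baseline <- c(ggplot2 = "3.5.2", ggpubr = "0.6.3", FactoMineR = "2.13")

# TRUE if the running R is at least 4.1.0
r_ok <- getRversion() >= "4.1.0"

# TRUE for each package that is installed and at least at its baseline version
pkg_ok <- vapply(names(baseline), function(pkg) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= baseline[[pkg]]
}, logical(1))

print(r_ok)
print(pkg_ok)
```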
Load factoextra as follows:
```r
library("factoextra")
```
Recent compatibility updates
```r
# New helper: map legacy FactoMineR category labels
map_factominer_legacy_names(res.mfa, c("var.level"))
# New support: supplementary qualitative categories in FactoMineR FAMD/MFA
get_famd(res.famd, "quali.sup")
fviz_famd_var(res.famd, "quali.sup")
get_mfa(res.mfa, "quali.sup")
fviz_mfa_var(res.mfa, "quali.sup")
# New helper: remove stale package lock directories
clean_lock_files()
# Hopkins statistic uses corrected formula (Wright 2022)
# Set this option to silence the one-time warning
options(factoextra.warn_hopkins = FALSE)
```
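The snippet above assumes that an MFA result `res.mfa` already exists. The following minimal sketch shows one way to obtain an MFA result that contains supplementary qualitative categories. It is adapted from FactoMineR's own `wine` example; the dataset and group layout belong to that example and are not prescribed by this README. The last two calls require a factoextra version that includes the `quali.sup` support listed above.

```r
library("FactoMineR")
library("factoextra")

data("wine", package = "FactoMineR")

# Six groups of columns; the first ("orig": Label and Soil) is categorical ("n").
# Groups 1 and 6 are supplementary, so the "orig" categories become
# supplementary qualitative categories.
res.mfa <- MFA(wine,
               group = c(2, 5, 3, 10, 9, 2),
               type = c("n", rep("s", 5)),
               ncp = 5,
               name.group = c("orig", "olf", "vis", "olfag", "gust", "ens"),
               num.group.sup = c(1, 6),
               graph = FALSE)

# Supplementary categories as stored by FactoMineR
head(res.mfa$quali.var.sup$coord)

# The same categories through factoextra
quali.sup <- get_mfa(res.mfa, "quali.sup")
fviz_mfa_var(res.mfa, "quali.sup")
```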
Main functions in the factoextra package
<span class="warning">See the online documentation (https://rpkgs.datanovia.com/factoextra/index.html) for a complete list.</span>
Visualizing dimension reduction analysis outputs
| Functions | Description |
|----|----|
| fviz_eig (or fviz_eigenvalue) | Extract and visualize the eigenvalues/variances of dimensions. |
| fviz_pca | Graph of individuals/variables from the output of Principal Component Analysis (PCA). |
| fviz_ca | Graph of column/row variables from the output of Correspondence Analysis (CA). |
| fviz_mca | Graph of individuals/variables from the output of Multiple Correspondence Analysis (MCA). |
| fviz_mfa | Graph of individuals/variables from the output of Multiple Factor Analysis (MFA), including supplementary qualitative categories. |
| fviz_famd | Graph of individuals/variables from the output of Factor Analysis of Mixed Data (FAMD), including supplementary qualitative categories. |
| fviz_hmfa | Graph of individuals/variables from the output of Hierarchical Multiple Factor Analysis (HMFA). |
| fviz_ellipses | Draw confidence ellipses around the categories. |
| fviz_cos2 | Visualize the quality of representation of the row/column variable from the results of PCA, CA, MCA functions. |
| fviz_contrib | Visualize the contributions of row/column elements from the results of PCA, CA, MCA functions. |
Extracting data from dimension reduction analysis outputs
| Functions | Description |
|----|----|
| get_eigenvalue | Extract the eigenvalues/variances of dimensions. |
| get_pca | Extract all the results (coordinates, squared cosine, contributions) for the active individuals/variables from Principal Component Analysis (PCA) outputs. |
| get_ca | Extract all the results (coordinates, squared cosine, contributions) for the active column/row variables from Correspondence Analysis outputs. |
| get_mca | Extract results from Multiple Correspondence Analysis outputs. |
| get_mfa | Extract results from Multiple Factor Analysis outputs, including supplementary qualitative categories. |
| get_famd | Extract results from Factor Analysis of Mixed Data outputs, including supplementary qualitative categories. |
| get_hmfa | Extract results from Hierarchical Multiple Factor Analysis outputs. |
| facto_summarize | Subset and summarize the output of factor analyses. |
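The following is a minimal sketch of `facto_summarize()`. It condenses the variable results of a PCA on the illustrative `iris` measurements over the first two dimensions into a single data frame. The dataset and the `axes` choice are assumptions made for the example.

```r
library("FactoMineR")
library("factoextra")

res.pca <- PCA(iris[, 1:4], graph = FALSE)

# One row per active variable, summarizing coordinates, cos2 and
# contributions over the plane Dim.1-Dim.2
summ <- facto_summarize(res.pca, element = "var",
                        result = c("coord", "cos2", "contrib"),
                        axes = 1:2)
print(summ)
```

This is the kind of per-element table that `fviz_cos2()` and `fviz_contrib()` draw as bar plots.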
