UPDATE (November 2023) - Version 2.3.0: Verbosity parameter added, long-standing issues fixed

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code!


Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application.

The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.

Usage and parameters are described below. You can also find an article describing its features in depth and see examples in action HERE.

Sweetviz development is still ongoing! Please let me know if you run into any data, compatibility or install issues. Thank you for reporting any BUGS in the issue tracking system here, and I welcome your feedback and questions on usage/features in the brand-new GitHub "Discussions" tab right here!

Examples & mentions

Example HTML report using the Titanic dataset

Example Notebook w/docs on Colab (Jupyter/other notebooks should also work)

Medium Article describing its features in depth

KD Nugget articles:

Features

Target analysis
- Shows how a target value (e.g. "Survived" in the Titanic dataset) relates to other features
Visualize and compare
- Distinct datasets (e.g. training vs test data)
- Intra-set characteristics (e.g. male versus female)
Mixed-type associations
- Sweetviz integrates associations for numerical (Pearson's correlation), categorical (uncertainty coefficient) and categorical-numerical (correlation ratio) datatypes seamlessly, to provide maximum information for all data types.
Type inference
- Automatically detects numerical, categorical and text features, with optional manual overrides
Summary information
- Type, unique values, missing values, duplicate rows, most frequent values
- Numerical analysis:
  - min/max/range, quartiles, mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
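
To make the categorical-numerical association mentioned above more concrete, here is a minimal sketch of the textbook correlation ratio (eta) in plain Python. The function name correlation_ratio is hypothetical and the code is not taken from Sweetviz's source; the library's own implementation may differ, for example in how it handles missing values.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Textbook correlation ratio (eta) between a categorical and a numerical variable.

    Returns a value in [0, 1]: 0 when every category has the same mean,
    1 when the category fully determines the value.
    """
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    if len(values) == 0:
        raise ValueError("at least one observation is required")

    # Group the numerical values by their category label.
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)
    # Weighted spread of the per-category means around the overall mean.
    between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    # Total spread of all observations around the overall mean.
    total = sum((value - overall_mean) ** 2 for value in values)
    if total == 0:
        return 0.0  # constant values: there is no variation to explain
    return math.sqrt(between / total)
```

Unlike Pearson's correlation, this measure is unsigned: it only says how much of the numerical variation is explained by the category, not in which direction.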

New & notable

Version 2.2: Big compatibility update for Python 3.7+ and numpy versions
Version 2.1: Comet.ml support
Version 2.0: Jupyter, Colab & other notebook support, report scaling & vertical layout

(see below for docs on these features)

Upgrading

Some people have experienced mixed results when upgrading through pip. To update to the latest version from an existing install, it is recommended to run pip uninstall sweetviz first, then simply install it again.
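
After reinstalling, it can help to confirm which version the active Python interpreter actually sees. The following is a small sketch using the standard library's importlib.metadata (available in Python 3.8+); the helper name installed_version is hypothetical and not part of Sweetviz.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("sweetviz")
    if version is None:
        print("sweetviz is not installed in this environment")
    else:
        print(f"sweetviz {version} is installed")
```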

Installation

Sweetviz currently supports Python 3.6+ and Pandas 0.25.3+. Reports are output using the base "os" module, so custom environments such as Google Colab which require custom file operations are not yet supported, although I am looking into a solution.

Using pip

The best way to install sweetviz (other than from source) is to use pip:

pip install sweetviz

Installation issues & fixes

In some rare cases, users have reported errors such as ModuleNotFoundError: No module named 'sweetviz' and AttributeError: module 'sweetviz' has no attribute 'analyze'. In those cases, we suggest the following:

Make sure none of your scripts are named sweetviz.py, as that interferes with the library itself. Delete or rename that script (and any associated .pyc files), and try again.
Try uninstalling the library using pip uninstall sweetviz, then reinstalling
The issue may stem from using multiple versions of Python, or from OS permissions. The following Stack Overflow articles have resolved many of the reported issues: Article 1, Article 2, Article 3
If all else fails, post a bug issue here on GitHub. Thank you for taking the time, it may help resolve the issue for you and everyone else!
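
The first check above (a local script named sweetviz.py taking the place of the library) can be sketched with the standard library alone. The helper names find_shadowing_files and resolved_location are hypothetical and only meant as a starting point for diagnosing your own environment.

```python
import importlib.util
from pathlib import Path


def find_shadowing_files(directory, module_name="sweetviz"):
    """List files in `directory` that could shadow an installed module of that name."""
    directory = Path(directory)
    matches = []
    # A script or legacy compiled file sitting next to your own code.
    for candidate in (directory / f"{module_name}.py", directory / f"{module_name}.pyc"):
        if candidate.is_file():
            matches.append(candidate)
    # Compiled leftovers that Python 3 keeps in __pycache__.
    matches.extend(sorted(directory.glob(f"__pycache__/{module_name}.*.pyc")))
    return matches


def resolved_location(module_name="sweetviz"):
    """Return the file the import system would load for the module, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for path in find_shadowing_files(Path.cwd()):
        print(f"Possible conflict: {path}")
    print("'import sweetviz' resolves to:", resolved_location())
```

If the reported location points into your project folder rather than into site-packages, a local file is most likely being imported instead of the library.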

Basic Usage

Creating a report is a quick 2-line process:

Create a DataframeReport object using one of: analyze(), compare() or compare_intra()
Use a show_xxx() function to render the report. You can now use either html or notebook report options, as well as scaling: (more info on these options below)

Report_Show_Options

Step 1: Create the report

There are 3 main functions for creating reports:

analyze(...)
compare(...)
compare_intra(...)

Analyzing a single dataframe (and its optional target feature)

To analyze a single dataframe, simply use the analyze(...) function, then the show_html(...) function:

import sweetviz as sv

my_report = sv.analyze(my_dataframe)
my_report.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

When run, this will output a 1080p widescreen html app in your default browser: Widescreen demo

Optional arguments

The analyze() function can take multiple other arguments:

analyze(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto',
            verbosity: str = 'default'):

source: Either the data frame (as in the example) or a tuple containing the data frame and a name to show in the report. e.g. my_df or [my_df, "Training"]
target_feat: A string representing the name of the feature to be marked as "target". Only BOOLEAN and NUMERICAL features can be targets for now.
feat_cfg: A FeatureConfig object representing features to be skipped, or to be forced to a certain type in the analysis. Each argument can be either a single string or a list of strings. Parameters are skip, force_cat, force_num and force_text. The "force_" arguments override the built-in type detection. They can be constructed as follows:

feature_config = sv.FeatureConfig(skip="PassengerId", force_text=["Age"])

verbosity: [NEW] Can be set to full, progress_only (to only display the progress bar but not report generation messages) and off (fully quiet, except for errors or warnings). Default verbosity can also be set in the INI override, under the "General" heading (see "The Config file" section below for details).
pairwise_analysis: Correlations and other associations can take quadratic time (n^2) to complete. The default setting ("auto") will run without warning until a data set contains "association_auto_threshold" features. Past that threshold, you need to explicitly pass the parameter pairwise_analysis="on" (or ="off") since processing that many features would take a long time. This parameter also covers the generation of the association graphs (based on Drazen Zaric's concept):

Pairwise sample
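
The "auto" behavior of pairwise_analysis described above can be sketched as a small decision function. This is only an illustration of the documented rule, not Sweetviz's code: the name should_run_pairwise is hypothetical, auto_threshold stands in for the "association_auto_threshold" setting, and raising an error past the threshold (as well as treating a count equal to the threshold as "past" it) is an assumption of this sketch.

```python
def should_run_pairwise(pairwise_analysis, num_features, auto_threshold):
    """Decide whether to compute pairwise associations.

    `auto_threshold` is a stand-in for the "association_auto_threshold"
    setting; its real value comes from the Sweetviz configuration.
    """
    if pairwise_analysis == "on":
        return True
    if pairwise_analysis == "off":
        return False
    if pairwise_analysis == "auto":
        if num_features < auto_threshold:
            return True
        # Assumption: force an explicit choice once the dataset is wide,
        # because the work grows with the square of the feature count.
        raise ValueError(
            f"{num_features} features reach the threshold of {auto_threshold}; "
            'pass pairwise_analysis="on" or "off" explicitly'
        )
    raise ValueError(f"unknown pairwise_analysis value: {pairwise_analysis!r}")
```

For example, with a threshold of 100, should_run_pairwise("auto", 10, 100) returns True, while a dataset with 500 features requires an explicit "on" or "off".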

Comparing two dataframes (e.g. Test vs Training sets)

To compare two data sets, simply use the compare() function. Its parameters are the same as analyze(), except with an inserted second parameter to cover the comparison dataframe. It is recommended to use the [dataframe, "name"] format of parameters to better differentiate between the base and compared dataframes. (e.g. [my_df, "Train"] vs my_df)

my_report = sv.compare([my_dataframe, "Training Data"], [test_df, "Test Data"], "Survived", feature_config)
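
To illustrate the two accepted source forms (a bare dataframe or a [dataframe, "name"] pair), here is a minimal sketch of how such an argument could be unpacked. The helper unpack_source and the fallback label "DataFrame" are assumptions made for this example, not Sweetviz internals.

```python
def unpack_source(source, default_name="DataFrame"):
    """Split a source argument into (data, display_name).

    Accepts either the data object itself or a two-item [data, "name"] pair.
    The default name is a placeholder chosen for this sketch.
    """
    if isinstance(source, (list, tuple)):
        if len(source) != 2 or not isinstance(source[1], str):
            raise ValueError('expected a [dataframe, "name"] pair')
        return source[0], source[1]
    return source, default_name
```

Naming both sides explicitly avoids ending up with two datasets that carry the same fallback label in the report.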

Comparing two subsets of the same dataframe (e.g. Male vs Female)

Another way to get great insights is to use the comparison functionality to split your dataset into 2 sub-populations.

Support for this is built in through the compare_intra() function. This function takes a boolean series as one of the arguments, as well as an explicit "name" tuple for naming the (true, false) resulting datasets. Note that internally, this creates 2 separate dataframes to represent each resulting group. As such, it is more of a shorthand for doing such processing manually.

my_report = sv.compare_intra(my_dataframe, my_dataframe["Sex"] == "male", ["Male", "Female"], "Survived", feature_config)
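
The manual processing that compare_intra() stands in for amounts to splitting the rows with a boolean condition and comparing the two halves. The sketch below shows that split with plain Python lists so it runs without any dependencies; the helper split_by_condition and the three sample rows are made up for illustration.

```python
def split_by_condition(rows, condition):
    """Split rows into (matching, non_matching) using a parallel list of booleans."""
    if len(rows) != len(condition):
        raise ValueError("rows and condition must have the same length")
    matching = [row for row, keep in zip(rows, condition) if keep]
    non_matching = [row for row, keep in zip(rows, condition) if not keep]
    return matching, non_matching


# Made-up sample rows, only to show the mechanics.
passengers = [
    {"Name": "A", "Sex": "male"},
    {"Name": "B", "Sex": "female"},
    {"Name": "C", "Sex": "male"},
]
is_male = [passenger["Sex"] == "male" for passenger in passengers]
male_rows, female_rows = split_by_condition(passengers, is_male)

# With pandas, the same split is my_dataframe[mask] and my_dataframe[~mask];
# the two resulting dataframes could then be passed to sv.compare() with their names.
```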

Step 2: Show the report

Once you have created your report object (e.g. my_report in the examples above), simply call one of its two "show" functions:

show_html()

show_html(filepath='SWEETVIZ_REPORT.html',
          open_browser=True,
          layout='widescreen',
          scale=None)

show_html(...) will create
