Friends Don't Let Friends Make Bad Graphs
Friends don't let friends make certain types of data visualization - What are they and why are they bad?
- Author: Chenxin Li, Ph.D., Assistant Professor at the Department of Plant Biology, Michigan State University.
- Contact: lichen27@msu.edu | @chenxinli2.bsky.social
This is an opinionated essay about good and bad practices in data visualization. Examples and explanations are below.
The Scripts/ directory contains .Rmd files that generate the graphics shown below.
Running these scripts requires R, RStudio, and the rmarkdown package.
- R: R Download
- RStudio: RStudio Download
- rmarkdown can be installed using the install packages interface in RStudio
Table of contents
- Friends Don't Let Friends Make Bar Plots for Means Separation
- Friends Don't Let Friends Make Violin Plots for Small Sample Sizes
- Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data
- Friends Don't Let Friends Make Bar Plot Meadow
- Friends Don't Let Friends Make Heatmap without Reordering Rows & Columns
- Friends Don't Let Friends Make Heatmap without Checking Outliers
- Friends Don't Let Friends Forget to Check Data Range at Each Factor Level
- Friends Don't Let Friends Make Network Graphs without Trying Different Layouts
- Friends Don't Let Friends Confuse Position and Length Based Visualizations
- Friends Don't Let Friends Make Pie Charts
- Friends Don't Let Friends Make Concentric Donuts
- Friends Don't Let Friends Use Red/green and Rainbow for Color Scales
- Friends Don't Let Friends Forget to Reorder Stacked Bar Plot
- Friends Don't Let Friends Mix Stacked Bars and Means Separation
- Friends Don't Let Friends Use Histogram for Small Sample Sizes
- Friends Don't Let Friends Use Boxplot for Bimodal Data
1. Friends Don't Let Friends Make Bar Plots for Means Separation
This has to be the first one. Means separation plots are some of the most common in scientific publications. We have two or more groups, each containing multiple observations; the groups may have different means, variances, and distributions. The task of the visualization is to show the means and the spread (dispersion) of the data.

In this example, two groups have similar means and standard deviations, but quite different distributions. Are they really "the same"? Just don't use bar plots for means separation, or at least check a couple of things before settling on a bar plot.
It's worth mentioning that I was inspired by many researchers who have tweeted on the limitations of bar graphs. Here is a publication: Weissgerber et al., 2015, PLOS Biology.
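To make the point about matching summary statistics concrete, here is a minimal Python sketch (the repository's own scripts are in R). The numbers are made up for illustration and are not the data behind the figure; the helper names `rescale` and `count_near_mean` are hypothetical. Two groups are forced to the same mean and standard deviation, yet one is unimodal and the other is bimodal.

```python
import statistics

def rescale(values, target_mean, target_sd):
    """Linearly rescale values to a chosen mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [target_mean + (v - mean) / sd * target_sd for v in values]

def count_near_mean(values, half_width):
    """Count observations within half_width of the group mean."""
    mean = statistics.mean(values)
    return sum(1 for v in values if abs(v - mean) <= half_width)

# Made-up shapes (assumption): one group piles up in the middle,
# the other splits into two clusters with nothing in between.
unimodal_raw = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
bimodal_raw = [1, 1, 1, 2, 2, 8, 8, 9, 9, 9]

# Force both groups to the same mean (10) and standard deviation (2).
group_a = rescale(unimodal_raw, target_mean=10.0, target_sd=2.0)
group_b = rescale(bimodal_raw, target_mean=10.0, target_sd=2.0)

for name, group in (("A", group_a), ("B", group_b)):
    print(
        name,
        "mean:", round(statistics.mean(group), 3),
        "sd:", round(statistics.stdev(group), 3),
        "within 1 unit of mean:", count_near_mean(group, 1.0),
    )
```

A bar with an error bar would look identical for both groups, while the count of observations near the mean differs. That is the kind of quick check worth doing before settling on a bar plot.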
2. Friends Don't Let Friends Make Violin Plots for Small Sample Sizes
This is quite common in the literature as well, but unfortunately, violin plots (or any sort of smoothed distribution curve) make no sense for small n.

Distributions and quartiles can vary widely with small n, even if the underlying observations are similar. Distributions and quartiles are only meaningful with large n. I once ran an experiment in which I sampled the same normal distribution several times and computed the quartiles for each sample. The quartiles only stabilize when n gets larger than 50.
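A resampling experiment of this kind can be approximated in a few lines of Python. This is a rough sketch, not the original script: the standard normal distribution, the sample sizes, the number of replicates, the seed, and the helper name `quartile_spread` are all assumptions chosen for illustration.

```python
import random
import statistics

def quartile_spread(n, replicates=200, seed=1):
    """Return the SD of Q1, median, and Q3 across repeated samples of size n.

    Every sample is drawn from the same standard normal distribution, so any
    variation in the quartiles is due to sampling alone.
    """
    rng = random.Random(seed)
    q1s, medians, q3s = [], [], []
    for _ in range(replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        q1, median, q3 = statistics.quantiles(sample, n=4)
        q1s.append(q1)
        medians.append(median)
        q3s.append(q3)
    return tuple(statistics.stdev(qs) for qs in (q1s, medians, q3s))

# Smaller spread means the quartiles are more stable from sample to sample.
for n in (5, 10, 50, 200):
    spread = quartile_spread(n)
    print(f"n={n:>3}  SD of (Q1, median, Q3):", [round(s, 2) for s in spread])
```

The spread of each quartile should shrink as n grows. Where it becomes "stable enough" depends on the data and the question, so treat any cutoff as a rule of thumb.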
3. Friends Don't Let Friends Use Bidirectional Color Scales for Unidirectional Data
Excuse my language, but this is truly a data visualization sin, and again quite common. I can understand why this error is common, because it appears that many of us have not given this issue much thought.
Color scales are pretty, but we have to be extra careful. When color scales (or color gradients) are used to represent numerical data, the darkest and lightest colors should have special meanings. You can decide what those special meanings are: e.g., max, min, mean, zero. But they should represent something meaningful. A data visualization sin for heat maps/color gradients is when the lightest or darkest colors are some arbitrary numbers. This is as bad as the longest bar in a bar chart not being the largest value. Can you imagine that?
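One way to keep the ends of a color scale meaningful is to choose the scale type and its limits from the data before plotting. The Python sketch below is one possible rule of thumb, not a prescription from this essay: the function name `scale_anchors`, the default midpoint of zero, and the returned dictionary layout are assumptions for illustration.

```python
def scale_anchors(values, midpoint=0.0):
    """Suggest a color scale type and the values its anchor colors should mean.

    If the data fall on both sides of the midpoint, suggest a diverging
    (bidirectional) scale that is symmetric around the midpoint, so the
    middle color means exactly `midpoint`. Otherwise suggest a sequential
    (unidirectional) scale whose lightest and darkest colors are the
    observed minimum and maximum.
    """
    low, high = min(values), max(values)
    if low < midpoint < high:
        half_range = max(midpoint - low, high - midpoint)
        return {
            "kind": "diverging",
            "low": midpoint - half_range,
            "mid": midpoint,
            "high": midpoint + half_range,
        }
    return {"kind": "sequential", "low": low, "high": high}

# Made-up examples: counts are unidirectional, z scores are bidirectional.
print(scale_anchors([0.0, 2.5, 10.0]))
print(scale_anchors([-1.0, 0.5, 3.0]))
```

The returned limits can then be passed to whatever plotting library is in use, so that the extreme and middle colors correspond to values you chose on purpose.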
4. Friends Don't Let Friends Make Bar Plot Meadow
We talked about avoiding bar charts for means separation, but this is a different issue. It has to do with presenting results of a multi-factorial experiment. Bar plot meadows are very common in scientific publications and unfortunately also ineffective in communicating the results.

Data from: Matand et al., 2020, BMC Plant Biology
Bar plot meadows are common because multi-factorial experiments are common. However, a bar plot meadow is poorly designed for its purpose. Communicating the results of a multi-factorial experiment requires thoughtful design regarding grouping/faceting by the factors of interest.
In this example, I focus on comparing the effect of Treatment & Explant on Response at the level of each Variety.
However, if the focus is the effect of Treatment & Variety on Response at the level of each Explant, then it will require a different layout.
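The choice of layout comes down to which factor defines the panels and which factors are compared inside each panel. The Python sketch below illustrates this with made-up records. The factor names follow the example above, but the levels, the Response values, and the helper `facet` are hypothetical and are not taken from Matand et al.

```python
from collections import defaultdict

# Made-up 2 x 2 x 2 factorial results (assumption, for illustration only).
records = [
    {"variety": "V1", "explant": "leaf", "treatment": "T1", "response": 3.0},
    {"variety": "V1", "explant": "leaf", "treatment": "T2", "response": 5.0},
    {"variety": "V1", "explant": "stem", "treatment": "T1", "response": 2.0},
    {"variety": "V1", "explant": "stem", "treatment": "T2", "response": 6.0},
    {"variety": "V2", "explant": "leaf", "treatment": "T1", "response": 4.0},
    {"variety": "V2", "explant": "leaf", "treatment": "T2", "response": 4.5},
    {"variety": "V2", "explant": "stem", "treatment": "T1", "response": 1.0},
    {"variety": "V2", "explant": "stem", "treatment": "T2", "response": 7.0},
]

def facet(records, panel_by, x_axis, color_by):
    """Arrange records as panel -> x position -> color group -> response."""
    layout = defaultdict(lambda: defaultdict(dict))
    for r in records:
        layout[r[panel_by]][r[x_axis]][r[color_by]] = r["response"]
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {panel: {x: dict(g) for x, g in xs.items()} for panel, xs in layout.items()}

# Focus 1: Treatment & Explant within each Variety (one panel per Variety).
by_variety = facet(records, panel_by="variety", x_axis="treatment", color_by="explant")

# Focus 2: Treatment & Variety within each Explant (one panel per Explant).
by_explant = facet(records, panel_by="explant", x_axis="treatment", color_by="variety")

print(by_variety)
print(by_explant)
```

Both layouts contain exactly the same numbers. Only the comparison that sits side by side within a panel changes, which is why the layout should follow the question being asked.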
5. Friends Don't Let Friends Make Heatmap without (Considering) Reordering Rows & Columns
Heatmaps are very common in scientific publications, and very very common in omics papers. However, for heatmaps to be effective, we have to consider the ordering of rows & columns.

In this example, I have cells as columns and features as rows. Each grid cell shows a z score. It is impossible to get anything useful out of the heatmap without reordering rows and columns. We can reorder rows and columns using clustering, but that is not the only way. Of course, if the rows and columns map to physical entities (e.g., rows and columns of a 96-well plate), then you can't reorder them. But it is a very good idea to at least consider reordering rows and columns.
Data from: Li et al., 2022, BioRxiv
Bonus: heatmaps can be very pretty
...if you are good at reordering rows/columns and choosing color gradients. Here is an example "abstract aRt" generated from simulated data.
R code for this aRt piece can be found here.
For a tutorial on how to reorder rows and columns of a heatmap, see this markdown file.
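As a small illustration of reordering without clustering, the Python sketch below sorts rows by the column in which each row peaks, so that rows with similar patterns end up next to each other. It is not the method used in the linked tutorial: the matrix, the feature labels, and the helper `order_rows_by_peak` are made up for illustration.

```python
# Made-up z scores (assumption): rows are features, columns are cells.
features = ["f1", "f2", "f3", "f4"]
matrix = [
    [-1.0, 1.5, -0.5],  # f1 peaks in column 1
    [1.2, -0.8, -0.4],  # f2 peaks in column 0
    [-0.7, -0.6, 1.3],  # f3 peaks in column 2
    [1.4, -0.2, -1.2],  # f4 peaks in column 0
]

def order_rows_by_peak(matrix):
    """Return row indices sorted by peak column, then by descending peak height."""
    def sort_key(i):
        row = matrix[i]
        peak = max(row)
        return (row.index(peak), -peak)
    return sorted(range(len(matrix)), key=sort_key)

row_order = order_rows_by_peak(matrix)
reordered_features = [features[i] for i in row_order]
reordered_matrix = [matrix[i] for i in row_order]

for label, row in zip(reordered_features, reordered_matrix):
    print(label, row)
```

After reordering, the high values run roughly along a diagonal, which is much easier to read than the original order. The same idea applies to columns by transposing the matrix first.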
6. Friends Don't Let Friends Make Heatmap without Checking Outliers
Outliers in a heatmap can really change how we perceive and interpret the visualization. This generalizes to all sorts of visualizations that use colors to represent numeric data. Let me show you an example:
