An example of a very bad graph

An example of a very bad graph

Nature Medicine is a peer-reviewed journal that belongs to the very prestigious Nature group. Today, I was reading a paper that included THIS GEM.

These two graphs are so bad. It looks as if the authors had a target to squeeze as many data visualization mistakes as possible in a single piece of graphics.

Let’s take a look at the problems.

  • Double Y axes. Don’t! Double axes are bad in 99% of cases (exceptions do exist, but they are rare).
  • Two subgraphs that are meant to work together have different category orders and different Y-axis scales. These differences make the comparison much harder.
  • Inverted Y scale in a bar chart. Wow! This is very strange. Bizarre! It took me a while to spot this. First, I tried to understand why the line of P<0.05 (the magic value of statistics) is above 0.1. Then, I realized that the right Y-axis is reversed. At first, I thought, “WTF?!” but then I understood why the authors made this decision. You see, according to the widespread statistical ritual, the lower the “P-value” is, the more significant it is considered. The value of 1 is deemed to be non-significant at all, and the value of 0 is considered “as significant as one can have.” So, in theory, the authors could have renamed the axis to “Significance” and reversed the numbers. Still, the result would not be a real “significance,” nor would the name be intuitive to anyone familiar with statistical analysis. On the other hand, they really wanted more “significant” values to be bigger than less significant ones. So, what the heck? Let’s invert the scale! Well, no, this is not a good idea
  • Slanted category labels. This might be a matter of taste, but I dislike rotated and slanted labels. Turning the graph solves the need for label rotation, thus making it more readable and having zero drawbacks.

What can be done?

I don’t like criticism without improvement suggestions. Let’s see what I would have done with this graph. To make this decision, I first need to decide what I want to show. According to my understanding of the paper, the authors wish to show that the two data sets are very different in determining a specific outcome. To show that, we don’t need to depict both the P-value and variance (mainly since these two values are very much correlated). Thus, I will depict only show one metric. I will stick with the P-value.

I will keep the category order the same between the two subgraphs. Doing so will create a “table lens” effect; it will show the individual values while demonstrating the lack of correlations between the two groups. Finally, I will convert the bars into points, primarily to reduce the data-ink ratio. Two additional arguments against bar charts, in this case, are the facts that the P-values of a statistical test cannot possibly be zero and that bar charts don’t allow log-scale, in case we’ll want to use it.

The result should look like this sketch.

By Boris Gorelik

Machine learning, data science and visualization | CTO @ wiser.me http://gorelik.net.

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: