An example of a very bad graph

An example of a very bad graph

Nature Medicine is a peer-reviewed journal that belongs to the very prestigious Nature group. Today, I was reading a paper that included THIS GEM.

These two graphs are so bad. It looks as if the authors had a target to squeeze as many data visualization mistakes as possible in a single piece of graphics.

Let’s take a look at the problems.

  • Double Y axes. Don’t! Double axes are bad in 99% of cases (exceptions do exist, but they are rare).
  • Two subgraphs that are meant to work together have different category orders and different Y-axis scales. These differences make the comparison much harder.
  • Inverted Y scale in a bar chart. Wow! This is very strange. Bizarre! It took me a while to spot this. First, I tried to understand why the line of P<0.05 (the magic value of statistics) is above 0.1. Then, I realized that the right Y-axis is reversed. At first, I thought, “WTF?!” but then I understood why the authors made this decision. You see, according to the widespread statistical ritual, the lower the “P-value” is, the more significant it is considered. The value of 1 is deemed to be non-significant at all, and the value of 0 is considered “as significant as one can have.” So, in theory, the authors could have renamed the axis to “Significance” and reversed the numbers. Still, the result would not be a real “significance,” nor would the name be intuitive to anyone familiar with statistical analysis. On the other hand, they really wanted more “significant” values to be bigger than less significant ones. So, what the heck? Let’s invert the scale! Well, no, this is not a good idea
  • Slanted category labels. This might be a matter of taste, but I dislike rotated and slanted labels. Turning the graph solves the need for label rotation, thus making it more readable and having zero drawbacks.

What can be done?

I don’t like criticism without improvement suggestions. Let’s see what I would have done with this graph. To make this decision, I first need to decide what I want to show. According to my understanding of the paper, the authors wish to show that the two data sets are very different in determining a specific outcome. To show that, we don’t need to depict both the P-value and variance (mainly since these two values are very much correlated). Thus, I will depict only show one metric. I will stick with the P-value.

I will keep the category order the same between the two subgraphs. Doing so will create a “table lens” effect; it will show the individual values while demonstrating the lack of correlations between the two groups. Finally, I will convert the bars into points, primarily to reduce the data-ink ratio. Two additional arguments against bar charts, in this case, are the facts that the P-values of a statistical test cannot possibly be zero and that bar charts don’t allow log-scale, in case we’ll want to use it.

The result should look like this sketch.

One of the reasons I don’t like R

I never liked R. I didn’t like it for the first time I tried to learn it, I didn’t like it when I had to switch to R as my primary work tool at my previous job. And didn’t like it one and a half year later, when I was comfortable enough to add R to my CV, right before leaving my previous job.

Today, I was reminded of one feature (out of so many) that made dislike R. It’s its import (or library, as they call it in R) feature. In Python, you can import a_module and then use its components by calling a_model.a_function. Simple and predictable. In R, you have to read the docs in order to understand what will happen to your namespace after you have library(a.module) (I know, those dots grrrr) in your code. This feature is so annoying that people write modules that help them using other modules. Like in this blog post, which looks like an interesting thing to do, but … wouldn’t it be easier to use Python?

 

Data is the new

I stumbled upon a rant titled  Data is not the new oil — Tech Insights

You’ve heard it many times and so have I: “Data is the new oil” Well it isn’t. At least not yet. I don’t care how I get oil for my car or heating. I simply decide what to cook and where to drive when I want. I’m unconcerned which mechanism is used to refine oil […]

Funny, in my own rant “data is not the new gold“, I claimed that “oil” was a better analogy for data than gold. Obviously, any “X is the new Y” sentences are problematic but it’s still funny how we like them.