Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

On Sunday, I wrote about bootstrapping. On Monday, I wrote about visualization uncertainty. Let’s now talk about bootstrapping and uncertainty visualization.

Robert Grant is a data visualization expert who wrote a book about interactive data visualization (which I should read, BTW).

Robert runs an interesting blog from which I learned another approach to uncertainty visualization, bootstrapping.

Source: Robert Grant.

Read the entire post: Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

Error bars in bar charts. You probably shouldn’t

This is another post in the series Because You Can. This time, I will claim that the fact that you can put error bars on a bar chart doesn’t mean you should.

It started with a paper by prof. Gerd Gigerenzer whose work in promoting numeracy I adore. The paper, “Natural frequencies improve Bayesian reasoning in simple and complex inference tasks” contained a simple graph that meant to convince the reader that natural frequencies lead to more accurate understanding (read the paper, it explains these terms). The error bars in the graph mean to convey uncertainty. However, the data visualization selection that Gigerenzer and his team selected is simply wrong.

First of all, look at the leftmost bar, it demonstrates so many problems with error bars in general, and in error bars in barplots in particular. Can you see how the error bar crosses the X-axis, implying that Task 1 might have resulted in negative percentage of correct inferences?

The irony is that Prof. Gigerenzer is a worldwide expert in communicating uncertainty. I read his book “Calculated risk” from cover to cover. Twice.

Why is this important?

Communicating uncertainty is super important. Take a look at this 2018 study with the self-explaining title “Uncertainty Visualization Influences how Humans Aggregate Discrepant Information.” From the paper: “Our study repeatedly presented two [GPS] sensor measurements with varying degrees of inconsistency to participants who indicated their best guess of the “true” value. We found that uncertainty information improves users’ estimates, especially if sensors differ largely in their associated variability”.

Image result for clinton trump polls
Source HuffPost

Also recall the surprise when Donald Trump won the presidential elections despite the fact that most of the polls predicted that Hillary Clinton had higher chances to win. Nobody cared about uncertainty, everyone saw the graphs!

Why not error bars?

Keep in mind that error bars are considered harmful, and I have a reference to support this claim. But why?

First of all, error bars tend to be symmetric (although they don’t have to) which might lead to the situation that we saw in the first example above: implying illegal values.

Secondly, error bars are “rigid”, implying that there is a certain hard threshold. Sometimes the threshold indeed exists, for example a threshold of H0 rejection. But most of the time, it doesn’t.

stacked round gold-colored coins on white surface

More specifically to bar plots, error lines break the bar analogy and are hard to read. First, let me explain the “bar analogy” part.

The thing with bar charts is that they are meant to represent physical bars. A physical bar doesn’t have soft edges and adding error lines simply breaks the visual analogy.

Another problem is that the upper part of the error line is more visible to the eye than the lower one, the one that is seen inside the physical bar. See?undefined

But that’s not all. The width of the error bars separates the error lines and makes the comparison even harder. Compare the readability of error lines in the two examples below

The proximity of the error lines in the second example (take from this site) makes the comparison easier.

Are there better alternatives?

Yes. First, I recommend reading the “Error bars considered harmful” paper that I already mentioned above. It not only explains why, but also surveys several alternatives

Nathan Yau from flowingdata.com had an extensive post about different ways to visualize uncertainty. He reviewed ranges, shades, rectangles, spaghetti charts and more.

Claus Wilke’s book “Fundamentals of Data Visualization” has a dedicated chapter to uncertainty with and even more detailed review [link].

Visualize uncertainty about the future” is a Science article that deals specifically with forecasts

Robert Kosara from Tableu experimented with visualizing uncertainty in parallel coordinates.

There are many more examples and experiments, but I think that I will stop right now.

The bottom line

Communicating uncertainty is important.

Know your tools.

Try avoiding error bars.

Bars and bars don’t combine well, therefore, try harder avoiding error bars in bar charts.

Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

Value-Suppressing Uncertainty Palette

Uncertainty is one of the most neglected aspects of number-based communication and one of the most important concepts in general numeracy. Comprehending uncertainty is hard. Visualizing it is, apparently, even harder.

Last week I read a paper called Value-Suppressing Uncertainty Palettes, by M.Correll, D. Moritz, and J. Heer from the Data visualization and interactive analysis research at the University of Washington. This paper describes an interesting approach to color-encoding uncertainty.

Value-Suppressing Uncertainty Palette

Uncertainty visualization is commonly done by reducing color saturation and opacity.  Cornell et al suggest combining saturation reduction with limiting the number of possible colors in a color palette. Unfortunately, there the authors used Javascript and not python for this paper, which means that in the future, I might try implementing it in python.

Two figures visualizing poll data over the USA map, using different approaches to visualize uncertainty

 

Visualizing uncertainty is one of the most challenging tasks in data visualization. Uncertain

 

via Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

What’s the most important thing about communicating uncertainty?

Screenshot: many images for the 2016 US elections

Sigrid Keydana, in her post Plus/minus what? Let’s talk about uncertainty (talk) — recurrent null, said

What’s the most important thing about communicating uncertainty? You’re doing it

Really?

Here, for example, a graph from a blog post

Thousands of randomly looking points. From https://myscholarlygoop.wordpress.com/2017/11/20/the-all-encompassing-figure/

The graph clearly “communicates” the uncertainty but does it really convey it? Would you consider the lines and their corresponding confidence intervals very uncertain had you not seen the points?

What if I tell you that there’s a 30% Chance of Rain Tomorrow? Will you know what it means? Will a person who doesn’t operate on numbers know what it means? The answer, to both these questions, is “no”, as is shown by Gigerenzer and his collaborators in a 2005 paper.

Screenshot: many images for the 2016 US elections

Communicating uncertainty is not a new problem. Until recently, the biggest “clients” of uncertainty communication research were the weather forecasters.  However, the recent “data era” introduced uncertainty to every aspect of our personal and professional lives. From credit risk to insurance premiums, from user classification to content recommendation, the uncertainty is everywhere. Simply “doing” uncertainty communication, as Sigrid Keydana from the Recurrent Null blog suggested isn’t enough. The huge public surprise caused by the 2016 US presidential election is the best evidence for that. Proper uncertainty communication is a complex topic. A good starting point to this complex topic is a paper Visualizing Uncertainty About the Future by David Spiegelhalter.