Useful redundancy — when using colors is not completely useless

The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.

Pew Research Center is a “is a nonpartisan American fact tank based in Washington, D.C. It provides information on social issues, public opinion, and demographic trends shaping the United States and the world.” Recently, I read a report prepared by the Pew Center on the religious divide in the Israeli society. This is a fascinating report. I recommend reading without any connection to data visualization.

But this post does not deal with the Isreali society but with graphs and colors.

Look at the first chart in that report. You may see a tidy pie chart with several colored segments. 

Pie chart: Religious composition of Israeli society. The chart uses several colored segments

Aha! Can’t they use a single color without losing the details? Of course the can! A monochrome pie chart would contain the same information:

Pie chart: Religious composition of Israeli society. The chart uses monochrome segments

In most of the cases, such a transformation would make a perfect sense. In most of the cases, but not in this report. This report is a multipage research document packed with many facts and analyses. The pie chart above is the first graph in that report that provides a broad overview of the Israeli society. The remaining of this report is dedicated to the relationships between and within the groups represented by the colorful segments in that pie chart. To help the reader navigating through this long report, its authors use a consistent color scheme that anchors every subsequent graph to the relevant sections of the original pie chart.

All these graphs and tables will be readable without the use of colors. Despite the fact that the colors here are redundant, this is a useful redundancy. By using the colors, the authors provided additional information layers that make the navigation within the document easier. I learned about the concept of useful redundancy from “Trees, Maps, and Theorems” by Jean-luc Dumout. If you can only read one book about data communication, it should be this book.

Do you REALLY need the colors?

Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Look at this example from the seaborn documentation site

>>> import seaborn as sns
>>> sns.set_style("whitegrid")
>>> tips = sns.load_dataset("tips")
>>> ax = sns.barplot(x="day", y="total_bill", data=tips)

Barplot example with colored bars

This example shows the default barplot and is the first barplot. Can you see how easy it is to add colors to the different columns? But WHY? What do those colors represent? It looks like the only information that is encoded by the color is the bar category. We already have this information in the form of bar location. Having this colorful image adds nothing but a distraction. It is sad that this is the default behavior that seaborn developers decided to adopt.

Look at the same example, without the colors

>>> ax = sns.barplot(x="day", y="total_bill", color='gray', data=tips)

Barplot example with gray bars

Isn’t it much better? The sad thing is that a better version requires memorizing additional arguments and more typing.

This was my because you can rant.