My colleague, Chares Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes.
The post author, Tom Hopper, experiments with different ways to deal with “Data Giraffes”. Some of his experiments are really interesting (such as splitting the graph area). In one experiment, Tom Hopper draws bar chart on a log scale. Doing so is considered as a bad practice. Bar charts value (Y) axis must include meaningful zero, which log scale can’t have by its definition.
Other than that, a good read Graphing Highly Skewed Data – Tom Hopper
In the data visualization world, not starting a bar chart at zero is a “BIG NO”. Some people protest. “How come can anyone tell me how to start my bar chart? The Paper/Screen can handle anything! If I want to start a bar chart at 10, nobody can stop me!”
Data visualization is a language. Like any language, data visualization has its set of rules, grammar if you wish. Like in any other language, you are free to break any rule, but if you do so, don’t be surprised if someone underestimates you. I’m not a native English speaker. I certainly break many English grammar rules when I write or speak. However, I never argue if someone knowledgeable corrects me. If you agree that one should try respecting grammar rules of a spoken language, you have to agree to respect the grammar of any other language, including data visualization.
Natan Yau from flowingdata.com has a very informative post
that explores this exact point. Read it.
Another related discussion is called “When to use the start-at-zero rule” and is also worth reading.
Also, do remember is that the zero point has to be a meaningful one. That is why, you cannot use a bar chart to depict the weather because, unless you operate in Kelvin, the zero temperature is meaningless and changes according to the arbitrary choice the temperature scale.
Yet another thing to remember is that
It’s true that every rule has its exception. It’s just that with this particular rule, I haven’t seen a worthwhile reason to bend it yet.
(citing Natan Yau)
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Look at this example from the seaborn documentation site
>>> import seaborn as sns
>>> tips = sns.load_dataset("tips")
>>> ax = sns.barplot(x="day", y="total_bill", data=tips)
This example shows the default barplot and is the first barplot. Can you see how easy it is to add colors to the different columns? But WHY? What do those colors represent? It looks like the only information that is encoded by the color is the bar category. We already have this information in the form of bar location. Having this colorful image adds nothing but a distraction. It is sad that this is the default behavior that seaborn developers decided to adopt.
Look at the same example, without the colors
>>> ax = sns.barplot(x="day", y="total_bill", color='gray', data=tips)
Isn’t it much better? The sad thing is that a better version requires memorizing additional arguments and more typing.
This was my because you can rant.