Tag: dataviz
-
Dual axis with shared x-axis — a much better way
I recently came across a financial update with a plot that looks like the one below: a bar plot of revenue with a line plot of growth rate. The growth rate is on a secondary y-axis, and the x-axis is shared between the two plots. This type of plots are very common in financial reports, and they are often used to show the relationship between two variables that have different scales. However, they are also often criticized for being misleading, as the two variables are not directly comparable. In this post, I will show a better way to visualize this type of data. But first, what’s wrong with the plot below?
-
An example of a very bad graph
An example of a very bad graph
-
On proper selection of colors in graphs
How do you properly select a colormap for a graph? What makes the rainbow color map a wrong choice, and what are the proper alternatives?
-
Another example of the power of data visualization
I stumbled upon a great graph that tells a complex story compellingly.
-
Super useful videos for advanced data visualizers
The great Robert Kosara, also known as the “eager eyes” has started publishing a series of videos he calls Chart Appreciation. In these videos, Robert takes a piece of data visualization from a reputable and known source, and discusses why this particular piece is so good, what decisions were made that made it possible, what alternatives are, and more. If you consider yourself an intermediate or advanced practitioner of data visualization, you should subscribe. Here’s one example.
-
Basic data visualization video course (in Hebrew)
I had the honor to record an introductory data visualization course for high school students as a part of the Israeli national distance learning project. The course is in Hebrew, and since it targets high schoolers, it does not require any prior knowledge.
-
Text Visualization Browser
I’ve stumbled upon an exciting project – text visualization browser. It’s a web page that allows one to search for different text visualization techniques using keywords and publication time.
-
The information is beautiful. The graphs are shit!
I apologize for my harsh language, but recently I was exposed to a bunch of graphs on the “information is beautiful” site, and I was offended (well, ot really, but let’s pretend I was). I mean, I’m a liberal person, and I don’t care what graphs people do in their own time. Many people visit that site because they try to learn good visualization practices, but some charts on that site are wrong. Very wrong.
-
Exploring alternatives to population pyramids
A population pyramid also called an “age-gender-pyramid”, is a graphical illustration that shows the distribution of various age groups in a population (typically that of a country or region of the world), which forms the shape of a pyramid when the population is growing [citation from Wikipedia].
-
ASCII histograms are quick, easy to use and to implement
From time to time, we need to look at a distribution of a group of values. Histograms are, I think, the most popular way to visualize distributions. “Back in the old days,” when most of my work was done in the console, and when creating a plot from Python was required too many boilerplate code lines, I found a neat function that produced histograms using ASCII characters.
-
What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?
There was a consensus among the data visualization purists that the rainbow color map, and it’s close cousin Jet are bad. Really bad. These colormaps used to be popular at the beginning of the computational data visualization era. However, their popularity decreased in the last five years or so. The sentiment isn’t as bad as it used to be a couple of years ago, but still.
-
Before and after. Even excellent graphs can be improved
Being a data visualization consultant, I can’t help looking for dataviz problems in graphs that I see. Even if the graph is good. Even if I know that I would not be able to create a graph that good. Even if the overall graph is excellent, and the problems are minor, or maybe especially when the graph is excellent, and the problems are minor.
-
Data visualization is not only dots, bars, and pies
Look at this wonderful piece of data visualization (taken from here). If you know the terms “tertiary structure” and “glycan”, there is NO way you miss the message that the author of this figure wanted to convey.
-
Lie factor in ad graphs
It’s fun to look at the visit statistics and to discover old stories. I wrote this post in 2016. For a reason I don’t know, this post has been one of the most viewed posts in my blogs during the last week.
-
Nice but useless data visualization
Network visualization can mesmerize and hypnotize. Chord diagrams are especially cool because they are so colorful and smooth. The problem is that sometimes, the result doesn’t provide any actual value, and serves as a cute illustration. Cute illustrations are cute; they help put some “easiness” to the text without the risk of looking too unprofessional.
-
Logarithmic scale misinforms. Period
-
Visualising Odds Ratio — Henry Lau
Besides being a freelancer data scientist and visualization expert, I teach. One of the toughest concepts to teach and to visualize is odds ratio. Today, I stumbled upon a very interesting post that deals exactly with that
-
Bad advice from a reputable source is bad advice.
Would you buy a grammar book with a clear spelling mistake on its cover? I hope not. That’s what happened to IBM when it published it’s new data visualization guide. I didn’t bother reading the manual because of what IBM decided to use as the first image of their guide.
-
The quintessence of data visualization usefulness
I have to admit, I was skeptical at the beginning of the COVID-19 crisis. I started becoming skeptical now when it seems that the crisis didn’t hit my country too hard. But then I saw the graphs in this Financial Times article, and the skepticism disapeared. The graphs are accompanied by hundreds of words, but there is no need for reading the text to understand almost everything.
-
If there is only one document you can read about data visualization, this is the one
I’m sorting my teaching material, and I found this gem. The UK Government Statistical Service published a guideline for effective data visualization and tables. If you know a busy person who doesn’t have time to study data visualization and can only read one document, this document is for them (it has less than 40 pages full of examples). Click o the image above to go to the guideline
-
An interesting solution of the data giraffe problem
A data giraffe is a situation where a very prominent data point shades everything else. I learned this term from a post by Pini Yakuel and immediately liked it a lot.
-
Corona virus vs flu, visualized
-
Three most common mistakes in data visualization
People ask me for good intro video to data visualization. I tend to ask them to look for one of my lectures. To save the search, here’s one of the most relevant talks that I gave
-
ASCII histograms are quick, easy to use and to implement
-
Data visualization as an engineering task - a methodological approach towards creating effective data visualization
In June 2019, I attended the NDR AI conference in Iași, Romania where I also gave a talk. Recently, the organizers uploaded the video recording to YouTube.
-
Sometimes, you don't really need a legend
This is another “because you can” rant, where I claim that the fact that you can do something doesn’t mean that you necessarily need to.
-
What do we see when we look at slices of a pie chart?
What do we see when we look at slices of a pie chart? Angles? Areas? Arc length? The answer to this question isn’t clear and thus “experts” recommend avoiding pie charts at all.
-
Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz - Stats - Bayes
On Sunday, I wrote about bootstrapping. On Monday, I wrote about visualization uncertainty. Let’s now talk about bootstrapping and uncertainty visualization.
-
Error bars in bar charts. You probably shouldn't
This is another post in the series Because You Can. This time, I will claim that the fact that you can put error bars on a bar chart doesn’t mean you should.
-
Visualizations with perceptual free-rides
Dr. Richard Brath is a data visualization expert who also blogs from time to time. Each post in Richard’s blog provides a deep, and often unexpected to me, insight into one dataviz aspect or another.
-
Pseudochart. It's like a pseudocode but for charts
Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm. People write pseudocode to isolate the “bigger picture” of an algorithm. Pseudocode doesn’t care about the particular implementation details that are secondary to the problem, such as memory management, dealing with different encoding, etc. Writing out the pseudocode version of a function is frequently the first step in planning the implementation of complex logic.
-
My slide deck from the NDR conference in Iași
I have published the slide deck from my talk at the NDR conference in Iași, Romania.
-
כוון הציר האפקי במסמכים הנכתבים מימין לשמאל
אני מחפש דוגמאות נוספות
-
X-axis direction in Right-To-Left languages (part two)
I need more examples
-
Useful redundancy — when using colors is not completely useless
The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.
-
Microtext Line Charts
Why adding text labels to graph lines, when you can build graph lines using text labels? On microtext lines
-
איך אומרים דאטה ויזואליזיישן בעברית?
This post is written in Hebrew about a Hebrew issue. I won’t translate it to English.
-
Data visualization in right-to-left languages
If you speak Arabic or Farsi, I need your help. If you don’t speak, share this post with someone who does.
-
Conference Recap: EuroSciPy 2018 — Data for Breakfast
See my recap of the recent EuroSciPy, published on https://data.blog
-
Graphing Highly Skewed Data – Tom Hopper
My colleague, Chares Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes.
-
Sometimes, less is better than more
Today, during the EuroSciPy conference, I gave a presentation titled “Three most common mistakes in data visualization and how to avoid them”. The title of this presentation is identical to the title of the presentation that I gave in Barcelona earlier this year. The original presentation was approximately one and a half hours long. I knew that EuroSciPy presentations were expected to be shorter, so I was prepared to shorten my talk to half an hour. At some point, a couple of days before departing to Trento, I realized that I was only allocated 15 minutes. Fifteen minutes! Instead of ninety.
-
An even better data visualization workshop
-
Meet me at EuroSciPy 2018
I am excited to run a data visualization tutorial, and to give a data visualization talk during the 2018 EuroSciPy meeting in Trento, Italy.
-
Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium
Uncertainty is one of the most neglected aspects of number-based communication and one of the most important concepts in general numeracy. Comprehending uncertainty is hard. Visualizing it is, apparently, even harder.
-
Evolution of a complex graph. Part 1. What do you want to say?
From time to time, people ask me for help with non-trivial data visualization tasks. A couple of weeks ago, a friend-of-a-friend-of-a-friend showed me a set of graphs with the following note:
-
C for Conclusion
From time to time, I give a lecture about most common mistakes in data visualization. In this lection, I say that not adding a graph’s conclusion as a title is an opportunity wasted
-
Meaningless slopes
That fact that you can doesn’t mean that you should! I will say it once again.That fact that you can doesn’t mean that you should! Look at this slopegraph that was featured by “Information is Beautiful”
-
In defense of three-dimensional graphs
“There is only one thing worse than a pie chart. It’s a 3-D pie chart”. This is what I used to think for quite a long time. Recently, I have revised my attitude towards pie charts, mainly due to the works of Rober Kosara from Tableau. I am no so convinced that pie charts can be a good visualization choice, I even included a session “Pie charts as an alternative to bar charts” in my recent workshop.
-
Today's workshop material
Today, I hosted a data visualization workshop, as a part of the workshop day adjacent to the fourth Israeli Data Science Summit. I really enjoyed this workshop, especially the follow-up questions. These questions are the reason I volunteer talking about data visualization every time I can. It may sound strange, but I learn a lot from the questions people ask me.
-
I will host a data visualization workshop at Israel’s biggest data science event
TL/DR
-
Three most common mistakes in data visualization and how to avoid them. Now, the slides
Yesterday, I talked in front of the Barcelona Data Science and Machine Learning Meetup about the most common mistakes in data visualization. I enjoyed talking with the local community very much. Judging by the feedback I received during and after the talk, they too, enjoyed my presentation. I uploaded my slides to Slideshare.
-
Live in Barcelona. Three most common mistakes in data visualization.
On Thursday, March 20, I will give a talk titled “Three most common mistakes in data visualization and how to avoid them.” I will be a guest of the Barcelona Data Science and Machine Learning Meetup Group. Right now, less than twenty-four hours after the lecture announcement, there are already seventeen people on the waiting list. I feel a lot of responsibility and am very excited.
-
ASCII histograms are quick, easy to use and implement
-
The case of meaningless comparison
Exposé, an Australian-based data analytics company, published a use case in which they analyze the benefits of a custom-made machine learning solution. The only piece of data in their report [PDF] was a graph which shows the observed and the predicted
-
Lie factor in ad graphs
What do you do when you have spare time? I tend to throw graphs from ads to a graph digitizer to compute the “lie factor”. Take the following graph for example. It appeared in an online ad campaign a couple of years ago. In this campaign, one of the four Israeli health care providers bragged about the short waiting times in their phone customer support. According to the Meuheded (the health care provider who run the campaign), their customers had to wait for one minute and one second, compared to 1:03, 1:35, and 2:39 in the cases of the competitors. Look how dramatic the difference is:
-
Does chart junk really damage the readability of your graph?
-
Why bar charts should always start at zero?
In the data visualization world, not starting a bar chart at zero is a “BIG NO”. Some people protest. “How come can anyone tell me how to start my bar chart? The Paper/Screen can handle anything! If I want to start a bar chart at 10, nobody can stop me!”
-
Gender salary gap in the Israeli high-tech — now the code
Several people have asked me about the technology I used to create the graphs in my recent post about the gender salary gap in the Israeli high-tech. Like 99% of the graphs I create, I used matplotlib. I have uploaded the notebook that I used for that post to Github. Here’s the link. The published version uses seaborn style settings. The original one uses a slightly customized style.
-
In defense of double-scale and double Y axes
If you had a chance to talk to me about data visualization, you know that I dislike the use of double Y-axis for anything except for presenting different units of the same measurement (for example inches and meters). Of course, I’m far from being a special case. Double axis ban is a standard stand among all the people in the field of data visualization education. Nevertheless, double-scale axes (mostly Y-axis) are commonly used both in popular and technical publications. One of my data visualization students in the Azrieli College of Engineering of Jerusalem told me that he continually uses double Y scales when he designs dashboards that are displayed on a tiny screen in a piece of sophisticated hardware. He claimed that it was impossible to split the data into separate graphs, due to space constraints, and that the engineers that consume those charts are professional enough to overcome the shortcomings of the double scales. I couldn’t find any counter-argument.
-
Don't take career advises from people who mistreat graphs this badly
Recently, I stumbled upon a report called “Understanding Today’s Chief Data Scientist” published by an HR company called Heidrick & Struggles. This document tries to draw a profile of the modern chief data scientist in today’s Big Data Era. This document contains the ugliest pieces of data visualization I have seen in my life. I can’t think of a more insulting graphical treatment of data. Publishing graph like these ones in a document that tries to discuss careers in data science is like writing a profile of a Pope candidate while accompanying it with pornographic pictures.
-
What's the most important thing about communicating uncertainty?
Sigrid Keydana, in her post Plus/minus what? Let’s talk about uncertainty (talk) — recurrent null, said
-
The Y-axis doesn't have to be on the left
Line charts are great to convey the evolution of a variable over the time. This is a typical chart. It has three key components, the X-axis that represents the time, the Y-axis that represents the tracked value, and the line itself.
-
How to make a graph less readable? Rotate the text labels
This is my “because you can” rant.
-
Good information + bad visualization = BAD
I went through my Machine Learning tag feed. Suddenly, I stumbled upon a pie chart that looked so terrible, I was sure the post would be about bad practices in data visualization. I was wrong. The chart was there to convey some information. The problem is that it is bad in so many ways. It is very hard to appreciate the information in a post that shows charts like that. Especially when the post talks about data science that relies so much on data visualization.
-
Do you REALLY need the colors?
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Look at this example from the seaborn documentation site
-
When scatterplots are better than bar charts, and why?
From time to time, you might hear that graphical method A is better at representing problem X than method B. While in case of problem Z, the method B is much better than A, but C is also a possibility. Did you ever ask yourselves (or the people who tell you that) “Says WHO?”
-
Because you can — a new series of data visualization rants
Here’s an old joke:
-
1461
I teach data visualization to in Azrieli College of Engineering in Jerusalem. Yesterday, during my first lesson, I was talking about the different ways a chart design selection can lead to different conclusions, despite not affecting the actual data. One of the students hypothesized that the preception of a figure can change as a function of other graphs shown together. Which was exactly tested in a research I recently mentioned here. I felt very proud of that student, despite only meeting them one hour before that.
-
Who doesn't like some merciless critique of others' work?
Stephen Few is the author of (among others) “Show Me The Numbers”. Besides writing about what should **be done, in the field of data visualization, Dr. Few also writes a lot about what should **not be done. He does that in a sharp, merciless way which makes it very interesting reading (although, sometimes Dr. Few can be too harsh). This time, it was the turn of the Tableau blog team to be at the center of Stephen Few’s attention, and not for the good reason.
-
Can the order in which graphs are shown change people's conclusions?
When I teach data visualization, I love showing my students how simple changes in the way one visualizes his or her data may drive the potential audience to different conclusions. When done correctly, such changes can help the presenters making their point. They also can be used to mislead the audience. I keep reminding the students that it is up to them to keep their visualizations honest and fair. In his recent post, Robert Kosara, the owner of https://eagereyes.org/, mentioned another possible way that may change the perceived conclusion. This time, not by changing a graph but by changing the order of graphs exposed to a person. Citing Robert Kosara:
-
Before and after — the Hebrew holiday season chart
Sometimes, when I see a graph, I think “I could draw a better version.” From time to time, I even consider writing a blog post with the “before” and “after” versions of the plot. Last time I had this desire was when I read the repost of my own post about the crazy month of Hebrew holidays. I created this graph three years ago. Since then, I have learned A LOT. So I thought it would be a good opportunity to apply my over-criticism to my own work. This is the “before” version:
-
On data beauty and communication style
There’s an interesting mini-drama going on in the data visualization world. The moderators of DataIsBeautiful invited Stephen Few for an ask-me-anything (AMA) session. Stephen Few is a data visualization researcher and an opinionated blogger. I use his book “Show Me the Numbers” when I teach data visualization. Both in his book and even more so, on his blog, Dr. Few is not afraid of criticizing practices that fail to meet his standards of quality. That is why I wasn’t surprised when I read Stephen Few’s public response to the AMA invitation:
-
A successful failure
Almost half a year ago, I decided to create an online data visualization course. After investing hundreds of hours, I managed to release the first lecture and record another one. However, I decided not to publish new lectures and to remove the existing one from the net. Why? The short answer is a huge cost-to-benefit ratio. For a longer answer, you will have to keep reading this post.
-
Correction about the course start date
The first lecture of the data visualization course will be published on July 7 (7/7/17). There was a typo in the original announcement.
-
I have created an online data visualization course
-
Chart legends and the Muttonchops
Adding legends to a graph is easy. With matplotlib, for example, you simply call
plt.legend()
and voilà, you have your legends. The fact that any major or minor visualization platform makes it super easy to add a legend doesn’t mean that it should be added. At least, not in graphs that are supposed to be shared with the public. -
Evolution of a Plot: Better Data Visualization, One Step at a Time
My latest post on data.blog
-
Comparing the incomparable — expanding the time map concept
We are surrounded by discrete events: posts and comments, purchases from the online store, page visits are all examples of discrete events. Some of these events happen periodically; others happen sporadically. Some happen once in a while; others are generated every microsecond. There are many ways to visualise such streams, the most naive one being a one-dimensional plot in which events are placed on a time axis, as demonstrated in the figure below for three simple cases.