What do we see when we look at slices of a pie chart? Angles? Areas? Arc length? The answer to this question isn’t clear, and thus “experts” recommend avoiding pie charts altogether.
Robert Kosara, a Senior Research Scientist at Tableau Software (you should follow his blog, https://eagereyes.org), is very active in studying pie charts. In 2016, Robert Kosara and his collaborators published a series of studies about pie charts. There is a nice post called “An Illustrated Tour of the Pie Chart Study Results” that summarizes these studies.
Last week, Robert published another paper with a pretty confident title (“Evidence for Area as the Primary Visual Cue in Pie Charts”) and a very inconclusive conclusion
While this study suggests that the charts are read by area, it is not conclusive. In particular, the possibility of pie chart users re-projecting the chart to read them cannot be ruled out. Further experiments are therefore needed to zero in on the exact mechanism by which this common chart type is read.
Kosara. “Evidence for Area as the Primary Visual Cue in Pie Charts.” OSF, 17 Oct. 2019. Web.
Kosara’s previous studies had strong practical implications, the most important being that pie charts are not evil, provided they are done correctly. However, I’m not sure what I can take from this one. As far as I understand the data, the answers to the questions at the beginning of this post are still unclear. Maybe the “real answer” to these questions is “a combination thereof”.
Lior Patcher is a researcher at Caltech. Like many other researchers in academia, Dr. Patcher is measured by, among other things, publications and their impact as measured by citations. In his post, Lior Patcher criticised both the current impact metrics and their effect on citation patterns in the academic community.
The problem he points out: citation counts don’t really measure “actual” citations. Many of the counted citations are “hit and run citations”, i.e., people mention other people’s research without taking anything from that research.
In fact this author has cited [a certain] work in exactly the same way in several other papers which appear to be copies of each other for a total of 7 citations all of which are placed in dubious “papers”. I suppose one may call this sort of thing hit and run citation.
I think that the biggest problem with citation counts is that it costs nothing to cite a paper. When you add a paper (or a post, for that matter) to your reference list, you know that most probably nobody will check whether you actually read it, that nobody will check whether you represented that publication correctly, and that the chances are super (SUUPER) low that anybody will check whether your conclusions are right. All it takes is to click a button.
Have you experienced a vision of the person you might become, the work you could accomplish, the realized being you were meant to be? Are you a writer who doesn’t write, a painter who doesn’t paint, an entrepreneur who never starts a venture? Then you know what “Resistance” is.
As a known procrastinator, I was intrigued and started reading. In the beginning, the book was pretty promising. The first (and, I think, the biggest) part of the book is about “Resistance” — the force behind procrastination. I immediately noticed that almost every sentence in this chapter could serve as a motivational poster. For example:
It’s not the writing part that’s hard. What’s hard is sitting down to write.
The danger is greatest when the finish line is in sight.
The most pernicious aspect of procrastination is that it can become a habit.
The more scared we are of a work or calling, the more sure we can be that we have to do it.
Individually, each sentence makes sense, but their concentration was a bit too much for me. The way Pressfield talks about Resistance resembles the way Jewish preachers talk about the Yetzer Hara: it sits everywhere, waiting for you to fail. I don’t like this approach.
The next chapters were even harder for me to digest. Pressfield started talking about Muses, gods, prayers, and other “spiritual” stuff; I almost gave up. But I fought the Resistance and finished the book.
My main takeaways:
Resistance is real
It’s a problem
The more critical the task is, the stronger the Resistance. OK, I kind of agree with this. But then Pressfield continues to something I do not agree with: thus (according to the author), we can measure the importance of a task by the Resistance it creates.
Justifying not pursuing a task by commitments to the family, job, etc. is a form of Resistance.
The Pro does stuff.
The Artist is a Pro (see above) who does stuff even if nobody cares.
When Massive Open Online Courses (a.k.a. MOOCs) emerged some X years ago, I was ecstatic. I was sure that MOOCs were the Big Boom of higher education. Unfortunately, the MOOC impact turned out to be very modest. This modest impact, combined with the high production cost, was one of the reasons I quit making my online course after producing two or three lectures. Nevertheless, I don’t think MOOCs are dead yet. Following are some links I recently read that provide interesting insights into MOOC production and consumption.
Thinkful.com, an online platform that provides personalized training to aspiring data professionals, got in the news three weeks ago after being purchased for $80 million. Thinkful isn’t a MOOC per se, but I have a special relationship with it: a couple of years ago I was accepted as a mentor at Thinkful but couldn’t find the time to actually mentor anyone.
The bottom line
We still don’t know what this future will look like or how MOOCs will interplay with the legacy education system, but I’m sure MOOCs are the future.
This is another post in the series Because You Can. This time, I will claim that the fact that you can put error bars on a bar chart doesn’t mean you should.
It started with a paper by Prof. Gerd Gigerenzer, whose work in promoting numeracy I adore. The paper, “Natural frequencies improve Bayesian reasoning in simple and complex inference tasks”, contains a simple graph that is meant to convince the reader that natural frequencies lead to more accurate understanding (read the paper, it explains these terms). The error bars in the graph are meant to convey uncertainty. However, the visualization that Gigerenzer and his team selected is simply wrong.
First of all, look at the leftmost bar. It demonstrates so many problems with error bars in general, and with error bars in bar plots in particular. Can you see how the error bar crosses the X-axis, implying that Task 1 might have resulted in a negative percentage of correct inferences?
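This isn’t just a drawing quirk: a symmetric, normal-approximation interval around a proportion close to zero mathematically dips below zero. Here is a minimal sketch of the arithmetic; the numbers (2 correct answers out of 50 participants) are made up for illustration and are not taken from the paper.

```python
import math

def normal_ci(successes, n, z=1.96):
    """Symmetric normal-approximation ('Wald') interval for a proportion.

    This is the kind of interval a generic error bar often encodes."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

# Hypothetical task: 2 correct inferences out of 50 participants (4%)
low, high = normal_ci(2, 50)
# `low` comes out negative, so the error bar would cross the X-axis,
# suggesting an impossible negative percentage of correct inferences
```

An interval designed for proportions (such as the Wilson score interval) stays within [0, 1] by construction, which is one reason it is preferred for percent-correct data.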
The irony is that Prof. Gigerenzer is a worldwide expert in communicating uncertainty. I read his book “Calculated Risks” from cover to cover. Twice.
Why is this important?
Communicating uncertainty is super important. Take a look at this 2018 study with the self-explanatory title “Uncertainty Visualization Influences how Humans Aggregate Discrepant Information.” From the paper: “Our study repeatedly presented two [GPS] sensor measurements with varying degrees of inconsistency to participants who indicated their best guess of the “true” value. We found that uncertainty information improves users’ estimates, especially if sensors differ largely in their associated variability”.
Also, recall the surprise when Donald Trump won the presidential election even though most of the polls predicted that Hillary Clinton had a higher chance of winning. Nobody cared about the uncertainty; everyone just saw the graphs!
Many years ago, I terribly overfit a model, which caused losses of a lot of shekels (a LOT). It’s not that I wasn’t aware of the potential for overfitting. I was. Among other things, I ran several bootstrapping simulations. It turns out that I applied the bootstrapping in the wrong way. My particular problem was that I “forgot” about confounding parameters and that I “forgot” that peeping into the future is a bad thing.
Anyhow, Yanir Seroussi, a data scientist colleague of mine, gave a very good talk on bootstrapping.
Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls, with some additional simulations.
The main takeaways shared in the talk are:
Don’t compare single-sample confidence intervals by eye
Testing all the things typically requires writing code, which I did for the talk. You can browse through it in this notebook. The most interesting findings from my tests are summarised by the following figure.
The figure shows how the accuracy of confidence interval estimation varies by algorithm, sample size…
We create visualizations to aid viewers in making visual inferences. Different visualizations are suited to different inferences. Some visualizations offer additional perceptual inferences over comparable visualizations; that is, their specific configuration enables additional inferences to be observed directly, without additional cognitive load (e.g., see Gem Stapleton et al., “Effective Representation of Information: Generalizing Free Rides”, 2016).
Here’s an example from 1940, a bar chart where both bar length and width indicate data:
The length of the bar (horizontally) is the percent increase in income in each industry. Manufacturing has the biggest increase in income (18%); Contract Construction is second at 13%.
The width of the bar (vertically) is the relative size of that industry: Manufacturing is wide – it’s the biggest industry – it accounts for about 23% of all industry. Contract Construction is narrow, perhaps the third smallest industry, perhaps around 3-4%.