On the importance of perspective

Stalin was a relatively short man: his height was 1.65 m. Khrushchev was even shorter, at 1.60 m. It seems that this difference wasn’t enough for the official Soviet propaganda of that time. Take a look at this photo. We can clearly see that Stalin is taller than Khrushchev.


Do you notice something strange? Take a look at the windows in the background. I added horizontal and vertical guides for your convenience.

Now, look what happens when we fix the horizontal and vertical lines.

Now, Khrushchev is still shorter than Stalin but not by that much.

How do you say “data visualization” in Hebrew?

This post deals with a Hebrew-language issue; the original was written in Hebrew.

I teach data visualization at two colleges in Israel: the Azrieli College of Engineering in Jerusalem and the Holon Institute of Technology. When I wrote my first syllabus, I had to find a Hebrew term for data visualization, and I wrote ״הדמיית נתונים״. It did remind me a bit of a simulation process, but the other option I considered was ״דימות״, and I knew that it was reserved for imaging, that is, the process of creating an image or a shape of an object, mainly in the medical world.

I realized that the term was problematic in the first class I taught. It turned out that two of the four students who showed up thought that the course ״הדמיית נתונים בתהליך מחקר ופיתוח״ (roughly, “data visualization in the research and development process”) was about simulations.

At some point, I heard from a friend of a friend that the correct term for visualization is הדמאה, but that sounded too pretentious to me, so I left ״הדמיה״ in the course name and added “data visualization” in parentheses.

Today, three years after the first lecture I gave, and two days before the next semester opens, I decided to google the answer (the Hebrew verb is לגגל; is there such a word? There is!). And what did I find? Issue no. 109 of the ״למד לשונך״ bulletin of the Academy of the Hebrew Language, published in 2015, rules that the term for visualization is הַחְזָיָה. I don’t know about you, but I’m not crazy about החזיה. Another thing I’m not crazy about is that, as the example of החזיה, the Academy decided to use a pie chart with so many errors!

I think I’ll stick with הדמיה. The Hebrew Wiktionary lets me.

P.S. Did you notice that the Hebrew original of this post uses the Hebrew hyphen (the maqaf)? I really like the Hebrew hyphen.


Innumeracy is “inability to deal comfortably with the fundamental notions of number and chance”.
I wish there were a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews many important aspects of “numeracy”.

Data, Analytics and beyond

Tom Breur

21 October 2018

It has long been known that the general public is sometimes remarkably out of tune with math and numbers. In 1988, mathematician John Allen Paulos wrote a classic, “Innumeracy”, that is chock-full of striking examples of misinterpretation of numeric evidence. Paulos refers to innumeracy as “… inability to deal comfortably with the fundamental notions of number and chance …” Personally, I consider it the mathematical equivalent of illiteracy. Another classic from Paulos is “A Mathematician Reads the Newspaper” (1995), which contains a lot of satire, debunking ridiculous claims in the press. It highlights more spectacular examples of innumeracy.

Paulos illustrates innumeracy with lighthearted anecdotes and many common, everyday scenarios. These examples highlight how readers might be fooled by misleading quantitative evidence. His examples span diverse topics like probability and coincidence, misguessing extremely small or very large numbers, pseudoscience and superstition…

Working Remotely and the Virtue of Aggressive Transparency

Excellent post by my colleague Simon Ouderkirk on working in a distributed company. It’s a three-year-old post. I wonder how I missed it.

Simon Ouderkirk


One of the things that it has taken me quite a long time to figure out, when it comes to this remote work gig, is this idea I’ve taken to calling aggressive transparency.

I’ve been chewing on this idea quite a lot, and in chatting with my team and other folks whose opinions I respect, I think I’m starting to feel like it’s something I should articulate in greater detail.

Data visualization in right-to-left languages

Line chart that uses Arabic text and numerals

If you speak Arabic or Farsi, I need your help. If you don’t, share this post with someone who does.

Right-to-left (RTL) languages such as Hebrew, Arabic, and Farsi are used by roughly 1.8 billion people around the world. Many of them consume data in their native languages. Nevertheless, I had never seen any research or study that explores data visualization in RTL languages. That changed a couple of days ago, when I saw this interesting observation by Nick Doiron, “Charts when you read right-to-left”.

I teach data visualization in Israeli colleges. Whenever a student asks me RTL-related questions, I always answer something like “it’s complicated, let’s not deal with that”. Moreover, in the assignments, I even allow my students to submit graphs in English, even if they write the report in Hebrew.

Nick’s post made me wonder about data visualization do’s and don’ts in RTL environments. Should Hebrew charts differ from Arabic or Farsi? What are the accepted practices?

If you speak Arabic or Farsi, I need your help. If you don’t, share this post with someone who does. I want to collect as many examples of data visualization in RTL languages as possible. Links to research articles are more than welcome. You can leave your comments here or send them to boris@gorelik.net.

Thank you.


The image at the top of this post is a modified version of a graph that appears in the post that I cite. Unfortunately, I wasn’t able to find the original publication.

Can error correction cause more error? (The answer is yes)

This is an interesting thought experiment. Suppose that you have some appliance whose output is normally distributed, for example, a nerf gun. Let’s say that you aim and fire the gun. What happens if you miss by some amount X? Should you correct your aim in the opposite direction? My intuition says “yes.” So does the intuition of many other people with whom I talked about this problem. However, when we start thinking about it, we realize that the intuition is wrong. Since we aimed the gun, our default assumption should be that the average deviation is zero. A single observation is not sufficient to reject this assumption. By continually adjusting the data-generating process based on a single observation, we reduce the precision (increase the dispersion).
Below is a simulation of adjusted and non-adjusted processes (the code is here). The broader spread of the adjusted data (blue line) is evident.

Two curves. Blue: high dispersion of values when adjustments are performed after every observation. Orange: smaller dispersion when no adjustments are done.

Due to the nature of the normal random variable, a single large accidental deviation can cause an extreme “correction,” which in turn will create a prolonged period of highly inaccurate points. This is precisely what you see in my simulation.
The moral of this simple experiment is that you shouldn’t let a single observation affect your actions.
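
The thought experiment can be sketched in a few lines of Python. This is not the simulation linked above; it is a minimal sketch that assumes one specific correction rule: after every shot, the aim is moved by the full observed miss, in the opposite direction. The function name `simulate` and its parameters are made up for this illustration.

```python
import random
import statistics


def simulate(n_shots, sigma=1.0, adjust=False, seed=0):
    """Return the miss distances of n_shots shots at a target located at 0.

    Each shot lands at aim + noise, where noise ~ Normal(0, sigma).
    If adjust is True, the aim is shifted after every shot by the full
    observed miss, in the opposite direction (an assumed correction rule).
    """
    rng = random.Random(seed)
    aim = 0.0
    misses = []
    for _ in range(n_shots):
        miss = aim + rng.gauss(0.0, sigma)
        misses.append(miss)
        if adjust:
            aim -= miss  # "correct" the aim based on a single observation
    return misses


fixed = simulate(10_000, adjust=False)
adjusted = simulate(10_000, adjust=True)

print("std without adjustments:", statistics.pstdev(fixed))
print("std with adjustments:   ", statistics.pstdev(adjusted))
```

Under this particular rule, every miss is the difference of two independent noise terms, so the variance doubles and the standard deviation grows by a factor of about 1.41. Other correction rules will give different numbers, but the direction of the effect is the point.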


“Any questions?” How to fight the awkward silence at the end of a presentation?


If you ever gave or attended a presentation, you are familiar with this situation: the presenter asks whether there are any questions and … nobody asks anything. This is an awkward situation. Why aren’t there any questions? Is it because everything is clear? Not likely. Everything is never clear. Is it because nobody cares? Well, maybe. There are certainly many people that don’t care. It’s a fact of life. Study your audience, work hard to make the presentation relevant and exciting but still, some people won’t care. Deal with it.

However, the bigger reasons for the lack of questions are human laziness and the fear of looking stupid. Nobody likes asking a question that someone will perceive as a stupid one. Some people don’t mind asking a question but are embarrassed and prefer not to be the first one to break the silence.

What can you do? Usually, I prepare one or two questions myself. Then, if nobody asks anything, I say something like “Some people, when they see these results, ask me whether it is possible to scale this method to larger sets.” Depending on how confident you are, you may then provide the answer or ask “What do you think?”

You can even prepare a slide that answers your question. In the screenshot below, you can see the slide deck of the presentation I gave in Trento. The blue slide at the end of the deck is the final slide, where I thank the audience for their attention and ask whether there are any questions.

My plan was that if nobody asked me anything, I would say: “Thank you again. If you want to learn more practical advice about data visualization, watch the recording of my tutorial, where I present this method <SLIDE TRANSFER, show the mockup of the “book”>. Also, many people ask me about reading suggestions. This is what I suggest you read: <SLIDE TRANSFER, show the reading pointers>.”

Luckily for me, there were questions after my talk. Luckily, one of these questions was about practical advice so I had a perfect excuse to show the next, pre-prepared, slide. Watch this moment on YouTube here.