Anything is better when bootstrapped. Read my co-worker’s post on bootstrapping. Also make sure following the links Yanir gives to support his claims
-
Hackers beware: Bootstrap sampling may be harmful
January 15, 2019 -
I have 101 followers!
January 14, 2019Yesterday, the follower list of my blog exceeded one hundred followers! Even though I know that some of these followers are bots, this number makes me happy! Thank you all (humans and bots) for clicking the “follow” button.
-
A Brand Image Analysis of WordPress and Automattic on Twitter
January 13, 2019My coworker analyzed Twitter social network around Automattic, WordPress, and other related projects.
-
Against A/B tests
December 12, 2018Traditional A/B testsing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherentlyinferior to choosing a targeted mix of A and B.
Michael Kaminsky locallyoptimistic.com
The quote above is from a post by Michael Kaminsky “Against A/B tests”. I’m still not fully convinced by Michael’s thesis but it is very interesting and thought-provoking.
-
Links Worth Sharing: What Makes People Successful
November 27, 2018 -
Useful redundancy — when using colors is not completely useless
November 26, 2018The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.
Pew Research Center is a “is a nonpartisan American fact tank based in Washington, D.C. It provides information on social issues, public opinion, and demographic trends shaping the United States and the world.” Recently, I read a report prepared by the Pew Center on the religious divide in the Israeli society. This is a fascinating report. I recommend reading without any connection to data visualization.
But this post does not deal with the Isreali society but with graphs and colors.
Look at the first chart in that report. You may see a tidy pie chart with several colored segments.

Aha! Can’t they use a single color without losing the details? Of course the can! A monochrome pie chart would contain the same information:

In most of the cases, such a transformation would make a perfect sense. In most of the cases, but not in this report. This report is a multipage research document packed with many facts and analyses. The pie chart above is the first graph in that report that provides a broad overview of the Israeli society. The remaining of this report is dedicated to the relationships between and within the groups represented by the colorful segments in that pie chart. To help the reader navigating through this long report, its authors use a consistent color scheme that anchors every subsequent graph to the relevant sections of the original pie chart.
All these graphs and tables will be readable without the use of colors. Despite the fact that the colors here are redundant, this is a useful redundancy. By using the colors, the authors provided additional information layers that make the navigation within the document easier. I learned about the concept of useful redundancy from “Trees, Maps, and Theorems” by Jean-luc Dumout. If you can only read one book about data communication, it should be this book.
-
On the importance of perspective
November 12, 2018Stalin was a relatively short man, his height was 1.65 m. Khrushchev was even shorter, his height was 1.60. It seems that the difference wasn’t enough for the official Soviet propaganda of that time. Take a look at this photo. We can clearly see that Stalin is taller than Khrushchev.

Do you notice something strange? Take a look at the windows in the background. I added horizontal and vertical guides for your convenience.

Now, look what happens when we fix the horizontal and vertical lines

Now, Khrushchev is still shorter than Stalin but not by that much.
-
Microtext Line Charts
November 12, 2018Why adding text labels to graph lines, when you can build graph lines using text labels? On microtext lines
-
איך אומרים דאטה ויזואליזיישן בעברית?
October 23, 2018This post is written in Hebrew about a Hebrew issue. I won’t translate it to English.
אני מלמד data visualization בשתי מכללות בישראל — במכללת עזריאלי להנדסה בירושלים ובמכון הטכנולוגי בחולון. כשכתבתי את הסילבוס הראשון שלי הייתי צריך למצוא מונח ל־data visualization וכתבתי “הדמיית נתונים״ אומנם זה הזכיר לי קצת תהליך של סימולציה, אבל האופציה האחרת ששקלתי היתה ״דימות״ וידעתי שהיא שמורה ל־imaging, דהיינו תהליך של יצירת דמות או צורה של עצם, בעיקר בעולם הרפואה.
הבנתי שהמונח בעייתי בשיעור הראשון שהעברתי. מסתברששניים מארבעת הסטודנטים שהגיעו לשיעור חשבו שקורס ״הדמיית נתונים בתהליך מחקר ופיתוח״ מדבר על סימולציות.
מתישהו שמעתי מחבר של חבר שהמונח הנכון ל־visualization זה הדמאה, אבל זה נשמע לי פלצני מדי, אז השארתי את ה־״הדמיה״ בשם הקורס והוספתי “data visualization” בסוגריים.
היום, שלוש שנים אחרי ההרצאה הראשונה שהעברתי, ויומיים לפני פתיחת הסמסטר הבא, החלטתי לגגל (יש מילה כזאת? יש!) את התשובה. ומה מסתבר? עלון ״למד לשונך״ מס׳ 109 של האקדמיה ללשון עברית שיצא לאור בשנת 2015 קובע שהמונח ל־visualization הוא הַחְזָיָה. לא יודע מה אתכם, אבל אני לא משתגע על החזיה. עוד משהו שאני לא משתגע עליו הוא שבתור הדוגמא להחזיה, האקדמיה החלטיה לשים תרשים עוגה עם כל כך הרבה שגיאות!

נראה לי שאני אשאר עם הדמיה. ויקימילון מרשה לי.
נ.ב. שמתם לב שפוסט זה השתמשתי במקף עברי? אני מאוד אוהב את המקף העברי.
-
Innumeracy
October 22, 2018Innumeracy is “inability to deal comfortably with the fundamental notions of number and chance”.
I which there was a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews many important aspects of “numeracy”. -
Working Remotely and the Virtue of Aggressive Transparency
October 16, 2018Excellent post by my colleague Simon Ouderkirk on working in a distributed company. It’s a three-year-old post. I wonder how I missed it.
-
Data visualization in right-to-left languages
October 15, 2018If you speak Arabic or Farsi, I need your help. If you don’t speak, share this post with someone who does.
Right-to-left (RTL) languages such as Hebrew, Arabic, and Farsi are used by roughly 1.8 billion people around the world. Many of them consume data in their native languages. Nevertheless, I have never seen any research or study that explores data visualization in RTL languages. Until a couple of days ago, when I saw this interesting observation by Nick Doiron “Charts when you read right-to-left”.

I teach data visualization in Israeli colleges. Whenever a student asks me RTL-related questions, I always answer something like “it’s complicated, let’s not deal with that”. Moreover, in the assignments, I even allow my students to submit graphs in English, even if they write the report in Hebrew.
Nick’s post made me wonder about data visualization do’s and don’ts in RTL environments. Should Hebrew charts differ from Arabic or Farsi? What are the accepted practices?
If you speak Arabic or Farsi, I need your help. If you don’t speak, share this post with someone who does. I want to collect as many examples of data visualization in RTL languages. Links to research articles are more than welcome. You can leave your comments here or send them to boris@gorelik.net.
Thank you.
The image at the top of this post is a modified version of a graph that appears in the post that I cite. Unfortunately, I wasn’t able to find the original publication.
-
A World Without the Number 6 — Math with Bad Drawings
October 11, 2018What will happen if number 6 disappears one day? Ben Orlin, the author of “Math with bad drawings” elaborates on this interesting thought experiment in this 2017 post.

-
Can error correction cause more error? (The answer is yes)
October 9, 2018This is an interesting thought experiment. Suppose that you have some appliance that acts in a normally distributed way. For example, a nerf gun. Let’s say now that you aim and fire the gun. What happens if you miss by some amount of X? Should you correct your aim in the opposite direction? My intuition says “yes.” So does the intuition of many other people with whom I talked about this problem. However, when we start thinking about this problem, we realize that the intuition is wrong. Since we aim the gun, our assumption should be that the deviation is zero. A single observation is not sufficient to reject this assumption. By continually adjusting the data generating process based on a single observation, we reduce the precision (increase the dispersion).
Below is a simulation of adjusted and non-adjusted processes (the code is here). The broader spread of the adjusted data (blue line) is evident.
Due to the nature of the normal random variable, a single large accidental deviation can cause an extreme “correction,” which in turn will create a prolonged period of highly inaccurate points. This is precisely what you see in my simulation.
The moral of this simple experiment is that you shouldn’t let a single affect your actions. -
Me
October 1, 2018 -
Conference Recap: EuroSciPy 2018 — Data for Breakfast
September 20, 2018See my recap of the recent EuroSciPy, published on https://data.blog
In which Boris Gorelik shares his favorite talks and workshops from EuroSciPy 2018.
-
"Any questions?" How to fight the awkward silence at the end of a presentation?
September 20, 2018If you ever gave or attended a presentation, you are familiar with this situation: the presenter asks whether there are any questions and … nobody asks anything. This is an awkward situation. Why aren’t there any questions? Is it because everything is clear? Not likely. Everything is never clear. Is it because nobody cares? Well, maybe. There are certainly many people that don’t care. It’s a fact of life. Study your audience, work hard to make the presentation relevant and exciting but still, some people won’t care. Deal with it.
However, the bigger reasons for lack of the questions are human laziness and the fear of being stupid. Nobody likes asking a question that someone will perceive as a stupid one. Sometimes, some people don’t mind asking a question but are embarrassed and prefer not being the first one to break the silence.
What can you do? Usually, I prepare one or two questions by myself. In this case, if nobody asks anything, I say something like “Some people, when they see these results ask me whether it is possible to scale this method to larger sets.”. Then, depending on how confident you are, you may provide the answer or ask “What do you think?”.
You can even prepare a slide that answers your question. In the screenshot below, you may see the slide deck of the presentation I gave in Trento. The blue slide at the end of the deck is the final slide, where I thank the audience for the attention and ask whether there are any questions.
My plan was that if nobody asks me anything, I would say “Thank you again. If you want to learn more practical advises about data visualization, watch the recording of my tutorial, where I present this method <SLIDE TRANSFER, show the mockup of the “book”>. Also, many people ask me about reading suggestions, this is what I suggest you read: <SLIDE TRANSFER, show the reading pointers>

Luckily for me, there were questions after my talk. Luckily, one of these questions was about practical advice so I had a perfect excuse to show the next, pre-prepared, slide. Watch this moment on YouTube here.
-
Graphing Highly Skewed Data – Tom Hopper
September 16, 2018My colleague, Chares Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes.
The post author, Tom Hopper, experiments with different ways to deal with “Data Giraffes”. Some of his experiments are really interesting (such as splitting the graph area). In one experiment, Tom Hopper draws bar chart on a log scale. Doing so is considered as a bad practice. Bar charts value (Y) axis must include meaningful zero, which log scale can’t have by its definition.
Other than that, a good read Graphing Highly Skewed Data – Tom Hopper
-
On privacy, security, and irony
September 9, 2018About a week ago, I met Justin Mayer and had a really interesting chat with him about internet privacy. Today, his 30-minutes talk on that subject appeared in my youtube suggestion list
https://www.youtube.com/watch?v=2rrP_aW-jNA
How ironic. The talk, by the way, is very interesting.
-
Back to Mississippi: Black migration in the 21st century. By Charles Earl
September 4, 2018I wonder how this analysis of remained unnoticed by the social media
The recent election of Doug Jones […] got me thinking: What if the Black populations of Southern cities were to experience a dramatic increase? How many other elections would be impacted?
via Back to Mississippi: Black migration in the 21st century — Charlescearl’s Weblog
-
Please leave a comment to this post
September 3, 2018Please leave a comment to this post. It doesn’t matter what. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.
[caption id=”attachment_media-15” align=”alignnone” width=”1880”]

Photo by Pixabay on Pexels.com[/caption]
-
16-days-work-month — The joys of the Hebrew calendar
September 2, 2018Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana — the Hebrew New Year. It is a 30 days month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days of Sukkot (Feast of Tabernacles) **. All these days are rest days in Israel. Every holiday eve is also a *de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the period between the first and the last Sukkot days are mostly considered as half working days. Also, the children are at home since all the schools and kindergartens are on vacation so we will treat those days as half working days in the following analysis.
I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 2008 and 2023 CE, and this is what we get:

Overall, this period consists of between 15 to 17 non-working days in a single month (31 days, mind you). This is how the working/not-working time during this month looks like this:

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to the constantly interrupt work day, but at a different scale.
So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.
(*) New Year starts in the seventh’s month? I know this is confusing. That’s because we number Nissan – the month of the Exodus from Egypt as the first month.
(**)If you are an observing Jew, you should add to this list Fast of Gedalia, but we will omit it from this discussion -
Sometimes, less is better than more
August 30, 2018Today, during the EuroSciPy conference, I gave a presentation titled “Three most common mistakes in data visualization and how to avoid them”. The title of this presentation is identical to the title of the presentation that I gave in Barcelona earlier this year. The original presentation was approximately one and a half hours long. I knew that EuroSciPy presentations were expected to be shorter, so I was prepared to shorten my talk to half an hour. At some point, a couple of days before departing to Trento, I realized that I was only allocated 15 minutes. Fifteen minutes! Instead of ninety.
Frankly speaking, I was in a panic. I even considered contacting EuroSciPy organizers and asking them to remove my talk from the program. But I was too embarrassed, so I decided to take the risk and started throwing slides away. Overall, I think that I spent eight to ten working hours shortening my presentation. Today, I finally presented it. Based on the result, and on the feedback that I got from the conference audience, I now know that the 15-minutes version is better than the original, longer one. Video recording of my talk is available on Youtube and is embedded below. Below is my slide deck
[slideshare id=112261825&doc=20180830abcthreemostcommonmistakescopy-180830134825]
Illustration image credit: Photo by Jo Szczepanska on Unsplash
-
An even better data visualization workshop
August 30, 2018
Yesterday, I gave a data visualization workshop at EuroSciPy 2018 in Trento. I spent HOURs building and improving it. I even developed a “simple to use, easy to follow, never failing formula” for data visualization process (I’ll write about it later).
I enjoyed this workshop so much. Both preparing it, and (even more so) delivering it. There were so many useful questions and remarks. The most important remark was made by Gael Varoquaux who pointed out that one of my examples was suboptimal for vision impaired people. The embarrassing part is that one of the last lectures that I gave in my college data visualization course was about visual communication for the visually impaired. That is why the first thing I did when I came to my hotel after the workshop was to fix the error. You may find all the (corrected) material I used in this workshop on GitHub. Below, is the video of the workshop, in case you want to follow it.
https://www.youtube.com/watch?v=H-K_fSA54AM
Photo credit: picture of me delivering the workshop is by Margriet Groenendijk
-
Meet me at EuroSciPy 2018
August 26, 2018I am excited to run a data visualization tutorial, and to give a data visualization talk during the 2018 EuroSciPy meeting in Trento, Italy.
My tutorial “Data visualization – from default and suboptimal to efficient and awesome”will take place on Sep 29 at 14:00. This is a two-hours tutorial during which I will cover between two to three examples. I will start with the default Matplotlib graph, and modify it step by step, to make a beautiful aid in technical communication. I will publish the tutorial notebooks immediately after the conference.
My talk “Three most common mistakes in data visualization” will be similar in nature to the one I gave in Barcelona this March, but more condensed and enriched with information I learned since then.
If you plan attending EuroSciPy and want to chat with me about data science, data visualization, or remote working, write a message to boris@gorelik.net.
Full conference program is available here.
- Older posts Newer posts


