• Hackers beware: Bootstrap sampling may be harmful

    January 15, 2019

    Anything is better when bootstrapped. Read my co-worker’s post on bootstrapping. Also, make sure to follow the links Yanir gives to support his claims.

    January 15, 2019 - 1 minute read -
    blog
  • I have 101 followers!

    January 14, 2019

    Yesterday, the follower list of my blog exceeded one hundred followers! Even though I know that some of these followers are bots, this number makes me happy! Thank you all (humans and bots) for clicking the “follow” button.

    January 14, 2019 - 1 minute read -
    blogging followers blog
  • A Brand Image Analysis of WordPress and Automattic on Twitter

    January 13, 2019

    My coworker analyzed the Twitter social network around Automattic, WordPress, and other related projects.

    January 13, 2019 - 1 minute read -
    blog
  • Against A/B tests

    December 12, 2018

    Traditional A/B testing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherently inferior to choosing a targeted mix of A and B.

    Michael Kaminsky locallyoptimistic.com

    The quote above is from a post by Michael Kaminsky “Against A/B tests”. I’m still not fully convinced by Michael’s thesis but it is very interesting and thought-provoking.
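
    A toy calculation makes the argument in the quote concrete. All the numbers below are invented for illustration (they do not come from Michael Kaminsky's post), and `overall_rate` is a hypothetical helper. The sketch assumes two visitor segments, each with its own conversion rate under version A and under version B.

```python
# Hypothetical numbers, invented for illustration only.
# "share" is the fraction of all visitors that belongs to the segment;
# "A" and "B" are the conversion rates of that segment under each version.
segments = {
    "new visitors": {"share": 0.6, "A": 0.10, "B": 0.06},
    "returning visitors": {"share": 0.4, "A": 0.05, "B": 0.12},
}


def overall_rate(choose):
    """Overall conversion rate when `choose(segment)` picks the version shown."""
    return sum(s["share"] * s[choose(s)] for s in segments.values())


all_a = overall_rate(lambda s: "A")  # everybody sees version A
all_b = overall_rate(lambda s: "B")  # everybody sees version B
# Targeted mix: every segment sees the version that works better for it.
targeted = overall_rate(lambda s: "A" if s["A"] >= s["B"] else "B")

print(f"A for everyone: {all_a:.3f}")
print(f"B for everyone: {all_b:.3f}")
print(f"targeted mix:   {targeted:.3f}")
```

    With these made-up numbers, B wins the overall comparison (8.4% versus 8.0%), yet the targeted mix reaches 10.8%. Whether such stable subgroups can be found in practice is exactly the part of the thesis that is open to debate.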

    December 12, 2018 - 1 minute read -
    a-b-testing data science reblog statistics blog
  • Links Worth Sharing: What Makes People Successful

    November 27, 2018
    November 27, 2018 - 1 minute read -
    blog
  • Useful redundancy — when using colors is not completely useless

    November 26, 2018

    The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.

    Pew Research Center “is a nonpartisan American fact tank based in Washington, D.C. It provides information on social issues, public opinion, and demographic trends shaping the United States and the world.” Recently, I read a report prepared by the Pew Center on the religious divide in Israeli society. This is a fascinating report. I recommend reading it regardless of any connection to data visualization.

    But this post does not deal with Israeli society; it deals with graphs and colors.

    Look at the first chart in that report. You may see a tidy pie chart with several colored segments.

    Pie chart: Religious composition of Israeli society. The chart uses several colored segments

    Aha! Can’t they use a single color without losing the details? Of course they can! A monochrome pie chart would contain the same information:

    Pie chart: Religious composition of Israeli society. The chart uses monochrome segments

    In most cases, such a transformation would make perfect sense. In most cases, but not in this report. This report is a multipage research document packed with many facts and analyses. The pie chart above is the first graph in the report, and it provides a broad overview of Israeli society. The remainder of the report is dedicated to the relationships between and within the groups represented by the colorful segments in that pie chart. To help the reader navigate through this long report, its authors use a consistent color scheme that anchors every subsequent graph to the relevant sections of the original pie chart.

    All these graphs and tables would be readable without the use of colors. Although the colors here are redundant, this is a useful redundancy. By using the colors, the authors provided an additional information layer that makes navigation within the document easier. I learned about the concept of useful redundancy from “Trees, Maps, and Theorems” by Jean-luc Doumont. If you can only read one book about data communication, it should be this book.
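
    A consistent color scheme of this kind is easy to keep in code. The sketch below is only an illustration and is not taken from the Pew report: the group names and hex colors are placeholders, and `colors_for` is a hypothetical helper. The idea is to define the category-to-color mapping once and reuse it for every chart, so that a group keeps its color even when a chart shows only some of the groups or orders them differently.

```python
# Placeholder group names and colors; not taken from the Pew report.
GROUP_COLORS = {
    "group A": "#1b9e77",
    "group B": "#d95f02",
    "group C": "#7570b3",
    "group D": "#e7298a",
}


def colors_for(groups):
    """Return the colors for `groups`, in the same order as the groups."""
    return [GROUP_COLORS[group] for group in groups]


# The overview chart shows every group ...
overview_groups = ["group A", "group B", "group C", "group D"]
# ... while a later chart compares only two of them, in a different order.
detail_groups = ["group C", "group A"]

print(colors_for(overview_groups))
print(colors_for(detail_groups))
```

    The returned list can be passed to any plotting call that accepts one color per element, for example the `color` argument of Matplotlib's `bar` or the `colors` argument of `pie`.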

    November 26, 2018 - 2 minute read -
    because you can colors data visualisation Data Visualization dataviz Israel redundancy blog
  • On the importance of perspective

    November 12, 2018

    Stalin was a relatively short man: his height was 1.65 m. Khrushchev was even shorter, at 1.60 m. It seems that the difference wasn’t enough for the official Soviet propaganda of that time. Take a look at this photo. We can clearly see that Stalin is taller than Khrushchev.

    stalin.png

    Do you notice something strange? Take a look at the windows in the background. I added horizontal and vertical guides for your convenience.

    Screen Shot 2018-11-05 at 8.38.08

    Now, look what happens when we fix the horizontal and vertical lines:

    Screen Shot 2018-11-05 at 8.39.03

    Now, Khrushchev is still shorter than Stalin but not by that much.

    November 12, 2018 - 1 minute read -
    khrushchev perspective photo photography stalin blog
  • Microtext Line Charts

    November 12, 2018

    Why add text labels to graph lines when you can build graph lines using text labels? On microtext lines.

    November 12, 2018 - 1 minute read -
    data visualisation Data Visualization dataviz microtext blog
  • How do you say “data visualization” in Hebrew?

    October 23, 2018

    This post was originally written in Hebrew, about a Hebrew issue.

    I teach data visualization at two colleges in Israel: the Azrieli College of Engineering in Jerusalem and the Holon Institute of Technology. When I wrote my first syllabus, I had to find a Hebrew term for data visualization, and I wrote “הדמיית נתונים”. Admittedly, it reminded me a little of a simulation process, but the other option I considered was “דימות”, and I knew that it was reserved for imaging, that is, the process of creating an image or a shape of an object, mainly in the medical world.

    I realized that the term was problematic in the first class I taught. It turned out that two of the four students who came to the class thought that the course “הדמיית נתונים in the research and development process” was about simulations.

    At some point, I heard from a friend of a friend that the correct term for visualization is הדמאה, but it sounded too pretentious to me, so I left “הדמיה” in the course name and added “data visualization” in parentheses.

    Today, three years after the first lecture I gave, and two days before the start of the next semester, I decided to google the answer (is there such a word in Hebrew? There is!). And what did I find? Issue no. 109 of the “למד לשונך” bulletin of the Academy of the Hebrew Language, published in 2015, states that the term for visualization is הַחְזָיָה. I don’t know about you, but I’m not crazy about החזיה. Another thing I’m not crazy about is that, as the example of החזיה, the Academy decided to use a pie chart with so many errors!

    Screen Shot 2018-10-23 at 20.35.52

    I think I’ll stay with הדמיה. The Hebrew Wiktionary (ויקימילון) allows me to.

    P.S. Did you notice that I used the Hebrew hyphen (maqaf) in this post? I really like the Hebrew hyphen.

    October 23, 2018 - 2 minute read -
    data visualisation Data Visualization dataviz hebrew הדמיה החזיה blog
  • Innumeracy

    October 22, 2018

    Innumeracy is “inability to deal comfortably with the fundamental notions of number and chance”.
    I wish there were a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews many important aspects of “numeracy”.

    October 22, 2018 - 1 minute read -
    blog
  • Working Remotely and the Virtue of Aggressive Transparency

    October 16, 2018

    Excellent post by my colleague Simon Ouderkirk on working in a distributed company. It’s a three-year-old post. I wonder how I missed it.

    October 16, 2018 - 1 minute read -
    blog
  • Data visualization in right-to-left languages

    October 15, 2018

    If you speak Arabic or Farsi, I need your help. If you don’t, share this post with someone who does.

    Right-to-left (RTL) languages such as Hebrew, Arabic, and Farsi are used by roughly 1.8 billion people around the world. Many of them consume data in their native languages. Nevertheless, I have never seen any research or study that explores data visualization in RTL languages. Until a couple of days ago, when I saw this interesting observation by Nick Doiron “Charts when you read right-to-left”.

    I teach data visualization in Israeli colleges. Whenever a student asks me RTL-related questions, I always answer something like “it’s complicated, let’s not deal with that”. Moreover, in the assignments, I even allow my students to submit graphs in English, even if they write the report in Hebrew.

    Nick’s post made me wonder about data visualization do’s and don’ts in RTL environments. Should Hebrew charts differ from Arabic or Farsi? What are the accepted practices?

    If you speak Arabic or Farsi, I need your help. If you don’t, share this post with someone who does. I want to collect as many examples of data visualization in RTL languages as possible. Links to research articles are more than welcome. You can leave your comments here or send them to boris@gorelik.net.

    Thank you.

    The image at the top of this post is a modified version of a graph that appears in the post that I cite. Unfortunately, I wasn’t able to find the original publication.

    October 15, 2018 - 2 minute read -
    arabic data visualisation Data Visualization dataviz farsi help RTL blog
  • A World Without the Number 6 — Math with Bad Drawings

    October 11, 2018

    What will happen if the number 6 disappears one day? Ben Orlin, the author of “Math with Bad Drawings”, elaborates on this interesting thought experiment in this 2017 post.

    October 11, 2018 - 1 minute read -
    math mathematics repost blog
  • Can error correction cause more error? (The answer is yes)

    October 9, 2018

    This is an interesting thought experiment. Suppose that you have a device whose output is normally distributed around the point you aim at, for example, a nerf gun. You aim and fire the gun. What happens if you miss by some amount X? Should you correct your aim in the opposite direction? My intuition says “yes.” So does the intuition of many other people with whom I talked about this problem. However, when we start thinking about the problem, we realize that the intuition is wrong. Since we aimed the gun, our assumption should be that the systematic deviation is zero. A single observation is not sufficient to reject this assumption. By continually adjusting the data generating process based on a single observation, we reduce the precision (increase the dispersion).
    Below is a simulation of adjusted and non-adjusted processes (the code is here). The broader spread of the adjusted data (blue line) is evident.

    Two curves. Blue: high dispersion of values when adjustments are performed after every observation. Orange: smaller dispersion when no adjustments are done.

    Due to the nature of the normal random variable, a single large accidental deviation can cause an extreme “correction,” which in turn will create a prolonged period of highly inaccurate points. This is precisely what you see in my simulation.
    The moral of this simple experiment is that you shouldn’t let a single observation affect your actions.
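
    For readers who want to try the experiment themselves, here is a minimal sketch. It is not the code linked above, and it assumes one specific adjustment rule, which may differ from the one used in the original simulation: after every shot, the aim is shifted by the full size of the observed miss, in the opposite direction. The `simulate` function and its parameters are hypothetical names chosen for this illustration.

```python
import random
import statistics


def simulate(n_shots, adjust, sigma=1.0, seed=0):
    """Return the landing points of `n_shots` shots aimed at a target at zero.

    If `adjust` is True, the aim is moved after every shot by the observed
    miss, in the opposite direction (an assumed adjustment rule).
    """
    rng = random.Random(seed)
    aim = 0.0
    hits = []
    for _ in range(n_shots):
        hit = aim + rng.gauss(0.0, sigma)  # normally distributed error
        hits.append(hit)
        if adjust:
            aim -= hit  # "correct" the aim using a single observation
    return hits


# The same seed gives both processes exactly the same random errors.
fixed = simulate(10_000, adjust=False)
adjusted = simulate(10_000, adjust=True)

spread_fixed = statistics.pstdev(fixed)
spread_adjusted = statistics.pstdev(adjusted)
print(f"no adjustment:   standard deviation = {spread_fixed:.2f}")
print(f"with adjustment: standard deviation = {spread_adjusted:.2f}")
```

    Under this particular rule, every shot carries its own error plus the negated error of the previous shot, so the variance doubles and the standard deviation grows by a factor of about 1.4. Other adjustment rules behave differently, but the direction of the effect is the same: the "corrected" process is noisier.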

    October 9, 2018 - 1 minute read -
    distribution statistics blog
  • Me

    October 1, 2018
    October 1, 2018 - 1 minute read -
    me blog
  • Conference Recap: EuroSciPy 2018 — Data for Breakfast

    September 20, 2018

    See my recap of the recent EuroSciPy, published on https://data.blog

    In which Boris Gorelik shares his favorite talks and workshops from EuroSciPy 2018.

    via Conference Recap: EuroSciPy 2018 — Data for Breakfast

    September 20, 2018 - 1 minute read -
    data visualisation Data Visualization dataviz euroscipy public speaking python blog

  • "Any questions?" How to fight the awkward silence at the end of a presentation?

    September 20, 2018

    If you ever gave or attended a presentation, you are familiar with this situation: the presenter asks whether there are any questions and … nobody asks anything. This is an awkward situation. Why aren’t there any questions? Is it because everything is clear? Not likely. Everything is never clear. Is it because nobody cares? Well, maybe. There are certainly many people that don’t care. It’s a fact of life. Study your audience, work hard to make the presentation relevant and exciting but still, some people won’t care. Deal with it.

    However, the bigger reasons for the lack of questions are human laziness and the fear of looking stupid. Nobody likes asking a question that someone will perceive as a stupid one. Some people don’t mind asking a question but are embarrassed and prefer not to be the first to break the silence.

    What can you do? Usually, I prepare one or two questions myself. Then, if nobody asks anything, I say something like “Some people, when they see these results, ask me whether it is possible to scale this method to larger sets.” Then, depending on how confident you are, you may provide the answer or ask “What do you think?”

    You can even prepare a slide that answers your question. In the screenshot below, you may see the slide deck of the presentation I gave in Trento. The blue slide at the end of the deck is the final slide, where I thank the audience for the attention and ask whether there are any questions.

    My plan was that if nobody asked me anything, I would say “Thank you again. If you want to learn more practical advice about data visualization, watch the recording of my tutorial, where I present this method <SLIDE TRANSFER, show the mockup of the “book”>. Also, many people ask me about reading suggestions; this is what I suggest you read: <SLIDE TRANSFER, show the reading pointers>”

    Screen Shot 2018-09-17 at 10.10.21

    Luckily for me, there were questions after my talk. Luckily, one of these questions was about practical advice so I had a perfect excuse to show the next, pre-prepared, slide. Watch this moment on YouTube here.

    September 20, 2018 - 2 minute read -
    data visualisation Data Visualization presentation presentation-tip presenting public speaking blog
  • Graphing Highly Skewed Data – Tom Hopper

    September 16, 2018

    My colleague, Charles Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes.

    The post author, Tom Hopper, experiments with different ways to deal with “Data Giraffes”. Some of his experiments are really interesting (such as splitting the graph area). In one experiment, Tom Hopper draws a bar chart on a log scale. Doing so is considered bad practice: the value (Y) axis of a bar chart must include a meaningful zero, which a log scale, by definition, cannot have.

    Other than that, a good read: Graphing Highly Skewed Data – Tom Hopper
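
    The problem with bars on a log scale can be shown with a few lines of arithmetic. The sketch below uses made-up values that are not taken from Tom Hopper's post, and `bar_length` and `length_ratio` are hypothetical helpers. On a log axis, a bar has to start at some positive baseline, so the visual ratio between two bars depends on that arbitrary choice and not only on the data.

```python
import math


def bar_length(value, baseline):
    """Length of a bar drawn from `baseline` up to `value` on a log10 axis."""
    return math.log10(value) - math.log10(baseline)


def length_ratio(small, large, baseline):
    """How many times longer the bar for `large` looks than the bar for `small`."""
    return bar_length(large, baseline) / bar_length(small, baseline)


small, large = 10, 1000  # the real ratio between the two values is 100

for baseline in (1, 0.1, 0.001):
    ratio = length_ratio(small, large, baseline)
    print(f"axis starts at {baseline}: the larger bar looks {ratio:.1f}x longer")

# Zero cannot be placed on a log axis at all: log10(0) is undefined.
try:
    math.log10(0)
except ValueError as error:
    print("log10(0) fails:", error)
```

    The same two values look 3, 2, or 1.5 times apart depending on where the axis starts, while the real ratio is 100.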

    September 16, 2018 - 1 minute read -
    bar plot data data visualisation Data Visualization dataviz blog
  • On privacy, security, and irony

    September 9, 2018

    About a week ago, I met Justin Mayer and had a really interesting chat with him about internet privacy. Today, his 30-minute talk on that subject appeared in my YouTube suggestion list.

    https://www.youtube.com/watch?v=2rrP_aW-jNA

    How ironic. The talk, by the way, is very interesting.

    September 9, 2018 - 1 minute read -
    irony privacy security blog
  • Back to Mississippi: Black migration in the 21st century. By Charles Earl

    September 4, 2018

    I wonder how this analysis remained unnoticed by social media.

    The recent election of Doug Jones […] got me thinking: What if the Black populations of Southern cities were to experience a dramatic increase? How many other elections would be impacted?

    via Back to Mississippi: Black migration in the 21st century — Charlescearl’s Weblog

    September 4, 2018 - 1 minute read -
    data-journalism data science race blog
  • Please leave a comment to this post

    September 3, 2018

    Please leave a comment to this post. It doesn’t matter what. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.

    Image: close-up of text. Photo by Pixabay on Pexels.com

    September 3, 2018 - 1 minute read -
    перекличка feedback blog
  • 16-days-work-month — The joys of the Hebrew calendar

    September 2, 2018

    Tishrei is the seventh (*) month of the Hebrew calendar that starts with Rosh-HaShana, the Hebrew New Year. It is a 30-day month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), and the first and last days of Sukkot (Feast of Tabernacles) (**). All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the days between the first and the last days of Sukkot are mostly considered half working days. Also, the children are at home, since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.

    I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 2008 and 2023 CE, and this is what we get:

    Dynamics of the number of working days in Tishrei over the years. The average fluctuation is around 16 days

    Overall, this period consists of between 15 and 17 non-working days in a single month (31 days, mind you). This is what the working/non-working time during this month looks like:

    tishrei_2018_calendar

    Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to a constantly interrupted workday, but on a different scale.

    So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

    (*) The New Year starts in the seventh month? I know this is confusing. That’s because we count Nissan, the month of the Exodus from Egypt, as the first month.
    (**) If you are an observant Jew, you should add the Fast of Gedalia to this list, but we will omit it from this discussion.

    September 2, 2018 - 2 minute read -
    holidays Israel RoshHaShana blog
  • Sometimes, less is better than more

    August 30, 2018

    Today, during the EuroSciPy conference, I gave a presentation titled “Three most common mistakes in data visualization and how to avoid them”. The title of this presentation is identical to the title of the presentation that I gave in Barcelona earlier this year. The original presentation was approximately one and a half hours long. I knew that EuroSciPy presentations were expected to be shorter, so I was prepared to shorten my talk to half an hour. At some point, a couple of days before departing to Trento, I realized that I was only allocated 15 minutes. Fifteen minutes! Instead of ninety.

    Frankly speaking, I was in a panic. I even considered contacting the EuroSciPy organizers and asking them to remove my talk from the program. But I was too embarrassed, so I decided to take the risk and started throwing slides away. Overall, I think that I spent eight to ten working hours shortening my presentation. Today, I finally presented it. Based on the result, and on the feedback that I got from the conference audience, I now know that the 15-minute version is better than the original, longer one. A video recording of my talk is available on YouTube and is embedded below. Below is my slide deck.

    [slideshare id=112261825&doc=20180830abcthreemostcommonmistakescopy-180830134825]

    Illustration image credit: Photo by Jo Szczepanska on Unsplash

    August 30, 2018 - 1 minute read -
    data visualisation Data Visualization dataviz euroscipy presentation blog
  • An even better data visualization workshop

    August 30, 2018

    Boris Gorelik teaching in front of an audience.

    Yesterday, I gave a data visualization workshop at EuroSciPy 2018 in Trento. I spent HOURS building and improving it. I even developed a “simple to use, easy to follow, never failing formula” for the data visualization process (I’ll write about it later).

    I enjoyed this workshop so much, both preparing it and (even more so) delivering it. There were so many useful questions and remarks. The most important remark was made by Gael Varoquaux, who pointed out that one of my examples was suboptimal for visually impaired people. The embarrassing part is that one of the last lectures that I gave in my college data visualization course was about visual communication for the visually impaired. That is why the first thing I did when I came to my hotel after the workshop was to fix the error. You may find all the (corrected) material I used in this workshop on GitHub. Below is the video of the workshop, in case you want to follow it.

    https://www.youtube.com/watch?v=H-K_fSA54AM

    Photo credit: picture of me delivering the workshop is by Margriet Groenendijk

    August 30, 2018 - 1 minute read -
    data visualisation Data Visualization dataviz euroscipy teaching workshop blog
  • Meet me at EuroSciPy 2018

    August 26, 2018

    I am excited to run a data visualization tutorial, and to give a data visualization talk during the 2018 EuroSciPy meeting in Trento, Italy.

    My tutorial “Data visualization – from default and suboptimal to efficient and awesome” will take place on Aug 29 at 14:00. This is a two-hour tutorial during which I will cover two or three examples. I will start with the default Matplotlib graph and modify it step by step to make a beautiful aid in technical communication. I will publish the tutorial notebooks immediately after the conference.

    My talk “Three most common mistakes in data visualization” will be similar in nature to the one I gave in Barcelona this March, but more condensed and enriched with information I learned since then.

    If you plan to attend EuroSciPy and want to chat with me about data science, data visualization, or remote working, write a message to boris@gorelik.net.

    The full conference program is available here.

    August 26, 2018 - 1 minute read -
    conference data visualisation Data Visualization dataviz euroscipy scipy talking blog