Three most common mistakes in data visualization

People ask me for good intro video to data visualization. I tend to ask them to look for one of my lectures. To save the search, here’s one of the most relevant talks that I gave This lecture was a part of 2018 EuroScipy conference, where I also ran a workshop.

Not a wasted time

Being a freelancer data scientist, I get to talk to people about proposals that don’t materialize into projects.

Which coffee is this?

Gilad Almosnino is an internationalization expert. I’m reading his post “Eight emojis that will create a more inclusive experience for Middle Eastern markets,” in which he mentions “Turkish or Arabic Coffee,” which reminded me of my last visit to Athens. When, in one restaurant, I asked for a Turkish coffee, the waiter looked at me harshly and Continue reading Which coffee is this?

Further Research is Needed

Do you believe in telepathy? Yesterday, I submitted final proofs of a paper in which I actively participated. During the proofreading, I noticed that our abstract ends with “further research is needed” and scratched my head. I submitted the proofs and then then, I saw this pearl in my blog feed

Book review: Great mental models by Shane Parrish

TL;DR shallow and disappointing The Great Mental Models by Shane Parrish was highly praised by Automattic’s CEO Matt Mullenweg. Since I appreciate Matt’s opinion a lot, I decided to buy the book. I read it and was disappointed. This book is very ambitious but yet shallow and non-engaging. If you consider reading a book on Continue reading Book review: Great mental models by Shane Parrish

Does Zipf’s Law Apply to Alzheimer’s Patients?

Originally posted on Akshay Budhkar:
? Introduction I was fascinated by Zipf’s Law when I came across it on a VSauce video. It is an empirical law that states that the frequency of occurrence of a word in a large text corpus is inversely proportional to its rank in its frequency table. The frequency distribution…

The tombs of the righteous

Some people, in face of important changes visit tombs of the righteous for a blessing. I went to see WEIZAC — Israel’s first computer (and one of the first ones in the world) that was built in 1955.

New year, new notebook

On November 7, 2016, I started an experiment in personal productivity. I decided to use a notebook for thirty days to manage all of my tasks. The thirty days ended more than three years ago, and I still use notebooks to manage myself.

Is security through obscurity back?

Yes, ML transparency opens opportunities for hacking and abuse. However, this is EXACTLY the reason why such openness is needed. Hacking attempts will not disappear with transparency removal; they will be harder to defend.

Book review. A Short History of Nearly Everything by Bill Bryson

TL;DR: a nice popular science book that covers many aspects of the modern science A Short History of Nearly Everything by Bill Bryson is a popular science book. I didn’t learn anything fundamental out of this book, but it was worth reading. I was particularly impressed by the intrigues, lies, and manipulations behind so many Continue reading Book review. A Short History of Nearly Everything by Bill Bryson

Cow shit, virtual patient, big data, and the future of the human species

Yesterday, a new episode was published in the Popcorn podcast, where the host, Lior Frenkel, interviewed me. Everyone who knows me knows how much I love talking about myself and what I do. I definitely used this opportunity to talk about the world of data. Some people who listened to this episode told me that Continue reading Cow shit, virtual patient, big data, and the future of the human species

Illustration: a bunch of measurement tapes

The problem with citation count as an impact metric

Inspired by A citation is not a citation is not a citation by Lior Patcher, this rant is about metrics. Lior Patcher is a researcher in Caltech. As many other researchers in the academy, Dr. Patcher is measured by, among other things, publications and their impact as measured by citations. In his post, Lior Patcher criticised both the Continue reading The problem with citation count as an impact metric

Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

On Sunday, I wrote about bootstrapping. On Monday, I wrote about visualization uncertainty. Let’s now talk about bootstrapping and uncertainty visualization. Robert Grant is a data visualization expert who wrote a book about interactive data visualization (which I should read, BTW). Robert runs an interesting blog from which I learned another approach to uncertainty visualization, Continue reading Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

On MOOCs

When Massive Online Open Courses (a.k.a MOOCs) emerged some X years ago, I was ecstatic. I was sure that MOOCs were the Big Boom of higher education. Unfortunately, the MOOC impact turned out to be very modest. This modest impact, combined with the high production cost was one of the reasons I quit making my Continue reading On MOOCs

You don’t need a fast way to increase your reading speed by 25%. Or, don’t suppress subvocalization

Not long ago, I wrote a post about a fast hack that increased my reading speed by tracking the reading with a finger. I think that the logic behind using a tracking finger is to suppress subvocalization. I noticed that, at least in my case, suppressing subvocalization reduces the fun of reading. I actually enjoy Continue reading You don’t need a fast way to increase your reading speed by 25%. Or, don’t suppress subvocalization

Bootstrapping the right way?

Originally posted on Yanir Seroussi:
Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls,…

How do I look like?

From time to time, people (mostly conference organizers) ask for a picture of mine. Feel free using any of these images

Visualizations with perceptual free-rides

Originally posted on richardbrath:
We create visualizations to aid viewers in making visual inferences. Different visualizations are suited to different inferences. Some visualizations offer more additional perceptual inferences over comparable visualizations. That is, the specific configuration enables additional inferences to be observed directly, without additional cognitive load. (e.g. see Gem Stapleton et al, Effective Representation…

Book review. Indistractable by Nir Eyal

Nir Eyal is known for his book “Hooked” in which he teaches how to create addictive products. In his new book “Indistractable“, Nir teaches how to live in the world full of addictive products. The book itself isn’t bad. It provides interesting information and, more importantly, practical tips and action items. Nir covers topics such Continue reading Book review. Indistractable by Nir Eyal

My blog in Hebrew

As much as I love thinking that I live in a global world, most people whom I know speak Hebrew. From time to time, someone would tell me “nice post, but why not in Hebrew?”. So, from now on, I will try to translate all my new posts to Hebrew. I will try. Not promising Continue reading My blog in Hebrew

Pseudochart. It’s like a pseudocode but for charts

Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm. People write pseudocode to isolate the “bigger picture” of an algorithm. Pseudocode doesn’t care about the particular implementation details that are secondary to the problem, such as memory management, dealing with different encoding, etc. Writing out the pseudocode Continue reading Pseudochart. It’s like a pseudocode but for charts

Illustration. The word "feedback" written with white chalk on a black board

Please leave a comment to this post

Please leave a comment to this post. It doesn’t matter what, it can be a simple Hi or an interesting link. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.

Word Sequentialization

Originally posted on Memos:
In some ways, “data visualization” is a terrible term. It seems to reduce the construction of good charts to a mechanical procedure. It evokes the tools and methodology required to create rather than the creation itself. It’s like calling Moby-Dick a “word sequentialization” or The Starry Night a “pigment distribution.” It also reflects an ongoing…

Illustration: people work on computers

An interesting way to beat procrastination when working from home

Working from home (or a coffee shop, or a library) is great. However, there is one tiny problem: the temptation not to work is sometimes much bigger than the temptation in a traditional office. In the traditional office you are expected to look busy which is the first step to do an actual work. When Continue reading An interesting way to beat procrastination when working from home

To specialize, or not to specialize, that is the data scientists’ question

In my last post on data science career, I heavily promoted the idea that a data scientist needs to find his or her specialization. I back my opinion with my experience and by citing other people opinions. However, keep in mind that I am not a career advisor, I never surveyed the job market, and Continue reading To specialize, or not to specialize, that is the data scientists’ question

Illustration. The word "feedback" written with white chalk on a black board

Please leave a comment to this post

Please leave a comment to this post. It doesn’t matter what, it can be a simple Hi or an interesting link. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.

בניית אתרים עם תמיכה בארץ

מדי פעם אנשים ששומעים שאני עובד בחברה שמפעילה את וורדפרקס.קום מבקשים ממני עזרה אם בניית האתר שלהם. אני חוקר נתונים, לא בונה אתרים. ברור שהחברה בה אני עובד עושה המון מאמצים כדי לאפשר לאנשים לבנות אתרים בעצמם, אבל לפעםמים אנשים צריכים להאציל את הסמכות הזאת למומחים, רוצים גמישות ושליטה וגם תמיכה. אני מכיר אישית את Continue reading בניית אתרים עם תמיכה בארץ

Hackers beware: Bootstrap sampling may be harmful

Originally posted on Yanir Seroussi:
Bootstrap sampling techniques are very appealing, as they don’t require knowing much about statistics and opaque formulas. Instead, all one needs to do is resample the given data many times, and calculate the desired statistics. Therefore, bootstrapping has been promoted as an easy way of modelling uncertainty to hackers who…

Screenshot that says that I have 100 followers

I have 101 followers!

Yesterday, the follower list of my blog exceeded one hundred followers! Even though I know that some of these followers are bots, this number makes me happy! Thank you all (humans and bots) for clicking the “follow” button.

Against A/B tests

Traditional A/B testsing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherentlyinferior to choosing a targeted mix of A and B. Michael Kaminsky locallyoptimistic.com The quote above is from a post by Michael Kaminsky Continue reading Against A/B tests

Links Worth Sharing: What Makes People Successful

Originally posted on Data for Breakfast:
Boris Gorelik The renown network scientist, Albert-László Barabási, has been applying scientific methods to study the factors that make people successful. Science has published an intriguing paper called Quantifying reputation and success in art written by Prof. Barabási and his collaborators. Prof. Barabási talks about the findings of his…

Useful redundancy — when using colors is not completely useless

The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.

Microtext Line Charts

Originally posted on richardbrath:
Tangled Lines Line charts are a staple of data visualization. They’ve existed at least since William Playfair and possibly earlier. Like many charts, they can be very powerful and also have their limitations. One limitation is the number of lines that can be displayed. One line works well: you can see trend,…

איך אומרים דאטה ויזואליזיישן בעברית?

This post is written in Hebrew about a Hebrew issue. I won’t translate it to English. אני מלמד data visualization בשתי מכללות בישראל — במכללת עזריאלי להנדסה בירושלים ובמכון הטכנולוגי בחולון. כשכתבתי את הסילבוס הראשון שלי הייתי צריך למצוא מונח ל־data visualization וכתבתי “הדמיית נתונים״ אומנם זה הזכיר לי קצת תהליך של סימולציה, אבל האופציה Continue reading איך אומרים דאטה ויזואליזיישן בעברית?

Innumeracy

Originally posted on Data, Analytics and beyond:
Tom Breur 21 October 2018 It has long been known that the general public is sometimes remarkably out of tune with math and numbers. In 1988 mathematician John Allan Paulos wrote a classic “Innumeracy” that is chockful of striking examples of misinterpretation of numeric evidence. Paulos refers to…

Questions?

“Any questions?” How to fight the awkward silence at the end of a presentation?

If you ever gave or attended a presentation, you are familiar with this situation: the presenter asks whether there are any questions and … nobody asks anything. This is an awkward situation. Why aren’t there any questions? Is it because everything is clear? Not likely. Everything is never clear. Is it because nobody cares? Well, Continue reading “Any questions?” How to fight the awkward silence at the end of a presentation?

screenshot of three graphs: two bar plots and one dot plot with a split graph area

Graphing Highly Skewed Data – Tom Hopper

My colleague, Chares Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes. The post author, Tom Hopper, experiments with different ways to deal with “Data Giraffes”. Some of his experiments are really interesting (such as splitting the graph area). In one experiment, Tom Hopper draws Continue reading Graphing Highly Skewed Data – Tom Hopper

On privacy, security, and irony

About a week ago, I met Justin Mayer and had a really interesting chat with him about internet privacy. Today, his 30-minutes talk on that subject appeared in my youtube suggestion list   How ironic. The talk, by the way, is very interesting.    

Back to Mississippi: Black migration in the 21st century. By Charles Earl

I wonder how this analysis of remained unnoticed by the social media The recent election of Doug Jones […] got me thinking: What if the Black populations of Southern cities were to experience a dramatic increase? How many other elections would be impacted? via Back to Mississippi: Black migration in the 21st century — Charlescearl’s Continue reading Back to Mississippi: Black migration in the 21st century. By Charles Earl

16-days-work-month — The joys of the Hebrew calendar

Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana — the Hebrew New Year*. It is a 30 days month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days Continue reading 16-days-work-month — The joys of the Hebrew calendar

Value-Suppressing Uncertainty Palette

Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

Uncertainty is one of the most neglected aspects of number-based communication and one of the most important concepts in general numeracy. Comprehending uncertainty is hard. Visualizing it is, apparently, even harder. Last week I read a paper called Value-Suppressing Uncertainty Palettes, by M.Correll, D. Moritz, and J. Heer from the Data visualization and interactive analysis research at the Continue reading Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

Screenshot showing two slides. The first one is titled "low within-group variability". The second one is titled "High between-group variability". The graphs in the slides is the same

Evolution of a complex graph. Part 1. What do you want to say?

From time to time, people ask me for help with non-trivial data visualization tasks. A couple of weeks ago, a friend-of-a-friend-of-a-friend showed me a set of graphs with the following note: Each row is a different use case. Each use case was tested on three separate occasions – columns 1,2,3. We hope to show that Continue reading Evolution of a complex graph. Part 1. What do you want to say?