On data beauty and communication style

There’s an interesting mini-drama going on in the data visualization world. The moderators of DataIsBeautiful invited Stephen Few for an ask-me-anything (AMA) session. Stephen Few is a data visualization researcher and an opinionated blogger. I use his book “Show Me the Numbers” when I teach data visualization. Both in his book and, even more so, on his blog, Dr. Few is not afraid to criticize practices that fail to meet his standards of quality. That is why I wasn’t surprised when I read Stephen Few’s public response to the AMA invitation:

I stridently object to the work of lazy, unskilled creators of meaningless, difficult to read, or misleading data displays. … Many data visualizations that are labeled “beautiful” are anything but. Instead, they pander to the base interests of those who seek superficial, effortless pleasure rather than understanding, which always involves effort.

This response triggered some backlash. Randal Olson (a prominent data scientist and blogger), for example, called his response “petty”.

I have to respectfully disagree with Randy. Don’t get me wrong. Stephen Few’s response style is indeed harsh. However, I have to agree with him. Many (although not all) of the data visualizations that I saw on DataIsBeautiful look like data visualization for the sake of data visualization. They are, basically, collections of lines and colors that demonstrate cool features of plotting libraries but do not provide any insight or tell any (data-based) story. From time to time, we see pieces of “data art,” in which the data plays a secondary role, and which have nothing to do with “data visualization,” where the data is “king.” I don’t consider myself an artistic person, but I don’t appreciate the “art” part of most of the data art pieces I see.

So, I do understand Stephen Few’s criticism. What I don’t understand is why he decided to pass up the opportunity to preach to the best target audience he can hope for. It seems to me that if you don’t like someone’s actions and they ask you for advice, you should be eager to give it to them, and certainly not attack them. Hillel, an ancient Jewish scholar, said:

He who is bashful can’t learn, and he who is harsh can’t teach

Although I don’t have a fraction of the teaching experience that Dr. Few has, I’m sure he would’ve achieved better results had he chosen to accept that invitation.

Disclaimer: Stephen Few was very generous to allow me to use the illustrations from his book in my teaching.

On procrastination

Cork Board with multiple post-it notes saying "Do it!"

I don’t know anyone, except my wife, who doesn’t consider themselves a procrastinator. I procrastinate a lot. Sometimes, when procrastinating, I read about procrastination. Here’s a list of several recent blog posts about this topic. Read these posts if you have something more important to do*.


An Ode to the Deadlines competes with An Ode to Procrastination.

I’ll Think of a Title Tomorrow talks about procrastination from a designer’s point of view. Although it is full of known truths, such as “stop thinking, start doing”, “fear is the mind killer”, and others, it is nevertheless a refreshing read.

The entire blog called Unblock Results is written by Nancy Linnerooth, who seems to position herself as a productivity coach. I liked her last post, The Done List, which talks about a nice psychological trick of keeping Done lists instead of Todo lists. This trick plays well with the productivity system that I use in my everyday life. One day, I might describe my system in this blog.

We all know that reading can sometimes be hard. Thus, let me suggest a TED talk titled Inside the mind of a master procrastinator. You’ll be able to enjoy it with minimal mental effort.

*The pun is intended
Featured image is by Flickr user Vic under CC-by-2.0 (cropped)
The graffiti image is by Flickr user katphotos under CC-by-nc-nd


Fashion, data, science

Zalando is an e-commerce company that sells shoes, clothing and other fashion items. Zalando isn’t a small company. According to Wikipedia, its 2015 revenue was almost 3 billion euros. As you might imagine, you don’t run this kind of business without proper data analysis. Recently, Thorsten Dietzsch, a product manager for personalization at Zalando, joined our team meeting to tell us how data science works at Zalando. It was an interesting conversation, which is now publicly available online.

In the first of our Data Speaker Series posts, Thorsten Dietzsch shares how data products are managed at Zalando, a fashion ecommerce company.

via Data Speaker Series: Thorsten Dietzsch on Building Data Products at Zalando — Data for Breakfast

Featured image: By Flickr user sweetjessie from here. Under the CC BY-NC 2.0 license

Heave-ho! How not to abandon your blog

Sad as it is, most beginning bloggers abandon their blog soon after starting it. What distinguishes successful (persistent?) bloggers from those who fail to hold on? Is it worth running group blogs, and if so, how important is the distribution of work among the authors?
In this talk, we will try to shed light on these questions by analyzing the behavior of more than five million WordPress.com users.

The presentation slides are available here.

This link leads to the post in English that I wrote when I first published the results of this study.

This Week in Data Reading

Data for Breakfast

This week, Sirin, Boris, and Demet have some recommended reading for you in the fields of descriptive data analysis, machine learning, and ethics in artificial intelligence. Have you recently read anything thought-provoking in the field of data science? Written anything thought-provoking? Be sure to comment and share your recommendations with us.

Sirin Odrowski

Seth Stephens-Davidowitz studies publicly available, anonymous Google Search data. His work reveals prejudices and sheds light on aspects of demography that are hard to tackle with surveys. It’s a long, yet captivating read and a great example of data storytelling that shows how insightful descriptive data analysis can be. It’s also deeply infuriating because, among other things, his work implies that open racism and biases against girls are widespread.

Boris Gorelik

The post “Supervised learning is great — it’s data collection that’s broken” talks about the pain common to many machine learning practitioners, namely, obtaining…


Avoiding being a ‘trophy’ data scientist

In this post, Peadar Coyle lists several anti-patterns in running a data science team. It is an excellent read (and a blog worth following).

Models are illuminating and wrong

Recently I’ve been speaking to a number of data scientists about the challenges of adding value to companies. This isn’t an argument that data science doesn’t have positive ROI, but that there needs to be an understanding of the ‘team sport’ and organisational maturity to take advantage of these skills.

The biggest anti-pattern I’ve experienced personally as an individual contributor has been a lack of ‘leadership’ for data science. I’ve seen organisations without the budgetary support, the right champions or clear alignment of data science with their organisational goals. These are some of the anti-patterns I’ve seen; the list is non-exhaustive.

The following is an opinionated list of some of the anti-patterns.

  1. I’ve written before about data strategy. I still think this is one of the things that’s most lacking in organisations. I think a welcome distinction is that data collection which needs to happen before data…


A successful failure

Wall graffiti with text "failure is cool"

Almost half a year ago, I decided to create an online data visualization course. After investing hundreds of hours, I managed to release the first lecture and record another one. However, I decided not to publish new lectures and to remove the existing one from the net. Why? The short answer is a huge cost-to-benefit ratio. For a longer answer, you will have to keep reading this post.

Why create a course?

It’s not that there are no good courses. There are. However, most of them are tightly coupled with one tool or another. Moreover, many of the courses I have reviewed online act as advanced tutorials for a certain plotting tool. The course that I wanted to create was supposed to be tool-neutral, full of theoretical knowledge and bits of practical advice. Another decision that I made was not to write a set of text files (online book, set of Jupyter notebooks, whatever) but to create a course in which the majority of the knowledge is brought to the audience by means of frontal video lectures. I assumed that this kind of format would be the easiest for the audience to consume.

What went wrong?

So, what went wrong? First of all, you should remember that I work full time at Automattic, which means that every side project is a … side project that I have to do during my free time. I realized that from the very beginning. However, since I already teach data visualization in different institutions in Israel, I already have a well-formed syllabus with accompanying slide decks full of examples. I assumed that it would take me no more than one hour to prepare every online lecture.

Green screen and a camera in a typical green room setup
Green room. All my friends were very impressed to see it

So, instead of verifying this assumption, I started solving the technical problems, such as buying a nice microphone (which turned out to be crap) and tripods, building a green room in my home office, etc. Once I was satisfied with my technical setup, I decided to record a promo video. Here, I faced a big problem. You see, talking to people and talking to the camera are completely different things. I feel pretty comfortable talking to people, but when I face the camera, I almost freeze. Also, in person-to-person communication, we are somewhat tolerant of small stuttering and longish pauses. However, when watching recorded video clips, we expect television-quality narration. It turns out that achieving this kind of narration is very hard. Add the fact that English is my third language, and you get a huge time drain. To be able to record a two-minute promo video, I had to write the entire script, rehearse it a dozen times, and record it in front of a teleprompter. The filming session alone took around half an hour, as I had to repeat almost every line, time after time.

Screenshot of my YouTube video with 18 views

Preparing slide decks for the lectures wasn’t an easy task either. Despite the fact that I had pretty good slide decks, I realized that they were good only for an in-class lecture, where I can point to the screen, go back and forth within a presentation, open external URLs, etc. Once I had my slide decks ready, I faced the narration problem once again. So, I had to write the entire lesson’s script, edit it, rehearse for several days, and shoot. At this point, I became frustrated. I might have been more motivated had my first video received some real traffic. However, with 18 (that’s eighteen) views, most of which lasted no more than a minute or two, I hardly felt like a YouTube superstar. I know that it’s impossible to get real traction in such a short period without massive promotion. However, after I completed shooting the second lecture, I realized that I would not be able to do it much longer. Not without quitting my day job. So, I decided to quit.

What now?

Since I already have pretty good texts for the first two lectures, I might be able to convert them to posts in this blog. I also have material for some before-and-after videos that I planned to have as a part of this course. I will convert them to posts, too, similar to this post on the data.blog.

Was it worth it?

It certainly was! During the preparations, I learned a lot. I learned new things about data visualization. I took a glimpse into the world of video production. I had a chance to restructure several of my presentations.

Featured image for this post by Nicolas Nova under the CC-by license.