The tombs of the righteous

Some people, in the face of important changes, visit tombs of the righteous for a blessing. I went to see WEIZAC, Israel’s first computer (and one of the first in the world), which was built in 1955.

New year, new notebook

On November 7, 2016, I started an experiment in personal productivity. I decided to use a notebook for thirty days to manage all of my tasks. The thirty days ended more than three years ago, and I still use notebooks to manage myself.

Is security through obscurity back?

Yes, ML transparency opens opportunities for hacking and abuse. However, this is EXACTLY the reason why such openness is needed. Hacking attempts will not disappear if transparency is removed; they will only be harder to defend against.

Book review. A Short History of Nearly Everything by Bill Bryson

TL;DR: a nice popular science book that covers many aspects of modern science. A Short History of Nearly Everything by Bill Bryson is a popular science book. I didn’t learn anything fundamental from this book, but it was worth reading. I was particularly impressed by the intrigues, lies, and manipulations behind so many Continue reading Book review. A Short History of Nearly Everything by Bill Bryson

Cow shit, virtual patient, big data, and the future of the human species

Yesterday, a new episode was published in the Popcorn podcast, where the host, Lior Frenkel, interviewed me. Everyone who knows me knows how much I love talking about myself and what I do. I definitely used this opportunity to talk about the world of data. Some people who listened to this episode told me that Continue reading Cow shit, virtual patient, big data, and the future of the human species

Illustration: a bunch of measurement tapes

The problem with citation count as an impact metric

Inspired by A citation is not a citation is not a citation by Lior Pachter, this rant is about metrics. Lior Pachter is a researcher at Caltech. Like many other researchers in academia, Dr. Pachter is measured by, among other things, publications and their impact as measured by citations. In his post, Lior Pachter criticised both the Continue reading The problem with citation count as an impact metric

Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

On Sunday, I wrote about bootstrapping. On Monday, I wrote about visualization uncertainty. Let’s now talk about bootstrapping and uncertainty visualization. Robert Grant is a data visualization expert who wrote a book about interactive data visualization (which I should read, BTW). Robert runs an interesting blog from which I learned another approach to uncertainty visualization, Continue reading Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz – Stats – Bayes

On MOOCs

When Massive Open Online Courses (a.k.a. MOOCs) emerged some X years ago, I was ecstatic. I was sure that MOOCs were the Big Boom of higher education. Unfortunately, the MOOC impact turned out to be very modest. This modest impact, combined with the high production cost, was one of the reasons I quit making my Continue reading On MOOCs

You don’t need a fast way to increase your reading speed by 25%. Or, don’t suppress subvocalization

Not long ago, I wrote a post about a fast hack that increased my reading speed by tracking the reading with a finger. I think that the logic behind using a tracking finger is to suppress subvocalization. I noticed that, at least in my case, suppressing subvocalization reduces the fun of reading. I actually enjoy Continue reading You don’t need a fast way to increase your reading speed by 25%. Or, don’t suppress subvocalization

Bootstrapping the right way?

Originally posted on Yanir Seroussi:
Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls,…

What do I look like?

From time to time, people (mostly conference organizers) ask for a picture of me. Feel free to use any of these images.

Visualizations with perceptual free-rides

Originally posted on richardbrath:
We create visualizations to aid viewers in making visual inferences. Different visualizations are suited to different inferences. Some visualizations offer more additional perceptual inferences over comparable visualizations. That is, the specific configuration enables additional inferences to be observed directly, without additional cognitive load. (e.g. see Gem Stapleton et al, Effective Representation…

Book review. Indistractable by Nir Eyal

Nir Eyal is known for his book “Hooked”, in which he teaches how to create addictive products. In his new book, “Indistractable”, Nir teaches how to live in a world full of addictive products. The book itself isn’t bad. It provides interesting information and, more importantly, practical tips and action items. Nir covers topics such Continue reading Book review. Indistractable by Nir Eyal

My blog in Hebrew

As much as I love thinking that I live in a global world, most people whom I know speak Hebrew. From time to time, someone would tell me “nice post, but why not in Hebrew?”. So, from now on, I will try to translate all my new posts to Hebrew. I will try. Not promising Continue reading My blog in Hebrew

Pseudochart. It’s like a pseudocode but for charts

Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm. People write pseudocode to isolate the “bigger picture” of an algorithm. Pseudocode doesn’t care about the particular implementation details that are secondary to the problem, such as memory management, dealing with different encoding, etc. Writing out the pseudocode Continue reading Pseudochart. It’s like a pseudocode but for charts

Illustration. The word "feedback" written with white chalk on a black board

Please leave a comment to this post

Please leave a comment to this post. It doesn’t matter what, it can be a simple Hi or an interesting link. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.

Word Sequentialization

Originally posted on martin remy:
In some ways, “data visualization” is a terrible term. It seems to reduce the construction of good charts to a mechanical procedure. It evokes the tools and methodology required to create rather than the creation itself. It’s like calling Moby-Dick a “word sequentialization” or The Starry Night a “pigment distribution.” It also reflects an…

Illustration: people work on computers

An interesting way to beat procrastination when working from home

Working from home (or a coffee shop, or a library) is great. However, there is one tiny problem: the temptation not to work is sometimes much bigger than the temptation in a traditional office. In a traditional office, you are expected to look busy, which is the first step toward doing actual work. When Continue reading An interesting way to beat procrastination when working from home

To specialize, or not to specialize, that is the data scientists’ question

In my last post on data science careers, I heavily promoted the idea that a data scientist needs to find his or her specialization. I backed my opinion with my experience and by citing other people’s opinions. However, keep in mind that I am not a career advisor, I never surveyed the job market, and Continue reading To specialize, or not to specialize, that is the data scientists’ question

Building websites with support in Israel

From time to time, people who hear that I work at the company that runs WordPress.com ask me for help with building their website. I am a data scientist, not a website builder. Of course, the company I work for makes a lot of effort to enable people to build websites on their own, but sometimes people need to delegate this job to experts, and want flexibility and control as well as support. I personally know Continue reading Building websites with support in Israel

Hackers beware: Bootstrap sampling may be harmful

Originally posted on Yanir Seroussi:
Bootstrap sampling techniques are very appealing, as they don’t require knowing much about statistics and opaque formulas. Instead, all one needs to do is resample the given data many times, and calculate the desired statistics. Therefore, bootstrapping has been promoted as an easy way of modelling uncertainty to hackers who…

Screenshot that says that I have 100 followers

I have 101 followers!

Yesterday, the follower list of my blog exceeded one hundred followers! Even though I know that some of these followers are bots, this number makes me happy! Thank you all (humans and bots) for clicking the “follow” button.

Against A/B tests

Traditional A/B testing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherently inferior to choosing a targeted mix of A and B. (Michael Kaminsky, locallyoptimistic.com) The quote above is from a post by Michael Kaminsky Continue reading Against A/B tests

Links Worth Sharing: What Makes People Successful

Originally posted on Data for Breakfast:
Boris Gorelik. The renowned network scientist, Albert-László Barabási, has been applying scientific methods to study the factors that make people successful. Science has published an intriguing paper called Quantifying reputation and success in art written by Prof. Barabási and his collaborators. Prof. Barabási talks about the findings of his…

Useful redundancy — when using colors is not completely useless

The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.

Microtext Line Charts

Originally posted on richardbrath:
Tangled Lines Line charts are a staple of data visualization. They’ve existed at least since William Playfair and possibly earlier. Like many charts, they can be very powerful and also have their limitations. One limitation is the number of lines that can be displayed. One line works well: you can see trend,…

How do you say “data visualization” in Hebrew?

This post is written in Hebrew about a Hebrew issue. I won’t translate it to English. I teach data visualization at two colleges in Israel: the Azrieli College of Engineering in Jerusalem and the Holon Institute of Technology. When I wrote my first syllabus, I had to find a term for data visualization, and I wrote “הדמיית נתונים”. Admittedly, it reminded me a bit of a simulation process, but the option Continue reading How do you say “data visualization” in Hebrew?

Innumeracy

Originally posted on Data, Analytics and beyond:
Tom Breur 21 October 2018 It has long been known that the general public is sometimes remarkably out of tune with math and numbers. In 1988 mathematician John Allan Paulos wrote a classic “Innumeracy” that is chockful of striking examples of misinterpretation of numeric evidence. Paulos refers to…

Questions?

“Any questions?” How to fight the awkward silence at the end of a presentation?

If you ever gave or attended a presentation, you are familiar with this situation: the presenter asks whether there are any questions and … nobody asks anything. This is an awkward situation. Why aren’t there any questions? Is it because everything is clear? Not likely. Everything is never clear. Is it because nobody cares? Well, Continue reading “Any questions?” How to fight the awkward silence at the end of a presentation?

screenshot of three graphs: two bar plots and one dot plot with a split graph area

Graphing Highly Skewed Data – Tom Hopper

My colleague, Charles Earl, pointed me to this interesting 2010 post that explores different ways to visualize categories of drastically different sizes. The post author, Tom Hopper, experiments with different ways to deal with “Data Giraffes”. Some of his experiments are really interesting (such as splitting the graph area). In one experiment, Tom Hopper draws Continue reading Graphing Highly Skewed Data – Tom Hopper

On privacy, security, and irony

About a week ago, I met Justin Mayer and had a really interesting chat with him about internet privacy. Today, his 30-minute talk on that subject appeared in my YouTube suggestion list. How ironic. The talk, by the way, is very interesting.

Back to Mississippi: Black migration in the 21st century. By Charles Earl

I wonder how this analysis remained unnoticed by social media. The recent election of Doug Jones […] got me thinking: What if the Black populations of Southern cities were to experience a dramatic increase? How many other elections would be impacted? via Back to Mississippi: Black migration in the 21st century - Charlescearl’s Continue reading Back to Mississippi: Black migration in the 21st century. By Charles Earl

16-day work month: the joys of the Hebrew calendar

Tishrei is the seventh month of the Hebrew calendar, and it starts with Rosh-HaShana, the Hebrew New Year*. It is a 30-day month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days Continue reading 16-day work month: the joys of the Hebrew calendar

Value-Suppressing Uncertainty Palette

Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

Uncertainty is one of the most neglected aspects of number-based communication and one of the most important concepts in general numeracy. Comprehending uncertainty is hard. Visualizing it is, apparently, even harder. Last week I read a paper called Value-Suppressing Uncertainty Palettes, by M. Correll, D. Moritz, and J. Heer from the Data visualization and interactive analysis research at the Continue reading Value-Suppressing Uncertainty Palettes – UW Interactive Data Lab – Medium

Screenshot showing two slides. The first one is titled "Low within-group variability". The second one is titled "High between-group variability". The graph in both slides is the same

Evolution of a complex graph. Part 1. What do you want to say?

From time to time, people ask me for help with non-trivial data visualization tasks. A couple of weeks ago, a friend-of-a-friend-of-a-friend showed me a set of graphs with the following note: Each row is a different use case. Each use case was tested on three separate occasions – columns 1,2,3. We hope to show that Continue reading Evolution of a complex graph. Part 1. What do you want to say?

Karl Popper

C for Conclusion

From time to time, I give a lecture about the most common mistakes in data visualization. In this lecture, I say that not adding a graph’s conclusion as a title is a wasted opportunity. In one of these lectures, a fresh university graduate commented that at her university, she was told to never write a conclusion Continue reading C for Conclusion

Meaningless slopes

The fact that you can doesn’t mean that you should! I will say it once again. The fact that you can doesn’t mean that you should! Look at this slopegraph that was featured by “Information is Beautiful”: Found it! pic.twitter.com/RxDoB683oI (Information is Beautiful, @infobeautiful, May 10, 2018) What does it say? What do the slopes Continue reading Meaningless slopes

If you know matplotlib and are in Israel on May 27th, I need your help

So, the data visualization workshop is fully booked. The organizers told me to expect 40-50 attendees, and I need some assistance. I am looking for a person who will be able to answer technical questions such as “I got a syntax error”, “why can’t I see this graph?”, “my graph has different colors”. It’s a Continue reading If you know matplotlib and are in Israel on May 27th, I need your help

Prerequisites for the upcoming data visualization workshop

I have been told that the data visualization workshop (“Data Visualization from default to outstanding. Test cases of tough data visualization”) is completely sold out. If you plan to attend this workshop, please check out the repository that I created for it [link]. In that repository, you will find a list of prerequisites that you Continue reading Prerequisites for the upcoming data visualization workshop

I will host a data visualization workshop at Israel’s biggest data science event

TL;DR What: Data Visualization from default to outstanding. Test cases of tough data visualization. Why: You would never settle for default settings of a machine learning algorithm. Instead, you would tweak them to obtain optimal results. Similarly, you should never stop with the default results you receive from a data visualization framework. Sadly, most Continue reading I will host a data visualization workshop at Israel’s biggest data science event

Illustration: a mechanical stopwatch in a person's palm

Whoever owns the metric owns the results: don’t trust benchmarks

Other factors being equal, what language would you choose for heavy numeric computations: Python or PHP? This is not a language war but a serious question. For me, the choice seems to be obvious: I would choose Python, and I’m not the only one. In this survey, for example, 45% of data scientists use Python, Continue reading Whoever owns the metric owns the results: don’t trust benchmarks

Pile of poo emoji

When “a pile of shit” is a compliment — On context importance in remote communication

What would you do, if someone left a “Pile of Poo” emoji as a reaction to your photo in your team Slack channel? This is exactly what happened to me a couple of days ago, when Sirin, my team lead, posted a picture of me talking to the Barcelona Machine Learning Meetup Group about data Continue reading When “a pile of shit” is a compliment — On context importance in remote communication

collage of data visualization paper headlines

Three most common mistakes in data visualization and how to avoid them. Now, the slides

Yesterday, I talked in front of the Barcelona Data Science and Machine Learning Meetup about the most common mistakes in data visualization. I enjoyed talking with the local community very much. Judging by the feedback I received during and after the talk, they, too, enjoyed my presentation. I uploaded my slides to Slideshare. Three most Continue reading Three most common mistakes in data visualization and how to avoid them. Now, the slides

Engineering Data Science at Automattic

Originally posted on Data for Breakfast:
Most data scientists have to write code to analyze data or build products. While coding, data scientists act as software engineers. Adopting best practices from software engineering is key to ensuring the correctness, reproducibility, and maintainability of data science projects. This post describes some of our efforts in the…

Me in front of a whiteboard, pointing at a graph

Live in Barcelona. Three most common mistakes in data visualization.

On Thursday, March 20, I will give a talk titled “Three most common mistakes in data visualization and how to avoid them.” I will be a guest of the Barcelona Data Science and Machine Learning Meetup Group. Right now, less than twenty-four hours after the lecture announcement, there are already seventeen people on the waiting Continue reading Live in Barcelona. Three most common mistakes in data visualization.

Illustration: large lamp sign that says "The same for everyone" with a sunset as a background

On algorithmic fairness & transparency

My teammate, Charles Earl, recently attended the Conference on Fairness, Accountability, and Transparency (FAT*). The conference site is full of very interesting material, including proceedings and video recordings of lectures and tutorials. Reading through the conference proceedings, I found a very interesting paper titled “The Cost of Fairness in Binary Classification.” This paper talks Continue reading On algorithmic fairness & transparency