Useful redundancy — when using colors is not completely useless

The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.

Pew Research Center is a “is a nonpartisan American fact tank based in Washington, D.C. It provides information on social issues, public opinion, and demographic trends shaping the United States and the world.” Recently, I read a report prepared by the Pew Center on the religious divide in the Israeli society. This is a fascinating report. I recommend reading without any connection to data visualization.

But this post does not deal with the Isreali society but with graphs and colors.

Look at the first chart in that report. You may see a tidy pie chart with several colored segments. 

Pie chart: Religious composition of Israeli society. The chart uses several colored segments

Aha! Can’t they use a single color without losing the details? Of course the can! A monochrome pie chart would contain the same information:

Pie chart: Religious composition of Israeli society. The chart uses monochrome segments

In most of the cases, such a transformation would make a perfect sense. In most of the cases, but not in this report. This report is a multipage research document packed with many facts and analyses. The pie chart above is the first graph in that report that provides a broad overview of the Israeli society. The remaining of this report is dedicated to the relationships between and within the groups represented by the colorful segments in that pie chart. To help the reader navigating through this long report, its authors use a consistent color scheme that anchors every subsequent graph to the relevant sections of the original pie chart.

All these graphs and tables will be readable without the use of colors. Despite the fact that the colors here are redundant, this is a useful redundancy. By using the colors, the authors provided additional information layers that make the navigation within the document easier. I learned about the concept of useful redundancy from “Trees, Maps, and Theorems” by Jean-luc Dumout. If you can only read one book about data communication, it should be this book.

16-days-work-month — The joys of the Hebrew calendar

Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana — the Hebrew New Year*. It is a 30 days month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days of Sukkot (Feast of Tabernacles) **. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the period between the first and the last Sukkot days are mostly considered as half working days. Also, the children are at home since all the schools and kindergartens are on vacation so we will treat those days as half working days in the following analysis.

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 2008 and 2023 CE, and this is what we get:

Dynamics of the number of working days in Tishrei over the years. The average fluctuation is around 16 days

Overall, this period consists of between 15 to 17 non-working days in a single month (31 days, mind you). This is how the working/not-working time during this month looks like this:

tishrei_2018_calendar

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to the constantly interrupt work day, but at a different scale.

So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

(*) New Year starts in the seventh’s month? I know this is confusing. That’s because we number Nissan — the month of the Exodus from Egypt as the first month.
(**)If you are an observing Jew, you should add to this list Fast of Gedalia, but we will omit it from this discussion

Gender salary gap in the Israeli high-tech

A large and popular Israeli Facebook group, “The High-Tech Troubles,” has recently surveyed its participants. The responders provided personal, demographic, and professional information. The group owners have published the aggregated results of that survey. In this post, I analyze a particular aspect of these findings, namely, how the responders’ gender and experience affect their salary. It is worth noting that this survey is by no means a representative one. It’s most noticeable but not the only problem is the participation bias. Another problem is the fact that the result tables do not contain any information about the number of responders in any group. Without this information, it is impossible to compute confidence intervals of any findings. Despite these problems, the results are interesting and worth noting.

The data that I used in my analysis is available in this spreadsheet. The survey organizers promise that they excluded groups and categories with insufficiently few answers, and we have to trust them in that. The results are divided into twenty professional categories such as ‘Account Management,’ ‘Data Science’, ‘Support’ and ‘CXO’ (which stands for an executive role). The salary groups are organized in exponential bins according to the years of experience: 0-1, 1-2, 2-4, 4-7; and more than seven years of experience. Some of the cell values are missing, I assume that these are the categories with too few responders. I took a look at the gap between the salary reported by women and the compensation reported by men.

Let’s take a look at the most complete set of data — the groups of people with 1-2 years of experience. As we may see from the figure below, in thirteen out of twenty groups (65%), women get less money than men.
Gender compensation gap, 1-2 years of experience. Women earn less in 13 of 20 categories

Among the workers with 1 – 2 years of experience, the most discriminating fields are executives and security researchers. It is interesting to note the difference between two closely related fields: Data Science and BI/Data Analysts. The former is considered a more lucrative position. On average, the male data scientists get 11% more than their female colleagues, while male data analysts get 13% less than their female counterparts. I wonder how this difference relates to my (very limited) observation that most of the people who call themselves a BI expert are females, while most of the data scientists whom I know are males.

As we have seen, there is no much gender equality for the young professionals. What happens when people gain experience? How does the gender compensation gap change after eight years of professional life? The situation is even worse. In fourteen, out of sixteen available fields, women get less money than men. The only field in which it pays to be a woman is the executive roles, where the women get 19% more than the men.

Gender compensation gap, more than 7 years of experience. Women earn less in 14 of 16 categories

To complete the picture, let’s look at the gap dynamics over the years in all the occupation fields in that report.

Gender gap dynamics. 20 professional fields over different experience bins

What do we learn from these findings?

These findings are real. We cannot use the non-representativity of these data, and the lack of confidence intervals to dismiss these findings. I don’t expect the participants to lies, neither do I not expect any bias from the participation patterns. It is true that I can’t obtain confidence intervals for these results. However, the fact that the vast majority of the groups lie on one side of the equality line suggests the overall validity of the gender gap notion. How can we fix this gap? I frankly don’t know. As a father of three daughters (9, 12, and 14 years old), I talk to them about this gap. I make sure they are aware of this problem so that, when it’s their turn to negotiate compensation, they are aware of the systematic bias. I hope that this knowledge will give them the right tools to fight for justice.

16-days work month — The joys of the Hebrew calendar

Graph: Tishrei is mostly a non-working month.

Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana — the Hebrew New Year*. It is a 30 days month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days of Sukkot (Feast of Tabernacles) **. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the period between the first and the last Sukkot days are mostly considered as half working days. Also, the children are at home since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 1993 and 2020 CE, and this is what we get:

 

tishrei_working_days

 

Overall, this period consists of between 15 to 17 non-working days in a single month (31 days, mind you). This is how the working/not-working time during this month looks like this:

tishrei_workign_weeks.png

 

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to constantly interrupted work day, but at a different scale.

So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

 

(*) New Year starts in the seventh’s month? I know this is confusing. That’s because we number Nissan — the month of the Exodus from Egypt as the first month.
(**)If you are an observing Jew, you should add to this list Fast of Gedalia, but we will omit it from this discussion

 

Evolution of a Plot: Better Data Visualization, One Step at a Time

My latest post on data.blog

Data for Breakfast

The goal of data visualization is to transform numbers into insights. However, default data visualization output often disappoints. Sometimes, the graph shows irrelevant data or misses important aspects; sometimes, the graph lacks context; sometimes, it’s difficult to read. Often, data practitioners “feel” that something isn’t right with the graph, but cannot pinpoint the problem.

In this post, I’ll share the process of visualizing a complex issue using a simple plot. Despite the fact that the final plot looks elementary and straightforward, it took me several hours and trial-and-error attempts to achieve the result. By sharing this process, I hope to accomplish two goals: to offer my perspectives and approaches to data visualization and to learn from other options you suggest. You’ll find the code and the data used in this post here.

Plotting power distribution in the Knesset

This post is devoted to a graph I created to explore…

View original post 1,738 more words

16-days work month — The joys of the Hebrew calendar

Graph: Tishrei is mostly a non-working month.

Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana — the Hebrew New Year*. It is a 30 days month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), first and last days of Sukkot (Feast of Tabernacles) **. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the period between the first and the last Sukkot days are mostly considered as half working days. Also, the children are at home since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 1993 and 2020 CE, and this is what we get:

 

tishrei_working_days

 

Overall, this period consists of between 15 to 17 non-working days in a single month (31 days, mind you). This is how the working/not-working time during this month looks like this:

tishrei_workign_weeks.png

 

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to constantly interrupted work day, but at a different scale.

So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

 

(*) New Year starts in the seventh’s month? I know this is confusing. That’s because we number Nissan — the month of the Exodus from Egypt as the first month.
(**)If you are an observing Jew, you should add to this list Fast of Gedalia, but we will omit it from this discussion