Yesterday, the follower list of my blog exceeded one hundred followers! Even though I know that some of these followers are bots, this number makes me happy! Thank you all (humans and bots) for clicking the “follow” button.
My coworker analyzed Twitter social network around Automattic, WordPress, and other related projects.
As a data scientist, I spend a lot of time analyzing how our users interact with WordPress.com. However, WordPress.com isn’t the only place to gain insight into how people use and talk about our services. Many WordPress.org and WordPress.com discussions take place on social media. Analyzing these discussions can help us understand what our users are saying about WordPress[*] and Automattic, the topics closely associated with our services, and who is leading these discussions.
In every social network, there are people who steer the topic and sentiment of the conversation. These influencers usually have large followings and are positioned centrally within the network. Brands often reach out to influencers to organize focus groups or invite them to events, since they’re usually knowledgeable about the brand and can offer insight into how consumers use the product and potential improvements.
At Automattic, we don’t do traditional influencer marketing. However, since the discussions…
View original post 1,060 more words
Traditional A/B testsing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherentlyinferior to choosing a targeted mix of A and B.Michael Kaminsky locallyoptimistic.com
The quote above is from a post by Michael Kaminsky “Against A/B tests“. I’m still not fully convinced by Michael’s thesis but it is very interesting and thought-provoking.
The renown network scientist, Albert-László Barabási, has been applying scientific methods to study the factors that make people successful. Science has published an intriguing paper called Quantifying reputation and success in art written by Prof. Barabási and his collaborators. Prof. Barabási talks about the findings of his research in an interview with The HumanCurrent podcast.
(The featured image is a portion from Figure 1 in Fraiberger et al., Science 10.1126/science.aau7224 (2018)).
The maximum data-ink ratio principle implies that one should not use colors in their graphs if the graph is understandable without the colors. The fact that you can do something, such as adding colors, doesn’t mean you should do it. I know it. I even have a dedicated tag on this blog for that. Sometimes, however, consistent use of colors serves as a useful navigation tool in a long discussion. Keep reading to learn about the justified use of colors.
Pew Research Center is a “is a nonpartisan American fact tank based in Washington, D.C. It provides information on social issues, public opinion, and demographic trends shaping the United States and the world.” Recently, I read a report prepared by the Pew Center on the religious divide in the Israeli society. This is a fascinating report. I recommend reading without any connection to data visualization.
But this post does not deal with the Isreali society but with graphs and colors.
Look at the first chart in that report. You may see a tidy pie chart with several colored segments.
Aha! Can’t they use a single color without losing the details? Of course the can! A monochrome pie chart would contain the same information:
In most of the cases, such a transformation would make a perfect sense. In most of the cases, but not in this report. This report is a multipage research document packed with many facts and analyses. The pie chart above is the first graph in that report that provides a broad overview of the Israeli society. The remaining of this report is dedicated to the relationships between and within the groups represented by the colorful segments in that pie chart. To help the reader navigating through this long report, its authors use a consistent color scheme that anchors every subsequent graph to the relevant sections of the original pie chart.
All these graphs and tables will be readable without the use of colors. Despite the fact that the colors here are redundant, this is a useful redundancy. By using the colors, the authors provided additional information layers that make the navigation within the document easier. I learned about the concept of useful redundancy from “Trees, Maps, and Theorems” by Jean-luc Dumout. If you can only read one book about data communication, it should be this book.
Why adding text labels to graph lines, when you can build graph lines using text labels? On microtext lines
Line charts are a staple of data visualization. They’ve existed at least since William Playfair and possibly earlier. Like many charts, they can be very powerful and also have their limitations. One limitation is the number of lines that can be displayed. One line works well: you can see trend, volatility, highs, lows, reversals. Two lines provides opportunity for comparison. 5 lines might be getting crowded. 10 lines and you’re starting to run out of colors. But what if the task is to compare across a peer group of 30 or 40 items? Lines get jumbled, there aren’t enough discrete colors, legends can’t clearly distinguish between them. Consider this example looking at unemployment across 37 countries from the OECD: which country had the lowest unemployment in 2010?
Tooltips are an obvious way to solve this, but tooltips have problems – they are much slower than just shifing visual attention…
View original post 1,323 more words
Stalin was a relatively short man, his height was 1.65 m. Khrushchev was even shorter, his height was 1.60. It seems that the difference wasn’t enough for the official Soviet propaganda of that time. Take a look at this photo. We can clearly see that Stalin is taller than Khrushchev.
Do you notice something strange? Take a look at the windows in the background. I added horizontal and vertical guides for your convenience.
Now, look what happens when we fix the horizontal and vertical lines
Now, Khrushchev is still shorter than Stalin but not by that much.
This post is written in Hebrew about a Hebrew issue. I won’t translate it to English.
אני מלמד data visualization בשתי מכללות בישראל — במכללת עזריאלי להנדסה בירושלים ובמכון הטכנולוגי בחולון. כשכתבתי את הסילבוס הראשון שלי הייתי צריך למצוא מונח ל־data visualization וכתבתי “הדמיית נתונים״ אומנם זה הזכיר לי קצת תהליך של סימולציה, אבל האופציה האחרת ששקלתי היתה ״דימות״ וידעתי שהיא שמורה ל־imaging, דהיינו תהליך של יצירת דמות או צורה של עצם, בעיקר בעולם הרפואה.
הבנתי שהמונח בעייתי בשיעור הראשון שהעברתי. מסתברששניים מארבעת הסטודנטים שהגיעו לשיעור חשבו שקורס ״הדמיית נתונים בתהליך מחקר ופיתוח״ מדבר על סימולציות.
מתישהו שמעתי מחבר של חבר שהמונח הנכון ל־visualization זה הדמאה, אבל זה נשמע לי פלצני מדי, אז השארתי את ה־״הדמיה״ בשם הקורס והוספתי “data visualization” בסוגריים.
היום, שלוש שנים אחרי ההרצאה הראשונה שהעברתי, ויומיים לפני פתיחת הסמסטר הבא, החלטתי לגגל (יש מילה כזאת? יש!) את התשובה. ומה מסתבר? עלון ״למד לשונך״ מס׳ 109 של האקדמיה ללשון עברית שיצא לאור בשנת 2015 קובע שהמונח ל־visualization הוא הַחְזָיָה. לא יודע מה אתכם, אבל אני לא משתגע על החזיה. עוד משהו שאני לא משתגע עליו הוא שבתור הדוגמא להחזיה, האקדמיה החלטיה לשים תרשים עוגה עם כל כך הרבה שגיאות!
נראה לי שאני אשאר עם הדמיה. ויקימילון מרשה לי.
נ.ב. שמתם לב שפוסט זה השתמשתי במקף עברי? אני מאוד אוהב את המקף העברי.
Innumeracy is “inability to deal comfortably with the fundamental notions of number and chance”.
I which there was a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews many important aspects of “numeracy”.
21 October 2018
It has long been known that the general public is sometimes remarkably out of tune with math and numbers. In 1988 mathematician John Allan Paulos wrote a classic “Innumeracy” that is chockful of striking examples of misinterpretation of numeric evidence. Paulos refers to innumeracy as “… inability to deal comfortably with the fundamental notions of number and chance …” Personally, I consider it the mathematical equivalent to illiteracy. Another classic from Paulos is “A Mathematician Reads the Newspaper” (1995) which contains a lot of satire, debunking ridiculous claims in the press. It highlights more spectacular examples of innumeracy.
Paulos illustrates innumeracy with lighthearted anecdotes and many common, everyday scenarios. These examples highlight how readers might be fooled by misleading quantitative evidence. His examples span diverse topics like probability and coincidence, misguessing extremely small or very large numbers, pseudoscience and superstition…
View original post 1,450 more words
Excellent post by my colleague Simon Ouderkirk on working in a distributed company. It’s a three-year-old post. I wonder how I missed it.
One of the things that it has taken me quite a long time to figure out, when it comes to this remote work gig, is this idea I’ve taken to calling aggressive transparency.
I’ve been chewing on this idea quite a lot, and in chatting with my team and other folks whose opinions I respect, I think I’m starting to feel like it’s something I should articulate in greater detail.
View original post 1,077 more words