• Career advice. A research pharmacist wants to become a data scientist.

    Career advice. A research pharmacist wants to become a data scientist.

    January 9, 2020

    Recently, I received an email from a pharmacist who is considering becoming a data scientist. Since this is neither the first (nor, probably, the last) such email that I receive, I think others will find this exchange interesting.

    Here’s the original email with minor edits, followed by my response.

    The question

    Hi Boris,

    My name is XXXXX, and I came across your information and your advice on data science as I was researching career opportunities.

    I currently work at a hospital as a research pharmacist, mainly involved in managing drugs for clinical trials.
    Initially, I wanted to become a clinical pharmacist and pursued a one-year post-graduate residency training. However, it was not something I could envision myself enjoying for the rest of my career.

    I then turned towards obtaining a Ph.D. in translational research, bridging benchwork research to the bedside, so that I could be at the forefront of clinical trial development and bring the outcomes of rigorous pre-clinical research to patients. I very much appreciate learning about all the meticulous work that goes into the development of Phase I clinical trials. However, a Ph.D. in pharmaceutical sciences was overkill for what I wanted to achieve in my career (in my opinion), and I ended up graduating with a master’s in pharmaceutical sciences.

    Since I wanted to be involved in both research and pharmacy areas in my career, I ended up where I am now, a research pharmacist.

    My main job description is not any different from that of typical hospital pharmacists. I do get to handle investigational medications, learn about new medications and clinical protocols, oversee side effects that may be a crucial barrier to marketing the trial medications, and sometimes participate in developing drug preparation and handling procedures for investigator-initiated trials. This keeps my job interesting and brings variety to what I do. However, I still feel that I am merely following guidelines to prepare medications, not thinking critically to make interventions or working with data to see the outcomes.

    At this point, I am preparing to find career opportunities in the pharmaceutical industry, where I will be more actively involved in clinical trial development, exchanging information about targeting diseases, and analyzing data. I believe that gaining knowledge and experience in the skills critical to the data science field would broaden my career opportunities and interests. Still, unfortunately, I only have a pharmacy background and little to no experience in computer science, bioinformatics, or machine learning.

    The answer

    First of all, thank you for asking me. I’m genuinely flattered. I assume that you found me through my blog posts, and if not, I suggest that you read at least the following posts

    All my thoughts on the career path of a data scientist appear on this page: https://gorelik.net/category/career-advice/

    Now, specifically to your questions.

    My path towards data science was a gradual evolution. Every new phase in my career built on my previous experience and knowledge: from B.Sc. studies in pharmacy to doctorate studies in computational drug design, from computational drug design to biomathematical modeling, from that to bioinformatics, and from that to cybersecurity. Of course, my path is not unique. I know at least three people who followed a similar career from pharmacy to data science. Maybe other people made different choices and are even more successful than I am.

    My first advice to everyone who wants to transition into data science is not to (see the first link in the list above). I was lucky to enter the field before it was a field, but today we live in the age of specialization. Today we have data analysts, data engineers, machine learning engineers, NLP scientists, image processing specialists, etc.

    If computational modeling is something that a person likes and sees themselves doing for a living, I suggest pursuing a related advanced degree with a project that involves massive modeling efforts. Examples of such degrees for a pharmacist are computational chemistry, pharmacoepidemiology, pharmacovigilance, and bioinformatics. This way, one can use the knowledge one already has to expand one’s expertise, build a reputation, and gain new knowledge. If staying in academia is not an option, consider taking on a relevant real-life project. For example, if you work in a hospital, you could try identifying patterns in antibiotics usage, a correlation between demographics and hospital re-admission, … you get the idea.

    Whatever you do, you will not be able to work as a data scientist if you can’t write computer programs. Modifying tutorial scripts is not enough; knowing how to feed data into models is not enough.

    My most significant knowledge gap is in maths. If you do go back to academia, I strongly suggest taking advantage of the opportunity to take several math classes: at least calculus and linear algebra and, of course, statistics.

    Do you have a question for me?

    If you have questions, feel free to write them here, in the comments section, or to boris@gorelik.net

    January 9, 2020 - 4 minute read -
    advice data science data science careers pharmacist blog Career advice
  • Athens, Greece

    Athens, Greece

    January 8, 2020

    January 8, 2020 - 1 minute read -
    athens graffiti greece photo blog
  • New year, new notebook

    New year, new notebook

    January 1, 2020

    On November 7, 2016, I started an experiment in personal productivity. I decided to use a notebook for thirty days to manage all of my tasks. The thirty days ended more than three years ago, and I still use notebooks to manage myself. Today, I started the thirteenth notebook.

    Read about my time management system here.

    January 1, 2020 - 1 minute read -
    procrastination productivity blog Productivity & Procrastination
  • Don't we all like a good contradiction?

    Don't we all like a good contradiction?

    December 31, 2019

    I am a huge fan of Gerd Gigerenzer who preaches numeracy and uncertainty education. One of Prof. Gigerenzer’s pivotal theses is “Fast and Frugal Heuristics” which is also popularized in his book “Gut Feelings” (listen to this podcast if you don’t want to read the book). I like this approach.

    Today, I listened to the latest episode of the Brainfluence podcast, which hosted the psychologist Dr. Gleb Tsipursky, who wrote an extensive book called “Never Trust your Gut” with a seemingly contradictory thesis. I added this book to my TOREAD list.

    December 31, 2019 - 1 minute read -
    book contradiction gigerenzer intuition uncertainty blog
  • Staying employable and relevant as a data scientist

    Staying employable and relevant as a data scientist

    December 23, 2019

    One piece of common wisdom is that creative jobs are immune to becoming irrelevant. This is what Brian Solis, the author of “Lifescale,” says on this matter:

    On the positive side, historically, with every technological advancement, new jobs are created. Incredible opportunity opens up for individuals to learn new skills and create in new ways. It is your mindset, the new in-demand skills you learn, and your creativity that will assure you a bright future in the age of automation. This is not just my opinion. A thoughtful article in Harvard Business Review by Joseph Pistrui was titled, “The Future of Human Work Is Imagination, Creativity, and Strategy.” He cites research by McKinsey […]. In their research, they discovered that the more technical the work, the more replaceable it is by technology. However, work that requires imagination, creative thinking, analysis, and strategic thinking is not only more difficult to automate; it is those capabilities that are needed to guide and govern the machines.

    Many people think that data science falls into the category of “creative thinking and analysis”. However, as time passes, this becomes less true. Here’s why.

    As time passes, tools become stronger, smarter, and faster. This means that a problem that once required cutting-edge algorithms run by cutting-edge scientists on cutting-edge computers will be solvable using a commodity product. “All you have to do” is apply domain knowledge, select a “good enough” tool, get the results, and act upon them. You’ll notice that I put two phrases in quotation marks. First, “all you have to do”. I know that it’s not as simple as “just add water”, but it gets simpler.

    “Good enough” is also a tricky part. Selecting the right algorithm for a problem has a dramatic effect on tough cases but is less important with easy ones. Think of a sorting algorithm. I remember how my algorithms professor used to talk about how important it was to match the right sorting algorithm to the right problem. That was almost twenty years ago. Today, I simply write list.sort() and I’m done. Maybe one day I will have to sort billions of data points in less than a second on a tiny CPU with little RAM, which will force me into developing a specialized solution. But in 99.999% of cases, list.sort() is enough.
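
    To make the point concrete, here is a minimal sketch (the random data and timing are mine, for illustration; the exact numbers will vary by machine):

```python
import random
import time

# One million random floats: a "tough case" twenty years ago.
data = [random.random() for _ in range(1_000_000)]

start = time.perf_counter()
data.sort()  # Timsort: adaptive, stable, O(n log n) in the worst case
elapsed = time.perf_counter() - start

# On commodity hardware, this finishes in a fraction of a second,
# with no algorithm selection on our part.
print(f"sorted {len(data):,} numbers in {elapsed:.3f} seconds")
```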

    Back to data science. I think that in the near future, we will see more and more analogs of list.sort(). What does that mean for us, data scientists? I am not sure. What I am sure of is that in order to stay relevant, we have to learn and evolve.

    Featured image by Héctor López on Unsplash

    December 23, 2019 - 2 minute read -
    creativity data science data science careers development employability blog Career advice
  • Is security through obscurity back?

    Is security through obscurity back?

    December 15, 2019

    HBR published an opinion post by Andrew Burt, called “The AI Transparency Paradox.” This post talks about the problems that were created by tools that open up the “black box” of a machine learning model.

    “Black box” refers to the situation where one can’t explain why a machine learning model predicted whatever it predicted. Explainability is not only important when one wants to improve the model or to pinpoint mistakes; it is also an essential feature in many fields. For example, when I was developing a cancer detection model, every physician requested to know why we thought a particular patient had cancer. That is why I’m happy that so many people develop tools that allow peeking into the black box.

    I was very surprised to read the “transparency paradox” post. Not because I couldn’t imagine that people would use the insights to hack the models. I was surprised because the post reads like a case for security through obscurity, an ancient practice that was mostly eradicated from the mainstream.

    Yes, ML transparency opens opportunities for hacking and abuse. However, this is EXACTLY the reason why such openness is needed. Hacking attempts will not disappear if transparency is removed; they will just be harder to defend against.

    December 15, 2019 - 1 minute read -
    blackbox hbr machine learning opinion transparency blog
  • I will speak at the NDR conference in Bucharest

    I will speak at the NDR conference in Bucharest

    December 11, 2019

    NDR is a family of machine learning conferences in Romania. Last year, I attended the Iași edition of that conference, gave a data visualization talk, and enjoyed every moment. All the lectures (including mine, obviously) were interesting and relevant. That is why, when Vlad Iliescu, one of the NDR organizers, asked me whether I wanted to talk in Bucharest at NDR 2020, I didn’t think twice.

    Since the organizers haven’t published the talk topics yet, I will not ruin the surprise for you, but I promise to be interesting and relevant. I definitely think that NDR is worth the trip to Bucharest for many data practitioners, even the ones who don’t live in Romania. Visit the conference site to register.

    December 11, 2019 - 1 minute read -
    bucharest conference romania speaking blog
  • Book review. A Short History of Nearly Everything by Bill Bryson

    Book review. A Short History of Nearly Everything by Bill Bryson

    December 2, 2019

    TL;DR: a nice popular science book that covers many aspects of modern science

    A Short History of Nearly Everything by Bill Bryson is a popular science book. I didn’t learn anything fundamental out of this book, but it was worth reading. I was particularly impressed by the intrigues, lies, and manipulations behind so many scientific discoveries and discoverers.

    The main “selling point” of this book is that it answers the question, “how do scientists know what they know?” How, for example, do we know the age of Earth or the skin color of the dinosaurs? The author indeed provides some insight. However, because the book tries to talk about “nearly everything,” the answer isn’t focused enough. Simon Singh’s book “Big Bang” concentrates on cosmology and provides a better insight into the question of “how do we know what we know.”

    Interesting takeaways and highlights

    • On the problem that our Universe is unlikely to have been created by chance: “Although the creation of the Universe is very unlikely, nobody knows about the failed attempts.”
    • The Universe is unlimited but finite (think of a circle)
    • Developments in chemistry were the driving force of the industrial revolution. Nevertheless, chemistry wasn’t recognized as a scientific field in its own right for several decades

    The bottom line: Read if you have time 3.5/5.

    December 2, 2019 - 1 minute read -
    book review popular-science blog
  • Cow shit, virtual patient, big data, and the future of the human species

    Cow shit, virtual patient, big data, and the future of the human species

    November 28, 2019

    Yesterday, a new episode was published in the Popcorn podcast, where the host, Lior Frenkel, interviewed me. Everyone who knows me knows how much I love talking about myself and what I do. I definitely used this opportunity to talk about the world of data. Some people who listened to this episode told me that they enjoyed it a lot. If you know Hebrew, I recommend that you listen to this episode

    https://soundcloud.com/hamutsi/142-boris-gorelik

    November 28, 2019 - 1 minute read -
    data science interview me podcast speaking blog
  • Data visualization as an engineering task - a methodological approach towards creating effective data visualization

    Data visualization as an engineering task - a methodological approach towards creating effective data visualization

    November 20, 2019

    In June 2019, I attended the NDR AI conference in Iași, Romania where I also gave a talk. Recently, the organizers uploaded the video recording to YouTube.

    November 20, 2019 - 1 minute read -
    bucharest conference data visualisation Data Visualization dataviz iasi public speaking romania speaking video blog
  • A tangible productivity tool (and a book review)

    A tangible productivity tool (and a book review)

    November 11, 2019
    One month ago, I stumbled upon a book called “[Personal Kanban: Mapping Work Navigating Life](https://amzn.to/33DM4l4)” by Jim Benson (all the book links use my affiliate code). Never before have I seen a bigger discrepancy between the value that a book gave me and its actual content.

    Even before finishing the first chapter of this book, I realized that I wanted to incorporate “personal kanban” into my productivity system. The problem is that the entire book could be summarized in a blog post or a YouTube video (such as this one). The rest of the book contains endless repetitions and praises. I recommend not reading this book, even though it strongly affected the way I work.

    So, what is Personal Kanban anyhow? Kanban is a productivity approach that puts all of a person’s tasks in front of them on a board. Usually, Kanban boards are physical whiteboards with post-it notes, but software Kanban boards are also widely used (Trello is one of them). Following are the claims Jim Benson makes in his book that resonated with me.

    • Many productivity approaches view personal and professional life separately. The reality is that these two aspects of our lives are not separate at all. Therefore, a productivity method needs to combine them.
    • Having all the critical tasks in front of your eyes helps to get the global picture. It also helps to group the tasks according to their contexts.
    • The act of moving notes from one place to another gives valuable tangible feedback. This feedback has many psychological benefits.
    • One should limit the number of work-in-progress tasks.
    • There are three different types of “productivity.” You are Productive when you work hard. You are Efficient when your work is actually getting done. Finally, you are Effective when you do the right job at the right time, and can repeat this process if needed.

    I’m a long-time user of a productivity method that I adopted from Mark Forster. You may read about my process here. Having read Personal Kanban, I decided to combine it with my approach. According to the plan, I keep the more significant tasks on my Kanban board, which I use to make daily, weekly, and long-term plans. For the day-to-day (and hour-to-hour) tasks, I still use my notebooks.

    Initially, I used my whiteboard for this purpose, but something wasn’t right about it.

    Having my Kanban on my home office whiteboard had two significant drawbacks. First, the whiteboard isn’t with me all the time, and what is the point of putting your tasks on a board if you can’t see it? Secondly, listing everything on a whiteboard has some privacy issues. After some thought, I decided to migrate the Kanban to my notebook.

    In this notebook, I have two spreads. The first spread is used for the backlog and “this week” tasks. The second spread has the “today,” “doing,” “wait,” and “done” columns. The fact that the notebook is smaller than the whiteboard turned out to be a useful feature: the physical limitation caps the number of tasks I put on my “today” and “doing” lists.

    I organize the tasks at the beginning of my working day. The rest of the system remains unchanged. After more than a month, I’m happy with this new tangible productivity method.

    November 11, 2019 - 3 minute read -
    book review kanban procrastination productivity blog Productivity & Procrastination
  • Knowledge Graphs & NLP @ EMNLP

    Knowledge Graphs & NLP @ EMNLP

    November 10, 2019

    I stumbled upon a very detailed and useful summary of a recent conference on empirical methods in natural language processing. I have to say, Michael Galkin, the author of this review, did an excellent job. His blog, https://medium.com/@mgalkin, is worth following.

    https://medium.com/@mgalkin/knowledge-graphs-nlp-emnlp-2019-part-i-e4e69fd7957c

    November 10, 2019 - 1 minute read -
    repost blog
  • Data science tools with a graphical user interface

    Data science tools with a graphical user interface

    November 5, 2019

    A Quora user asked about data science tools with a graphical user interface. Here’s my answer. I should mention, though, that I don’t usually use a GUI for data science. Not that I think GUIs are bad; I simply couldn’t find a tool that works well for me.

    Of the many tools that exist, the one I like most is Orange (https://orange.biolab.si/). Orange allows the user to create data pipelines for exploration, visualization, and production, but also allows editing the “raw” Python code. The combination of these features makes it a powerful and flexible tool.

    The major drawback of Orange (in my opinion) is that it uses its own data format and its own set of models that are not 100% compatible with the Numpy/Pandas/Sklearn ecosystem.

    I have made a modest contribution to Orange by adding a six-line function that computes the Matthews correlation coefficient.
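
    For the curious, the computation does fit in a few lines. This is not the code I contributed to Orange, just an illustrative sketch of the same coefficient computed from confusion-matrix counts:

```python
from math import sqrt

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:  # degenerate case: an empty row or column in the matrix
        return 0.0
    return (tp * tn - fp * fn) / denom

# A reasonably strong classifier scores close to +1:
print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```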

    Other tools are KNIME and Weka (neither of them is natively Python).

    There is also RapidMiner, but I have never used it.

    November 5, 2019 - 1 minute read -
    data science gui knime orange tools weka blog
  • Working in a distributed company. Communication styles

    Working in a distributed company. Communication styles

    October 30, 2019

    I work at Automattic, one of the largest distributed companies in the world. Working in a distributed company means that everybody in this company works remotely. There are currently about one thousand people working in this company from about seventy countries. As you might expect, the international nature of the company poses a communication challenge. Recently, I had a fun experience that demonstrates how different people are.

    Remote work means that we use text as our primary communication tool. Moreover, since the company spans all the time zones of the world, we mostly use asynchronous communication, which takes the form of posts on internal blogs. A couple of weeks ago, I completed a lengthy analysis and summarized it in a post that was meant to be read by the majority of the company. Being a responsible professional, I asked several people to review the draft of my report.

    To my embarrassment, I discovered that I made a typo in the report title, and not just a typo: I misspelled the company name :-(. A couple of minutes after asking for a review, two of my coworkers pinged me on Slack and told me about the typo. One message was, “There is a typo in the title.” Short, simple, and concise.

    The second message was much longer.

    Do you want to guess what the difference between the two coworkers is? . . . . . Here’s the answer . . . . The author of the first (short) message grew up and lives in Germany. The author of the second message is American. Germany, United States, and Israel (where I am from) have very different cultural codes. Being an Israeli, I tend to communicate in a more direct and less “sweetened” way. For me, the American communication style sounds a little bit “artificial,” even though I don’t doubt the sincerity of this particular American coworker. I think that the opposite situation is even more problematic. It happened several times: I made a remark that, in my opinion, was neutral and well-intended, and later I heard comments about how I sounded too aggressive. Interestingly, all the commenters were Americans.

    To sum up: people from different cultural backgrounds have different communication styles. In theory, we all know that these differences exist. In practice, we are usually unaware of them.

    Featured photo by Stock Photography on Unsplash

    October 30, 2019 - 2 minute read -
    communication-style distributed work remote working working-remotely blog
  • Sometimes, you don't really need a legend

    Sometimes, you don't really need a legend

    October 28, 2019

    This is another “because you can” rant, where I claim that the fact that you can do something doesn’t mean that you necessarily need to.

    This time, I will claim that sometimes, you don’t really need a legend in your graph. Let’s take a look at an example. We will plot the GDP per capita for three countries: Israel, France, and Italy. Plotting three lines isn’t a tricky task. Here’s how we do this in Python

    import matplotlib.pyplot as plt  # gdp is a DataFrame with a Year column and one column per country

    plt.plot(gdp.Year, gdp.Israel, '-', label='Israel')
    plt.plot(gdp.Year, gdp.France, '-', label='France')
    plt.plot(gdp.Year, gdp.Italy, '-', label='Italy')
    plt.legend()
    

    The last line in the code above does a bit of magic and adds a nice legend.

    [Figure: the three-line chart with the legend added by matplotlib]

    In Excel, we don’t even need to do anything; the legend is added for us automatically.

    [Figure: the same chart produced in Excel, with its default legend]

    So, what is the problem?

    What happens when a person wants to know which line represents which country? That person needs to compare the line color to the colors in the legend. Since our working memory has a limited capacity, we do one of the following: we either jump from the graph to the legend dozens of times, or we try to find a heuristic (a shortcut). Human brains don’t like working hard and always search for shortcuts (I recommend reading Daniel Kahneman’s “Thinking, Fast and Slow” to learn more about how our brains work).

    What would be the shortcut here? Note how the line for Israel lies mostly below the line for Italy, which lies mostly below the line for France. The lines in the legend also lie one below the other. However, the order of the lines isn’t conserved between these two pieces of information. This results in a cognitive mess; the viewer needs to work hard to decipher the graph and misses the point that you want to convey.

    And if we have more lines in the graph, the situation is even worse.

    [Figure: a chart with five lines, where the line order doesn’t match the legend order]

    Can we improve the graph?

    Yes we can. The simplest way to improve the graph is to keep the right order. In Python, we do that by reordering the plotting commands.

    plt.plot(gdp.Year, gdp.Australia, '-', label='Australia')
    plt.plot(gdp.Year, gdp.Belgium, '-', label='Belgium')
    plt.plot(gdp.Year, gdp.France, '-', label='France')
    plt.plot(gdp.Year, gdp.Italy, '-', label='Italy')
    plt.plot(gdp.Year, gdp.Israel, '-', label='Israel')
    plt.legend()
    

    [Figure: the five-line chart with the plotting order matching the legend order]

    We still have to work hard but at least we can trust our brain’s shortcut.

    If we have more time

    If we have some more time, we may get rid of the (classical) legend altogether.

    import matplotlib.pyplot as plt
    import seaborn

    countries = [c for c in gdp.columns if c != 'Year']
    fig, ax = plt.subplots()
    for i, c in enumerate(countries):
        ax.plot(gdp.Year, gdp[c], '-', color=f'C{i}')
        x = gdp.Year.max()          # rightmost point of the line
        y = gdp[c].iloc[-1]         # the line's last value
        ax.text(x, y, c, color=f'C{i}', va='center')  # label the line directly
    seaborn.despine(ax=ax)
    

    (if you don’t understand the Python in this code, I feel your pain but I won’t explain it here)

    [Figure: the chart with each line labeled directly at its right end, no legend]

    Isn’t it better? Now the viewer doesn’t need to zap between the lines and the legend; we show all the information in the same place. And since we already invested three minutes in making the graph prettier, why not add one more minute and make it even more awesome?

    [Figure: the polished version of the directly labeled chart]

    This graph is much easier to digest than the first one, and it also provides more useful information.


    [Figure: a case where the direct labels overlap and create a mess]

    I agree that this is a mess. Life is tough. But if you have time, you can fix this mess too. I don’t, so I won’t bother, but Randy Olson had time. Look at what he did in a similar situation.

    [Figure: Randy Olson’s directly labeled chart of the percentage of bachelor’s degrees earned by women in the USA]

    I also recommend reading my older post where I compared graph legends to muttonchops.

    In conclusion

    Sometimes, no legend is better than a legend.

    This post, in Hebrew: [link]

    October 28, 2019 - 3 minute read -
    because you can data visualisation data-visualization dataviz legend blog Data Visualization
  • What do we see when we look at slices of a pie chart?

    What do we see when we look at slices of a pie chart?

    October 21, 2019

    What do we see when we look at slices of a pie chart? Angles? Areas? Arc lengths? The answer to this question isn’t clear, and thus “experts” recommend avoiding pie charts altogether.

    Robert Kosara, a Senior Research Scientist at Tableau Software, is very active in studying pie charts (you should follow his blog, https://eagereyes.org). In 2016, Robert Kosara and his collaborators published a series of studies about pie charts. There is a nice post called “An Illustrated Tour of the Pie Chart Study Results” that summarizes these studies.

    Last week, Robert published another paper with a pretty confident title (“Evidence for Area as the Primary Visual Cue in Pie Charts”) and a very inconclusive conclusion

    While this study suggests that the charts are read by area, it is not conclusive. In particular, the possibility of pie chart users re-projecting the chart to read them cannot be ruled out. Further experiments are therefore needed to zero in on the exact mechanism by which this common chart type is read.

    Kosara. “Evidence for Area as the Primary Visual Cue in Pie Charts.” OSF, 17 Oct. 2019. Web.

    Kosara’s previous studies had strong practical implications, the most important being that pie charts are not evil, provided they are done correctly. However, I’m not sure what I can take from this one. As far as I understand the data, the answers to the questions at the beginning of this post are still unclear. Maybe the “real answer” to these questions is “a combination thereof.”

    October 21, 2019 - 2 minute read -
    data visualisation Data Visualization dataviz kosara pie-chart research blog
  • The problem with citation count as an impact metric

    The problem with citation count as an impact metric

    October 18, 2019

    Inspired by A citation is not a citation is not a citation by Lior Pachter, this rant is about metrics.

    Lior Pachter is a researcher at Caltech. Like many other researchers in academia, Dr. Pachter is measured by, among other things, publications and their impact as measured by citations. In his post, Lior Pachter criticised both the current impact metrics and their effect on citation patterns in the academic community.

    The problem he points out: citations don’t really measure “actual” citations. Most of the apparent citations are “hit and run citations,” i.e., people mention other people’s research without taking anything from that research.

    In fact this author has cited [a certain] work in exactly the same way in several other papers which appear to be copies of each other for a total of 7 citations all of which are placed in dubious “papers”. I suppose one may call this sort of thing hit and run citation.

    via A citation is not a citation is not a citation — Bits of DNA

    I think that the biggest problem with citation counts is that it costs nothing to cite a paper. When you add a paper (or a post, for that matter) to your reference list, you know that most probably nobody will check whether you actually read it, that nobody will check whether you represented that publication correctly, and that the chances are super (SUUPER) low that anybody will check whether your conclusions are right. All it takes is a click of a button.

    October 18, 2019 - 2 minute read -
    barabasi impact blog
  • Book review. The War of Art by S. Pressfield

    Book review. The War of Art by S. Pressfield

    October 10, 2019

    TL;DR: This is a long motivational book that is “too spiritual” for the cynical materialist that I am.


    The War of Art is a strange book. I read it because “everybody” recommended it. This is what Derek Sivers’ book recommendation page says about this book

    Have you experienced a vision of the person you might become, the work you could accomplish, the realized being you were meant to be? Are you a writer who doesn’t write, a painter who doesn’t paint, an entrepreneur who never starts a venture? Then you know what “Resistance” is.
    

    As a known procrastinator, I was intrigued and started reading. In the beginning, the book was pretty promising. The first (and, I think, the biggest) part of the book is about “Resistance,” the force behind procrastination. I immediately noticed that almost every sentence in this chapter could serve as a motivational poster. For example:

    • It’s not the writing part that’s hard. What’s hard is sitting down to write.
    • The danger is greatest when the finish line is in sight.
    • The most pernicious aspect of procrastination is that it can become a habit.
    • The more scared we are of a work or calling, the more sure we can be that we have to do it.

    Individually, each sentence makes sense, but their concentration was a bit too much for me. The way Pressfield talks about Resistance resembles the way Jewish preachers talk about Yetzer Hara: it sits everywhere, waiting for you to fail. I don’t like this approach.

    The next chapters were even harder for me to digest. Pressfield started talking about Muses, gods, prayers, and other “spiritual” stuff; I almost gave up. But I fought the Resistance and finished the book.

    My main takeaways:

    • Resistance is real
    • It’s a problem
    • The more critical the task is, the stronger is the Resistance. OK, I kind of agree with this. But then Pressfield continues to something I do not agree with: thus (according to the author), we can measure the importance of a task by the Resistance it creates.
    • Justifying not pursuing a task by commitments to the family, job, etc. is a form of Resistance.
    • The Pro does stuff.
    • The Artist is a Pro (see above) who does stuff even if nobody cares.
    October 10, 2019 - 2 minute read -
    book review pressfield procrastination resistance the-war-of-art blog
  • Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz - Stats - Bayes

    Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz - Stats - Bayes

    October 8, 2019

    On Sunday, I wrote about bootstrapping. On Monday, I wrote about visualizing uncertainty. Let’s now talk about bootstrapping and uncertainty visualization.

    Robert Grant is a data visualization expert who wrote a book about interactive data visualization (which I should read, BTW).

    Robert runs an interesting blog from which I learned another approach to uncertainty visualization, bootstrapping.
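    To make the idea concrete, here is a minimal sketch of my own (not code from Robert’s post, and the data is made up): resample the data with replacement, recompute the statistic for each resample, and let the spread of the recomputed values show the uncertainty.

    ```python
    # Bootstrap-based uncertainty: instead of a single error bar, draw many
    # re-computed statistics and let their spread show the uncertainty.
    import numpy as np

    rng = np.random.default_rng(42)
    sample = rng.normal(loc=100, scale=15, size=50)  # hypothetical measurements

    # Resample the data with replacement many times and recompute the mean.
    boot_means = np.array([
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(1000)
    ])

    # Plotting each bootstrapped mean as a faint dot (or line) shows the
    # uncertainty directly; a 95% percentile interval summarizes it.
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"bootstrap 95% interval: [{low:.1f}, {high:.1f}]")
    ```

    Drawing all one thousand `boot_means` values as faint marks, rather than collapsing them into one error bar, is exactly the kind of bootstrap-based uncertainty display the post describes.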

    Source: Robert Grant.

    Read the entire post: Data visualization with statistical reasoning: seeing uncertainty with the bootstrap — Dataviz - Stats - Bayes

    October 8, 2019 - 1 minute read -
    bootstrapping data visualisation Data Visualization dataviz repost uncertainty blog
  • On MOOCs

    On MOOCs

    October 7, 2019

    When Massive Online Open Courses (a.k.a. MOOCs) emerged some X years ago, I was ecstatic. I was sure that MOOCs were the Big Boom of higher education. Unfortunately, the MOOC impact turned out to be very modest. This modest impact, combined with the high production cost, was one of the reasons I quit making my online course after producing two or three lectures. Nevertheless, I don’t think MOOCs are dead yet. Following are some links I recently read that provide interesting insights into MOOC production and consumption.

    • A systematic study of academic engagement in MOOCs that is scheduled for publication in the November issue of Erudit.org. This 20+ page-long survey summarizes everything we know about MOOCs today (I have to admit, I only skimmed through this paper, I didn’t read all of it)
    • A Science Magazine article from January 2019. The article, “The MOOC pivot,” sheds light on the very low retention numbers in MOOCs.

    • On MOOCs and video lectures. Prof. Lorena Barba from George Washington University explains why her MOOCs are not built on video. If you consider creating an online class, you should read this.
    • The economic consequences of MOOCs. A concise summary of a 2018 study that suggests that MOOCs’ economic impact is high despite the high churn rates.
    • Thinkful.com, an online platform that provides personalized training to aspiring data professionals, got in the news three weeks ago after being purchased for $80 million. Thinkful isn’t a MOOC per se, but I have a special relationship with it: a couple of years ago, I was accepted as a mentor at Thinkful but couldn’t find time to actually mentor anyone.

    The bottom line

    We still don’t know what this future will look like and how MOOCs will interplay with the legacy education system, but I’m sure that MOOCs are the future.

    October 7, 2019 - 2 minute read -
    education future mooc thinkful blog Career advice
  • Error bars in bar charts. You probably shouldn't

    Error bars in bar charts. You probably shouldn't

    October 7, 2019

    This is another post in the series Because You Can. This time, I will claim that the fact that you can put error bars on a bar chart doesn’t mean you should.

    It started with a paper by Prof. Gerd Gigerenzer, whose work in promoting numeracy I adore. The paper, “Natural frequencies improve Bayesian reasoning in simple and complex inference tasks,” contained a simple graph that was meant to convince the reader that natural frequencies lead to more accurate understanding (read the paper, it explains these terms). The error bars in the graph are meant to convey uncertainty. However, the visualization that Gigerenzer and his team selected is simply wrong.

    First of all, look at the leftmost bar; it demonstrates so many problems with error bars in general, and with error bars in bar plots in particular. Can you see how the error bar crosses the X-axis, implying that Task 1 might have resulted in a negative percentage of correct inferences?

    The irony is that Prof. Gigerenzer is a worldwide expert in communicating uncertainty. I read his book “Calculated Risks” from cover to cover. Twice.

    Why is this important?

    Communicating uncertainty is super important. Take a look at this 2018 study with the self-explanatory title “Uncertainty Visualization Influences how Humans Aggregate Discrepant Information.” From the paper: “Our study repeatedly presented two [GPS] sensor measurements with varying degrees of inconsistency to participants who indicated their best guess of the “true” value. We found that uncertainty information improves users’ estimates, especially if sensors differ largely in their associated variability”.

    Source: HuffPost

    Also, recall the surprise when Donald Trump won the 2016 presidential election despite the fact that most of the polls predicted that Hillary Clinton had higher chances to win. Nobody cared about uncertainty; everyone saw the graphs!

    Why not error bars?

    Keep in mind that error bars are considered harmful, and I have a reference to support this claim. But why?

    First of all, error bars tend to be symmetric (although they don’t have to be), which might lead to the situation that we saw in the first example above: implying illegal values.
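    Here is a small sketch of this problem (my own made-up numbers, not Gigerenzer’s data): a symmetric ±1.96·SE bar around a small proportion dips below zero, while an interval designed for proportions, such as the Wilson score interval, stays within the legal [0, 1] range.

    ```python
    # Symmetric error bars around a small proportion can imply illegal
    # (negative) values; the Wilson score interval cannot.
    import math

    def wilson_interval(successes, n, z=1.96):
        """95% Wilson score interval for a proportion; stays within [0, 1]."""
        p = successes / n
        denom = 1 + z**2 / n
        center = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return center - half, center + half

    n, successes = 20, 1          # say, 5% correct inferences in a small task
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)

    naive = (p - 1.96 * se, p + 1.96 * se)   # symmetric bar: dips below zero
    wilson = wilson_interval(successes, n)   # asymmetric: stays legal

    print(f"naive:  [{naive[0]:.3f}, {naive[1]:.3f}]")
    print(f"wilson: [{wilson[0]:.3f}, {wilson[1]:.3f}]")
    ```

    The naive lower bound is negative, exactly the error-bar-below-the-axis artifact in the leftmost bar of Gigerenzer’s graph.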

    Secondly, error bars are “rigid”, implying that there is a certain hard threshold. Sometimes the threshold indeed exists, for example a threshold of H0 rejection. But most of the time, it doesn’t.


    Specifically for bar plots, error lines break the bar analogy and are hard to read. First, let me explain the “bar analogy” part.

    The thing with bar charts is that they are meant to represent physical bars. A physical bar doesn’t have soft edges and adding error lines simply breaks the visual analogy.

    Another problem is that the upper part of the error line is more visible to the eye than the lower one, the one that is drawn inside the physical bar. See?

    But that’s not all. The width of the error bars separates the error lines and makes the comparison even harder. Compare the readability of the error lines in the two examples below.

    The proximity of the error lines in the second example (taken from this site) makes the comparison easier.

    Are there better alternatives?

    Yes. First, I recommend reading the “Error bars considered harmful” paper that I already mentioned above. It not only explains why, but also surveys several alternatives.

    Nathan Yau from flowingdata.com has an extensive post about different ways to visualize uncertainty. He reviewed ranges, shades, rectangles, spaghetti charts, and more.

    Claus Wilke’s book “Fundamentals of Data Visualization” has a chapter dedicated to uncertainty, with an even more detailed review [link].

    “Visualize uncertainty about the future” is a Science article that deals specifically with forecasts.

    Robert Kosara from Tableau experimented with visualizing uncertainty in parallel coordinates.

    There are many more examples and experiments, but I think that I will stop right now.

    The bottom line

    Communicating uncertainty is important.

    Know your tools.

    Try avoiding error bars.

    Error bars and bar charts don’t combine well; therefore, try harder to avoid error bars in bar charts.

    October 7, 2019 - 3 minute read -
    because you can data visualisation Data Visualization dataviz gigerenzer uncertainty blog
  • You don't need a fast way to increase your reading speed by 25%. Or, don't suppress subvocalization

    You don't need a fast way to increase your reading speed by 25%. Or, don't suppress subvocalization

    October 6, 2019

    Not long ago, I wrote a post about a fast hack that increased my reading speed by tracking the reading with a finger. I think that the logic behind using a tracking finger is to suppress subvocalization. I noticed that, at least in my case, suppressing subvocalization reduces the fun of reading. I actually enjoy hearing the inner voice that reads the book “with me”.

    October 6, 2019 - 1 minute read -
    reading reading-speed blog
  • Bootstrapping the right way?

    Bootstrapping the right way?

    October 6, 2019

    Many years ago, I terribly overfit a model, which caused losses of a lot of shekels (a LOT). It’s not that I wasn’t aware of the potential overfitting. I was. Among other things, I used several bootstrapping simulations. It turns out that I applied bootstrapping in the wrong way. My particular problem was that I “forgot” about confounding parameters and that peeking into the future is a bad thing.
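    For illustration only (this is my reconstruction of the mistake, not the actual model or the content of Yanir’s talk): a naive bootstrap over time-ordered data happily trains on rows that come after the test rows, while a walk-forward split cannot peek into the future.

    ```python
    # Naive bootstrap on time-ordered data leaks the future into training;
    # a walk-forward split (fit on the past, test on the future) does not.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    t = np.arange(n)  # time index of each observation

    # Naive bootstrap: sample rows with replacement, ignoring time.
    train = rng.choice(n, size=n, replace=True)
    test = np.setdiff1d(t, train)  # out-of-bag rows used for evaluation
    # Some training rows come later than some test rows -> leakage.
    leaks = any(tr > te for tr in train for te in test)

    # Time-aware alternative: always fit on the past, test on the future.
    split = int(n * 0.8)
    train_tw, test_tw = t[:split], t[split:]
    leaks_tw = train_tw.max() > test_tw.min()

    print(f"naive bootstrap leaks future data: {leaks}")
    print(f"walk-forward split leaks future data: {leaks_tw}")
    ```

    The same trap applies to any resampling scheme that ignores the structure of the data, which is the kind of subtlety the talk covers.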

    Anyhow, Yanir Seroussi, a fellow data scientist and my coworker, gave a very good talk on bootstrapping.

    October 6, 2019 - 1 minute read -
    bootstrapping data science overfitting reblog blog
  • What do I look like?

    What do I look like?

    October 3, 2019

    From time to time, people (mostly conference organizers) ask me for a picture. Feel free to use any of these images:

    • Me in front of a whiteboard, pointing at a graph
    • Me in front of a screen that shows a bar chart
    • Me speaking on a stage
    October 3, 2019 - 1 minute read -
    me photo blog
  • Visualizations with perceptual free-rides

    Visualizations with perceptual free-rides

    October 2, 2019

    Dr. Richard Brath is a data visualization expert who also blogs from time to time. Each post in Richard’s blog provides a deep, and often unexpected to me, insight into one dataviz aspect or another.

    October 2, 2019 - 1 minute read -
    bar plot data visualisation Data Visualization dataviz reblog richard-brath blog