• Can the order in which graphs are shown change people's conclusions?

    Can the order in which graphs are shown change people's conclusions?

    October 17, 2017

    When I teach data visualization, I love showing my students how simple changes in the way one visualizes data may drive the audience to different conclusions. When done correctly, such changes can help presenters make their point. They can also be used to mislead the audience. I keep reminding the students that it is up to them to keep their visualizations honest and fair. In his recent post, Robert Kosara, the owner of https://eagereyes.org/, mentioned another way to change the perceived conclusion: this time, not by changing a graph but by changing the order in which graphs are shown to a person. Citing Robert Kosara:

    Priming is when what you see first influences how you perceive what comes next. In a series of studies, [André Calero Valdez, Martina Ziefle, and Michael Sedlmair] showed that these effects also exist in the particular case of scatterplots that show separable or non-separable clusters. Seeing one kind of plot first changes the likelihood of you judging a subsequent plot as the same or another type.

    via IEEE VIS 2017: Perception, Evaluation, Vision Science — eagereyes

    Like any tool, priming can be used for good or bad causes. Priming abuse can be a deliberate exposure to irrelevant information in order to manipulate the audience. A good way to use priming is to educate the listeners about its effect and to repeatedly expose them to alternate contexts. Alternatively, reminding the audience of the “before” graph before showing them the similar “after” situation will also create a plausible effect of context setting.

    P.S. The paper mentioned by Kosara is notable not only for its results (they are not as astonishing as I expected from the featured image) but also for how the authors report their research, including the failures.

    Featured image is Figure 1 from Calero Valdez et al. Priming and Anchoring Effects in Visualization

    October 17, 2017 - 2 minute read -
    Data Visualization dataviz manipulation presenting priming psychology teaching blog
  • Advice for aspiring data scientists and other FAQs — Yanir Seroussi

    Advice for aspiring data scientists and other FAQs — Yanir Seroussi

    October 15, 2017

    It seems that a career in data science is the hottest topic data scientists are asked about. To help aspiring data scientists, I’m reposting here a FAQ by my teammate Yanir Seroussi.

    Aspiring data scientists and other visitors to this site often repeat the same questions. This post is the definitive collection of my answers to such questions (which may evolve over time). How do I become a data scientist? It depends on your situation. Before we get into it, have you thought about why you want […]

    via Advice for aspiring data scientists and other FAQs — Yanir Seroussi

    October 15, 2017 - 1 minute read -
    advice career data science blog Career advice
  • How to be a better teacher?

    How to be a better teacher?

    October 12, 2017

    If you know me in person or follow my blog, you know that I have a keen interest in teaching. Indeed, besides being a full-time data scientist at Automattic, I teach data visualization anywhere I can. Since I started teaching, I have become much better at communication, which is one of the required skills of a good data scientist.
    In my constant striving to improve what I do, I joined the Data Carpentry instructor training. Recently, I got my certification as a Data Carpentry instructor.

    Certificate of achievement. Data Carpentry instructor

    Software Carpentry (and its sibling project Data Carpentry) aims to teach researchers the computing skills they need to get more done in less time and with less pain. “Carpentry” instructors are volunteers who receive pretty extensive training and who are committed to evidence-based teaching techniques. The instructor training had a powerful impact on how I approach teaching. If teaching is something that you do or plan to do, invest three hours of your life in watching this video in which Greg Wilson, the “Carpentries” founder, talks about evidence-based teaching and his “Carpentries” project.

    https://www.youtube.com/watch?v=kmVKGxPlTvc

    I also recommend reading these papers, which provide a brief overview of some evidence-based results in teaching:

    * "[The Science of Learning](https://swcarpentry.github.io/instructor-training/files/papers/science-of-learning-2015.pdf)"
    * "[Success in Introductory Programming: What Works?](https://swcarpentry.github.io/instructor-training/files/papers/porter-what-works-2013.pdf)"
    * "[What Can I Do Today to Create a More Inclusive Community in CS?](https://swcarpentry.github.io/instructor-training/files/papers/lee-create-inclusive-community-2015.pdf)"
    
    October 12, 2017 - 1 minute read -
    advice career teaching video work blog Career advice
  • What you need to know to start a career as a data scientist

    What you need to know to start a career as a data scientist

    October 11, 2017

    It’s hard to overestimate how much I adore StackOverflow. One of the recent blog posts on StackOverflow.blog is “What you need to know to start a career as a data scientist” by Julia Silge. Here are my reservations about that post:

    1. It’s not that simple (part 1)

    You might have seen my post “Don’t study data science as a career move; you’ll waste your time!”. Becoming a good data scientist is much more than making a decision and “studying it”.

    2. Universal truths mean nothing

    The first section in the original post is called “You’ll learn new things”. This is a universal truth. If you don’t “learn new things” every day, your professional career is stalling. To borrow from the world of classification models, telling a universal truth has a very high sensitivity but a very low specificity. In other words, it’s a useless waste of ink.
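
    To make the classification analogy concrete, here is a minimal Python sketch. The labels are made up for illustration, and the “always yes” predictor stands in for a universally true statement: it catches every positive case but never rules anything out.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels given as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Made-up ground truth: three positive and four negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0]

# A "universal truth" says yes to everything.
always_yes = [1] * len(y_true)

sensitivity, specificity = sensitivity_specificity(y_true, always_yes)
print(f"sensitivity={sensitivity:.1f}, specificity={specificity:.1f}")
```

    A statement that is true for everyone behaves like `always_yes`: it never misses, and for exactly that reason it tells the reader nothing.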

    3. Not for developers only

    The first section starts as follows: “When transitioning from a role as a developer to a position focused on data, …”. Most of the data scientists I know were never developers. I, for example, started as a pharmacist, computational chemist, and bioinformatician. I know several physicists, a historian and a math teacher who are now successful data scientists.

    4. SQL skills are overrated

    Another quote from the post: “Strong SQL skills are table stakes for data scientists and data engineers”. The thing is that in many cases, we use SQL mostly to retrieve data. Most of the “data scienc-y” work requires analytical tools and flexibility that are not available in most SQL environments. Good familiarity with industry-standard tools and libraries is more important than knowing SQL. Statistics is way more important than knowing SQL. Julia Silge did indeed mention the tools (numpy/R) but didn’t emphasize them enough.

    5. Communication importance is hard to overestimate

    Again, quoting the post:

    The ability to communicate effectively with people from diverse backgrounds is important.

    Yes, Yes, and one thousand times Yes. Effective communication is a non-trivial task that is often overlooked by many professionals. Some people are born natural communicators. Some, like me, are not. If there’s one book that you can afford to buy to improve your communication skills, I recommend “Trees, maps and theorems” by Jean-luc Doumont. This is a small, very expensive book that changed the way I communicate in my professional life.

    6. It’s not that simple (part 2)

    After giving some very general tips, Julia proceeds to suggest that her readers check out the data science jobs on the StackOverflow Jobs site. The impression this creates is that becoming a data scientist is a relatively simple task. It is not. At the bare minimum, I would mention several educational options that are designed for people trying to become data scientists. One such option is Thinkful (I’m a mentor at Thinkful). Udacity and Coursera both have data science programs too. The point is that to become a data scientist, you have to study a lot. You might notice a potential contradiction between point 1 above and this paragraph. A short explanation is that becoming a data scientist takes a lot of time and effort. The post “Teach Yourself Programming in Ten Years”, which was written in 2001 about programming, is just as relevant in 2017 to data science.

    Featured image is based on a photo by Jase Ess on Unsplash

    October 11, 2017 - 3 minute read -
    advice career data science life opinion blog Career advice
  • Graffiti from Chișinău, Moldova

    Graffiti from Chișinău, Moldova

    October 10, 2017

    I’ve stumbled upon a nice post by Jackie Hadel where she shared some graffiti pictures from Chișinău, the town where I was born. I left Chișinău in 1990 and came back for the first time this March. I also took several graffiti pictures, which I will share here. Chișinău is also known by its Russian name, Kishinev.

    Graffiti in Chisinau. Kishinevers, put your all efforts to rebuild your native city

    This is a partially restored post-WWII writing that says “Kishinevers, give your all efforts to rebuild [your] native town”. Kishinev was ruined almost completely during World War II. Right now, more than 25 years after the collapse of the USSR, the city still looks as if it needs to be restored.

    Graffiti in Chisinau. Pythagorean theorem.

    Being a data scientist, I liked this graffiti for the maths. It’s the Pythagorean theorem, in case you missed it.

    Swastika on a tombstone in Chisinau

    Swastika on a tombstone in the old Jewish cemetery. One of the saddest places I visited in this city.

    Graffiti in Chisinau. Building-size graffiti.

    A mega-graffiti?

    Graffiti in Chisinau. Writing that says "I love Moldova" (in Romanian)

    “I love Moldova”. I love it too.

    See the original post that prompted me to share these pictures: CHISINAU, MOLDOVA GRAFFITI: LEFT IN RUIN, YOU MAKE ME HAPPY — TOKIDOKI (NOMAD)

    15july17 Chisinau, Moldova 🇲🇩

    October 10, 2017 - 2 minute read -
    chisinau graffiti kishinev moldova travel blog
  • Identifying and overcoming bias in machine learning

    Identifying and overcoming bias in machine learning

    October 8, 2017

    Data scientists build models using data. Real-life data captures real-life injustice and stereotypes. Are data scientists observers whose job is to describe the world, no matter how unjust it is? Charles Earl, an excellent data scientist and my teammate, says that the answer to this question is a firm “NO.” Read the latest data.blog post to learn Charles’ arguments and best practices.

    https://videopress.com/embed/jckHrKeF?hd=0&autoPlay=0&permalink=0&loop=0

    Charles Earl on identifying and overcoming bias in machine learning.

    via Data Speaker Series: Charles Earl on Discriminatory Artificial Intelligence — Data for Breakfast

    October 8, 2017 - 1 minute read -
    data science diversity inclusion blog
  • Before and after — the Hebrew holiday season chart

    Before and after — the Hebrew holiday season chart

    October 8, 2017

    Sometimes, when I see a graph, I think “I could draw a better version.” From time to time, I even consider writing a blog post with the “before” and “after” versions of the plot. The last time I had this desire was when I read the repost of my own post about the crazy month of Hebrew holidays. I created this graph three years ago. Since then, I have learned A LOT. So I thought it would be a good opportunity to apply my over-criticism to my own work. This is the “before” version:

    Graph: Tishrei is mostly a non-working month.

    There are quite a few points worth fixing in that plot. Let’s review those problems:

    * The point of the original post is to emphasize the number of NON-working days in Tishrei. However, the largest points represent the working days. As a result, the emphasis goes to the working days, thus reversing the semantics.
    * It is not absolutely clear what point I intended to make using this graph. A short and meaningful title is an effective way to lead the audience towards the desired conclusion.
    * There are three distinct colors in my graph, representing working, half-working and non-working days. The category order is clear. The color order, on the other hand, is absolutely arbitrary. Moreover, green and red are never a good color combination due to the high prevalence of impaired color vision.
    * The Y label is rotated. Rotated Y labels are the default option in all the plotting tools that I know. Why that is so is beyond my understanding, given the numerous studies that show that reading rotated text takes more time and is more error-prone (for example, see [ref](http://journals.sagepub.com/doi/abs/10.1177/154193120204601722), [ref](http://jov.arvojournals.org/article.aspx?articleid=2121153), and [ref](http://psycnet.apa.org/record/1986-10970-001)).
    * One interesting piece of information that one might expect to read from a graph is how many working days there are in year X. One can obtain this information either by counting the dots or by looking at a separate graph. It would be a good idea to make this information readily available to the observer.
    * The frame around the plot is useless.
    

    OK, now that we have identified the problems, let’s fix them:

    * **Emphasize the right things.** I will use bigger points for the non-working days and small ones for the working days. I will also use squares instead of circles. Placing several squares next to each other creates solid areas with less white space in between. This lack of whitespace will help further emphasize the non-working chunks. I will make sure to leave *some* whitespace between the points, to enable counting.
    * **What's your point?** I will add an explanatory title. Having given some thought, I came up with "How productive can you be?". It is short, thought-provoking, and makes the point.
    * **Reduce the number of colors.** My intention was to use red for non-working days, and blue for the working ones. What color should I use for the half-working ([Chol haMoed](https://en.wikipedia.org/wiki/Chol_HaMoed)) days? I don't want to introduce another color to the improved graph. Since in my case, those days are mostly non-working, I will use a shade of red for Chol haMoed.
    * **Improve label readability.** One way to solve the rotated Y label problem is to remove the Y label altogether! After all, most people will correctly assume that "2006", "2010", "2020" and other values represent the years. However, the original post mentions two different methods to count the years, using the Hebrew and Christian traditions. To make it absolutely clear that the graph talks about the Christian (common) calendar, I decided to keep the legend and format it properly.
    * **Add more info.** I added the total number of working days as a separate column of properly aligned gray text labels. The gray color ensures that the labels don't compete with the graph. I also highlighted the current year using a subtle background rectangle.
    * **Data-ink ratio.** I removed the box around the graph and got rid of the lines for the X and Y axes. I also removed the vertical grid lines. I wasn't sure about the horizontal ones but I decided to keep them in place.
    

    This is the result:

    tishrei_working_days_after.png

    I like it very much. I’m sure though, that if I revisit it in a year or two, I will find more ways to make it even better.

    You may find the code that generates this figure here.
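
    For readers who want to try the fixes listed above without opening that code, here is a minimal sketch. It is not the code behind the figure: it assumes matplotlib, uses a made-up 31-day pattern instead of the real calendar, and the names `make_fake_year`, `STYLE` and `draw_chart` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def make_fake_year(offset):
    """Made-up 31-day pattern (NOT the real calendar): weekends plus a holiday block."""
    days = []
    for d in range(31):
        if (d + offset) % 7 in (5, 6) or d in (0, 1, 2, 10, 15):
            days.append("off")
        elif 16 <= d <= 21:
            days.append("half")
        else:
            days.append("work")
    return days


# Big red squares for non-working days, a lighter red for half-working days,
# small blue squares for working days.
STYLE = {
    "work": dict(color="#4878a8", s=15),
    "half": dict(color="#e8a0a0", s=70),
    "off": dict(color="#c03030", s=70),
}


def draw_chart(data, highlight_year=None):
    years = sorted(data)
    fig, ax = plt.subplots(figsize=(8, 0.5 * len(years) + 1.5))
    for year in years:
        for kind, style in STYLE.items():
            xs = [d + 1 for d, k in enumerate(data[year]) if k == kind]
            ax.scatter(xs, [year] * len(xs), marker="s", **style)
        # Total number of working days as a gray, right-aligned text column.
        ax.text(33.5, year, str(data[year].count("work")),
                color="gray", ha="right", va="center")
    if highlight_year is not None:
        # Subtle background rectangle behind the highlighted year.
        ax.axhspan(highlight_year - 0.5, highlight_year + 0.5, color="0.92", zorder=0)
    ax.set_title("How productive can you be?", loc="left")
    ax.set_xlim(0, 34)
    ax.set_yticks(years)
    ax.set_ylabel("Year", rotation=0, ha="right", va="center")  # horizontal Y label
    for spine in ax.spines.values():  # drop the frame and the axis lines
        spine.set_visible(False)
    ax.grid(axis="y", color="0.85")  # horizontal grid lines only
    ax.set_axisbelow(True)
    ax.tick_params(length=0)
    return fig, ax


data = {year: make_fake_year(offset) for offset, year in enumerate((2016, 2017, 2018))}
fig, ax = draw_chart(data, highlight_year=2017)
```

    Calling `fig.savefig(...)` with a file name of your choice writes the chart to disk. The marker sizes and colors are arbitrary starting points, not the values used in the figure above.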

    October 8, 2017 - 4 minute read -
    before-after Data Visualization dataviz blog
  • Pseudo-rehearsal: A simple solution to catastrophic forgetting for NLP

    Pseudo-rehearsal: A simple solution to catastrophic forgetting for NLP

    October 2, 2017

    Frequently, training a machine learning model in a single session is impossible. Most commonly, this happens when one needs to update a model with newly obtained observations. The generic term for such an update is “online learning.” In the scikit-learn world, this concept is also known as partial fit. The problem is that some models or their implementations don’t allow for partial fit. Even if partial fitting is technically possible, the weight assigned to the new observations may not be under your control. What happens when you re-train a model from scratch, or when the new observations are assigned weights that are too high? Recently, I stumbled upon an interesting concept, pseudo-rehearsal, that addresses this problem. Citing Matthew Honnibal:

    Sometimes you want to fine-tune a pre-trained model to add a new label or correct some specific errors. This can introduce the “catastrophic forgetting” problem. Pseudo-rehearsal is a good solution: use the original model to label examples, and mix them through your fine-tuning updates.

    This post is written by Matthew Honnibal from the team behind the excellent spaCy NLP library. It is valuable in many respects. First, it demonstrates a simple-to-implement technique. More importantly, it provides the True Name for a problem I encounter from time to time: catastrophic forgetting.
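
    The recipe in the quote can be sketched in a few lines of plain Python. This is only an illustration of the data-mixing step, not spaCy’s API: `build_pseudo_rehearsal_batch`, `toy_original_model`, the `rehearsal_ratio` parameter and the example texts are all hypothetical, and the actual fine-tuning call is left out.

```python
import random


def build_pseudo_rehearsal_batch(original_model, unlabeled_texts, new_examples,
                                 rehearsal_ratio=3, seed=0):
    """Mix new labeled examples with examples labeled by the original model.

    original_model: any callable mapping a text to a label (assumed interface).
    new_examples: list of (text, label) pairs carrying the new information.
    rehearsal_ratio: how many rehearsal examples to add per new example.
    """
    rng = random.Random(seed)
    n_rehearsal = min(len(unlabeled_texts), rehearsal_ratio * len(new_examples))
    sampled = rng.sample(unlabeled_texts, n_rehearsal)
    # The original model's own predictions act as "reminders" of what it already knows.
    rehearsal = [(text, original_model(text)) for text in sampled]
    mixed = list(new_examples) + rehearsal
    rng.shuffle(mixed)
    return mixed


def toy_original_model(text):
    """Stand-in for a pre-trained model; a real one would be far more complex."""
    return "POSITIVE" if "good" in text else "NEGATIVE"


unlabeled = ["good movie", "bad plot", "good acting",
             "awful sound", "good music", "dull ending"]
new_examples = [("meh", "NEUTRAL"), ("so-so", "NEUTRAL")]  # the new label to teach

mixed = build_pseudo_rehearsal_batch(toy_original_model, unlabeled, new_examples,
                                     rehearsal_ratio=2)
for text, label in mixed:
    print(f"{label:9s} {text}")
```

    The mixed batch is what you would feed to the fine-tuning updates. The ratio of rehearsal to new examples is the knob that controls how strongly the model is reminded of its old behavior.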

    Featured image is by Flickr user Herr Olsen under CC-by-nc-2.0


    October 2, 2017 - 1 minute read -
    machine learning blog
  • 16-days work month — The joys of the Hebrew calendar

    16-days work month — The joys of the Hebrew calendar

    September 27, 2017

    Tishrei is the seventh month of the Hebrew calendar that starts with Rosh-HaShana, the Hebrew New Year (*). It is a 30-day month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), and the first and last days of Sukkot (Feast of Tabernacles) (**). All these days are rest days in Israel. Every holiday eve is also a *de facto* rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the days between the first and the last days of Sukkot are mostly considered half working days. Also, the children are at home since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.

    I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) between 1993 and 2020 CE, and this is what we get:

    tishrei_working_days
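
    The counting itself is simple once the dates are known. Below is a minimal sketch for a single year. The 2017 dates are typed in by hand and should be treated as assumptions for illustration (the standard library has no Hebrew calendar), and the names `classify` and `count_day_types` are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed 2017 dates, entered by hand for illustration.
PERIOD_START = date(2017, 9, 20)  # the day before Rosh-HaShana
PERIOD_DAYS = 31                  # one eve + 30 days of Tishrei

HOLIDAYS_AND_EVES = {
    date(2017, 9, 20), date(2017, 9, 21), date(2017, 9, 22),  # Rosh-HaShana eve + 2 days
    date(2017, 9, 29), date(2017, 9, 30),                      # Yom Kippur eve + day
    date(2017, 10, 4), date(2017, 10, 5),                      # Sukkot eve + first day
    date(2017, 10, 11), date(2017, 10, 12),                    # eve + last day of Sukkot
}
# Days between the first and the last Sukkot days (Oct 11 is counted as an eve above).
CHOL_HAMOED = {date(2017, 10, d) for d in range(6, 11)}


def classify(day):
    """Label a day as non-working, half-working or working."""
    if day.weekday() in (4, 5) or day in HOLIDAYS_AND_EVES:  # Friday=4, Saturday=5
        return "non-working"
    if day in CHOL_HAMOED:
        return "half-working"
    return "working"


def count_day_types(start, n_days):
    """Count the day types in the n_days-long period beginning at start."""
    return Counter(classify(start + timedelta(days=i)) for i in range(n_days))


counts = count_day_types(PERIOD_START, PERIOD_DAYS)
print(dict(counts))
```

    With these assumed inputs, the full and half working days together add up to 16, which is where the title of this post comes from.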

    Overall, this period includes between 15 and 17 non-working days in a single month (31 days, mind you). This is what the working/non-working time during this month looks like:

    tishrei_workign_weeks.png

    Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to a constantly interrupted work day, but at a different scale.

    So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

    (*) New Year starts in the seventh month? I know this is confusing. That’s because we number Nissan, the month of the Exodus from Egypt, as the first month.
    (**) If you are an observant Jew, you should add the Fast of Gedalia to this list, but we will omit it from this discussion.

    September 27, 2017 - 2 minute read -
    Israel blog
  • On data beauty and communication style

    On data beauty and communication style

    August 18, 2017

    There’s an interesting mini-drama going on in the data visualization world. The moderators of DataIsBeautiful invited Stephen Few for an ask-me-anything (AMA) session. Stephen Few is a data visualization researcher and an opinionated blogger. I use his book “Show Me the Numbers” when I teach data visualization. Both in his book and even more so, on his blog, Dr. Few is not afraid of criticizing practices that fail to meet his standards of quality. That is why I wasn’t surprised when I read Stephen Few’s public response to the AMA invitation:

    I stridently object to the work of lazy, unskilled creators of meaningless, difficult to read, or misleading data displays. … Many data visualizations that are labeled “beautiful” are anything but. Instead, they pander to the base interests of those who seek superficial, effortless pleasure rather than understanding, which always involves effort.

    This response triggered some backlash. Randal Olson (a prominent data scientist and a blogger), for example, called his response “petty”:

    https://twitter.com/randal_olson/status/898244310600228865

    I have to respectfully disagree with Randy. Don’t get me wrong. Stephen Few’s response style is indeed harsh. However, I have to agree with him. Many (although not all) data visualization cases that I saw on DataIsBeautiful look like data visualization for the sake of data visualization. They are, basically, collections of lines and colors that demonstrate cool features of plotting libraries but do not provide any insight or tell any (data-based) story. From time to time, we see pieces of “data art,” in which the data plays a secondary role and which have nothing to do with “data visualization,” where the data is the “king.” I don’t consider myself an artistic person, but I don’t appreciate the “art” part of most of the data art pieces I see.

    So, I do understand Stephen Few’s criticism. What I don’t understand is why he decided to pass up the opportunity to preach to the best target audience he can hope for. It seems to me that if you don’t like someone’s actions and they ask you for advice, you should be eager to give it to them, and certainly not attack them. Hillel, an ancient Jewish scholar, said

    He who is bashful can’t learn, and he who is harsh can’t teach

    Although I don’t have a fraction of the teaching experience that Dr. Few has, I’m sure he would’ve achieved better results had he chosen to accept that invitation.

    Disclaimer: Stephen Few was very generous to allow me to use the illustrations from his book in my teaching.

    August 18, 2017 - 2 minute read -
    argument Data Visualization dataviz teaching blog
  • Accepting payments on a WordPress.com site? Easy!

    Accepting payments on a WordPress.com site? Easy!

    August 18, 2017

    This is an exciting feature available to any WordPress.com Premium and Business users, and on Jetpack sites running version 5.2 or higher. The button looks like this:

    [simple-payment id=”757”]

    This page has all the information you need to know about the PayPal button.

    August 18, 2017 - 1 minute read -
    blogging feature wordpress-com blog
  • On procrastination

    On procrastination

    August 17, 2017

    I don’t know anyone, except my wife, who doesn’t consider themselves a procrastinator. I procrastinate a lot. Sometimes, when procrastinating, I read about procrastination. Here’s a list of several recent blog posts about this topic. Read these posts if you have something more important to do*.

    procrastination_quote

    An Ode to the Deadlines competes with An Ode to Procrastination.

    I’ll Think of a Title Tomorrow talks about procrastination from a designer’s point of view. Although it is full of known truths, such as “stop thinking, start doing”, “fear is the mind killer”, and others, it is nevertheless a refreshing read.

    The entire blog called Unblock Results is written by Nancy Linnerooth, who seems to position herself as a productivity coach. I liked her last post, The Done List, which talks about a nice psychological trick of keeping Done lists instead of Todo lists. This trick plays well with the productivity system that I use in my everyday life. One day, I might describe my system in this blog.

    We all know that reading can sometimes be hard. Thus, let me suggest a TED talk titled Inside the mind of a master procrastinator. You’ll be able to enjoy it with minimal mental effort.


    *The pun is intended
    Featured image is by Flickr user Vic under CC-by-2.0 (cropped)
    The graffiti image is by Flickr user katphotos under CC-by-nc-nd

    August 17, 2017 - 2 minute read -
    procrastination productivity blog Productivity & Procrastination
  • Fashion, data, science

    Fashion, data, science

    August 16, 2017

    Zalando is an e-commerce company that sells shoes, clothing and other fashion items. Zalando isn’t a small company. According to Wikipedia, its 2015 revenue was almost 3 billion Euro. As you might imagine, you don’t run this kind of business without proper data analysis. Recently, Thorsten Dietzsch, a product manager for personalization at Zalando, joined our team meeting to tell us how data science works at Zalando. It was an interesting conversation, which is now publicly available online.

    [wpvideo 9BSbPlBe]

    In the first of our Data Speaker Series posts, Thorsten Dietzsch shares how data products are managed at Zalando, a fashion ecommerce company.

    via Data Speaker Series: Thorsten Dietzsch on Building Data Products at Zalando — Data for Breakfast

    Featured image: By Flickr user sweetjessie from here. Under the CC BY-NC 2.0 license

    August 16, 2017 - 1 minute read -
    data science fashion industry blog
  • Anomaly detection in time series — now the video

    Anomaly detection in time series — now the video

    August 14, 2017

    Two months ago, at the PyCon-IL conference, I gave a lecture called “Time Series Analysis: When “Good Enough” is Good Enough”. You may find the written version of this talk here. Today, the conference organizers published all the conference talks on YouTube. Here’s mine:

    https://youtu.be/UwkNmXhWmfI?t=15s

    August 14, 2017 - 1 minute read -
    a2f2 anomaly-detection conference presenting talking video blog
  • Heave-ho. How not to abandon your blog

    Heave-ho. How not to abandon your blog

    August 12, 2017

    Sadly, most beginning bloggers abandon their blogs soon after starting them. What distinguishes successful (persistent?) bloggers from those who fail to hold on? Is it worth running group blogs, and if so, how important is the division of labor among the authors?
    In this lecture, we will try to shed light on these questions by analyzing the behavior of more than five million WordPress.com users.

    The presentation slides are here.

    This link leads to the English post that I wrote when I first published the results of this study.

    August 12, 2017 - 1 minute read -
    blogging research blog
  • This Week in Data Reading

    This Week in Data Reading

    July 26, 2017
    July 26, 2017 - 1 minute read -
    blog
  • Avoiding being a 'trophy' data scientist

    Avoiding being a 'trophy' data scientist

    July 24, 2017

    In this excellent post, Peadar Coyle lists several anti-patterns in running a data science team. The post is well worth reading (and the blog is worth following).

    July 24, 2017 - 1 minute read -
    blog
  • A successful failure

    A successful failure

    July 23, 2017

    Almost half a year ago, I decided to create an online data visualization course. After investing hundreds of hours, I managed to release the first lecture and record another one. However, I decided not to publish new lectures and to remove the existing one from the net. Why? The short answer is a huge cost-to-benefit ratio. For a longer answer, you will have to keep reading this post.

    Why create a course?

    It’s not that there are no good courses. There are. However, most of them are tightly coupled with one tool or another. Moreover, many of the courses I have reviewed online act as advanced tutorials for a certain plotting tool. The course that I wanted to create was supposed to be tool-neutral, full of theoretical knowledge and bits of practical advice. Another decision that I made was not to write a set of text files (online book, set of Jupyter notebooks, whatever) but to create a course in which the majority of the knowledge is brought to the audience by means of frontal video lectures. I assumed that this kind of format would be the easiest for the audience to consume.

    What went wrong?

    So, what went wrong? First of all, you should remember that I work full time at Automattic, which means that every side project is a … side project that I have to do during my free time. I realized that from the very beginning. However, since I already teach data visualization in different institutions in Israel, I already have a well-formed syllabus with accompanying slide decks full of examples. I assumed that it would take me no more than one hour to prepare every online lecture.

    Green screen and a camera in a typical green room setup

    Green room. All my friends were very impressed to see it

    So, instead of verifying this assumption, I started solving the technical problems, such as buying a nice microphone (which turned out to be crap) and tripods, building a green room in my home office, etc. Once I was satisfied with my technical setup, I decided to record a promo video. Here, I faced a big problem. You see, talking to people and talking to the camera are completely different things. I feel pretty comfortable talking to people, but when I face the camera, I almost freeze. Also, in person-to-person communication, we are somewhat tolerant of small stuttering and longish pauses. However, when watching recorded video clips, we expect television-quality narration. It turns out that achieving this kind of narration is very hard. Add the fact that English is my third language, and you get a huge time drain. To be able to record a two-minute promo video, I had to write the entire script, rehearse it a dozen times, and record it in front of a teleprompter. The filming session alone took around half an hour, as I had to repeat almost every line, time after time.

    Screenshot of my YouTube video with 18 views

    18 views.

    Preparing slide decks for the lectures wasn’t an easy task either. Despite the fact that I had pretty good slide decks, I realized that they are good for an in-class lecture, where I can point to the screen, go back and forth within a presentation, open external URLs, etc. Once I had my slide decks ready, I faced the narration problem once again. So, I had to write the entire lesson’s script, edit it, rehearse for several days, and shoot. At this point, I became frustrated. I might have been more motivated had my first video received some real traffic. However, with 18 (that’s eighteen) views, most of which lasted no more than a minute or two, I hardly felt like a YouTube superstar. I know that it’s impossible to get real traction in such a short period without massive promotion. However, after I completed shooting the second lecture, I realized that I would not be able to do it much longer. Not without quitting my day job. So, I decided to quit.

    What now?

    Since I already have pretty good texts for the first two lectures, I might be able to convert them to posts in this blog. I also have material for some before-and-after videos that I planned to have as a part of this course. I will convert them to posts, too, similar to this post on the data.blog.

    Was it worth it?

    It certainly was! During the preparations, I learned a lot. I learned new things about data visualization. I took a glimpse into the world of video production. I had a chance to restructure several of my presentations.


    Featured image for this post by Nicolas Nova under the CC-by license.

    July 23, 2017 - 4 minute read -
    advice Data Visualization dataviz failure online-education blog
  • The first lesson of the data visualization course is available

    The first lesson of the data visualization course is available

    July 7, 2017

    The first lesson of the course Data Visualization Beyond the Tutorial is online! Go to the lesson page to watch the lesson video. There’s also an assignment!

    Do you know a friend, a colleague, or a classmate who needs to communicate numbers as part of their work? Let them know about this course. They will thank you :-)

    https://youtu.be/N54OeCNTaLU

    July 7, 2017 - 1 minute read -
    course Data Visualization data-visualization-beyond-the-tutorial tutorial blog
  • Correction about the course start date

    Correction about the course start date

    June 27, 2017

    The first lecture of the data visualization course will be published on July 7 (7/7/17). There was a typo in the original announcement.

    June 27, 2017 - 1 minute read -
    course Data Visualization data-visualization-beyond-the-tutorial dataviz teaching blog
  • I have created an online data visualization course

    I have created an online data visualization course

    June 26, 2017

    Free online course. Data Visualization Beyond the Tutorial. https://gorelik.net/course

    If you create charts using your tool’s default settings and your intuition, chances are you’re doing it wrong.

    Let me present an online course that dives into the theory of data visualization and its practical aspects. Every lecture is accompanied by before & after case studies and learner assignments. The course is tool-neutral. It doesn’t matter if you use Python, R, Excel, or pen and paper.

    The first lecture will be published on July 7th. Future lectures will follow every two weeks. Meanwhile, you may visit the course page and watch the intro video. Follow this blog so that you don’t miss new lectures!

    Please spread the word! Reblog this post, share it on Twitter (@gorelik_boris), Facebook, LinkedIn or any other network. Tell your colleagues and friends about this course. The more learners take this course, the happier I will be.

    June 26, 2017 - 1 minute read -
    course Data Visualization data-visualization-beyond-the-tutorial dataviz teaching blog
  • Data is NOT the new gold

    Data is NOT the new gold

    June 18, 2017

    A couple of days ago, I read the excellent post by Bob Rudis about data ethics and the importance of keeping users’ data safe. In this post, Bob recited the mantra I have heard for the past several years that “data is the new gold.” Comparing something to gold implies that it is scarce, unchangeable and has zero utility value. Data is none of these: it’s ubiquitous, ever-changing and has some utility value of its own.

    I think that oil (petroleum) is a better analogy for data. Much like oil, data has some utility value by itself but is most valuable when properly distilled, processed and transformed.

    Regardless of the analogy, I highly recommend reading Bob Rudis’ post.

    I caught a mention of this project by Pete Warden on Four Short Links today. If his name sounds familiar, he’s the creator of the DSTK, an O’Reilly author, and now works at Google. A decidedly clever and decent chap. The project goal is noble: crowdsource and make a repository of open speech data for…

    via Keeping Users Safe While Collecting Data — rud.is

    June 18, 2017 - 1 minute read -
    ethics read-recommendation blog

  • "Deliver first, improve later"

    June 13, 2017

    This is the approach behind the “minimum viable product”. It is also valid for data science solutions.

    June 13, 2017 - 1 minute read -
    blog
  • Time Series Analysis: When “Good Enough” is Good Enough

    Time Series Analysis: When “Good Enough” is Good Enough

    June 12, 2017

    My talk today at PyCon Israel, in post format.

    June 12, 2017 - 1 minute read -
    anomaly-detection conference machine learning talking blog
  • The strange loop in deep learning — a recommended reading

    The strange loop in deep learning — a recommended reading

    June 8, 2017

    https://medium.com/intuitionmachine/the-strange-loop-in-deep-learning-38aa7caf6d7d

    June 8, 2017 - 1 minute read -
    blog