Further Research is Needed

Do you believe in telepathy? Yesterday, I submitted the final proofs of a paper in which I actively participated. During the proofreading, I noticed that our abstract ends with “further research is needed” and scratched my head. I submitted the proofs, and then I saw this pearl in my blog feed:

Further Research is Needed — xkcd.com

Yes, your friends are more successful than you are. On “The Majority Illusion in Social Networks”

Recently, I re-read “The Majority Illusion in Social Networks” (by Lerman, Yan and Wu).

The starting point of this paper is the friendship paradox: on average, a node in a network has fewer friends than its friends have. The authors expand this paradox to what they call “the majority illusion”, a situation in which a node may observe that the majority of its friends have a particular property, despite the fact that this property is rare in the entire network.

An illustration of the “majority illusion” paradox. The two networks are identical, except for which three nodes are colored. These are the “active” nodes and the rest are “inactive.” In the network on the left, all “inactive” nodes observe that at least half of their neighbors are “active,” while in the network on the right, no “inactive” node makes this observation.
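To make the effect concrete, here is a minimal sketch in Python. The eight-node network, the edge list, and the function names are my own toy assumptions, not the network from the paper's figure. The graph stays fixed, and only the choice of the two “active” nodes changes.

```python
# Toy illustration of the majority illusion.
# The network below is hypothetical; it is NOT the one from the paper.

def build_graph(edges):
    """Build an undirected graph as a dict: node -> set of neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def fraction_seeing_majority(graph, active):
    """Share of inactive nodes for which at least half of the neighbors are active."""
    inactive = [n for n in graph if n not in active]
    fooled = [n for n in inactive
              if 2 * len(graph[n] & active) >= len(graph[n])]
    return len(fooled) / len(inactive)

# Nodes 0 and 1 are the well-connected hubs; 4 and 7 have a single neighbor.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4),
         (1, 5), (1, 6), (1, 7), (2, 3), (5, 6)]
GRAPH = build_graph(EDGES)

hubs_active = fraction_seeing_majority(GRAPH, {0, 1})    # activate the hubs
leaves_active = fraction_seeing_majority(GRAPH, {4, 7})  # activate two leaves

print(hubs_active, leaves_active)
```

In both runs only two of the eight nodes are active. When the hubs are active, every inactive node sees at least half of its neighbors as active. When two leaves are active, no inactive node does.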

Besides pointing out the existence of the majority illusion phenomenon, the authors used synthetic networks to characterize the situations in which this phenomenon is most prevalent.

 

Quoting the authors:

the paradox is stronger in networks in which the better-connected nodes are active, and also in networks with a heterogeneous degree distribution. […] The paradox is strongest in networks where low degree nodes have the tendency to connect to high degree nodes. […] Activating the high degree nodes in such networks biases the local observations of many nodes, which in turn impacts collective phenomena

The conditions listed in the quote above describe many known social networks. The last sentence in that quote is of special interest. It explains the contagious nature of many actions, from sharing a meme to buying a new car.

 

What is the best thing that can happen to your career?

Today, I read a tweet by Sinan Aral (@sinanaral) of MIT:

 

I’ve just realized that Ikigai is what happened to my career as a data scientist. There was no point in my professional life where I felt boredom or lack of motivation. Some people think that I’m good at what I’m doing. If they are right (which I hope they are), it is due to my love for what I have been doing since 2001. I am so thankful for being able to do things that I love, that I care about, and that I am good at. Not only that, I’m being paid for it! The chart shared by Sinan Aral in his tweet should guide anyone in their career choices.

 

Featured image is taken from this article. Original image credit: Toronto Star Graphic 

What you need to know to start a career as a data scientist

It’s hard to overestimate how much I adore StackOverflow. One of the recent blog posts on StackOverflow.blog is “What you need to know to start a career as a data scientist” by Julia Silge. Here are my reservations about that post:

1. It’s not that simple (part 1)

You might have seen my post “Don’t study data science as a career move; you’ll waste your time!”. Becoming a good data scientist is much more than making a decision and “studying it”.

2. Universal truths mean nothing

The first section in the original post is called “You’ll learn new things”. This is a universal truth. If you don’t “learn new things” every day, your professional career is stalling. To borrow from the world of classification models, telling a universal truth has a very high sensitivity but a very low specificity: it is true in every case, so it cannot tell one case from another. In other words, it’s a useless waste of ink.
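For readers less familiar with these two terms, here is a minimal sketch of the analogy in Python. The labels are made up, and the function name is mine. A “classifier” that answers yes to everything, much like a universal truth, catches every positive case and rejects no negative one.

```python
# Sensitivity and specificity of an "always yes" classifier.
# The labels are invented for illustration only.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

y_true = [True, False, False, True, False]  # hypothetical ground truth
always_yes = [True] * len(y_true)           # the "universal truth" classifier

sens, spec = sensitivity_specificity(y_true, always_yes)
print(sens, spec)
```

The sensitivity is perfect and the specificity is zero, which is exactly why such a statement carries no information.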

3. Not for developers only

The first section starts as follows: “When transitioning from a role as a developer to a position focused on data, …”. Most of the data scientists I know were never developers. I, for example, started as a pharmacist, computational chemist, and bioinformatician. I know several physicists, a historian and a math teacher who are now successful data scientists.

4. SQL skills are overrated

Another quote from the post: “Strong SQL skills are table stakes for data scientists and data engineers”. The thing is that in many cases, we use SQL mostly to retrieve data. Most of the “data scienc-y” work requires analytical tools and a flexibility that most SQL environments lack. Good familiarity with industry-standard tools and libraries is more important than knowing SQL. Statistics is way more important than knowing SQL. Julia Silge did indeed mention the tools (numpy/R) but didn’t emphasize them enough.

5. Communication importance is hard to overestimate

Again, quoting the post:

The ability to communicate effectively with people from diverse backgrounds is important.

Yes, Yes, and one thousand times Yes. Effective communication is a non-trivial task that many professionals overlook. Some people are born natural communicators. Some, like me, are not. If there’s one book that you can afford to buy to improve your communication skills, I recommend “Trees, maps and theorems” by Jean-luc Doumont. This is a small, very expensive book that changed the way I communicate in my professional life.

6. It’s not that simple (part 2)

After giving some very general tips, Julia proceeds to suggest that her readers check out the data science jobs on the StackOverflow Jobs site. The impression this creates is that becoming a data scientist is a relatively simple task. It is not. At the bare minimum, I would mention several educational options that are designed for people trying to become data scientists. One such option is Thinkful (I’m a mentor at Thinkful). Udacity and Coursera both have data science programs too. The point is that to become a data scientist, you have to study a lot. You might notice a potential contradiction between point 1 above and this paragraph. The short explanation is that studying alone is not enough: becoming a data scientist takes a lot of time and effort. The post “Teach Yourself Programming in Ten Years”, which was written in 2001 about programming, is just as relevant in 2017 to data science.

Featured image is based on a photo by Jase Ess on Unsplash