Prompt engineers, the sexiest job of the third decade of the 21st century (?), or Don’t study prompt engineering as a career move, you’ll waste your time

Do you recall when data scientists were the talk of the town? Dubbed the sexiest job of the 21st century, they boasted a unique blend of knowledge and skills. I still remember the excitement I felt when I realized that the work I did had a name, and the warm feeling I got when I saw those cool Venn diagrams showing just how awesome data scientists were. Well, it’s time for data scientists to step aside and make way for the new heroes in town: the Prompt Engineers!

The demand for prompt engineers is soaring, and it seems like everyone is trying to become one. But what exactly is a prompt engineer, and what are my thoughts on this new profession?

Let’s take a step back in time: we started with assembly languages, and then came a language called Formula Translator (better known as Fortran), which significantly lowered the barrier to entry into the field. I’m sure back then, people rolled their eyes and said that with the emergence of high-level programming languages, anyone could now take any formula and get an output, without understanding how semiconductors worked.

Fast forward to today. What do prompt engineers do? They essentially translate their domain knowledge, language understanding, and AI algorithm expertise into computer output (sounds like “ForTran,” right?). Prompt engineering is, in essence, a super-high-level programming language. Over time, I believe we’ll see dedicated tools and established standards emerge. But for now, it’s a wild, untamed frontier.

In 2017, I wrote a blog post titled “Don’t study data science as a career move; you’ll waste your time!”. To this day, it is the most-read post on my blog. Now, it’s time for a new warning: “Don’t study prompt engineering as a career move; you’ll waste your time!”

Meanwhile, here’s a nice Venn diagram for you 🙂

Not a feature but a bug. Why having only superstars in your team can be a disaster.

Read this to learn about well-rounded teams that can effectively collaborate and communicate. As an experienced team leader and builder, contact me to learn more about my services and how I can help you achieve better outcomes.

As a freelancer and a manager, I have worked with many companies and teams. Recently, I talked to a CEO who built a data science team that consisted of several “wonder kids” who obtained university degrees before graduating high school. The CEO was very proud of them. However, he complained that they didn’t deliver as expected. This made me realize that having only superstars is not a feature but a bug.

The fact is that most of us are average; even geniuses are average in most respects. Richard Feynman, the Nobel laureate physicist, was also a painter, a musician, and an excellent teacher, but he was an exception. I, for example, tend to think of myself as an excellent generalizer, leader, and communicator. However, I need help with attention to detail and deep domain-specific knowledge. To work well, I need to have pedantic specialists on my team. Why? Because, on average, I’m average.

Most “geniuses” are extremely talented in one field but still need help in others. Many tend to be individual workers, meaning their team communication is often suboptimal. Additionally, when the entire team is very young, it also lacks expertise in project management, inter-team communication, and business orientation, or simply lacks real-life experience. The result: a disaster. That company got a team of solo players who don’t communicate within the team, don’t communicate with other teams, and don’t deliver on time.

What do I suggest? They say that “A’s hire A’s”. However, this doesn’t mean that each “A person” must ace the same field. A good team needs an A generalizer, an A specialist, an A communicator, and an A business expert. If you only hire “A++ specialists,” you risk ending up with a group of individuals who are “C-” communicators.

As another CEO I consulted once told me, “genius developers can do 10x job. They also tend to enter rabbit holes, and if unattended, they can do 10x damage.” If you build a team, you cannot afford to have unbalanced expertise sets. 

The bottom line is to ensure your team is diverse in its capabilities. Hiring only superstars may seem like a good idea, but it can result in a lack of collaboration, communication, and the necessary skills to succeed as a team. A diverse team with various skills and expertise is essential for achieving better outcomes.

In conclusion, avoid falling into the trap of thinking that only superstars can make a great team. Instead, focus on creating a diverse team with various skills, and you’ll be surprised at how much your team can achieve.

New position, new challenge

I will skip the usual “I’m thrilled and excited…”. I’ll just say it.
As of today, I am the CTO of wizer.me, a platform for teachers and educators to create and share interactive worksheets.

On a scale of 1 to 10, how thrilled am I? 10
On a scale of 1 to 10, how terrified am I? 10
On a scale of 1 to 10, how confident am I that wizer.me will become the “next big thing” and the most significant chapter in my career? You won’t believe me, but also 10.

Experiment report

In January 2020, I started a new experiment. I quit what was a dream job and became a freelancer. Today, the experiment is over. This post serves as omphaloskepsis – a short reflection on what went well and what could have worked better.

What worked well?

To sum up, I declare this experiment successful. I had a chance to work with several very interesting companies. I got exposed to business models of which I wasn’t aware. Most importantly, I met new intelligent and ambitious people. I also had a chance to experience firsthand what it is like to be self-employed and to see the behind-the-scenes of several freelancers and entrepreneurs. I learned to appreciate the audacity and the courage of people who don’t rely on monthly paychecks and take much more responsibility for their lives than the vast majority of the “salarymen.”

Let’s talk about money. Was it worth it in terms of $$$$$ (or ₪₪₪₪₪₪)? Objectively speaking, my financial situation remained approximately unchanged. Towards the end of the experiment, I found myself overbooked, which means that, in theory, I could have increased my income substantially. But this is only in theory. In practice, I decided to end the freelance experiment and “settle down”.

What could have been better?

So, was it peachy? Not at all. For me, being a freelancer is much more stressful than being a hired employee. The stress does not come exclusively from the need to make sure one has enough projects in the pipeline (I had enough of them, most of the time). The more significant source of stress came from the lack of focus, the need for EXTREME context switching, and the lack of a team. 

I did receive one suggestion to mitigate this source of stress; however, when I heard it, I already had several job offers and was already 90% committed to accepting the position at MyBiotics.

To sum up

I am very happy I did this experiment. I learned a lot, I enjoyed a lot (and suffered a lot too), I met new people, and I changed the way I think about many things. Was it a good idea? Yes, it was. Should you try becoming a freelancer? How the hell can I know that? It’s your life; you enjoy the success and take the risk of failure.

A new phase in my professional life


I’m excited to announce that I’m joining MyBiotics Pharma Ltd as the company’s Head of Data and Bioinformatics. I have been working with this fantastic company and its remarkable people as a freelancer for fourteen fruitful months. But today, I join the MyBiotics family as a full-time member. Together, we will strive to better understand the interactions between humans and their microbiome to improve health and well-being.


Career advice. Upgrading a data science career

Photo by Kelly Lacy on Pexels.com

From time to time, people send me emails asking for career advice. Here’s one recent exchange.

Hi Boris,

I am currently trying to decide on a career move and would like to ask for your advice.

I have a MSc from a leading university in ML, without thesis.

I have 5 years of experience in data science at <XXX Multinational Company> , producing ML based pipelines for the products. I have experience with Big Data (Spark, …), ML, deploying models to production…

However, I feel that I missed doing real ML complicated stuff. Most of the work I did was to build pipelines, training simple models, do some basic feature engineering… and it worked good enough.

Well, this IS the real ML job for 91.4%* of data scientists. You were lucky to work in a company that has access to data and has teams dedicated to keeping data flowing, neat, and organized. You worked in a company with good work ethics, surrounded by smart people, and, I guess, the computational power was never a big issue. Most of the data scientists that I know don’t have all these perks. Some have to work alone; others need to solve “dull” engineering problems, find ways to process data on suboptimal computers, or fight with a completely unstandardized data collection process. In fact, I know a young data scientist who quit her first post-Uni job after less than six months because she couldn’t handle most of these problems.

However I don’t have any real research experience. I never published any paper, and feel like I always did easy stuff. Therefore, I lack confidence in the ML domain. I feel like what I’ve been doing is not complicated and I could be easily replaced.

This is a super valid concern. I am surprised how few people in our field think about it. On the one hand, most ML practitioners don’t publish papers because they are busy doing the job they are paid for. I am a big proponent of teaching as a means of professional growth. So, you can decide to teach a course in a local meetup, local college, in your workplace, or at a conference. Teaching is an excellent way to improve your communication skills, which are the best means for job security (see this post).

Since you work at XXXXX , I suggest talking to your manager and/or HR representative. I’m SURE that they will have some ideas for a research project that you can take full-time or part-time to help you grow and help your business unit. This brings me to your next question.

I feel like having a research experience/doing a PhD may be an essential part to stay relevant in the long term in the domain. Also, having an expertise in one of NLP/Computer Vision may be very valuable.

I agree. Being a Ph.D. and an Israeli (we have one of the largest Ph.D. percentages globally) makes me biased.

I got 2 offers:

– One with <YYY Multinational company> , to do research in NLP and Computer Vision. […] which is focused on doing research and publishing papers […]

– One with a very fast growing insurance startup, for a data scientist position, as a part of the founding team. […] However, I feel it would be the continuation of my current position as a data scientist, and I would maybe miss on this research component in my career.

You can explore a third option: A Ph.D. while working at your current place of work. I know for a fact that this company allows some of their employees to pursue a Ph.D. while working. The research may or may not be connected to their day job.

I am very hesitant because

– I am not sure focusing on ML models in a research team would be a good use of my time as ML may be commoditised, and general DS may be more future-proof. Also I am concerned about my impact there.

– I am not sure that I would have such a great impact in the DS team of the startup, due to regulations in the pricing model [of that company], and the fact that business problems may be solved by outsourced tools.

These are hard questions to answer. First of all, one may see legal constraints as a “feature, not a bug,” as they force more creative thinking and novel approaches. Many business problems may indeed be solved by outsourcing, but this usually doesn’t happen in problems central to the company’s success since these problems are unique enough to not fit an off-the-shelf product. You also need to consider your personal preferences because it is hard to be good at something you hate doing.

From time to time, I give career advice. When the question or the answer is general enough, I publish them in a post like this. You may read all of these posts here.

Career advice. Becoming a freelancer immediately after finishing a master’s degree

Photo by Miguel Á. Padriñán on Pexels.com

Will Cray [link] holds a fresh M.Sc. in Computer Science and is considering becoming a freelancer in the Machine Learning / Artificial Intelligence / Data Science field. Will asked for advice on the LocallyOptimistic.com community Slack channel. Here’s Will’s question (all the names in this post are used with people’s permission).

Read more career advice [here].

Let’s begin.

Will Cray 

I’m hoping to start a career as a freelancer in the AI space after finishing my Master’s in CS with a focus in AI. I don’t, however, have any industry experience in AI or data science. Do you all think it’s feasible to start a freelancing career without any industry experience? If so, do you have any tips on how to do it successfully?
[I worked for] two years at a major tech company, but I was a systems engineer. It was experience that isn’t necessarily relevant to what I want to work on as a freelancer.

Let’s divide the response to Will’s questions into two parts that correspond to Slack’s two discussion threads.

Thread #1 – Michael Kaminsky

This is a copy/paste from Slack.

Michael Kaminsky 

LocallyOptimistic.com — a valuable source for data folks

My hunch is that it’s going to be pretty tough to get started, though not impossible. You’re probably looking at a pretty lean year or two to build up a reputation out of the gate

Michael Kaminsky 

AI work in general is sort of difficult to contract out — so you might have more luck if you team up with a larger consulting outfit that can handle the other non-AI parts of the work

Michael Kaminsky 

very rarely is someone like “we have all of the data pipeline and pieces working, now we just need to hire someone to do the AI part” — in general, the model-fitting part of an AI project is the easiest and fastest

Will Cray 

Thank you so much for the info–it’s really helping me getting a better understanding of the landscape. Would your opinion, especially regarding that last message, change if the AI work I was doing was more custom model/agent design and training, rather than doing something quick like .fit() in sklearn?

Michael Kaminsky

ummm maybe? but like who needs custom model/agent design and training that doesn’t already have in-house data scientists working on it?

Michael Kaminsky

I don’t want to dissuade you, but my point is that you should think about who your customers are, and how you can market your services in such a way that it will provide them value. If you don’t have a clear map of the three concepts in italics, it could get rough — you can definitely figure it out by doing it, but that’s what you’ll be up against

Will Cray

You mentioned “larger consulting outfits” earlier–do you have any examples of organizations that you think could be a good fit?

Michael Kaminsky

so Brooklyn Data Company and 4 mile consulting are the two that jump to my mind — they specialize in BI and data but might want flex capacity into DS — they might be able to give you deal flow, etc. I know there are a number of others, maybe even folks in this channel

Thread #2 – Boris Gorelik

This is a copy/paste from Slack with some later edits and additions. 

Boris Gorelik 

Another thing to consider is what your risks are. If there are people who depend on you financially, starting with a freelance career might be too risky, especially if you don’t have 1-2 (better 2) customers who have already committed to paying you for your services.

If you can afford several months without a steady income, or no income at all, being a freelancer might expose you to a larger variety of companies and business models in the market. I know some people who used to work as freelancers and gradually “adopted” one customer and moved to full employment. In these cases, freelance projects were, in fact, mutual trial periods where both sides decided whether there is a good fit.

Will Cray 

I greatly appreciate this insight. I have little risks. I’m single, my living expenses are low, and I have some financial runway. Part of the reason I like the idea of freelancing is for the reason you stated–I’ll get to see many different business models. As an aspiring entrepreneur, I think diversity of experiences and exposure would be useful to me. I also think being flexible in how many hours I work will allow me to allocate more time to developing my own ideas/projects; although, I understand that’s a luxury that comes with being an established freelancer. I don’t have any clients currently. Do you have any recommendations for channels to try and garner clients?

Boris Gorelik

> As an aspiring entrepreneur, I think ….

Even though the legal status of a freelancer and an entrepreneur may be the same, they are different occupations and careers. An entrepreneur creates and realizes business models; a freelancer sells their time and expertise to fulfill someone else’s ideas. It’s true that, most of the time (not always), combining freelancing with entrepreneurship is easier than combining entrepreneurship with being a full-time employee in a traditional company.

> Do you have any recommendations for channels to try and garner clients?

Nothing beyond the usual Facebook and LinkedIn, but mostly friends and former coworkers and, in your case, teachers and lecturers. I got my first job interview via my Ph.D. advisor. Later, when I helped in hiring processes, I asked him and other professors to refer suitable candidates to me. So yeah, make sure your professors know your status.

How to become a Python professional in 42 hours?

Here’s an appealing ad that I saw


How to become a Python professional in 42 hours? I’ll tell you how. There is no way. I don’t know any field of knowledge in which one can become a professional after 42 hours. Certainly not Python. Not even after 42 days. Maybe after 42 weeks, if that’s mostly what you do and you are already a programmer.

Calling bullshit on “persistence leads to success”

Did you know that J.K. Rowling, the author of Harry Potter, submitted her book 13 times before it was accepted? Did you know that Thomas Edison tried again and again, even though his teachers thought he was “too stupid to learn anything”? Did you know that Lior Raz (Fauda’s creator and lead actor) was an unknown actor for more than ten years before he broke the barrier of anonymity? What do all these people have in common? They persisted, and they succeeded. BUT, and there is a big but.


People keep telling us: follow your dream, and if you persist, it will come true. You will learn from your mistakes, improve, and adapt, and finally, you will reach your goal. I call bullshit.

Think of the Martingale betting strategy. In theory, it works. Why doesn’t it work in practice? Because nobody has infinite time and infinite pockets. The same is true of chasing your dream. We need to pay for the roof over our heads, the food on our tables, the clothes that we wear. Often other people depend on us. Time passes by. I hate to be a party pooper, but some people who chase their dreams will eat up all their savings and will either have to give up or declare bankruptcy (and then give up).
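For readers who like to see the numbers, here is a minimal sketch of the Martingale argument in Python. It assumes the classic version of the strategy (double the bet after every loss, stop at the first win) and a hypothetical bankroll of 1,023 units; the function names and the bankroll are mine, chosen only for illustration.

```python
def affordable_losses(bankroll: int, base_bet: int = 1) -> int:
    """How many losing bets in a row the bankroll covers when each bet doubles."""
    losses, spent, bet = 0, 0, base_bet
    while spent + bet <= bankroll:
        spent += bet
        bet *= 2
        losses += 1
    return losses


def martingale_cycle(bankroll: int, p_win: float = 0.5, base_bet: int = 1):
    """Summarize one cycle: keep doubling until the first win or until broke."""
    n = affordable_losses(bankroll, base_bet)
    p_ruin = (1 - p_win) ** n               # every affordable bet is lost
    loss_if_ruin = base_bet * (2 ** n - 1)  # 1 + 2 + 4 + ... + 2**(n - 1)
    # A cycle that ends with a win nets exactly one base bet.
    expected_gain = (1 - p_ruin) * base_bet - p_ruin * loss_if_ruin
    return n, p_ruin, loss_if_ruin, expected_gain


# Hypothetical bankroll of 1,023 units and a fair coin.
n, p_ruin, loss, gain = martingale_cycle(bankroll=1023)
print(f"Losing streak you can afford: {n}")
print(f"Chance of going broke in one cycle: {p_ruin:.4%}")
print(f"Loss if that happens: {loss}")
print(f"Expected gain per cycle: {gain:+.4f}")
```

Under these assumptions, almost every cycle ends with a tiny win, but roughly one cycle in a thousand wipes out the whole bankroll, and the expected gain with a fair coin is zero. With finite pockets, the rare disaster cancels all the small wins, and with worse-than-even odds the expectation turns negative.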

Survivorship bias

But what about all those successful failers? What we see is a typical example of survivorship bias, the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. We know the names Rowling, Edison, Raz, and others not because of their multiple failures but DESPITE them. For every Rowling, Edison, and Raz, there are thousands of failed writers, engineers, and actors who ended up broke and caused sorrow to their families.

So, should I quit?

I don’t know. Maybe. Maybe not. It’s your life, your decision.

Once again on becoming a data scientist

My stand on learning data science is known: I think that learning “data science” as a career move is a mistake. You may read this long rant of mine to learn why I think so. This doesn’t mean that I think that studying data science, in general, is a waste of time.

Let me explain this confusion. Take this blogger for example https://thegirlyscientist.com/. As of this writing, “thegirlyscientist” has only two posts: “Is my finance degree useless?” and “How in the world do I learn data science?”. This person (whom I don’t know) seems to be a perfect example of someone who may learn data science tools to solve problems in their professional domain. This is exactly how my professional career evolved, and I consider myself very lucky about that. I’m a strong believer that successful data scientists outside academia should evolve either from domain knowledge to data skills or from statistical/CS knowledge to domain-specific skills. Learning “data science” as a collection of short courses, without deep knowledge in some domain, is, in my opinion, a waste of time. I’m constantly doubting myself in this respect, but I haven’t seen enough evidence to change my mind. If you think I am missing some point, please correct me.

 

 

Don’t take career advice from people who mistreat graphs this badly

Recently, I stumbled upon a report called “Understanding Today’s Chief Data Scientist” published by an HR company called Heidrick & Struggles. This document tries to draw a profile of the modern chief data scientist in today’s Big Data Era. It contains the ugliest pieces of data visualization I have seen in my life. I can’t think of a more insulting graphical treatment of data. Publishing graphs like these in a document that tries to discuss careers in data science is like writing a profile of a Pope candidate while accompanying it with pornographic pictures.

Before explaining my harsh attitude, let’s first ask an important question.

What is the purpose of graphs in a report?

There are only two valid reasons to include graphs in a report. The first reason is to provide a meaningful glimpse into the document. Before deciding whether to read a long document, people want to know what it is about, what methods were used, and what the results are. The best way to engage potential readers is to provide them with a set of relevant graphs (a good abstract or introduction paragraph helps too). The second reason to include graphs in a document is to provide details that cannot be effectively communicated by text alone.

That’s it! Only two reasons. Sometimes, we might add an illustration or two, to decorate a long piece of text. Adding illustrations might be a valid decision provided that they do not compete with the data and it is obvious to any reader that an illustration is an illustration.

Let the horror begin!

The first graph in the H&S report struck me with its absurdity.

Example of a bad chart. I have no idea what it means

At first glance, it looks like an overly-artistic doughnut chart. Then, you want to understand what you are looking at. “OK,” you say to yourself, “there were 100 employees who belonged to five categories. But what are those categories? Can someone tell me? Please?” Maybe the report references this figure with more explanations? Nope. Nothing. This is just a doughnut chart without a caption or a title. Without a meaning.

I continued reading.

Two more bad charts. The graphs are meaningless!

OK, so the H&S geniuses decided to hide the origin of their bar charts. Had they been students in a dataviz course I teach, I would have given them a zero. Ooookeeyy, it’s not a college assignment; as long as we can reconstruct the meaning from the numbers and the labels, we are good, right? I tried to do just that and failed. I tried to use the numbers in the text to help me fill in the missing information and failed. All in all, these two graphs are meaningless graphical junk, exactly like the first one.

The fourth graph gave me some hope.

Not an ideal pie chart but at least we can understand it

Sure, this graph will not get the “best dataviz” award, but at least I understand what I’m looking at. My hope was premature. The next graph was as nonsensical as the first three.

Screenshot with an example of another nonsense graph

Finally, the report authors decided that it wasn’t enough to draw smart-looking color segments enclosed in a circle. They decided to add some cool-looking lines. The authors remained faithful to their decision to not let any meaning into their graphical aids.

Screenshot with an example of a nonsense chart.

Can’t we treat these graphs as illustrations?

Before co-founding the life-changing StackOverflow, Joel Spolsky was, among other things, an avid blogger. His blog, JoelOnSoftware, was the first blog I started following. Joel writes mostly about the programming business. In order not to intimidate the readers with endless text blocks, Joel tends to break the text with illustrations. In many posts, Joel uses pictures of a cute Husky as an illustration. Since JoelOnSoftware isn’t a cynology blog, nobody gets confused by the sudden appearance of a Husky. Which is exactly what an illustration is: a graphical relief that doesn’t disturb. But what would happen if Joel decided to include a meaningless class diagram? Sure, a class diagram may impress the readers. The readers will also want to understand it and its connection to the text. Once they fail, they will feel angry, and rightfully so.

Two screenshots of Joel's blog. One with a Husky, another one with a meaningless diagram

The bottom line

The bottom line is that people have to respect the rules of the domain they are writing about. If they don’t, their opinion cannot be trusted. That is why you should not take any advice related to data (or science) from H&S. Don’t get me wrong. It’s OK not to know the “grammar” of all the possible business domains. I, for example, know nothing about photography or dancing; my English is far from perfect. That is why I don’t write about photography, dancing, or creative writing. I write about data science and visualization. It doesn’t mean I know everything about these fields. However, I did study a lot before I decided I could write something without embarrassing myself. So should everyone.

 

Gartner: More than 40% of data science tasks will be automated by 2020. So what?

Recently, I gave some data science career advice, in which I suggested that prospective data scientists not study data science as a career move. Two of my main arguments were (and still are):

  • The current shortage of data scientists will go away, as more and more general purpose tools are developed.
  • When this happens, you’d better be an expert in the underlying domain, or in the research methods. The many programs that exist today are too shallow to provide any of these.

Recently, the research company Gartner published a press release in which they claim that “More than 40 percent of data science tasks will be automated by 2020, resulting in increased productivity and broader usage of data and analytics by citizen data scientists, according to Gartner, Inc.” Gartner’s main argument is similar to mine: the emergence of ready-to-use tools, algorithm-as-a-service platforms, and the like will reduce the amount of tedious work many data scientists perform for the majority of their workday: data processing, cleaning, and transformation. There are also more and more prediction-as-a-service platforms that provide black boxes that can perform predictive tasks of ever-increasing complexity. Once good plug-and-play tools are available, more and more domain owners, who are not necessarily data scientists, will be able to use them to obtain reasonably good results without the need to employ a dedicated data scientist.

Data scientists won’t disappear as an occupation. They will be more specialized.

I’m not saying that data scientists will disappear in the way coachmen disappeared from the labor market. My claim is that data scientists will cease to be perceived as a panacea by the typical CEO/CTO/CFO. Many tasks that are now performed by data scientists will shift to business developers, programmers, accountants, and other domain owners who will learn another skill: operating with numbers using ready-to-use tools. An accountant can use Excel to balance a budget, identify business strengths, and visualize trends. There is no reason he or she cannot use a reasonably simple black box to forecast sales, identify anomalies, or predict churn.

So, what is the future of the data science occupation? Will the emergence of out-of-the-box data science tools make data scientists obsolete? The answer depends on the data scientist and on how sustainable his or her toolbox is. In the past, bookkeeping used to rely on manual computations. Did the emergence of calculators, and later, spreadsheet programs, result in the extinction of bookkeepers as a profession? No, but most of them are now busy with tasks that require more expertise than just adding the numbers.

A similar thing will happen, IMHO, with data scientists. Some of us will develop a specialization in a business domain, gaining a better understanding of some aspect of a company’s activity. Others will specialize in algorithm optimization and development and will join the companies for which algorithm development is the core business. Others will have to look for another career. What the destiny of a particular person will be depends mostly on their ability to adapt. Basic science, a solid math foundation, and good research methodology are the key factors that determine one’s career sustainability. The many “learn data science in 3 weeks” courses might be the right step towards a career in data science. A right, small step in a very long journey.

Featured image: Alex Knight on Unsplash

What is the best thing that can happen to your career?

Today, I read a tweet by Sinan Aral (@sinanaral) from MIT:

 

I’ve just realized that Ikigai is what happened to my career as a data scientist. There was no point in my professional life where I felt boredom or lack of motivation. Some people think that I’m good at what I’m doing. If they are right (which I hope they are), it is due to my love for what I have been doing since 2001. I am so thankful for being able to do things that I love, care about, and am good at. Not only that, I’m being paid for it! The chart shared by Sinan Aral in his tweet should guide anyone in their career choices.

 

Featured image is taken from this article. Original image credit: Toronto Star Graphic 

Advice for aspiring data scientists and other FAQs — Yanir Seroussi

It seems that a career in data science is the hottest topic data scientists are asked about. To help aspiring data scientists, I’m reposting here a FAQ by my teammate Yanir Seroussi.

Aspiring data scientists and other visitors to this site often repeat the same questions. This post is the definitive collection of my answers to such questions (which may evolve over time). How do I become a data scientist? It depends on your situation. Before we get into it, have you thought about why you want […]

via Advice for aspiring data scientists and other FAQs — Yanir Seroussi

How to be a better teacher?

If you know me in person or follow my blog, you know that I have a keen interest in teaching. Indeed, besides being a full-time data scientist at Automattic, I teach data visualization anywhere I can. Since I started teaching, I have become much better at communication, which is one of the required skills of a good data scientist.
In my constant striving to improve what I do, I joined the Data Carpentry instructor training. Recently, I got my certification as a Data Carpentry instructor.

Certificate of achievement. Data Carpentry instructor

Software Carpentry (and its sibling project Data Carpentry) aims to teach researchers the computing skills they need to get more done in less time and with less pain. “Carpentry” instructors are volunteers who receive pretty extensive training and who are committed to evidence-based teaching techniques. The instructor training had a powerful impact on how I approach teaching. If teaching is something that you do or plan to do, invest three hours of your life watching this video in which Greg Wilson, the “Carpentries” founder, talks about evidence-based teaching and his “Carpentries” project.

I also recommend reading these papers, which provide a brief overview of some evidence-based results in teaching:

What you need to know to start a career as a data scientist

It’s hard to overestimate how much I adore StackOverflow. One of the recent blog posts on StackOverflow.blog is “What you need to know to start a career as a data scientist” by Julia Silge. Here are my reservations about that post:

1. It’s not that simple (part 1)

You might have seen my post “Don’t study data science as a career move; you’ll waste your time!“. Becoming a good data scientist is much more than making a decision and “studying it”.

2. Universal truths mean nothing

The first section in the original post is called “You’ll learn new things”. This is a universal truth. If you don’t “learn new things” every day, your professional career is stalling. To borrow from the world of classification models, telling a universal truth has very high sensitivity but very low specificity: it is true of every career, so it tells you nothing about this one. In other words, it’s a useless waste of ink.
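For readers who don't work with classifiers daily, here is a minimal sketch of the analogy. The labels are made up for illustration: 1 means "this statement is specific to data science", 0 means it isn't. A "universal truth" behaves like a model that answers "yes" to everything.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth: three positives, five negatives.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]

# The "universal truth" classifier: it says "yes" to every case.
always_yes = [1] * len(y_true)

sens, spec = sensitivity_specificity(y_true, always_yes)
print(f"sensitivity={sens:.1f}, specificity={spec:.1f}")
```

The always-yes model never misses a positive, so its sensitivity is perfect, but it never rules anything out, so its specificity is zero.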

3. Not for developers only

The first section starts as follows: “When transitioning from a role as a developer to a position focused on data, …”. Most of the data scientists I know were never developers. I, for example, started as a pharmacist, computational chemist, and bioinformatician. I know several physicists, a historian and a math teacher who are now successful data scientists.

4. SQL skills are overrated

Another quote from the post: “Strong SQL skills are table stakes for data scientists and data engineers”. The thing is that in many cases, we use SQL mostly to retrieve data. Most of the “data scienc-y” work requires analytical tools and flexibility that are not available in most SQL environments. Good familiarity with industry-standard tools and libraries is more important than knowing SQL. Statistics is way more important than knowing SQL. Julia Silge did indeed mention the tools (numpy/R) but didn’t emphasize them enough.

5. Communication importance is hard to overestimate

Again, quoting the post:

The ability to communicate effectively with people from diverse backgrounds is important.

Yes, yes, and one thousand times yes. Effective communication is a non-trivial task that is overlooked by many professionals. Some people are born natural communicators. Some, like me, are not. If there’s one book that you can afford to buy to improve your communication skills, I recommend “Trees, maps and theorems” by Jean-luc Doumont. This is a small, very expensive book that changed the way I communicate in my professional life.

6. It’s not that simple (part 2)

After giving some very general tips, Julia proceeds to suggest that her readers check out the data science jobs at the StackOverflow Jobs site. The impression this makes is that becoming a data scientist is a relatively simple task. It is not. At the bare minimum, I would mention several educational options that are designed for people trying to become data scientists. One such option is Thinkful (I’m a mentor at Thinkful). Udacity and Coursera both have data science programs too. The point is that to become a data scientist, you have to study a lot. You might notice a potential contradiction between point 1 above and this paragraph. The short explanation is that becoming a data scientist takes a lot of time and effort. The post “Teach Yourself Programming in Ten Years,” written in 2001 about programming, is just as relevant to data science in 2017.

Featured image is based on a photo by Jase Ess on Unsplash

Don’t study data science as a career move; you’ll waste your time!

March 2019: Two years after the completion of this post I wrote a follow-up. Read it here.

January 2020: Three years after the completion of this post, I realized that I had written a whole bunch of career advice. Make sure you check this link that collects everything that I have to say about becoming a data scientist.

No, this account wasn’t hacked. I really think that studying data science to advance your career is wasting your time. Briefly, my thesis is as follows:

  • Data science is a term coined to bridge between problems and experts.
  • The current shortage of data scientists will go away, as more and more general purpose tools are developed.
  • When this happens, you’d better be an expert in the underlying domain, or in the research methods. The many programs that exist today are too shallow to provide any of these.

To explain myself, let me start from a Quora answer that I wrote a year ago. The original question was:

I am a pharmacist. I am interested in becoming a data scientist. My interests are pharmacoeconomics and other areas of health economics. What do I need to study to become a data scientist?

To answer this question, I described how I gradually transformed from a pharmacist to a data scientist by continuous adaptation to the new challenges of my professional career. In the end, I invited anyone to ask personal questions via e-mail (it’s boris@gorelik.net). Two days ago, I received a follow-up question:

I would like to know how to learn data science. Would you suggest a master’s degree in analytics? Or is there another way to add “data scientist” label on my resume?

Here’s my answer that will explain why, in my opinion, studying data science won’t give you job security.

Data scientists are real. Data science isn’t.

I think that while “data scientists” are real, “data science” isn’t. We, the data scientists, analyze data using the scientific methods we know and the tools we have mastered. The term “data scientist” was coined about five years ago for the job market. It was meant to help bring the expertise and the positions together. How else would you describe a person who knows scientific analysis and machine learning, writes computer code, and isn’t too abstract a thinker to understand the business needs of a company? Before “data scientist,” there was the less catchy “dataist” http://www.dataists.com/. However, “data scientist” sounded better. Only after the “data scientist” became a reality did people start searching for “data science.” In the future, data science may become a scientific field, similar to statistics. Currently, though, it is not mature enough. Right now, data science is an attempt to merge different disciplines to answer practical questions. Sometimes, this attempt is successful, which makes my life and the lives of many of my colleagues so exciting.

Hilary Mason, from whom I learned the term “dataist”

One standard feature of most, if not all, data science tasks is the requirement to understand the underlying domain. A data scientist in a cyber security team needs to have an understanding of data security, a bioinformatician needs to understand the biological processes, and a data scientist in a financial institution needs to know how money works.

That is why, career-wise, I think that the best strategy is to study an applied field that requires data-intense solutions. By doing so, you will learn how to use the various data analysis techniques. More importantly, you will also learn how to conduct complicated research, and how the analysis and the underlying domain interact. Then, one of two things will happen. You will either specialize in your domain and become an expert, or you will switch between several domains and learn to build bridges between the domains and the tools. Both paths are valuable. I took the second path, and it looks like most of today’s data scientists took that route too. However, sometimes, I am jealous of the specialization I could have gained had I not left computational chemistry about ten years ago.

Who can use the “data scientist” title?

I started presenting myself as a “data scientist and algorithm developer” not because I passed some licensing exams, or had a diploma. I did so because I was developing algorithms to answer data-intense questions. Saying “I’m a data scientist” is like saying “I’m an expert,” or “I’m an analyst,” or “I’m a manager.” If you feel comfortable enough calling yourself so, and if you can defend this title before your peers, do so. Out of the six data scientists in my current team, we have a pharmacist (me), a physicist, an electrical engineer, a CS major, and two mathematicians. We all have advanced degrees (M.A. or Ph.D.), but none of us had any formal “data science” training. I think that the many existing data science courses and programs are only good for people with deep domain knowledge who need to learn the data tools. Managers can benefit from these courses too. However, by taking such a program alone, you will lack experience in scientific methodology, which is central to any data research project. Such a program will not provide you with the computer science knowledge and expertise to make you a good data engineer. You might end up a mediocre Python or R programmer who can fiddle with the parameters of various machine learning libraries, one of the many. Sometimes it’s good enough. Frequently, it’s not.


Lessons from the past

When I started my Ph.D. (in 2001), bioinformatics was HUGE. Many companies had bioinformatics departments that consisted of dozens, sometimes hundreds, of people. Every university in Israel (where I live) had a bioinformatics program. I knew at least five bioinformatics startups in my geographic area. Where is it now? What do these bioinformaticians do? I don’t know any bioinformatician who kept their job description. Most of those I know moved into data science, and some became managers. Others work as governmental clerks.

The same might happen to data science. Two years ago, Barb Darrow from Fortune magazine wrote, quoting industry experts:

Existing tools like Tableau have already sweated much of the complexity out of the once-very-hard task of data visualization, said Raghuram. And there are more higher-level tools on the way … that will improve workflow and automate how data interpretations are presented. “That’s the sort of automation that eliminates the need for data scientists to a large degree,” … And as the technology solves more of these problems, there will also be a lot more human job candidates from the 100 graduate programs worldwide dedicated to churning out data scientists
Supply, meet demand. And bye-bye perks.

My point is, you have to be both versatile and an expert. The best way to become one isn’t to take a crash course but to solve hard problems, preferably under supervision. Usually, you do so by obtaining an advanced degree. By completing an advanced degree, you learn, you learn to learn, and you prove to yourself and your potential employers that you’re capable of bridging the knowledge gaps that will always be there. That is why I advocate obtaining a degree in an existing field, keeping data science as a tool, not a goal.

I might be wrong.

Giving advice is easy. Living the life is not. The path I’m advocating for worked for me. I might be completely wrong here.

I may be completely wrong about data science not being a mature scientific field. For example, deep learning may be the defining concept of data science as a scientific field on its own.

Credits: The crowd image is by Flickr user Amy West. Hilary Mason's photo is from her site https://hilarymason.com/about/