Single-Handed Development: A Recipe for Trouble

[copied from my Substack newsletter]

The subject of this post primarily revolves around creators of digital solutions, such as programmers, designers, analysts, and data scientists. Regardless of whether you identify as one or manage one, I assure you there’s a valuable takeaway for all within this read.

We often encounter “lone wolves,” individuals who are the sole professional in their field within their organization. This situation typically arises when the company lacks the resources to employ more than one programmer, designer, or analyst. Such circumstances can pose significant risks and necessitate proper risk management. 

Now, you may say, “What about C-level managers? They too are often the sole professionals in their field, and working alone is the norm.” I’ll try to address the scenario of C-levels later, but let’s concentrate on everyone else.

What’s the big deal?

So, what’s the big deal?

To put it succinctly, we’re discussing knowledge workers: individuals who translate their brain prowess into value. To maximize this value creation, the process needs to be as efficient and honed as possible. The Talmud tells us about Rabbi Hama bar Hanina, who said, “Just as a knife is sharpened only by the steel of its mate, so too, a scholar [knowledge worker, in our context] is sharpened only by his fellow.” When knowledge workers lose that sharpness, the quality of their work suffers. The output becomes suboptimal due to a lack of adversarial oversight, the curse of knowledge, and the bus factor. Let’s delve into each of these aspects.

Lack of adversarial oversight

When I operate within a team alongside peers, I understand that my work is continually subject to review. This can, and should, be a formal review process, such as code review in programming. But it can also take the form of informal exchanges of ideas during daily communication. A healthy organization fosters a culture of review and constructive debate. In such an environment, everyone is expected to receive and offer feedback, and everyone expects both to be challenged and to challenge others. This potential for critique, and the ongoing drive to critique others, maintains a state of alertness and motivation for continuous improvement.

The Talmud, which I mentioned earlier, is full of records of scholars disputing and challenging one another. That’s how these scholars ensured their constant intellectual growth. The renowned philosopher Karl Popper presented the concept of risky predictions, suggesting that any hypothesis—or piece of code, for that matter—should be bold enough to potentially be proven incorrect. If these bold assertions undergo testing and remain unrefuted, they are deemed accurate and, I would add, their author is deemed credible.

Now, consider our Lone Wolf. There’s no one to criticize them, no one to review their work, and no one requiring their review for their own tasks. The Lone Wolf’s colleagues hold deep respect for them because they’re the only ones in the team who know how to program, design a logo, or perform p-hacking. They admire the Lone Wolf, they appreciate the Lone Wolf, but they fail to sharpen the Lone Wolf’s skills.

The curse of knowledge

Rarely do you know what I don’t know. What may seem trivial to you could be completely enigmatic to me, and you might not even realize it. This phenomenon is known as the “curse of knowledge.”

One issue associated with the curse of knowledge relates to the first point of this post, the absence of oversight. When certain pieces of information seem obvious, you start treating your hypotheses and assumptions as facts and behave accordingly. Another risk is that the curse of knowledge can lead to inadequate planning and documentation.

When working solo on a small or moderately-sized project, you know exactly what’s happening within it. To onlookers, you resemble the archetypal chef in the kitchen, effortlessly grabbing the sharpest knife from the drawer without a second glance and instinctively knowing where every spice jar is located. 

But this seamless workflow can falter when one of two scenarios occurs: either your project becomes too large for you to manage all its details mentally, or a new member joins your team. In both cases, you start losing time trying to remember which function performs the preprocessing, which directory houses the client’s mockups, and which file contains the up-to-date data versus which one is merely a backup. Consequently, your code becomes less clean, you design against incorrect mockups, and your analysis is flawed. Worse still, you produce these substandard results in quadruple the time it would have taken had you properly planned and documented.

The risk of such chaos is significantly reduced when two or more colleagues collaborate in a team. Like dance partners, they must be careful not to step on each other’s toes, which encourages them to dedicate time to planning and documentation. As a team, they have more strength to resist the constant pressure to sacrifice quality, planning, and documentation for speed.

The bus factor

The bus factor refers to the risk associated with information and capabilities not being shared among team members, a concept that draws from the hypothetical scenario of ‘what if they were hit by a bus.’ Of course, we don’t need to be so morbid. Team members can leave their roles for various reasons: they might become parents, win the lottery, choose a monastic life, or leave for any of a myriad of other joyful reasons. When a team member departs, there should exist some level of redundancy to compensate for the expertise lost. The issue with a lone professional leaving the organization extends beyond having no one to perform their tasks; there’s also no one who **knows** how to perform their tasks.

Onboarding a replacement for a team member is always a challenge. However, since the Lone Wolf wasn’t subjected to constant reviews (lack of oversight) and didn’t allocate enough time for planning and documentation (remember the curse of knowledge?), what is a challenge for a multi-member team escalates into a nightmare that can halt operations for weeks or even months. I have witnessed situations like this first-hand: I saw thousands of lines of code rewritten from scratch because nobody knew how they worked.

This resulting chaos not only disrupts workflow but also creates undue stress on the remaining team members and the new recruit, who must scramble to fill a knowledge gap without a clear roadmap. It’s a stark reminder of the importance of collaborative processes and shared understanding within a team.
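The bus factor described above can be estimated with a few lines of code. What follows is only a minimal sketch, not an established method: it assumes you can write down which people are able to maintain which component, and it treats the bus factor as the smallest number of departures that would leave some component with nobody who knows it. The names, the components, and the functions `bus_factor` and `single_points_of_failure` are all hypothetical.

```python
# Hypothetical knowledge map: which team members can maintain which component.
# All names and components are illustrative assumptions, not real data.
knowledge_map = {
    "billing service": {"dana"},
    "data pipeline": {"dana", "omer"},
    "mobile app": {"omer", "noa"},
}


def bus_factor(knowledge: dict[str, set[str]]) -> int:
    """Smallest number of departures that leaves some component with no one who knows it."""
    # The least-covered component determines the result.
    return min(len(people) for people in knowledge.values())


def single_points_of_failure(knowledge: dict[str, set[str]]) -> dict[str, str]:
    """Components known by exactly one person, mapped to that person."""
    return {
        component: next(iter(people))
        for component, people in knowledge.items()
        if len(people) == 1
    }


print(bus_factor(knowledge_map))                # 1
print(single_points_of_failure(knowledge_map))  # {'billing service': 'dana'}
```

A bus factor of 1 is the Lone Wolf situation: a single departure is enough to orphan a component. The second function shows where a part-time reviewer or some documentation would help the most.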

Is there a solution?

The old saying goes, “It’s better to be young, healthy, and wealthy than old, sick, and poor.” Similarly, it’s obviously preferable to hire at least two professionals for each role. However, this isn’t always feasible. Even if budget isn’t a constraint, you might not require two designers, analysts, or programmers on your team. Having a bored knowledge worker is an issue in itself, warranting a separate discussion. 

So, what alternatives are there? One approach to mitigating this issue is to introduce a part-time colleague, a co-pilot, or a proverbial sidekick—either as a hired employee or a freelancer. This additional team member’s role would be to serve as a sharpening tool for your Lone Wolf, their sparring partner, someone with whom they can exchange ideas, ensure nothing is taken for granted, and verify that the correct processes are adhered to. For this arrangement to be effective, the co-pilot should not be a one-time visitor but rather a regular contributor. They need to understand your company’s business and culture and become a part of its institutional memory. In this way, you not only ensure a proper workflow but also safeguard against embarrassing bugs and unforeseen departures. 

One of the services I provide aligns exactly with this solution, and you’re encouraged to reach out if you’re seeking a collaborator for your Lone Wolf.

Calling Bullshit on ‘Management is not Promotion’

“Climbing Invisible Ladders and Falling into Deep Holes: A Discourse in Five Parts” is a witty, engaging, and profoundly insightful exploration of corporate dynamics and career progression.

Climbing Invisible Ladders and Falling into Deep Holes: A Discourse in Five Parts

DRAMATIS PERSONAE

BORIS: A seasoned data scientist, middle-aged but ridiculously good-looking. An ex-Soviet Israeli, he adds an extra layer of cynicism to his character, complemented by a mysterious Russian-Israeli accent.

LAURA: The epitome of kindness, Laura is an American HR manager, and potentially the nicest person you’ll ever meet. She wears a constant, sincere smile.

DAPHNE: As a junior software developer, Daphne is smart and ambitious, constantly seeking opportunities to grow and evolve in her career.

Part 0. Prologue

FADE IN:

INT. HOTEL BAR – NIGHT

BORIS, LAURA, and DAPHNE sit around a table, each wearing a company name tag. A thought bubble appears above BORIS, reading, “It’s bullshit.” Boris shakes his head, dispelling the bubble.

LAURA

(Thoughtfully)

I hear you, Daphne. It’s great that you’re considering a promotion just three months into your first job. However, you should understand that management is more of a lateral move than a vertical one.

BORIS

(Shakes head, dispelling the “bullshit” thought bubble again, speaks sternly)

Laura, I strongly disagree.

A thought bubble appears above LAURA, reading, “Not him again.”

LAURA

(Smiling)

Interesting, Boris. Why do you think so?

BORIS

(Sighs)

Let’s discuss the term “promotion.” What do we seek when we aim for a promotion? More money, more autonomy, and a higher social status, wouldn’t you agree?

LAURA

(Nods)

Absolutely! And that’s exactly why transitioning to leadership roles doesn’t necessarily mean more money. We compensate employees based on their impact, not their position in the organizational chart. We also value and celebrate developers as much, if not more than managers, so their social status is already at its peak. All that managers do is facilitate developers in performing their jobs.

Part 1. Social status

BORIS

Here’s where I beg to differ. Even the terminology we employ suggests a higher social status. I have a “manager,” a “team leader,” or even a “boss.” Regardless of how much you’d like me to believe that a manager’s role is to assist me, they’re still referred to as a manager, not an assistant. Moreover, my manager has a direct influence on my evaluation, an influence I don’t hold over them.

LAURA

(Amused)

Boris, you couldn’t be more mistaken! Have you forgotten the annual engagement survey that you complete? We specifically ask for your thoughts on your team lead.

BORIS

Yes, but you ask both me and all my teammates, so my individual voice is diluted. Moreover, my team leader’s superior —

(Cynically, with air quotes)

“l e a d e r,” not assistant —

(Continues)

provides their direct feedback.

(Sips from a glass of cheap gin, longing for it to be Arak)

And that’s just one aspect. Our vacation policy is indeed generous, but it explicitly states that I need my team lead’s approval before taking time off. My team lead doesn’t require my consent for their time off; they consult their own superior. So yes, a manager does hold a higher social status than an individual contributor.

Part 2. Autonomy

LAURA

You know what? I’ll give you that. But when it comes to professional autonomy, an experienced individual contributor has the full power to decide how they solve the problem they work on.

(Daphne smiles)

BORIS

(takes another sip from the gin glass)

Oh, this is not true either. Take me as an example. I’m not a manager. It is true that I have the autonomy to decide how to solve a problem, but I often don’t get to decide what problem to solve. I can have some influence on this matter, but when my opinion collides with the opinion of my manager or their managers, my opinion is put aside.

DAPHNE

(interrupts)

Right, the other day…

LAURA

(irritated)

You can always take initiative and start working on something that really interests you.

(adds pathos to her voice)

In our company, you can write your own history. Identify a problem, start working on a solution in your spare time, and one day you may convince the management that the solution is worth adopting and expanding.

DAPHNE 

(sarcastically) 

Free time? You must be kidding.

BORIS 

For once, I agree with Laura. We have some free time, and moderate switching between projects might be a good form of rest. Not only that, but a technical hands-on person might also have more tools to solve a technical problem. But… a manager usually has better knowledge of company needs and, more importantly, company politics. That is why a manager’s pet project has a higher chance of being accepted by the company than the one initiated by an IC.

LAURA 

(doubtful) 

Hmm… I don’t know… Well, at least in terms of money, management isn’t promotion.

Part 3. Money

BORIS 

(Chuckles)

Ah, money. Who doesn’t love money? However, I’m afraid I must disagree with you on this one.

LAURA 

(Joyfully)

Well, as the head of HR, not you, I understand how compensation is calculated. You are all compensated based on your impact on our business, nothing more. I know several managers who earn less than the individuals they manage.

(Makes a dramatic pause)

It’s all about the impact!

BORIS 

(Points a finger)

Correct. I presume that when you talk about these managers, you’re primarily referring to team leads. Am I right?

LAURA 

(Pauses, then nods)

Actually, yes.

BORIS 

(Chuckles)

You see, a team lead may earn less than a developer, researcher, or designer they manage, especially if they oversee senior and experienced professionals. But who can bring a greater impact to the business: a senior programmer or a senior manager?

DAPHNE 

(Looks puzzled)

What do you mean?

BORIS 

(Turns to Daphne)

Let’s take me as an example. I’m an outstanding data scientist, one of the best in the field.

(LAURA and DAPHNE nod in agreement)

Nevertheless, my brain operates optimally only 8-9 hours a day. On the other hand, David, the head of the Modelling division, is also a top-tier professional, and he too works 8-9 hours a day. But since he is at the helm of a division, his work is amplified by the ten people who work under him.

LAURA 

There are thirteen now; we’ve hired three additional scientists.

BORIS 

(Turns to Daphne)

See what I mean? David’s impact is over a dozen times larger than mine. Therefore, it would only make sense for his salary to exceed mine.

Everyone falls silent. Boris signals for a refill. Daphne appears dejected.

Part 4. Don’t lose your sleep over this

BORIS 

(Looks at Daphne) 

Don’t lose sleep over this.

Boris takes a salt shaker, opens it, pours the salt onto the table, and draws two partially overlapping bell curves.

BORIS 

Here’s an analogy. Consider men and women. On average, men are stronger than women, right?

Laura and Daphne nod their heads. Laura looks concerned, anticipating that Boris might say something stupidly inappropriate.

BORIS 

(Points to the salt) 

In this graph, the X-axis represents strength, and the two curves represent men and women. Now, what does this mean? Does it mean that all men are stronger than all women? Certainly not! There are many strong women and many weak men! You can see this by looking at the overlapping part of these curves here.

He points to the intersecting area of the two curves drawn in the salt.

BORIS 

(Continues) 

Now, let’s return to our original discussion. Let’s say that the X-axis now stands for “promotion” — a vague amalgamation of social status, power, and money. I hope I’ve convinced you that management is a form of promotion, but consider these curves. As in the men versus women case, there will be many individual contributors positioned higher on the promotion axis than some managers.

Laura looks relieved. Daphne is deep in thought.

DAPHNE

That makes sense. I could focus on improving my development skills… Conversely, I could invest the same energy into enhancing my management skills and transition to the better curve.

The atmosphere in the room becomes dense with contemplation.

Part 5. You have to like your job

BORIS 

(pensively) 

You’re right but also somewhat wrong. Becoming good at your work is hard. It becomes even harder if you don’t enjoy it. If you like managing people, go for it. Enjoy the process, grow your skills, and plan the mansion you want to buy when you’re a big-shot CEO.

He takes a sip of his drink.

BORIS 

(continuing) 

But if you enjoy writing good code more than dealing with people, you might become miserable during your quest for a management career. Being miserable won’t leave enough energy to improve your skills, and you might end up as a mediocre, bitter, mid-level manager who’s jealous of her younger self.

Laura smiles in agreement.

BORIS 

(sincerely) 

I fully agree with that. In the past, in some companies, such a move would be perceived as a demotion, but not now and not here. Nowadays, many companies, small and big, recognize that management is a separate profession. The atmosphere in our company is kind enough to accept that people need to search for their path in life. Take me, for example: I “stepped down” from a management position twice. I don’t regret taking those positions. Neither do I regret stepping down from them.

Daphne looks thoughtful, contemplating the information she’s received.

DAPHNE 

(tentatively) 

I think I need to explore more about myself. I need to see what suits me best. But I guess I won’t know until I try.

Laura and Boris share a look of approval.

Boris raises his glass for a toast.

BORIS 

(smiling) 

To exploration and finding what truly makes us happy!

Everyone raises their glasses, and the scene ends on a positive note of camaraderie and mutual respect.

FADE OUT.

FADE IN.

INT. HOTEL BAR TABLE – NIGHT

The conversation has come to a natural end. Boris stays at the table. Daphne and Laura are leaving the bar, their faces illuminated by the soft lights of the lobby.

LAURA

(sincerely)

Remember, Daphne. You have a whole community here that believes in you. Reach out anytime.

(checks to make sure nobody’s listening)

And remember, don’t take Boris too seriously. He’s… well… he’s Boris.

FADE OUT.

THE END

Prompt engineers, the sexiest job of the third decade of the 21st century (?), or Don’t study prompt engineering as a career move, you’ll waste your time

Do you recall when data scientists were the talk of the town? Dubbed the sexiest job of the 21st century, they boasted a unique blend of knowledge and skills. I still remember the excitement I felt when I realized that the work I did had a name, and the warm feeling I got when I saw those cool Venn diagrams showing just how awesome data scientists were. Well, it’s time for data scientists to step aside and make way for the new heroes in town: the Prompt Engineers!

The demand for prompt engineers is soaring, and it seems like everyone is trying to become one. But what exactly is a prompt engineer, and what are my thoughts on this new profession?

Let’s take a step back in time: we started with assembly languages, and then came a language called Formula Translator (better known as Fortran), which significantly lowered the barrier to entry into the field. I’m sure back then, people rolled their eyes and said that with the emergence of high-level programming languages, anyone could now take any formula and get an output, without understanding how semiconductors worked.

Fast forward to today. What do prompt engineers do? They essentially translate their domain knowledge, language understanding, and AI algorithm expertise into computer output (sounds like “ForTran,” right?). Prompt engineering is, in essence, a super-high-level programming language. Over time, I believe we’ll see dedicated tools and established standards emerge. But for now, it’s a wild, untamed frontier.

In 2017, I wrote a blog post titled “Don’t study data science as a career move; you’ll waste your time!” To this day, it remains the most-read post on my blog. Now, it’s time for a new warning: “Don’t study prompt engineering as a career move; you’ll waste your time!”

Meanwhile, here’s a nice Venn diagram for you 🙂

Not a feature but a bug. Why having only superstars in your team can be a disaster.

Read this to learn about well-rounded teams that can effectively collaborate and communicate. I am an experienced team leader and builder; contact me to learn more about my services and how I can help you achieve better outcomes.

As a freelancer and a manager, I have worked with many companies and teams. Recently, I talked to a CEO who had built a data science team consisting of several “wonder kids” who obtained university degrees before graduating from high school. The CEO was very proud of them. However, he complained that they didn’t deliver as expected. This made me realize that having only superstars is not a feature but a bug.

The fact is that most of us are average; even geniuses are average in most aspects. Richard Feynman, the Nobel laureate physicist, was also a painter, a musician, and an excellent teacher, but he was an exception. I, for example, tend to think of myself as an excellent generalizer, leader, and communicator. However, I need help with attention to detail and deep domain-specific knowledge. To work well, I need to have pedantic specialists on my team. Why? Because, on average, I’m average.

Most “geniuses” are extremely talented in one field but still need help in others. Many tend to be individual workers, meaning their team communication is often suboptimal. Additionally, because the entire team is very young, they lack expertise in project management, inter-team communication, and business orientation, or simply enough real-life experience. The result: a disaster. That company got a team of solo players who don’t communicate within the team, don’t communicate with other teams, and don’t deliver on time.

What do I suggest? They say that “A’s hire A’s”. However, this doesn’t mean that each “A person” must ace the same field. A good team needs an A generalizer, an A specialist, an A communicator, and an A business expert. If you only hire “A++ specialists,” you risk ending up with a group of individuals who are “C-” communicators.

As another CEO I consulted once told me, “Genius developers can do a 10x job. They also tend to enter rabbit holes, and if unattended, they can do 10x damage.” If you build a team, you cannot afford to have unbalanced expertise sets.

The bottom line is to ensure your team is diverse in its capabilities. Hiring only superstars may seem like a good idea, but it can result in a lack of collaboration, communication, and the necessary skills to succeed as a team. A diverse team with various skills and expertise is essential for achieving better outcomes.

In conclusion, avoid falling into the trap of thinking that only superstars can make a great team. Instead, focus on creating a diverse team with various skills, and you’ll be surprised at how much your team can achieve.

Modern tools make your skills obsolete. So what?

Read this if you are a data scientist (or another professional) worried about your career.

So many people, including me, write about how fields such as copywriting, drawing, or data science change from being accessible to a niche of highly professional individuals to a mere commodity. I claim it’s a good thing, not only for humankind but for the individual professional. Since I know nothing about drawing, I’ll talk about data science.

I started working as a data scientist a long time ago, even before the term data science was coined. Back then, my data science job included:

  • writing code that implements this optimization algorithm or the other
  • writing code that implements this statistical analysis or the other
  • writing code that implements this machine learning technique or the other
  • writing code that implements this quality metric or the other
  • writing code that handles named columns
  • writing code that deals with parallelization, caching, fetching data from the internet

Back then, exactly when the term data scientist was coined, I used to say “data is data”. I claimed that it didn’t matter whether you wrote a model that detects cancer or one that detects online fraud, a model that simulates two molecules in a solution or one that simulates players in the electric appliances market. Data was data, and my job, as a data scientist, was to crunch it.

Time passed by. I discovered one cool library, then another, and then a third… Suddenly, my job was to connect these libraries, which allowed me to be more expressive in what I could achieve. It also allowed me to concentrate better on “business logic.” Business logic is the term I use to describe all the knowledge required for the organization that pays your salary to keep doing so. If you work for a gaming company, “business logic” is the gaming psychology, competitor landscape, growth methods, and network effect. If you work for a biotech company, “business logic” is the deep understanding of disease mechanisms, biochemistry, genetics, or whatever is needed to perform the breakthrough. The fact that I no longer needed to deal with “low-level coding” made my old skills obsolete and pushed me to become more specialized.

These days, we are facing a new era in knowledge commoditization. This commoditization makes our skills obsolete but also makes us more efficient in tasks that we were slow at and lets us develop new skills. 

In 2017, Gartner predicted that more than 40% of data science tasks would be obsolete by 2020. Today, in 2023, I can safely say that they were right. I can also say that today, despite the recent layoffs, there are many more busy data scientists than there were in 2017 or 2020.

The bottom line. Stop worrying.

Let me cite myself from 2017:

Data scientists won’t disappear as an occupation. They will be more specialized.

I’m not saying that data scientists will disappear in the way coachmen disappeared from the labor market. My claim is that data scientists will cease to be perceived as a panacea by the typical CEO/CTO/CFO. Many tasks that are now performed by the data scientists will shift to business developers, programmers, accountants and other domain owners who will learn another skill — operating with numbers using ready to use tools. An accountant can use Excel to balance a budget, identify business strengths, and visualize trends. There is no reason he or she cannot use a reasonably simple black box to forecast sales, identify anomalies, or predict churn.

This is another piece of career advice. I have more of them on my blog.

Chances are that you don’t need a data scientist, and three things to consider before hiring one.

Read this if you are considering hiring data scientists

I already wrote about how data science becomes a commodity.

If you read this, I guess data science is not the core part of your business. If this is the case, consider the following before you hire data scientists.

Data engineers

Your data scientists can only be as good as the data you provide them. You must collect the correct data, validate it, store it well, and be able to access it easily. I have hours of “war stories” about how each of these steps went wrong and the company burned tons of money because of that. Data piping is a serious challenge. So, before you hire a data scientist, ask yourself whether your data engineering needs are covered.

Data analysts

Data analysts mainly focus on the organization and interpretation of data. Unlike data scientists, analysts don’t build predictive models or create unique algorithms. Instead, they identify trends and insights and present their findings clearly and understandably. Not being required to build novel models and algorithms allows them to better connect with stakeholders’ business needs and practical questions. A good data analyst will take the business problem, translate it into a data-based question, understand its potential value, and, in many cases, answer it.

Boxed Solutions

Data Science as a Service is a term for boxed solutions that are constantly becoming more versatile, flexible, and affordable. I was a freelancer for a company that built its data-based product on an open-source implementation of a single optimization algorithm. They managed to run a successful company without a single data scientist for more than five years, and they started thinking of better solutions when they squeezed everything they could from their MRE. At this point, they had their data storage pipelines (data engineering), a better picture of their business (data analysts), and paying customers to finance the development of new algorithms.

How to work with data scientists?

I’ll write separate posts on this topic, but the gist is to make sure they know your business needs. Ensure you communicate your needs and problems to them, and make sure they share their efforts with you. I have seen many failed data science projects in my life. Most failed due to a lack of alignment, communication, or both.

This was another career advice post. Read more of them here.

Data Science Reality Check: My Predictions Come True (or, A Piece of Advice to Young Data Scientists)

Read this if you’re a data scientist or consider becoming one.

Almost six years ago, when Data Scientist was named the “sexiest job of the 21st century”, I wrote a blog post telling young professionals not to learn data science as a career move. My claim was that the data science field would get commoditized, and that if you didn’t possess deep (I mean DEEP) knowledge of either algorithms or the business you work in, you would end up a mediocre coder.

Look what happened. Data science has indeed become commoditized in many fields. Many data-intensive businesses work just fine without data scientists. Even I, a very experienced data scientist, got laid off because I couldn’t bring the company value that would justify my salary. People like Matthew Yglesias from https://www.slowboring.com suggest that data scientists learn how to roll a burrito or mine lithium.

Why did this happen? Well, I was right. Data science has become a commodity. Each self-respecting platform offers AI tools (I hate the term AI, by the way) such as keyword extraction, insights, predictions, anomaly detection, recommendations, and many more. Tableau, PowerBI, and even Google Sheets or Excel offer tools that were once only available through custom data and code fiddling. The Data-Science-As-A-Service niche is full of products such as https://www.pecan.ai and https://www.anodot.com. And we haven’t even started talking about the new word of the day: the GPT.

Because I’m an experienced data scientist, people often ask for my advice and help. In the past, when this happened, I used to discuss possible custom-tailored solutions. Now, I find myself suggesting that the person look at product X or Y, which will solve their problems at a fraction of the time and cost.

So, what do we have? What does all that mean?

Data science has become a commodity. In the past, to get a nice salary and a sexy title, it was enough to know what training, testing, and cross-validation were. Today, you absolutely have to know the theory and be a fast and good coder. But most of all, you must hone your communication skills and learn the business of the company where you work. Only this way will you be able to ensure your efforts are always aligned with the stakeholders and that you can consistently deliver value.

This is a career advice post. Check out the career tag and the Career Advice category of this blog.

Experiment report

In January 2020, I started a new experiment. I quit what was a dream job and became a freelancer. Today, the experiment is over. This post serves as omphaloskepsis – a short reflection on what went well and what could have worked better.

What worked well?

To sum up, I declare this experiment successful. I had a chance to work with several very interesting companies. I got exposed to business models of which I wasn’t aware. Most importantly, I met new intelligent and ambitious people. I also had a chance to experience for myself how it feels to be self-employed and to see the behind-the-scenes of several freelancers and entrepreneurs. I learned to appreciate the audacity and the courage of people who don’t rely on monthly paychecks and take much more responsibility for their lives than the vast majority of the “salarymen.”

Let’s talk about money. Was it worth it in terms of $$$$$ (or ₪₪₪₪₪₪)? Objectively speaking, my financial situation remained approximately unchanged. Towards the end of the experiment, I found myself overbooked, which means that, in theory, I could have increased my income substantially. But this is only in theory. In practice, I decided to end the freelance experiment and “settle down”.

What could have been better?

So, was it peachy? Not at all. For me, being a freelancer is much more stressful than being a hired employee. The stress does not come exclusively from the need to make sure one has enough projects in the pipeline (I had enough of them, most of the time). The more significant source of stress came from the lack of focus, the need for EXTREME context switching, and the lack of a team. 

I did receive one suggestion to mitigate this source of stress; however, when I heard it, I already had several job offers and was already 90% committed to accepting the position at MyBiotics.

To sum up

I am very happy I did this experiment. I learned a lot, I enjoyed a lot (and suffered a lot too), I met new people, and I changed the way I think about many things. Was it a good idea? Yes, it was. Should you try becoming a freelancer? How the hell can I know that? It’s your life; you enjoy the success and take the risk of failure.

Career advice. Upgrading a data science career

Photo by Kelly Lacy on Pexels.com

From time to time, people send me emails asking for career advice. Here’s one recent exchange.

Hi Boris,

I am currently trying to decide on a career move and would like to ask for your advice.

I have a MSc from a leading university in ML, without thesis.

I have 5 years of experience in data science at <XXX Multinational Company> , producing ML based pipelines for the products. I have experience with Big Data (Spark, …), ML, deploying models to production…

However, I feel that I missed doing real ML complicated stuff. Most of the work I did was to build pipelines, training simple models, do some basic feature engineering… and it worked good enough.

Well, this IS the real ML job for 91.4%* of data scientists. You were lucky to work in a company that has access to data and has teams dedicated to keeping data flowing, neat, and organized. You worked in a company with good work ethics, surrounded by smart people, and, I guess, the computational power was never a big issue. Most of the data scientists that I know don’t have all these perks. Some have to work alone; others need to solve “dull” engineering problems, find ways to process data on suboptimal computers or fight with a completely unstandardized data collection process. In fact, I know a young data scientist who quit her first post-Uni job after less than six months because she couldn’t handle most of these problems.

However I don’t have any real research experience. I never published any paper, and feel like I always did easy stuff. Therefore, I lack confidence in the ML domain. I feel like what I’ve been doing is not complicated and I could be easily replaced.

This is a super valid concern. I am surprised how few people in our field think about it. On the one hand, most ML practitioners don’t publish papers because they are busy doing the job they are paid for. I am a big proponent of teaching as a means of professional growth. So, you can decide to teach a course in a local meetup, local college, in your workplace, or at a conference. Teaching is an excellent way to improve your communication skills, which are the best means for job security (see this post).

Since you work at XXXXX , I suggest talking to your manager and/or HR representative. I’m SURE that they will have some ideas for a research project that you can take full-time or part-time to help you grow and help your business unit. This brings me to your next question.

I feel like having a research experience/doing a PhD may be an essential part to stay relevant in the long term in the domain. Also, having an expertise in one of NLP/Computer Vision may be very valuable.

I agree. Being a Ph.D. and an Israeli (we have one of the largest Ph.D. percentages globally) makes me biased.

I got 2 offers:

– One with <YYY Multinational company> , to do research in NLP and Computer Vision. […] which is focused on doing research and publishing papers […]

– One with a very fast growing insurance startup, for a data scientist position, as a part of the founding team. […] However, I feel it would be the continuation of my current position as a data scientist, and I would maybe miss on this research component in my career.

You can explore a third option: A Ph.D. while working at your current place of work. I know for a fact that this company allows some of their employees to pursue a Ph.D. while working. The research may or may not be connected to their day job.

I am very hesitant because

– I am not sure focusing on ML models in a research team would be a good use of my time as ML may be commoditised, and general DS may be more future-proof. Also I am concerned about my impact there.

– I am not sure that I would have such a great impact in the DS team of the startup, due to regulations in the pricing model [of that company], and the fact that business problems may be solved by outsourced tools.

These are hard questions to answer. First of all, one may see legal constraints as a “feature, not a bug,” as they force more creative thinking and novel approaches. Many business problems may indeed be solved by outsourcing, but this usually doesn’t happen in problems central to the company’s success since these problems are unique enough to not fit an off-the-shelf product. You also need to consider your personal preferences because it is hard to be good at something you hate doing.

From time to time, I give career advice. When the question or the answer is general enough, I publish them in a post like this. You may read all of these posts here.

Five things I wish people knew about real-life machine learning

Deena Gergis is a data science lead at Bayer. I recently discovered Deena’s article on LinkedIn titled “Five Things I Wish I Knew About Real-Life AI.” I think that this article is a great piece of career advice for all current and aspiring data scientists, as well as for all the professionals who work with them. Let me take Deena’s headings and add my 2 cents.

One. It is all about the delivered value, not the method.

I fully agree with this one. Nobody cares whether you used a linear regression or recurrent neural network. Nobody really cares about p-values or r-squared. What people need are results, insights, or working products. Simple, right?

Two. Packaging does matter

Again, well said. The way you present your solution to your colleagues, customers, or stakeholders can determine whether your project will get more funds and resources or not. 

Three. Doing the right things != doing things right.

Exactly. Citing Deena: “you might be perfectly predicting a KPI that no one cares about.” Enough said. 

Four. Set realistic expectations.

Not everybody realizes that “machine learning” and “artificial intelligence” are not synonyms for “magic” but rather a form of statistics (I hope “real” statisticians won’t get mad at me here). The principle “garbage in – garbage out” holds in machine learning. Moreover, sometimes, ML systems amplify the garbage, resulting in “garbage in, tons of garbage out”.

Five. Keep humans in the loop.

Let me cite Deena again: “My customers are my partners, not just end-users.” Note that by “customers,” we don’t only mean walk-in clients, but also any internal customer, project manager, even a colleague who works on the same project. They are all partners with unique insights, domain knowledge, and experience. Use them to make your work better. 

Read the original article here. Deena Gergis has several more articles on LinkedIn here. And if you know Arabic, you might want to watch Deena’s videos on YouTube here. Unfortunately, my Arabic is not good enough to understand her Egyptian accent, but I suspect that her videos are as good as her writings.

Career advice. Becoming a freelancer immediately after finishing a masters degree

Photo by Miguel Á. Padriñán on Pexels.com

Will Cray [link] is a fresh M.Sc. in Computer Science and is considering becoming a freelancer in the Machine Learning / Artificial Intelligence / Data Science field. Will asked for advice on the LocallyOptimistic.com community Slack channel. Here’s Will’s question (all the names in this post are used with people’s permission).

Read more career advice [here].

Let’s begin.

Will Cray 

I’m hoping to start a career as a freelancer in the AI space after finishing my Master’s in CS with a focus in AI. I don’t, however, have any industry experience in AI or data science. Do you all think it’s feasible to start a freelancing career without any industry experience? If so, do you have any tips on how to do it successfully?
[I worked for] two years at a major tech company, but I was a systems engineer. It was experience that isn’t necessarily relevant to what I want to work on as a freelancer.

Let’s divide the response to Will’s questions into two parts that correspond to Slack’s two discussion threads.

Thread #1 – Michael Kaminsky

This is a copy/paste from Slack.

Michael Kaminsky 

LocallyOptimistic.com — a valuable source for data folks

My hunch is that it’s going to be pretty tough to get started, though not impossible. You’re probably looking at a pretty lean year or two to build up a reputation out of the gate

Michael Kaminsky 

AI work in general is sort of difficult to contract out — so you might have more luck if you team up with a larger consulting outfit that can handle the other non-AI parts of the work

Michael Kaminsky 

very rarely is someone like “we have all of the data pipeline and pieces working, now we just need to hire someone to do the AI part” — in general, the model-fitting part of an AI project is the easiest and fastest

Will Cray 

Thank you so much for the info–it’s really helping me getting a better understanding of the landscape. Would your opinion, especially regarding that last message, change if the AI work I was doing was more custom model/agent design and training, rather than doing something quick like .fit() in sklearn?

Michael Kaminsky

ummm maybe? but like who needs custom model/agent design and training that doesn’t already have in-house data scientists working on it?

Michael Kaminsky

I don’t want to dissuade you, but my point is that you should think about who your customers are, and how you can market your services in such a way that it will provide them value. If you don’t have a clear map of the three concepts in italics, it could get rough — you can definitely figure it out by doing it, but that’s what you’ll be up against

Will Cray

You mentioned “larger consulting outfits” earlier–do you have any examples of organizations that you think could be a good fit?

Michael Kaminsky

so Brooklyn Data Company and 4 mile consulting are the two that jump to my mind — they specialize in BI and data but might want flex capacity into DS — they might be able to give you deal flow, etc. I know there are a number of others, maybe even folks in this channel

Thread #2 – Boris Gorelik

This is a copy/paste from Slack with some later edits and additions. 

Boris Gorelik 

Another thing to consider is what your risks are. If there are people who depend on you financially, starting with a freelance career might be too risky, especially if you don’t have 1-2 (better 2) customers who already committed to paying you for your services.

If you can afford several months without a steady income, or no income at all, being a freelancer might expose you to a larger variety of companies and business models in the market. I know some people who used to work as freelancers and gradually “adopted” one customer and moved to full employment. In these cases, freelance projects were, in fact, mutual trial periods where both sides decided whether there is a good fit.

Will Cray 

I greatly appreciate this insight. I have little risks. I’m single, my living expenses are low, and I have some financial runway. Part of the reason I like the idea of freelancing is for the reason you stated–I’ll get to see many different business models. As an aspiring entrepreneur, I think diversity of experiences and exposure would be useful to me. I also think being flexible in how many hours I work will allow me to allocate more time to developing my own ideas/projects; although, I understand that’s a luxury that comes with being an established freelancer. I don’t have any clients currently. Do you have any recommendations for channels to try and garner clients?

Boris Gorelik

> As an aspiring entrepreneur, I think ….

Even though a freelancer’s and an entrepreneur’s legal status may be the same, they are different occupations and careers. An entrepreneur creates and realizes business models; a freelancer sells their time and expertise to fulfill someone else’s ideas. It’s true that most of the time (not always), combining freelance with entrepreneurship is easier than combining entrepreneurship with being a full-time employee in a traditional company.

 > Do you have any recommendations for channels to try and garner clients?

Nothing except the regular Facebook/LinkedIn, but mostly friends and former coworkers and, in your case, teachers/lecturers. I got my first job interview via my Ph.D. advisor. Later, when I helped in hiring processes, I asked him and other professors to refer suitable candidates to me. So yeah, make sure your professors know your status.

The hazard of being a wizard. On the balance between specialization and the risk of becoming obsolete.

A wizard is a person who continually improves his or her professional skill in a particular and defined field. I learned about this definition of wizardness from the book “Managing project, people and yourself” by Nikolay Toverosky (the book is in Russian).  

Recently, Nikolay published an interesting post about the hazards of becoming a wizard. The gist of the idea is that while you are polishing your single skill to perfection, the world changes. You may find that your super-skill is no longer relevant (see my Soviet Shoemaker story).

Nikolay doesn’t give any suggestions. Neither do I. 

Below is the link to the original post. The post is in Russian, and you can use Google Translate to read it.

A page about wizards. In my book, there is a chapter about commanders and wizards. At its end, I sum up: for all his coolness, a wizard is vulnerable. He is useful only if his skill fits the task. […]

“Почему опасно быть магом” (Why it is dangerous to be a wizard), from the blog “Об управлении проектами и дизайне” (On project management and design)

Bioinformatics career advice and a story about a Soviet shoemaker

When I was in elementary school (back in the USSR of the mid-80s), I had a friend whose father was a shoemaker. Due to the crazy stupid way the Soviet economy worked, a Soviet shoemaker was much richer than a physician or an engineer. But this is not the story. The story is that one day this friend’s father had a chat with me about selecting a profession. This man’s point was that for as long as people have feet and need shoes on their feet, shoemaking would be a needed and well-paying occupation. Guess what? People still have feet and still wear shoes, but I don’t see too many successful shoemakers anymore.

Common wisdom says, “It is very hard to predict, especially the future.” And I will add, “even more especially, about the job market.” Nevertheless, people need to decide what to do with their lives, how to live, and what career paths to pursue. Some of them ask me, and I’m glad to answer. If you have any career-related questions, don’t be shy! Write to boris@gorelik.net, and I’ll see what wisdom I will be able to share with you.

Anyhow, this is a letter that I got from another pharmacist looking for a data science career.

Hope you are doing well. I saw your posts on Quora and thought of asking a doubt.
First let me tell my background. I am from India, I completed my Doctor of Pharmacy program (Pharm D). I am familiar with computer programming. I have intermediate knowledge in python and R programming.  So I thought taking up Bioinformatics and computational biology Masters program so that I can connect Pharma industry and my knowledge in computer science. 
What do you think? 
I have applied to University XYZ and got offer letter. I have to take a decision within 2 weeks.
Please let me know your thoughts on this.

To which I replied

Obviously, since the path you are describing is similar to the one I took, I will think that it is a good idea. Moreover, as you might have read in my blog (for example, here), my opinion is that advanced degrees give much more stable foundations, compared to the “fast and easy” courses. Having said that, your life is yours, not mine, and the job market today is not the job market in 2001 when I completed my B.Pharm.

Thank you so much for replying to my silly question. I am honoured to get a response from you. 

First of all, I don’t believe in “there are no silly questions” bullshit, but asking a silly question is better than not asking at all. Secondly, these questions are not silly at all.

I have a question, in your post dated 2017, you have mentioned that Bioinformatics was booming in 2001 and now it has lost its significance. Are you still have the same thoughts? 

I think that this person refers to the most visited post of mine “Don’t study data science as a career move; you’ll waste your time!”.  There is also a 2019 follow-up.

If that is the case then me taking a master’s in bioinformatics and computational genomics would be a bad idea, right ?

Here’s what I responded. Keep in mind that I wrote this before the COVID-19 outbreak.

Look, the markets in different countries are different. 

Back in the old days, there was a worldwide wave of closing bioinfo companies. All the Israeli ones were either closing or counting weeks before closing. One anecdote: I was interviewing at a company. Two weeks later, I called the person who interviewed me to ask whether I got the job or not, and the secretary told me that that person was fired due to layoffs. 

Right now, Israel sees a renaissance of bioinformatics companies, but I don’t know what will happen in the future. These companies live mostly out of investors’ money and are subject to strict regulations. However, if you get a good education, your head will be full of useful mental models, relevant basic knowledge, and good practices. 

End of quote. One of the side effects of the COVID-19 madness is the massive influx of money into biotech companies. Is this a short-term anecdote, or will it become a sustainable trend? I have no idea.

Do you have any career-related questions for me? You don’t have to be a pharmacist to ask :-). Write to boris@gorelik.net. I promise to respond, even if by sending a link to my blog posts.

The difference between statistically meaningful and practically meaningful. An interview with me

Recently, I gave an interview to the Techie Leadership site. Andrei Crudu, the interviewer, made a helpful outline of the conversation. I marked the most important parts in bold.

  • Academic views on leadership;
  • Managing people isn’t for everyone;
  • Lessons from a practical approach;
  • Data Science is predominantly about data cleaning;
  • The difference between statistically meaningful and practically meaningful;
  • How sometimes companies tweak results to match expectations;
  • Bad managers make you appreciate the good managers;
  • Giving credit, being decent and not cheating;
  • All good teamwork starts with effective communication;
  • You don’t know that the stuff that you know is unknown to others;

Overall, I enjoyed chatting with Andrei, and I hope you’ll enjoy listening to the interview. If you have any comments, feel free to share them here or on the Techie Leadership site.

Calling bullshit on “persistence leads to success”

Did you know that J.K. Rowling, the author of Harry Potter, submitted her book 13 times before it was accepted? Did you know that Thomas Edison tried again and again, even though his teachers thought he was “too stupid to learn anything”? Did you know that Lior Raz (Fauda’s creator and lead actor) was an anonymous actor for more than ten years before he broke the barrier of anonymity? What do all these people have in common? They persisted, and they succeeded. BUT, and there is a big but.

People keep telling us: follow your dream, and if you persist, it will come true. You will learn from your mistakes, improve, and adapt, and you will finally reach your goal. I call bullshit.

Think of the Martingale betting strategy, in which you double your bet after every loss so that the first win recovers everything you lost. In theory, it works. Why doesn’t it work in practice? Because nobody has infinite time and infinite pockets. The same is true of chasing your dream. We need to pay for the roof over our heads, the food on our tables, the clothes that we wear. Often other people depend on us. Time passes by. I hate to be a party pooper, but some people who chase their dreams will eat through all their savings and will either have to give up or declare bankruptcy (and then give up).
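
To make the “infinite pockets” point concrete, here is a minimal Python sketch of a Martingale player with a finite bankroll. Everything in it is an assumption made for illustration: a fair coin, a base bet of 1, the starting bankroll, and the function names themselves. It is a toy simulation, not a model of any real game.

```python
import random


def affordable_losing_streak(bankroll: int, base_bet: int = 1) -> int:
    """How many consecutive losses a Martingale player can pay for."""
    streak, bet = 0, base_bet
    while bet <= bankroll:
        bankroll -= bet   # pay for the lost bet
        bet *= 2          # Martingale rule: double after every loss
        streak += 1
    return streak


def play_martingale(bankroll: int, rounds: int, rng: random.Random,
                    base_bet: int = 1):
    """Play up to `rounds` fair coin flips.

    Returns (final_bankroll, stuck), where `stuck` is True if the player
    could no longer cover the next doubled bet before the rounds ran out.
    """
    bet = base_bet
    for _ in range(rounds):
        if bet > bankroll:
            return bankroll, True      # finite pockets: the chase is over
        if rng.random() < 0.5:         # win: collect the bet, start over
            bankroll += bet
            bet = base_bet
        else:                          # loss: pay the bet, double the next one
            bankroll -= bet
            bet *= 2
    return bankroll, False


def stuck_fraction(bankroll: int, rounds: int, players: int,
                   seed: int = 0) -> float:
    """Share of simulated players who hit the wall within `rounds` flips."""
    rng = random.Random(seed)
    stuck = sum(play_martingale(bankroll, rounds, rng)[1]
                for _ in range(players))
    return stuck / players
```

With a bankroll of 1,000 and a base bet of 1, affordable_losing_streak(1000) is 9: nine straight losses cost 511, and the tenth bet of 512 no longer fits in what is left. Calling stuck_fraction(1000, 10_000, 200) shows what share of 200 simulated players hit such a wall within 10,000 flips.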

Survivorship bias

But what about all those successful failers? What we see is a typical example of survivorship bias: the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. We know the names Rowling, Edison, Raz, and others not because of their multiple failures but DESPITE them. For every Rowling, Edison, and Raz, there are thousands of failed writers, engineers, and actors who ended up broke and caused sorrow to their families.
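
The survivorship bias described above can be sketched in a few lines of Python. The numbers are purely hypothetical assumptions (every dreamer can afford 13 attempts, and each attempt has a 1% chance of success), and so are the function names. The point is only to show who ends up visible.

```python
import random


def success_within(attempts: int, p_success: float) -> float:
    """Chance of at least one success in `attempts` independent tries."""
    return 1 - (1 - p_success) ** attempts


def visible_winners(people: int, max_attempts: int, p_success: float,
                    seed: int = 0) -> list:
    """Simulate `people` dreamers and return, for each winner only,
    the attempt number on which they finally succeeded.

    Everyone who runs out of attempts simply disappears from the result,
    which is what happens to them in success stories too.
    """
    rng = random.Random(seed)
    winners = []
    for _ in range(people):
        for attempt in range(1, max_attempts + 1):
            if rng.random() < p_success:
                winners.append(attempt)
                break
    return winners
```

Under these assumed numbers, success_within(13, 0.01) is roughly 0.12. Every entry in visible_winners(10_000, 13, 0.01) is a “persisted and succeeded” story, yet about seven out of eight simulated dreamers never make the list.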

So, should I quit?

I don’t know. Maybe. Maybe not. It’s your life, your decision.

On organizing a data org in a company, job titles, and more

Photo by Khimish Sharma on Pexels.com

My colleague, Simon Ouderkirk, recorded a REALLY interesting interview with Stephen Levin of Zapier and Emilie Schario of GitLab on organizing a data org in a company, job titles, career ladders, and other important stuff.

As y’all may recall, last year I was lucky enough to spend some time working with the fine folks at Locally Optimistic to produce and run some AMA content for them – they ended up being more similar to traditional interviews, but folks seemed to enjoy them! You can find those all here! These were […]

I’m Giving Video Content a Try! — Simon Ouderkirk

Another piece of career advice

Here’s another email that I got, with a question about switching to a data science career.

Hello, my name is X. I saw your blog, and to be honest, I said, “Wow, is this me :)” I’m a pharmacist 5th-grade student currently working on a project in computational drug design. I started programming, and I loved it. After that, I heard the term “Data Science” and started to do some research […]

Basically, I loved being on a computer and solving problems its a good career option for me (at least for now, you can’t predict future) my mom has a pharmacy I worked there (internship), and it is not for me (i am counting the time when I’m in a pharmacy.) so I have a few questions for you

I don’t have any degree in statistics or CS or something equivalent I am determined to learn these topics, but some people want to see the degree, and probably no one accept a pharmacist to a master degree in statistics (I also wish to do my Ms in computational drug design because, in the end, I don’t want to be a data scientist in social sciences or economics, at least for now, I want to use that knowledge in my field which is drugs and pharmaceuticals)

Ph.D. on Bioinformatics would help ? or Biostatistics ( is it easier for us to be accepted in biostatistics rather than statistics? To be honest, I don’t know the difference much, I took a biostatistics class, but it was just one semester and probably not enough for Ph.D. :))

Do I really need a degree in CS or statistics to be a pharmaceutical data scientist? I want to do my Ph.D. but also want to be realistic, it sounds amazing doing online masters in statistics while you are doing Computational drug design or Bioinformatics Ph.D., but it is very hard and frustrating and also decrease your productivity in both fields.

I asked a lot of questions, sorry, but I have many :). You can reply when you have time. Thank you, and I loved your blog. I read and watched tons of things, but yours was the best suited for me because being a pharmacist, computational drug design, considering bioinformatics, it is all fits. By the way, I also considering cybersecurity (not working in a company but learning). I see that as a “martial arts of the future,” maybe I am wrong, but a person should know it to protect him/her self. Thank you again 🙂

Indeed, X’s background sounds very much like mine.
I’m not sure I have too much to add to what I already wrote here, in this blog. The only thing that I have to say is that in my biased opinion, a Ph.D. is something worth pursuing. The more time passes by, the more Ph.Ds there are, and the lack of a degree might be a problem in the future job market. On the other hand, there are many smart and rich people who claim that university degrees are a waste of time. Go figure 🙂

I hope that this helps.

Career advice. A clinical pharmacist, epidemiologist, and a Ph.D. student wants to become a data scientist.

Photo by Pixabay on Pexels.com

From time to time, I get emails from people who seek advice in their career paths. If I have time, I write them an extended reply and if they agree, I publish the questions and my replies here, in my blog. Here’s one such email exchange. All similar pieces of advice, as well as other rants about a career in data science, can be found here.

“Hi Boris 🙂
My name is XXXXX. I came across your blog while searching for people with a mix of pharmacy and data science skillsets. Your blog has been so informative to me so far but I was compelled to write to you to ask for your advice.
I am a clinical pharmacist by background but decided to leave the clinical pharmacy to pursue public health. Whilst doing my MPH, I fell in love with epidemiology and statistics and am now doing a Ph.D. in biostatistics. Your blog has made me feel very happy that I made this career move <…>  I feel better about my decision to leave the pharmacy and pursue a quant Ph.D. I have gone from pharmacy, to internships at <YYYY> as I wanted to pursue a career in <ZZZZZ> and now I am thinking of data science in the tech industry…my background is a bit confusing!”

In the past, I also felt that the pharmacy degree was confusing many potential employers, and since I wanted to leave the bio/pharma world and move to “pure data” positions, I omitted the B.Pharm title & studies from my CV. Ten years ago, the salaries in the bio sector, here in Israel, were much lower than the salaries in the “high tech” field. I think that today this situation is more or less normalized and that the people got used to the fact that a typical “data scientist” can have a very wide range of degrees.

“I was just wondering if I could get your opinion on the three questions I have. 
1. I work part-time as a clinical pharmacist to not forget my clinical skills. What do you think about the future of the pharmacy career overall?”

My last shift as a pharmacist

This is a huge question and I don’t have answers to it. Moreover, the answer depends heavily on legal regulations in the given country. I say that if you enjoy treating people, and can afford this time, why not? I, personally, was a very lousy pharmacist 🙂 so I was very happy to leave the pharmacy.

“I am wondering if I should keep up my pharmacist title or pursue data science full-time.”

Again, it depends. For many years, I didn’t have my pharmacy title in my CV because it felt unrelated to what I was doing. It was also a nice icebreaker to tell people with whom I worked “by the way, I’m a pharmacist” and it was fun to see their reactions. If I were you, I would ask two or three HR people or people who recruit employees what they think. Different countries may behave differently.

“2. At what point can someone call themselves a data scientist?”

In my opinion, as long as you are comfortable enough to call yourself a data scientist, you are good to go. Note that unlike many people who got their data science “title” after taking some online courses, you already have a very strong theoretical base. Not only are your Master’s and the future Ph.D. degree relevant to data science, but they also give you strong and unique advantages. 

“I am looking at DS jobs at large tech companies. I am not sure how qualified and experienced I have to be for these jobs. I code in R using regression, clustering and time series methods and I am quite fluent in this language. I have just started to learn ML algorithms. I have a basic foundation in Python and SQL. I use Tableau for visualization and love communicating my research at any opportunity I get. I was wondering…how good do I have to be able to apply to DS jobs? What are the methods that data scientists use mostly? Would I be able to learn on the job?”

It sounds like a good combination of techniques. I am not recruiting, but if I were, I would definitely like this list of skills. Personally, I don’t like R too much and prefer Python. But once you can program in one language, moving to another one is a doable task. As to what methods data scientists mostly use, this hugely depends on your job. Most of my time, I clean data and write wrapper functions around known algorithms. The tasks that I have faced during my professional life required regression, classification, and network analysis. I never did real deep learning stuff, but I know people who only do deep learning for image and sound analysis. Also, in many cases, the data science part takes only 10% of your time because the “customer” doesn’t care about an algorithm; they want a solution. See this post for a nice example.

“3. If you had the opportunity to start your career again, say you were in your early twenties, what would you study and why? What advice would you have for your younger self? I would be so keen to hear what you think.”

It’s a philosophical task which I never like doing. What is done is done. The fact that I am a pretty successful data scientist may mean that I made the right decisions or that I was super lucky.

Software commodities are eating interesting data science work — Yanir Seroussi

If you read my shortish post about staying employable as a data scientist, you might like a longer post by a colleague, Yanir Seroussi. In his post, Yanir lists four possible paths for a data scientist: (1) become an engineer; (2) reinvent the wheel; (3) search for niches; and (4) expand the cutting edge.

To this list, I would also add two other options.

(5) Manage. Managing is not developing, it’s a different profession. However, some developers and data scientists that I know choose this path. I am not a manager myself, so I hope I don’t insult the managers who read these lines, but I think that it is much easier for a good manager to stay good, than for a good developer or data scientist.

(6) Teach. I teach as a part-time job. One reason for teaching is that I sometimes enjoy it. Another reason is that I feel that at some point, I might not be good enough to stay on the cutting edge but still sharp enough to teach the new generations the basics.

Anyhow, read Yanir’s post linked below.

The passage of time makes wizards of us all. Today, any dullard can make bells ring across the ocean by tapping out phone numbers, cause inanimate toys to march by barking an order, or activate remote devices by touching a wireless screen. Thomas Edison couldn’t have managed any of this at his peak—and shortly before […]

Software commodities are eating interesting data science work — Yanir Seroussi

Career advice. A research pharmacist wants to become a data scientist.

Recently, I received an email from a pharmacist who is considering becoming a data scientist. Since this is not the first (or last) such email that I receive, I think others will find this message exchange interesting.

Here’s the original email with minor edits, followed by my response.

The question

Hi Boris, 


My name is XXXXX, and I came across your information and your advice on data science as I was researching career opportunities.

I currently work at a hospital as a research pharmacist, mainly involved in managing drugs for clinical trials.
Initially, I wanted to become a clinical pharmacist and pursued 1-year post-graduate residency training. However, it was not something I could envision myself enjoying for the rest of my career.

I then turned towards obtaining a Ph.D. in translational research, bridging the benchwork research to the bedside so that I could be at the forefront of clinical trial development and benefit patients from the rigorous stages of pre-clinical research outcomes. I much appreciate learning all the meticulous work dedicated before the development of Phase I clinical trials. However, Ph.D. in pharmaceutical sciences was overkill for what I wanted to achieve in my career (in my opinion), and I ended up completing with master’s in pharmaceutical sciences.

Since I wanted to be involved in both research and pharmacy areas in my career, I ended up where I am now, a research pharmacist.

My main job description is not any different from typical hospital pharmacists. I do have a chance of handling investigational medications, learning about new medications and clinical protocols, overseeing side effects that may be a crucial barrier in marketing the trial medications, and sometimes participating in development of drug preparation and handling for investigator-initiated trials. This does keep my job interesting and brings variety in what I do. However, I do still feel I am merely following the guidelines to prepare medications and not critically thinking to make interventions or manipulate data to see the outcomes. At this point, I am preparing to find career opportunities in the pharmaceutical industry where I will be more actively involved in clinical trial development, exchanging information about targeting the diseases and analyzing data. I believe gaining knowledge and experiences in critical characteristics for the data science field would broaden my career opportunities and interest. Still, unfortunately, I only have pharmacy background and have little to no experience in computer science, bioinformatics, or machine learning.

The answer

First of all, thank you for asking me. I’m genuinely flattered. I assume that you found me through my blog posts, and if not, I suggest that you read at least the following posts

All my thoughts on the career path of a data scientist appear on this page: https://gorelik.net/category/career-advice/

Now, specifically to your questions.

My path towards data science was through gradual evolution. Every new phase in my career used my previous experience and knowledge. From B.Sc studies in pharmacy to doctorate studies in computational drug design, from computational drug design to biomathematical modeling, from that to bioinformatics, and from that to cybersecurity. Of course, my path is not unique. I know at least three people who followed a similar career from pharmacy to data science. Maybe other people made different choices and are even more successful than I am. My first advice to everyone who wants to transition into data science is not to (see the first link in the list above). I was lucky to enter the field before it was a field, but today, we live in the age of specialization. Today we have data analysts, data engineers, machine learning engineers, NLP scientists, image processing specialists, etc. If computational modeling is something that a person likes and sees themselves doing for a living, I suggest pursuing a related advanced degree with a project that involves massive modeling efforts. Examples of such degrees for a pharmacist are computational chemistry, pharmacoepidemiology, pharmacovigilance, and bioinformatics. This way, one can utilize the knowledge that they already have to expand their expertise, build a reputation, and gain new knowledge. If staying in academia is not an option, consider taking a relevant real-life project. For example, if you work in a hospital, you could try identifying patterns in antibiotics usage, a correlation between demographics and hospital re-admission, … you get the idea.

Whatever you do, you will not be able to work as a data scientist if you can’t write computer programs. Modifying tutorial scripts is not enough; knowing how to feed data into models is not enough.

Also, my most significant knowledge gap is in maths. If you do go back to academia, I strongly suggest taking advantage of the opportunity and taking several math classes: at least calculus and linear algebra and, of course, statistics. 

Do you have a question for me?

If you have questions, feel free to write them here, in the comments section, or to email boris@gorelik.net

Staying employable and relevant as a data scientist

One piece of common wisdom is that creative jobs are immune to becoming irrelevant. This is what Brian Solis, the author of “Lifescale,” says on this matter:

On the positive side, historically, with every technological advancement, new jobs are created. Incredible opportunity opens up for individuals to learn new skills and create in new ways. It is your mindset, the new in-demand skills you learn, and your creativity that will assure you a bright future in the age of automation. This is not just my opinion. A thoughtful article in Harvard Business Review by Joseph Pistrui was titled, “The Future of Human Work Is Imagination, Creativity, and Strategy.” He cites research by McKinsey […]. In their research, they discovered that the more technical the work, the more replaceable it is by technology. However, work that requires imagination, creative thinking, analysis, and strategic thinking is not only more difficult to automate; it is those capabilities that are needed to guide and govern the machines.

Many people think that data science falls into the category of “creative thinking and analysis”. However, as time passes, this becomes less true. Here’s why.

As time passes, tools become stronger, smarter, and faster. This means that a problem that could once be solved only with cutting-edge algorithms run by cutting-edge scientists on cutting-edge computers will become solvable using a commodity product. “All you have to do” is to apply domain knowledge, select a “good enough” tool, get the results, and act upon them. You’ll notice that I included two phrases in quotation marks. First, “all you have to do”. I know that it’s not as simple as “just add water,” but it gets simpler.

“Good enough” is also a tricky part. Selecting the right algorithm for a problem has a dramatic effect on tough cases but is less important with easy ones. Think of a sorting algorithm. I remember my algorithm class professor used to talk about how important it was to select the right sorting algorithm for the right problem. That was almost twenty years ago. Today, I simply write list.sort() and I’m done. Maybe, one day I will have to sort billions of data points in less than a second on a tiny CPU without RAM, which will force me into developing a specialized solution. But in 99.999% of cases, list.sort() is enough.
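To make the sorting point concrete, here is a minimal Python sketch. The insertion_sort function and the sample data are my own illustration (not something from that algorithm class); they only show how much code the "classroom" approach takes compared to the commodity call.

```python
def insertion_sort(items):
    """Hand-written insertion sort, the kind taught in algorithm classes."""
    result = list(items)  # work on a copy, leave the input untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one position to the right
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


data = [42, 7, 19, 3, 88, 7]  # made-up sample values

by_hand = insertion_sort(data)  # the "specialized" route
data.sort()                     # the commodity route: one line, in place
```

Both routes end with the same sorted list; the difference is only in how much code I had to write and verify myself.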

Back to data science. I think that in the near future, we will see more and more analogs of list.sort(). What does that mean for us, data scientists? I am not sure. What I’m sure of is that in order to stay relevant we have to learn and evolve.

Featured image by Héctor López on Unsplash

On MOOCs

When Massive Open Online Courses (a.k.a. MOOCs) emerged some X years ago, I was ecstatic. I was sure that MOOCs were the Big Boom of higher education. Unfortunately, the MOOC impact turned out to be very modest. This modest impact, combined with the high production cost, was one of the reasons I quit making my online course after producing two or three lectures. Nevertheless, I don’t think MOOCs are dead yet. Following are some links I recently read that provide interesting insights into MOOC production and consumption.

  • A systematic study of academic engagement in MOOCs that is scheduled for publication in the November issue of Erudit.org. This 20+ page-long survey summarizes everything we know about MOOCs today (I have to admit, I only skimmed through this paper, I didn’t read all of it)
  • A Science Magazine article from January 2019. The article, “The MOOC pivot,” sheds light on the very low retention numbers in MOOCs.
  • On MOOCs and video lectures. Prof. Lorena Barba from George Washington University explains why her MOOCs are not built for video. If you consider creating an online class, you should read this.
  • The economic consequences of MOOCs. A concise summary of a 2018 study that suggests that MOOCs’ economic impact is high despite the high churn rates.
  • Thinkful.com, an online platform that provides personalized training to aspiring data professionals, got in the news three weeks ago after being purchased for $80 million. Thinkful isn’t a MOOC per se, but I have a special relationship with it: a couple of years ago I was accepted as a mentor at Thinkful but couldn’t find time to actually mentor anyone.

The bottom line

We still don’t know what this future will look like and how MOOCs will interplay with the legacy education system, but I’m sure that MOOCs are the future.

Book review: The Formula by A. L Barabasi

The bottom line: read it but use your best judgement 4/5

I recently completed reading “The Formula. The Universal Laws of Success” by Albert-László Barabási. Barabási is a network science professor who co-authored the “preferential attachment” paper (a.k.a. the Barabási-Albert model). People who follow him closely are either avid fans or haters accusing him of nonsense science.

For several years, A-L Barabási has been talking and writing about the “science of success” (yeah, I can hear some of my colleagues laughing right now). Recently, he summarized the research in this area in an easy-to-read book with the promising title “The Formula. The Universal Laws of Success.” The main takeaways that I took from this book are:

  • Success is about us, not about you. In other words, it doesn’t matter how hard you work and how good your work is, if “we” (i.e., the public) don’t know about it, or don’t see it, or attribute it to someone else.
  • Be known for your expertise. Talk passionately about your job. The people who talk about an idea will get the credit for it. Consider the following example from the book. Let’s say Prof. Barabási and the Pope write a joint scientific paper. If the paper is about network science, it will be perceived as if the Pope helped Barabási with writing it. If, on the other hand, it is a theological work, we will immediately assume that the Pope was the leading force behind it.
  • It doesn’t matter how old you are; success can come to you at any age. It is a well-known fact that most successful people broke into success at a young age. What Barabási claims is that the reason for that is not a form of ageism but the fact that older people try less. According to this claim, as long as you are creative and work hard, your most significant success is ahead of you.
  • Persistence pays. This is another claim that Barabasi makes in his book. It is related to the previous one but is based on a different set of observations (did you know that Harry Potter was rejected twelve times before it was published?). I must say that I’m very skeptical about this one. Right now, I don’t have the time to explain my reasons, and I promise to write a dedicated post.

Keep in mind that the author uses academic success (the Nobel prize, citation index, etc.) as the metric for most of his conclusions. This limitation doesn’t bother him, after all, Barabási is a full-time University professor, but most of us should add another grain of salt to the conclusions. 

Overall, if you find yourself thinking about your professional future, or if you are looking for good career advice, I recommend reading this book.

Curated list of established remote tech companies

Someone asked me about distributed companies or companies that offer remote positions. Of course, my first response was Automattic but that person didn’t think that Automattic was a good fit for them. So I googled and was surprised to discover that my colleague, Yanir Seroussi, maintains a list of companies that offer remote jobs.

I work at Automattic, one of the biggest distributed-only companies in the world (if not the biggest one). Recently, Automattic founder and CEO, Matt Mullenweg started a new podcast called (surprise) Distributed.

The third wave data scientist – a useful point of view

In 2019, it’s hard to find a data-related blogger who doesn’t write about the essence and the future of data science as a profession. Most of these posts (like this one, for example) are useless both for existing data scientists who think about their professional plans and for people who consider data science as their career.

Today, however, I saw a post that I find very useful. In this post, Dominik Haitz identifies a “third wave data scientist.” In Dominik’s opinion, a successful data scientist has to combine four features: (1) Business mindset, (2) Software engineering craftsmanship, (3) Statistics and algorithmic toolbox, and (4) Soft skills. In Dominik’s classification, the business mindset is not “another skill” but the central pillar.

The professional challenges that I have been facing during the past eighteen months or so, made me realize the importance of points 1, 2, and 3 from Dominik’s list (number 4 was already very important on my personal list). However, it took reading his post to put the puzzle parts in place.

Dominik’s additional contribution to the discussion is ditching the famous data science Venn Diagram in favor of another, “business-oriented” visual which I used as the “featured image” to this post.

Painting: sailors in a wavy sea
A fragment from an 1850 painting by the Russian Armenian marine painter Ivan Aivazovsky named “The Ninth Wave.” I wonder what the “ninth wave data scientist” will be.

To specialize, or not to specialize, that is the data scientists’ question

In my last post on data science career, I heavily promoted the idea that a data scientist needs to find his or her specialization. I backed my opinion with my experience and by citing other people’s opinions. However, keep in mind that I am not a career advisor, I never surveyed the job market, and I might not know what I’m talking about. Moreover, despite the fact that I advocate for specialization, I think that I am more of a generalist.

Since I published the last post, I was pointed to some other posts and articles that either support or contradict my point of view. The most interesting ones are “Why you shouldn’t be a data science generalist” and “Why Data Science Teams Need Generalists, Not Specialists”. Both are very recent and very articulate but promote different points of view. Go figure.

The featured image is based on a photo by Tom Parsons on Unsplash

The data science umbrella or should you study data science as a career move (the 2019 edition)?

TL/DR: Studying data science is OK as long as you know that it’s only a starting point.

Almost two years ago, I wrote a post titled “Don’t study data science as a career move.” Even today, this post is the most visited post on my blog. I was reminded of this post a couple of days ago during a team meeting in which we discussed what “data scientist” means today. I re-read my original post, and I think that I was generally right, but there is a but…

The term “data science” was born as an umbrella term meant to describe people who know programming, statistics, and business logic. We all saw those numerous Venn diagrams that tried to describe the perfect data scientist. Since its beginning, the field of “data science” has finally matured. There are more and more people who question the very definition of data science.

Here’s what entrepreneur Chuck Russel has to say:

Now don’t get me wrong — some of these folks are legit Data Scientists but the majority is not. I guess I’m a purist –calling yourself a scientist indicates that you practice science following a scientific method. You create hypotheses, test the hypothesis with experimental results and after proving or disproving the conjecture move on or iterate.

Screenshot of a Google image search showing many Venn diagrams
There can’t be enough Venn diagrams

Now, “create and test hypotheses” is a very vague requirement. After all, any A/B test is a process of “creating and testing hypotheses” using data. Is anyone who performs A/B tests a data scientist? I think not.
Moreover, a couple of years ago, if you wanted to run an A/B test, perform a regression analysis, or build a classifier, you would have to write numerous lines of code, then debug and tune them. This tedious and intriguing process certainly felt very “sciency,” and if it worked, you would have been very proud of your job. Today, on the other hand, we are lucky to have general-purpose tools that require less and less coding. I don’t remember the last time I had to implement an analysis or an algorithm from first principles. With the vast amount of verified tools and libraries, writing an algorithm from scratch feels like a huge waste of time.
On the other hand, I spend more and more time trying to understand the “business logic” that I try to improve: why has this test failed? Who will use this algorithm, and what will make them like the results? Does the potential improvement justify the effort?
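To show how little code the "create and test a hypothesis" part takes today, here is a minimal Python sketch of an A/B test. It is only an illustration under my own assumptions: the function name ab_test_p_value, the choice of a pooled two-proportion z-test, and the conversion numbers are all made up for this example; only statistics.NormalDist comes from the standard library.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test.

    conv_a / conv_b: number of conversions in each variant
    n_a / n_b: number of users exposed to each variant
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical experiment: 100/1000 conversions in A, 130/1000 in B
p_value = ab_test_p_value(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"p-value: {p_value:.4f}")
```

The computation is a dozen lines. Deciding whether the difference matters to the business is the part that still takes time.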

I (a data scientist) have all this extra time to think about business logic thanks to the huge arsenal of generalized tools to choose from. These tools were created mostly by those data scientists whose primary job is to implement, verify, and tune algorithms. My job and the job of these data scientists are different and require different sets of skills.

There is another ever-growing group of professionals who work hard to make sure someone can apply all those algorithms to any amount of data they feel suitable. These people know that any model is at most as good as the data it is based on. Therefore, they build systems that deliver the right information on time, distribute the data among computation nodes, and make sure no crazy “scientist” sends a production server to a non-responsive state due to a bad choice of parameters. We already have a term for professionals whose job is to build fail-proof systems. We call them engineers, or “data engineers” in this case.

The bottom line

Up till now, I mentioned three major activities that used to be covered by the data science umbrella: building new algorithms, applying algorithms to business logic, and engineering reliable data systems. I’m sure there are other areas under that umbrella that I forgot. In 2019, we reached the point where one has to decide which field of data science one wants to practice. If you consider studying data science, think of it as studying medicine. The vast majority of physicians don’t end up as general practitioners but rather invest at least five more years of their lives in specializing. Treat your data science studies as an entry ticket into a life-long learning process, and you’ll be OK. Otherwise (I’m citing myself here): “You might end up a mediocre Python or R programmer who can fiddle with the parameters of various machine learning libraries, one of the many. Sometimes it’s good enough. Frequently, it’s not.”

PS. Here’s a one-week-old article on Forbes.com with very similar theses: link.

Five misconceptions about data science

One item on my todo list is to write a post about “three common misconceptions about data science.” Today, I found this interesting post that lists misconceptions much better than I would have been able to do. Plus, they list five of them. That’s 67% more than I intended to do 😉

I especially liked the section called “What is a Data Scientist” that presents six Venn diagrams of a dream data scientist.

The analogy between the data scientist and a purple unicorn is still apt – finding an individual that satisfies any one of the top four diagrams above is rare.

 

Enjoy reading  Five Misconceptions About Data Science – Knowing What You Don’t Know — Track 2 Analytics

Once again on becoming a data scientist

My stand on learning data science is known: I think that learning “data science” as a career move is a mistake. You may read this long rant of mine to learn why I think so. This doesn’t mean that I think that studying data science, in general, is a waste of time.

Let me explain this confusion. Take this blogger for example https://thegirlyscientist.com/. As of this writing, “thegirlyscientist” has only two posts: “Is my finance degree useless?” and “How in the world do I learn data science?”. This person (whom I don’t know) seems to be a perfect example of someone who may learn data science tools to solve problems in their professional domain. This is exactly how my professional career evolved, and I consider myself very lucky about that. I’m a strong believer that successful data scientists outside academia should evolve either from domain knowledge to data skills or from statistical/CS knowledge to domain-specific skills. Learning “data science” as a collection of short courses, without deep knowledge in some domain, is, in my opinion, a waste of time. I’m constantly doubting myself in this respect, but I haven’t seen enough evidence to change my mind. If you think I miss some point, please correct me.

 

 

Don’t take career advice from people who mistreat graphs this badly

Recently, I stumbled upon a report called “Understanding Today’s Chief Data Scientist” published by an HR company called Heidrick & Struggles. This document tries to draw a profile of the modern chief data scientist in today’s Big Data Era. This document contains the ugliest pieces of data visualization I have seen in my life. I can’t think of a more insulting graphical treatment of data. Publishing graphs like these in a document that tries to discuss careers in data science is like writing a profile of a Pope candidate while accompanying it with pornographic pictures.

Before explaining my harsh attitude, let’s first ask an important question.

What is the purpose of graphs in a report?

There are only two valid reasons to include graphs in a report. The first reason is to provide a meaningful glimpse into the document. Before a person decides whether he or she wants to read a long document, they want to know what it is about, what methods were used, and what the results are. The best way to engage the potential reader is to provide them with a set of relevant graphs (a good abstract or introduction paragraph helps too). The second reason to include graphs in a document is to provide details that cannot be effectively communicated by text-only means.

That’s it! Only two reasons. Sometimes, we might add an illustration or two, to decorate a long piece of text. Adding illustrations might be a valid decision provided that they do not compete with the data and it is obvious to any reader that an illustration is an illustration.

Let the horror begin!

The first graph in the H&S report struck me with its absurdity.

Example of a bad chart. I have no idea what it means

At first glance, it looks like an overly artistic doughnut chart. Then, you want to understand what you are looking at. “OK,” you say to yourself, “there were 100 employees who belonged to five categories. But what are those categories? Can someone tell me? Please?” Maybe the report references this figure with more explanations? Nope. Nothing. This is just a doughnut chart without a caption or a title. Without a meaning.

I continued reading.

Two more bad charts. The graphs are meaningless!

OK, so the H&S geniuses decided to hide the origin of their bar charts. Had they been students in a dataviz course I teach, I would have given them a zero. Ooookeeyy, it’s not a college assignment; as long as we can reconstruct the meaning from the numbers and the labels, we are good, right? I tried to do just that and failed. I tried to use the numbers in the text to help me fill in the missing information and failed. All in all, these two graphs are meaningless graphical junk, exactly like the first one.

The fourth graph gave me some hope.

Not an ideal pie chart but at least we can understand it

Sure, this graph will not get the “best dataviz” award, but at least I understand what I’m looking at. My hope was premature. The next graph was as nonsensical as the first three.

Screenshot with an example of another nonsense graph

Finally, the report authors decided that it wasn’t enough to draw smart-looking color segments enclosed in a circle. They decided to add some cool-looking lines. The authors remained faithful to their decision not to let any meaning into their graphical aids.

Screenshot with an example of a nonsense chart

Can’t we treat these graphs as illustrations?

Before co-founding the life-changing StackOverflow, Joel Spolsky was, among other things, an avid blogger. His blog, JoelOnSoftware, was the first blog I started following. Joel writes mostly about the programming business. In order not to intimidate the readers with endless text blocks, Joel tends to break the text with illustrations. In many posts, Joel uses pictures of a cute Husky as an illustration. Since JoelOnSoftware isn’t a cynology blog, nobody gets confused by the sudden appearance of a Husky. Which is exactly what an illustration is: a graphical relief that doesn’t disturb. But what would happen if Joel decided to include a meaningless class diagram? Sure, a class diagram may impress the readers. The readers will also want to understand it and its connection to the text. Once they fail, they will feel angry, and rightfully so.

Two screenshots of Joel's blog. One with a Husky, another one with a meaningless diagram

The bottom line

The bottom line is that people have to respect the rules of the domain they are writing about. If they don’t, their opinion cannot be trusted. That is why you should not take any advice related to data (or science) from H&S. Don’t get me wrong. It’s OK not to know the “grammar” of all the possible business domains. I, for example, know nothing about photography or dancing; my English is far from being perfect. That is why I don’t write about photography, dancing, or creative writing. I write about data science and visualization. It doesn’t mean I know everything about these fields. However, I did study a lot before I decided I could write something without ridiculing myself. So should everyone.

 

The Keys to Effective Data Science Projects — Operationalize

Recently, I stumbled upon an interesting series of posts about effective management of data science projects. One of the posts in the series says:

 “Operationalization” – a term only a marketer could love. It really just means “people using your solution”.

The main claim of that post is that, at some point, bringing actual users to your data science project may be more important than improving the model. This is exactly what I meant in my “when good enough is good enough” post (also available on YouTube).

Gartner: More than 40% of data science tasks will be automated by 2020. So what?

Recently, I gave some data science career advice, in which I suggested that prospective data scientists not study data science as a career move. Two of my main arguments were (and still are):

  • The current shortage of data scientists will go away, as more and more general purpose tools are developed.
  • When this happens, you’d better be an expert in the underlying domain, or in the research methods. The many programs that exist today are too shallow to provide any of these.

Recently, the research company Gartner published a press release in which they claim that “More than 40 percent of data science tasks will be automated by 2020, resulting in increased productivity and broader usage of data and analytics by citizen data scientists, according to Gartner, Inc.” Gartner’s main argument is similar to mine: the emergence of ready-to-use tools, algorithm-as-a-service platforms, and the like will reduce the amount of tedious work many data scientists perform for the majority of their workday: data processing, cleaning, and transformation. There are also more and more prediction-as-a-service platforms that provide black boxes that can perform predictive tasks with ever-increasing complexity. Once good plug-and-play tools are available, more and more domain owners, who are not necessarily data scientists, will be able to use them to obtain reasonably good results, without the need to employ a dedicated data scientist.

Data scientists won’t disappear as an occupation. They will be more specialized.

I’m not saying that data scientists will disappear in the way coachmen disappeared from the labor market. My claim is that data scientists will cease to be perceived as a panacea by the typical CEO/CTO/CFO. Many tasks that are now performed by data scientists will shift to business developers, programmers, accountants, and other domain owners who will learn another skill: operating with numbers using ready-to-use tools. An accountant can use Excel to balance a budget, identify business strengths, and visualize trends. There is no reason he or she cannot use a reasonably simple black box to forecast sales, identify anomalies, or predict churn.

So, what is the future of the data science occupation? Will the emergence of out-of-the-box data science tools make data scientists obsolete? The answer depends on the data scientist and on how sustainable his or her toolbox is. In the past, bookkeeping used to rely on manual computations. Did the emergence of calculators, and later, spreadsheet programs, result in the extinction of bookkeepers as a profession? No, but most of them are now busy with tasks that require more expertise than just adding the numbers.

A similar thing will happen, IMHO, with data scientists. Some of us will develop a specialization in a business domain, gaining a better understanding of some aspect of a company’s activity. Others will specialize in algorithm optimization and development and will join the companies for which algorithm development is the core business. Others will have to look for another career. What the destiny of a particular person will be depends mostly on their ability to adapt. Basic science, a solid math foundation, and good research methodology are the key factors that determine one’s career sustainability. The many “learn data science in 3 weeks” courses might be the right step towards a career in data science. A right, small step in a very long journey.

Featured image: Alex Knight on Unsplash

What is the best thing that can happen to your career?

Today, I read a tweet by Sinan Aral (@sinanaral) from MIT:

 

I’ve just realized that Ikigai is what happened to my career as a data scientist. There was no point in my professional life where I felt boredom or lack of motivation. Some people think that I’m good at what I’m doing. If they are right (which I hope they are), it is due to my love for what I have been doing since 2001. I am so thankful for being able to do things that I love, care about, and am good at. Not only that, I’m being paid for it! The chart shared by Sinan Aral in his tweet should guide anyone in their career choices.

 

Featured image is taken from this article. Original image credit: Toronto Star Graphic 

Advice for aspiring data scientists and other FAQs — Yanir Seroussi

It seems that a career in data science is the hottest topic many data scientists are asked about. To help aspiring data scientists, I’m reposting here a FAQ by my teammate Yanir Seroussi.

Aspiring data scientists and other visitors to this site often repeat the same questions. This post is the definitive collection of my answers to such questions (which may evolve over time). How do I become a data scientist? It depends on your situation. Before we get into it, have you thought about why you want […]

via Advice for aspiring data scientists and other FAQs — Yanir Seroussi

How to be a better teacher?

If you know me in person or follow my blog, you know that I have a keen interest in teaching. Indeed, besides being a full-time data scientist at Automattic, I teach data visualization anywhere I can. Since I started teaching, I have become much better at communication, which is one of the required skills of a good data scientist.
In my constant striving to improve what I do, I joined the Data Carpentry instructor training. Recently, I got my certification as a Data Carpentry instructor.

Certificate of achievement. Data Carpentry instructor

Software Carpentry (and its sibling project Data Carpentry) aims to teach researchers the computing skills they need to get more done in less time and with less pain. “Carpentry” instructors are volunteers who receive pretty extensive training and who are committed to evidence-based teaching techniques. The instructor training had a powerful impact on how I approach teaching. If teaching is something that you do or plan to do, invest three hours of your life watching this video in which Greg Wilson, the “Carpentries” founder, talks about evidence-based teaching and his “Carpentries” project.

I also recommend reading these papers, which provide a brief overview of some evidence-based results in teaching:

What you need to know to start a career as a data scientist

It’s hard to overestimate how much I adore StackOverflow. One of the recent blog posts on StackOverflow.blog is “What you need to know to start a career as a data scientist” by Julia Silge. Here are my reservations about that post:

1. It’s not that simple (part 1)

You might have seen my post “Don’t study data science as a career move; you’ll waste your time!“. Becoming a good data scientist is much more than making a decision and “studying it”.

2. Universal truths mean nothing

The first section in the original post is called “You’ll learn new things”. This is a universal truth. If you don’t “learn new things” every day, your professional career is stalling. To borrow from the world of classification models, telling a universal truth has very high sensitivity but very low specificity: it applies to every case, so it distinguishes nothing. In other words, it’s a useless waste of ink.
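To make the analogy concrete, here is a minimal sketch in Python. The labels are made up for illustration, and the “always yes” predictor is my stand-in for a universal truth: it says “yes” to everyone, so it catches every true case (perfect sensitivity) but never rules anyone out (zero specificity).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth: three positives and three negatives.
y_true = [1, 0, 1, 0, 0, 1]

# A "universal truth" predictor: it answers "yes" for every single case.
always_yes = [1] * len(y_true)

sens, spec = sensitivity_specificity(y_true, always_yes)
print(sens, spec)  # 1.0 0.0
```

A statement that is true for everybody behaves the same way: it never misses, and it never tells you anything.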

3. Not for developers only

The first section starts as follows: “When transitioning from a role as a developer to a position focused on data, …”. Most of the data scientists I know were never developers. I, for example, started as a pharmacist, computational chemist, and bioinformatician. I know several physicists, a historian and a math teacher who are now successful data scientists.

4. SQL skills are overrated

Another quote from the post: “Strong SQL skills are table stakes for data scientists and data engineers”. The thing is that in many cases, we use SQL mostly to retrieve data. Most of the “data scienc-y” work requires analytical tools and flexibility that are not available in most SQL environments. Good familiarity with industry-standard tools and libraries is more important than knowing SQL. Statistics is way more important than knowing SQL. Julia Silge did indeed mention the tools (numpy/R) but didn’t emphasize them enough.

5. Communication importance is hard to overestimate

Again, quoting the post:

The ability to communicate effectively with people from diverse backgrounds is important.

Yes, yes, and one thousand times yes. Effective communication is a non-trivial task that is often overlooked by many professionals. Some people are born natural communicators. Some, like me, are not. If there’s one book that you can afford to buy to improve your communication skills, I recommend “Trees, maps and theorems” by Jean-luc Doumont. This is a small, very expensive book that changed the way I communicate in my professional life.

6. It’s not that simple (part 2)

After giving some very general tips, Julia proceeds to suggest that her readers check out the data science jobs on the StackOverflow Jobs site. The impression that’s made is that becoming a data scientist is a relatively simple task. It is not. At the bare minimum, I would mention several educational options that are designed for people trying to become data scientists. One such option is Thinkful (I’m a mentor at Thinkful). Udacity and Coursera both have data science programs too. The point is that to become a data scientist, you have to study a lot. You might notice a potential contradiction between point 1 above and this paragraph. The short explanation is that becoming a data scientist takes a lot of time and effort. The post “Teach Yourself Programming in Ten Years,” written in 2001 about programming, is just as relevant in 2017 to data science.

Featured image is based on a photo by Jase Ess on Unsplash

Don’t study data science as a career move; you’ll waste your time!

March 2019: Two years after the completion of this post I wrote a follow-up. Read it here.

January 2020: Three years after the completion of this post, I realized that I had written a whole bunch of career advice. Make sure you check this link that collects everything that I have to say about becoming a data scientist.

No, this account wasn’t hacked. I really think that studying data science to advance your career is wasting your time. Briefly, my thesis is as follows:

  • Data science is a term coined to bridge between problems and experts.
  • The current shortage of data scientists will go away, as more and more general purpose tools are developed.
  • When this happens, you’d better be an expert in the underlying domain, or in the research methods. The many programs that exist today are too shallow to provide any of these.

To explain myself, let me start from a Quora answer that I wrote a year ago. The original question was:

I am a pharmacist. I am interested in becoming a data scientist. My interests are pharmacoeconomics and other areas of health economics. What do I need to study to become a data scientist?

To answer this question, I described how I gradually transformed from a pharmacist into a data scientist by continuously adapting to the new challenges of my professional career. In the end, I invited anyone to ask personal questions via e-mail (it’s boris@gorelik.net). Two days ago, I received a follow-up question:

I would like to know how to learn data science. Would you suggest a master’s degree in analytics? Or is there another way to add “data scientist” label on my resume?

Here’s my answer that will explain why, in my opinion, studying data science won’t give you job security.

Data scientists are real. Data science isn’t.

I think that while “data scientists” are real, “data science” isn’t. We, the data scientists, analyze data using the scientific methods we know and the tools we have mastered. The term “data scientist” was coined about five years ago for the job market. It was meant to help bring the expertise and the positions together. How else would you describe a person who knows scientific analysis and machine learning, writes computer code, and isn’t too abstract a thinker to understand the business needs of a company? Before “data scientist,” there was the less catchy “dataist” http://www.dataists.com/. However, “data scientist” sounded better. Only after “data scientist” became a reality did people start searching for “data science.” In the future, data science may become a scientific field, similar to statistics. Currently, though, it is not mature enough. Right now, data science is an attempt to merge different disciplines to answer practical questions. Sometimes, this attempt is successful, which makes my life and the lives of many of my colleagues so exciting.

Hilary Mason, from whom I learned the term “dataist”

One standard feature of most, if not all, data science tasks is the requirement to understand the underlying domain. A data scientist in a cyber security team needs to have an understanding of data security, a bioinformatician needs to understand the biological processes, and a data scientist in a financial institution needs to know how money works.

That is why, career-wise, I think that the best strategy is to study an applied field that requires data-intense solutions. By doing so, you will learn how to use the various data analysis techniques. More importantly, you will also learn how to conduct complicated research, and how the analysis and the underlying domain interact. Then, one of two things will happen. You will either specialize in your domain and become an expert, or you will switch between several domains and learn to build bridges between the domains and the tools. Both paths are valuable. I took the second path, and it looks like most of today’s data scientists took that route too. However, sometimes, I am jealous of the specialization I could have gained had I not left computational chemistry about ten years ago.

Who can use the “data scientist” title?

Who can use the “data scientist” title? I started presenting myself as a “data scientist and algorithm developer” not because I passed some licensing exams or had a diploma. I did so because I was developing algorithms to answer data-intense questions. Saying “I’m a data scientist” is like saying “I’m an expert,” or “I’m an analyst,” or “I’m a manager.” If you feel comfortable enough calling yourself so, and if you can defend this title before your peers, do so. Among the six data scientists in my current team, we have a pharmacist (me), a physicist, an electrical engineer, a CS major, and two mathematicians. We all have advanced degrees (M.A. or Ph.D.), but none of us had any formal “data science” training. I think that the many existing data science courses and programs are only good for people with deep domain knowledge who need to learn the data tools. Managers can benefit from these courses too. However, by taking such a program alone, you will lack the experience in scientific methodology, which is central to any data research project. Such a program will not provide you with the computer science knowledge and expertise to make you a good data engineer. You might end up a mediocre Python or R programmer who can fiddle with the parameters of various machine learning libraries, one of many. Sometimes it’s good enough. Frequently, it’s not.

Lessons from the past

When I started my Ph.D. (in 2001), bioinformatics was HUGE. Many companies had bioinformatics departments that consisted of dozens, sometimes hundreds, of people. Every university in Israel (where I live) had a bioinformatics program. I knew at least five bioinformatics startups in my geographic area. Where is it now? What do these bioinformaticians do? I don’t know any bioinformatician who kept their job description. Most of those I know moved into data science, and some became managers. Others work as governmental clerks.

The same might happen to data science. Two years ago, Barb Darrow from Fortune magazine wrote, quoting industry experts:

Existing tools like Tableau have already sweated much of the complexity out of the once-very-hard task of data visualization, said Raghuram. And there are more higher-level tools on the way … that will improve workflow and automate how data interpretations are presented. “That’s the sort of automation that eliminates the need for data scientists to a large degree,” … And as the technology solves more of these problems, there will also be a lot more human job candidates from the 100 graduate programs worldwide dedicated to churning out data scientists
Supply, meet demand. And bye-bye perks.

My point is, you have to be both versatile and an expert. The best way to become one isn’t to take a crash course but to solve hard problems, preferably under supervision. Usually, you do so by obtaining an advanced degree. By completing an advanced degree, you learn, you learn to learn, and you prove to yourself and your potential employers that you’re capable of bridging the knowledge gaps that will always be there. That is why I advocate obtaining a degree in an existing field, keeping data science as a tool, not a goal.

I might be wrong.

Giving advice is easy. Living the life is not. The path I’m advocating for worked for me. I might be completely wrong here.

I may be completely wrong about data science not being a mature scientific field. For example, deep learning may be the defining concept of data science as a scientific field on its own.

Credits: The crowd image is by Flickr user Amy West. Hilary Mason's photo is from her site https://hilarymason.com/about/