The Power of Knowledge Sharing and Public Speaking

For the past three and a half years, I’ve hosted the Hebrew podcast This Week in the Middle East. Despite not being a Middle Eastern studies expert and knowing little Arabic, my passion and curiosity have led me on a remarkable journey of sharing knowledge and public speaking.

Each week, I’ve engaged with experts on various topics, providing insights into the Middle East and highlighting the importance of diverse voices. This experience has reinforced the value of knowledge sharing and open dialogue.

Recently, a major channel invited me to discuss Ramadan, despite my lack of formal credentials.

This illustrates that expertise extends beyond degrees to include passion, learning, and effective communication. 

I am already perceived as a #datavisualization expert, and now people have started asking for my opinion in a completely different field. How did this happen?

It starts with talking about something. Then, afraid of sounding foolish, you study the subject to avoid embarrassment, and eventually you really do become an expert.

As we progress in our careers and lives, the significance of voicing our thoughts, exchanging ideas, and embracing various viewpoints becomes clear. Through such engagements, we evolve, learn, and foster meaningful discussions. Let’s keep breaking barriers through conversation.

Don’t be afraid to explain. Really, don’t

In data visualization, much like in any form of communication, it’s vital to keep the main point front and center. That’s precisely why I’m a proponent of a clean, minimalistic approach to crafting data visuals, coupled with the inclusion of descriptive titles for each graph. These titles aren’t just fluff; they serve as a psychological lever, aiding in persuading your audience of your argument. Moreover, the act of titling forces a second look at the graph to ensure it accurately represents your intended message.

During a recent practical data visualization workshop I led, we tackled creating a graph that illustrated the income inequality in Israel in comparison to OECD countries. In the “before” version of the graph, displayed below on the left, there’s a noticeable redundancy between the title and the Y-axis label. Both essentially echoed each other, added no real value, and worst of all, were obscure to anyone not versed in the jargon of the “Gini Index”.

Our strategy for improvement was straightforward but effective: we swapped the title for the overarching conclusion. This modification was the kickoff for a cascade of enhancements. Yet, we hit a snag with the Gini Index itself—our focal point. Our solution? We underscored the fact that this index is a measure of inequality, clarified its scale (“Higher – more unequal”), and kept the term for those already in the know.

Wrapping up, the derision towards explaining the seemingly obvious, sparked by the “mansplaining” trend, has bled into all areas of communication. However, in the realm of data visualization, clarity and comprehensibility must reign supreme. By making our visual presentations both accessible and elucidatory, we widen the doorway for a more extensive audience to connect with and grasp complex information.

When a Model Fails, Make a Modelade

Or, How to Extract Value from Failed Projects

Typically, a professional post should begin with an introductory paragraph that provides some background and engages the reader. Let’s pretend that such a paragraph has been written and proceed directly to the story at hand. This story doesn’t end happily, nor does it end sadly. It simply begins and ends, and that’s all. Nonetheless, it’s worth your time.

The Story

I have a client who implements a variety of smart algorithms to assist individuals with money and innovative ideas in making a positive impact on society and the environment. They requested my help with modeling, and given my extensive experience as a data scientist and my track record of building numerous predictive models, I was keen to assist.

After working our modeling magic, we ended up with a predictive model that was “statistically significant” but practically unusable. What do I mean by that? Typically, we evaluate the value of a predictive model by comparing the predicted values with the known (“observed”) ones. Standard comparison procedures involve a correlation metric (or R-squared, which is NOT a correlation metric) and, for those who want to sound intellectual, a p-value. Both these metrics were excellent in our model. We also generated plots for a more comprehensive analysis. Below is a representation of our data using a completely fictitious dataset (rest assured, I would never share a client’s data).

Statistically Significant but Practically Useless

The graph indeed looks promising: the correlation coefficient was over 0.95, and the p-value (I can’t believe I’m resorting to this!) was 0.00000001, which is considered “excellent.” 

However, there’s the rub: echoing an old Russian proverb, “you can’t spread the p-value on your bread”, or to quote a less old Hebrew saying, “you can’t pay with a correlation coefficient at the grocery store.” Statistical tests demonstrate the existence of a connection between your model and reality. Yet, this connection isn’t sufficient for making informed decisions due to the excessively high spread in our case.

For our model to be of practical use to my client’s clients, the typical deviation from real life should be within an order of magnitude of 0.5 (whatever the units may be). If the deviation is higher, my client’s clients would be wasting time and money. Despite our diligent efforts and the extensive work with the client’s team, the typical deviation was significantly larger, rendering the model practically useless.
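The gap between "statistically significant" and "practically useful" can be sketched in a few lines. The snippet below is only an illustration on fictitious data: the value range, the noise level, the sample size, and the names `pearson_r`, `rmse`, and `TOLERANCE` are all assumptions made for this example, and the root-mean-square error is used here as one possible reading of "typical deviation".

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(predicted, observed):
    """Root-mean-square error: one way to express the 'typical deviation'."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Fictitious data (assumption): observed values spread over a wide range,
# predictions equal to the observed value plus noise with a spread of 5 units.
random.seed(0)
observed = [random.uniform(0, 100) for _ in range(1000)]
predicted = [o + random.gauss(0, 5) for o in observed]

TOLERANCE = 0.5  # the practical requirement mentioned in the text

r = pearson_r(predicted, observed)
typical_deviation = rmse(predicted, observed)

print(f"correlation: {r:.3f}")
print(f"typical deviation: {typical_deviation:.2f} (tolerance {TOLERANCE})")
print("usable for decisions:", typical_deviation <= TOLERANCE)
```

Because the observed values span a wide range, the correlation stays very high even though a typical prediction misses by roughly ten times the tolerance.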

… or is it?

An old Yiddish adage offers wisdom: [yeah, I don’t have anything relevant, but I’m sure there is one]. Consider our situation: we’ve spent considerable time and effort building a model. Does the model’s prediction bear any relation to reality? Yes. Are the deviations too high? Again, yes. What does this mean? It indicates that many instances we’re trying to model don’t behave as anticipated based on our data. Herein lies an opportunity.

In this project, we’re attempting to forecast a key business metric. If an entity’s metrics are notably worse than expected, this identifies a significant opportunity for improvement—a low-hanging fruit. Consequently, my client or my client’s clients could approach the entity and offer assistance.

However, there’s another side to this. What hasn’t been mentioned is that the “observed” data comes from self-reports. This implies that some reports may be manipulated to portray a more optimistic picture than reality. Therefore, the same model can be used to identify potential “mistakes” 😉 in self-reports, which is a valuable exercise in its own right.
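Both uses of a "failed" model come down to looking at the residuals, that is, the observed value minus the predicted one. Here is a minimal sketch of the idea, assuming that a higher metric means a better result; the function name `flag_outliers`, the threshold, and the numbers are hypothetical and not taken from the actual project.

```python
def flag_outliers(predicted, observed, threshold):
    """Split entities into two groups by how far reality is from the prediction.

    Assumes a higher metric is better.
    Returns (underperformers, suspicious) as lists of entity indices.
    """
    underperformers = []  # reported much worse than predicted: room to improve
    suspicious = []       # reported much better than predicted: check the report
    for index, (p, o) in enumerate(zip(predicted, observed)):
        residual = o - p
        if residual < -threshold:
            underperformers.append(index)
        elif residual > threshold:
            suspicious.append(index)
    return underperformers, suspicious


# Made-up numbers for four entities.
predicted = [10.0, 10.0, 10.0, 10.0]
observed = [10.5, 3.0, 18.0, 9.0]

underperformers, suspicious = flag_outliers(predicted, observed, threshold=5.0)
print("candidates for assistance:", underperformers)
print("self-reports worth a second look:", suspicious)
```

The threshold is a business decision rather than a statistical one: it sets how large a gap has to be before someone picks up the phone.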

A happy ending?

The typical “war story” of a freelance consultant generally concludes with the client accepting the consultant’s insights, raking in substantial profits, and treating the consultant to a swanky race car. Let’s pretend that this is what happened, despite the reality: my client listened to my take and decided to invest their resources in procuring more high-quality data. 

Of course, if you ever need help with your modeling, feel free to reach out to me. And if a model doesn’t turn out as productive as you’d hoped, we can always attempt to make a rewarding ‘modelade’ from it. I’m always reachable at boris@gorelik.net.

Single-Handed Development: A Recipe for Trouble

[copied from my Substack newsletter]

The subject of this post primarily revolves around creators of digital solutions, such as programmers, designers, analysts, and data scientists. Regardless of whether you identify as one or manage one, I assure you there’s a valuable takeaway for all within this read.

We often encounter “lone wolves,” individuals who are the sole professional in their field within their organization. This situation typically arises when the company lacks the resources to employ more than one programmer, designer, or analyst. Such circumstances can pose significant risks and necessitate proper risk management. 

Now, you may say, “What about C-level managers? They too are often the sole professionals in their field, and working alone is the norm.” I’ll try to address the scenario of C-levels later, but let’s concentrate on everyone else.

What’s the big deal?

So, what’s the big deal?

To put it succinctly, we’re discussing knowledge workers: individuals who translate their brain prowess into value. To maximize this value creation, the process needs to be as efficient and honed as possible. The Talmud tells us about Rabbi Hama bar Hanina, who said, “Just as a knife is sharpened only by the steel of its mate, so too, a scholar [knowledge worker, in our context] is sharpened only by his fellow.” When knowledge workers lose that sharpness, the quality of their work suffers. The output becomes suboptimal due to a lack of adversarial oversight, the curse of knowledge, and the bus factor. Let’s delve into each of these aspects.

Lack of adversarial oversight

When I operate within a team alongside peers, I understand that my work is continually subject to review. This can, and should, be a formal review process, such as code review in programming. But it can also take the form of informal exchanges of ideas during daily communication. A healthy organization fosters a culture of review and constructive debate. In such an environment, everyone is expected to give and receive feedback, and everyone expects to challenge others and to be challenged. This potential for critique and the ongoing drive to critique others maintain a state of alertness and motivation for continuous improvement.

The Talmud, which I mentioned earlier, is full of records of scholars disputing and challenging one another. That’s how these scholars ensured their constant intellectual growth. The renowned philosopher Karl Popper presented the concept of risky predictions, suggesting that any hypothesis—or piece of code, for that matter—should be bold enough to potentially be proven incorrect. If these bold assertions undergo testing and remain unrefuted, they are deemed accurate and, I would add, their author is deemed credible.

Now, consider our Lone Wolf. There’s no one to criticize them, no one to review their work, and no one requiring their review for their own tasks. The Lone Wolf’s colleagues hold deep respect for them because they’re the only ones in the team who know how to program, design a logo, or perform p-hacking. They admire the Lone Wolf, they appreciate the Lone Wolf, but they fail to sharpen the Lone Wolf’s skills.

The curse of knowledge

Rarely do you know what I don’t know. What may seem trivial to you could be completely enigmatic to me, and you might not even realize it. This phenomenon is known as the “curse of knowledge.”

One issue associated with the curse of knowledge relates to the first point of this post, the absence of oversight. When certain pieces of information seem obvious, you start treating your hypotheses and assumptions as facts and behave accordingly. Another risk is that the curse of knowledge can lead to inadequate planning and documentation.

When working solo on a small or moderately-sized project, you know exactly what’s happening within it. To onlookers, you resemble the archetypal chef in the kitchen, effortlessly grabbing the sharpest knife from the drawer without a second glance and instinctively knowing where every spice jar is located. 

But this seamless workflow can falter when one of two scenarios occurs: either your project becomes too large for you to manage all its details mentally, or a new member joins your team. In both cases, you start losing time trying to remember which function performs the preprocessing, which directory houses the client’s mockups, and which file contains the up-to-date data versus which one is merely a backup. Consequently, your code becomes less clean, you design against incorrect mockups, and your analysis is flawed. Worse still, you produce these substandard results in quadruple the time it would have taken had you properly planned and documented.

The risk of such chaos is significantly reduced when two or more colleagues collaborate in a team. Like dance partners, they must be careful not to step on each other’s toes, which encourages them to dedicate time to planning and documentation. As a team, they have more strength to resist the constant pressure to sacrifice quality, planning, and documentation for speed.

The bus factor

The bus factor refers to the risk associated with information and capabilities not being shared among team members, a concept that draws from the hypothetical scenario of ‘what if they were hit by a bus.’ Of course, we don’t need to be so morbid. Team members can leave their roles for various reasons: they might become parents, win the lottery, choose a monastic life, or any other myriad of joyful reasons. When a team member departs, there should exist some level of redundancy to compensate for the expertise lost. The issue with a lone professional leaving the organization extends beyond having no one to perform their tasks; there’s also no one who **knows** how to perform their tasks.

Onboarding a replacement for a team member is always a challenge. However, since the Lone Wolf wasn’t subjected to constant reviews (lack of oversight) and didn’t allocate enough time for planning and documentation (remember the curse of knowledge?), what is a challenge for a multi-member team escalates into a nightmare that can halt operations for weeks or even months. I have witnessed situations like this first-hand: I saw thousands of lines of code rewritten from scratch because nobody knew how they worked.

This resulting chaos not only disrupts workflow but also creates undue stress on the remaining team members and the new recruit, who must scramble to fill a knowledge gap without a clear roadmap. It’s a stark reminder of the importance of collaborative processes and shared understanding within a team.

Is there a solution?

The old saying goes, “It’s better to be young, healthy, and wealthy than old, sick, and poor.” Similarly, it’s obviously preferable to hire at least two professionals for each role. However, this isn’t always feasible. Even if budget isn’t a constraint, you might not require two designers, analysts, or programmers on your team. Having a bored knowledge worker is an issue in itself, warranting a separate discussion. 

So, what alternatives are there? One approach to mitigating this issue is to introduce a part-time colleague, a co-pilot, or a proverbial sidekick—either as a hired employee or a freelancer. This additional team member’s role would be to serve as a sharpening tool for your Lone Wolf, their sparring partner, someone with whom they can exchange ideas, ensure nothing is taken for granted, and verify that the correct processes are adhered to. For this arrangement to be effective, the co-pilot should not be a one-time visitor but rather a regular contributor. They need to understand your company’s business and culture and become a part of its institutional memory. In this way, you not only ensure a proper workflow but also safeguard against embarrassing bugs and unforeseen departures. 

One of the services I provide aligns exactly with this solution, and you’re encouraged to reach out if you’re seeking a collaborator for your Lone Wolf.

Feedback Fertilizer, Shit Sandwiches, and Other Musings on Growing Careers Like Plants

Copied from Substack newsletter

Let’s go pseudo-intellectual, shall we?

Feedback: The Essential Ingredient

One of the advantages of being a freelance consultant, as opposed to a traditional employee, is the opportunity for more frequent feedback. Each piece of feedback is precious, steering your career path. Positive feedback? Even better – it can truly make your day.

Freelancers typically receive more feedback. But this doesn’t mean that traditional employees should settle for less. If you’re in a management position, remember to provide feedback regularly. And don’t shy away from seeking feedback from your own managers.

If I were going for a poetic analogy, giving feedback could be compared to watering a plant: it needs to be done regularly and in the right amounts, or the plant either withers or becomes waterlogged.

To keep things straightforward, here’s the key takeaway: Feedback is essential.

Now, speaking of feedback, let’s discuss the feedback strategy that should disappear from the face of this world – the ‘shit sandwich.’

Against the ‘Shit Sandwich’

Fact check No. 1: The sandwich, as a concept, was savored by Hillel, a Talmudic scholar from the 1st Century BC, centuries before it was “invented” by a notorious British gambler of the same name.

Fact check No. 2: Sabich ([saˈbiχ]), a pita bread sandwich filled with fried eggplants, hard-boiled eggs, chopped salad, parsley, amba, and tahini sauce, is the best street food dish globally. That’s it. Full stop. Period. The debate ends here.

Sabich. By the Wikipedia user Gilabrand under the Creative Commons Attribution-Share Alike 3.0 Unported license

Fact check No. 3: The ‘shit sandwich’ is a feedback strategy conceived by corporate America that basically involves sandwiching negative feedback between two layers of positive feedback. In theory, this approach aims to deliver bad news in a way that doesn’t hurt someone’s feelings. In practice, however, I’ve yet to meet someone who reacted positively to this method. Instead, I’ve heard numerous stories of people being called for a pre-dismissal hearing (a “performance review,” as they call it) without realizing it because the “bad news” was sugar-coated with fake positivity. 

I suspect the real motive behind the ‘shit sandwich’ is to make the delivery of feedback more comfortable for the giver rather than the receiver. Giving harsh feedback is challenging, but the person on the receiving end deserves the dignity of hearing your honest opinion. Make the effort – they’re worth it.

Sometimes, good enough is good enough

Copied from my Substack newsletter

I want to share an experience I had with a CEO-entrepreneur that might offer some valuable insights for other managers and business owners who struggle to deliver projects. I wish this were a success story post, but I see this case as a personal failure.

Before we continue, it’s a good time to remind you to share this newsletter with your colleagues.

This CEO approached me with concerns about the security of her custom-tailored questionnaire website, which couldn’t be created on standard platforms like Typeform or Crowdsignal. A small development company had built her site using WordPress, but an “expert” had warned her about potential security risks. Not knowing what to do, she spent a month searching for help before turning to me for advice on securing her site. 

Here’s what I told her

You don’t necessarily need a fortress for a website

Securing a site is like securing an office or a house. There’s no limit to how secure they can be, but there is a limit to how much time and effort you should spend on it. Most likely, your house doesn’t have armed personnel patrolling its perimeter, but strategic infrastructure buildings, such as an electric company, do.

Understanding your website’s potential vulnerabilities, the consequences of these vulnerabilities, and the resources you are willing to invest to mitigate them is vital. The best approach is to compile a list of potential security events and their potential impacts on your business, your customers, and the general public. Then, estimate the likelihood of each event. With this list in hand, you can engage a skilled consultant to formulate a plan.

Premature optimization as a form of procrastination

There’s a saying in the software industry, “Premature optimization is the root of all evil.” Of course, you should strive to deliver the best product you can. In an ideal world, you would build a perfect product, produce bug-free code, and write flawless documents. Realistically, however, you need to balance costs and benefits, allocate limited resources, and manage uncertainty. Thus, “good enough” is not always a sign of laziness but can be the most practical approach. 

The urge to optimize often stems from a lack of understanding or could be a form of perfectionism that masks procrastination. So, how can you ensure you’re on the right track? Seek advice from a trusted friend, colleague, or consultant. Ask open, non-leading questions, and be genuinely willing to consider perspectives other than your own. 

(Yes, you can reach out to me at: boris@gorelik.net)

I failed

Back to the CEO-entrepreneur.  During our conversation, I learned about her business and potential customers and concluded that her venture didn’t require the same security measures as a bank or a utility company. I researched the development company that built her site and found them reputable. So, I suggested launching the service, starting to work, acquiring new customers, making money, and allocating a portion of the income for future security investments.

Unfortunately, she disagreed with me. It has been five months since our conversation, and she’s still searching for the right security expert to rewrite her entire site from scratch. Two days ago, I asked her about her site. “It’s almost done,” she told me.

Are her customers waiting for her? I’m not sure.

To sum up

Recognizing the balance between perfect and practical can prevent unnecessary delays in your business ventures. Don’t hesitate to contact me if you’re looking for advice on how to navigate these waters.

Calling Bullshit on ‘Management is not Promotion’

“Climbing Invisible Ladders and Falling into Deep Holes: A Discourse in Five Parts” is a witty, engaging, and profoundly insightful exploration of corporate dynamics and career progression.

Climbing Invisible Ladders and Falling into Deep Holes: A Discourse in Five Parts

DRAMATIS PERSONAE

BORIS: A seasoned data scientist, middle-aged but ridiculously good-looking. An ex-Soviet Israeli, he adds an extra layer of cynicism to his character, complemented by a mysterious Russian-Israeli accent.

LAURA: The epitome of kindness, Laura is an American HR manager, and potentially the nicest person you’ll ever meet. She wears a constant, sincere smile.

DAPHNE: As a junior software developer, Daphne is smart and ambitious, constantly seeking opportunities to grow and evolve in her career.

Part 0. Prologue

FADE IN:

INT. HOTEL BAR – NIGHT

BORIS, LAURA, and DAPHNE sit around a table, each wearing a company name tag. A thought bubble appears above BORIS, reading, “It’s bullshit.” Boris shakes his head, dispelling the bubble.

LAURA

(Thoughtfully)

I hear you, Daphne. It’s great that you’re considering a promotion just three months into your first job. However, you should understand that management is more of a lateral move than a vertical one.

BORIS

(Shakes head, dispelling the “bullshit” thought bubble again, speaks sternly)

Laura, I strongly disagree.

A thought bubble appears above LAURA, reading, “Not him again.”

LAURA

(Smiling)

Interesting, Boris. Why do you think so?

BORIS

(Sighs)

Let’s discuss the term “promotion.” What do we seek when we aim for a promotion? More money, more autonomy, and a higher social status, wouldn’t you agree?

LAURA

(Nods)

Absolutely! And that’s exactly why transitioning to leadership roles doesn’t necessarily mean more money. We compensate employees based on their impact, not their position in the organizational chart. We also value and celebrate developers as much, if not more than managers, so their social status is already at its peak. All that managers do is facilitate developers in performing their jobs.

Part 1. Social status

BORIS

Here’s where I beg to differ. Even the terminology we employ suggests a higher social status. I have a “manager,” a “team leader,” or even a “boss.” Regardless of how much you’d like me to believe that a manager’s role is to assist me, they’re still referred to as a manager, not an assistant. Moreover, my manager has a direct influence on my evaluation, an influence I don’t hold over them.

LAURA

(Amused)

Boris, you couldn’t be more mistaken! Have you forgotten the engagement survey that you complete each year? We specifically ask for your thoughts on your team lead.

BORIS

Yes, but you ask both me and all my teammates, so my individual voice is diluted. Moreover, my team leader’s superior —

(Cynically, with air quotes)

“l e a d e r,” not assistant —

(Continues)

provides their direct feedback.

(Sips from a glass of cheap gin, longing for it to be Arak)

And that’s just one aspect. Our vacation policy is indeed generous, but it explicitly states that I need my team lead’s approval before taking time off. My team lead doesn’t require my consent for their time off; they consult their own superior. So yes, a manager does hold a higher social status than an individual contributor.

Part 2. Autonomy

LAURA

You know what? I’ll give you that. But when it comes to professional autonomy, an experienced individual contributor has the full power to decide how they solve the problem they work on.

(Daphne smiles)

BORIS

(takes another sip from the gin glass)

Oh, this is not true either. Take me as an example. I’m not a manager. It is true that I have the autonomy to decide how to solve a problem, but I often don’t get to decide what problem to solve. I can have some influence on this matter, but when my opinion collides with the opinion of my manager or their managers, my opinion is put aside.

DAPHNE

(interrupts)

Right, the other day…

LAURA

(irritated)

You can always take initiative and start working on something that really interests you.

(adds pathos to her voice)

In our company, you can write your own history. Identify a problem, start working on a solution in your spare time, and one day you may convince the management that the solution is worth adopting and expanding.

DAPHNE 

(sarcastically) 

Free time? You must be kidding.

BORIS 

For once, I agree with Laura. We have some free time, and moderate switching between projects might be a good form of rest. Not only that, but a technical hands-on person might also have more tools to solve a technical problem. But… a manager usually has better knowledge of company needs and, more importantly, company politics. That is why a manager’s pet project has a higher chance of being accepted by the company than the one initiated by an IC.

LAURA 

(doubtful) 

Hmm… I don’t know… Well, at least in terms of money, management isn’t promotion.

Part 3. Money

BORIS 

(Chuckles)

Ah, money. Who doesn’t love money? However, I’m afraid I must disagree with you on this one.

LAURA 

(Joyfully)

Well, as the head of HR, not you, I understand how compensation is calculated. You are all compensated based on your impact on our business, nothing more. I know several managers who earn less than the individuals they manage.

(Makes a dramatic pause)

It’s all about the impact!

BORIS 

(Points a finger)

Correct. I presume that when you talk about these managers, you’re primarily referring to team leads. Am I right?

LAURA 

(Pauses, then nods)

Actually, yes.

BORIS 

(Chuckles)

You see, a team lead may earn less than a developer, researcher, or designer they manage, especially if they oversee senior and experienced professionals. But who can bring a greater impact to the business: a senior programmer or a senior manager?

DAPHNE 

(Looks puzzled)

What do you mean?

BORIS 

(Turns to Daphne)

Let’s take me as an example. I’m an outstanding data scientist, one of the best in the field.

(LAURA and DAPHNE nod in agreement)

Nevertheless, my brain operates optimally only 8-9 hours a day. On the other hand, David, the head of the Modelling division, is also a top-tier professional, and he too works 8-9 hours a day. But since he is at the helm of a division, his work is amplified by the ten people who work under him.

LAURA 

There are thirteen now; we’ve hired two additional scientists.

BORIS 

(Turns to Daphne)

See what I mean? David’s impact is over a dozen times larger than mine. Therefore, it would only make sense for his salary to exceed mine.

Everyone falls silent. Boris signals for a refill. Daphne appears dejected.

Part 4. Don’t lose your sleep over this

BORIS 

(Looks at Daphne) 

Don’t lose sleep over this.

Boris takes a salt shaker, opens it, pours the salt onto the table, and draws two partially overlapping bell curves.

BORIS 

Here’s an analogy. Consider men and women. On average, men are stronger than women, right?

Laura and Daphne nod their heads. Laura looks concerned, anticipating that Boris might say something stupidly inappropriate.

BORIS 

(Points to the salt) 

In this graph, the X-axis represents strength, and the two curves represent men and women. Now, what does this mean? Does it mean that all men are stronger than all women? Certainly not! There are many strong women and many weak men! You can see this by looking at the overlapping part of these curves here.

He points to the intersecting area of the two curves drawn in the salt.

BORIS 

(Continues) 

Now, let’s return to our original discussion. Let’s say that the X-axis now stands for “promotion” — a vague amalgamation of social status, power, and money. I hope I’ve convinced you that management is a form of promotion, but consider these curves. As in the men versus women case, there will be many individual contributors positioned higher on the promotion axis than some managers.

Laura looks relieved. Daphne is deep in thought.

DAPHNE

That makes sense. I could focus on improving my development skills… Conversely, I could invest the same energy into enhancing my management skills and transition to the better curve.

The atmosphere in the room becomes dense with contemplation.

Part 5. You have to like your job

BORIS 

(pensively) 

You’re right but also somewhat wrong. Becoming good at your work is hard. It becomes even harder if you don’t enjoy it. If you like managing people, go for it. Enjoy the process, grow your skills, and plan the mansion you want to buy when you’re a big-shot CEO.

He takes a sip of his drink.

BORIS 

(continuing) 

But if you enjoy writing good code more than dealing with people, you might become miserable during your quest for a management career. Being miserable won’t leave enough energy to improve your skills, and you might end up as a mediocre, bitter, mid-level manager who’s jealous of her younger self.

Laura smiles in agreement.

BORIS 

(sincerely) 

I fully agree with that. In the past, in some companies, such a move would be perceived as a demotion, but not now and not here. Nowadays, many companies, small and big, recognize that management is a separate profession. The atmosphere in our company is kind enough to accept that people need to search for their path in life. Take me, for example: I “stepped down” from a management position twice. I don’t regret taking those positions. Neither do I regret stepping down from them.

Daphne looks thoughtful, contemplating the information she’s received.

DAPHNE 

(tentatively) 

I think I need to explore more about myself. I need to see what suits me best. But I guess I won’t know until I try.

Laura and Boris share a look of approval.

Boris raises his glass for a toast.

BORIS 

(smiling) 

To exploration and finding what truly makes us happy!

Everyone raises their glasses, and the scene ends on a positive note of camaraderie and mutual respect.

FADE OUT.

FADE IN.

INT. HOTEL BAR TABLE – NIGHT

The conversation has come to a natural end. Boris stays at the table. Daphne and Laura are leaving the bar, their faces illuminated by the soft lights of the lobby.

LAURA

(sincerely)

Remember, Daphne. You have a whole community here that believes in you. Reach out anytime.

(checks to make sure nobody’s listening)

And remember, don’t take Boris too seriously. He’s… well… he’s Boris.

FADE OUT.

THE END

Direction Matters. My new newsletter

So, I started a Substack newsletter called “Direction Matters” (I hope you like the wordplay).

https://directionmatters.substack.com

It doesn’t matter how hard you push if you’re pushing in the wrong direction.

Direction Matters is a newsletter that focuses on teamwork, communication, and data, delivered with a blend of candid honesty and just the right amount of cynicism.

People managers will find value in the fresh perspectives, real-life case studies, and insightful advice on how to lead their colleagues effectively and with empathy.

For byte managers—an inventive term for individual contributors—I offer an opportunity to enhance communication skills, broaden their perspective, and learn strategies for making a more significant impact within their teams and organizations.

Join me on this exciting journey as we delve into the intricacies of teamwork, communication, and data-driven decision-making. Let’s find the right direction together.

Published
Categorized as blog

Prompt engineers, the sexiest job of the third decade of the 21st century (?), or Don’t study prompt engineering as a career move, you’ll waste your time

Do you recall when data scientists were the talk of the town? Dubbed the sexiest job of the 21st century, they boasted a unique blend of knowledge and skills. I still remember the excitement I felt when I realized that the work I did had a name, and the warm feeling I got when I saw those cool Venn diagrams showing just how awesome data scientists were. Well, it’s time for data scientists to step aside and make way for the new heroes in town: the Prompt Engineers!

The demand for prompt engineers is soaring, and it seems like everyone is trying to become one. But what exactly is a prompt engineer, and what are my thoughts on this new profession?

Let’s take a step back in time: we started with assembly languages, and then came a language called Formula Translator (better known as Fortran), which significantly lowered the barrier to entry into the field. I’m sure back then, people rolled their eyes and said that with the emergence of high-level programming languages, anyone could now take any formula and get an output, without understanding how semiconductors worked.

Fast forward to today. What do prompt engineers do? They essentially translate their domain knowledge, language understanding, and AI algorithm expertise into computer output (sounds like “ForTran,” right?). Prompt engineering is, in essence, a super-high-level programming language. Over time, I believe we’ll see dedicated tools and established standards emerge. But for now, it’s a wild, untamed frontier.

In 2017, I wrote a blog post titled “Don’t study data science as a career move; you’ll waste your time!”. To this day, it is the most-read post on my blog. Now, it’s time for a new warning: “Don’t study prompt engineering as a career move; you’ll waste your time!”

Meanwhile, here’s a nice Venn diagram for you 🙂

Not a feature but a bug. Why having only superstars in your team can be a disaster.

Read this to learn about well-rounded teams that can effectively collaborate and communicate. As an experienced team leader and builder, contact me to learn more about my services and how I can help you achieve better outcomes.

As a freelancer and a manager, I have worked with many companies and teams. Recently, I talked to a CEO who built a data science team that consisted of several “wonder kids” who obtained university degrees before graduating high school. The CEO was very proud of them. However, he complained that they didn’t deliver as expected. This made me realize that having only superstars is not a feature but a bug.

The fact is that most of us are average. Even geniuses are average in most aspects. Richard Feynman, the Nobel laureate physicist, was also a painter, musician, and an excellent teacher, but he is unique. I, for example, tend to think of myself as an excellent generalizer, leader, and communicator. However, I need help with attention to detail and deep domain-specific knowledge. To work well, I need to have pedantic specialists in my team. Why? Because, on average, I’m average.

Most “geniuses” are extremely talented in one field but still need help in others. Many tend to be individual workers, meaning their team communication is often suboptimal. Additionally, the fact that the entire team is very young also means they lack expertise in project management, inter-team communication, and business orientation, or simply lack enough real-life experience. The result: a disaster. That company got a team of solo players who don’t communicate within the team, don’t communicate with other teams, and don’t deliver on time.

What do I suggest? They say that “A’s hire A’s”. However, this doesn’t mean that each “A person” must ace the same field. A good team needs an A generalizer, an A specialist, an A communicator, and an A business expert. If you only hire “A++ specialists,” you risk ending up with a group of individuals who are “C-” communicators.

As another CEO I consulted once told me, “genius developers can do 10x job. They also tend to enter rabbit holes, and if unattended, they can do 10x damage.” If you build a team, you cannot afford to have unbalanced expertise sets. 

The bottom line is to ensure your team is diverse in its capabilities. Hiring only superstars may seem like a good idea, but it can result in a lack of collaboration, communication, and the necessary skills to succeed as a team. A diverse team with various skills and expertise is essential for achieving better outcomes.

In conclusion, avoid falling into the trap of thinking that only superstars can make a great team. Instead, focus on creating a diverse team with various skills, and you’ll be surprised at how much your team can achieve.

Modern tools make your skills obsolete. So what?

Read this if you are a data scientist (or another professional) worried about your career.

So many people, including me, write about how fields such as copywriting, drawing, or data science change from being accessible to a niche of highly professional individuals to a mere commodity. I claim it’s a good thing, not only for humankind but for the individual professional. Since I know nothing about drawing, I’ll talk about data science.

I started working as a data scientist a long time ago, even before the term data science was coined. Back then, my data science job included:

  • writing code that implements this optimization algorithm or the other
  • writing code that implements this statistical analysis or the other
  • writing code that implements this machine learning technique or the other
  • writing code that implements this quality metric or the other
  • writing code that handles named columns
  • writing code that deals with parallelization, caching, fetching data from the internet

Back then, exactly when the term data scientist was coined, I used to say “data is data”. I claimed that it didn’t matter whether you wrote a model that detects cancer or one that detects online fraud, a model that simulates two molecules in a solution or one that simulates players in the electric appliances market. Data was data, and my job, as a data scientist, was to crunch it.

Time passed by. Suddenly, I discovered one cool library, the other, and a third one … Suddenly, my job was to connect these libraries, which allowed me to be more expressive in what I could achieve. It also allowed me to concentrate better on “business logic.” Business logic is the term I use to describe all the knowledge required for the organization that pays your salary to keep doing so. If you work for a gaming company, “business logic” is the gaming psychology, competitor landscape, growth methods, and network effect. If you work for a biotech company, “business logic” is the deep understanding of disease mechanisms, biochemistry, genetics, or whatever is needed to perform the breakthrough. The fact that I don’t need to deal with “low-level coding” made me obsolete and drove me to a state where I became more specialized.

These days, we are facing a new era in knowledge commoditization. This commoditization makes our skills obsolete but also makes us more efficient in tasks that we were slow at and lets us develop new skills. 

In 2017, Gartner predicted that more than 40% of data science tasks would be obsolete by 2020. Today, in 2023, I can safely say that they were right. I can also say that today, despite the recent layoffs, there are many more busy data scientists than there were in 2017 or 2020.

The bottom line. Stop worrying.

Let me cite myself from 2017:

Data scientists won’t disappear as an occupation. They will be more specialized.

I’m not saying that data scientists will disappear in the way coachmen disappeared from the labor market. My claim is that data scientists will cease to be perceived as a panacea by the typical CEO/CTO/CFO. Many tasks that are now performed by the data scientists will shift to business developers, programmers, accountants and other domain owners who will learn another skill — operating with numbers using ready to use tools. An accountant can use Excel to balance a budget, identify business strengths, and visualize trends. There is no reason he or she cannot use a reasonably simple black box to forecast sales, identify anomalies, or predict churn.

This is another piece of career advice. I have more of them on my blog.

Chances are that you don’t need a data scientist, and three things to consider before hiring one.

Read this if you are considering hiring data scientists

I already wrote about how data science becomes a commodity.

If you read this, I guess data science is not the core part of your business. If this is the case, consider the following before you hire data scientists.

Data engineers

Your data scientists can be as good as the data you provide them. You must collect the correct data, validate it, store it well, and be able to access it easily. I have hours of “war stories” about how each component of the last message went wrong, and the company burned tons of money because of that. Data piping is a serious challenge. So, before you hire a data scientist, ask yourself whether your data engineering needs are covered.

Data analysts

Data analysts mainly focus on the organization and interpretation of data. Unlike data scientists, analysts don’t build predictive models or create unique algorithms. However, they identify trends and insights and present their findings clearly and understandably. Not being required to build novel models and algorithms allows them to better connect with stakeholders’ business needs and practical questions. A good data analyst will take the business problem, translate it into a data-based question, know its potential value, and, in many cases, be able to answer it.

Boxed Solutions

Data Science as a Service is a term for boxed solutions that are constantly becoming more versatile, flexible, and affordable. I was a freelancer for a company that built its data-based product on an open-source implementation of a single optimization algorithm. They managed to run a successful company without a single data scientist for more than five years, and they started thinking of better solutions when they squeezed everything they could from their MRE. At this point, they had their data storage pipelines (data engineering), a better picture of their business (data analysts), and paying customers to finance the development of new algorithms.

How to work with data scientists?
I’ll write separate posts on this topic, but the gist is to make sure they know your business needs. Ensure you communicate your needs and problems to them and make sure they share their efforts with you. I have seen many failed data science projects in my life. Most failed due to a lack of alignment, communication, or both.

This was another career advice post. Read more of them here.

Data Science Reality Check: My Predictions Come True (or, A Piece of Advice to Young Data Scientists)

Read this if you’re a data scientist or consider becoming one.

Almost six years ago, when Data Scientist was named the “sexiest job of the 21st century”, I wrote a blog post telling young professionals not to learn data science as a career move. My claim was that the data science field would get commoditized, and if you don’t possess deep (I mean DEEP) knowledge of either algorithms or the business you are working in, you will end up a mediocre coder.

Look what happened. Data science has indeed become commoditized in many fields. Many data-intensive businesses work just fine without data scientists. Even I, a very experienced data scientist, got laid off because I couldn’t bring the company value that would justify my salary. People like Matthew Yglesias from https://www.slowboring.com suggest that data scientists learn how to roll a burrito or mine lithium.

Why did this happen? Well, I was right. Data science has become a commodity. Each self-respecting platform offers AI tools (I hate the term AI, by the way) such as keyword extraction, insights, predictions, anomaly detection, recommendations, and many more. Tableau, PowerBI, and even Google Sheets or Excel offer tools that were once only available through custom data and code fiddling. The Data-Science-As-A-Service niche is full of products such as https://www.pecan.ai and https://www.anodot.com. And we haven’t even started talking about the new word of the day: the GPT.

As an experienced data scientist, I am often asked for advice and help. In the past, when this happened, I used to discuss possible custom-tailored solutions. Now, I find myself suggesting that the person look at product X or Y, which will solve their problems in a fraction of the time and at a fraction of the cost.

So, what do we have? What does all that mean?

Data science has become a commodity. In the past, to get a nice salary and a sexy title, it was enough to know what training, testing, and cross-validation were. Today, you absolutely have to know the theory and be a fast and good coder. But most of all, you must hone your communication skills and learn the business of the company where you work. Only this way will you be able to ensure your efforts are always aligned with the stakeholders and that you can consistently deliver value.

This is a career advice post. Check out the career tag and the Career Advice category of this blog.

14-days-work-month — The joys of the Hebrew calendar

Tishrei is the seventh month of the Hebrew calendar and starts with Rosh-HaShana, the Hebrew New Year*. It is a 30-day month that usually occurs in September-October. One interesting feature of Tishrei is the fact that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), and the first and last days of Sukkot (Feast of Tabernacles)**. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 resting days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the days between the first and the last Sukkot days are mostly considered half working days, and the children are at home since all the schools and kindergartens are on vacation. So we will treat those days as half working days in the following analysis.

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) over a period of several years.

Overall, this period consists of between 14 and 17 working days in a single month (31 days, mind you). This year, we only have 14 working days during the Tishrei holiday period. This is what the working/not-working time during this month looks like:

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to a constantly interrupted workday, but at a different scale.
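The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script behind the numbers in this post: the day offsets follow the holiday list above (Rosh-HaShana on 1-2 Tishrei, Yom Kippur on the 10th, Sukkot on the 15th and its last day on the 22nd), every holiday eve is treated as a full rest day, and the example start date, 6 September 2021, is my assumption for the eve of Rosh-HaShana that year. The function name and constants are hypothetical.

```python
from datetime import date, timedelta

# Offsets equal the day of Tishrei; offset 0 is the eve of Rosh-HaShana.
HOLIDAYS = {1, 2, 10, 15, 22}      # Rosh-HaShana (2 days), Yom Kippur, first and last days of Sukkot
HOLIDAY_EVES = {0, 9, 14, 21}      # assumption: each holiday eve is a full rest day
HALF_DAYS = {16, 17, 18, 19, 20}   # days between the first and the last Sukkot days

def count_working_days(start, n_days=31):
    """Return (full_days, half_days) of work in the period that begins at `start`."""
    full = half = 0
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        if day.weekday() in (4, 5):  # Friday or Saturday, the Israeli weekend
            continue
        if offset in HOLIDAYS or offset in HOLIDAY_EVES:
            continue
        if offset in HALF_DAYS:
            half += 1
        else:
            full += 1
    return full, half

# Assumed example input: 6 September 2021 as the eve of Rosh-HaShana.
full, half = count_working_days(date(2021, 9, 6))
print(f"{full} full working days and {half} half working days")
```

To check another year, pass the date of that year’s Rosh-HaShana eve as `start`. Whether a half day counts as one working day or as half of one is a separate choice, which is why the function returns the two numbers separately.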

So, next time you wonder why your Israeli colleague, customer, or partner barely works during September-October, recall this post.

(*) New Year starts in the seventh month? I know this is confusing. That’s because we number Nissan, the month of the Exodus from Egypt, as the first month.

(**) If you are an observant Jew, you should add the Fast of Gedalia to this list, but we will omit it from this discussion.

Book review: Extreme ownership

TL;DR Own your wins, own your failures, stay calm and make decisions. Read it. 5/5

“Extreme Ownership” is a book about leadership in business written by two ex-SEAL fighters. This book is full of war stories, as in actual stories from a real war. I read this book on the recommendation (an instruction, really) of the serial entrepreneur Danny Lieberman. After three years in the Israeli Border Police and after a cumulative year-and-a-half in active IDF reserve over almost twenty years, I learned to dislike war stories strongly. Had Danny not told me, “you have to read this book,” I would have ditched it after the first couple of pages. The war stories are self-bragging, and the business case studies are oversimplified and always have a happy ending. Moreover, the connection between a war story and a business case is sometimes very artificial.

Nevertheless, I’m glad that I read this book. It has several powerful messages and shows leadership aspects that I haven’t managed to formalize in my head before.

Key points

The best leaders don’t just take responsibility for their job. They take Extreme Ownership of everything that impacts their mission. When subordinates aren’t doing what they should, leaders that exercise Extreme Ownership cannot blame the subordinates. They must first look in the mirror at themselves.

  • It’s not what you preach; it’s what you tolerate
  • “Relax, look around, make a call.” 

This point takes me back to my days as the chief combat medic in an IDF infantry battalion (here we come, more war stories!). One day, an instructor, a very experienced paramedic, told me that the first thing a medic should do when they arrive at a scene is to take a pulse: not the pulse of the victims, but their own, to make sure they are calm and make the right decisions.

  • Prioritize your problems and take care of them one at a time, the highest priority first. 
  • Leadership doesn’t just flow down the chain of command, but up as well.

This is a super valuable and insightful message.

The bottom line: Read it. 5/5

New position, new challenge

I will skip the usual “I’m thrilled and excited…”. I’ll just say it.
As of today, I am the CTO of wizer.me, a platform for teachers and educators to create and share interactive worksheets.

On a scale of 1 to 10, how thrilled am I? 10
On a scale of 1 to 10, how terrified am I? 10
On a scale of 1 to 10, how confident am I that wizer.me will become the “next big thing” and the most significant chapter in my career? You won’t believe me, but also 10.

Back to in-person presentations

Today, I gave my first in-person presentation since the pandemic. It was awesome! I was talking about the study I performed with Nabeel Sulieman about data visualization in environments that use right-to-left writing systems.

I wrote about this study in the past [one, two]. Today, you may find the results of our study at http://direction-matters.com/. I hope to be able to publish the video recording of this presentation really soon.

An example of a very bad graph

Nature Medicine is a peer-reviewed journal that belongs to the very prestigious Nature group. Today, I was reading a paper that included THIS GEM.

These two graphs are so bad. It looks as if the authors set out to squeeze as many data visualization mistakes as possible into a single piece of graphics.

Let’s take a look at the problems.

  • Double Y axes. Don’t! Double axes are bad in 99% of cases (exceptions do exist, but they are rare).
  • Two subgraphs that are meant to work together have different category orders and different Y-axis scales. These differences make the comparison much harder.
  • Inverted Y scale in a bar chart. Wow! This is very strange. Bizarre! It took me a while to spot this. First, I tried to understand why the line of P<0.05 (the magic value of statistics) is above 0.1. Then, I realized that the right Y-axis is reversed. At first, I thought, “WTF?!” but then I understood why the authors made this decision. You see, according to the widespread statistical ritual, the lower the “P-value” is, the more significant it is considered. The value of 1 is deemed to be not significant at all, and the value of 0 is considered “as significant as one can have.” So, in theory, the authors could have renamed the axis to “Significance” and reversed the numbers. Still, the result would not be a real “significance,” nor would the name be intuitive to anyone familiar with statistical analysis. On the other hand, they really wanted more “significant” values to be bigger than less significant ones. So, what the heck? Let’s invert the scale! Well, no, this is not a good idea.
  • Slanted category labels. This might be a matter of taste, but I dislike rotated and slanted labels. Turning the graph solves the need for label rotation, thus making it more readable and having zero drawbacks.

What can be done?

I don’t like criticism without improvement suggestions. Let’s see what I would have done with this graph. To make this decision, I first need to decide what I want to show. According to my understanding of the paper, the authors wish to show that the two data sets are very different in determining a specific outcome. To show that, we don’t need to depict both the P-value and variance (mainly since these two values are very much correlated). Thus, I will show only one metric. I will stick with the P-value.

I will keep the category order the same between the two subgraphs. Doing so will create a “table lens” effect; it will show the individual values while demonstrating the lack of correlations between the two groups. Finally, I will convert the bars into points, primarily to improve the data-ink ratio. Two additional arguments against bar charts, in this case, are that the P-value of a statistical test cannot possibly be zero and that bar charts don’t allow a log scale, in case we want to use one.

The result should look like this sketch.
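The same design decisions (one metric, a shared category order, points instead of bars, and an optional log scale) can also be tried out in code. The sketch below is a plain-text stand-in for a real plotting library. The category names and P-values are invented for illustration and do not come from the paper, and all function names are hypothetical.

```python
import math

# Hypothetical P-values for two data sets; invented numbers, not from the paper.
dataset_a = {"feature 1": 0.001, "feature 2": 0.2, "feature 3": 0.04}
dataset_b = {"feature 1": 0.5, "feature 2": 0.03, "feature 3": 0.3}

def shared_order(reference):
    """Return the categories sorted by their P-value in the reference data set."""
    return sorted(reference, key=reference.get)

def log_column(p_value, width=40, p_min=1e-4):
    """Map a P-value in [p_min, 1] to a column on a log-scaled text axis."""
    clipped = min(max(p_value, p_min), 1.0)
    fraction = (math.log10(clipped) - math.log10(p_min)) / -math.log10(p_min)
    return round(fraction * (width - 1))

def dot_row(label, p_value, width=40):
    """Draw one category as a single point rather than a bar."""
    column = log_column(p_value, width)
    return f"{label:<10} |" + " " * column + "o" + " " * (width - 1 - column) + "|"

def render(panel, order, width=40):
    """Render all categories of one panel in the given, shared order."""
    return [dot_row(label, panel[label], width) for label in order]

# Both panels use the order taken from the first data set.
order = shared_order(dataset_a)
for title, panel in (("Data set A", dataset_a), ("Data set B", dataset_b)):
    print(title)
    print("\n".join(render(panel, order)))
```

Because both panels reuse the order of the first data set, the points of the first panel line up monotonically, and any scatter in the second panel is immediately visible. The log axis needs a lower bound (`p_min` here, an arbitrary choice) precisely because a P-value can never be zero.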

On proper selection of colors in graphs

Photo by Sharon McCutcheon on Pexels.com

How do you properly select a colormap for a graph? What makes the rainbow color map a wrong choice, and what are the proper alternatives?

Today, I stumbled upon a lengthy post that provides an in-depth review of the theory behind our color perception. The article concentrates on quantitative colormaps but also includes information relevant to selecting proper colors for categories. 

If you never learned the theory behind the color and are interested in data visualization, I strongly suggest investing 45-60 minutes of your life in reading this post.

Book review: The Hard Thing About Hard Things by Ben Horowitz

TL;DR War stories and pieces of advice from a high-tech industry veteran.

I read this book following a recommendation by Reem Sherman, the host of the excellent (!!!) podcast Geekonomy (in Hebrew).

Ben Horowitz is a veteran manager and entrepreneur who founded the company Opsware, which Hewlett-Packard acquired in 2007. This book describes Horowitz’s journey in Opsware from the foundation to the sale. The book’s second part is a collection of advice to working and aspiring CEOs. The last part is, actually, an advertisement for Horowitz’s new project, a VC company.

Things that I liked

The behind-the-scenes stories are interesting and inspiring.
Ben Horowitz devoted the second part of the book to sharing his experience as a CEO with other actual or aspiring CEOs. I don’t work as a CEO, nor do I see myself in that position in the future. However, this part is valuable for people like me because it provides insights into how CEOs think. Moreover, “The Hard Thing” is a popular book, and many managers learn from it.

Things that I didn’t like

Ben Horowitz was a manager during the early days of the high-tech industry. As such, parts of his attitude are outdated. The most prominent example of this problem is a story that Horowitz tells, in which he asked the entire company to work 12+ hours a day, seven days a week, for several months. He was very proud of this, but IMO, employees will not accept such a request in today’s climate.

The bottom line: 4/5

:-(

Usually, I keep my blog for professional news only, but this time, I’ll make an exception.

This frame is from a video that was taken a couple of days ago, less than one hour away from my home. Note how many people are there. 

Some people will claim that what we see is a peaceful protest by Palestinians against the Israeli occupation. Being a son and a grandson of Holocaust survivors, I find it hard to connect to the peacefulness of what I see. I don’t have to hear them chanting “from the River to the Sea Palestine will be free” to understand that what they, and many thousands more, really mean is “free of Jews”.

Published
Categorized as blog

Opening a new notebook in my productivity system

Those who know me know that I always carry with me a cheap, thin notebook, which I use as an extension of my mind. Today, I opened a new notebook, and this is a good opportunity to share some links about my productivity system.

  • Start with the post “The best productivity system I know”
  • Failed attempt with tangible boards is here. This approach has an interesting idea behind it, but I couldn’t stick with it. YMMV
  • Failed attempt with digital/analog/tangible combo is here.

Another example of the power of data visualization

I stumbled upon a great graph that tells a complex story compellingly.

Comparison of two COVID-19 waves in the UK, taken from here.

This graph compares the last two waves of COVID-19 in the United Kingdom and shows very clearly that the new wave (that is supposedly composed of the Delta variant) is much more infectious on the one hand, but on the other hand, causes much less damage. Whether the more moderate damage results from the nature of the Delta variant or from the protective effect of vaccination is still an open question, but the difference is still striking.

Managing remotely. A podcast interview with Martin Remy

My podcast is mostly in Hebrew, but this interview was recorded in English. I hope you will enjoy it.

Martin Remy has been managing teams of data engineers and data scientists for more than a decade, and he has been doing so remotely. What lessons can we learn from Martin? Important links: https://marting.blog https://martinremy.com The podcast’s Facebook page: https://www.facebook.com/reayonavodapodcast/ My home page: https://gorelik.net/about Subscribe to the podcast on Google Podcasts, Spotify, Apple Music, Podbean, and on every platform […]

Idea 38. Managing remotely (Boris Gorelik)
Published
Categorized as blog

Another evolution of my offline productivity system


This week, I mark an important milestone in my professional life. It is an excellent opportunity to start a new productivity notebook and tell you about the latest evolution of the best productivity system I know.

To sum up, I use a custom variant of Mark Forster’s Final Version productivity system that uses a plain notebook to track, prioritize, and eliminate tasks. Using a physical notebook, as opposed to an electronic tool, is a massive boost in productivity, as it forces you to process your priorities in an unplugged mode, without any distractions.

When I was a freelancer, I felt forced to use a combination of a physical book and an electronic system (http://todoist.com/), but that didn’t work too well for me: the connected nature of this (and any other) app kept distracting me. I also played with a combination of a notebook and a portable kanban board. That didn’t work out for me either. So, right now, I’m back to a physical notebook with a small addition.

I now have two notebooks. The first one is a small (80 pages) soft notebook that I use to track and prioritize tasks (as in Mark Forster’s system). I also use this notebook to reflect on what’s going on, write questions to my future self, and document my decisions.

The second, larger notebook is used for note keeping, drafts, and sketches. The fact that the notebook is vertically bound allows me to switch seamlessly between Hebrew (which is written from right to left) and English. When a sketch or a draft isn’t relevant anymore, I tear the draft pages away, and I use a small binder to keep the note pages together for future reference.

Overall, I like this combo very much and it fits my workflow well.

Experiment report

In January 2020, I started a new experiment. I quit what was a dream job and became a freelancer. Today, the experiment is over. This post serves as omphaloskepsis – a short reflection on what went well and what could have worked better.

What worked well?

To sum up, I declare this experiment successful. I had a chance to work with several very interesting companies. I got exposed to business models of which I wasn’t aware. Most importantly, I met new intelligent and ambitious people. I also had a chance to experience firsthand how it feels to be self-employed and to see the behind-the-scenes of several freelancers and entrepreneurs. I learned to appreciate the audacity and the courage of people who don’t rely on monthly paychecks and take much more responsibility for their lives than the vast majority of the “salarymen.”

Let’s talk about money. Was it worth it in terms of $$$$$ (or ₪₪₪₪₪₪)? Objectively speaking, my financial situation remained approximately unchanged. Towards the end of the experiment, I found myself overbooked, which means that, in theory, I could have increased my income substantially. But this is only in theory. In practice, I decided to end the freelance experiment and “settle down”.

What could have been better?

So, was it peachy? Not at all. For me, being a freelancer is much more stressful than being a hired employee. The stress does not come exclusively from the need to make sure one has enough projects in the pipeline (I had enough of them, most of the time). The more significant source of stress came from the lack of focus, the need for EXTREME context switching, and the lack of a team. 

I did receive one suggestion to mitigate this source of stress; however, when I heard it, I already had several job offers and was already 90% committed to accepting the position at MyBiotics.

To sum up

I am very happy I did this experiment. I learned a lot, I enjoyed a lot (and suffered a lot too), I met new people, and I changed the way I think about many things. Was it a good idea? Yes, it was. Should you try becoming a freelancer? How the hell can I know that? It’s your life; you enjoy the success and take the risk of failure.

A new phase in my professional life

I’m excited to announce that I’m joining MyBiotics Pharma Ltd as the company’s Head of Data and Bioinformatics. I have been working with this fantastic company and its remarkable people as a freelancer for fourteen fruitful months. But today, I join the MyBiotics family as a full-time member. Together, we will strive to better understand the interactions between humans and their microbiome to improve health and well-being.

Black lives matter. Lior Pachter

Almost one year after it was originally published, I stumbled upon this powerful post.

Today, June 10th 2020, black academic scientists are holding a strike in solidarity with Black Lives Matter protests. I strike with them and for them. This is why: I began to understand the enormity of racism against blacks thirty five years ago when I was 12 years old. A single event, in which I witnessed […]

Black lives matter
Published
Categorized as blog

Super useful videos for advanced data visualizers

The great Robert Kosara, also known as the “eager eyes” has started publishing a series of videos he calls Chart Appreciation. In these videos, Robert takes a piece of data visualization from a reputable and known source, and discusses why this particular piece is so good, what decisions were made that made it possible, what alternatives are, and more. If you consider yourself an intermediate or advanced practitioner of data visualization, you should subscribe. Here’s one example.

Career advice. Upgrading a data science career

Photo by Kelly Lacy on Pexels.com

From time to time, people send me emails asking for career advice. Here’s one recent exchange.

Hi Boris,

I am currently trying to decide on a career move and would like to ask for your advice.

I have a MSc from a leading university in ML, without thesis.

I have 5 years of experience in data science at <XXX Multinational Company> , producing ML based pipelines for the products. I have experience with Big Data (Spark, …), ML, deploying models to production…

However, I feel that I missed doing real ML complicated stuff. Most of the work I did was to build pipelines, training simple models, do some basic feature engineering… and it worked good enough.

Well, this IS the real ML job for 91.4%* of data scientists. You were lucky to work in a company that has access to data and has teams dedicated to keeping that data flowing, neat, and organized. You worked in a company with good work ethics, surrounded by smart people, and, I guess, the computational power was never a big issue. Most of the data scientists that I know don’t have all these perks. Some have to work alone; others need to solve “dull” engineering problems, find ways to process data on suboptimal computers, or fight with a completely unstandardized data collection process. In fact, I know a young data scientist who quit her first post-Uni job after less than six months because she couldn’t handle most of these problems.

However I don’t have any real research experience. I never published any paper, and feel like I always did easy stuff. Therefore, I lack confidence in the ML domain. I feel like what I’ve been doing is not complicated and I could be easily replaced.

This is a super valid concern. I am surprised how few people in our field think about it. Most ML practitioners don’t publish papers because they are busy doing the job they are paid for. Still, there are other ways to grow, and I am a big proponent of teaching as a means of professional growth. So, you can decide to teach a course in a local meetup, local college, in your workplace, or at a conference. Teaching is an excellent way to improve your communication skills, which are the best means for job security (see this post).

Since you work at XXXXX , I suggest talking to your manager and/or HR representative. I’m SURE that they will have some ideas for a research project that you can take full-time or part-time to help you grow and help your business unit. This brings me to your next question.

I feel like having a research experience/doing a PhD may be an essential part to stay relevant in the long term in the domain. Also, having an expertise in one of NLP/Computer Vision may be very valuable.

I agree. Being a Ph.D. and an Israeli (we have one of the largest Ph.D. percentages globally) makes me biased.

I got 2 offers:

– One with <YYY Multinational company> , to do research in NLP and Computer Vision. […] which is focused on doing research and publishing papers […]

– One with a very fast growing insurance startup, for a data scientist position, as a part of the founding team. […] However, I feel it would be the continuation of my current position as a data scientist, and I would maybe miss on this research component in my career.

You can explore a third option: A Ph.D. while working at your current place of work. I know for a fact that this company allows some of their employees to pursue a Ph.D. while working. The research may or may not be connected to their day job.

I am very hesitant because

– I am not sure focusing on ML models in a research team would be a good use of my time as ML may be commoditised, and general DS may be more future-proof. Also I am concerned about my impact there.

– I am not sure that I would have such a great impact in the DS team of the startup, due to regulations in the pricing model [of that company], and the fact that business problems may be solved by outsourced tools.

These are hard questions to answer. First of all, one may see legal constraints as a “feature, not a bug,” as they force more creative thinking and novel approaches. Many business problems may indeed be solved by outsourcing, but this usually doesn’t happen in problems central to the company’s success since these problems are unique enough to not fit an off-the-shelf product. You also need to consider your personal preferences because it is hard to be good at something you hate doing.

From time to time, I give career advice. When the question or the answer is general enough, I publish them in a post like this. You may read all of these posts here.

Interview 27: Racial discrimination and fair machine learning

I invited Dr. Charles Earl for this episode of my podcast “Job Interview” to talk about racial discrimination at the workplace and fairness in machine learning.

Dr. Charles Earl is a data scientist at Automattic, my previous place of work. Charles holds a Ph.D. in computer science, an M.A. in education, an M.Sc. in electrical engineering, and a B.Sc. in mathematics. His career has included a position as an assistant professor and a wide range of hands-on, managerial, and consulting roles in the field that we like to call today “data science.”

But there is another aspect to Dr. Earl. His skin is brown. He was born to an African-American family in Atlanta, GA, in the 1960s, when racial segregation was explicitly legal. I am sure that this fact affected Charles’ entire life, personal and professional.

Links

If you know Hebrew, follow my podcast Job Interview (Reayon Avoda), and This Week in the Middle East

Five things I wish people knew about real-life machine learning

Deena Gergis is a data science lead at Bayer. I recently discovered Deena’s article on LinkedIn titled “Five Things I Wish I Knew About Real-Life AI.” I think that this article is a great piece of career advice for all the current and aspiring data scientists, as well as for all the professionals who work with them. Let me take Deena’s headings and add my 2 cents.

One. It is all about the delivered value, not the method.

I fully agree with this one. Nobody cares whether you used a linear regression or recurrent neural network. Nobody really cares about p-values or r-squared. What people need are results, insights, or working products. Simple, right?

Two. Packaging does matter

Again, well said. The way you present your solution to your colleagues, customers, or stakeholders can determine whether your project will get more funds and resources or not. 

Three. Doing the right things != doing things right.

Exactly. Citing Deena: “you might be perfectly predicting a KPI that no one cares about.” Enough said. 

Four. Set realistic expectations.

Not everybody realizes that “machine learning” and “artificial intelligence” are not synonyms for “magic” but rather a form of statistics (I hope “real” statisticians won’t get mad at me here). The principle “garbage in – garbage out” holds in machine learning. Moreover, sometimes, ML systems amplify the garbage, resulting in “garbage in, tons of garbage out”.

Five. Keep humans in the loop.

Let me cite Deena again: “My customers are my partners, not just end-users.” Note that by “customers,” we don’t only mean walk-in clients, but also any internal customer, project manager, even a colleague who works on the same project. They are all partners with unique insights, domain knowledge, and experience. Use them to make your work better. 

Read the original article here. Deena Gergis has several more articles on LinkedIn here. And if you know Arabic, you might want to watch Deena’s videos on YouTube here. Unfortunately, my Arabic is not good enough to understand her Egyptian accent, but I suspect that her videos are as good as her writings.

One of the first dataviz blogs that I used to follow is now a book. Better Posters

I started following data visualization news and opinions quite a few years ago. One of the first blogs that were active in this area was NeuroDojo, by (now) Professor Zen Faulkes. One of Zen’s spin-off blogs was devoted to better posters. This poster blog is called, surprisingly enough, Better Posters. Since I’m not in academia anymore, I stopped caring about posters many years ago. Today, I stumbled upon this blog and was pleasantly surprised to discover that Better Posters is still active and that it is also now a book.

Working with the local filesystem and with S3 in the same code

Photo by Ekrulila on Pexels.com

As data people, we need to work with files: we use files to save and load data, models, configurations, images, and other things. When possible, I prefer working with local files because it’s fast and straightforward. However, sometimes, the production code needs to work with data stored on S3. What do we do? Until recently, you would have to rewrite multiple parts of the code. But not anymore. I created the sshalosh package, which solves this problem and spares a lot of code rewriting. Here’s how you work with it:

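import sshalosh

# work_with_s3 is a flag that you define elsewhere (for example, in your configuration)
if work_with_s3:
    s3_config = {
        "s3": {
            "defaultBucket": "bucket",
            "accessKey": "ABCDEFGHIJKLMNOP",
            "accessSecret": "/accessSecretThatOnlyYouKnow"
        }
    }
else:
    # No S3 configuration: the serializer works with the local filesystem
    s3_config = None
serializer = sshalosh.Serializer(s3_config)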

# Done! From now on, you only need to deal with the business logic, not the house-keeping

# Load data & model
data = serializer.load_json('data.json')
model = serializer.load_pickle('model.pkl')

# Update
data = update_with_new_examples()
model.fit(data)

# Save updated objects
serializer.dump_json(data, 'data.json')
serializer.dump_pickle(model, 'model.pkl')

As simple as that.
The package provides the following functions.

  • path_exists
  • rm
  • rmtree
  • ls
  • load_pickle, dump_pickle
  • load_json, dump_json

There is also a multipurpose open function that can open a file in read, write, or append mode and returns a handle to it.
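To give a feel for the idea, here is a minimal sketch of the pattern. This is not the sshalosh implementation. The class name TinySerializer and its opener argument are hypothetical, invented for this illustration, and the sketch only touches local files. The point is that all the I/O goes through a single "opener", so swapping the storage backend doesn’t affect the code that calls load and dump.

```python
import json
import os
import pickle
import tempfile


class TinySerializer:
    """Hypothetical illustration: every read and write goes through one opener."""

    def __init__(self, opener=open):
        # opener(path, mode) must return a file-like object.
        # The built-in open() gives local files; an S3-backed opener with
        # the same signature could be swapped in without touching the callers.
        self._open = opener

    def dump_json(self, obj, path):
        with self._open(path, "w") as f:
            json.dump(obj, f)

    def load_json(self, path):
        with self._open(path, "r") as f:
            return json.load(f)

    def dump_pickle(self, obj, path):
        with self._open(path, "wb") as f:
            pickle.dump(obj, f)

    def load_pickle(self, path):
        with self._open(path, "rb") as f:
            return pickle.load(f)


def roundtrip_demo():
    """Write two small objects to a temporary directory and read them back."""
    serializer = TinySerializer()  # local filesystem by default
    with tempfile.TemporaryDirectory() as tmp:
        json_path = os.path.join(tmp, "data.json")
        pkl_path = os.path.join(tmp, "model.pkl")
        serializer.dump_json({"examples": [1, 2, 3]}, json_path)
        serializer.dump_pickle(("weights", 0.5), pkl_path)
        return serializer.load_json(json_path), serializer.load_pickle(pkl_path)
```

The business logic only ever sees load_json, dump_json, and friends, which is why moving from local files to S3 doesn’t require rewriting it.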

How to install? How to contribute?

The installation is very simple: pip install sshalosh-borisgorelik
and you’re done. The code lives on GitHub under http://github.com/bgbg/shalosh. You are welcome to contribute code, documentation, and bug reports.

The name is strange, isn’t it?

Well, naming is hard. In Hebrew, “shalosh” means “three”, so “sshalosh” means s3. Don’t overanalyze this. The GitHub repo doesn’t have the extra s. My bad

Book review. The Persuasion Slide by Roger Dooley

TL;DR Very shallow and uninformative. It could be an OK series of blog posts for complete novices, but not a book.

The Persuasion Slide by Roger Dooley was a disappointment for me. I love Dooley’s podcast Brainfluence, and I was sure that his book would be full of in-depth knowledge and case studies. However, it contained neither.

The only contribution of this book is the analogy between a sales process and an amusement park slide. The theory behind the book is mostly presented as a ground truth with almost no explanation or support from research. One will gain much more knowledge and understanding by reading Kahneman’s “Thinking, Fast and Slow,” Ariely’s “Predictably Irrational,” or Wiseman’s “59 Seconds.”

Should I read this book?

No

Graphical comparison of changes in large populations with “volcano plots”

I recently rediscovered a volcano plot — a scatter plot that aims to visualize changes in large populations.

Volcano plots are very technical and specialized and, most probably, are not a good fit for explanatory data visualization. However, they can be useful during the exploration phase, and they come with a set of well-established metrics.

Moreover, if you are lucky enough to have well-behaved data, the plots look very cool

Visualization of RNA-Seq results with Volcano Plot
From here

Of course, in real life, the data is messy. Add bad visualization practices to the mess and you get a marvel like this one

From here

The bottom line: if you have two populations to compare, consider volcano plots. But do remember dataviz good practices.

Book review: Manager in shorts by Gal Zellermayer

TL;DR Nice’n’easy reading for novice managers

I read this book after hearing the author, Gal Zellermayer, in a podcast. Gal is an Israeli guy who has been working as a manager in several global companies’ Israeli offices. He brings a perspective that combines (what is perceived as) the best practices of the American management style with the Israeli tendency to make things straight and simple.

The greater part of the book is devoted to helping the people in your team develop. The book serves as a good motivator and helps to keep the importance of “peopleware” in mind. I wish, however, that it offered more practical advice and cited more research and external analyses.

Should you read this book?

If you are a beginning manager or want to be one – yes. 

If you never read a book on management – maybe (although Peopleware might be a better read).

The bottom line: 4/5

One idea per slide. It’s not that complicated


I wrote this post in 2009, published it in March 2020, and am now republishing it


A lot of texts that talk about presentation design cite a very clear rule: each slide has to contain only one idea. Here’s a slide from a presentation deck that says just that.

And here’s the next slide in the same presentation

Can you count how many ideas there are on this slide? I see four of them.

Can we do better?

First of all, we need to remember that most of the time, the slides accompany the presenters and do not replace them. This means that you don’t have to put everything you say on a slide. In our case, you can simply show the first slide and give more details orally. On the other hand, let’s face it, the presenters often use slides to remind themselves of what they want to say.

So, if you need to expand your idea, split the sub-ideas into slides.

You can add some nice illustrations to connect the information and emotion. 

Making it more technical

“Yo!”, I can hear you saying, “Motivational slides are one thing, and technical presentation is a completely different thing! Also,” you continue, “We have things to do, we don’t have time searching the net for cute pics”. I hear you. So let me try improving a fairly technical slide, a slide that presents different types of machine learning.
Does a slide like this look familiar to you?

First of all, the easiest solution is to split the ideas into individual slides.

It was simple, wasn’t it? The result is so much more digestible! Plus, the frequent changes of slides help your audience stay awake.

Here’s another, more graphical attempt

When I show the first slide in the deck above, I tell my audience that I am about to talk about different machine learning algorithms. Then, I switch to the next slide, talk about the first algorithm, then about the next one, and then mention the “others”. In this approach, each slide has only one idea. Notice also how the titles in these last slides are smaller than the contents. In these slides, they are used for navigation and are therefore less important.  In the last slide, I got a bit crazy and added so much information that everybody understands that this information isn’t meant to be read but rather serves as an illustration. This is a risky approach, I admit, but it’s worth testing.

To sum up

“One idea per slide” means one idea per slide. The simplest way to enforce this rule is to devote one slide to each sentence. Remember, adding slides is free; the audience’s attention is not.

Innumeracy

Innumeracy is the “inability to deal comfortably with the fundamental notions of number and chance”.
I wish there was a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews many important aspects of “numeracy”. I already shared this post a long time ago, but it’s worth sharing again.

https://tombreur.wordpress.com/2018/10/21/innumeracy/


Before and after — stacked bar charts

A fellow data analyst asked a question: what do we do when we need to draw a stacked bar chart that has too many colors? How do we select the colors so that they are nice but also easily distinguishable? To answer this question, let’s look at data similar to what appeared in the original question. I also tried to recreate the actual chart’s style.

So, how do we select colors?
The answer to this question is pretty complicated. To have a set of easily distinguishable colors, one needs to properly model color perception in a typical human being. Luckily, there is a tool called I Want Hue that’s based on a solid theory explained here. The problem, however, isn’t in colors.

This is not the right question

Distinguishing between eight colors in a graph is a challenging task. Selecting the right color scheme might help, but it won’t solve this fundamental problem. Moreover, stacked bar plots are tricky due to another complication.

We, the humans, are somewhat good at comparing positions but not as good at comparing sizes. This is why comparing the heights of the bars is relatively easy. It is easy because the bars start at the same line, and our task is to compare the bar end position, not the bar size. Reading the heights of the lowest segment in the bars is also an easy task for the same reason: we don’t compare the sizes but the heights.

However, comparing the sizes of the middle components is more challenging. As a result, the intermediate parts of a graph don’t add useful information but rather add noise. Thus, let us explore two options. First, we will reduce the number of groups. Next, we will explore what happens when reducing the number of groups is not an option.

Option 1. Reduce the number of categories

It is hard to advise about data visualization when I don’t know what conclusion the author wants to convey. However, I am sure that in many cases, the number of categories that are relevant to the viewer is much smaller than the number of categories that are relevant to the analyst. The viewer might not care about all the hard work you did while collecting the data; what they care about is an insight. For example, if we reduce the discussion to two groups, the USA and non-USA data centers, the graph becomes much more readable.

Note how two groups in a stacked graph pose no problem in deciphering the sizes. If we take care of readability and improve the data-ink ratio, we get a nice data visualization piece.
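Collapsing the categories happens in the data, before any plotting. Here is a minimal sketch of that step. The data center names and the error counts are made up for illustration, and the function name is my own; the real data set may be organized differently.

```python
from collections import defaultdict

# Hypothetical error counts: (month, data center, number of errors).
ERRORS = [
    ("2020-01", "US-East", 12),
    ("2020-01", "US-West", 8),
    ("2020-01", "Frankfurt", 5),
    ("2020-01", "Singapore", 3),
    ("2020-02", "US-East", 10),
    ("2020-02", "US-West", 9),
    ("2020-02", "Frankfurt", 7),
    ("2020-02", "Singapore", 4),
]

# Assumption: we know which data centers are located in the USA.
US_CENTERS = {"US-East", "US-West"}


def collapse_to_two_groups(rows, us_centers=US_CENTERS):
    """Sum the errors per month into two groups: USA and non-USA."""
    totals = defaultdict(lambda: {"USA": 0, "non-USA": 0})
    for month, center, count in rows:
        group = "USA" if center in us_centers else "non-USA"
        totals[month][group] += count
    # Return plain dicts, ordered by month, ready for a two-segment stacked bar.
    return {month: dict(groups) for month, groups in sorted(totals.items())}


two_groups = collapse_to_two_groups(ERRORS)
```

Each month now has exactly two numbers, so a stacked bar needs only two colors.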

Option 2. When reducing the number of categories is not an option

But what if reducing the number of categories is not an option? If you are absolutely sure that the audience absolutely needs to see all the information, you can split the different groups into separate subgraphs.

Have you noticed that the X-axis in our case represents time? In this case, we can replace the bars with an evolution plot and create a separate chart for each category in the data set. I took special care to keep the Y-axis scale equal between all the graphs so that the viewer can easily distinguish between data centers with a lot of errors and data centers with only a few of them. Here’s the result:

But what if the overall error rate is of greater importance than the individual groups? In that case, we can plot the overall rate in a larger graph and add the separate groups below, in smaller, un-emphasized subplots.

Summary — the Why and the What define the How

When you have a technical question about improving a graph, make sure you ask yourself “why.” Why does this technical problem matter? Why will solving it improve the chart? To answer this question, you will have to ask another question: “what?”. “What is it that I want to say?” The easiest way to force yourself to ask these questions is to force yourself to add titles to every graph you create (see my how to suck less in data visualization post for more details).

Once you have your conclusion ready, you will notice that you don’t need a technical solution but rather a conceptual one. In this case, we solved the technical problem of looking for eight distinct colors by reducing the number of categories to two or splitting one elaborate graph into several straightforward ones.

So, remember, the Why and the What define the How

The Python code that was used to generate all these graphs is available at https://gist.github.com/bgbg/6c645a5fc48e61b1a917c9d1d66fa72f

The Problem With Slope Charts (by Nick Desbarats)

Slope charts are often suggested as a valid alternative to clustered bar charts, especially for “before and after” cases.

So, instead of a clustered bar chart like this

we tend to recommend a slope chart (or slope graph) like this

However, a slope chart isn’t free of problems either. In the past, I already wrote about a case of a meaningless slopegraph [here]. Today, I stumbled upon an interesting blog post (and a video) that surveys the problems of slope charts and their alternatives.

All the graphs here come from the original post by Nick Desbarats that can be found [here].

Before and after: Alternatives to a radar chart (spider chart)

Radar charts (sometimes called “spider charts”) look cool but are, in fact, pretty lame. So much so that when the data visualization author Stephen Few mentioned them in his book Show Me the Numbers, he did so in a chapter called “Silly graphs that are best forsaken.”

Here, I will demonstrate some of their problems and suggest an alternative.

Before: The problems of a radar (spider) plot

Above is my reconstruction of the original plot that I saw in a Facebook discussion. The graph looks pretty cool, I have to admit, but it is full of problems.
What are the problems of a spider plot or a radar plot?
Let’s start with readability. Can you quickly tell the value of “Substance abuse” for the red series? Not that easy.

But a more significant problem emerges when one realizes that in most cases, the order of the categories is arbitrary and that different sorting options may result in entirely different visual pictures.

After: conclusion-based graph design

I have been continually preaching to add meaningful titles to all the graphs you are creating. (See How to suck less in data visualization and professional communication).

One of the byproducts of adding a title is the fact that when you write down your main takeaway of a graph, you force yourself to think, “does this graph show what it says it shows?” Thus, you guide yourself to better graph choices.

Let’s say that we conclude that there is no correlation between the two series of data. Is this conclusion evident from the graphs? I would say, not so much.

Instead of a radar chart, I suggest creating two aligned, horizontal bar plots. This way, we may sort one subplot according to the values, and then, correlation (or lack thereof) will be evident.

But what if we noticed something interesting about the differences between A and B groups? If this is true, let’s show precisely this: the differences.

Notice how the bars in this version are sorted according to the difference. Sorting a bar chart is the easiest way to make it readable.
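The sorting itself is a data preparation step that takes a couple of lines. Below is a minimal sketch; the category names and the values are made up for illustration (they are not the numbers from the original plot), and the function name is my own.

```python
# Hypothetical scores of two groups over the same categories.
GROUP_A = {"Sleep": 7, "Diet": 4, "Exercise": 6, "Stress": 3}
GROUP_B = {"Sleep": 5, "Diet": 6, "Exercise": 6, "Stress": 8}


def sorted_differences(a, b):
    """Return (category, a - b) pairs, largest difference first."""
    diffs = {category: a[category] - b[category] for category in a}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# This list defines both the bar lengths and their order in the chart.
bars = sorted_differences(GROUP_A, GROUP_B)
```

Passing the bars to the plotting function in this order gives a sorted bar chart, whichever plotting library you prefer.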

Python code that I used to create these graphs is available here https://gist.github.com/bgbg/db833db723998cd244b5049bfe01f5ac

Another language

بعد حوالي سنتين من الدراسة ، بحس حالي جاهز لإضافة اللغة العربية إلى قائمة اللغات في ال-LinkedIn 

After about two years of study, I feel ready to add Arabic to LinkedIn’s language list

Basic data visualization video course (in Hebrew)

I had the honor to record an introductory data visualization course for high school students as a part of the Israeli national distance learning project. The course is in Hebrew, and since it targets high schoolers, it does not require any prior knowledge.

I got paid for this job. However, when I divide the money that I received for this job by the time I spent on it, I get a ridiculously low rate. On the other hand, I enjoyed the process, and I view this as my humble donation to the public education system.
Since a government agency makes the course site, its UI is complete shit. For example, the site doesn’t support playlists, and the user is expected to search through the video clips by their titles. To fix that, I created a page that lists all the videos in the right order.

Text Visualization Browser

I’ve stumbled upon an exciting project — text visualization browser. It’s a web page that allows one to search for different text visualization techniques using keywords and publication time. 

Text visualization browser https://textvis.lnu.se

The ability to limit the search to various years gives a nice historical perspective on this interesting topic

This site’s information is based on a 2015 paper Text visualization techniques: Taxonomy, visual survey, and community insights. I wish the authors updated it with more recent data, though. 

Sharing the results of your Python code

Photo by veeterzy on Pexels.com

If you work, but nobody knows about your results or cares about them, have you done any work at all? 

A proverbial tree in the proverbial forest. Photo by veeterzy on Pexels.com

As a data scientist, the product of my work is usually an algorithm, an analysis, or a model. What is a good way to share these results with my clients? 

Since I write in Python 99% of the time, I fell in love with a framework called Panel (http://panel.holoviz.org/). Panel allows you to create and serve a basic interactive UI around data, an analysis, or a method. It plays well with API frameworks such as FastAPI or Flask. The only problem is how to share this work. Sometimes, it is enough to run a local demo server, but if you want to share the work with someone who doesn’t sit next to you, you have to host it somewhere and take care of access rights. For this purpose, I have a cheap cloud server ($5/month), which is more than enough for my personal needs.

If you can share the entire work publicly, some services can pick up your Jupyter notebooks from GitHub and serve them interactively (I know of Voilà and Binder).

Recently, Streamlit.io has been entering this niche. It currently only allows sharing public repos but promises to add a paid service for your private code. I’m eager to see that.

The information is beautiful. The graphs are shit!

I apologize for my harsh language, but recently I was exposed to a bunch of graphs on the “information is beautiful” site, and I was offended (well, not really, but let’s pretend I was). I mean, I’m a liberal person, and I don’t care what graphs people do in their own time. But many people visit that site because they try to learn good visualization practices, and some charts on that site are wrong. Very wrong.

Here’s the gem:

I deliberately don’t share the link to this site. I don’t want to let Google think it’s valuable in any way.

Now, the geniuses from “Information is beautiful” (let’s call them IB for brevity) wanted to share with us some positive stats. How nice of them. So what did they do? They gathered together nine pairs of metrics collected at two different time points: one in the recent past and one further back in history. They used nice colors to create some sleek shapes. So, what’s the problem? What’s wrong with that?

Everything is wrong!

Let’s start with my guess that they cherry-picked the stats with “positive” changes. Secondly, a comparison of this sort is mostly meaningless if each metric is compared over a different pair of years. What stopped the authors of that tasteless “infographic” from collecting data from the same years? I guess, their laziness. That’s how we ended up comparing the number of death penalties in 1990 and 2016, while the malaria death numbers are for 2000 and 2016, and dying mothers are compared for the years 2000 and 2017.

Now, let’s talk about data viz.

Take a look at this graph.

The only time we use shapes like that is when we want to convey information about uncertainty. To do that, the X-axis represents the thing we are measuring, and the Y-axis represents our certainty about the current value. When we compare two uncertain measurements, we may judge the difference between these measurements by the distance between the curve peaks, and the width of the curve represents the uncertainty.

Here’s a good example from [this link]:

Can you see how the metric of interest is on the X-axis? The width of each bell curve represents the uncertainty and the difference between any pair of cases is the difference on the horizontal (X) axis, not the vertical one.
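To make this concrete, here is a small sketch that uses Python’s standard statistics.NormalDist. The two measurements, their means, and their standard deviations are made up for illustration. Note what the height of a bell curve actually encodes: it depends only on the uncertainty (sigma), not on the measured value.

```python
from statistics import NormalDist

# Hypothetical measurements: the mean is the estimate,
# and sigma is our uncertainty about it.
before = NormalDist(mu=10.0, sigma=1.0)
after = NormalDist(mu=13.0, sigma=1.0)


def compare(a, b):
    """Summarize how two uncertain measurements relate to each other."""
    return {
        # The difference between the measurements lives on the X-axis.
        "difference_between_peaks": b.mean - a.mean,
        # Shared area under the two curves: 1.0 means identical curves.
        "overlap": a.overlap(b),
        # Peak heights depend on sigma only, so equal sigmas give equal heights.
        "peak_height_a": a.pdf(a.mean),
        "peak_height_b": b.pdf(b.mean),
    }


summary = compare(before, after)
```

The two curves here have exactly the same height, although the measured values differ by three units. Comparing the heights of such curves tells the viewer nothing about the metric.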

Instead, what did the IB authors do? They obviously like sleek-looking shapes but know nothing about how to use them. They could have used two bars and let the viewer compare their heights. But nooooo! Bars are not c3wl! Bars are boring! Instead, they took probability density curves (that’s what they are technically called) and made them pretend to be bars.

Bars. Is this THAT hard?

I can hear some of you saying, “Stop being such a purist! What’s wrong with comparing the heights of bell curves?” I’ll tell you what’s wrong! Data visualization is a language. As with any language, it has some rules and traditions. If you hear me saying, “me go home,” you will understand me without any problem. However, you will silently judge me for my poor use of the English language. I know that, and since English is my third language, I use all the help I can get to make as few mistakes as possible. The same is true of data visualization. Please respect its rules and traditions, even if (and especially if) you are not fluent in it.

I never write more than two sentences in English without Grammarly

Visit the worst practice tag in this blog to see more bad examples

The Empirical Metamathematics of Euclid and Beyond — Stephen Wolfram Blog

I am seldom jealous of people, but when I am, I’m jealous of Stephen Wolfram.

Towards a Science of Metamathematics One of the many surprising things about our Wolfram Physics Project is that it seems to have implications even beyond physics. In our effort to develop a fundamental theory of physics it seems as if the tower of ideas and formalism that we’ve ended up inventing are actually quite general,…

The Empirical Metamathematics of Euclid and Beyond — Stephen Wolfram Blog

Boris Gorelik on the biggest missed opportunity in data visualization — Data for Breakfast

My guest talk at Automattic.

Boris Gorelik recently joined us to present on The Biggest Missed Opportunity in Data Visualization based on his recent talk at the NDR conference. Boris was a data scientist at Automattic, is now a data science consultant, and blogs regularly on data visualization and productivity.  Some of highlights (along with a handy timestamp) include: Keep […]

Boris Gorelik on the biggest missed opportunity in data visualization — Data for Breakfast

15-days-work-month — The joys of the Hebrew calendar

Tishrei, the seventh month of the Hebrew calendar, starts with Rosh-HaShana, the Hebrew New Year*. It is a 30-day month that usually occurs in September-October. One interesting feature of Tishrei is that it is full of holidays: Rosh-HaShana (New Year), Yom Kippur (Day of Atonement), and the first and last days of Sukkot (Feast of Tabernacles)**. All these days are rest days in Israel. Every holiday eve is also a de facto rest day in many industries (high tech included). So now we have 8 rest days that add to the usual Friday/Saturday pairs, resulting in very sparse work weeks. But that’s not all: the days between the first and the last days of Sukkot are mostly considered half working days. The children are also at home, since all the schools and kindergartens are on vacation, so we will treat those days as half working days in the following analysis.

I have counted the number of business days during this 31-day period (one day before the New Year plus the entire month of Tishrei) over a period of several years.
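For readers who want to repeat the count, here is a minimal Python sketch of the idea. It is not the code behind the graph in this post. The function name count_days is hypothetical, and the holiday dates are the autumn 2020 dates typed in by hand, so treat them as an assumption and check them against a calendar before relying on them.

```python
from datetime import date, timedelta

def count_days(start, n_days, rest_days, half_days):
    """Return (working, half, rest) day counts for n_days starting at start.

    Fridays and Saturdays are treated as the regular Israeli weekend.
    """
    working = half = rest = 0
    for i in range(n_days):
        day = start + timedelta(days=i)
        # date.weekday(): Monday is 0, so Friday is 4 and Saturday is 5
        if day.weekday() in (4, 5) or day in rest_days:
            rest += 1
        elif day in half_days:
            half += 1
        else:
            working += 1
    return working, half, rest

# Assumed dates for autumn 2020: holidays and holiday eves
rest_days = {
    date(2020, 9, 18), date(2020, 9, 19), date(2020, 9, 20),  # Rosh-HaShana eve and holiday
    date(2020, 9, 27), date(2020, 9, 28),                     # Yom Kippur eve and holiday
    date(2020, 10, 2), date(2020, 10, 3),                     # first day of Sukkot and its eve
    date(2020, 10, 9), date(2020, 10, 10),                    # last day of Sukkot and its eve
}
# Days between the first and the last days of Sukkot
half_days = {date(2020, 10, d) for d in range(4, 9)}

working, half, rest = count_days(date(2020, 9, 18), 31, rest_days, half_days)
print(working, half, rest)
```

For these particular dates, the sketch gives 13 working days, 5 half days, and 13 rest days. Counting each half day as half a non-working day is one possible way to turn these numbers into non-working-day equivalents.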

Overall, this period contains between 15 and 17 non-working days in a single month (31 days, mind you). This is what the working/non-working time during this month looks like:

Now, having some vacation is nice, but this month is absolutely crazy. There is not a single full working week during this month. It is very similar to a constantly interrupted workday, but at a different scale.

So, next time you wonder why your Israeli colleague, customer or partner barely works during September-October, recall this post.

(*) New Year starts in the seventh month? I know this is confusing. That’s because we number Nissan, the month of the Exodus from Egypt, as the first month.
(**) If you are an observant Jew, you should add the Fast of Gedalia to this list, but we will omit it from this discussion.

Career advice. Becoming a freelancer immediately after finishing a master’s degree

Photo by Miguel Á. Padriñán on Pexels.com

Will Cray [link] is a fresh M.Sc. graduate in Computer Science who is considering becoming a freelancer in the Machine Learning / Artificial Intelligence / Data Science field. Will asked for advice on the LocallyOptimistic.com community Slack channel. Here’s Will’s question (all the names in this post are used with people’s permission).

Read more career advice [here].

Let’s begin.

Will Cray 

I’m hoping to start a career as a freelancer in the AI space after finishing my Master’s in CS with a focus in AI. I don’t, however, have any industry experience in AI or data science. Do you all think it’s feasible to start a freelancing career without any industry experience? If so, do you have any tips on how to do it successfully?
[I worked for] two years at a major tech company, but I was a systems engineer. It was experience that isn’t necessarily relevant to what I want to work on as a freelancer.

Let’s divide the response to Will’s questions into two parts that correspond to Slack’s two discussion threads.

Thread #1 – Michael Kaminsky

This is a copy/paste from Slack.

Michael Kaminsky 

LocallyOptimistic.com — a valuable source for data folks

My hunch is that it’s going to be pretty tough to get started, though not impossible. You’re probably looking at a pretty lean year or two to build up a reputation out of the gate

Michael Kaminsky 

AI work in general is sort of difficult to contract out — so you might have more luck if you team up with a larger consulting outfit that can handle the other non-AI parts of the work

Michael Kaminsky 

very rarely is someone like “we have all of the data pipeline and pieces working, now we just need to hire someone to do the AI part” — in general, the model-fitting part of an AI project is the easiest and fastest

Will Cray 

Thank you so much for the info–it’s really helping me getting a better understanding of the landscape. Would your opinion, especially regarding that last message, change if the AI work I was doing was more custom model/agent design and training, rather than doing something quick like .fit() in sklearn?

Michael Kaminsky

ummm maybe? but like who needs custom model/agent design and training that doesn’t already have in-house data scientists working on it?

Michael Kaminsky

I don’t want to dissuade you, but my point is that you should think about who your customers are, and how you can market your services in such a way that it will provide them value. If you don’t have a clear map of the three concepts in italics, it could get rough — you can definitely figure it out by doing it, but that’s what you’ll be up against

Will Cray

You mentioned “larger consulting outfits” earlier–do you have any examples of organizations that you think could be a good fit?

Michael Kaminsky

so Brooklyn Data Company and 4 mile consulting are the two that jump to my mind — they specialize in BI and data but might want flex capacity into DS — they might be able to give you deal flow, etc. I know there are a number of others, maybe even folks in this channel

Thread #2 – Boris Gorelik

This is a copy/paste from Slack with some later edits and additions. 

Boris Gorelik 

Another thing to consider is what your risks are. If there are people who depend on you financially, starting with a freelance career might be too risky, especially if you don’t have 1-2 (better 2) customers who already committed to paying you for your services.

If you can afford several months without a steady income, or no income at all, being a freelancer might expose you to a larger variety of companies and business models in the market. I know some people who used to work as freelancers and gradually “adopted” one customer and moved to full employment. In these cases, freelance projects were, in fact, mutual trial periods where both sides decided whether there is a good fit.

Will Cray 

I greatly appreciate this insight. I have little risks. I’m single, my living expenses are low, and I have some financial runway. Part of the reason I like the idea of freelancing is for the reason you stated–I’ll get to see many different business models. As an aspiring entrepreneur, I think diversity of experiences and exposure would be useful to me. I also think being flexible in how many hours I work will allow me to allocate more time to developing my own ideas/projects; although, I understand that’s a luxury that comes with being an established freelancer. I don’t have any clients currently. Do you have any recommendations for channels to try and garner clients?

Boris Gorelik

> As an aspiring entrepreneur, I think ….

Even though the legal status of a freelancer and an entrepreneur may be the same, they are different occupations and careers. An entrepreneur creates and realizes business models; a freelancer sells their time and expertise to fulfill someone else’s ideas. It’s true that most of the time (not always), combining freelance with entrepreneurship is easier than combining entrepreneurship with being a full-time employee in a traditional company.

 > Do you have any recommendations for channels to try and garner clients?

Nothing except the regular facebook/linkedin/ but mostly friends and former coworkers and, in your case, teachers/lecturers. I got my first job interview via my Ph.D. advisor. Later, when I helped in hiring processes, I asked him and other professors to refer me to proper candidates. So yeah, make sure your professors know your status.

Exploring alternatives to population pyramids

A population pyramid, also called an “age-gender-pyramid”, is a graphical illustration that shows the distribution of various age groups in a population (typically that of a country or region of the world), which forms the shape of a pyramid when the population is growing [citation from Wikipedia].

In some cases, the pyramid provides interesting insights into the entire population. In this post, I will explore ways to make some of these insights more visible. 

The basic case

Let’s start with the basic case. If you have two or three hours of spare time, you can go to the site devoted to population pyramids, https://www.populationpyramid.net. There, you will find population pyramids for every country in the world. The site provides present and past data, as well as future forecasts. To understand how insightful age pyramids can be, look at the graph that represents the entire world.

(this and most other images in this post are from the site http://populationpyramid.net/)

You can clearly see that the world is mostly young, that the number of people declines with age, and that there is a rough balance between men and women in the world, at least before the ages of 70+.

Now, examine the stark difference between the populations of Western Africa and Western Europe. Citing the late professor Hans Rosling, we can still see two worlds, one with large families and short lives, and one with small families and long lives. 

Another striking example of an age pyramid is the following:

Do you want to guess what country that is? This particular graph shows the age distribution of the United Arab Emirates. Such a vast distortion in symmetry and age distribution stems from the fact that more than 80% of the UAE’s population is composed of expats who come to this rich country to work. The pyramid below (taken from [this article]) sheds some light on the population composition of the UAE. (Note that the genders in this graph are reversed).

Whose bar is longer?

The male-female imbalance in the UAE and some other Gulf countries is very striking and cannot be missed. But what about other, more subtle cases? Take a look at the world graph above. If you follow the numbers on the bars, you will notice that more boys are born than girls, but there are more old ladies than old gents in the world. Can we make such differences less subtle?

To answer this question, we need to understand why we find it hard to compare almost equal bars. The reason for that is that our eyes (or brains) are not so good at comparing sizes. They do, however, do a much better job comparing positions. Thus, if we overlap these bars, we will see the small differences in a much more precise manner. 

(I thank the data visualization expert Bella Graf from InfoServiz.co.il for the idea of this graph).

Now, the subtle differences in gender composition are more visible. 

What am I looking at?

When I teach data visualization, I always tell my students to add a meaningful title to the graph. By “meaningful,” I mean a title that does not answer the question “what” but rather “so what”? (See my posts “How to suck less in data visualization,” and “C for conclusion”). What would a good title for this graph be? Let’s try the following:

OK, so now, when we have a title, we can ask ourselves, “does the graph show what it says it shows”? And the answer is no. Right now, the title talks about differences, but we don’t see the differences. We see the differences and other stuff. Let’s look only at the differences.

I don’t like this.

What about this?

Now, this is not an age pyramid. That’s for sure. This graph doesn’t show the wealth of data that the classical pyramid shows. On the other hand, it does offer one thing, and it does it very well. Look, for example, at the male/female distortion in China in 1990.

You may find the code I used to create the graphs in this post [on GitHub].
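Independently of that code, here is a minimal Python sketch of the “show only the differences” idea from this post. The numbers are made up for illustration and are not real population data, and surplus and ascii_diverging are hypothetical helper names. The sketch computes the male surplus per age group and draws it as a crude text chart around a common baseline, so that the reader compares positions rather than bar lengths.

```python
# Made-up counts (in millions) per age group; NOT real population data
age_groups = ["0-4", "5-9", "70-74", "75-79"]
males = [350.0, 340.0, 100.0, 70.0]
females = [330.0, 320.0, 110.0, 85.0]

def surplus(males, females):
    """Male minus female count per age group, as a percentage of the group."""
    return [100.0 * (m - f) / (m + f) for m, f in zip(males, females)]

def ascii_diverging(labels, values, scale=2.0):
    """One text row per group: marks left of '|' mean more women, right more men."""
    width = max(round(abs(v) * scale) for v in values)
    rows = []
    for label, v in zip(labels, values):
        n = round(abs(v) * scale)
        left = ("#" * n).rjust(width) if v < 0 else " " * width
        right = "#" * n if v > 0 else ""
        rows.append(f"{label:>6} {left}|{right}")
    return rows

differences = surplus(males, females)
rows = ascii_diverging(age_groups, differences)
print("\n".join(rows))
```

Expressing the difference as a percentage of the age group is a design choice of this sketch. Absolute differences would work just as well if the group sizes themselves are part of the message.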

The Mysterious Status of .blog Domains

Photo by Bruno Bueno on Pexels.com

When the .blog TLD was started by Automattic, employees were given the option to reserve a domain for free. In return […], they asked that the domain be used as a primary domain (no forwarding to a different site), and that the site be updated with new content at least once a month. This requirement was the last argument for me NOT taking boris.blog — I didn’t want to make this commitment, plus I like gorelik.net a lot.

Recently, there were some not-so-nice developments concerning .blog names that were given away to Automatticians. The complaints about this situation are usually anonymous, but I think that in this case, anonymity isn’t the right approach. That is why I decided to share here an anonymous post from the Antimattic blog. Although I am not the author of this original post, and I don’t share the views of some of the posts written there, I do share the concerns expressed in this particular article. Posting in return for a domain name might have been a reasonable request at the beginning of the .blog TLD to help promote its adoption. But now, several years after this TLD became active, this requirement is simply not OK. To read the original post, click the screenshot below.

The first paragraph of this post is a verbatim copy from Antimattic.

ASCII histograms are quick, easy to use and to implement

From time to time, we need to look at a distribution of a group of values. Histograms are, I think, the most popular way to visualize distributions. “Back in the old days,” when most of my work was done in the console, and when creating a plot from Python required too many lines of boilerplate code, I found a neat function that produced histograms using ASCII characters.

Recently, I updated the Python function that I use to create ASCII histograms. The updated function [link] uses more modern formatting and includes several signal-to-noise improvements. One can also use it with custom output functions, such as logging.info.
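To give a feeling for how little code such a tool needs, here is a minimal sketch of an ASCII histogram. It is not the function from the link above. The name ascii_hist and its parameters are my own assumptions, chosen only to illustrate the idea, including the pluggable output function.

```python
def ascii_hist(values, bins=10, width=40, out=print):
    """Write a horizontal ASCII histogram of values, one line per bin, using out."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp the index so that the maximum value falls into the last bin
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(width * c / peak)
        out(f"{left_edge:10.2f} | {bar} {c}")
    return counts

lines = []
counts = ascii_hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=4, out=lines.append)
print("\n".join(lines))
```

Passing out=logging.info instead of the default print sends each line to the log, provided that the logging level is configured to show INFO messages.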

A short compilation of productivity blog posts

Photo by Mike on Pexels.com

This post contains a bunch of links to blogs that write about productivity.

1. Musings of Brown Girls

This is not an exclusively productivity blog. The authors of this collective effort write about other interesting things. I read some posts, and I liked them.

2. Self care

Do you know that feeling when you feel bad and don’t have the energy to do anything about that? This post is for you.

3. Saying NO

Being a freelancer, I have to practice saying NO. Saying NO isn’t only good for productivity but also for your mental health. Interesting post.

Many is not enough: Counting simulations to bootstrap the right way — Yanir Seroussi

An interesting post by my former coworker, Yanir Seroussi.

Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can done by relying on the definition of CIs: Given an infinite number of independent samples from the same population, we expect a ci_level CI to contain the population parameter in exactly ci_level percent of the samples. Therefore, we […]

Many is not enough: Counting simulations to bootstrap the right way — Yanir Seroussi

Book review: The Abyss: Bridging the Divide between Israel and the Arab World

TL;DR If you are an Israeli and don’t feel like learning the behind-the-scenes stories, skip it. Otherwise, I do recommend reading this book. I enjoyed it a lot. 4.5/5

The Abyss: Bridging the Divide between Israel and the Arab World went to print slightly after the outbreak of the “Arab Spring.” The author, Eli Avidar, is a former Israeli intelligence officer and diplomat. Among other things, Eli Avidar served as the head of the Israeli diplomatic mission to Qatar in 1999. Today, Eli Avidar is a Knesset member for the right-wing Yisrael Beiteinu party. Even though so many things have changed since the book was published, I didn’t find any claim by Eli Avidar that turned out to be wrong nine years after the publication.

I enjoyed reading this book a lot despite the fact that most of Eli Avidar’s claims are not new to me. Most of them are widely known to all the Israelis, and the real question is not whether you are aware of these claims, but whether you agree with them and what conclusions you make out of them.

On the other hand, The Abyss is an interesting storybook full of behind-the-scenes anecdotes and gossip. All who know me know how much I like gossip. It also provides a great insight into how the (Jewish-)Israeli society sees the Arab-Israeli conflict, and how it feels about it.

Should you read the book? If you are an Israeli and don’t feel like learning the behind-the-scenes stories, you may skip it. Otherwise, I do recommend reading this book. I don’t know how accurate Avidar’s description of the Arab world is, but his analysis of the Israeli behavior and attitude is very accurate. If you ever caught yourself wondering “What the fuck do the Israelis think?”, this book might shed some light for you. That is why I write this review in English, despite my tendency to review Hebrew books on my Hebrew blog.

Fun fact. I finished reading this book on August the 13th. I closed the book, opened Twitter, and saw my feed FULL of news about the upcoming normalization treaty between Israel and the UAE.

What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?

There was a consensus among the data visualization purists that the rainbow color map and its close cousin, Jet, are bad. Really bad. These colormaps used to be popular at the beginning of the computational data visualization era. However, their popularity decreased in the last five years or so. The sentiment isn’t as bad as it used to be a couple of years ago, but still.

A screenshot from circa 2016. Today we are less fanatic than that

What is the biggest problem of the rainbow colormap? The most apparent problem with this particular colormap is that it is not perceptually uniform. By “perceptually uniform,” I mean that equal changes in the value that we encode using a colormap should correspond to equal changes in the color perception. This is not the case with the rainbow or the Jet colormaps. They have distinct bright and dark stripes within the number range, making them the wrong choice to encode numerical data. The situation is even worse for people with impaired color vision.
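The bright and dark stripes are easy to demonstrate with a few lines of Python. The sketch below is a rough illustration under stated assumptions: it uses a naive HSV hue sweep as a stand-in for the rainbow (it is not the actual Jet lookup table), it converts sRGB to CIE L* lightness with the standard formulas, and the helper names are hypothetical. It counts how many times the lightness switches between rising and falling along the colormap and compares that with a plain gray ramp.

```python
import colorsys

def srgb_to_lightness(r, g, b):
    """CIE L* lightness (0 to 100) of an sRGB color with channels in [0, 1]."""
    def linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    y = 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)
    return 116 * y ** (1 / 3) - 16 if y > 0.008856 else 903.3 * y

def direction_changes(values):
    """Count how many times a sequence switches between rising and falling."""
    changes, previous = 0, 0
    for a, b in zip(values, values[1:]):
        step = (b > a) - (b < a)  # +1 rising, -1 falling, 0 flat
        if step and previous and step != previous:
            changes += 1
        if step:
            previous = step
    return changes

n = 25
# Naive rainbow: the hue sweeps from blue (2/3) to red (0) at full saturation
rainbow = [colorsys.hsv_to_rgb(2 / 3 * (1 - i / (n - 1)), 1.0, 1.0) for i in range(n)]
gray = [(i / (n - 1),) * 3 for i in range(n)]

rainbow_changes = direction_changes([srgb_to_lightness(*c) for c in rainbow])
gray_changes = direction_changes([srgb_to_lightness(*c) for c in gray])
print(rainbow_changes, gray_changes)
```

The gray ramp gets steadily lighter, while the lightness of the naive rainbow reverses direction three times, at cyan, green, and yellow. These reversals are the stripes. Note that monotonic lightness is only a necessary condition for perceptual uniformity, not a sufficient one.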

Can you be less perceptually uniform?

The solution to this problem was proposed in the form of better colormaps. The first one that I know of is Parula by Matlab, and its open-source alternative, Viridis, which is available in matplotlib and many other plotting libraries. (Watch this video about viridis to get a good introduction to color perception and color maps).

Viridis, the new rainbow

Everything was nice and good, and I was trashing the rainbow colormap whenever I could. Until yesterday, when I read about Turbo, the improved rainbow colormap developed by Google.

In the long and interesting blog post that describes Turbo, Anton Mikhailov, a software engineer at Google, describes several relevant applications of a “good rainbow” scheme.

According to Anton, “Because of rapid color and lightness changes, Jet accentuates detail in the background that is less apparent with Viridis and even Inferno. Depending on the data, some detail may be lost entirely to the naked eye. The background in the following images is barely distinguishable with Inferno (which is already punchier than Viridis), but clear with Turbo.”

I must admit that I’m convinced. 

The biggest problem that is usually mentioned concerning the original rainbow scheme is that its brightness varies too much. However, it turns out that the color saturation and hue attract our attention more than the lightness (here’s the reference which I haven’t read yet). As such, it makes sense to construct a colormap that relies more on color and hue changes.

Moreover, in many cases, the interesting details appear in the extreme values of the data range, not in the middle. In these cases, a properly applied rainbow-like color scheme becomes a valid choice.

The bottom line is that one should not refrain from using rainbow(-like) color maps in their visualizations anymore, provided that they use a modern implementation. Luckily, it’s even available in matplotlib.

If you don’t teach yet, start! It will make you a better professional.

Many people know me as a data scientist. However, I also teach, which goes sort of unnoticed by many of my friends and colleagues. I created a page dedicated to my teaching activity. Talk to me if you want to organize a course or a workshop.

I also highly recommend teaching as a way of learning. So, if you don’t teach yet, start! It will make you a better professional.

How to suck less in data visualization and professional communication

In technical communication, the main thing is to keep the main thing the main thing. There are multiple ways to follow this principle. Some of these ways require careful chart fine-tuning. However, there is one tool that is easy to master, fast to apply, and that provides a high return on investment. I refer to chart titles. In this talk, I had two main theses. My first thesis is that most of you suck in communication (and not only data visualization).

My second thesis is that you can quickly improve your graphs by merely adding a good title. The importance of good titles is not new to my preaching, but I thought it was an excellent thing to formalize this thesis a bit, and I’m thankful to the NDR organizers for giving me this opportunity.

Following is the slide stack from my NDR presentation.

35 (and more) Ways Data Go Bad — Stats With Cats Blog

If you plan to work in data analysis or processing, read the excellent post on the “Stats With Cats” blog titled “35 Ways Data Go Bad”. I did experience each and every one of the 35 problems. However, this list is far from being complete. One should add the comprehensive list of Falsehoods Programmers Believe About Time.

When you take your first statistics class, your professor will be a kind person who cares about your mental well-being. OK, maybe not, but what the professor won’t do is give you real-world data sets. The data may represent things you find in the real world but the data set will be free of errors. […]

35 Ways Data Go Bad — Stats With Cats Blog

Unexpected hitch of working in a distributed team

Photo by Porapak Apichodilok on Pexels.com

It has been about half a year since I became a freelance data scientist. Before my career change, I worked in a distributed team for more than five years. Today, I suddenly realized that working in a distributed team has a significant problem, inherent to its distributed, multinational nature.

My team was always spread over multiple time zones. Sometimes, the time zone span was so broad that we could never find a time slot where all the team members were ordinarily awake. Automattic, the company I used to work for, is a firm believer in asynchronous communication, but from time to time, you HAVE to meet over a Zoom/Slack/Whatever call. Since I wasn’t a manager, the number of live calls that I had to attend was kept to a minimum, and yet, I found myself at least twice a week in a 10 pm Zoom call. I don’t know about you, but my brain keeps working for at least two hours after I log off. Thus, twice a week, I would find myself going to bed after one o’clock at night. As a result, I was sleep deprived for the majority of the week.

Only now have I noticed the fact that my sleep has improved so much after the career change. I know that people who work in “colocated” teams also find themselves in late night phone calls, but working in a distributed group means that you’ll do it regularly.

Hybrid digital/analog tangible week planning

Here’s a neat method that helps me organize my week, increase my productivity and fight procrastination. 

Being a freelance data scientist, I’m involved in three hands-on projects for two clients. I also manage/mentor two data scientists in two other projects, and participate in strategic discussions for a customer of mine, and in a startup in which I invest. Oh, I am also in the final stages of writing a paper. I never imagined I would be in a situation with so many balls that I need to keep in the air. How do I manage to keep my sanity?

This is what I do. Following the advice in “15 Secrets Successful People Know About Time Management“, I try to keep as many items in my calendar as possible. When my workweek starts, I print out the weekly schedule on a sheet of paper. Then, I apply the tangible GTD hack that I learned from another book [link] and write out all my projects on a bunch of small post-it notes. These notes allow me to “dump” all my brain contents into an external medium, which frees up my brain to spend more CPU cycles on processing, rather than remembering and worrying.

Next comes the fun part: I get to play with my cards by arranging them on the weekly schedule. The geometry of the post-it notes and the sheet of paper ensures that I allocate reasonably large chunks of time for each “big thing.” It also reminds me that the amount of time each day is limited, and I can’t stick too many plans into a day or a week. (No, I won’t be able to finalize the paper, complete the analysis for a retail shop, and learn a chapter in a Bayesian statistics book before the end of today).

After I’m done, I copy each post-it note into my calendar. Thanks to the integration with Todoist (an excellent productivity tool), all these tasks end up in my todo list, where I can further work with them.

To sum up:

  • Global week overview – check
  • Prioritization and honesty – check.
  • Fun playing with sticky notes – check.
  • Work gets done – (I wish!).

Oh, did you notice the appointments between 5 and 6 am? This is my sports activity. Sometimes working out charges me for the entire day. Sometimes, all I want to do for the entire day is to have a nap 🙂

Before and after. Even excellent graphs can be improved

Being a data visualization consultant, I can’t help looking for dataviz problems in graphs that I see. Even if the graph is good. Even if I know that I would not be able to create a graph that good. Even if the overall graph is excellent, and the problems are minor, or maybe especially when the graph is excellent, and the problems are minor.

This is a nice graph published by Nevo Benita on LinkedIn.

The graph presents the gap between the men and the women in the Israeli job market. As I said, the graph is excellent. However, there are several small problems that, like grains of sand in a chocolate mousse, stand in the way. Let’s take a look at them.

The time-series line in the upper right part of the graph shows good use of the real estate. The problem is that the X-axis ticks (the years) look as if they belong to the chart below. It takes some time to realize that the numbers are years of the upper graph, and not the X-axis of the graph below. Moving the numbers upwards by several pixels would have fixed that.

Now, it is more clear that “1990” and “2018” relate to the time-series graph above.
Before (left) and after (right).

Let’s talk about the left-side bar chart. It took me a while to understand what it is. As a matter of fact, I managed to write a critique paragraph about that bar chart, how it is unclear what the percentages are, and how they were computed. Only then did I notice the explanation below. Such confusion isn’t the viewer’s fault. Since we usually scan images from top to bottom, moving the title to the top of the chart will reduce this confusion. The word “percent” is also redundant in that title since it comes after the percent sign.

Moving the explanation to the top makes it easier to notice. Before (left) and after (right)

The last point that is worth optimizing is the color order. Consistent element order in an image makes navigation and comprehension much easier. When the order is preserved, our brain can use mental shortcuts without losing much information. When these shortcuts are broken, the brain has to work harder. What am I talking about? The graph author made the correct decision to use different font colors in the graph title to specify which color stands for which gender. This way, we don’t need a separate legend, and this is good. The title is an ordered sequence of words. The visualizer could use this order to create the order heuristic that is so helpful. Such a heuristic isn’t always possible. Fortunately for the visualizer (and sadly for society), the salary gaps in all the occupations in this graph have the same direction: men earn more than women. As a result, the rightmost part has all the green dots on the right, and the purple dots are on the left. This direction is opposite to the gender direction in the title and the color direction in the bar chart. To fix this situation, I made sure that the color that stands for the women (purple) is always to the left of the color that designates the men (green).

Keeping the color order. Before (left) and after (right)

So, this is the final result. I hope you can see why I like it better.

That’s how I took an excellent graph and made it even more awesome.

Data visualization is not only dots, bars, and pies

Look at this wonderful piece of data visualization (taken from here). If you know the terms “tertiary structure” and “glycan”, there is NO way you miss the message that the author of this figure wanted to convey.

Also, note how, by using appropriate colors in the title, the authors got rid of the graph legend.

How to become a Python professional in 42 hours?

Here’s an appealing ad that I saw


How to become a Python professional in 42 hours? I’ll tell you how. There is no way. I don’t know any field of knowledge in which one can become a professional after 42 hours. Certainly not Python. Not even after 42 days. Maybe after 42 weeks, if that’s mostly what you do and you are already a programmer.

Book review. Five Stars by Carmine Gallo

TL;DR Good motivation to improve communication. Inadequate source of information on how to achieve that 

The central premise of Five Stars: The Communication Secrets to Get from Good to Great by Carmine Gallo is that professionals who don’t invest in communication skills are at high risk of being replaced by computers and robots. One of the book’s sections bears a title that summarises this premise very well: “Storytelling isn’t a soft skill; it’s the equivalent of hard cash.” I firmly believe in this premise. That is why I invest so much time in learning and teaching data visualization, in public speaking, and in blogging.

When I started reading this book, I got excited. I kept marking one passage after another. Gallo packed the first part of the book with numerous citations and explanations on how a lack of communication skills is the most severe risk factor in the career of a modern professional, team, or company. One example led to another, and one smart conclusion followed another.

Then, I started noticing that the book kept trying to convince me more and more, but I didn’t need that convincing in the first place. More than half of the book is evangelism. The author tells you how essential communication skills are, then he gives you some examples of people who did it right, and then he talks about their importance again. Again, and again, and again. Where are all those “secrets to get from good to great”???

When, finally, we get to the practical parts, the reader is left mostly with shallow, almost trivial bits of advice. 

Some of the most important points I took from this book

Slight feeling of a hamster-wheel while reading this book

Adopt the three-act storytelling approach to presentations. The three-act storytelling approach worked for Homer, Shakespeare, and Tarantino, and there is no reason it should fail you in your technical presentations. Fair enough. On the other hand, this 2012 article by Nancy Duarte provides more depth and more actionable information on this approach (follow Duarte’s blog if presentation skills are something you are interested in).

“In the first two to three minutes of a presentation, I want people to lean forward in their chairs.” I like this citation by Avinash Kaushik, Google’s digital marketing evangelist. I will undoubtedly try this approach in my next presentations.

Should you read this book?

If you read these lines, your job depends on your communication and presentation skills. If you believe this premise, you can skip the first 60% of the book. If you want to improve your communication skills, I suggest reading Jean-luc Doumont’s “Trees, Maps, and Theorems,” which is much shorter, but also much denser in methods and practical advice. 

The bottom line

3.5/5

The delicate art of fine trolling

Photo by Pixabay on Pexels.com

I’m reading a 1991 paper by Barbara Tversky that deals with the directional representation of time. One sentence in the paper says

“There does not seem to be strong universal cognitive associations of quantity or quality to left or right”

Whenever I make a similar statement in the context of data visualization, I frequently get a self-assured response “of course there is – smaller numbers appear on the left!”. To answer this remark, Barbara Tversky added a small footnote that says

“Anyone in doubt should consult politicians on both the left and the right.”


So gentle, yet so powerful.

Lie factor in ad graphs

It’s fun to look at the visit statistics and to discover old stories. I wrote this post in 2016. For a reason I don’t know, this post has been one of the most viewed posts in my blogs during the last week. 

So, I decided to publish it again. I won’t add any new examples, but if you want to see more stuff, type [lying with data visualization] in your favorite search engine.

Lie factor in ad graphs

What do you do when you have spare time? I tend to throw graphs from ads into a graph digitizer to compute the “lie factor”. Take the following graph, for example. It appeared in an online ad campaign a couple of years ago. In this campaign, one of the four Israeli health care providers bragged about the short waiting times in their phone customer support. According to Meuhedet (the health care provider that ran the campaign), their customers had to wait for one minute and one second, compared to 1:03, 1:35, and 2:39 in the cases of the competitors. Look how dramatic the difference is:

[Image: the bar graph from the ad]

The problem?

If the orange bar represents 61 seconds, then, judging by the bar heights, the dark blue one stands for 123 seconds, almost twice as much as the actual number; the green bar stands for 4:20 minutes; and the light-blue one stands for approximately seven minutes, not 2:39 as its label says.


I can’t figure out what guided the Meuhedet creative team in selecting the bar heights. What I do know is that they lied. And this lie can be quantified.
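Here is a minimal sketch of that quantification. It assumes Edward Tufte’s definition of the lie factor (the size of the effect shown in the graphic divided by the size of the effect in the data) and uses the numbers quoted above. The pairing of bar colors with the stated waiting times is my assumption (shortest stated time with the shortest bar, and so on), and the function name is made up for the illustration.

```python
def lie_factor(shown_value, actual_value, baseline):
    """Tufte-style lie factor relative to a common baseline.

    An honest graph gives 1.0; the further from 1.0, the bigger the distortion.
    """
    shown_effect = (shown_value - baseline) / baseline
    actual_effect = (actual_value - baseline) / baseline
    return shown_effect / actual_effect


BASELINE_SECONDS = 61  # the orange bar: 1:01, used as the reference

# (stated waiting time, waiting time implied by the bar height), in seconds.
# ASSUMPTION: the color-to-number pairing below is inferred, not taken from the ad.
competitors = {
    "dark blue": (63, 123),           # stated 1:03, drawn as about 2:03
    "green": (95, 4 * 60 + 20),       # stated 1:35, drawn as about 4:20
    "light blue": (159, 7 * 60),      # stated 2:39, drawn as about 7:00
}

lie_factors = {
    color: lie_factor(shown, stated, BASELINE_SECONDS)
    for color, (stated, shown) in competitors.items()
}

for color, factor in lie_factors.items():
    print(f"{color}: lie factor = {factor:.1f}")
```

Any value far above 1 means that the picture exaggerates the difference that the numbers actually support.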


StellarGraph — another promising network analysis library for Python and Scala

Network (graph) analysis is a complicated topic. There are several tools available for this task, each with different pros and cons. Recently, I stumbled upon another tool, StellarGraph. The StellarGraph authors claim to provide excellent performance; NumPy, Pandas, and TensorFlow integration; an impressive set of algorithms; interoperability with Neo4j (THE graph database); and much more. The documentation looks very clear and extensive too.

I haven’t used it yet, but I certainly plan to.

https://www.stellargraph.io

The hazard of being a wizard. On the balance between specialization and the risk of becoming obsolete.

A wizard is a person who continually improves his or her professional skill in a particular and defined field. I learned about this definition of wizardness from the book “Managing project, people and yourself” by Nikolay Toverosky (the book is in Russian).  

Recently, Nikolay published an interesting post about the hazards of becoming a wizard. The gist of the idea is that while you are polishing your single skill to perfection, the world changes. You may find that your super-skill is no longer relevant (see my Soviet Shoemaker story).

Nikolay doesn’t give any suggestions. Neither do I. 

Below is the link to the original post. The post is in Russian, and you can use Google Translate to read it.

A page about wizards. My book has a chapter about commanders and wizards. At its end, I sum up: Despite all their coolness, a wizard is vulnerable. A wizard is useful only if their skill fits the task. 658 more words

Why it is dangerous to be a wizard — On project management and design

Nice but useless data visualization

Network visualization can mesmerize and hypnotize. Chord diagrams are especially cool because they are so colorful and smooth. The problem is that sometimes the result doesn’t provide any actual value and serves only as a cute illustration. Cute illustrations are cute; they help add some “lightness” to the text without the risk of looking too unprofessional.

Take the two examples below.

One example (taken from here) shows worldwide migration patterns in a clear and useful way. You can take a look at the graph and make real conclusions.

The other example (taken from here) is mostly a useless illustration.

The only “conclusion” that a viewer can make out of this graph is “everything is connected with everything.”

This type of conclusion is OK for an ad or a general overview of a problem, but it is NOT a valid way to end a discussion. 

Bioinformatics career advice and a story about a Soviet shoemaker

When I was in elementary school (back in the USSR of the mid 80’s), I had a friend whose father was a shoemaker. Due to the crazy stupid way the Soviet economy worked, a Soviet shoemaker was much richer than a physician or an engineer. But this is not the story. The story is that one day this friend’s father had a chat with me about selecting a profession. This man’s point was that for as long as people have feet and need shoes on their feet, shoemaking would remain a needed and well-paid occupation. Guess what? People still have feet and still wear shoes, but I don’t see too many successful shoemakers anymore.

Common wisdom says, “It is very hard to predict, especially the future.” And I will add, “even more especially, about the job market.” Nevertheless, people need to decide what to do with their lives, how to live, and what career paths to pursue. Some of them ask me, and I’m glad to answer. If you have any career-related questions, don’t be shy! Write to boris@gorelik.net, and I’ll see what wisdom I will be able to share with you.

Anyhow, this is a letter that I got from another pharmacist looking for a data science career.

Hope you are doing well. I saw your posts on Quora and thought of asking a doubt.
First let me tell my background. I am from India, I completed my Doctor of Pharmacy program (Pharm D). I am familiar with computer programming. I have intermediate knowledge in python and R programming.  So I thought taking up Bioinformatics and computational biology Masters program so that I can connect Pharma industry and my knowledge in computer science. 
What do you think? 
I have applied to University XYZ and got offer letter. I have to take a decision within 2 weeks.
Please let me know your thoughts on this.

To which I replied

Obviously, since the path you are describing is similar to the one I took, I will think that it is a good idea. Moreover, as you might have read in my blog (for example, here), my opinion is that advanced degrees give much more stable foundations, compared to the “fast and easy” courses. Having said that, your life is yours, not mine, and the job market today is not the job market of 2001, when I graduated with my B.Pharm.

Thank you so much for replying to my silly question. I am honoured to get a response from you. 

First of all, I don’t believe in “there are no silly questions” bullshit, but asking a silly question is better than not asking at all. Secondly, these questions are not silly at all.

I have a question, in your post dated 2017, you have mentioned that Bioinformatics was booming in 2001 and now it has lost its significance. Are you still have the same thoughts? 

I think that this person refers to the most visited post of mine, “Don’t study data science as a career move; you’ll waste your time!”. There is also a 2019 follow-up.

If that is the case then me taking a master’s in bioinformatics and computational genomics would be a bad idea, right ?

Here’s what I responded. Keep in mind that I wrote this before the COVID-19 outbreak.

Look, the markets in different countries are different. 

Back in the old days, there was a worldwide wave of closing bioinfo companies. All the Israeli ones were either closing or counting weeks before closing. One anecdote: I was interviewing at a company. Two weeks later, I called the person who interviewed me to ask whether I got the job or not, and the secretary told me that that person was fired due to layoffs. 

Right now, Israel sees a renaissance of bioinformatics companies, but I don’t know what will happen in the future. These companies live mostly out of investors’ money and are subject to strict regulations. However, if you get a good education, your head will be full of useful mental models, relevant basic knowledge, and good practices. 

End of quote. One of the side effects of the COVID-19 madness is the massive influx of money into biotech companies. Is this a short-term anecdote, or will it become a sustainable trend? I have no idea.

Do you have any career-related questions for me? You don’t have to be a pharmacist to ask :-). Write to boris@gorelik.net. I promise to respond, even if only by sending a link to my blog posts.

The difference between statistically meaningful and practically meaningful. An interview with me

Recently, I gave an interview to the Techie Leadership site. Andrei Crudu, the interviewer, made a helpful outline of the conversation. I marked the most important parts in bold.

  • Academic views on leadership;
  • Managing people isn’t for everyone;
  • Lessons from a practical approach;
  • Data Science is predominantly about data cleaning;
  • The difference between statistically meaningful and practically meaningful;
  • How sometimes companies tweak results to match expectations;
  • Bad managers make you appreciate the good managers;
  • Giving credit, being decent and not cheating;
  • All good teamwork starts with effective communication;
  • You don’t know that the stuff that you know is unknown to others;

Overall, I enjoyed chatting with Andrei, and I hope you’ll enjoy listening to the interview. If you have any comments, feel free to share them here or on the Techie Leadership site.

Is Distributed Work a Divide and Conquer Strategy?

Photo by Markus Spiske on Pexels.com

Before becoming a freelance data scientist, I used to work at Automattic, which I used to regard as my dream job. Not every current and ex-Automattician shares that rosy point of view. Antimattic is an anonymous blog that allows ex-Automattic employees to vent their feelings about what used to be their workplace. One recent post on that blog raises a fascinating question about distributed (or work from home, or remote) companies: “Is Distributed Work a Divide and Conquer Strategy?” I have to admit that I hadn’t thought about this perspective before. It looks like we will see more and more companies switching to remote work. It’s an interesting interpretation of the “future of work.”

Obviously this site exists because people have had negative experiences at Automattic. But many people have also had very positive experiences at the company. Could it be that the distributed nature of Automattic allows for such varying experiences? 45 more words

Is Distributed Work a Divide and Conquer Strategy? — Antimattic

Logarithmic scale misinforms. Period

Being a data scientist and a self-proclaimed data visualization expert, I like using log scale graphs when I find them appropriate. However, as a speaker and a communicator, I refrain from using them in presentations as much as possible. From my experience as a data visualization lecturer, I noticed that even “technical” people struggle to grasp the concept of log scale graphs.

One of the Coronavirus side effects was the introduction of the term “exponential growth” to every living room. Naturally (to some of us), exponential growth is best presented using a semi-log graph, where the X-axis represents time (linear scale), and the Y-axis represents the order of magnitude of a value (log scale).
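Why is a semi-log graph “natural” for exponential growth? Here is a small sketch that uses made-up numbers (100 initial cases and 25% daily growth are purely hypothetical). On a linear axis, each daily step is bigger than the previous one. On a log axis, every step has the same height, so the curve turns into a straight line.

```python
import math


def exponential_series(initial, daily_growth, days):
    """Values that grow by a fixed percentage every day."""
    return [initial * (1 + daily_growth) ** day for day in range(days)]


# Hypothetical numbers, for illustration only
cases = exponential_series(initial=100, daily_growth=0.25, days=8)

# Step heights as they would appear on a linear Y-axis
linear_steps = [later - earlier for earlier, later in zip(cases, cases[1:])]

# Step heights as they would appear on a log10 Y-axis
log_steps = [
    math.log10(later) - math.log10(earlier)
    for earlier, later in zip(cases, cases[1:])
]

print("linear steps:", [round(step, 1) for step in linear_steps])
print("log steps:   ", [round(step, 4) for step in log_steps])
```

The constant step on the log axis equals log10(1 + daily growth). That constant is exactly the piece of information that an untrained viewer has to decode from the slope, which may be one reason these graphs are so easy to misread.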

A recent study (link) tested and demonstrated how bad log-scale is. The research title is “The Logarithmic Scale Misinforms the Public and Affects Policy Preferences.” From my experience, log scale graphs misinform everybody. Except for experienced data scientists. Nothing can confuse or misinform us, obviously 😉

It is a bummer though that data visualization in that paper sucks so much.

Don’t publish graphs like this. Especially not in data visualization papers.

Thanks to Bella Graph who pointed me to the original study.

Book review: The Year Without Pants. WordPress.com and the future of work by Scott Berkun

TL;DR Interesting “history of work” book (definitely not “future of work”) with insights on transition-state organizations. Read it if history of work is your thing, or if you work in a small company that grows rapidly. 4.5/5 (due to the personal connection)



I got The Year Without Pants in 2014 as an onboarding present when I joined Automattic. The author, Scott Berkun, used to work as a manager at Microsoft (and maybe more places) before he quit and started a career as an adviser and an author. In 2011, the Automattic founder brought Scott to work at the company. About seventy people were working in the company back then, and the company was growing rapidly. Automattic had just introduced the concept of teams, and the idea was that Scott would work as a team leader, consulting the management on how to deal with the transition.

Being an ex-Microsoft manager, Scott was fascinated by the small distributed company, and wrote a book on it, proclaiming that the way Automattic worked was “the future of work”.

The book was published in 2012. Today, in post-COVID 2020, nobody is surprised by people who don’t need to go to the office every day. Automattic now has more than 1,000 employees and has adopted many of the rituals big companies have, such as endless meetings, tedious coordination, name tags, and corporate speak.

Why, then, did I enjoy the book? First, for me, it was a pleasant “time travel.” I enjoyed reading about people I knew, teams I worked with, and practices I used to love or hate. Secondly, this book provides insights on a transition from a small group of like-thinkers to a formalized organization.

“Why it burns when you P” and other statistics rants

Do you sometimes Google for something only to find stuff written by yourself?
I teach a course called “data-based decision making.” While googling for examples of statistics misuse, I stumbled upon an interesting blog post that I wrote about one and a half years ago.

The post is so good; I decided to post it again.

——————————

“Sunday grumpiness”* is an SFW translation of a Hebrew phrase that describes the most common state of mind people experience on their first work weekday. My grumpiness causes procrastination. Today, I tried to steer this procrastination to something more productive, so I searched for some statistics-related terms and stumbled upon a couple of interesting links in which people bitch about p-values.

“Why it burns when you P” is a five-year-old rant about P values. It’s funny, informative, and easy to read.

“Everything Wrong With P-Values Under One Roof” is a recent rant about p-values written in the form of a scientific paper. William M. Briggs, the author of this paper, ends it with an encouraging statement: “No, confidence intervals are not better. That for another day.”

“Everything wrong with statistics (and how to fix it)” is a one-hour video lecture by Dr. Kristin Lennox, who talks about the same problems. I saw this video and two more talks by Dr. Lennox on a flight. I highly recommend all her videos on YouTube.

“Do You Hate Statistics as Much as Everyone Else?” is Nathan Yau’s (of flowingdata.com) attempt to get thoughtful comments from his knowledgeable readers.

This list will not be complete without the classics:

“Why Most Published Research Findings Are False”, “Mindless Statistics”, and “Cargo Cult Science”. If you haven’t read these three pieces of wisdom, you absolutely should; they will change the way you look at numbers and research.

*The literal meaning of שביזות יום א is Sunday dick-brokenness.

Visualising Odds Ratio — Henry Lau

Besides being a freelance data scientist and visualization expert, I teach. One of the toughest concepts to teach and to visualize is the odds ratio. Today, I stumbled upon a very interesting post that deals exactly with that.

On Thursday 7 May, the ONS published analysis comparing deaths involving COVID-19 by ethnicity. There’s an excellent summary on twitter but the headline is that when taking into account age and other socio-demographic factors, such as deprivation, household composition, education, health and disability, there is higher risk for some ethnic groups of a COVID related…

Visualising Odds Ratio — Henry Lau

Calling bullshit on “persistence leads to success”

Did you know that J.K. Rowling, the author of Harry Potter, submitted her book 13 times before it was accepted? Did you know that Thomas Edison tried again and again, even though his teachers thought he was “too stupid to learn anything?” Did you know that Lior Raz (Fauda’s creator and lead actor) was an anonymous actor for more than ten years before he broke the barrier of anonymity? What do all these people have in common? They persisted, and they succeeded. BUT, and there is a big but.


People keep telling us: follow your dream, and if you persist, it will come true. You will learn from your mistakes, improve, and adapt, and finally, you will reach your goal. I call bullshit.

Think of the Martingale betting strategy. In theory, it works. Why doesn’t it work in practice? Because nobody has infinite time and infinitely deep pockets. The same is true of chasing your dream. We need to pay for the shelter above our heads, the food on our tables, the clothes that we wear. Often other people depend on us. Time passes by. I hate to be a party pooper, but some people who chase their dreams will eat all their savings and will either have to give up or declare bankruptcy (and then give up).
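If you want to see the finite pockets in action, here is a rough simulation sketch. All the numbers (a bankroll of 100, a base bet of 1, a fair coin, 1,000 rounds, 2,000 gamblers) are arbitrary assumptions that I picked for the illustration, and so are the function names.

```python
import random


def play_martingale(bankroll, base_bet=1, win_prob=0.5, max_rounds=1000, rng=random):
    """Double the bet after every loss, reset it after every win.

    Stops when the gambler cannot cover the next bet or when time runs out.
    Returns the final bankroll.
    """
    bet = base_bet
    for _ in range(max_rounds):
        if bet > bankroll:
            break  # finite pockets: the doubled bet can no longer be covered
        if rng.random() < win_prob:
            bankroll += bet
            bet = base_bet
        else:
            bankroll -= bet
            bet *= 2
    return bankroll


def share_ending_below_start(n_gamblers=2000, bankroll=100, seed=42, **kwargs):
    """Fraction of simulated gamblers who finish with less money than they started with."""
    rng = random.Random(seed)
    losers = sum(
        play_martingale(bankroll, rng=rng, **kwargs) < bankroll
        for _ in range(n_gamblers)
    )
    return losers / n_gamblers


print(f"Finished below the starting bankroll: {share_ending_below_start():.0%}")
```

Play with the bankroll and the number of rounds. The strategy looks great in short games, but the longer you insist on playing, the more likely you are to meet the losing streak that you can’t afford.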

Survivorship bias

But what about all those successful failers? What we see is a typical example of survivorship bias, the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. We know the names Rowling, Edison, Raz, and others not because of their multiple failures but DESPITE them. For every Rowling, Edison, and Raz, there are thousands of failed writers, engineers, and actors who ended up broke and caused sorrow to their families.

So, should I quit?

I don’t know. Maybe. Maybe not. It’s your life, your decision.

On a person that falls into the water. Or why thinking short-term is a good strategy in times of crisis

Photo by Life Of Pix on Pexels.com

At the beginning of the COVID-19 crisis, I tried to explain to my daughter (and to myself) the rationale behind the draconian measures that governments take to fight the crisis. One rationalization that I found was an analogy of a person that falls into the water. In this situation, the person needs to act FAST to stabilize the situation. Only then can he or she start planning the next steps.

I have been very vocal in criticizing the dramatic measures that many governments took at the beginning of this crisis. It looks like these measures were more-or-less correct, and that the countries that didn’t implement them are now in a much worse situation, compared to the countries that did impose severe limitations. But even if, in retrospect, it turns out that one could have done much better without the many “hammers,” I tend to think that those hammers were inevitable.

The conclusion? One day or another, we will all need to act very fast. This means that we need to be prepared, have plan Bs, work on resilience, and maybe perform emergency drills.

Bad advice from a reputable source is bad advice.

Would you buy a grammar book with a clear spelling mistake on its cover? I hope not. That’s what happened to IBM when it published its new data visualization guide. I didn’t bother reading the manual because of what IBM decided to use as the first image of their guide.

We use graphs to transfer information into images that are supposed to be later transformed in our brains back into information. What visual attributes do we use to interpret the information behind a pie chart? Is it the segment angle, its area, or maybe the arc length? Most probably, the answer is “all of the above” (see Robert Kosara’s works for more info). When done right, the three attributes of pie segments are linearly connected to one another, which allows synergism between the visual cues.
But what did our friends at IBM do? They deliberately distorted the data! I took the screenshot from the guide homepage and made some measurements.
The purple segment has an angle of 182 degrees, and the angle of the black segment is 75 degrees, which gives us a ratio of 2.43. However, while the radius of the purple segment is 135 pixels, the radius of the black one is only 110 pixels. Why is this a problem? Well, due to the radius differences, the ratio between the arc lengths is 2.98, and the ratio between the areas is 3.66. So now, let me ask you: what is the ratio between the numbers represented by the purple and the black segments?
It is true that the colors that the IBM people used in their guide are neat, but a data visualization that distorts information is not visualization but a piece of garbage. I assume that IBM produces decent computers, but don’t learn data visualization from them.
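For those who want to check the arithmetic, here is a short sketch that treats each segment as a circular sector. The angles and radii are the measurements quoted above; the function name is made up for the illustration. With equal radii, all three ratios would be identical. With unequal radii, the arc length ratio is multiplied by the ratio of the radii, and the area ratio by its square.

```python
import math


def segment_measures(angle_deg, radius_px):
    """Angle (radians), arc length, and area of a circular sector."""
    theta = math.radians(angle_deg)
    return {
        "angle": theta,
        "arc length": radius_px * theta,
        "area": 0.5 * radius_px ** 2 * theta,
    }


# Measurements from the screenshot, as quoted in the text
purple = segment_measures(angle_deg=182, radius_px=135)
black = segment_measures(angle_deg=75, radius_px=110)

# Purple-to-black ratio for each visual attribute
ratios = {attribute: purple[attribute] / black[attribute] for attribute in purple}

for attribute, ratio in ratios.items():
    print(f"{attribute} ratio: {ratio:.2f}")
```

Three visual attributes, three different answers. The viewer has no way of knowing which one is the “true” one.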

Why is it (almost) impossible to set deadlines for data science projects?

I wrote this post in 2017. For some reason, it started gaining traffic in the last two weeks. I reviewed this post and couldn’t find any new insights. But maybe you can help me.

Boris Gorelik

In many cases, attempts to set a deadline to a data science project result in a complete fiasco. Why is that? Why, in many software projects, managers can have a reasonable time estimate for the completion but in most data science projects they can’t? The key points to answer this question are complexity and, to a greater extent, missing information. By “complexity” I don’t (only) mean the computational complexity. By “missing information” I don’t mean dirty data. Let us take a look at these two factors, one by one.

Complexity

Illustration: famous xkcd comic. Two programmers play during the compilation time
Think of this. Why do most properly built bridges remain functional for decades and sometimes for centuries, while the rule in every non-trivial program is that “there is always another bug”? I read this analogy in Joel Spolsky’s post written in 2001. The answer Joel provides is:

Once you’ve written a subroutine, you can call it as often as you…

View original post 665 more words


Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog

OK, so Stephen Wolfram (a mega celebrity in the computational intelligence world and, among other things, a physicist) claims that he may have found a path to the Fundamental Theory of Physics. The blog post is long, and I hope to be able to finish reading it in a week or two. The accompanying technical text is a 450-page tome available on a dedicated site.

Also, it turns out that Stephen Wolfram has a Twitch.tv channel in which he talks about science.

Website: Wolfram Physics Project Technical Intro: A Class of Models with the Potential to Represent Fundamental Physics How We Got Here: The Backstory of the Wolfram Physics Project… 26,455 more words

Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog

The quintessence of data visualization usefulness

I have to admit, I was skeptical at the beginning of the COVID-19 crisis. I started becoming skeptical again now that it seems that the crisis didn’t hit my country too hard. But then I saw the graphs in this Financial Times article, and the skepticism disappeared. The graphs are accompanied by hundreds of words, but there is no need to read the text to understand almost everything.

These graphs are so good, so convincing, so well executed, that they don’t leave any room for doubt or misunderstanding of the message the author wants to convey.

If you study data visualization, look at these graphs. Look at the color choice, legend location, and design. Look at the ticks on the X- and Y-axes, how they are spaced and typeset. Note the amount of detail on the axes, specifically how sparse it is.