From time to time, people send me emails asking for career advice. Here’s one recent exchange. Hi Boris, I am currently trying to decide on a career move and would like to ask for your advice. I have a MSc from a leading university in ML, without thesis. I have 5 years of experience in … Continue reading Career advise. Upgrading data science career
I invited Dr. Charles Earl for this episode of my podcast “Job Interview” to talk about racial discrimination at the workplace and fairness in machine learning. Dr. Charles Earl is a data scientist at Automattic, my previous place of work. Charles holds a Ph.D. in computer science, M.A. in education, M.Sc. in electrical engineering, and … Continue reading Interview 27: Racial discrimination and fair machine learning
Deena Gergis is a data science lead at Bayer. I recently discovered Deena’s article on LinkedIn titled “Five Things I Wish I Knew About Real-Life AI.” I think that this article is a great piece of career advice for all current and aspiring data scientists, as well as for all the professionals who … Continue reading Five things I wish people knew about real-life machine learning
I started following data visualization news and opinions quite a few years ago. One of the first blogs active in this area was NeuroDojo, by the (now) professor Zen Faulkes. One of Zen’s spin-off blogs was devoted to better posters. This poster blog is called, surprisingly enough, Better Posters. Since I’m not in academia … Continue reading One of the first dataviz blogs that I used to follow is now a book. Better Posters
Danny Lieberman managed teams of programmers before I could read, so when Danny writes a post as bold and blunt as this, you should read it.
Working with local and S3 I/O with minimal code changes in Python
TL;DR Very shallow and uninformative. It could be an OK series of blog posts for complete novices, but not a book. The Persuasion Slide by Richard Dooley was a disappointment for me. I love Dooley’s podcast Brainfluence, and I was sure that Richard’s book would be full of in-depth knowledge and case studies. However, it contained neither. The … Continue reading Book review. The Persuasion Slide by Richard Dooley
I recently rediscovered a volcano plot — a scatter plot that aims to visualize changes in large populations. Volcano plots are very technical and specialized and, most probably, are not a good fit for explanatory data visualization. However, they can be useful during the exploration phase, and they come with a set of well-established metrics. Moreover, … Continue reading Graphical comparison of changes in large populations with “volcano plots”
TL;DR Nice’n’easy reading for novice managers. I read this book after hearing the author, Gal Zellermayer, in a podcast. Gal is an Israeli guy who has been working as a manager in several global companies’ Israeli offices. He brings a perspective that combines (what are perceived as) the best practices of the American managing style with the … Continue reading Book review: Manager in shorts by Gal Zellermayer
A couple of weeks ago, I wrote a post about an unexpected hitch of working in a distributed team. Yesterday, my ex-coworker, Ann McCarthy, wrote a related, more elaborate post on the same issue. It’s worth reading.
“One idea per slide” means one idea per slide. The simplest way to enforce this rule is to devote one slide to each sentence. Remember, adding slides is free; the audience’s attention is not.
Innumeracy is the “inability to deal comfortably with the fundamental notions of number and chance”. I wish there was a better term for “innumeracy”, a term that would reflect the importance of analyzing risks, uncertainty, and chance. Unfortunately, I can’t find such a term. Nevertheless, the problem is huge. In this long post, Tom Breur reviews … Continue reading Innumeracy
A fellow data analyst asked a question: what do we do when we need to draw a stacked bar chart that has too many colors? How do we select the colors so that they are nice but also easily distinguishable? To answer this question, let’s look at data similar to what appeared in … Continue reading Before and after - stacked bar charts
Slope charts are often suggested as a valid alternative to clustered bar charts, especially for “before and after” cases. So, instead of a clustered bar chart like this we tend to recommend a slope chart (or slope graph) like this However, a slope chart isn’t free of problems either. In the past, I already wrote … Continue reading The Problem With Slope Charts (by Nick Desbarats)
Radar charts (sometimes called “spider charts”) look cool but are, in fact, pretty lame. So much so that when the data visualization author Stephen Few mentioned them in his book Show me the numbers, he did so in a chapter called “Silly graphs that are best forsaken.” Here, I will demonstrate some of their problems, … Continue reading Before and after: Alternatives to a radar chart (spider chart)
بعد حوالي سنتين من الدراسة ، بحس حالي جاهز لإضافة اللغة العربية إلى قائمة اللغات في ال-LinkedIn After about two years of study, I feel ready to add Arabic to LinkedIn’s language list
I had the honor to record an introductory data visualization course for high school students as a part of the Israeli national distance learning project. The course is in Hebrew, and since it targets high schoolers, it does not require any prior knowledge. I got paid for this job. However, when I divide the money … Continue reading Basic data visualization video course (in Hebrew)
I’ve stumbled upon an exciting project: a text visualization browser. It’s a web page that allows one to search for different text visualization techniques using keywords and publication time. The ability to limit the search to various years gives a nice historical perspective on this interesting topic. This site’s information is based on a 2015 paper Text … Continue reading Text Visualization Browser
Click here for details and registration!
If you work, but nobody knows about your results or cares about them, have you done any work at all? As a data scientist, the product of my work is usually an algorithm, an analysis, or a model. What is a good way to share these results with my clients? Since 99% of my time, … Continue reading Sharing the results of your Python code
This notebook is a part of my productivity system. Read more on productivity and procrastination here.
I apologize for my harsh language, but recently I was exposed to a bunch of graphs on the “information is beautiful” site, and I was offended (well, not really, but let’s pretend I was). I mean, I’m a liberal person, and I don’t care what graphs people do in their own time. Many people visit … Continue reading The information is beautiful. The graphs are shit!
I am seldomly jealous of people, but when I am, I’m jealous of Stephen Wolfram Towards a Science of Metamathematics One of the many surprising things about our Wolfram Physics Project is that it seems to have implications even beyond physics. In our effort to develop a fundamental theory of physics it seems as if … Continue reading The Empirical Metamathematics of Euclid and Beyond — Stephen Wolfram Blog
My guest talk at Automattic. Boris Gorelik recently joined us to present on The Biggest Missed Opportunity in Data Visualization based on his recent talk at the NDR conference. Boris was a data scientist at Automattic, is now a data science consultant, and blogs regularly on data visualization and productivity. Some of highlights (along with … Continue reading Boris Gorelik on the biggest missed opportunity in data visualization — Data for Breakfast
Next time you wonder why your Israeli colleague, customer or partner barely works during October, recall this post
Will Cray [link] is a fresh M.Sc. in Computer Science and is considering becoming a freelancer in the Machine Learning / Artificial Intelligence / Data Science field. Will asked for advice on the LocallyOptimistic.com community Slack channel. Here’s Will’s question (all the names in this post are used with people’s permission). Read more career advice [here]. Let’s begin. … Continue reading Career advice. Becoming a freelancer immediately after finishing a masters degree
A population pyramid, also called an “age-gender-pyramid”, is a graphical illustration that shows the distribution of various age groups in a population (typically that of a country or region of the world), which forms the shape of a pyramid when the population is growing [citation from Wikipedia]. In some cases, the pyramid provides interesting insights into … Continue reading Exploring alternatives to population pyramids
When the .blog TLD was started by Automattic, employees were given the option to reserve a domain for free. In return […], they asked that the domain be used as a primary domain (no forwarding to a different site), and that the site be updated with new content at least once a month. This requirement … Continue reading The Mysterious Status of .blog Domains
From time to time, we need to look at a distribution of a group of values. Histograms are, I think, the most popular way to visualize distributions. “Back in the old days,” when most of my work was done in the console, and when creating a plot from Python required too much boilerplate code … Continue reading ASCII histograms are quick, easy to use and to implement
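To make the idea of a console histogram concrete, here is a minimal sketch in plain Python. It is not the code from the original post; the function name `ascii_histogram` and its parameters (`bins`, `width`, `char`) are assumptions chosen for illustration.

```python
import random


def ascii_histogram(values, bins=10, width=40, char="#"):
    """Return a list of text lines, one per bin, that form a histogram.

    Hypothetical helper: equal-width bins between min and max,
    bars scaled so that the fullest bin is `width` characters long.
    """
    values = list(values)
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin number `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = char * round(count / peak * width)
        lines.append(f"{left_edge:10.2f} | {bar} {count}")
    return lines


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(0, 1) for _ in range(1000)]
    print("\n".join(ascii_histogram(sample, bins=12)))
```

Each line shows the left edge of a bin, a bar, and the raw count, which is usually enough for a quick look at a distribution without leaving the terminal.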
This post contains a bunch of links to blogs that write about productivity. Musings of Brown Girls This is not an exclusively productivity blog. The authors of this collective effort write about other interesting things. I read some posts, and I liked them 2. Self care Do you know that feeling when you feel bad … Continue reading A short compilation of productivity blog posts
An interesting post by my former coworker, Yanir Seroussi. Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can done by relying on the definition of CIs: Given an infinite number of independent samples from the same population, we expect a ci_level CI to contain the population parameter … Continue reading Many is not enough: Counting simulations to bootstrap the right way — Yanir Seroussi
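The definition quoted above suggests a simple check: draw many samples from a population with a known parameter, compute a CI for each, and count how often the CI contains that parameter. The sketch below does this for a percentile bootstrap CI of the mean. It is an illustration under stated assumptions, not Yanir’s code: the function names, the normal population, and the default sample sizes are all hypothetical choices.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, ci_level=0.95, n_boot=200, rng=random):
    """Percentile bootstrap CI for the mean (one of several bootstrap variants)."""
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_boot)
    )
    alpha = (1 - ci_level) / 2
    lo_idx = int(alpha * n_boot)
    hi_idx = min(int((1 - alpha) * n_boot), n_boot - 1)
    return means[lo_idx], means[hi_idx]


def estimate_coverage(true_mean=0.0, sample_size=30, n_simulations=200,
                      ci_level=0.95, seed=0):
    """Fraction of simulated CIs that contain the known population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_simulations):
        # Assumed population: normal with the given mean and unit variance.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(sample_size)]
        lo, hi = percentile_bootstrap_ci(sample, ci_level, rng=rng)
        hits += lo <= true_mean <= hi
    return hits / n_simulations


if __name__ == "__main__":
    print(f"Observed coverage: {estimate_coverage():.3f}")
```

The observed coverage is itself an estimate, so with a few hundred simulations it will wobble by a couple of percentage points around its long-run value. Deciding how many simulations are enough is a separate question from how the CI is computed.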
TL;DR If you are an Israeli and don’t feel like learning the behind-the-scenes stories, skip it. Otherwise, I do recommend reading this book. I enjoyed it a lot. 4.5/5. The Abyss: Bridging the Divide between Israel and the Arab World went to print slightly after the outbreak of the “Arab Spring.” The author, … Continue reading Book review: The Abyss: Bridging the Divide between Israel and the Arab World
What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?
There was a consensus among the data visualization purists that the rainbow color map and its close cousin, Jet, are bad. Really bad. These colormaps used to be popular at the beginning of the computational data visualization era. However, their popularity decreased in the last five years or so. The sentiment isn’t as bad as … Continue reading What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?
Many people know me as a data scientist. However, I also teach, which goes unnoticed by many of my friends and colleagues. I created a page dedicated to my teaching activity. Talk to me if you want to organize a course or a workshop. I also highly recommend teaching as a way of learning. … Continue reading If you don’t teach yet, start! It will make you a better professional.
In technical communication, the main thing is to keep the main thing the main thing. There are multiple ways to uphold this principle. There is one tool that is easy to master, fast to apply, and that provides a high rate of return on investment.
I will be talking about data visualization at the next NDR conference on July 28. All the conferences organized by the NDR team are well organized and of a very high value. I hope to keep the level high. And here’s the brief description of my talk See you
If you plan to work in data analysis or processing, read the excellent post on the “Stats With Cats” blog titled “35 Ways Data Go Bad.” I did experience each and every one of the 35 problems. However, this list is far from being complete. One should add the comprehensive list of Falsehoods Programmers Believe About … Continue reading 35 (and more) Ways Data Go Bad - Stats With Cats Blog
It has been about half a year since I became a freelance data scientist. Before my career change, I worked in a distributed team for more than five years. Today, I suddenly realized that working in a distributed team has a significant problem, inherent to its distributed, multinational nature. My team was always spread over … Continue reading Unexpected hitch of working in a distributed team
Here’s a neat method that helps me organize my week, increase my productivity and fight procrastination.
Even good graphs have room for improvement. Follow this post to
Look at this wonderful piece of data visualization (taken from here). If you know the terms “tertiary structure” and “glycan”, there is NO way you miss the message that the author of this figure wanted to convey. Also, note how, by using appropriate colors in the title, the authors got rid of the graph legend.
Here’s an appealing ad that I saw: “How to become a Python professional in 42 hours?” I’ll tell you how. There is no way. I don’t know any field of knowledge in which one can become a professional after 42 hours. Certainly not Python. Not even after 42 days. Maybe after 42 weeks if that’s mostly … Continue reading How to become a Python professional in 42 hours?
I’m honored to take part in standardizing bidirectional language support in interfaces and visualization, as a part of an expert group formed for the Hebrew Support in Computerized Systems Committee at the SII, the Standards Institution of Israel. The Committee is led by Gilad Almosnino. Below is Gilad’s project announcement.
TL;DR Good motivation to improve communication. Inadequate source of information on how to achieve that. The central premise of Five Stars Communication Secrets to Get from Good to Great by Carmine Gallo is that professionals who don’t invest in communication skills are at high risk of being replaced by computers and robots. One of the … Continue reading Book review. Five Stars by Carmine Gallo
I’m reading a 1991 paper by Barbara Tversky that deals with the directional representation of time. One sentence in the paper interview says “There does not seem to be strong universal cognitive associations of quantity or quality to left or right” Whenever I make a similar statement in the context of data visualization, I … Continue reading The delicate art of fine trolling
It’s fun to look at the visit statistics and to discover old stories. I wrote this post in 2016. For a reason I don’t know, this post has been one of the most viewed posts in my blogs during the last week. So, I decided to publish it again. I won’t add any new examples, … Continue reading Lie factor in ad graphs
Network (graph) analysis is a complicated topic. There are several tools available for this task with different pros and cons. Recently, I stumbled upon another tool StellarGraph. StellarGraph authors claim to provide excellent performance; NumPy, Pandas, TensorFlow integration, an impressive set of algorithms, inter compatibility with Neo4j (THE graph database); and much more. The documentation looks … Continue reading StellarGraph — another promising network analysis library for Python and Scala
On the balance between specialization and the risk of becoming obsolete.
Network visualization can mesmerize and hypnotize. Chord diagrams are especially cool because they are so colorful and smooth. The problem is that sometimes, the result doesn’t provide any actual value, and serves as a cute illustration. Cute illustrations are cute; they help put some “easiness” to the text without the risk of looking too unprofessional. … Continue reading Nice but useless data visualization
When I was in elementary school (back in the USSR of the mid 80’s), I had a friend whose father was a shoemaker. Due to the crazy stupid way the Soviet economy worked, a Soviet shoemaker was much richer than a physician or an engineer. But this is not the story. The story is that … Continue reading Bioinformatics career advice and a story about a Soviet shoemaker
Interview on leadership. The difference between statistically meaningful and practically meaningful; Giving credit, being decent and not cheating;
All good teamwork starts with effective communication;
You don’t know that the stuff that you know is unknown to others;
Is Distributed Work a Divide and Conquer Strategy?
Being a data scientist and a self-proclaimed data visualization expert, I like using log scale graphs when I find them appropriate. However, as a speaker and a communicator, I refrain from using them in presentations as much as possible. From my experience as a data visualization lecturer, I noticed that even “technical” people struggle to grasp the concept of log scale graphs.
Book review: The Year Without Pants. WordPress.com and the future of work. Read it if history of work is your thing, or if you work in a small company that grows rapidly
“Why it burns when you P” and other statistics rants
Besides being a freelance data scientist and visualization expert, I teach. One of the toughest concepts to teach and to visualize is the odds ratio. Today, I stumbled upon a very interesting post that deals exactly with that.
Did you know that J.K. Rowling, the author of Harry Potter, submitted her book 13 times before it was accepted? So what?
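For readers who have not met the term, a small worked example may help. The sketch below computes an odds ratio next to a risk ratio from a 2×2 table, which is where much of the confusion usually starts. The counts are made up for illustration, and the function and parameter names are hypothetical.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


def risk_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Probability of the outcome among the exposed divided by that among the unexposed."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


if __name__ == "__main__":
    # Made-up table: 20 of 100 exposed and 10 of 100 unexposed people have the outcome.
    table = dict(exposed_cases=20, exposed_noncases=80,
                 unexposed_cases=10, unexposed_noncases=90)
    print(f"odds ratio: {odds_ratio(**table):.2f}")
    print(f"risk ratio: {risk_ratio(**table):.2f}")
```

With these counts, the odds are 20/80 = 0.25 versus 10/90 ≈ 0.11, so the odds ratio is 2.25, while the risk ratio is 0.20/0.10 = 2.0. The two measures answer different questions and agree closely only when the outcome is rare.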
COVID-19 vs. influenza dataviz. The order is now correct
On a person that falls into the water. Or why thinking short-term is a good strategy in times of crisis
One day or another, we will all need to act very fast. This means that we need to be prepared, have plan Bs, work on resilience, and maybe perform emergency drills.
It is correct that the colors that IBM people used in their guide are neat, but data visualization that distorts information is not visualization but a piece of garbage. I assume that IBM produces decent computers, but don’t learn data visualization from them
Originally posted on Boris Gorelik:
In many cases, attempts to set a deadline to a data science project result in a complete fiasco. Why is that? Why, in many software projects, managers can have a reasonable time estimate for the completion but in most data science projects they can’t? The key points to answer this…
NDR is a family of machine learning/data science conferences. Their next conference will be held online on May 28, and the agenda looks great. Now, I’m not super objective here, because I’m presenting at the NDR July event. But look at the topics, what an impressive selection!
Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog
OK, so Stephen Wolfram (a mega celebrity in the computational intelligence world and, among other things a physicist) claims that he may have found a path to the Fundamental Theory of Physics. The blog post is long, and I hope to be able to finish reading it in a week or two. The accompanying technical … Continue reading Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog
The quintessence of data visualization usefulness. These graphs are SOOOO good and convincing.
Never Split the Difference. A negotiation book that you might want to read. A book review.
Today, Israel marks Holocaust Day. Many words have been written about the Holocaust, and I want to write about missing graves. If you visit a Jewish cemetery, you might see a lot of gravestones with additional memorial plates. I took this picture in the Chișinău (Kishinev) Jewish cemetery. Burial of the deceased is considered the final … Continue reading The missing graves
Constance Crozier (@clcrozier) shared an interesting simulation in which she tried to fit a sigmoid curve (s-curve) to predict a plateau in a time-series. It took me a while to find the reference for a paper that explains why.
My colleague, Simon Ouderkik, recorded a REALLY interesting interview with Stephen Levin of Zapier and Emilie Schario of Gitlab on organizing data org in a company, job titles, career ladders, and other important stuff.
If there is only one document you can read about data visualization, this is the one
I wrote about data giraffes two weeks ago. Usually, “data giraffes” are a problem and we need to work hard in order to solve it. Sometimes, they are a useful feature. Take a look at this NYT front page that shows the number of new unemployment applications in the United States over the time And … Continue reading Data giraffe is sometimes a feature, not a problem
My job wasn’t affected by the COVID madness in almost any way. I used to work from home before, and I work from home now, none of my customers cancelled any projects, the health system in Israel is still functioning, all of my relatives are in good health, everything is just fine! I know how … Continue reading Everything is NOT just fine (repost)
More than two years ago, I took a look at Google Trends for three phrases “start a blog”, “create a blog”, and “create a site”. I was surprised by the high volume of blog searches, compared to “create a site”. Today, I decided to go back to Google Trends and to add the new rising … Continue reading Blogging isn’t what it used to be. Podcasting is on the rise
A super-important read on the COVID-19 situation. I’m finally convinced
Data scientist? Thinking of working in a distributed company? The team at Automattic in which I used to work is looking for a Machine Learning specialist. It’s an awesome team. Give it a try https://automattic.com/work-with-us/machine-learning-engineer/
Make the personal meeting personal, even if it’s remote.
An interesting solution of the data giraffe problem
COVID-19 vs. influenza dataviz (an update)
Here’s another email that I got with the question about switching to the data science career
I suppose that you know that THE software development Q&A site has its own job board. I suspected that the Corona pandemic would lead to a sharp decrease in the number of job postings on that board. I scraped the data, and it looks like, for now, there are no drastic changes in the number of postings published in the last couple of days.
The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database
I am glad and proud to announce that a paper which I helped to prepare and publish is available on the Nature group’s site. The paper, The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database, by Einat Gorelik et al. (including myself) analyzes the data in the FDA Adverse … Continue reading The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database
Please leave a comment to this post. It doesn’t matter what, it can be a simple Hi or an interesting link. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.
Before becoming a freelance data scientist, I used to work in a distributed company. Remote communication, including remote presentations, was the norm for me long before the remote work experiment no one asked for. In this post, I share some tips for delivering better presentations remotely. Stand up! Usually, we stand up when we present … Continue reading Tips for making remote presentations
Originally posted on Boris Gorelik:
A pie chart as a suitable alternative to a bar chart. During my professional life, I have heard a lot of criticism of pie charts. The reason lies in the fact that it is very easy to produce horrors with these charts. It didn’t help that, for a very long time, the default pie chart settings in all the major visualization tools produced completely distorted charts. Supporters of the pie chart boycott suggest the graph…
Graph code: here.
Being a data science freelancer and a long-time fan of AnnMaria, I HAVE to repost her latest post on consulting success.
People ask me for a good intro video to data visualization. I tend to ask them to look for one of my lectures. To save the search, here’s one of the most relevant talks that I gave. This lecture was a part of the 2018 EuroScipy conference, where I also ran a workshop.
Career advice. A clinical pharmacist, epidemiologist, and a Ph.D. student wants to become a data scientist.
From time to time, I get emails from people who seek advice on their career paths. This time, I got an email from a clinical pharmacist and a Ph.D. student.
Being a freelance data scientist, I get to talk to people about proposals that don’t materialize into projects.
I can’t elaborate yet, but in case you wondered what scientific satisfaction looks like, here’s a perfect illustration. Stay tuned
Gilad Almosnino is an internationalization expert. I’m reading his post “Eight emojis that will create a more inclusive experience for Middle Eastern markets,” in which he mentions “Turkish or Arabic Coffee,” which reminded me of my last visit to Athens. When, in one restaurant, I asked for a Turkish coffee, the waiter looked at me harshly and … Continue reading Which coffee is this?
Do you believe in telepathy? Yesterday, I submitted final proofs of a paper in which I actively participated. During the proofreading, I noticed that our abstract ends with “further research is needed” and scratched my head. I submitted the proofs, and then I saw this pearl in my blog feed
TL;DR shallow and disappointing. The Great Mental Models by Shane Parrish was highly praised by Automattic’s CEO Matt Mullenweg. Since I appreciate Matt’s opinion a lot, I decided to buy the book. I read it and was disappointed. This book is very ambitious yet shallow and non-engaging. If you consider reading a book on … Continue reading Book review: Great mental models by Shane Parrish
Which data scientist can refuse more computing power? None. My collection of computing devices has a new addition: a Soviet arithmometer, the Felix M.
TicToc — a flexible and straightforward stopwatch library for Python.
Why it is OK to have a loud argument with your co-workers.
The difference between Python decorators and inheritance that cost me three hours of hair-pulling