I am seldom jealous of people, but when I am, I’m jealous of Stephen Wolfram. Towards a Science of Metamathematics: One of the many surprising things about our Wolfram Physics Project is that it seems to have implications even beyond physics. In our effort to develop a fundamental theory of physics it seems as if … Continue reading The Empirical Metamathematics of Euclid and Beyond — Stephen Wolfram Blog
My guest talk at Automattic. Boris Gorelik recently joined us to present on The Biggest Missed Opportunity in Data Visualization based on his recent talk at the NDR conference. Boris was a data scientist at Automattic, is now a data science consultant, and blogs regularly on data visualization and productivity. Some of the highlights (along with … Continue reading Boris Gorelik on the biggest missed opportunity in data visualization — Data for Breakfast
Next time you wonder why your Israeli colleague, customer or partner barely works during October, recall this post
Will Cray is a fresh M.Sc. graduate in Computer Science who is considering becoming a freelancer in the Machine Learning / Artificial Intelligence / Data Science field. Will asked for advice on the LocallyOptimistic.com community Slack channel. Here’s Will’s question (all the names in this post are used with people’s permission). Read more career advice here. Let’s begin. … Continue reading Career advice. Becoming a freelancer immediately after finishing a master’s degree
A population pyramid, also called an “age-gender pyramid,” is a graphical illustration that shows the distribution of various age groups in a population (typically that of a country or region of the world), which forms the shape of a pyramid when the population is growing [citation from Wikipedia]. In some cases, the pyramid provides interesting insights into … Continue reading Exploring alternatives to population pyramids
When the .blog TLD was started by Automattic, employees were given the option to reserve a domain for free. In return […], they asked that the domain be used as a primary domain (no forwarding to a different site), and that the site be updated with new content at least once a month. This requirement … Continue reading The Mysterious Status of .blog Domains
From time to time, we need to look at the distribution of a group of values. Histograms are, I think, the most popular way to visualize distributions. “Back in the old days,” when most of my work was done in the console, and when creating a plot from Python required too many boilerplate code lines … Continue reading ASCII histograms are quick, easy to use and to implement
This post contains a bunch of links to blogs that write about productivity. 1. Musings of Brown Girls: This is not exclusively a productivity blog. The authors of this collective effort write about other interesting things too. I read some posts, and I liked them. 2. Self care: Do you know that feeling when you feel bad … Continue reading A short compilation of productivity blog posts
An interesting post by my former coworker, Yanir Seroussi. Previously, I encouraged readers to test different approaches to bootstrapped confidence interval (CI) estimation. Such testing can be done by relying on the definition of CIs: Given an infinite number of independent samples from the same population, we expect a ci_level CI to contain the population parameter … Continue reading Many is not enough: Counting simulations to bootstrap the right way — Yanir Seroussi
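The definition-based test can be sketched as a small simulation: draw many samples from a population whose parameter we know, build a CI from each sample, and count how often the CI contains the true value. This is a minimal sketch, not Yanir’s code; it assumes a percentile bootstrap CI for the mean (one of several bootstrap variants), a standard normal population, and arbitrary values for `n_trials`, `sample_size`, and `n_boot`. The function names are hypothetical.

```python
import random
import statistics


def percentile_bootstrap_ci(sample, ci_level=0.95, n_boot=1000, rng=None):
    """Percentile bootstrap CI for the mean (one of several bootstrap CI variants)."""
    rng = rng or random.Random()
    n = len(sample)
    # Resample with replacement, record each resample's mean, and sort the means
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - ci_level
    # Indices of the alpha/2 and 1 - alpha/2 empirical quantiles
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return boot_means[lo_idx], boot_means[hi_idx]


def estimate_coverage(n_trials=200, sample_size=30, ci_level=0.95, seed=0):
    """Fraction of simulated samples whose CI contains the true population mean."""
    rng = random.Random(seed)
    true_mean = 0.0  # assumed population: standard normal
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(sample_size)]
        lower, upper = percentile_bootstrap_ci(sample, ci_level, rng=rng)
        hits += lower <= true_mean <= upper
    return hits / n_trials


if __name__ == "__main__":
    print(f"Estimated coverage: {estimate_coverage():.3f}")
```

A correct procedure should give an estimated coverage close to `ci_level`. Note that the coverage estimate is itself noisy: with only a few hundred trials, differences of a few percentage points between methods may be nothing but simulation noise.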
TL;DR: If you are Israeli and don’t feel like learning the behind-the-scenes stories, skip it. Otherwise, I do recommend reading this book. I enjoyed it a lot: 4.5/5. The Abyss: Bridging the Divide between Israel and the Arab World went to print slightly after the outbreak of the “Arab Spring.” The author, … Continue reading Book review: The Abyss: Bridging the Divide between Israel and the Arab World
What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?
There was a consensus among data visualization purists that the rainbow color map and its close cousin, Jet, are bad. Really bad. These colormaps used to be popular at the beginning of the computational data visualization era. However, their popularity has decreased over the last five years or so. The sentiment isn’t as bad as … Continue reading What is the biggest problem of the Jet and Rainbow color maps, and why is it not as evil as I thought?
Many people know me as a data scientist. However, I also teach, which goes largely unnoticed by many of my friends and colleagues. I created a page dedicated to my teaching activity. Talk to me if you want to organize a course or a workshop. I also highly recommend teaching as a way of learning. … Continue reading If you don’t teach yet, start! It will make you a better professional.
In technical communication, the main thing is to keep the main thing the main thing. There are multiple ways to ensure this principle. There is one tool that is easy to master, fast to apply, and provides a high return on investment.
I will be talking about data visualization at the next NDR conference on July 28. All the conferences organized by the NDR team are well organized and of very high value. I hope to keep the level high. Here’s a brief description of my talk. See you!
If you plan to work on data analysis or processing, read the excellent post “35 Ways Data Go Bad” on the Stats With Cats blog. I have experienced each and every one of the 35 problems. However, this list is far from complete. One should add the comprehensive list of Falsehoods Programmers Believe About … Continue reading 35 (and more) Ways Data Go Bad — Stats With Cats Blog
It has been about half a year since I became a freelance data scientist. Before my career change, I worked in a distributed team for more than five years. Today, I suddenly realized that working in a distributed team has a significant problem, inherent to its distributed, multinational nature. My team was always spread over … Continue reading Unexpected hitch of working in a distributed team
Here’s a neat method that helps me organize my week, increase my productivity and fight procrastination.
Even good graphs have room for improvement. Follow this post to
Look at this wonderful piece of data visualization (taken from here). If you know the terms “tertiary structure” and “glycan,” there is NO way you will miss the message that the author of this figure wanted to convey. Also, note how, by using appropriate colors in the title, the authors got rid of the graph legend.
Here’s an appealing ad that I saw: “How to become a Python professional in 42 hours?” I’ll tell you how. There is no way. I don’t know any field of knowledge in which one can become a professional after 42 hours. Certainly not Python. Not even after 42 days. Maybe after 42 weeks if that’s mostly … Continue reading How to become a Python professional in 42 hours?
I’m honored to take part in standardizing bidirectional language support in interfaces and visualization, as part of an expert group formed for the Hebrew Support in Computerized Systems Committee at the SII (the Standards Institution of Israel). The Committee is led by Gilad Almosnino. Below is Gilad’s project announcement.
TL;DR: Good motivation to improve communication; an inadequate source of information on how to achieve that. The central premise of Five Stars: The Communication Secrets to Get from Good to Great by Carmine Gallo is that professionals who don’t invest in communication skills are at high risk of being replaced by computers and robots. One of the … Continue reading Book review. Five Stars by Carmine Gallo
I’m reading a 1991 paper by Barbara Tversky that deals with the directional representation of time. One sentence in the paper says, “There does not seem to be strong universal cognitive associations of quantity or quality to left or right.” Whenever I make a similar statement in the context of data visualization, I … Continue reading The delicate art of fine trolling
It’s fun to look at the visit statistics and to discover old stories. I wrote this post in 2016. For a reason I don’t know, this post has been one of the most viewed posts on my blog during the last week, so I decided to publish it again. I won’t add any new examples, … Continue reading Lie factor in ad graphs
Network (graph) analysis is a complicated topic. There are several tools available for this task, with different pros and cons. Recently, I stumbled upon another tool, StellarGraph. The StellarGraph authors claim to provide excellent performance; NumPy, Pandas, and TensorFlow integration; an impressive set of algorithms; interoperability with Neo4j (THE graph database); and much more. The documentation looks … Continue reading StellarGraph — another promising network analysis library for Python and Scala
On the balance between specialization and the risk of becoming obsolete.
Network visualization can mesmerize and hypnotize. Chord diagrams are especially cool because they are so colorful and smooth. The problem is that sometimes the result doesn’t provide any actual value and serves merely as a cute illustration. Cute illustrations are cute; they help add some “easiness” to the text without the risk of looking too unprofessional. … Continue reading Nice but useless data visualization
When I was in elementary school (back in the USSR of the mid-1980s), I had a friend whose father was a shoemaker. Due to the crazy, stupid way the Soviet economy worked, a Soviet shoemaker was much richer than a physician or an engineer. But this is not the story. The story is that … Continue reading Bioinformatics career advice and a story about a Soviet shoemaker
Interview on leadership. The difference between statistically meaningful and practically meaningful; giving credit, being decent, and not cheating;
All good teamwork starts with effective communication;
You don’t know that the stuff that you know is unknown to others;
Is Distributed Work a Divide and Conquer Strategy?
Being a data scientist and a self-proclaimed data visualization expert, I like using log scale graphs when I find them appropriate. However, as a speaker and a communicator, I refrain from using them in presentations as much as possible. From my experience as a data visualization lecturer, I noticed that even “technical” people struggle to grasp the concept of log scale graphs.
Book review: The Year Without Pants. WordPress.com and the future of work. Read it if the history of work is your thing, or if you work in a small, rapidly growing company.
“Why it burns when you P” and other statistics rants
Besides being a freelance data scientist and visualization expert, I teach. One of the toughest concepts to teach and to visualize is the odds ratio. Today, I stumbled upon a very interesting post that deals exactly with that.
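As a quick reminder of what the odds ratio is, here is a minimal sketch for a 2×2 table (exposed/unexposed versus outcome/no outcome). The counts in the example are made up for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:

                  outcome   no outcome
    exposed          a          b
    unexposed        c          d
    """
    odds_exposed = Fraction(a, b)    # odds of the outcome among the exposed
    odds_unexposed = Fraction(c, d)  # odds of the outcome among the unexposed
    return odds_exposed / odds_unexposed  # equivalent to (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome
print(float(odds_ratio(20, 80, 10, 90)))  # 2.25
```

In the same made-up example, the risk ratio is 0.20 / 0.10 = 2.0, while the odds ratio is 2.25. Such a gap is one reason the odds ratio is easy to misread as "twice as likely."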
Did you know that J.K. Rowling, the author of Harry Potter, submitted her book 13 times before it was accepted? So what?
COVID-19 vs. influenza dataviz. The order is now correct
On a person who falls into the water, or why short-term thinking is a good strategy in times of crisis
One day or another, we will all need to act very fast. This means that we need to be prepared, have plan Bs, work on resilience, and maybe perform emergency drills.
True, the colors that IBM used in their guide are neat, but data visualization that distorts information is not visualization; it is a piece of garbage. I assume that IBM produces decent computers, but don’t learn data visualization from them.
Originally posted on Boris Gorelik:
In many cases, attempts to set a deadline for a data science project result in a complete fiasco. Why is that? Why can managers make a reasonable estimate of the completion time in many software projects, but not in most data science projects? The key points to answer this…
NDR is a family of machine learning/data science conferences. Their next conference will be held online on May 28, and the agenda looks great. Now, I’m not super objective here, because I’m presenting at the NDR July event. But look at the topics: what an impressive selection!
Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog
OK, so Stephen Wolfram (a mega celebrity in the computational intelligence world and, among other things, a physicist) claims that he may have found a path to the Fundamental Theory of Physics. The blog post is long, and I hope to be able to finish reading it in a week or two. The accompanying technical … Continue reading Finally We May Have a Path to the Fundamental Theory of Physics… and It’s Beautiful — Stephen Wolfram Blog
The quintessence of data visualization usefulness. These graphs are SOOOO good and convincing.
Never Split the Difference. A negotiation book that you might want to read. A book review.
Today, Israel marks Holocaust Day. Many words have been written about the Holocaust, and I want to write about missing graves. If you visit a Jewish cemetery, you might see a lot of gravestones with additional memorial plates. I took this picture in the Chișinău (Kishinev) Jewish cemetery. Burial of the deceased is considered the final … Continue reading The missing graves
Constance Crozier (@clcrozier) shared an interesting simulation in which she tried to fit a sigmoid curve (S-curve) to predict a plateau in a time series. It took me a while to find the reference to a paper that explains why.
My colleague, Simon Ouderkirk, recorded a REALLY interesting interview with Stephen Levin of Zapier and Emilie Schario of GitLab on organizing a data org in a company, job titles, career ladders, and other important stuff.
If there is only one document you can read about data visualization, this is the one
I wrote about data giraffes two weeks ago. Usually, “data giraffes” are a problem, and we need to work hard to solve it. Sometimes, they are a useful feature. Take a look at this NYT front page that shows the number of new unemployment applications in the United States over time. And … Continue reading Data giraffe is sometimes a feature, not a problem
My job wasn’t affected by the COVID madness in almost any way. I worked from home before, and I work from home now; none of my customers canceled any projects; the health system in Israel is still functioning; all of my relatives are in good health. Everything is just fine! I know how … Continue reading Everything is NOT just fine (repost)
More than two years ago, I took a look at Google Trends for three phrases “start a blog”, “create a blog”, and “create a site”. I was surprised by the high volume of blog searches, compared to “create a site”. Today, I decided to go back to Google Trends and to add the new rising … Continue reading Blogging isn’t what it used to be. Podcasting is on the rise
A super-important read on the COVID-19 situation. I’m finally convinced
Data scientist? Thinking of working in a distributed company? The team at Automattic in which I used to work is looking for a Machine Learning specialist. It’s an awesome team. Give it a try https://automattic.com/work-with-us/machine-learning-engineer/
Make the personal meeting personal, even if it’s remote.
An interesting solution to the data giraffe problem
COVID-19 vs. influenza dataviz (an update)
Here’s another email that I got with a question about switching to a data science career
I suppose you know that THE software development Q&A site has its own job board. I suspected that the Corona pandemic would lead to a sharp decrease in the number of job postings on that board. I scraped the data, and it looks like, for now, there are no drastic changes in the number of postings published in the last couple of days.
The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database
I am glad and proud to announce that a paper that I helped prepare and publish is available on the Nature group’s site. The paper, The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database, by Einat Gorelik et al. (including myself), analyzes the data in the FDA Adverse … Continue reading The cardiovascular safety of antiobesity drugs—analysis of signals in the FDA Adverse Event Report System Database
Please leave a comment on this post. It doesn’t matter what; it can be a simple “Hi” or an interesting link. It doesn’t matter when or where you see it. I want to see how many real people are actually reading this blog.
Before becoming a freelance data scientist, I worked in a distributed company. Remote communication, including remote presentations, was the norm for me long before the remote work experiment no one asked for. In this post, I share some tips for delivering better presentations remotely. Stand up! Usually, we stand up when we present … Continue reading Tips for making remote presentations
Originally posted on Boris Gorelik:
A pie chart as a suitable alternative to a bar chart. Throughout my professional life, I have heard a lot of criticism of pie charts. The reason is that it is very easy to create horrors with these charts. It didn’t help that, for a very long time, the default pie chart settings in all the major visualization tools produced completely distorted charts. Advocates of boycotting pie charts propose the graph…
“One idea per slide” means one idea per slide. The simplest way to enforce this rule is to devote one slide to each sentence. Remember, adding slides is free; the audience’s attention is not.
Graph code: here.
As a data science freelancer and a long-time fan of AnnMaria, I HAVE to repost her latest post on consulting success
People ask me for a good introductory video on data visualization. I usually suggest that they look for one of my lectures. To save you the search, here’s one of the most relevant talks that I gave. This lecture was part of the 2018 EuroSciPy conference, where I also ran a workshop.
Career advice. A clinical pharmacist, epidemiologist, and a Ph.D. student wants to become a data scientist.
From time to time, I get emails from people who seek advice on their career paths. This time, I got an email from a clinical pharmacist and Ph.D. student.
As a freelance data scientist, I get to talk to people about proposals that don’t materialize into projects.
I can’t elaborate yet, but in case you wondered what scientific satisfaction looks like, here’s a perfect illustration. Stay tuned
Gilad Almosnino is an internationalization expert. I’m reading his post “Eight emojis that will create a more inclusive experience for Middle Eastern markets,” in which he mentions “Turkish or Arabic Coffee,” which reminded me of my last visit to Athens. When, in one restaurant, I asked for a Turkish coffee, the waiter looked at me harshly and … Continue reading Which coffee is this?
Do you believe in telepathy? Yesterday, I submitted the final proofs of a paper in which I actively participated. During the proofreading, I noticed that our abstract ends with “further research is needed” and scratched my head. I submitted the proofs, and then I saw this pearl in my blog feed.
TL;DR: shallow and disappointing. The Great Mental Models by Shane Parrish was highly praised by Automattic’s CEO, Matt Mullenweg. Since I appreciate Matt’s opinion a lot, I decided to buy the book. I read it and was disappointed. This book is very ambitious, yet shallow and unengaging. If you consider reading a book on … Continue reading Book review: Great mental models by Shane Parrish
Which data scientist can refuse more computing power? None. My collection of computing devices has a new addition: a Soviet arithmometer, the Felix M.
TicToc — a flexible and straightforward stopwatch library for Python.
Why it is OK to have a loud argument with your co-workers.
The difference between python decorators and inheritance that cost me three hours of hair-pulling
In playing cards, the Queen is worth less than the King? Is it time for a change? #gender-equality
Originally posted on Akshay Budhkar:
Introduction: I was fascinated by Zipf’s Law when I came across it in a Vsauce video. It is an empirical law that states that the frequency of occurrence of a word in a large text corpus is inversely proportional to its rank in its frequency table. The frequency distribution…
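The rank-frequency relationship can be checked with a few lines of Python. This is a minimal sketch, not the original author’s code: it assumes a naive lowercase word tokenizer and the simplest form of Zipf’s law, in which the word of rank r has a count of roughly count(1) / r. The helper names are hypothetical.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()  # ties keep first-seen order
    return [(rank, word, count) for rank, (word, count) in enumerate(counts, start=1)]


def zipf_prediction(top_count, rank):
    """Simplest form of Zipf's law: frequency is proportional to 1 / rank."""
    return top_count / rank


if __name__ == "__main__":
    text = "the cat and the dog and the bird"
    table = rank_frequency(text)
    top_count = table[0][2]
    for rank, word, count in table:
        print(rank, word, count, round(zipf_prediction(top_count, rank), 2))
```

A toy sentence like this one is far too small for the law to hold; the comparison becomes meaningful only on a large corpus, where plotting log(count) against log(rank) should give a roughly straight line with a slope near -1.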
A great piece of advice from an experienced freelance consultant
“Replay” by Ken Grimwood is an excellent piece of fiction. Here’s my review
From time to time, we need to look at the distribution of a group of values. Histograms are, I think, the most popular way to visualize distributions. “Back in the old days,” when we did most of our work in the console, and when creating a plot from Python required too many boilerplate code lines, … Continue reading ASCII histograms are quick, easy to use and to implement
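Here is what such a console histogram can look like in pure Python. It is a minimal sketch, not the implementation from the post: it assumes equal-width bins, bars scaled so that the tallest bin spans `width` characters, and a non-empty input. The function name and parameters are hypothetical.

```python
def ascii_histogram(values, n_bins=10, width=40, char="#"):
    """Return a text histogram of `values` (non-empty) with `n_bins` equal-width bins."""
    values = list(values)
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    counts = [0] * n_bins
    for v in values:
        # Map each value to a bin index; the maximum value goes into the last bin
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + i * span / n_bins
        bar = char * round(count / peak * width)  # scale bars to the tallest bin
        lines.append(f"{left_edge:10.3f} | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    data = [rng.gauss(0, 1) for _ in range(1000)]
    print(ascii_histogram(data))
```

Each line shows the left edge of a bin, a bar, and the raw count. Printing the count next to the bar keeps the exact numbers visible, which a scaled bar alone would hide.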
Some people, in the face of important changes, visit the tombs of the righteous for a blessing. I went to see WEIZAC, Israel’s first computer (and one of the first in the world), which was built in 1955.
I got a dream job at one of the biggest distributed companies in the world, almost by chance. It was an excellent experience, but it’s time for a change.
If you read my shortish post about staying employable as a data scientist, you might like a longer post by a colleague, Yanir Seroussi. In his post, Yanir lists four possible paths for a data scientist. To his list, I add two other options.
I received an email from a pharmacist who is considering becoming a data scientist. Since this is not the first (or the last) such email that I have received, I think others will find this message exchange interesting.
On November 7, 2016, I started an experiment in personal productivity. I decided to use a notebook for thirty days to manage all of my tasks. The thirty days ended more than three years ago, and I still use notebooks to manage myself.
Don’t we all like a good contradiction? On gut feelings.
Is data science immune to becoming obsolete? I claim it is not. As time passes, tools become stronger, smarter, and faster. To stay relevant, we need to be in constant motion.
Yes, ML transparency opens opportunities for hacking and abuse. However, this is EXACTLY why such openness is needed. Removing transparency will not make hacking attempts disappear; it will only make them harder to defend against.
Last year, I spoke at NDR Iasi. I enjoyed it so much that when Vlad Iliescu, one of the NDR organizers, asked me to present at NDR Bucharest in June, I didn’t think twice.
TL;DR: a nice popular science book that covers many aspects of modern science. A Short History of Nearly Everything by Bill Bryson is a popular science book. I didn’t learn anything fundamental from this book, but it was worth reading. I was particularly impressed by the intrigues, lies, and manipulations behind so many … Continue reading Book review. A Short History of Nearly Everything by Bill Bryson
Yesterday, a new episode was published in the Popcorn podcast, where the host, Lior Frenkel, interviewed me. Everyone who knows me knows how much I love talking about myself and what I do. I definitely used this opportunity to talk about the world of data. Some people who listened to this episode told me that … Continue reading Cow shit, virtual patient, big data, and the future of the human species
Data visualization as an engineering task – a methodological approach towards creating effective data visualization
Combining getting things done with a tangible Kanban method
Knowledge graphs and NLP — a conference summary
What are some data science tools with a graphical user interface?
On differences in communication styles when working in a distributed company.