Tag: statistics
-
Dual axis with shared x-axis — a much better way
I recently came across a financial update with a plot that looks like the one below: a bar plot of revenue with a line plot of growth rate. The growth rate is on a secondary y-axis, and the x-axis is shared between the two plots. This type of plots are very common in financial reports, and they are often used to show the relationship between two variables that have different scales. However, they are also often criticized for being misleading, as the two variables are not directly comparable. In this post, I will show a better way to visualize this type of data. But first, what’s wrong with the plot below?
-
35 (and more) Ways Data Go Bad — Stats With Cats Blog
If you plan working data analysis or processing, read the excellent post in the “stats with cats blog” titled “35 Ways Data Go Bad” post. I did experience each and every one of the 35 problems. However, this list is far from being complete. One should add the comprehensive list of Falsehoods Programmers Believe About Time.
-
"Why it burns when you P" and other statistics rants
Do you sometimes Google for something only to find stuff written by yourself?
I teach a course called “data-based decision making.” While googling for examples of statistics misuse, I stumbled upon an interesting blog post that I wrote about one and a half years ago. -
Against A/B tests
Traditional A/B testsing rests on a fundamentally flawed premise. Most of the time, version A will be better for some subgroups, and version B will be better for others. Choosing either A or B is inherentlyinferior to choosing a targeted mix of A and B.
Michael Kaminsky locallyoptimistic.com
-
Can error correction cause more error? (The answer is yes)
This is an interesting thought experiment. Suppose that you have some appliance that acts in a normally distributed way. For example, a nerf gun. Let’s say now that you aim and fire the gun. What happens if you miss by some amount of X? Should you correct your aim in the opposite direction? My intuition says “yes.” So does the intuition of many other people with whom I talked about this problem. However, when we start thinking about this problem, we realize that the intuition is wrong. Since we aim the gun, our assumption should be that the deviation is zero. A single observation is not sufficient to reject this assumption. By continually adjusting the data generating process based on a single observation, we reduce the precision (increase the dispersion).
Below is a simulation of adjusted and non-adjusted processes (the code is here). The broader spread of the adjusted data (blue line) is evident. -
Mammogram, breast cancer, and manipulative statistics
Here’s a quiz
-
Overfitting reading list
Overfitting is a situation in which a model accurately describes some data but not the phenomenon that generates that data. Overfitting was a huge problem in the good old times, where each data point was expensive, and researchers operated on datasets that could fit a single A4 sheet of paper. Today, with mega- giga- and tera-bytes datasets, overfitting is … still a problem. A very painful one. Following is a short reading list on overfitting.
-
Why deeply caring about the analysis isn't always a good thing?
Does Caring About the Analysis Matter?
-
On statistics and democracy, or why exposing a fraud may mean nothing
“stat” in the word “statistics” means “state”, as in “government/sovereignty”. Statistics was born as a state effort to use data to rule a country. Even today, every country I know has its own statistics authority. For many years, many governments, have been hiding the true statistics from the public, under the assumption that knowledge means power. I was reminded of this after reading Charles Earl’s (my teammate) post “Mathematicians, rock the vote!”, in which he encourages mathematicians to fight gerrymandering. Gerrymandering is a dubious practice in the American voting system, where a regulatory body forms voting districts in such a way that the party that appointed that body has the highest chance to win. Citing Charles:
-
The Monty Hall Problem simulator
A couple of days ago, I told to my oldest daughter about the Monty Hall problem, the famous probability puzzle with a counter-intuitive solution. My daughter didn’t believe me. Even when I told her all about the probabilities, the added information, and the other stuff, she still couldn’t “feel” it. I looked for an online simulator and couldn’t find anything that I liked. So, I decided to create a simulation Jupyter notebook.
-
What are the best practices in planning & interpreting A/B tests?
Compiled by my teammate Yanir Serourssi, the following is a reading list an A/B tests that you should read even if you don’t plan to perform an A/B test anytime soon. The list is Yanir’s. The reviews are mine. Collective intelligence in action :-)