Bootstrapping the right way?

Many years ago, I terribly overfit a model which caused losses of a lot of shekels (a LOT). It’s not that I wasn’t aware of the potential overfitting. I was. Among other things, I used several bootstrapping simulations. It turns out that I applied the bootstrapping in a wrong way. My particular problem was that I “forgot” about confounding parameters and that I “forgot” that peeping into the future is a bad thing.

Anyhow, Yanir Seroussi, my coworker data scientist, gave a very good talk on bootstrapping.

Yanir Seroussi

Bootstrapping the right way is a talk I gave earlier this year at the YOW! Data conference in Sydney. You can now watch the video of the talk and have a look through the slides. The content of the talk is similar to a post I published on bootstrapping pitfalls, with some additional simulations.

The main takeaways shared in the talk are:

  • Don’t compare single-sample confidence intervals by eye
  • Use enough resamples (15K?)
  • Use a solid bootstrapping package (e.g., Python ARCH)
  • Use the right bootstrap for the job
  • Consider going parametric Bayesian
  • Test all the things

Testing all the things typically requires writing code, which I did for the talk. You can browse through it in this notebook. The most interesting findings from my tests are summarised by the following figure.

Revenue confidence intervals

The figure shows how the accuracy of confidence interval estimation varies by algorithm, sample size…

View original post 405 more words

One thought on “Bootstrapping the right way?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s