Why is it (almost) impossible to set deadlines for data science projects?

I wrote this post in 2017. For some reason, it has started gaining traffic in the last two weeks. I reviewed it and couldn’t find any new insights to add, but maybe you can help me.

Boris Gorelik

In many cases, attempts to set a deadline for a data science project end in a complete fiasco. Why is that? Why can managers of many software projects make a reasonable estimate of the completion time, while managers of most data science projects can’t? The answer lies in two factors: complexity and, to a greater extent, missing information. By “complexity” I don’t (only) mean computational complexity. By “missing information” I don’t mean dirty data. Let us look at these two factors one by one.

Complexity

Illustration: the famous xkcd comic in which two programmers play while their code compiles
Think of this: why do most properly built bridges remain functional for decades, and sometimes for centuries, while the rule in every non-trivial program is that “there is always another bug”? I read this analogy in a post Joel Spolsky wrote in 2001. The answer Joel provides is:

Once you’ve written a subroutine, you can call it as often as you…



By Boris Gorelik

Machine learning, data science and visualization http://gorelik.net.
