The Sydney Opera House took 10 years longer and $100 million more to complete than initially planned... Oops!
Client 😎: “How much time do you need to build an API for this new model?”
You 😰: “Uhmmm, two days? Two weeks? Let me get back to you on that.”
Yeah. We’ve all been there. Whether it’s a client, our boss or the team’s scrum master asking, we all face the question of how long a task will take.
Serving a machine learning model, fixing a data pipeline or building the Sydney Opera House: all these tasks require planning. However, we often find ourselves overshooting deadlines and overspending budgets, especially those we set ourselves. Why is that?
Setting deadlines only to overshoot them by weeks, under-budgeting the project proposal, overestimating the ease of the road ahead: this familiar phenomenon is called the ‘planning fallacy’.
Daniel Kahneman and Amos Tversky first coined the term when researching biases in judgement for the U.S. Department of Defense back in 1977. More recently, Kahneman described the symptoms of the planning fallacy as ‘plans and forecasts that are unrealistically close to best-case scenarios’.
I’m going to show you how the planning fallacy affects me and how I use Danish planning expert Bent Flyvbjerg’s method to avoid it.
In his book Thinking, Fast and Slow, Kahneman explains how the planning fallacy results from the ‘inside’ and ‘outside’ views being so different from each other.
The ‘inside view’ is based on the current facts at hand in your project. This includes the number of active team members and their seniority, the tasks the team has successfully completed and, of course, what the team plans to do next. To keep my example about the model API going: my inside view was that we had both worked with endpoints before and that the hard work of training the models was already done; all that was left was to serve them. Easy, right?
People have a habit of extrapolating the duration of tasks yet to come based on those already completed. In my case, training a model seemed like the hard part of a project, so surely serving that same model should be a much shorter task?
The problem with using your ‘inside view’ to estimate future tasks is that it blinds you to Donald Rumsfeld’s ‘unknown unknowns’.
These are unpredictable events that are not accounted for during the planning phase and can seriously delay your work in the execution phase.
These ‘black swans’, as Nassim Nicholas Taleb likes to call them, often stretch the time required to complete a task. In our case, we had never before deployed API endpoints on a custom provisioned server instead of the cloud. Suddenly our previous experience mattered a lot less.
Now let’s look at Flyvbjerg’s three steps for leveraging the ‘outside view’ when planning a project.
The first step is to make sure you know exactly *what* you are planning. According to Kahneman’s attribute substitution theory, we like to substitute difficult questions with easier ones.
Oftentimes when working on a machine learning problem, your task can involve different tools, methods and protocols. When we were planning on serving models through an API, we could have chosen ‘APIs’, ‘ML model serving’ or even ‘ML in production’ as the reference class. In our case the client specifically asked for a web-based API that could be accessed via endpoints on a server. Pretty straightforward: our reference class was ‘APIs deployed on a private server’.
If the reference class is not as clear-cut, you can try to focus on the tasks that define your project. In our case it was important to use a specific tool (APIs) and deploy them on a specific architecture (private server). What if the client had asked for the endpoints to only use a GET method? Well, that would have been an additional part of the reference class.
So now you know exactly what it is you’re estimating: time to gather some data! Uncertainty about task completion times can stem, among other things, from a lack of prior experience and from unpredictable events. Getting a base rate lets you compensate for your lack of experience and stay aware of possible unforeseen events.
First, decide on some metrics that can define the reference class. An example could be as simple as task duration, or as complicated as number of deployments to a development environment. The goal here is to choose a statistic that will give you an indication of how these tasks usually go.
For our project I decided to go with ‘number of weeks an ML engineer needs to deploy a model API’. Now there are plenty of details I’m leaving out, like seniority and experience of the engineer, but we’ll get to that in the next step.
Once you have a metric, you can start collecting data. Flyvbjerg built himself a nice database of construction projects (the reference class) and their cost overruns (the metric) to use as a reference. Of course, no-one systematically scours the web for cold hard facts about project management statistics in IT and software dev… or at least most people don’t. A useful way of collecting data is to ask around among your colleagues or read blogs online about implementing ML projects.
In our case, the average time to deploy API endpoints was one working week. Some estimates that made up that average included a full suite of tests and AWS integration; other user stories had a team of three updating existing endpoints in a larger codebase. All of these estimates were useful in building a base rate.
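To make that concrete, here is a minimal sketch of the base-rate calculation; the observed durations below are hypothetical stand-ins for whatever numbers your colleagues and blog posts actually give you:

```python
# Durations (in working weeks) gathered for the reference class
# 'APIs deployed on a private server'. Illustrative values only.
observations = [0.6, 1.0, 0.8, 1.4, 1.2]

# The base rate is simply the average of the observed durations.
base_rate = sum(observations) / len(observations)
print(f"Base rate: {base_rate:.1f} weeks")  # → Base rate: 1.0 weeks
```

A median can be a sturdier choice than the mean if one outlier project dragged on forever, but with a handful of data points either works as a starting point.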
A base rate is nice, but as we saw earlier, not all projects are the same. Some may have had an experienced senior grinding out code like there was no tomorrow, while others could have been slowed down by corporate bureaucracy.
Take a look at your own situation and find specific information that can make your base rate vary. In other words, adjust that average metric based on the unique circumstances of the task at hand. This adjustment can go either way. For example, most people I talked to and lots of the blogs I read online mentioned some kind of cloud infrastructure that already took care of the back-end networking. In our case, we were working on a private server with a layer of Docker containers on top, so we had to adjust the average up slightly.
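One rough way to keep yourself honest here is to write each circumstance down as an explicit multiplier on the base rate. The factors and reasons below are illustrative assumptions, not numbers from any real project:

```python
# Base rate for the reference class, in working weeks.
base_rate = 1.0

# Multiplicative adjustments for ways our situation differs from the
# reference class. Values are illustrative guesses, not measured data.
adjustments = {
    "private server instead of managed cloud": 1.3,  # extra networking work
    "models already trained and validated": 0.9,     # less ML work left
}

estimate = base_rate
for reason, factor in adjustments.items():
    estimate *= factor
    print(f"{reason}: x{factor} -> {estimate:.2f} weeks")
```

Listing the multipliers out like this forces you to justify every adjustment, which makes optimistic fudging easier to spot.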
Be careful when making adjustments in your favour, as there is often a tendency to see the future (especially your own) through rose-colored glasses. Whatever you end up with as a final value, it will be a much better and more informed estimate than if you had disregarded the outside view.
That’s it! Seems simple right? The most important part of this method is being honest, both with yourself and with the client or stakeholder.
That means that you can always say ‘I don’t know, let me get back to you on that’, when asked to estimate a task’s duration. We’re all human and no-one expects us to know everything all the time.
The good news is that not only have you estimated your task’s duration, you can also explain the reasons behind it. No more pure gut-based decision making!
A simple and effective method for estimating task durations. Seems like a robust framework, right? Well, the catch is that it only works when it is fed the right information.
That means the better your data for estimating the base rate, the better your final estimate will be. My advice would be to take note of task durations and project circumstances, either your own or those of colleagues, as they happen. Build your very own database, just like Bent Flyvbjerg.
One way to cover your six, in case you lack confidence in the final estimate, is the good old-fashioned method of adding a buffer. I’ve found that, in general, a buffer of 20% will offer you some breathing room. That means those 5 workdays turn into 6, giving you some space to underpromise and overdeliver.
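The buffer arithmetic itself is trivial; rounding up keeps the padded estimate in whole workdays:

```python
import math

estimated_days = 5   # your adjusted base-rate estimate, in workdays
buffer = 0.20        # 20% safety margin

# Round up so the padded estimate stays in whole workdays.
padded_days = math.ceil(estimated_days * (1 + buffer))
print(padded_days)  # → 6
```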
Define your reference class, get a base rate and adjust for circumstances. That’s it. And if you’re ever in doubt, feel free to add a 20% buffer on top. Just in case.
Hope you enjoyed reading! Smack that follow button if you did 👨🌾
P.S. Leave a comment if you manage to apply this method to estimate some task in your daily grind. Would love to hear your stories.