Risk is an inherent aspect of the pursuit of reward and few tasks involve one without the other. We encounter risk in everyday project activities and though most developers can say what it is they are working on (e.g., user stories) or when their work is complete (e.g., acceptance criteria, Sprint definition of done) few can articulate clearly how their work contributes towards the managing of risk. Agile risk management is about the identification, analysis, treatment and monitoring of risks in a manner consistent with the principles and practices of the agile manifesto. Central to this is the balance of risk and reward.
Consider a project that you have recently been involved with and try to identify the risks that you encountered. Perhaps you recall that the customer was not really clear about what they actually wanted, that implementing a specific feature was technically very difficult or that the time schedule was tight. When we start to think about risks, we tend to worry mostly about what might go wrong and usually end up expressing risks in terms of their effects on the project. Agile risk management helps us to identify the source of risk in our projects and encourages us to see risk in both positive and negative terms.
The following picture describes an approach to agile risk management that suits Scrum teams using a Kanban board approach (also known as a Scrum-ban) in their work. We will return to each section of this diagram later during our discussion of agile risk management.
What is Risk?
The most succinct definition of project risk is "uncertainty that matters" (I particularly like this definition, which I attribute to Hillson, because it is both short and true!). In the project context, this refers to any uncertainty that has a positive or negative impact on one or more project tasks or objectives. Let me put this another way using an analogy: whilst the outcome of a horse race may well be uncertain, it only becomes a risk when you place a bet. As soon as you do this, the objective (of making money) is affected by the uncertainty which can result in you having more (positive risk) or less (negative risk) money after the race. Perhaps you are not used to hearing about "positive risk" or maybe you prefer to use phrases such as "upside risk" or "opportunity". Wherever your linguistic preferences may lie, the point to take away is that risk is not only about what might go wrong and that being in a position to mitigate (negative) risk or exploit (positive) risk is a source of competitive advantage that a risk-embracing agile stance offers.
Identifying risks is harder than you might imagine. The biggest problem is conflating uncertainties and effects. Let’s take an example: suppose that a website is going to be migrated from a physical to a virtual server. Take a moment to think about what the risks might be. Did the thought of the website not being available just pop into your head? It might surprise you to learn that this is not a risk but rather an effect of the migration being unsuccessful. To understand this you must ask the question: why might the website no longer be available? If you hear from the team that they are not sure how to configure the DNS or not certain if the virtual server has the same configuration as the physical server, then you are on the right track since you are encountering statements that express uncertainty. Thus a "what" describes an effect and a "why" describes an uncertainty or risk.
Of the numerous techniques for identifying risks, the one that I have found to work best for agile teams is precisely this what/why approach. During a risk workshop (which usually takes place during Sprint planning) ask the team members to brainstorm and write down "what" might happen. Once this list is complete and has been reviewed (e.g., elimination of duplicates), it is time to ask why each "what" might occur. A common approach here is to write down each "what" as the title on a separate blank page and then pass the pages around in a round robin, inviting everyone to contribute their "whys". You may prefer other approaches, such as writing the "whats" across the top of a whiteboard and asking everyone to contribute their "whys" under each "what". In the example above, the "what" item "website not available" might have the following "whys": "DNS might not be configured correctly" or "virtual server configuration might differ from that of the physical server". One final note: be careful not to frame the "what" question negatively (i.e., what might go wrong), as everyone in the team needs to be open to the possibility that there might be opportunities ripe for exploitation in the project.
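To make the distinction between effects and uncertainties concrete, here is a minimal sketch of how the output of such a workshop could be recorded. The dictionary layout and the function name `to_risks` are my own assumptions for illustration, not a prescribed format: each "what" page becomes a key and the "whys" written on it become its values.

```python
# Each "what" (an effect) maps to the "whys" (uncertainties) the team wrote on that page.
# The structure and names below are hypothetical, chosen only for illustration.
workshop = {
    "website not available": [
        "DNS might not be configured correctly",
        "virtual server configuration might differ from that of the physical server",
        "DNS might not be configured correctly",  # same why contributed twice
    ],
}


def to_risks(pages):
    """Flatten what/why pages into (uncertainty, effect) pairs, dropping duplicates."""
    seen = set()
    risks = []
    for what, whys in pages.items():
        for why in whys:
            key = (why.strip().lower(), what)
            if key not in seen:
                seen.add(key)
                risks.append((why, what))
    return risks


for uncertainty, effect in to_risks(workshop):
    print(f"Risk: {uncertainty} -> Effect: {effect}")
```

The point of the flattening step is that the entries which end up in the risk log are the "whys"; the "what" is kept only as the effect each risk might have.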
Risk Analysis, Prioritization and Treatment
Once identified, risks should be logged and analyzed further to assess their likelihood and impact (together these are known as risk exposure) – generally in each case T-shirt sizing (i.e., small, medium and large) suffices. First we assess the risk as we first encounter it (this is called inherent risk) and later we re-assess the risk in the light of what we intend to do about it (this is called residual risk). Risk assessment is difficult at the best of times, so it is worth asking why we even bother. There are two reasons: first, since not all risks are equal, we need a simple means of prioritizing them; second, we need to understand exposure in order to determine how to treat risk (more on that later!). Another detail associated with risk assessment is the ability to score risks, which will become important when we come to risk monitoring. It is important, though, to understand the limitations of risk assessment techniques such as asking people (e.g., hidden agendas, confirmation bias), using past data (e.g., might not be indicative of future trends) or probability models (e.g., hidden assumptions), so whenever an assessment is made, it should be challenged.
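As a rough illustration of how T-shirt-sized assessments can drive prioritization, consider the sketch below. Two things in it are assumptions on my part rather than fixed rules: the sizes are mapped to the numbers 1 to 3, and exposure is taken as likelihood multiplied by impact. The sizes assigned to the example risks are likewise invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Size(IntEnum):
    """T-shirt sizes, used for both likelihood and impact."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


@dataclass
class Assessment:
    likelihood: Size
    impact: Size

    @property
    def exposure(self) -> int:
        # Assumed combination rule: likelihood x impact, giving a value from 1 to 9.
        return int(self.likelihood) * int(self.impact)


@dataclass
class Risk:
    description: str
    inherent: Assessment                    # assessed as first encountered
    residual: Optional[Assessment] = None   # re-assessed once a treatment is planned


def prioritize(risks):
    """Order risks from highest to lowest inherent exposure."""
    return sorted(risks, key=lambda r: r.inherent.exposure, reverse=True)


risks = [
    Risk("DNS might not be configured correctly",
         Assessment(Size.MEDIUM, Size.LARGE)),
    Risk("Virtual server configuration might differ from the physical server",
         Assessment(Size.LARGE, Size.LARGE)),
    Risk("Time schedule is tight",
         Assessment(Size.SMALL, Size.MEDIUM)),
]

# Hypothetical re-assessment after deciding to rehearse the DNS change beforehand.
risks[0].residual = Assessment(Size.SMALL, Size.LARGE)

for risk in prioritize(risks):
    print(risk.inherent.exposure, risk.description)
```

Whatever combination rule a team settles on, the numbers are only a discussion aid; as noted above, every assessment should be challenged.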
Remember that we said that risk management is about the balance of risk and reward? Well, when we look at user stories on the backlog (which represent value), we must be able to assess their associated risk. It is a common misconception that high risk must imply high reward. In fact, what we should really be asking ourselves is whether or not the reward implied by a story or task warrants the level of risk it entails. Or in other words: can we not achieve the same level of reward but for less risk? This helps us to better prioritize user stories on the backlog as risk becomes an influencing factor in prioritization though it is never the primary determinant of ordering on the backlog.
When deciding what to do about risks it is common to select one of the following risk response strategies (whichever one you choose should be recorded in the risk log alongside the other risk attributes):
- Accept: Undertake no action to manage the risk, but instead have a contingency plan in place in the event that it should occur (e.g., do not implement code to protect against data loss but instead rely on backups).
- Exploit/Reduce: Enact measures to increase/decrease either the likelihood or the impact of the risk (e.g., implement an algorithm capable of processing a higher number of transactions per minute).
- Share/Transfer: Endeavour to share/transfer the risk to other parties in exchange for a share in the rewards or fee for assuming the risks (e.g., outsourcing of specific tasks to specialists).
- Avoid: Refrain from taking part in the task that gave rise to the risk (e.g., dropping a task entirely).
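The strategies above could be recorded in a risk log along the following lines. This is only a sketch: the class and field names are hypothetical, and the consistency check rests on my reading that exploit and share apply to positive risks while reduce and transfer apply to negative ones, with accept and avoid assumed to be open to either.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    ACCEPT = "accept"
    EXPLOIT = "exploit"    # increase likelihood or impact of a positive risk
    REDUCE = "reduce"      # decrease likelihood or impact of a negative risk
    SHARE = "share"        # share a positive risk in exchange for part of the reward
    TRANSFER = "transfer"  # pay another party a fee to assume a negative risk
    AVOID = "avoid"


# Assumption: these pairings follow from the descriptions of the strategies above.
OPPORTUNITY_ONLY = {Strategy.EXPLOIT, Strategy.SHARE}
THREAT_ONLY = {Strategy.REDUCE, Strategy.TRANSFER}


@dataclass
class RiskLogEntry:
    uncertainty: str
    is_opportunity: bool
    strategy: Strategy
    rationale: str = ""

    def __post_init__(self):
        # Reject a strategy that does not match the sign of the risk.
        mismatched = THREAT_ONLY if self.is_opportunity else OPPORTUNITY_ONLY
        if self.strategy in mismatched:
            kind = "an opportunity" if self.is_opportunity else "a threat"
            raise ValueError(f"{self.strategy.value!r} does not fit {kind}")


entry = RiskLogEntry(
    uncertainty="Code might lose data under load",
    is_opportunity=False,
    strategy=Strategy.ACCEPT,
    rationale="Rely on backups as the contingency plan",
)
print(entry.strategy.value, "-", entry.rationale)
```

Recording a short rationale next to the strategy also gives the team something concrete to challenge later, which helps when the choice was made instinctively.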
One of the biggest difficulties in choosing a strategy is that we often choose instinctively first and rationalize later. In fact, studies have shown that this is the area of risk management that is most susceptible to social and cultural influence. We can, however, use risk exposure to help to select an appropriate strategy using the following chart.
Since we are sometimes not quite sure how to assess a risk, it is fine to accept a range (e.g., the blue bar in the above diagram) and use this as the basis for discussion. Remember, the point of risk assessment is to prioritize risks (thereby contributing towards backlog prioritization) and to determine how to go about managing them.
Once we understand how we want to tackle a risk, the next step is to determine what this means in practical terms, i.e., how we treat the risk. The following are our options:
- Do nothing (but plan): We accept that the risk might occur and think about what we would have to do if it actually occurred. This becomes an optional task on our Kanban which might never be needed.
- Risk Task: Create a task that deals with the risk (e.g., exploit it, reduce it, share it or transfer it). These tasks are just like any other task in our Sprint planning, though I recommend that they be colour coded (e.g., red for reduction, green for exploitation) so that we can later visualize the distribution of risk in our Kanban.
- Risk Tag: This refers to the selection of an agile technique specifically chosen to cater for a risk (e.g., pair programming) that is applied to a class of activities (e.g., all GUI related tasks). We then look over all of our tasks "tagging" them with a reminder to engage in that technique when we come to doing the task.
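The sketch below shows how risk tasks and risk tags might look on a simple board model. The `Task` structure, the category names and the example tasks are all assumptions made for illustration; only the colour convention (red for reduction, green for exploitation) and the pair programming example come from the text above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    category: str = ""
    colour: str = ""  # "red" = reduction, "green" = exploitation, "" = ordinary task
    tags: set = field(default_factory=set)


def apply_risk_tag(tasks, category, technique):
    """Tag every task in the given category with a technique; return how many were tagged."""
    tagged = 0
    for task in tasks:
        if task.category == category:
            task.tags.add(technique)
            tagged += 1
    return tagged


def colour_distribution(tasks):
    """Count the colour-coded risk tasks to see how risk is spread over the board."""
    return Counter(task.colour for task in tasks if task.colour)


board = [
    Task("Build login form", category="GUI"),
    Task("Build settings page", category="GUI"),
    Task("Migrate DNS records", category="infrastructure"),
    Task("Rehearse DNS switch-over on a test domain",
         category="infrastructure", colour="red"),
    Task("Benchmark a faster transaction algorithm",
         category="backend", colour="green"),
]

# Risk tag: remind the team to pair on all GUI-related tasks.
apply_risk_tag(board, "GUI", "pair programming")

for task in board:
    print(task.colour or "-", task.title, sorted(task.tags))
print(dict(colour_distribution(board)))
```

Note that a risk tag does not add work items to the board; it only annotates existing tasks, whereas a risk task is a work item in its own right.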
Risk monitoring provides us with feedback on how we are managing risk in a project and reminds us of the fact that we can never entirely eliminate it.
Risk monitoring requires that we have already assigned scores to our risk assessments both before (i.e., inherent) and after (i.e., residual) risk treatment. For example, take a look at the "risk rainbow" diagram where we mapped risk exposures to risk strategies. We might assign a score of two to the inner circle, four to the middle and six to the outer one. Once we have assessed both the inherent and the residual risks (e.g., a score of six that reduces to a score of two after the risk treatment has been applied) then it is the difference between these two (i.e., four) that we deduct from our total risk score once the task is complete. In this way, we have the basis for a risk burndown chart that measures our progress towards managing risk.
Two things are important to bear in mind. The first is that we can never entirely eliminate risk (e.g., there will always be residual risk, risk arising from accept strategies) so our risk burndown will have a systemic level of risk for the Sprint. The second point is that sometimes new risks arise from the way we treat risk (these are referred to as secondary risks) which might cause the risk burndown to rise during the Sprint.
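The arithmetic behind a risk burndown can be sketched as follows, using the scores of two, four and six mentioned above. The data structure, the day numbering and the example risks are assumptions for illustration. Each risk counts with its inherent score until its treatment is complete and with its residual score afterwards; a secondary risk simply enters the total on the day it is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredRisk:
    name: str
    inherent: int               # score before treatment (2, 4 or 6)
    residual: int               # score after treatment
    raised_on: int = 0          # Sprint day on which the risk was identified
    treated_on: Optional[int] = None  # day the treatment was completed, if ever


def burndown(risks, sprint_days):
    """Return the total risk score for each day, with day 0 as the start of the Sprint."""
    scores = []
    for day in range(sprint_days + 1):
        total = 0
        for risk in risks:
            if risk.raised_on > day:
                continue  # not yet identified
            if risk.treated_on is not None and risk.treated_on <= day:
                total += risk.residual
            else:
                total += risk.inherent
        scores.append(total)
    return scores


sprint_risks = [
    ScoredRisk("DNS configuration uncertain", 6, 2, treated_on=2),
    ScoredRisk("Virtual server configuration may differ", 4, 2, treated_on=4),
    ScoredRisk("Data loss covered by backups only (accepted)", 2, 2),
    # Secondary risk introduced by the DNS treatment, raised on day 3.
    ScoredRisk("New DNS tooling is unfamiliar", 4, 2, raised_on=3),
]

print(burndown(sprint_risks, sprint_days=5))  # [12, 12, 8, 12, 10, 10]
```

In this example the total drops by four on day 2 (six reduced to two), rises again on day 3 when the secondary risk appears, and never falls below the residual and accepted risk that remains.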
I hope this article has given you an idea of how risk management can be done in Scrum projects. There are many subtleties to risk management which we have not covered here in detail, but you can find more information at our website. We welcome your involvement in research and related activities.
The Institute for Agile Risk Management (IARM) is a Swiss-based institution that exists to promote the principles and practices of agile risk management in the context of agile project management and the agile enterprise. We are primarily engaged in research and training activities and operate chiefly through our network of third parties in the agile and academic communities as well as in the private sector.