Pain-Killers Can Be Addictive

Dave Nicolette, January 26, 2009

I'm going to break my pattern and just state the point I'm trying to make at the outset instead of sneaking up on it from behind through a series of half-baked analogies: If we "solve" a problem by painting over it rather than digging down to the root cause, we may institutionalize and "lock in" sub-optimal processes and practices, thus making it more difficult to achieve improvements of larger scope in future.

There are consultants who specialize in process improvement who claim they can help organizations achieve 3x to 4x improvement in software development effectiveness in a matter of months. It sounds boastful, but it isn't. I've seen it happen and I've been involved in initiatives like that myself. In fact, it isn't as impressive as it may sound initially. Most IT organizations are so deeply dysfunctional that one can gain 3x to 4x improvement just by picking the low-hanging fruit, as the saying goes. By introducing a few concepts from systems thinking, theory of constraints / lean manufacturing, and agile software development, it's easy to clear away some of the most obvious obstructions to effective delivery. All of that is good news. The bad news is most organizations that undertake this kind of initiative stop as soon as they've got all the low-hanging fruit in hand. The potential exists to gain 10x improvement or more over the original state. I've seen and worked on that level of improvement, as well. Why, then, does improvement tend to stabilize at the 3x to 4x level in most cases?

One explanation might be that people tend to emphasize tactical thinking over strategic thinking. They look for quick gains, and to achieve gains quickly they may destroy the groundwork necessary for larger strategic gains. Previously, they had been armed only with conventional tools and methods. Improvements on the order of 3x to 4x look pretty phenomenal when you've been stuck in the same rut for 20 to 30 years. People are so thrilled to see visible improvements occur that they want to institutionalize the changes as rapidly as possible. The trouble is that some of the improvements may have been accomplished through quick and dirty means: putting band-aids on top of organizational issues rather than solving the underlying problems. Sure, things are better than they used to be...a lot better. Nothing wrong with that. But there's still room for improvement. So, why stop?

I'd like to explore that question by examining elements of software development that we consider "norms," identifying challenges that organizations are having in implementing these norms, seeing what they've done to meet the challenges, and then asking whether they have really solved the root cause of the problem or if they have merely implemented a tactical work-around for the problem. I call the challenges pain and the tactical work-arounds pain-killers. When the work-arounds become institutionalized and permanent, I think of it as a form of addiction. Just as the human body can become acclimated to pain-killers such that one experiences discomfort without the drugs, an organization can become acclimated to tactical work-arounds such that people start to think of the work-arounds as the "right way" to do things. The farther an organization goes down that path, the harder it is for them to get clean.

How can you tell whether an organization might be addicted to pain-killers? A clear sign of addiction is the attitude, sometimes expressed verbally and sometimes not, that "ideally (or in theory or according to the book) [this] should happen, but as a practical matter [that] has to happen instead, because [this] cannot happen in The Real World."


The addiction cycle

I find it useful to apply some terminology from lean manufacturing to software development processes. In particular, I appreciate the concepts of value, the value stream, and the notion of waste as either unavoidable or avoidable. The Japanese word for waste in this context is muda, and in the seminal Lean Thinking book authors Womack and Jones call unavoidable waste Type 1 muda and avoidable waste Type 2 muda. When an organization is stuck with a sub-optimal process they can't eliminate without interfering with value delivery, it's Type 1 muda. What we'd like to do is identify these instances and convert the Type 1 muda into Type 2 so that we can dispense with it.

Let's walk through a couple of obvious examples. Then, if you find the approach useful, you can apply it to other situations. Conversely, if you don't find the approach useful, you can suggest improvements. (Bear in mind that what I consider "norms" are norms in the context of state-of-the-art software development methods. I'm not talking about traditional methods. You can't really achieve 3x improvement in delivery effectiveness using those because they consist mostly of non-value-add overhead activities. Traditional methods are, in fact, the very rut I mentioned earlier, and you can't climb out of a rut by sitting down in the bottom of it and chanting, "There's no place like home! There's no place like home!")

Obvious Example #1: Extended User Stories

Norm: User Stories
The canonical software specification document is the User Story. It comprises a brief description of an atomically-testable chunk of functionality as seen from the perspective of a system actor, and a short list of examples of expected behavior that can be used as acceptance test criteria. Ideally, this is the only functional specification the development team needs to implement the application properly.

Pain: When our teams build software based only on User Stories, they overlook critical information and the results are unacceptable.

Observed symptoms:

  1. Software doesn't support organizational standards, usability guidelines, regulatory or audit requirements, performance requirements, architectural standards, branding guidelines, or other non-functional requirements.
  2. Software doesn't handle boundary conditions or expected exceptions gracefully, or handle them at all.

Mis-diagnosis: User Stories are inherently inadequate as functional specifications.

Pain-killer: Extend the User Story format to include all information relevant to delivering acceptable results.

Tactical benefit: Teams deliver software that meets all requirements, not just those that are stated in the form of User Stories.

Strategic consequences: The development/delivery process includes muda in the form of duplication (same requirements specified multiple times).

This is a situation I've seen many times, and I expect you've seen it just as many times, as well. Because it's so commonplace, I thought it would make a good example to illustrate the approach. I've seen extended User Story formats used in several companies, with names like Agile Use Case, Operation Contract, or Story Narrative. The formatting varies from company to company, but the extended content usually consists of the same general kinds of information: Non-functional requirements; company standards for UI layout, branding, internationalization, accessibility, etc.; compliance with regulatory requirements; adherence to enterprise architectural standards; coding standards...in other words, everything other than the basic functional requirements. It's the URPS+ in FURPS+.

You might argue that the information is necessary to achieve an acceptable result. You'd be right. The problem isn't that the information is superfluous. The problem is that the information is the same for every User Story. Therefore, when it is repeated in multiple documents, the repetition is a form of muda. To be fair, most of the organizations where I've seen people take this approach recognize that repetition is waste. They deal with it by documenting the standard information in a central location and including references to that location in all the extended User Story documents. This is better than repeating all the information, but it's still a form of repetition.

The reason I say it's a mis-diagnosis to fault the User Story for the omissions is that it isn't the purpose of a User Story to capture every requirement of every kind. In these cases, the real problem is that developers tend to assume they're supposed to implement exactly what the User Story tells them to implement, and nothing more. That's a basic misunderstanding of their role in the software development process. Developers are expected to be software engineering professionals. That means we have a right to expect certain things of them: They will learn and apply fundamental software engineering principles; they will routinely test for boundary conditions and expected exceptions; they will seek out and incorporate the organization's standards, guidelines, and non-functional requirements on their own initiative; they will not expect business people to request arcane technical functionality explicitly; etc.

My observation is that most developers who are new to contemporary methods bring with them a certain amount of conceptual baggage from the traditional methods they have practiced over the years. One part of that baggage is the assumption that the "specifications" are, literally, instructions for the developers to follow. With contemporary methods, developers are expected to bring expertise to the table. It shouldn't be necessary to give them instructions on how to write clean, robust, functional code. If that were the case, developers would not be professionals at all, but merely typists. (It's possible that some developers aspire to be typists; in those cases, we should adjust their salaries accordingly.)

The root problem is not addressed by extending the User Story, hoping we don't forget to tell the developers anything important, and then hoping they actually read it all. The root problem is addressed by education, mentoring, and continuous improvement of the technical staff.

A second cause of the problem in some organizations is that the common information about non-functional requirements and enterprise standards is not available in a central, easy-to-find location. Many teams are eager to apply their expertise in a comprehensive way, but they just can't find the information they need, so they guess at the non-functionals. When the process includes so-called Quality Gates (also known as Blame Avoidance Theater), specialized groups from remote parts of the organization suddenly come out of the woodwork for the first time in the life of the project to complain loudly that the team has failed to follow some "well-known" enterprise standard. In those cases, part of the solution is to collect, document, and centralize the information the development teams require to deliver good results.

In a nutshell, then, the information has to be available to developers, and developers have to understand it is their responsibility, not their customers' and not the QA department's, to know what the non-functional requirements are, to write tests for them, and to ensure their applications support them. Once those things are achieved, it will be feasible to dispense with the extended User Story format. It will have been converted from Type 1 (unavoidable waste due to an organizational constraint) to Type 2 (totally unnecessary) muda. Any time we can convert Type 1 muda into Type 2 and eliminate it, it's a strategic gain rather than just a tactical gain.

Obvious Example #2: Mini-waterfall process within the team

Norm: Team of generalizing specialists
Frankly, I thought twice about including this as a "norm." Whenever the idea of generalizing specialists is brought up, someone within earshot takes the opportunity to fly off on a tangent about how indispensable specialists are to the success of any project. They usually beat that poor straw man to a pulp before anyone can get a word in edgewise. It saddens me to witness the pointless, brutal beating of an innocent bystander, even if he's only made of straw. So, let's be very clear about context before we proceed. There are many different types of work that involve IT expertise. Corporate IT organizations perform a wide range of services for the companies they support. One — just one — of those services is application software development. That is the context in which the generalizing specialist is the preferred type of individual for a project team.

Business application development teams don't have to redefine the enterprise technical infrastructure, introduce new shared technical assets, create an enterprise service bus, or craft high-performance custom database management systems. They just have to write business applications that plug into the company's existing technical infrastructure. They don't define enterprise standards, they follow enterprise standards. To that end, a reasonably skilled professional software engineer possesses enough expertise in closely-related areas (database technologies, network technologies, security, UI design, etc.) to carry out the task of building a business application. It isn't necessary (or cost-effective, for that matter) to include all manner of narrow-and-deep experts on every application development team. Application development teams rarely get into a situation that requires arcane Wizard-level expertise in any given technology. Most of the heavyweight stuff is going to be hidden behind some SOA-style interface layer, anyway, if we're talking about enterprise-scale work (and we are).

Within that context, a team of generalizing specialists is preferable to a cross-functional team of individual experts because they need not queue up work and wait for specialists to find a time-slice whenever their expertise is needed. They can just go ahead and do whatever needs to be done at any given time during the project. Stories the developers believe to be complete need not wait in a queue for a testing specialist to vet them. Any other team members can do that work. When the team needs clarification of something from the customer, they need not wait for a business analyst to ask the question for them. They are able to interact effectively with customers themselves. When the team finds a database table needs a new column, they can just add it. They need not submit a formal request for five minutes of a DBA's time for something as trivial as that. And so it goes.

That said, it isn't enough simply to declare oneself a generalizing specialist. One must actually possess some skills above and beyond one's usual area of specialization. That isn't always the case, and thus...

Pain: We tried the generalizing specialist thing, but then tasks that used to be performed properly were being performed sloppily, leading to unacceptable results.

Observed symptoms:

  1. A lot of code that passes the team's internal acceptance tests causes problems in production.
  2. Customer needs and priorities aren't properly understood by the team, and they deliver code the customer can't use and doesn't want.
  3. There's rarely any end user documentation, and what there is of it tends to be techno-babble.
  4. Team members just do the tasks they enjoy and ignore everything else.

Mis-diagnosis: The whole "generalizing specialist" idea is nothing but a crock of doo-doo.

Pain-killer: Form teams of specialists and revert to a batch-and-queue system of hand-offs and work-in-process inventories inside the team.

Tactical benefit: Improvements in the quality of different kinds of tasks the team performs.

Strategic consequences: Locks in certain forms of waste (muda). At a minimum, these include inventory and waiting. They probably also include overproduction, since work items that pile up in front of a specialist must have been produced by another specialist upstream; overprocessing, since each specialist in the series must repeat a bit of the work that was done before in order to understand the status of the incomplete work item; and unused talent, since anyone on the team probably could have done the "specialized" work at the moment it was ready to be done. The net results are increased costs and increased lead times.

When I've seen this situation, it's been for one (or more) of four reasons that seem to come up time and time again:

First, it may simply be that people who are accustomed to performing a certain set of tasks and who self-identify as a certain sort of specialist don't wish to take on other kinds of work. For example, most developers would be perfectly able to communicate effectively with customers about business needs and help them express the requirements of a business application; they just don't enjoy that sort of work. They could write documentation; they did complete high school, after all. They could write test scripts; it's an easier technical task than software engineering. The thing is, they would rather play with technology. So, they just avoid doing anything else. Other specialists may have the same attitude. A DBA may simply prefer to spend the day fine-tuning a high-performance database system than writing unit tests and application code; it isn't that he lacks the skill to perform those tasks. When a team that is supposed to comprise generalizing specialists includes team members with this attitude, the tasks no one is especially interested in performing are either done carelessly or ignored altogether. Hence the pain. The solution in this case is to ensure team members understand what contemporary software development processes demand of IT professionals. It's no longer okay to sit back and just do the few things you enjoy the most. You've got to be willing, as a colleague of mine likes to say, to "pick up a shovel" and do whatever the project needs done.

Second, it may be that team members are willing to try other forms of work, but they don't have a very good handle on how to approach other sorts of tasks. For example, it's a commonplace to hear test specialists assert that developers "cannot" test applications effectively; that they literally have no aptitude for it. I think that's an overstatement, but there is a grain of truth in it. Consider the famous quote from Donald Knuth, written at the end of a 1977 memo to Peter van Emde Boas: "Beware of bugs in the above code; I have only proved it correct, not tried it." Many developers have the same mentality about testing their code: Their goal is to prove the code correct. That's certainly a necessary step in producing an application, but what about testing with the goal of breaking the code? If developers can cultivate the ability to set their brains in that mode, then they can learn to perform the work of testers. Some testing has the purpose of proving the code correct, and some has the purpose of discovering the code's limitations, but the two tasks are not so different that they require two different sub-professions. The solution in this case is usually a combination of training and mentoring. There are some tricks to the testing trade that developers need to learn if they are to do an effective job at that task. It's a shorter learning curve than a tester would have in learning software engineering principles, so it makes sense for developers to pick up testing skills in becoming generalizing specialists. (I have seen testers acquire development skills, too, but it's less common and it's a steeper learning curve for them.)

Third, it may be that some individuals with many years invested in honing professional skills (or at least building paper credentials) in a given area of specialization are simply worried that they will not be able to thrive in a world that demands a broader range of skills from them. In certain individual cases, their fears may be fully justified. For instance, I've worked with business analysts who didn't really know how to analyze anything. They simply wrote down whatever the customers told them, and then handed the document off to the developers. Much of what they wrote was nonsense, and we had to go talk to the customers ourselves to sort it out. That type of person may not feel confident that he/she will be able to add new skills on such a weak foundation. The solution in these cases may, unfortunately, involve personnel changes.

Fourth (and this one is the killer), some organizations are structured such that different specialists involved with software development report up different management hierarchies. In these cases, individuals are literally not allowed to perform any of the tasks that are in the jurisdiction of a different specialized silo. As long as this structure persists, the level of waste caused by the batch-and-queue process is greater than when everyone works in the same department, because informal communication and collaboration across specialized silos are all the more difficult. Such collaboration may even be actively discouraged by management. This is a much more difficult problem to solve than the first two, because it isn't just a question of education and mentoring. The strategic solution involves significant and disruptive changes in organizational structure. The managers in the various silos are all peers on the organizational chart. None will be willing to sacrifice his/her position of power for the good of the enterprise. If you choose to attack a problem of this kind, be prepared for an exhausting and frustrating political battle that you may well lose. Situations like this make it very tempting indeed to remain addicted to pain-killers. It may be considerably less trouble than addressing the core problem! Nevertheless, simply to reinstate the old batch-and-queue system using teams of specialized individuals isn't the real answer. That approach merely institutionalizes muda and makes it harder to remove in future.

There are many more examples than just these two. I'm sure you can see how organizations sometimes "fix" problems in ways that actually cement the muda in place and paint a smiley on it. Pain-killers sure make you feel better (for the moment). Despite the challenges, there are practical steps we can take to address the true problems and kick our addiction to pain-killers. Withdrawal is a bitch, but it can be done.

You can probably see that if we look at situations through the lens of certain "norms," we recognize deviations from those norms as antipatterns or process smells, if you prefer that term. I'm not making these scenarios up out of thin air. For instance, the problem of specialists handing off work-in-process in a batch-and-queue process is documented in Wayne Allen's collection of process smells under the name, Waiting on Specialists. While I characterize the problem in terms of lean thinking, Wayne describes it in terms of agile software development principles. It's the same problem either way, it's usually caused by the same issues, and the solution is usually more-or-less the same. Sometimes people assume the problems in their organization must be unique, but in fact many IT organizations have very similar problems. When you learn to perceive the patterns or "smell the smells," you start to find opportunities for improvement.

Tools

The disciplines of Theory of Constraints and Systems Thinking offer a range of analysis tools that can help us identify the root causes of problems like these. I don't want to try and explain each of these tools in detail here. You can find information about them easily online or in books. I just want to mention a few analysis tools that I've found helpful for root cause analysis of process-related issues.

If you think there is likely to be a linear chain of cause and effect leading to a problem, you might want to try and map the situation using an Ishikawa diagram, also known as a "fishbone" diagram because of its appearance. It isn't suitable for complex systems. If you're analyzing a limited part of a process or organization it might be sufficient.

More often, though, problems in complex systems such as human organizations don't exhibit simple linear cause-and-effect behavior. Instead, an event or situation may be both a cause and an effect, and influences among them may either reinforce or counteract other influences in the organization. The influences may be circular rather than linear, and may have multiple, as well as mutual, causes and effects. An increase in A might cause more of B to occur, and when more of B occurs it results in less of C, which causes more of D, which reduces A but increases E, which (after influencing F, G, H, I, and J) ultimately increases A again. These circular relationships are called causal loops, and they can be mapped using a Diagram of Effects. The Diagram of Effects is especially useful in cases when a problem appears to be the result of interaction between different working groups in an organization, and most especially when those working groups represent different professional specialties that are organized as silos, and interim work artifacts are handed off between silos during the development/delivery process.

That sort of organizational structure leads to complex interaction among the different silos that generates balancing loops ("no matter what we do, nothing ever changes") and reinforcing loops ("the more we compensate for the problems caused by that other group, the worse it gets"). A typical pattern is that people within a given working group implement self-defense strategies against the general organizational dysfunction. These behaviors are, in effect, a form of local optimization that produces little reinforcing loops that, in turn, exacerbate the very organizational dysfunctions that prompted their creation. In effect, the localized defenses against the original organizational dysfunction actually strengthen the dysfunction and hold it in place. They do this by generating influences that create and sustain an overall, giant balancing loop: The general flow of work in the organization...that which never changes, no matter what we do. Deliciously ironic, eh?
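
For readers who like to see the mechanics, here is a minimal sketch of how a loop can be classified as reinforcing or balancing. It assumes the common causal-loop convention of marking each influence +1 (effect moves in the same direction as the cause) or -1 (effect moves in the opposite direction); a closed loop is then reinforcing if the product of its signs is positive and balancing if it is negative. The first loop uses the A, B, C, D influences from the example above. The second loop, the function name, and the node names are hypothetical, made up purely for illustration.

```python
from math import prod

# Signed influences: +1 means more of the cause leads to more of the effect,
# -1 means more of the cause leads to less of the effect.
influences = {
    # From the A..D example in the text: more A -> more B, more B -> less C,
    # less C -> more D (opposite direction), more D -> less A.
    ("A", "B"): +1,
    ("B", "C"): -1,
    ("C", "D"): -1,
    ("D", "A"): -1,
    # Hypothetical loop, invented for illustration only.
    ("defects", "rework"): +1,
    ("rework", "schedule_pressure"): +1,
    ("schedule_pressure", "defects"): +1,
}

def classify_loop(nodes, influences):
    """Classify a closed loop, given as an ordered list of node names."""
    # Pair each node with its successor, wrapping around to close the loop.
    edges = zip(nodes, nodes[1:] + nodes[:1])
    sign = prod(influences[edge] for edge in edges)
    return "reinforcing" if sign > 0 else "balancing"

print(classify_loop(["A", "B", "C", "D"], influences))
print(classify_loop(["defects", "rework", "schedule_pressure"], influences))
```

The A-to-D loop comes out as balancing (an increase in A eventually pushes A back down), while the made-up defect loop comes out as reinforcing. A real Diagram of Effects will have many overlapping loops, so treat this as a way to check one loop at a time, not as a full analysis tool.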

Sometimes, the existence of muda in a process is non-obvious. Due to the typical forest-and-trees phenomenon, it may be that those who are caught up in the midst of an endemic problem are in the worst position to see its cause. The Value Stream Map is a simple and powerful analysis tool that can help you expose the muda in a process. In some organizations, people are so accustomed to traditional role segregation on teams that they simply assume the usual role silos are necessary and normal. They may not perceive the muda in such an arrangement until you can show them how the hand-offs between silos affect lead time and cost for software delivery. Thus, tools like this are useful both for discovering the root causes of problems and for overcoming longstanding habits of mind and communicating the basic value proposition of contemporary methods.
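
The arithmetic behind a Value Stream Map is simple enough to sketch in a few lines. The example below assumes each step in the stream is recorded with two numbers: the time someone is actively working on the item, and the time the item waits in a queue before that step. The step names and all the figures are hypothetical; substitute measurements from your own process.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    work_days: float  # time someone is actively working on the item
    wait_days: float  # time the item sits in a queue before this step

# Hypothetical figures for one work item passing through role silos.
stream = [
    Step("analysis", work_days=1.0, wait_days=0.0),
    Step("development", work_days=3.0, wait_days=4.0),
    Step("testing", work_days=1.0, wait_days=5.0),
    Step("deployment", work_days=0.5, wait_days=6.5),
]

def summarize(stream):
    """Total up working time and waiting time across the whole stream."""
    work = sum(step.work_days for step in stream)
    wait = sum(step.wait_days for step in stream)
    lead = work + wait
    return {
        "lead_time": lead,
        "work_time": work,
        "wait_time": wait,
        # Share of the lead time in which anyone is actually working on the item.
        "flow_efficiency": work / lead,
    }

summary = summarize(stream)
print(f"lead time: {summary['lead_time']} days")
print(f"waiting:   {summary['wait_time']} days")
print(f"flow efficiency: {summary['flow_efficiency']:.0%}")
```

With these made-up numbers, the item spends roughly three quarters of its lead time waiting at hand-offs. That is the kind of figure that tends to get people's attention when they have assumed the silos are necessary and normal.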