Like many veteran software developers, I am sold on the value of defensive programming. It seems that no matter how thorough the requirements or how good the design, things can go wrong, and I’d like my code to be able to handle them. So imagine my surprise when no less than a DER (Designated Engineering Representative) provided me with a perfectly valid and well-reasoned argument to discard defensive programming techniques altogether.

DO-178B makes only a single mention of defensive programming. The last sentence of a note at the end of section 4.5 states “Defensive programming practices may be considered to improve robustness.” With this ringing endorsement, one would expect everyone to leap wholeheartedly onto the defensive programming bandwagon, yes? Well, not exactly. One argument that comes up over and over again is that defensive programming injects dead code into the application. By the DO-178B definition, dead code is “executable object code (or data) which, as a result of a design error cannot be executed (code) or used (data) in an operational configuration of the target computer environment and is not traceable to a system or software requirement. An exception is embedded identifiers.”

I would argue that it instead injects deactivated code: “executable object code which by design is … (a) not intended to be executed (code) or used (data)”. The definition of deactivated code makes no mention of requirements. Note, however, that where deactivated code is present, section 6.4.4.3.d of DO-178B imposes a verification burden to prove that it cannot be executed. Once that has been done, does the code serve any purpose? I would suggest that it serves two: to add protection in the event of a single-event upset (SEU), and to aid in verification following maintenance, when it must once again be shown that the defensive code does not execute, giving further evidence of the correctness of the change.
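To make the distinction concrete, here is a minimal C sketch, assuming a hypothetical `flight_mode_t` enumeration and a hypothetical `select_gain()` function. The `default` branch is deactivated by design: no logically consistent input reaches it, yet a corrupted mode value (such as a bit flip caused by an SEU) would. The `defensive_hits` counter is only an illustrative stand-in for whatever coverage evidence a real program would use to show that the branch is not executed.

```c
/* Hypothetical operating modes; only these three values are valid. */
typedef enum { MODE_TAXI = 0, MODE_CLIMB = 1, MODE_CRUISE = 2 } flight_mode_t;

/* Illustrative counter of defensive branches taken; in nominal
   operation it should remain 0. */
static unsigned defensive_hits = 0;

/* Returns a hypothetical control gain for the given mode. */
int select_gain(flight_mode_t mode)
{
    switch (mode) {
    case MODE_TAXI:   return 1;
    case MODE_CLIMB:  return 3;
    case MODE_CRUISE: return 2;
    default:
        /* Deactivated by design: unreachable while program state is
           logically consistent, but reachable if an SEU corrupts 'mode'. */
        defensive_hits++;
        return 0; /* placeholder; the proper response is a policy decision */
    }
}
```

With valid modes the counter stays at zero; forcing an out-of-range value such as `(flight_mode_t)7` exercises the branch, which is one way a test harness might show that the guard works even though normal operation never reaches it.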

The first definition states that dead code is both the “result of a design error” and “not traceable to a system or software requirement”. I don’t believe defensive code qualifies as a design error, since it is intentional and likely documented. And if it maintains application integrity in the presence of logically inconsistent data, then I would say it supports the requirements allocated to that segment of the program, even though it may or may not be directly traceable to them. Either way, it fails to satisfy the first clause, so it is clearly not dead code.

Defensive programming clearly appears to conform to the letter of the definition of deactivated code, but there is still a glitch. Some argue that the phrase “by design” implies the need for requirements. Therefore, they argue, if no requirements exist for the defensive code, it does not meet the definition of deactivated code, so it must be dead code. I believe that is a very flawed argument. Consider this: if “by design” implies the presence of requirements, then “design error” would have to refer only to an error in the requirements. If that were true, then any code that traced to correct requirements but was non-executable due to an implementation error would also fail to meet the two-part definition of dead code. Quite clearly, “design” occurs at many levels, including during implementation. Performance, memory consumption, robustness, and style considerations all play a part in these decisions. Every implemented function is a mosaic of micro-design decisions that exist only within the implementation.

So with my case stated so strongly above, what finally swayed me to rethink my position? The argument presented by the aforementioned DER was pretty simple. She thought that the need for defensive programming indicated a lack of robustness requirements, and that in the absence of such requirements there was no assurance that the implementor would correctly handle the erroneous condition. My gut reaction was that it was better for the code to do something than to do nothing; after all, doing nothing might allow the entire partition to be corrupted. Furthermore, with defensive practices discouraged, many corruption possibilities were not being considered. But in time, I began to see her point. With no standards or requirements dictating the actions to be taken by defensive clauses, the implementor might make a choice that masks a serious problem. Who is to say that this is better than the alternative? The original error might easily cause the system to be safely restarted due to an exception, whereas the defensive handling hides a problem that later creates a safety issue; then again, the opposite might also be true. So, how has my position changed? Should defensive programming practices be stricken from the development process?

Over the past few years I’ve had time to give these arguments serious consideration, and I still believe that defensive programming has a very important role, especially in safety-critical software. It is not sufficient to simply disregard erroneous conditions because they can only occur when the machine is in a logically inconsistent state. Logical inconsistency is not the same as impossibility in the presence of external factors such as SEUs, and continuing to function as if nothing has happened may also mask a condition that will later manifest as a safety issue. Nor is it sufficient for the implementor to make arbitrary choices about the handling of such inconsistencies. Instead, the means and methods of dealing with such errors must be considered at the outset. It will still generally be the implementor’s choice as to when defensive programming is needed, but the way in which it is handled must reflect the policies, requirements, and/or development standards established at the start. As an example, there may be a global requirement that says “Where logical inconsistencies are detected by defensive programming means, the system fault handler shall be notified and provided with the type and location of the fault; after which the calling function shall …”. Shall what? Clean up and exit, set values and continue, or do whatever the implementor deems best? This is a policy decision, and to some extent it may not matter. The fault handler is made aware of the issue and can effect a restart of the whole system, or of the faulted partition, or nothing at all, depending entirely on the type and location of the fault.
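A requirement of that kind might be realized along the lines of the following C sketch. Everything in it is hypothetical: the `fault_type_t` values, the `fault_handler_notify()` stand-in for the system fault handler, the `DEFENSIVE_FAULT` macro, and the `scale_airspeed()` function with its assumed 12-bit input range. It also assumes the project has chosen the “clean up and exit” policy: the check reports the fault type and its source location, then returns an error without writing an output.

```c
/* Hypothetical fault types a project-wide standard might enumerate. */
typedef enum {
    FAULT_RANGE     = 1,  /* value outside its valid range */
    FAULT_BAD_STATE = 2   /* state machine in an undefined state */
} fault_type_t;

/* Hypothetical report delivered to the system fault handler. */
typedef struct {
    fault_type_t type;
    const char  *file;   /* location of the detecting check */
    int          line;
} fault_report_t;

/* Stand-in for the system fault handler. A real one would decide whether
   to restart the system, restart the partition, or do nothing; this one
   only records the most recent report and counts notifications. */
static fault_report_t last_fault;
static unsigned fault_count = 0;

static void fault_handler_notify(fault_type_t ftype, const char *file, int line)
{
    last_fault.type = ftype;
    last_fault.file = file;
    last_fault.line = line;
    fault_count++;
}

/* Every defensive check reports through one macro, so the type and
   location of the fault are always supplied the same way. */
#define DEFENSIVE_FAULT(ftype) fault_handler_notify((ftype), __FILE__, __LINE__)

/* Assumed policy for this sketch: after notifying the fault handler,
   clean up and exit with an error status instead of substituting a value. */
int scale_airspeed(int raw_counts, int *out_knots)
{
    if (raw_counts < 0 || raw_counts > 4095) {  /* assumed 12-bit input */
        DEFENSIVE_FAULT(FAULT_RANGE);
        return -1;                              /* output left untouched */
    }
    *out_knots = raw_counts / 8;                /* hypothetical scaling */
    return 0;
}
```

Routing every check through one macro keeps the reporting uniform, so the decision about restarting the system or the partition stays with the fault handler rather than with each implementor.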

Establishing a means and method ties defensive programming into the design as a feature without imposing an unrealistic burden on the implementors, designers, or requirements engineers. It lets us continue to use a software development best practice without compromising safety.

As for that argumentative DER, I owe her a big thank-you for illuminating this issue for me. Thank you, Marge!

About Max H:
Max is a father, a husband, and a man of many interests. He is also a consulting software architect with over three decades of experience in the design and implementation of complex software. View his LinkedIn profile at http://www.linkedin.com/pro/swarchitect