by Max H



Not so very long ago, I was given an assignment to perform a forensic analysis on a piece of firmware. It seems that my client’s device had a rather spectacular failure which endangered many lives. The job was critical, but this was new territory for me.

We did have some valuable information provided by the internal and external logs. We knew that a failure had occurred with an internal redundant sensor; and although the sensor was correctly de-activated, the failure had also erroneously caused the device to re-instate a previously failed sensor, resulting in device outputs that had near-catastrophic results.

I was given the source code, and a 20-minute overview of the theory of operation by one of the original architects of this 15 year-old device; and turned loose. Where to begin?

The overview had narrowed the field of likely suspects to five or six modules, but this was still a large chunk of code. The program was laden with global variables (more than 1000 of them); which was intentional and practical given the nature of the software and the tools in use when the program was developed. Furthermore this code was fairly complex, written in C, and operated by a hand-crafted preemptive multi-threaded operating system. Figuring out what all of the possible interactions were was going to be a nightmare.

Within a short time, I was able to locate the line of code responsible for switching on the offending sensor; unfortunately the work had just begun. The sensor switching was based on a variable whose value was the result of a chain of calculations that were scattered all over the application. The variable contained bits representing states for each of the six sensors. By design, up to three of these sensors could fail without adversely impacting operation. The program operated in loop, so variables retained values until they were re-written, or until the device was re-started. The erroneous logic might have occurred anywhere in this chain of responsibility. A full data-flow analysis was the only technique I had used that might yield answers, yet given the urgency of a determination this was not a realistic option.

The answer came in a paper that described an analysis technique based on a combination of program slicing and weakest-precondition analysis. The ideas in that paper gave me the tools I needed to rapidly perform a dataflow analysis with a scope limited to the variable.

Slicing is an analysis technique that extracts pieces of source code that are to be the focal point of the analysis. Basically, you discard any source lines that aren’t relevant to the problem being analyzed. The contribution of Dijkstra’s Weakest Precondition Calculus, is that it allows us to discard those intermediate slices that cannot directly lead to the result we are looking for. Combining these concepts, I was able to very quickly isolate possible sources of the error.

To begin, I used regular expressions to locate all statements where the value of the variable in question, let’s call it sensor_control, was modified. To do this I looked for all of the statements where sensor_control was the target of an assignment, a short-cut assignment, an increment or decrement operation (pre- or post-). With the results of this search, I was able to see all of the data dependencies of sensor_control. For each variable in these dependencies, I was able to add it into the regular-expression in the same fashion as sensor_control, thereby increasing the scope of the slice. This was done iteratively, increasing the length of the dependency chain until no more links could be found. A full round of this approach yielded fewer than 150 lines of source code that actually needed to be analyzed.

Since my editor (Multi-Edit) can perform code-folding on regular expressions, as well as RE searches; I was able to look at each statement in the context of the application or extract them to another file. The slice was first placed in it’s own analysis text file, then I examined the source context for each statement, and commented the analysis file. For example, a comment might be a pseudo-code if-statement indicating the conditions under which a line might be executed.

With the program-slice now fully elaborated, and the context of each statement clear; I could now perform data-flow analysis on just the code that was relevant to the issue. While this was much easier than dealing with the whole program, the analysis was fairly dense, when one considers that every single one of those lines had an impact on the value of sensor_control. As I walked through the code, I applied the WP approach, eliminating any statement as it became clear that it could not cause the sensor to be enabled. While this seems like common sense in retrospect; it was a very helpful addition. I also developed a simple notation that helped significantly in following the state changes. The notation consisted of 3 statements:

  • #FILE: <file name>: <list of modified variables> – This annotation indicated which file the statements that followed had been extracted from, and provided the list of variables that were modified therein.
  • #FOLLOW: <variable> –> <file> – This indicated the last modification of variable within a file, and indicated the file where the next use (read or write) of the variable in the control flow of the program. This was a key concept in the analysis.
  • #DROP: <variable> – This indicated the last significant use of a variable in the control flow. This aided by reducing clutter, and allowing some variables to be tracked for only part of the loop.

The consistent syntax allowed me to employ syntax-highlighting on the analysis file that made these really standout; and also allowed me to fold code around these statements on the analysis file, which greatly aided in navigating the dataflows.

Using the above techniques, I was able to isolate 4 potential points of failure in this application that I had never before laid eyes on. With some additional information provided by the architect, we eliminated two of those. As it turns out, the remaining two were complicit in the failure. A small change to either statement could have avoided the failure. In about 12 hours, we had found the fail-point, using only static analysis, in a complex application that exceeded 100 KLOC; exceeding all expectations, including my own. I had fully expected to spend at least two weeks on the analysis.

The analysis was so effective and efficient, that I was later asked to perform similar analyses on other safety functions of the software; and across all of the releases so that we could identify other potential issues in the field.

The paper that inspired my approach is “Program Slicing using Weakest Preconditions”, by Joseph J. Comuzzi and Johnson M. Hart. It seems to have all but disappeared from the internet, but it is still available in the cache at Citeseer.

Although my use of Dijsktra’s WP calculus was informal, a good overview is available online at http://www.scss.tcd.ie/Edsko.de.Vries/talks/weakest_precondition.pdf