As the holiday weekend winds down, the list of the world's 500 fastest supercomputers has once again been updated. Since 1993 the list has been updated twice a year, and the world has watched as nations and corporations compete to move ahead and stay ahead of the competition. Publicity occasionally accompanies a major technology leap, but normally the updates go by without much fanfare.
I can’t claim that I’ve ever had the privilege to work with one of the amazing machines on this list; I would love the opportunity to do so. Indeed, I know very little about the software that these machines run. I’ve read a handful of articles that discuss the parallel architectures, the complex mathematics and the need for teraflops, but I’ve seen nothing about the development processes or quality assurance measures that are taken to ensure correctness.
I find myself wondering… if software is written to run on these super-fast machines, does the increased speed just lead to a higher defect density per hour of run-time, or are extraordinary measures being taken to assure high-quality software? Isn’t the cost of defects substantially higher when they occur on this class of computer?
I expect that most programmers who write software for these systems are above average, and perhaps make fewer mistakes than the rank-and-file programmer; yet the same might be said of those who program safety-critical systems, and we’ve certainly seen more than a few things go wrong in those systems. I am not aware of any instance where a software error has resulted in a faulty scientific theory or a misguided policy decision, but after decades of using supercomputers for large-scale simulations in economics, seismology, chemistry, biology, and the like, it seems likely that it has happened. Certainly much of what is computed on these machines is pure research, but at least some of it is put to practical use.
Given the politics of money and the relatively low profile of this work, we may never know about any really big mistakes arising from supercomputing software. Perhaps some of you are aware of a few that you would be willing to share.
For those who are interested in the latest Top500 list, you can find it at http://www.top500.org/list/2010/06/100
For another interesting application of a supercomputer there’s always this video:
