VANCOUVER (CP) -- Automated banking machines could prove hazardous to your love life, as an unidentified Vancouver woman can testify. The woman tried to use her banking card to get money from an automatic teller in Honolulu. "But by the time the message went, via satellite, from Hawaii to the central computer in New Jersey, then via land line to Seattle and Vancouver, then back to Hawaii, the teller machines [sic] had gone past its allowable waiting time," says a credit union spokesman. The woman did not get any money but the credit union in Vancouver took the money out of her account. When the woman learned her account had been debited $1,100, she accused her fiance of taking it. The fiance moved out and the woman reported the theft to the police, who picked up the man for questioning. It took almost a month for the two banks involved to solve the problem. The couple has since reunited. [Reproduced from the Toronto Star, November 20, 1986. Submitted to RISKS by Mark Brader. Glossary for foreign readers: a "credit union" is similar to a bank; $1,100 Canadian is about $800 US.]
If another car cuts in front of mine, then I would usually be alert enough to take evasive action. But, suppose ! The day will come when I happen to be looking at the scenery, or when there is a patch of mud on the road: and then the two problems compound into something serious. It is in just this compounding manner that minor events turn into major events. Once upon a time, a friend of mine was using a microprogrammed box to process satellite images. One day, it seemed to be malfunctioning: and in fact, when he looked inside, some of the error-indication LEDs were glowing. Naturally, he ran the hardware test suite. However, the suite indicated that all was well. And thus it came about that my friend investigated the suite - and found that although they had written it, and although he ran it, it wasn't there ! The tests had been written in the only language which the box had, namely, a pretty homebrew assembler for its (wide) microcode. The assembler gave rather difficult listings, and did not finish by giving a count of errors. As a result, 4 of the 8 tests had in fact never assembled, and the programmer hadn't noticed. Now, the host machine had a downloader, and it had an idiotic property. When asked to download a file which did not exist, it would simply create a null file, and then download that. Pardon ? Did I hear the phrase "error message" ? On top of all this, the box's loader did not set the memory to a known state (like, all zero) before loading a file. Worse yet, all of the 8 tests started at the same address, and printed the same messages (e.g. "Test starting"). We therefore see how an operator could faithfully run tests 1 through 8 without ever knowing that in fact, tests 5, 6, 7 and 8 did not exist ! We can also blame the original programmer for never having simulated a hardware error, to see if the tests caught it. And where was his manager ? And where is he now - out building missile guidance systems, maybe ?
Computer arbitrage should be self-limiting, just as pre-computer arbitrage is self-limiting. The price differential between a future and the stock index tends to permit arbitrage to occur. The question is who profits and who loses ? Clearly, after one of the huge price moves in a stock, the last arbitrager will experience a loss. Too many losses and he exits the game. Thus we have one less computer trader. Eventually, the number of successful computer traders should be the number who don't experience losses, and the stock price moves we see should be limited to smaller percentage moves. Why hasn't this occurred ? A couple answers suggest themselves. (1) Computer arbitrage is not to blame any more than human-speed arbitrage is to blame. (2) Volatility as is perceived is not there (the same percentage move now as in 1974 would be three times as much in a absolute stock move.) (3) Other factors which are hidden and not well understood.
I am somewhat concerned by the implication in the report that checking the software by formal mathematical proof is the answer to the safety problem. Although I believe that mathematical proof and certainly mathematical analysis should play an important role in building safety-critical software, it alone certainly will not guarantee an acceptable level of risk. Putting aside technical questions of whether it can be accomplished at all (e.g. what if the software contains real numbers?), formal mathematical proof can be used to show only the consistency between the specification and the program (or between levels of specification). BUT most accidents involving software have not been caused by coding errors but rather by misunderstandings about what the software should have been doing at all or erroneous assumptions about the actions of the environment or the controlled system, i.e. specification errors. It is the things that are left out or forgotten that cause the most problems. Furthermore, mathematical proof of the software will not handle the cases where the accident occurs because of the interaction between the software and the controlled system -- the software was "correct" in the usual formal mathematical sense. Safety is a system problem and one cannot guarantee software safety by looking only at the software or by mathematically proving properties of the software in isolation from the operation of the rest of the system. Nancy Leveson
The proposals in the ACARD report seem to place a great deal of emphasis on mathematical proof-of-correctness of computer programs (and the tools used to build them). I wonder just how practical this is, given the current state-of-the-art in software construction and theory, and I have a few questions to toss out. Disclaimer: I'm a [reasonably good] programmer, not a high-power computer-science theorist; my knowledge of the state-of-the-art in correctness proofs is fragmentary and badly out of date. If I speak from ignorance, please feel free to correct and enlighten me! 1) Are existing programming languages constructed in a way that makes valid proofs-of-correctness practical (or even possible)? I can imagine that a thoroughly-specified language such as Ada [trademark (tm) Department of Defense] might be better suited for proofs than machine language; there's probably a whole spectrum in between. 2) Is the state of the art well enough advanced to permit proofs of correctness of programs running in a highly asynchronous, real-time environment? 3) Will the compilers have to be proved mathematically correct also? or might something like the Ada compiler/toolkit validation be adequate? 4) The report seems to imply that once a system is proven correct/safe, it can be assumed to remain so (for the [limited] lifetime of its License to Operate) so long as maintenance is performed by a certified software engineer. Is this reasonable? My own experience is that _any_ patch or modification to a program, no matter how minor it may seem, has a pretty substantial chance of causing unwanted side effects and thus voiding the program's correctness. Seems to me that a life-critical system should be completely revalidated (if not necessarily recertified) after any change, and that changes should probably be made in the original programming language rather than by low-level patches. 5) Many of the program "failures" I've encountered in "stable" software have been due to unexpected inputs or unplanned conditions, rather than to any identifiable error in the program itself. Can any proof-of- correctness guard against this sort of situation? 6) What are the legal aspects of this sort of proposal, from the programmer's point of view? Anybody got a good source of Programmers' Malpractice insurance? 7) Are the error-rate goals suggested in the report (1 error per 100,000 lines of code, or even less?) reachable? 8) Military systems such as the SDI control software would appear to belong to the "disaster-level" classification... will they be subject to this level of verification and legal responsibility, or will they be exempted under national-security laws? [Of course, if an SDI system fails, I don't suppose that filing lawsuits against the programmer(s) is going to be at the top of anybody's priority list...] 9) If the certified software engineer responsible for a particular piece of life-critical code resigns or is reassigned, is it reasonable to assume that another (equally-qualified) CSE could in fact take over the job immediately (on an urgent-call-out basis, for example)? I respect the committee's concern for this problem, but I wonder whether they haven't focused too much on one aspect (software correctness) at the expense of considering other aspects (hardware reliability, adequate specification of operating conditions, interfaces to humans and external physical control systems, etc.).
There is an announcement by John Rushby in the November 1986 issue of the Communications of the ACM (pp. 1031-2) regarding the establishment of Dependable Computing as a CACM department -- regarding systems that must dependably satisfy certain critical requirements such as safety, fault avoidance, and fault tolerance. This announcement is also noteworthy in that it provides a concise, easily accessible summary of some generally accepted terminology that contributors to RISKS would do well to observe and practice, including the Melliar-Smith / Randell distinctions among faults, failures, and errors. It is hoped that RISKS readers with serious technical contributions may find this CACM department an appropriate printed medium.
Please report problems with the web pages to the maintainer