The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 4 Issue 16

Saturday, 22 November 1986


o Banking machine almost ruins love life of Vancouver couple
Mark Brader
o 2+2= ? (Risks of self-testing, especially with nonexistent tests)
o Re: Computer-based stock trading
Roger Mann
o Re: appendix to ACARD report
Nancy Leveson
o Some further thoughts on the UK software-certification proposals
Dave Platt
o Dependable Computing and the ACM Communications
o Info on RISKS (comp.risks)

"Banking machine almost ruins love life of Vancouver couple"

mnetor!lsuc!dciem!msb@seismo.CSS.GOV <Mark Brader>
Fri, 21 Nov 86 14:28:20 est
VANCOUVER (CP) — Automated banking machines could prove hazardous to
your love life, as an unidentified Vancouver woman can testify.

The woman tried to use her banking card to get money from an automatic
teller in Honolulu.

"But by the time the message went, via satellite, from Hawaii to the
central computer in New Jersey, then via land line to Seattle and
Vancouver, then back to Hawaii, the teller machines [sic] had gone past
its allowable waiting time," says a credit union spokesman.  The woman
did not get any money but the credit union in Vancouver took the money
out of her account.

When the woman learned her account had been debited $1,100, she accused
her fiance of taking it.

The fiance moved out and the woman reported the theft to the police,
who picked up the man for questioning.  It took almost a month for the
two banks involved to solve the problem.  The couple has since reunited.

[Reproduced from the Toronto Star, November 20, 1986.  Submitted
to RISKS by Mark Brader.  Glossary for foreign readers: a "credit
union" is similar to a bank; $1,100 Canadian is about $800 US.]

2+2= ? (Risks of self-testing, especially with nonexistent tests)

Sat 22 Nov 86 19:18:45-EST
If another car cuts in front of mine, then I would usually be alert enough
to take evasive action. But, suppose ! The day will come when I happen
to be looking at the scenery, or when there is a patch of mud on the road:
and then the two problems compound into something serious.

It is in just this compounding manner that minor events turn into major events.

Once upon a time, a friend of mine was using a microprogrammed box to process
satellite images. One day, it seemed to be malfunctioning: and in fact, when
he looked inside, some of the error-indication LEDs were glowing.

Naturally, he ran the hardware test suite. However, the suite indicated that
all was well. And thus it came about that my friend investigated the suite -
and found that although they had written it, and although he ran it, it
wasn't there !

The tests had been written in the only language which the box had, namely, 
a pretty homebrew assembler for its (wide) microcode. The assembler gave
rather difficult listings, and did not finish by giving a count of errors.
As a result, 4 of the 8 tests had in fact never assembled, and the
programmer hadn't noticed.

Now, the host machine had a downloader, and it had an idiotic property.  When 
asked to download a file which did not exist, it would simply create a null
file, and then download that. Pardon ?  Did I hear the phrase "error message" ?

On top of all this, the box's loader did not set the memory to a known
state (like, all zero) before loading a file.

Worse yet, all of the 8 tests started at the same address, and printed the
same messages (e.g. "Test starting"). 

We therefore see how an operator could faithfully run tests 1 through 8
without ever knowing that in fact, tests 5, 6, 7 and 8 did not exist !

We can also blame the original programmer for never having simulated a
hardware error, to see if the tests caught it. And where was his manager ?
And where is he now - out building missile guidance systems, maybe ?

Re: Computer-based stock trading

Fri, 21 Nov 86 14:51 MST
Computer arbitrage should be self-limiting, just as pre-computer arbitrage
is self-limiting.  The price differential between a future and the stock
index tends to permit arbitrage to occur.  The question is who profits and
who loses ?  Clearly, after one of the huge price moves in a stock, the last
arbitrager will experience a loss.  Too many losses and he exits the game.
Thus we have one less computer trader.  Eventually, the number of successful
computer traders should be the number who don't experience losses, and the
stock price moves we see should be limited to smaller percentage moves.

Why hasn't this occurred ?  A couple answers suggest themselves.  (1)
Computer arbitrage is not to blame any more than human-speed arbitrage is to
blame.  (2) Volatility as is perceived is not there (the same percentage
move now as in 1974 would be three times as much in a absolute stock move.)
(3) Other factors which are hidden and not well understood.

Re: appendix to ACARD report

Nancy Leveson <nancy@ICSD.UCI.EDU>
21 Nov 86 16:26:19 PST (Fri)
I am somewhat concerned by the implication in the report that checking the
software by formal mathematical proof is the answer to the safety problem.

Although I believe that mathematical proof and certainly mathematical
analysis should play an important role in building safety-critical software,
it alone certainly will not guarantee an acceptable level of risk.  Putting
aside technical questions of whether it can be accomplished at all (e.g.
what if the software contains real numbers?), formal mathematical proof can
be used to show only the consistency between the specification and the
program (or between levels of specification).  BUT most accidents involving
software have not been caused by coding errors but rather by
misunderstandings about what the software should have been doing at all or
erroneous assumptions about the actions of the environment or the controlled
system, i.e. specification errors.  It is the things that are left out or
forgotten that cause the most problems.  Furthermore, mathematical proof of
the software will not handle the cases where the accident occurs because of
the interaction between the software and the controlled system — the
software was "correct" in the usual formal mathematical sense.

Safety is a system problem and one cannot guarantee software safety by 
looking only at the software or by mathematically proving properties of 
the software in isolation from the operation of the rest of the system.

Nancy Leveson

Some further thoughts on the UK software-certification proposals

Dave Platt <dplatt@teknowledge-vaxc.ARPA>
Fri, 21 Nov 86 10:28:33 PST
The proposals in the ACARD report seem to place a great deal of emphasis on
mathematical proof-of-correctness of computer programs (and the tools used
to build them).  I wonder just how practical this is, given the current
state-of-the-art in software construction and theory, and I have a few
questions to toss out.

Disclaimer: I'm a [reasonably good] programmer, not a high-power
computer-science theorist;  my knowledge of the state-of-the-art in
correctness proofs is fragmentary and badly out of date.  If I speak
from ignorance, please feel free to correct and enlighten me!

1) Are existing programming languages constructed in a way that makes
   valid proofs-of-correctness practical (or even possible)?  I can
   imagine that a thoroughly-specified language such as Ada [trademark
   (tm) Department of Defense] might be better suited for proofs than
   machine language; there's probably a whole spectrum in between.

2) Is the state of the art well enough advanced to permit proofs of
   correctness of programs running in a highly asynchronous, real-time

3) Will the compilers have to be proved mathematically correct also?  or
   might something like the Ada compiler/toolkit validation be adequate?

4) The report seems to imply that once a system is proven correct/safe,
   it can be assumed to remain so (for the [limited] lifetime of its
   License to Operate) so long as maintenance is performed by a
   certified software engineer.  Is this reasonable?  My own experience
   is that _any_ patch or modification to a program, no matter how minor
   it may seem, has a pretty substantial chance of causing unwanted
   side effects and thus voiding the program's correctness.  Seems to me
   that a life-critical system should be completely revalidated (if not
   necessarily recertified) after any change, and that changes should
   probably be made in the original programming language rather than by
   low-level patches.

5) Many of the program "failures" I've encountered in "stable" software
   have been due to unexpected inputs or unplanned conditions, rather than
   to any identifiable error in the program itself.  Can any proof-of-
   correctness guard against this sort of situation?

6) What are the legal aspects of this sort of proposal, from the programmer's
   point of view?  Anybody got a good source of Programmers' Malpractice

7) Are the error-rate goals suggested in the report (1 error per 100,000
   lines of code, or even less?) reachable?

8) Military systems such as the SDI control software would appear to belong
   to the "disaster-level" classification... will they be subject to this
   level of verification and legal responsibility, or will they be exempted
   under national-security laws?  [Of course, if an SDI system fails,
   I don't suppose that filing lawsuits against the programmer(s) is going
   to be at the top of anybody's priority list...]

9) If the certified software engineer responsible for a particular
   piece of life-critical code resigns or is reassigned, is it reasonable
   to assume that another (equally-qualified) CSE could in fact take over
   the job immediately (on an urgent-call-out basis, for example)?

I respect the committee's concern for this problem, but I wonder whether
they haven't focused too much on one aspect (software correctness) at
the expense of considering other aspects (hardware reliability, adequate
specification of operating conditions, interfaces to humans and external
physical control systems, etc.).

Dependable Computing and the ACM Communications

Peter G. Neumann <Neumann@CSL.SRI.COM>
Wed 19 Nov 86 19:40:40-PST
There is an announcement by John Rushby in the November 1986 issue of the
Communications of the ACM (pp. 1031-2) regarding the establishment of
Dependable Computing as a CACM department — regarding systems that must
dependably satisfy certain critical requirements such as safety, fault
avoidance, and fault tolerance.  This announcement is also noteworthy in
that it provides a concise, easily accessible summary of some generally
accepted terminology that contributors to RISKS would do well to observe and
practice, including the Melliar-Smith / Randell distinctions among faults,
failures, and errors.  It is hoped that RISKS readers with serious technical
contributions may find this CACM department an appropriate printed medium.

Please report problems with the web pages to the maintainer