Forum on Risks to the Public in Computers and Related Systems
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator
Volume 3: Issue 57
Tuesday, 16 September 1986
Contents
Computers and the Stock Market (again) - Robert Stroud
The Old Saw about Computers and TMI - Ken Dymond
Do More Faults Mean (Yet) More Faults? - Dave Benson
A critical real-time application worked the first time - Dave Benson
Autonomous weapons - Eugene Miya
"Unreasonable behavior" and software - Eugene Miya on Gary Chapman
Risks of maintaining computer timestamps revisited - John Coughlin
Info on RISKS (comp.risks)
Computers and the Stock Market (again)
Robert Stroud <robert%cheviot.newcastle.ac.uk@Cs.Ucl.AC.UK>
Mon, 15 Sep 86 16:53:37 gmt
Computers had a hand in the dramatic fall on Wall Street last week,
according to an item on the BBC TV news. Apparently the systems were not
designed to cope with the sheer volume of sales (does anybody know more
about this?). The report continued:
"In London they still do it the old fashioned way with bits
of paper, which makes people think twice before joining in
a mindless selling spree. However, all this could change in
October with the Big Bang..."
What price progress?
Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne.
ARPA robert%cheviot.newcastle@ucl-cs.ARPA
UUCP ...!ukc!cheviot!robert
The Old Saw about Computers and TMI
"DYMOND, KEN" <dymond@nbs-vms.ARPA>
16 Sep 86 09:25:00 EDT
Ihor Kinal says in RISKS-3.55:

  > Obviously, one can present arguments for each side [human vs computer
  > having the last say -- at TMI, computers were right, but ...]  I would
  > say that if humans do override CRITICAL computer control [like TMI],
  > then some means of escalating the attention level must be invoked
  > [e.g., have the computers automatically notify the NRC].

This belief keeps surfacing, but it is false.  There was no computer control
in safety-grade systems at TMI -- see the documentation in the Kemeny report
and probably elsewhere.  There was a computer in the control room, but it
only drove a printer to provide a hardcopy log of alarms in the sequence in
which they occurred.  The log is an aid in diagnosing events.  The computer
(a Bendix G-15 ??) did play a role in the emergency, since at one point its
buffer became full and something like 90 minutes of alarms were not
recorded, thus hampering diagnosis.

On a couple of occasions I have asked NRC people why computers aren't used
to control critical plant systems and have been told that "they aren't
safety grade."  I'm not quite sure what this means, but I take it to mean
that computers (and software) aren't trustworthy enough for such safety
areas as the reactor protection system.  This is not to say that computers
aren't used in monitoring plant status, which is quite different from
control.

Ken Dymond

(The opinions above don't necessarily reflect those of my employer or
anybody else, for that matter.)
Do More Faults Mean (Yet) More Faults?
Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
Sun, 14 Sep 86 19:00:30 pdt
|In RISKS 3.50 Dave Benson comments in "Flight Simulator Simulators Have
|Faults" that
|
|  >We need to understand that the more faults found at any stage of
|  >engineering software, the less confidence one has in the final
|  >product.  The more faults found, the higher the likelihood that
|  >faults remain.
|
|This statement makes intuitive sense, but does anyone know of any data
|to support this?  Is this true of any models of software failures?
|Is this true of the products in any of the hard engineering fields --
|civil, mechanical, naval, etc. -- and do those fields have the
|confirming data?
|
|Ken Dymond, NBS

Please read the compendium of (highly readable) papers by M. M. Lehman and
L. A. Belady, Program Evolution: Processes of Software Change, APIC Studies
in Data Processing No. 27, Academic Press, Orlando, 1985.  This provides
data.

It is (sorry -- should be, but probably isn't) standard in software quality
assurance efforts to throw away modules which account for a high proportion
of the total observed failures.  The (valid, in my opinion) assumption is
that the engineering on these modules is so poor that it is hopeless to keep
trying to patch them up.

Certain models of software failure assign increased "reliability" to
software which has been exercised for long periods without failure.  One
must understand that this is simply a formal modelling of the intuition that
some faults mean (yet) more faults.

This is certainly true of all engineering fields.  While I don't have the
"confirming data," I suggest you consider your car, your friend's car, etc.
Any good history of engineering will suggest that many designs are never
marketed because of an unending sequence of irremediable faults.  The
intuitive explanation is: good design and careful implementation work.  This
is teleological -- we define good design and careful implementation as "that
which works."

However, I carefully said "confidence."  Confidence is an intuitive
assessment of reliability; I was not considering the formalized notion of
"confidence interval" used in statistical studies.  To obtain high
confidence in an estimate of the number of faults requires observing very
many failures, thus lowering one's confidence in the product.  To obtain
high confidence in a product requires observing very few failures while
using it.
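
(To make that last point concrete -- this worked example is not from the
original message -- assume the textbook model in which failures arrive as a
Poisson process with an unknown constant rate lambda.  The probability of
seeing no failure in T hours is exp(-lambda*T), so after T failure-free
hours any rate with exp(-lambda*T) < alpha is ruled out at confidence
1-alpha, i.e. lambda <= -ln(alpha)/T.  A few lines of C evaluate the bound:)

    /* Sketch: one-sided bound on the failure rate consistent with T
     * failure-free hours, assuming a constant-rate Poisson failure
     * process.  exp(-lambda*T) >= alpha  =>  lambda <= -ln(alpha)/T. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double alpha = 0.05;                    /* 95% confidence */
        double hours[] = { 100.0, 1000.0, 10000.0 };

        for (int i = 0; i < 3; i++) {
            double t = hours[i];
            double bound = -log(alpha) / t;     /* failures per hour */
            printf("%7.0f failure-free hours: lambda <= %.5f per hour"
                   " (MTTF >= %.0f hours)\n", t, bound, 1.0 / bound);
        }
        return 0;
    }

(In this simple demonstration framing, the bound tightens only by
accumulating failure-free operation, and every failure observed during the
demonstration sends you back to the start -- the same point made in prose
above.)
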
I found one! (A critical real-time application worked the first time)
Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
Sun, 14 Sep 86 22:40:21 pdt
Last spring I issued a call for hard data to refute a hypothesis which I,
perhaps mistakenly, called the Parnas Hypothesis:
No large computer software has ever worked the first time.
Actually, I was only interested in military software, so let me repost the
challenge in the form I am most interested in:
NO MILITARY SOFTWARE (large or small) HAS EVER WORKED IN ITS FIRST
OPERATIONAL TEST OR ITS FIRST ACTUAL BATTLE.
Contradict me if you can. (Send citations to the open literature
to benson@wsu via csnet)
Last spring's request for data has finally led to the following paper:
Bonnie A. Claussen, II
VIKING '75 -- THE DEVELOPMENT OF A RELIABLE FLIGHT PROGRAM
Proc. IEEE COMPSAC 77 (Computer Software & Applications Conference)
IEEE Computer Society, 1977
pp. 33-37
I offer some quotations for your delectation:
The 1976 landings of Viking 1 and Viking 2 upon the surface of
Mars represented a significant achievement in the United States
space exploration program. ... The unprecedented success of the Viking
mission was due in part to the ability of the flight software
to operate in an autonomous and error free manner. ...
Upon separation from the Orbiter, the Viking Lander, under autonomous
software control, deorbits, enters the Martian atmosphere,
and performs a soft landing on the surface. ... Once upon the surface,
... the computer and its flight software provide the means by
which the Lander is controlled. This control is semi-autonomous
in the sense that Flight Operations can only command the Lander
once a day at 4 bit/sec rate.
(Progress occurred in a NASA contract over a decade ago, in that)
In the initial stages of the Viking flight program development,
the decision was made to test the flight algorithms and determine
the timing, sizing and accuracy requirements that should be
levied upon the flight computer prior to computer procurement.
... The entire philosophy of the computer hardware and
software reliability was to "keep it simple." Using the
philosophy of simplification, modules and tasks tend toward
straight line code with minimum decisions and minimum
interactions with other modules.
(It was lots of work, as)
When questioning the magnitude of the quality assurance task,
it should be noted that the Viking Lander flight program development
required approximately 135 man-years to complete.
(But the paper gives no quantitative data about program size or complexity.)
Nevertheless, we may judge this as one of the finest software engineering
accomplishments to date. The engineers on this project deserve far more
plaudits than they've received. I know of no similar piece of software
with so much riding upon its reliable behavior which has done so well.
(If you do, please do tell me about it.)
However, one estimates that this program is on the order of kilolines of FORTRAN
and assembly code, probably less than one hundred kilolines. Thus
Parnas will need to judge for himself whether or not the Viking Lander
flight software causes him to abandon (what I take to be) his hypothesis
about programs not working the first time.
It doesn't cause me to abandon mine because there were no Martians shooting
back, as far as we know...
David B. Benson, Computer Science Department, Washington State University,
Pullman, WA 99164-1210 csnet: benson@wsu
Autonomous weapons
<LIN@XX.LCS.MIT.EDU>
Tue, 16 Sep 1986 08:31 EDT
From: eugene at AMES-NAS.ARPA (Eugene Miya)
... another poster brought up the issue of autonomous weapons.
We had a discussion of this at the last Palo Alto CPSR meeting.
Are autonomous weapons moral? If an enemy shows a white flag or hands up,
is the weapon "smart enough" to know the Geneva Convention (or is that too
moral for programmers of such systems)?
What do you consider an autonomous weapon? Some anti-tank devices are
intended to recognize tanks and then attack them without human
intervention after they have been launched (so-called fire-and-forget
weapons). But they still must be fired under human control. *People*
are supposed to recognize white flags and surrendering soldiers.
"Unreasonable behavior" and software
<LIN@XX.LCS.MIT.EDU>
Tue, 16 Sep 1986 09:01 EDT
From: Gary Chapman
Risks of maintaining computer timestamps revisited
John Coughlin <JC%CARLETON.BITNET@WISCVM.WISC.EDU>
15 Sep 86 12:14:00 EDT
Some time ago I submitted an item to RISKS describing the way in which the
CP-6 operating system requires the time to be set manually during every warm
or cold boot. The latest release of this OS contains an improvement: in most
cases the time need only be manually set on a cold boot. Unfortunately, with
this enhancement came an unusual bug.
The timestamp is stored in a special hardware register, which is modified by
certain diagnostic procedures run during preventive maintenance. It seems
these diagnostic procedures were not modified to reflect the new use to
which the timestamp register had been put. As a result, any time a warm
boot was performed after
PM, the monitor would freak out at the illegal timestamp and mysteriously
abort the boot with a memory fault. Until this bug was patched the only fix
was to power the computer down, thus clearing the offending value.
Luckily, the PM procedure set the timestamp register to an impossible value,
rather than a realistic but incorrect value. Therefore the problem manifested
itself in an obvious way, instead of subtly changing the date and time. Of
course this was at the cost of having to fix a hung system. This is yet
another illustration of the risk of breaking one thing while fixing another.
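
(A minimal sketch of that design point follows -- none of it is CP-6 code,
and the sanity bounds and register-read stub are made-up example values.
The idea is simply that the boot path should check the hardware-held time
against a plausible range and stop with a message naming the real problem,
rather than aborting with an unrelated memory fault or quietly adopting a
bad value.)

    /* Sketch only -- not CP-6 code.  read_hw_clock() stands in for the
     * real timestamp-register read and is stubbed to return the kind of
     * impossible value the PM diagnostics might leave behind; the sanity
     * bounds are arbitrary examples expressed as Unix seconds. */
    #include <stdio.h>
    #include <stdlib.h>

    #define EARLIEST_SANE  473385600L   /* 1 Jan 1985, example floor   */
    #define LATEST_SANE   1893456000L   /* 1 Jan 2030, example ceiling */

    static long read_hw_clock(void)     /* hypothetical stub */
    {
        return -1L;                     /* impossible value left by PM */
    }

    int main(void)
    {
        long t = read_hw_clock();

        if (t < EARLIEST_SANE || t > LATEST_SANE) {
            /* Fail loudly, naming the real problem, instead of crashing
             * elsewhere or silently running with a wrong date. */
            fprintf(stderr,
                    "boot: hardware clock value %ld is not a valid time;\n"
                    "boot: please set the date and time manually\n", t);
            return EXIT_FAILURE;
        }
        printf("boot: system time set from hardware clock: %ld\n", t);
        return EXIT_SUCCESS;
    }
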
/jc
