The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 3 Issue 57

Tuesday, 16 September 1986

Contents

o Computers and the Stock Market (again)
Robert Stroud
o The Old Saw about Computers and TMI
Ken Dymond
o Do More Faults Mean (Yet) More Faults?
Dave Benson
o A critical real-time application worked the first time
Dave Benson
o Autonomous weapons
Eugene Miya
o "Unreasonable behavior" and software
Eugene Miya on Gary Chapman
o Risks of maintaining computer timestamps revisited
John Coughlin
o Info on RISKS (comp.risks)

Computers and the Stock Market (again)

Robert Stroud <robert%cheviot.newcastle.ac.uk@Cs.Ucl.AC.UK>
Mon, 15 Sep 86 16:53:37 gmt
The computers had a hand in the dramatic fall on Wall Street last week
according to an item on the BBC TV news. Apparently, the systems were
not designed to cope with the sheer volume of sales, (anybody know more
about this?). The report continued

    "In London they still do it the old fashioned way with bits
    of paper, which makes people think twice before joining in
    a mindless selling spree. However, all this could change in
    October with the Big Bang..."

What price progress?

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne.

ARPA robert%cheviot.newcastle@ucl-cs.ARPA
UUCP ...!ukc!cheviot!robert


The Old Saw about Computers and TMI

"DYMOND, KEN" <dymond@nbs-vms.ARPA>
16 Sep 86 09:25:00 EDT
Ihor Kinal says in RISKS-3.55

    >Obviously, one can present arguments for each side [human
    > vs computer having the last say -- at TMI, computers
    >were right, but ...]   I would say that if humans do
    >override CRITICAL computer control [like TMI], then
    >some means of escalating the attention level must be
    >invoked [e.g., have the computers automatically notify
    >the NRC].

This belief keeps surfacing but is false.  There was no computer
control in safety grade systems at TMI -- see the documentation in
the Kemeny report and probably elsewhere.  There was a computer in
the control room but it only drove a printer to provide a hardcopy
log of alarms in the sequence in which they occurred.  The log is
an aid in diagnosing events.  The computer (a Bendix G-15 ??) did 
play a role in the emergency since at one point its buffer became 
full and something like 90 minutes of alarms were not recorded, thus
hampering diagnosis.  

On a couple of occasions I have asked NRC people why computers aren't
used to control critical plant systems and have been told that "they aren't
safety grade."  I'm not quite sure what this means, but I take it
to mean that computers (and software) aren't trustworthy enough for
such safety areas as the reactor protection system.  This is not to
say that computers aren't used in monitoring plant status, quite
different from control.

Ken Dymond
(the opinions above don't necessarily reflect those of my employer
or anybody else, for that matter.)


Do More Faults Mean (Yet) More Faults?

Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
Sun, 14 Sep 86 19:00:30 pdt
  |In RISKS 3.50 Dave Benson comments in "Flight Simulator
  |Simulators Have Faults" that
  |
  |    >We need to understand that the more faults found at
  |    >any stage to engineering software the less confidence one has in the
  |    >final product.  The more faults found, the higher the likelyhood that
  |    >faults remain.
  |
  |This statement makes intuitive sense, but does anyone know of any data
  |to support this ?  Is this true of any models of software failures ?
  |Is this true of the products in any of the hard engineering fields -- civil,
  |mechanical, naval, etc. -- and do those fields have the confirming data ?
  |
  |Ken Dymond, NBS

Please read the compendium of (highly readable) papers by M.M.Lehman and
L.A.Belady, Program Evolution: Processes of Software Change, APIC Studies
in Data Processing No. 27, Academic Press, Orlando, 1985.  This provides data.
It is (sorry-- should be, but probably isn't) standard in software quality
assurance efforts to throw away modules which show a high proportion of
the total evidenced failures.  The (valid, in my opinion) assumption is
that the engineering on these is so poor that it is hopeless to continue
to try to patch it up.

Certain models of software failure place increased "reliablity" on software
which has been exercised for long periods without fault.  One must
understand that this is simply formal modelling of the intuition that
some faults means (yet) more faults.  This is certainly true of all
engineering fields.  While I don't have the "confirming data" I suggest you
consider your car, your friends car, etc.  Any good history of engineering
will suggest that many designs never are marketed because of an unending
sequence of irremediable faults.

The intuitive explaination is: Good design and careful implementation works.
This is teleological.  We define good design and careful implementation by
"that which works".

However, I carefully said "confidence".  Confidence is an intuitive
assessment of reliability.  I was not considering the formalized notion
of "confidence interval" used in statistical studies.  To obtain high
confidence in the number of faults requires observing very many errors,
thus lowering one's confidence in the product.  To obtain high confidence
in a product requires observing very few errors while using it.


I found one! (A critical real-time application worked the first time)

Dave Benson <benson%wsu.csnet@CSNET-RELAY.ARPA>
Sun, 14 Sep 86 22:40:21 pdt
Last spring I issued a call for hard data to refute a hypothesis which I,
perhaps mistakenly, called the Parnas Hypothesis:
    No large computer software has ever worked the first time.
Actually, I was only interested in military software, so let me repost the
challenge in the form I am most interested in:
    NO MILITARY SOFTWARE (large or small) HAS EVER WORKED IN ITS FIRST
    OPERATIONAL TEST OR ITS FIRST ACTUAL BATTLE.
Contradict me if you can. (Send citations to the open literature
to benson@wsu via csnet)

Last spring's request for data has finally led to the following paper:
    Bonnie A. Claussen, II
    VIKING '75 -- THE DEVELOPMENT OF A RELIABLE FLIGHT PROGRAM
    Proc. IEEE COMPSAC 77 (Computer Software & Applications Conference)
    IEEE Computer Society, 1977
    pp. 33-37

I offer some quotations for your delictation:

    The 1976 landings of Viking 1 and Viking 2 upon the surface of
    Mars represented a significant achievement in the United States
    space exploration program. ... The unprecented success of the Viking
    mission was due in part to the ability of the flight software
    to operate in an autonomous and error free manner. ...
    Upon separation from the Oribiter the Viking Lander, under autonomous
    software control, deorbits, enters the Martian atmosphere,
    and performs a soft landing on the surface. ... Once upon the surface,
    ... the computer and its flight software provide the means by
    which the Lander is controlled.  This control is semi-autonomous
    in the sense that Flight Operations can only command the Lander
    once a day at 4 bit/sec rate.

(Progress occured in a NASA contract over a decade ago, in that)

    In the initial stages of the Viking flight program development,
    the decision was made to test the flight algorithms and determine
    the timing, sizing and accuracy requirements that should be 
    levied upon the flight computer prior to computer procurement.
    ... The entire philosophy of the computer hardware and
    software reliability was to "keep it simple."  Using the
    philosophy of simplification, modules and tasks tend toward 
    straight line code with minium decisions and minimum
    interactions with other modules.

(It was lots of work, as)

    When questioning the magnitude of the qulity assurance task,
    it should be noted that the Viking Lander flight program development
    required approximately 135 man-years to complete.

(But the paper gives no quantitative data about program size or complexity.)

Nevertheless, we may judge this as one of the finest software engineering
acomplishments to date.  The engineers on this project deserve far more
plaudits than they've received.  I know of no similar piece of software
with so much riding upon its reliable behavior which has done so well.
(If you do, please do tell me about it.)

However, one estimates that this program is on the order of kilolines of FORTRAN
and assembly code, probably less than one hundred kilolines.  Thus
Parnas will need to judge for himself whether or not the Viking Lander
flight software causes him to abandon (what I take to be) his hypothesis
about programs not working the first time.

It doesn't cause me to abandon mine because there were no Martians shooting
back, as far as we know...

David B. Benson, Computer Science Department, Washington State University,
Pullman, WA 99164-1210  csnet: benson@wsu


Autonomous weapons

<LIN@XX.LCS.MIT.EDU>
Tue, 16 Sep 1986 08:31 EDT

    From: eugene at AMES-NAS.ARPA (Eugene Miya)

    ... another poster brought up the issue of autonmous weapons.
    We had a discussion of of this at the last Palo Alto CPSR meeting.
    Are autonmous weapons moral?  If an enemy has a white flag or hand-ups,
    is the weapon "smart enough" to know the Geneva Convention (or is too
    moral for programmers of such systems)?

What do you consider an autonomous weapon?  Some anti-tank devices are
intended to recognize tanks and then attack them without human
intervention after they have been launched (so-called fire-and-forget
weapons).  But they still must be fired under human control.  *People*
are supposed to recognize white flags and surrendering soldiers.


"Unreasonable behavior" and software

<LIN@XX.LCS.MIT.EDU>
Tue, 16 Sep 1986 09:01 EDT
    From: Gary Chapman 

Risks of maintaining computer timestamps revisited

John Coughlin <JC%CARLETON.BITNET@WISCVM.WISC.EDU>
15 Sep 86 12:14:00 EDT

Some  time ago I  submitted an item  to RISKS describing  the way in which the
CP-6 operating system  requires the time to be set  manually during every warm
or cold boot.  The latest release  of this OS contains an improvement: in most
cases the time need only be  manually set on a cold boot.  Unfortunately, with
this enhancement came an unusual bug.

The timestamp is  stored in a special hardware register,  which is modified by
certain  diagnostic procedures  run during  preventive maintenance.   It seems
these diagnostic  procedures were not modified  to reflect the new  use put to
the timestamp register.  As a result, any time a warm boot was performed after
PM,  the monitor  would freak  out at  the illegal  timestamp and mysteriously
abort the boot  with a memory fault.  Until this bug  was patched the only fix
was to power the computer down, thus clearing the offending value.

Luckily, the  PM procedure set the timestamp  register to an impossible value,
rather than a realistic but incorrect value.  Therefore the problem manifested
itself in  an obvious way, instead  of subtly changing the  date and time.  Of
course  this was  at the  cost of  having to  fix a  hung system.  This is yet
another illustration of the risk of breaking one thing while fixing another.

                                                                     /jc

Please report problems with the web pages to the maintainer

Top