The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 2 Issue 29

Monday, 17 Mar 1986

Contents

o Commission vs. Omission
Martin J. Moore plus an example from Dave Parnas
o A Stitch in Time
Jagan Jagannathan
o Clockenspiel
Jim Horning
o Cordless phones
Chris Koenigsberg
o Money talks
Dirk Grunwald
date correction from Matthew Kruk
o [Non]computerized train wreck
Mark Brader
o On-line Safety Database
Ken Dymond
o Info on RISKS (comp.risks)

Commission vs. Omission

"MARTIN J. MOORE" <mooremj@eglin-vax>
0 0 00:00:00 CDT
Dave Parnas's points regarding the shuttle destruct system are well taken.
The policy, stated informally, was that "it better work if we need it --
but it absolutely better NOT 'work' when we DON'T need it" which generated the
extreme emphasis on preventing what Dave calls "risks of commission."  I feel
that the risk of commission on the destruct system is extremely small, while
the risk of omission is somewhat higher, although still small.  During
validation testing and in every pre-launch checkout, we performed "exhaustive"
checks -- "exhaustive" meaning that we tried every combination of
  [(2 central computers) * (6 remote sites) * (2 computers per site)
      * (2 transmitters per site) * (2 comm paths to each site)
      * (2 possible commands in various sequences)].
Yeah, this takes a *LONG* time (with practice, we got it down
to several hours if everything went smooth.)  On one occasion during
validation testing, we did find a software error which only manifested on a
particular (central computer/comm path/remote computer/unusual command
sequence) combination.  Exhaustive tests *are* necessary.

I have often wondered why the emphasis was to prevent errors of commission
over errors of omission (not to say that we wanted either kind, but errors
of commission were definitely considered to be worse!).  An erroneous
destruct would cost the lives of the flight crew, loss of the Orbiter, and
possibly damage on the ground if it occurred early in the flight (e.g.,
windows blown out, etc.)  An erroneous non-destruct, in the worst case (if
the ET were to detonate near the crowded spectator area on the NASA
causeway), could cause the loss of TENS OF THOUSANDS of lives.  Certainly
this is worse than an erroneous destruct.  I believe there may be a
subconscious feeling that an erroneous destruct means the difference between
a success and a disaster, while an erroneous non-destruct means the
difference between a disaster and a worse disaster.  Subjectively, that
difference is not as great as the first, although objectively it may be much
greater.
                                     Martin Moore

<The usual disclaimers.  I'm too tired to type in the whole silly thing.>

        [By the way, Dave Parnas suggested the following example to
         illustrate his message in RISKS-2.28:]

         "Consider elevators.  Consider how much easier it is to prevent the
         floor indicator from saying "13" than to assure that the floor
         indicator will always give the actual floor that the elevator is
         on.  The risk of indicating "13" can be gotten acceptably low by
         eliminating "13" from the set of indicator lights.  The risk of
         indicating an incorrect floor or not indicating the current floor
         is much harder to eliminate."  [Dave Parnas]


A Stitch in Time

<JAGAN@SRI-CSL.ARPA>
Mon 17 Mar 86 11:43:53-PST
   [As it now turns out, the reboot occurred just moments BEFORE I logged
    in Sunday night.  Here are some further details.  PGN]

This is the probable sequence of events that led us back in time on CSLA:
1. A power glitch (late night SUNDAY) caused the F4 to hard boot.
2. During a hard boot, the TIME is retrieved from eleven independent sources
   (which are assumed to be correct!)
3. One of these sources had the incorrect time of some warm day in 1972
   causing the average to be wrongly computed resulting in Dec 6th/1985.

Suggestion:
1. Change the statistical measure from MEAN to something less sensitive to
   one or two abnormal times; for example the average of the 5th, 6th, and 7th
   largest times.

       [IT IS ABSOLUTELY INCREDIBLE THAT UNSAFE ALGORITHMS continue to
        be used.  This problem is as old as the hills.  Statisticians
        routinely throw out the absurd values before computing the
        mean.  Dorothy Denning pointed out the pun in their terminology
        (applicable to Byzantine agreement algorithms, where you don't
        trust anyone): the OUT-LIERS are really the OUT-LIARS.

        EVEN WORSE, Jagan points out that if the clock had been accidentally
        set INTO THE FUTURE, things could also get very sticky.  We also
        have a problem of nonunique clock readings during the hour at 2AM when
        Daylight Savings Time ends.  A good time to be asleep.  PGN]

[Here is some more background.]

  Date: Mon 17 Mar 86 12:37:37-PST
  From: Mark Lottor <MKL@SRI-NIC.ARPA>
  Subject: [Louis A. Mamakos <louie@trantor.UMD.EDU>: time]
  To: Jagan@SRI-CSL.ARPA

  This was just to verify that the problem was on the
  remote system and not some local problem...
                ---------------

  Date: Mon, 17 Mar 86 15:31:14 EST
  From: Louis A. Mamakos <louie@trantor.UMD.EDU>
  To: MKL@sri-nic.ARPA
  In-Reply-To: Mark Lottor's message of Mon 17 Mar 86 11:34:41-PST
  Subject: time

  Yes, I can verify that it was indeed the clock (actually the host the clock
  was on) that was screwed up.  It it unfortunate that there is no way to get
  the current year out of the WWVB clock.  There was work being done in the
  computer room, which reset our LSI-11/73 host, which subsequently got
  confused.  Sorry about the problem.

  Louis A. Mamakos  WA3YMH    Internet: louie@TRANTOR.UMD.EDU
  University of Maryland, Computer Science Center - Systems Programming


Clockenspiel

Jim Horning <horning@decwrl.DEC.COM>
17 Mar 1986 1436-PST (Monday)
Your errant clock reminds me of something that happened at Stanford in
the mid-sixties. I was apparently one of the first users of the 360/67
to run a job that started on one day and finished on the next. When the
statement for my account arrived in the mail, I had quite a job
convincing my wife that the huge figure (to a graduate student couple)
was nothing to worry about: It was a CREDIT resulting from a job that
was charged for minus 23 hours and 58 minutes!

The Xerox Alto operating system had a compiled-in reasonableness check
on the date and time. When it started up, if the local clock wasn't
"reasonable," it sent a request over the Ethernet and put "Date and
Time Unknown" in the banner. Well, you guessed it: The day came when
the (correct) time from the time server was no longer "reasonable," and
therefore couldn't be corrected by appeal to the time server....

Jim H.


RISKS re: Cordless phones

<Chris.Koenigsberg@G.CS.CMU.EDU>
17 Mar 1986 11:48-EST
My roommate has a cordless phone and it goes on the blink every few weeks.
All the phones in the house stop working. When you pick any one up, all you
get is a very loud static sound and you can't dial out. I have learned that
I can fix this problem by sneaking in his room and unplugging the cradle for
his cordless phone. A visitor in the house was very frightened one night
when she was left alone and though someone had cut the phone lines or
something. It was the cordless phone on the blink again.

Chris Koenigsberg
ckk@g.cs.cmu.edu , or ckk%andrew@pt.cs.cmu.edu
{harvard,seismo,topaz,ucbvax}!g.cs.cmu.edu!ckk
(412)268-8526 office, (412)362-6422 home
Center for Design of Educational Computing
Carnegie-Mellon U.
Pgh, Pa. 15213


re: money talks

Dirk Grunwald <grunwald@b.CS.UIUC.EDU>
Mon, 17 Mar 86 16:44:15 CST
I read the 'money talks' article with great amusement. One of the risks to
society which is worth talking about is the risk of using inappropriate, or
downright silly, technology.

Talking money would appear to be such a waste of resources. Certainly some
other method of denomination descrimination could be devised for the
visually impared. Rasied lettering, coinage instead of paper money, different
sized paper money, different paper stock. But talking money?

dirk grunwald
university of illinois


Money Talks

<Matthew_Kruk%UBC.MAILNET@MIT-MULTICS.ARPA>
Mon, 17 Mar 86 09:07:32 PST
Correction to my previous message: The date of the article should be
March 14th (Friday).


[Non]computerized train wreck

<ihnp4!utzoo!lsuc!msb@seismo.CSS.GOV>
Mon, 17 Mar 86 19:04:03 EST
Me:
> Not only was this NOT a case of computer malfunction, but indeed, a more
> fully computerized system (with cab signalling and automatic train stopping)
> would probably have prevented the accident.

PGN:
> [... Thus, even though it appears NOT to be a computer problem,
> we discover that the computer could have done better!  But, of course,
> don't blame the computer system.  Blame the people who specified,
> designed, and implemented it -- not JUST the train operator(s).  PGN]

You sound more critical than I meant to be.  The cost of equipping all major
railways with cab signalling and the like would be considerable, to say the
least.  While such installations certainly do exist, especially on busy
high-speed lines, the "centralized traffic control" in use on the route in
question is probably much more common.  Are you calling on all railways to
upgrade their signaling systems long before they are life-expired, every
time something somewhat better comes along?

Mark Brader  (ihnp4!utzoo!lsuc!msb and ...!dciem!msb are both me.)

   [One would hope that new improvements do not always require everything
    to be thrown out.  Long ago we discovered the advantages of software
    solutions over hardware solutions.  But when human lives are at stake,
    safer systems may be worth the price of upgrading equipment.  I think
    that the incredible escalation of law-suit awards and of rates for
    malpractice and liability insurance may provide some new incentives.  PGN]


On-line Safety Database

"DYMOND, KEN" <dymond@nbs-vms.ARPA>
17 Mar 86 15:14:00 EST
Our Library Bulletin (and as a frequent user I'd have to say that the NBS
has one of the best technical libraries going) for February contained a
notice that Pergamon Infoline (evidently a supplier of such services) is
offering a new online database service, SAFETY: "SAFETY, produced by
Cambridge Scientific Abstracts, provides broad interdisciplinary
coverage of safety, including industrial, transportation, environmental,
and medical safety.  This database indexes journals, books, reports,
patents, and proceedings published in 1981 or later."  If someone on
the list uses this database, please let us know how well it covers
computer and software safety.

Ken Dymond
National Bureau of Standards

Please report problems with the web pages to the maintainer

Top