The Risks Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 3 Issue 25

Thursday, 24 July 1986

Contents

o Petroski on the Comet failures
Alan Wexelblat
o Re: Comet and Electra
Douglas Adams
o On the dangers of human error
Brian Randell via Lindsay Marshall
o Software Paranoia
Ken Laws
o Royal Wedding Risks
Lindsay Marshall
o How to Think Creatively
John Mackin
o Dangers of improperly protected equipment
Kevin Belles
o Info on RISKS (comp.risks)

Petroski on the Comet failures

Alan Wexelblat <wex@mcc.com>
Thu, 24 Jul 86 12:02:41 CDT
Henry Petroski's book _To Engineer is Human_ has a segment discussing
the Comet crashes and the detective work done to figure out why they
occurred (pages 176-184).  The story he tells makes no mention of curved
or rounded window corners.  The highlights:

 - On May 2, 1953, a de Havilland Comet was destroyed on takeoff
   from Dum-Dum Airport in Calcutta, India.  The Indian Government
   Board of Inquiry concluded officially that the accident was caused
   by some sort of structural failure either due to a tropical storm
   or to pilot overreaction to storm conditions.

 - The Comet was flown "off the drawing board"; no prototypes were ever
   built or tested.

 - On January 10, 1954, a Comet exploded after takeoff from Rome under
   mild weather conditions.  The plane was at 27,000 feet so the debris
   fell into a large area of the Mediterranean.  Not enough was recovered
   to allow any conclusion on why the crash had occurred.

 - On April 8, 1954, another flight leaving Rome exploded.  The pieces from
   this one fell into water too deep to allow recovery, so more pieces from
   the previous crash were sought and found.

 - Investigators eventually found the tail section which provided conclusive
   evidence that the forward section had exploded backward.  The print from
   a newspaper page was seared into the tail so strongly that it was still
   legible after months in the Mediterranean.

 - The question now was WHY did the cabin explode?  The reason was found only
   by taking an actual Comet, submerging it in a tank of water and simulating
   flight conditions (by pressurizing and depressurizing the cabin and by
   hydraulically simulating flight stresses on the wings).

 - After about 3000 simulated flights, a crack appeared at a corner of one
   cabin window which rapidly spread (when the cabin was pressurized) and the
   cabin blew apart.

 - Analysis finally showed that rivet holes near the window openings in the
   fuselage caused excessive stress.  The whole length of the window panel
   was replaced in the later Comet 4 with a new panel that contained special
   reinforcement around the window openings.

Although Petroski doesn't give his sources directly, much of his material
appears to be drawn from the autobiography of Sir Geoffrey de Havilland
(called _Sky Fever: The Autobiography_, published in London in 1961) and from
a book called _The Tale of the Comet_ written by Derek Dempster in 1958.

In general, I recommend Petroski's book; it's quite readable and has lots of
material that would be interesting to us RISKS readers.  Of particular
interest is the chapter called "From Slide Rule to Computer: Forgetting How
it Used to be Done."  It's an interesting (if superficial) treatment of
some of the risks of CAD.

Alan Wexelblat
ARPA: WEX@MCC.ARPA
UUCP: {ihnp4, seismo, harvard, gatech, pyramid}!ut-sally!im4u!milano!wex

Currently recruiting for the `sod squad.'


Re: Comet and Electra

Adams Douglas <crash!pnet01!adamsd@nosc.ARPA>
Thu, 24 Jul 86 07:43:49 PDT
It was my understanding that the problem with the early Electras was whirl-mode
flexing of the outboard half of the wing. I had heard that Lockheed reassigned
its few then-existing computers to full-time research on the problem. But it
was also my understanding that the original design cycle for the Electra did not
involve computer assistance at all--they weren't being used for aircraft
"simulation" that early (1948?).


On the dangers of human error [contributed on behalf of Brian Randell]

"Lindsay F. Marshall" <lindsay%kelpie.newcastle.ac.uk@Cs.Ucl.AC.UK>
Thu, 24 Jul 86 11:28:28 bst
[From brian Fri Jul 18 17:30 GMT 1986]
The following article appeared in the Guardian newspaper (published in
London and Manchester) for Wed. July 16. The author, Mary Midgley, is,
incidentally, a former lecturer in Philosophy at the University of Newcastle
upon Tyne. Brian R. was so pleased to see such a sensible discussion in a daily
newspaper of the dangers of human error that he thought it worth passing on
to the RISKS readership, so here it is.....

  IDIOT PROOF

  Little did I know, when I wrote my last article about human error, that the
  matter was about to receive so much expensive and high-powered attention. 
  Since Chernobyl, it has been hard to turn on television without receiving 
  more official reassurance that accidents do not happen here.  Leading the 
  chorus, the chairman of the Central Electricity Generating Board came on 
  the air to explain that, in British nuclear reactors, human error has been 
  programmed out entirely.  Other equally impressive testimonies followed.  
  Even on these soothing occasions, however, disturbing noises were sometimes 
  heard.  During one soporific film, an expert on such accidents observed that 
  human error is indeed rather hard to anticipate, and told the following 
  story.

  A surprising series of faults occurred at a newly-built nuclear power 
  station, and were finally traced to failure in the cables.  On
  investigation, some of these proved to have corroded at an extraordinary
  rate, and the corroding substance turned out to be a rather unusual one,
  namely human urine.  Evidently the workmen putting up the power-station
  had needed relief, and had found the convenient concrete channels in the
  concrete walls they were building irresistibly inviting.  Telling the
  tale, the chap reasonably remarked that you cannot hope to anticipate this
  kind of thing - infinitely variable human idiocy is a fact of life, and
  you can only do your best to provide against the forms of it that happen
  already to have occurred to you.

  This honest position, which excluded all possible talk of programming it
  out, is the one commonly held by working engineers.

  They know by hard experience that if a thing can go wrong it will, and that
  there are always more of these things in store than anybody can possibly have
  thought of.  (Typically, two or three small things go wrong at once, which is
  all that is needed).  But the important thing which does not seem to have
  been widely realised is that hi-tech makes this situation worse, not better.

  Hi-tech concentrates power.  This means that a single fault, if it does
  occur, can be much more disastrous.  This gloomy truth goes for human as well
  as mechanical ones.  Dropping a hammer at home does not much matter; dropping
  it into the core of a reactor does.  People have not been eliminated.  They 
  still figure everywhere - perhaps most obviously as the maintenance-crews who
  seem to have done the job at Chernobyl, but also as designers, sellers and 
  buyers, repairers, operators of whatever processes are still human-handled, 
  suppliers of materials, and administrators responsible for ordering and 
  supervising the grand machines.

  What follows?  Not, of course, that we have to stop using machines, but that 
  we have to stop deceiving ourselves about them.  This self-deception is 
  always grossest over relatively new technology.  The romanticism typical of 
  our century is altogether at its most uncontrolled over novelties.  We are as
  besotted with new things as some civilisations are with old ones.

  This is specially unfortunate about machines, because with them the gap
  between theory and practice is particularly stark.  Only long and painful
  experience of actual disasters - such as we have for instance in the case
  of the railways - can ever begin to bridge it.  Until that day, all
  estimates of the probability of particular failures are arbitrary guesses.

  What this means is that those who put forward new technology always
  underestimate its costs, because they leave out this unpredictable extra
  load.  Over nuclear power, this is bad enough, first, because its single
  disasters can be so vast - far vaster than Chernobyl - and second, because
  human carelessness has launched it before solving the problem of nuclear
  waste.

  Nuclear weapons, however, differ from power in being things with no actual
  use at all.  They exist, we are assured, merely as gestures.  But if they
  went off, they would go off for real.  And there have been plenty of
  accidents involving them.  Since Chernobyl and Libya, people seem to be
  noticing these things. Collecting votes lately for my local poll on the
  Nuclear Freeze project, I was surprised how many householders said at
  once: "My God, yes, let's get rid of the things."  This seems like sense.
  Could it happen here?  Couldn't it? People are only people.  Ooops - sorry...


Software Paranoia

Ken Laws <Laws@SRI-STRIPE.ARPA>
Thu 24 Jul 86 17:40:04-PDT
  From: Bard Bloom <BARD@XX.LCS.MIT.EDU>
  The VAX's software generated an error about this.  The IBM
  did not; and the programmers hadn't realized that it might be a problem (I
  guess).  They had been using that program, gleefully taking sines of random
  numbers and using them to build planes, for a decade or two.

Let's not jump to conclusions.  Taking the sine of 10^20 is obviously bogus,
but numbers of that magnitude usually come from (or produce) other bogus
conditions.  The program may well have included a test for an associated
condition >>after<< taking the sine, instead of recognizing the situation
>>before<< taking the sine.  Poor programming practice, but not serious.

A major failing of current programming languages is that they do not force
the programmer to test the validity of all input data (including returned
function values) and the success of all subroutine calls.  Debugging would
be much easier if errors were always caught as soon as they occur.  The
overhead of such error checking has been unacceptable, but the new hardware
is so much faster that we should consider building validity tests into the
silicon.  The required conditions on a return value (or the error-handling
subroutine) would be specified as a parameter of every function call.

I tend to write object-oriented subroutines (in C) that return complex
structures derived from user interaction or other "knowledge-based"
transactions.  Nearly every subroutine call must be followed by a test
to make sure that the structure was indeed returned.  (Testing for valid
substructure is impractical, so I use NULL returns whenever a subroutine
cannot construct an object that is at least minimally valid.)  All these
tests are a pain, and I sometimes wish I had PL/I ON conditions to hide
them.  Unfortunately, that's a bad solution: an intelligent program must
handle error returns intelligently, and that means the programmer should
be forced to consider every possible return condition and specify what
to do with it.
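
A small sketch of the NULL-return convention described above, under
invented assumptions: the person structure, its validity rules, and the
function names are all hypothetical and stand in for whatever structure a
real subroutine would build.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object derived from user input. */
typedef struct {
    char *name;
    int   age;
} person;

/* Returns a heap-allocated person, or NULL if the input cannot yield a
 * minimally valid object (missing or empty name, implausible age) or
 * if memory runs out.  The validity rules are illustrative only. */
person *person_create(const char *name, int age)
{
    if (name == NULL || name[0] == '\0' || age < 0 || age > 150)
        return NULL;

    person *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;

    size_t len = strlen(name) + 1;
    p->name = malloc(len);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    memcpy(p->name, name, len);
    p->age = age;
    return p;
}

/* Accessor: callers must already have tested the pointer for NULL. */
const char *person_name(const person *p)
{
    assert(p != NULL);
    return p->name;
}

/* Releases a person; passing NULL is harmless. */
void person_destroy(person *p)
{
    if (p != NULL) {
        free(p->name);
        free(p);
    }
}
```

Every call to person_create then needs the test the text complains about:
if the result is NULL, the caller must decide what to do before touching
the structure.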

Errors that arise within the error handlers are similarly important, but
beyond my ability to even contemplate in the context of current languages.

Expert systems (e.g., production systems) often aid rapid prototyping by
ignoring unexpected situations -- the rules trigger only on conditions
that the programmer anticipated and knew how to handle.  New rules are
added whenever significant misbehavior is noticed, but there may be
no attempt to handle even the full range of legal conditions intelligently
-- let alone all the illegal conditions that can arise from user, database,
algorithm, or hardware errors.  I like expert systems, but from a Risks
standpoint I have to consider them at least an order of magnitude more
dangerous than Ada software.
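
To make the concern concrete, here is a toy rule table in C, not a real
production system.  The temperature example, the rule set, and every name
are hypothetical.  The point of the sketch is the last line of decide():
when no rule matches, the situation is reported rather than silently
ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reading from some monitored device. */
typedef struct {
    double temperature_c;
} reading;

typedef enum {
    ACTION_NONE,
    ACTION_COOL,
    ACTION_HEAT,
    ACTION_ESCALATE
} action;

typedef struct {
    int    (*matches)(const reading *); /* condition part of the rule */
    action   then;                      /* action part of the rule    */
} rule;

static int too_hot(const reading *r)  { return r->temperature_c > 30.0; }
static int too_cold(const reading *r) { return r->temperature_c < 10.0; }
static int in_range(const reading *r)
{
    return r->temperature_c >= 10.0 && r->temperature_c <= 30.0;
}

/* The conditions the programmer anticipated. */
static const rule rules[] = {
    { too_hot,  ACTION_COOL },
    { too_cold, ACTION_HEAT },
    { in_range, ACTION_NONE },
};

/* Fires the first matching rule.  If none matches (for example, a NaN
 * from a failed sensor compares false against everything), the case
 * is escalated instead of being dropped. */
action decide(const reading *r)
{
    assert(r != NULL);

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (rules[i].matches(r))
            return rules[i].then;
    }
    return ACTION_ESCALATE;
}
```

A prototype that simply returned ACTION_NONE at the end would behave
identically on every anticipated input and would hide exactly the cases
that matter from a Risks standpoint.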
                    -- Ken Laws


Royal Wedding Risks

"Lindsay F. Marshall" <lindsay%cheviot.newcastle.ac.uk@Cs.Ucl.AC.UK>
Thu, 24 Jul 86 13:46:31 gmt
Yesterday (23rd) we lost all power to our machine room when a circuit
breaker blew.  The cause of this was a glitch which hit us at about
13:50.  This was approximately the time that the main Royal Wedding
television coverage stopped............ 


How to Think Creatively

<munnari!basser.oz!john@seismo.CSS.GOV>
Thu, 24 Jul 86 18:21:08 EST
Recent comments in Risks about ``computer literacy'' led Herb Lin
to comment that:

> The problem is ultimately related to clear thinking, and how to teach
> people to do THAT.

This reminded me of some mail I received last year, from a staff member
here who was teaching a first-year course on data structures.  His mail,
which was sent to a number of us here, was a plea for assistance as to
the right way to respond to some mail he had received from one of his
students.  The student's mail said:

> Dear Jason,... You have really done a great job on IDS. It really helped to
> clear a lot of lingering doubts Lent term left behind.  Thanks a lot
> again.  Could you advise on how to think creatively. I can't "see" a
> program naturally and think deep enough to make the required alterations...

None of us really knew how to answer that.

John Mackin, Basser Department of Computer Science,
         University of Sydney, Sydney, Australia

john%basser.oz@SEISMO.CSS.GOV
{seismo,hplabs,mcvax,ukc,nttlab}!munnari!basser.oz!john


Dangers of improperly protected equipment

Kevin Belles <crash!pnet01!kevinb@nosc.ARPA>
Thu, 24 Jul 86 01:08:50 PDT
  Is there any device or devices that protect not only the active lines 
but the ground lines as well from surge, spike, and EMI-type disturbance? 
My system appears to have been victimized, thanks to our local electric
utility, by the ground for my apartment complex being raised, which caused
damage to all the grounded equipment on my home computer
system, save some cards apparently protected by my boat-anchor power supply,
and the fact that each card in my cage is independently regulated.  In my
case, the surge entered the ground and apparently corrupted my main floppy
drive supply to the point where it propagated along the 8" and 5 1/4"
cables, destroying the logic boards on all drives and the dynamic memory,
which was being accessed at that time. It also managed to get my printer, on
another leg entirely, while miraculously missing my terminal and modem. This
completely bypassed the fuses and only a trace on the controller board being
opened saved the rest of my system from being damaged. Result: 3 dead DSDD 8"
drives, 1 dead SSDD 5 1/4" drive, 3 drive power supplies, 1 dot-matrix
printer, 1 64K DRAM board, and a floppy controller board. Dollar cost:
estimated minimum of over $2000.00 if equipment is replaced by new, with no
cost for loss of access being figured in.

Let this be a warning: Protect your equipment! Any investment in anti-surge
equipment, anti-spike equipment, and UPSs is an investment in your computing
future.

Kevin J. Belles - UUCP {sdcsvax,noscvax,ihnp4,akgua}!crash!pnet01!kevinb

(Disclaimer: Anything I may say is my opinion, and does not reflect
            the company I keep. KjB)
