The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 9 Issue 72

Wednesday 28 February 1990

Contents

o Clients cross about crossed wires
David Sherman
o 100-year-old can drive four years without test
David Sherman
o Some comments on the Airbus
Robert Dorsett
Martyn Thomas
o Re: Problems/risks due to programming language
Bruce Hamilton
o Comments on programmer error
Geoffrey Welsh
o "Goto considered harmful" considered harmful
Brad Templeton
o lockd
Caveh Jalali
o Re: Railroad interlocking systems
J.A.Hunter via Brian Randell
o Info on RISKS (comp.risks)

Clients cross about crossed wires

David Sherman <dave@lsuc.on.ca>
Mon, 26 Feb 90 16:47:29 EST
Toronto Star, February 25, 1990:

BROCKVILLE (CP) - Ma Bell got a wrong number last month -- 17,000 times.  That
is the number of long-distance calls incorrectly charged to telephone numbers
in Iroquois, Ont.  A Bell Canada official has described the mistake, blamed on
computer error, as the company's biggest foul-up in years.

During the billing period from Jan. 12 to Feb. 16, calls made by residents of
Athens, Ont. were charged to customers in the Iroquois exchange.  [...]
I"ve never seen anything like this in my seven years as area manager," said Ron
Kristiansen, customer services manager for Bell's Brockville, Kingston and
Belleville exchange areas.  Some Iroquois customers reports they had been
charged for up to 40 long-distance calls they had not made, more than doubling
their monthly phone bills.


100-year-old can drive four years without test

David Sherman <dave@lsuc.on.ca>
Mon, 26 Feb 90 07:55:31 EST
Toronto Star, February 26, 1990:

    "I think it's stupid -- the whole damn thing," grumbled Charles
Narraway.  The Nepean, Ont. [suburb of Ottawa - DS] resident will be 100 years
old in March, and that seems to have turned up a bug in the transportation
ministry's computer.  For 20 years, Narraway has had to take an annual road
test o get his driver's license renewed.  And for 20 years he passed it the
first time, every time.  But this year he got a license, good for four years,
without a test.     Narraway's driver's license shows his date of birth as
90-03-04, and he figures the computers have tacked the wrong century on to the
front....
          [Some of you will recall the story of the man whose insurance rates
          tripled (high-risk) when he turned 101.  The computer truncated... PGN]


Some comments on the Airbus

Robert Dorsett <rdd@walt.cc.utexas.edu>
Mon, 26 Feb 90 23:28:54 -0600
Dave Morton wrote in RISKS-9.65:

> We talked about the
> Airbus crash and it seems that in pilot circles it was attributed to the
> fact that the older Airbus machines had Rolls-Royce engines. It seems that
> when given full power they reacted about 3 seconds faster that those on
> The A320. In his opinion these three seconds were vital.

However, it's a totally different category of engine (CFM-56 vs. large-fan).
Different airplanes require different approaches, and "type conversion"
training is usually very thorough.  The pilots in question also had 300-odd
hours in type (each), 10,000+ hours of experience (each, distributed among
different aircraft types), and the captain (who was flying) was the chief of
A320 training for Air France.  What's noteworthy about the Mulhouse-Habsheim
crash was the lack of formal cockpit procedures and the lack of flight crew
experience in the type of maneuver performed.  It was almost definitely pilot
error.

>a role in his judgement. On a side note the pilot also mentioned that the
>757, which also has the fly by wire system, sometimes "hangs". The only
>way to get the electronics to function again is to power cycle the lot.

That may be so, but it's not possible to extend the lesson to the A320, since
the design of the systems are totally different (the fundamental *design*
philosophy as regards pilot access to the electrical system is different, too).
And as a small note, the flight augmentation system of the 757 is in no way
comparable to the A320's fly-by-wire system (or its associated software
protections).  The former is dedicated to damping aerodynamic instability;
the former introduces new *control laws* and appears to be intended to work
around problems involved in the use of side-stick controllers.

George Michaelson wrote in RISKS-9.69:

>I find it hard to raise any possible risks in technology transfer to developing
>countries (does that label apply to India?) given the overtones of chauvinism
>if not downright racism

This sentiment bugged me--light-footing it around safety-critical issues
because they might reflect poorly of the policies of developing nations.
Bugged me enough to write a rather long-winded response (which I'll be glad
to email to anyone who's interested in my ramblings).  Suffice it to say that
maintenance *is* a critical issue among all airlines operating new equipment,
and that the situation is *particularly* serious in developing nations.

Udo Voges wrote in RISKS-9.70:

>have decided to require an extended basic training in India.  The pilot S. S.
>Gopujkar was probably acting as an instructor during the accident flight on
>last Wednesday (14 Feb 90 uv), despite the fact that he didn't have the
>required qualification. This would explain, why he asked the control tower to
>make a manual landing on sight.

The problems in this account are:

1.  No airline that I'm aware of performs simulated in-flight systems failures
on revenue flights.

2.  Airbus proper regards "manual" (electric horizontal stabilizer trim and
manual rudder) backup as a last-ditch emergency system.  Numerous reports
indicate that its operation isn't even part of the standard Airbus training
curriculum.  It's not something that a marginal pilot is likely to be playing
with.  Marginal pilots tend to hide *within* automation, to become systems
managers, rather than pilots.

3.  It's highly unusual for an aircraft to consult with an air traffic con-
trol facility on the operation of on on-board system (unless, of course,
the pilot expected to crash).

The account sounds highly suspect.  Upon examination, various terms could
be examined--perhaps "manual" means hand-flying it (with protections)
to the ground, rather than tracking an ILS, and the tower wished to confirm
that the ILS wouldn't be used or something.  I don't know.  But note that if
this IS the case, the A320's "safety" features should prevent precisely this
sort of crash--the pilot should be able to fly to the surface, jerk
back on the stick, do the most violent maneuvers, etc--all without crashing
the airplane (that is, above 100', beneath which the protections disappear).

> So again it is - only - human  malfunctioning (insufficient training,
> bad management procedures, financial reasons ?).   Udo Voges

But then the question is: assuming that the systems ARE working properly, when
does it stop being "human error" and become poor ergonomics?  RISKS has
traditionally concentrated on software or hardware reliability issues, which
are in themselves important issues.  But don't forget that the very way the
pilot INTERACTS with the aircraft is also a safety issue, and the A320 has
thrown the lessons painfully learned over the past 80 years right out the
window.

Indeed, on a more general scale, the A320 is only the most extreme example
of a disturbing trend.  If one studies the different cockpit designs of
airplanes introduced since 1981 (all of which use different display formats on
CRT-based flight and engine instrumentation), one starts to recall the absolute
lack of standardization that was rampant during the 1920's, 30's, and 40's.
When one considers the problems this will cause in training, and factors in
pilot gullibility (perhaps believing what the manufacturer says about the
safety features of the airplane--I find it quite amusing that the pilot op-
ponents to excessive automation or fly-by-wire are being branded with '50's-
mentality "anti-progress" labels), I suggest that we can probably expect the
same sort of safety records.


A320 sidesticks

Martyn Thomas <mct@praxis.UUCP>
Tue, 27 Feb 90 16:51:23 BST
The A320 sidestick layout is asymmetric - the captain flies with the left hand,
the first officer with the right. This means that when the first officer is
flying in the left-hand seat - as will happen from time-to-time, e.g., for
training - there is the added unfamiliarity of using the other hand.

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.

     [It is pessimal when the captain is right-handed and the first officer
     is left-handed and *both* are flying with the *wrong* hand.  But the
     switching back and forth must undoubtably be confusing.  PGN]


Re: Problems/risks due to programming language

Bruce Hamilton <BHamilton.osbuSouth@Xerox.COM>
26 Feb 90 17:48:40 PST (Monday)
I'm incredulous at the number of replies seeming to say, "No language
prohibits all errors, so we might as well use "C"".

Some 25 years ago, PL/I had named scopes.  You can say something like

Foo: BEGIN
...
  EXIT Foo;
...
  END Foo;

and the EXIT has a clearly-defined scope, irrespective of how many
intermediate levels of nesting there are.

--Bruce   213/333-8075


Comments on programmer error by Clifford Johnson & others

Wed, 28 Feb 90 11:46:17 EST
   In RISKS 9.71, there seems to be a general discussion of programmer
error and its inevitability. I think that the entire discussion can be
summarized as follows (sorry, I don't know who originally wrote this):

     The tendency to err which programmers have been noticed
     to share with other human beings has often been treated as
     if it were an awkwardness attendant upon programming's
     adolescence, which (like acne) would disappear with the
     craft's coming of age. It has proved otherwise.

   For years I avoided `C' because I was literally intimidated by the ease
with which one can confuse oneself, such as:

#define square(x) ( x * x )
[...]
b = square( a++ );

   Eventually a wise friend pointed out that no language can hope to
second-guess your intentions and point out possible errors in logic, and
that few programmers could maintain their sanity near a compiler that did
so. Since then I've developed a few rules of thumb, such as not modifying
the contents of more than one variable per line wherever possible (a
natural outgrowth of having done some assembler), and I have been able to
put out occasional `C' code that is neither more nor less buggy on average
than what I write in any other language - though that doesn't say much!

Geoff Welsh, 602-66 Mooregate Crescent, Kitchener, Ontario N2M 5E6 CANADA


"Goto considered harmful" considered harmful

Brad Templeton <brad@looking.on.ca>
Wed, 28 Feb 90 14:42:03 EST
This is a most unusual RISK.  That famous paper put the poor GOTO so out of
vogue that programmers are actually afraid of using it -- they will make their
programs more complex and harder to understand in order to avoid using the
dreaded goto.

They do this out of fear of rejection.  Some great programmer will read
their code, see the "goto" and go tsk-tsk.  Horrors.

Because C has no multi level break or continue, the use of a goto to do loop
exception handling detracts in no way from the "structuredness" of the program.

But the fear of the stigma of goto has lead programmers to bugs, it seems.
Yet another example of a RISK due to fear of RISKS.

Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473


lockd

Caveh Jalali <caveh@csl.sri.com>
Mon, 26 Feb 90 13:34:05 -0800
To lock is to block!  sometimes.

Due to a bug in SunOS 4.0.3, lockd (/usr/etc/rpc.lockd) will occasionally dump
core on NFS servers.  This happens silently and goes unnoticed until some NFS
client attempts to lock a file on that server.

As a result, a process requesting a file lock on a file will block on disk wait
indefinitely (`D' state report from ps) until lockd is restarted (manually) on
the NFS server for that file.

The quickest way to detect this is to check for a core file in /.
"file /core" will provide the name of the process which is responsible
for the core file.  Now, quickly check if that process is still around
using "ps".  If it is not, it is probably necessary to restart the
service manually.  In the case of lockd, it is sufficient to:

    cd / ; /usr/etc/rpc.lockd

to restart/restore the service.  The "cd" makes sure the next core
dump will go in / again, which is where we "expect" it.

00c - Caveh Jalali

   [Our MM -- but not RISKS' --was out of commission as a result of the lockd
   problem.  The LOCK-MESS MONSTER STRIKES AGAIN.  BY THE WAY, we are backing
   off to a `less advanced' (old standard) SENDMAIL.  It will probably be
   ready for the next issue of RISKS, so please stand by for a different set of
   horror tales.  We had a dilly yesterday when SENDMAIL suddenly started
   spawning processes like crazy, and had to be killed.  PGN]


Re: Railroad interlocking systems

Brian Randell <Brian.Randell@newcastle.ac.uk>
Tue, 16 Jan 90 11:44:35 BST
A short while ago I passed a message I'd seen on the net (in RISKS?)
about railway interlocking to one of my colleagues here at Newcastle
who is very interested in such topics. He gave me permission to
forward his reply to RISKS, so here it is.

Brian Randell

   =======
>From: J.A.Hunter@newcastle.ac.uk
With reference to:
> Date: Fri, 5 Jan 90 09:49:21 CST
> From: Douglas W. Jones <jones@pyrite.cs.uiowa.edu>
> Subject: Railroad interlocking systems
> [...]
> I suspect that there are useful parallels between the history of railroad
> interlocking machines and the more recent developments in safety critical
> digital control systems.  An error in the logical design of an interlocking
> machine could easily go undetected until it caused a train wreck, and I
> wonder if old cases involving railroad interlocking machines might provide
> useful precidents for many of the software liability questions that have been
> raised recently.

There's one case in British railroad history of an interlocking frame being in
service for many years and then (apparently) causing a train wreck.  It
happened on February 14th, 1928 at Paragon Street station, Hull, in the East
Riding of Yorkshire.

The area had a large and complex track layout controlling several main and
secondary lines and freight routes to the local dockyards.  On that day two
trains met head on, one having been incorrectly routed onto the wrong track.
The signal box (control tower) in question would have had around two hundred
levers and had three signalmen in charge.

To understand the accident it is necessary to realise that signalling safety
depends on more than just the interlocking frame.  There are trackside devices
that "prove" the position of points and signals and prevent, for instance, a
signal being cleared if the switch it refers to has not been set appropriately,
with both blades in position and locked.  Additional equipment detects the
presence of a train to prevent a switch being moved under it.  These days this
is done using electrical track circuits, in which the presence of a train is
detected by the short-circuit through the axles from one rail to the other, but
in mechanical systems this was not always present and various mechanical
detectors were used.  One of these was a treadle fitted near to switches;
depressed by the flanges of the vehicles' wheels, it locked the switch in place
until cleared.

In the Hull accident, the signalmen were concerned not to delay a late running
train and one of them reset the signal cleared for train A after the locomotive
and three coaches had passed it.  At this point the locomotive was still thirty
feet from the trackside treadle which would lock the next switch during
passage.  During the time the train took to reach the treadle, less than a
second, it had effectively disappeared from the logic of the interlocking
frame.

During this "window" one of the other signalmen set the route for train B.
Working from the situation found after the accident, it was possible to show
that this second signalmen has mistakenly pulled levers 95 and 96 instead of
the correct 96 and 97.  Train A was then routed into the path of train B.  The
interlocking that would have prevented this had been disabled because the first
signalman had been too quick in cancelling the cleared signal, releasing the
interlock on the switch, and the trackside detection of the train had not yet
been triggered.

Clearly this type of accident requires a human presence.  The two signalmen
represent co-operating processes who failed to obey the rules on (human)
interlocks, and the machinery failed to limit their actions.  I suspect
that many aspects of railroad safety are probably of greater interest to
psychologists than to computer scientists.  Railroad interlocking frames
may represent some of the most reliable computer programs ever constructed,
but they are not immune from misuse by the "user".

In present-day railroad signalling systems, many more of the operations are
automatic and detection systems are more complete.  Modern British practice
requires that once a route has been set, a time delay of about one minute is
enforced by the software before a conflicting route can be set.  That such
systems are not perfect and still rely on human vigilance was shown by the
Clapham (South London) accident late in 1988 where a faulty track circuit train
detector "lost" a train standing at a signal allowing an automatic system to
route another train into the rear of it.

A good source book for information on British railroad safety is:

     "Red for Danger", by L.T.C.Rolt,
     published by David & Charles Inc. (North Pomfret, Vermont 05053),
     ISBN 0-7153-8362-0

My copy is a 1982 fourth edition, but the book is is still in print.  The
information on the Hull accident above is my precis from a more detailed
study within that publication.

Phone: +44 91 222 8226                                 Computing Laboratory
Fax:   +44 91 222 8232       The University of Newcastle upon Tyne, England

Please report problems with the web pages to the maintainer

Top