The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 18 Issue 10

Tuesday 7 May 1996

Contents

o The Cali and Puerto Plata B757 Crashes
Peter Ladkin
o Telephone accounting
Warrick Jackes
o DOs and DON'Ts: A Perversity of Owner's Manuals
Ken Knowlton
o 30% of the births in California
Bob Frankston
o "Survey Finds Computers Under Siege"
Peter G. Neumann
o RISKS posting leads to e-mail attack!
Martyn Thomas
o Denial of service made easy....
David Lesher
o ACLU Post-Trial Brief on the Web Site
Ann Beeson
o Re: Cambridge University systems hacked!
Stephen Early
o Re: AOL censors British town's name!
Peter Miller
o Re: Odds of an accident for the Challenger
Gareth McCaughan
Pete Mellor
Paul Green
Dani Eder
o Info on RISKS (comp.risks)

The Cali and Puerto Plata B757 Crashes

<ladkin@TechFak.Uni-Bielefeld.DE>
Tue, 7 May 1996 23:29:07 +0200
On December 20, 1995, American Airlines flight 965 (AA965), a B757-223
serial number N651AA, crashed into mountains on approach to Cali, Colombia.
This was the first B757 fatal accident in a decade and a half of service,
and I believe the first jet accident for AA in about the same length of time.

On February 6, 1996, a B757 of Birgenair, registration TC-GEN, crashed
into the ocean off Puerto Plata, Dominican Republic, shortly after takeoff.
This was the second fatal B757 accident.

The US National Transportation Safety Board is `participating fully' in each
of the investigations. Statements and documents concerning the accidents
released by the Columbian and Dominican Republic agencies are available from
the NTSB. Statements on Cali were released on 28 Dec 1995 and Feb 15 1996,
and the `docket' (containing the factual reports of the `specialty groups'
of investigators) has been released recently. Statements on the Puerto Plata
accident were released Feb 7, Mar 1 and Mar 18 1996.  Final reports for
neither are yet available.

Contrary to the impression given by some recent articles in the
non-specialist press, neither findings nor causes nor causal factors have
yet been officially determined in either investigation. However, the factual
reports so far indicate to me that the human-computer interface could be
involved in both accident sequences.  These airplanes have a safety record
amongst the best of any type in regular airline use - many B757 and B767
aircraft have been flying for a decade and a half in the fleets of airlines
all over the world, and there have only been three fatal accidents (the
first, to a Lauda Air B767, is thought to have been due to an unrecoverable
technical failure alone).  After 16 years of exemplary safe use, one should
therefore not be too hasty to `blame the computer' or its interface.  Here
are some comments on both accidents.

Cali:

Happened on approach to the airport, which is aligned along a relatively
narrow valley with high mountains to either side. Because of benign weather
conditions, the pilots were offered a `straight in' approach (to land in the
direction they were flying) rather than to fly beyond, turn, and land in the
opposite direction (the more usual procedure). They were not familiar with
the `new' approach. During the approach procedures, the pilots were confused
about exactly where they were in relation to the arrival procedure charts.
They had passed a specific radio beacon (VOR) called Tulua, which is the
start of the arrival procedure. It seems that they were not aware that they
had done so, entered this fix into the Flight Management Computer (FMC), and
didn't immediately notice that the aircraft had begun to turn back to the
Tulua VOR. This turn began a quick sequence of events that led to impact
with a mountain (a CFIT, Controlled Flight Into Terrain accident) about
3,000ft below the summit. A Ground Proximity Warning System warning sounded
about 9 seconds before impact, initiated recovery procedures (full power,
maximum angle of climb, retract any draggy control surfaces that are
deployed).  The crew didn't retract the speed brakes (manually controlled),
which is a necessary part of the escape procedure.

The actual manoeuvers that led to impact were that of turning towards
a specified VOR, and turning towards a specified heading. Both were
initiated by the crew. The former was apparently accomplished with the
FMS, the latter with the autopilot.  Small airplanes such as my Piper
Archer have autopilots capable of such manoeuvers. The pilot flying
must command either, as happened at Cali. However, the more
sophisticated FMC requires more attention than a simple autopilot when
entering fixes, and investigators are paying attention to the role the
FMC-pilot interaction played.

The question that arises is what exactly played a causal role. There is a
known HCI phenomenon that when something is `not right', it takes more time
and attention to `debug' the situation when more sophisticated devices are
involved than when simpler ones are being used.  However, there were also
other, human, procedural problems. The December 28 statement noted that
there was no indication of descent checklist procedures being performed by
the crew, and no indication of an arrival or approach procedures briefing.
Also, it is a basic rule of flying that you know where you are at all times.
These pilots didn't, at a crucial phase of flight, even though they had
reputations for conscientiousness (see the Operational Factors/Human
Performance factual report).

Their confusion may have been aided also by some of the pilot-controller
discourse about the route for which they were cleared. (However, there
was no evidence of `language difficulty', as this is usually understood.)

Puerto Plata:

The captain's airspeed indicator (ASI) was observed to be failed on takeoff.
This is not an event that requires emergency handling.  The captain asked
the copilot to call out the significant airspeeds for takeoff (called V1 and
V2, also normal procedure) and took off as normal.  He then called for the
center autopilot to be switched on. His own airspeed indicator showed higher
and higher airspeeds as the aircraft climbed, even though the actual
airspeed of the aircraft (as recorded by ground radar) was much lower. The
first officer observed that his ASI showed marked decrease in airspeed, and
the pilots became confused over which ASIs were failed (the captain thought
at one point that they'd both failed). The aircraft apparently stalled and
the pilots did not succeed in recovering before hitting the ocean.

The DR/NTSB factual statement noted that the behavior of the aircraft was
consistent with the captain's pitot being blocked. The airplane had been
sitting on the ground for many days in the tropics without the pitots being
covered (there is no procedural requirement for this, but pilots and
mechanics all know that insects love pitots).  The pitot is a tube facing
into the airstream receiving air pressure facing in the direction of flight,
and a related static port receives (roughly) ambient air pressure. The
difference between the two is used to drive the ASIs in all aircraft. If the
pitot is blocked, then pitot pressure remains roughly constant and ambient
air pressure decreases as the aircraft climbs, leading to greater difference
between pitot pressure and static pressure, and thus to greater `indicated'
airspeed on the ASI.

The copilot has a separate pitot-static system. The pitot-static readings on
the B757 are fed into the left (for the captain's system) and right (for the
copilot's) Air Data Computers, and thence to the CRT display instruments for
the respective positions.  There is a third pitot-static system that is
purely mechanical (and therefore `traditional'). Normally, pilots of all
airplanes are also trained to use `alternative source' if a pitot-static
system fails. `Alternative source' on the B757 is to switch the displays so
that the captain's ASI reads from the right ADC and the first officer's from
the left ADC. Also, the `glass' ASIs should be checked against the
mechanical `backup'. There is no evidence that either `alternate source' was
used during the accident flight, or that either instrument was checked
against the backup, even when the captain falsely thought that both his and
the first officer's ASIs had failed.

David Learmount asserted in Flight International (27 Mar - 2 April) that the
central autopilot gets its data from `the' ADC. He must have meant to say
from the *left* (captain's) ADC. Supposing this is the case, the autopilot
would react to the (false) increasing airspeed indication by raising the
nose, to attempt to reduce `airspeed'.  Continuing to do so, since the false
`airspeed' continued to increase with increasing altitude, the airplane
would radically lose (real) airspeed, and eventually stall, which appears to
be what happened.  I have not yet been able to verify Learmount's (modified)
assertion with the NTSB or Boeing engineers. The question arises, why would
the captain switch on the center autopilot if his ASI had failed and he knew
it got its data from the same ADC?  Anecdotal information (a colleague with
access to a B757 operations manual, another who is a B757 pilot) suggests
that this might not be information that one could expect to be at the front
of every B757 pilot's mind.  These two situations (center AP gets data from
left ADC; this information not in the mental foreground when dealing with
ASI problems) may thus have played a causal role in the accident. (I
emphasise again that I have not yet confirmed either situation.) This could
be categorised as an HCI issue.  As at Cali, there were other apparent
procedural failures: failure to switch to `alternate source'; failure to
check against the standby mechanical instrument. Performance of either would
have avoided the accident: as Learmount says, the pilots lost control of a
flyable airplane.

Further Commentary:

Maybe one can see in these accidents two known and often-observed HCI
effects of automation, which I shall call Complacency and Complexity.  The
Complacency effect is that use of (normally reliable) automation can lead to
reduced awareness of the state of the system.  The Complexity effect is that
increased automation makes some straightforward tasks more complex and
interdependent. The effects are distinct, but both have the consequence that
it becomes harder to figure out what's going on when something is wrong. As
a discussion point, let me propose that the greater role at Cali seems to
have been played by the Complacency effect, with some Complexity effect; and
that at Puerto Plata by the Complexity effect alone.  For the answers, we'll
have to wait until the final reports.

Finally, I don't regard either accident as giving grounds for concern about
the role of automation in itself. But the reports might yield insights into
and improvements to the procedures for dealing with certain forms of
automation. One always hopes to learn from the tragedies.

The text of the official statements referred to above, as well as other
pertinent documents, may be found in the sections on these accidents in the
hypertext Compendium `Computer-Related Incidents and Accidents to Commercial
Airplanes' under
   http://www.techfak.uni-bielefeld.de/~ladkin/

Peter Ladkin


Telephone accounting

Warrick Jackes <wjackes@cit.gu.edu.au>
Tue, 7 May 1996 20:03:49 +1000 (EST)
* Cutting: "Sunday Mail (Brisbane)", Sunday, the 14th April 1996, page 12:

A Brisbane bloke was stunned to discover on his latest phone bill an amount
of nearly $900 for a call of more than 10 hours duration to the Solomon
Islands.  The bloke _does_ occasionally call the Solomons and _does_ admit
to being a bit of a yakker [talk the doors off a barn].  But for 10 hours?
Query with Telstra [read as Bell Australia] brought the testy advice that he
must have forgotten to hang up his receiver.  The bloke pointed out that
this theory was flawed by Telstra's own bill that showed that he'd used the
same phone to call Melbourne (Australia) only 11 minutes after the start of
the call to the Solomons.

* Cutting: "Sunday Mail (Brisbane)", Sunday, the 21st April 1996, page 12:

This case apparently stirred Telstra into prompt action.  By Monday, 15
April, the matter had been "investigated" and the charge waived.

Warrick JACKES, 52 Hamilton Road, MOOROOKA, Q.  4105 AUSTRALIA
+61 411 18 55 68  wjackes@cit.gu.edu.au

   [Otherwise, he would have been a Gofer-Bloke.  PGN]


DOs and DON'Ts: A Perversity of Owner's Manuals

Ken Knowlton <KCKnowlton@aol.com>
Tue, 7 May 1996 17:38:58 -0400
Letter-to-the-editor of *The New York Times* Magazine section, 5 May 1996,
quoted in full:

> James Gleick's 'manual Labor' (Fast Forward, April 7) touched a
> long-tender sore spot with me.  For example, the manual that came
> with a car I bought not long ago contained no fewer than 31 Cautions,
> 32 Warnings, 28 Do Nots and 2 Nevers.  (I never did discover the
> difference between a Do Not and a Never).  My favorite was this:
> `Do not open sun roof when car is covered with snow.'
> Robert L. Wolke,  Pittsburgh"

So much stuff to remember, but so few reasons.  Wouldn't it be a lot easier
to remember (or forever dismiss) such advisories if we knew the reasons, and
had a structure to hang them on?  Suppose I'm wedged between two trucks, the
car is on fire, but there's still some snow on the roof.  Should I try to
get out through the sun roof?  Do I risk only the obvious (some snow down my
neck) -- or a death worse than by fire?  Maybe best to wait til the snow has
melted and hope that by then I'm not already fried?  After all, if they've
taken such pains to tell me something, then more than the obvious risk must
be involved -- because if it were only the obvious, it wouldn't need to have
been stated, right?


30% of the births in California

<Bob_Frankston@frankston.com>
Sat, 4 May 1996 07:25 -0400
I heard on the radio today that a state legislator wondered why she was
getting literature aimed at single parents. Investigating, she discovered
that the state didn't record the marital status on birth records.  Instead,
they assumed that if the parents' last names were not the same, they were
both single!

And these are the kind of "statistics" on which we base public policy!

In thinking about the 30% of the California mothers who are thereby
supposedly single, I realize that the issue is deeper than bad methodology;
it is the reason why such numbers become important.  Much public policy is
driven by numbers.  The CPI (Consumer Price Index) is a more pervasive
example.  It reflects something or another but what is less important than
that there be a way to keep score.

Lest we blame the government, remember how much business policy is also "by
the numbers" -- numbers that aren't necessarily accurate.  In one case a
very large company assigned a programmer to assure consistency in the
numbers they had gathered from the field, which were the basis for billions
of dollars of corporate decision making.  He discovered that after
attempting to clean up the data that it was essentially pure noise by the
time it was massaged.

More obvious and more explicit are the "sweep" numbers used by television
stations.  Someone more involved in the specifics should provide the details
but apparently the advertising rates are set by tallying the viewing during
designated weeks.  The problem is that everyone knows which weeks they are
and therefore create special shows to cook the numbers.  Is there a better
system??  Perhaps not.

This is, of course, a risk of technology in that we have much better tools
to gather numbers than we did in the past.  But we also need some agreed to
rules even if they are perverse.  Interestingly, these numbers might assure
some degree of fairness by being so noisy that it is hard to predict their
outcome using inside knowledge.  But, maybe I'm just getting too cynical. Or
pragmatic?


"Survey Finds Computers Under Siege"

"Peter G. Neumann" <neumann@chiron.csl.sri.com>
Tue, 7 May 96 17:50:27 PDT
Much as I dislike statistics about ``computer crimes'', this might at least
be worth noting for the RISKS archives.

Jon Swartz, writing in the *San Francisco Chronicle*, 7 May 1996, p. C1,
summarized the conclusions of a survey just released by the Computer
Security Institute and the FBI's International Computer Crime Squad.  41% of
the respondents admitted to electronic intrusions or unauthorized probes of
their systems by disgruntled employees or competitors in the past year, and
roughly half of those involved insiders.  The Internet has become a "prime
source" of such activities.  20% said they did not know whether they had
been invaded; and nearly 75% of those said they would not report incidents
because of fear of negative publicity.  Medical and financial institutions
were particularly prone to data manipulations.  (There were 428 respondents
out of almost 5000 queried.)

One of the biggest problems in justifying the need for computer security has
always been that many organizations and individuals appear not to have been
compromised.  The 75% figure is perhaps the most interesting.


RISKS posting leads to e-mail attack!

Martyn Thomas <mct@praxis.co.uk>
Tue, 7 May 1996 18:41:47 +0100 (BST)
This morning, I started to receive e-mails accusing me of forcing a Swedish
band called Ace of Base to leave Sweden for ever. The e-mails continue to
arrive - some are very abusive. I have never heard of Ace of Base! (AoB)

Some detective work later, I discover the story: Scandinavian Airlines (SAS)
have reviewed Ace of Base in their in-flight magazine (apparently) and said
that they prove that musical ability is not a requirement for Swedish bands
(I paraphrase).

An AoB fan has produced a Web page
(http://www.ultranet.com/~wurther/opmdv.htm) that gives the story and
invites readers to send their views by e-mail to a "senior SAS official"; he
then gives *my* e-mail address!

Why does he do this? I'm guessing, but an Altavista search for my e-mail
address turns up Jan-Erik Andelin's MD80 Accidents Pages
(http://www.clinet.fi/~andelin/md80acsk.htm) containing a copy of a report I
posted to Risks in November 1993, quoting a Flight International report on
the Dec 1991 crash of an SAS MD-81! So I guess that "Wurther" searched for
SAS, found this page, assumed I must be an SAS official, and added me to his
list of hate targets!

I've mailed him and his webmaster, so far with no effect.

Martyn Thomas, Praxis plc, 20 Manvers Street, Bath BA1 1PX UK.
+44-1225-444700.    mct@praxis.co.uk    Fax: +44-1225-465205


Denial of service made easy....

David Lesher <wb8foz@nrk.com>
Fri, 3 May 1996 14:13:14 -0400 (EDT)
re: X-URL: http://www.bell-atl.com/college/

Bell Atlantic is making it easy for students to disconnect service.  Of
course, they are also making it easy for OTHERS to disconnect service for
you, and exposing the information you provide to anyone in between....


ACLU Post-Trial Brief on the Web Site

Ann Beeson <beeson@pipeline.com>
Wed, 1 May 1996 12:39:26 -0400
Folks following the _ACLU v. Reno_/_ALA v. DOJ_ case may want to check out
our post-trial brief, filed this past Monday and now available from our home
page at http://www.aclu.org .

We'll also be posting our joint (ACLU/ALA) 200-page "Findings of Fact"
shortly.  That document neatly reorganizes all of the evidence presented at
the hearing.

The next and final step in the case is oral argument, scheduled for Friday,
May 10th, at 9:30 a.m. in Philly.

Ann Beeson
Staff Counsel, _ACLU v. Reno_
American Civil Liberties Union
132 W. 43rd St.
NY, NY  10036
212-944-9800 x788

  [Courtesy of Audrie Krause, Executive Director, CPSR, P.O. Box 717 Palo Alto
  CA 94302 (415) 322-3778 akrause@cpsr.org * Web: http://cpsr.org/home.html ]


Re: Cambridge University systems hacked! (RISKS-18.09)

Stephen Early <sde1000@chiark.chu.cam.ac.uk>
Thu, 2 May 1996 11:19:11 +0100 (BST)
Another two risks demonstrated here are:

Summarisation of technical information by people who do not understand it
- a reporter in this case, but the risk probably applies elsewhere

Believing what newspapers print.

The story printed in the newspaper bears only a passing resemblance to the
real incident. What actually happened was that a packet sniffer was found
running on a machine on the subnet that connects the central Unix service,
mail server, and so on.  Everyone who uses these systems was required to
change passwords.  The e-mail system has not been replaced, and I've no idea
how this detail got into the article.

Steve Early  sde1000@cam.ac.uk


Re: AOL censors British town's name! (Overy, RISKS-18.08)

Peter Miller <pmiller@bmr.gov.au>
Mon, 6 May 96 11:39:30 EST
> 1) Use PGP.

This does not help.  What if the output of PGP encryption innocently
contains the byte sequence 0x63 0x75 0x6E 0x74?  Being gibberish, you didn't
check - but the computer may, and censor it.  There are similar problems
with uuencode, rot13, etc.  Sigh!

Peter Miller   pmiller@agso.gov.au  uunet!munnari!agso.gov.au!pmiller

   [Sigh-bernetics caught in a sigh-clone?  Sigh-onara!
   Imagine being tossed in jail because your encrypted
   message just happened to trigger a filter!  PGN]


Re: Odds of an accident for the Challenger (Perelman, RISKS-18.09)

Gareth McCaughan <G.J.McCaughan@pmms.cam.ac.uk>
Thu, 2 May 96 11:03 BST
I don't know about any public announcement, but in Richard Feynman's "What
do you care what other people think?" there is an extended account of his
part in the investigation of the Challenger disaster, including quite a lot
about the odds quoted within NASA for various kinds of failure, and their
(often tenuous) relation to reality.

I'm afraid I don't remember any of the figures, but they were wildly
overoptimistic. One interesting thing was that when Feynman talked to the
engineers who actually worked on the shuttle components, they gave pretty
good estimates for things; but somehow, as information propagated up the
management structure, it got fudged. (Cf. the entry in the Jargon File under
"SNAFU principle".)

Gareth McCaughan     Dept. of Pure Mathematics & Mathematical Statistics,
gjm11@pmms.cam.ac.uk Cambridge University, England.


Re: Odds of an accident for the Challenger (Perelman, RISKS-18.09)

Pete Mellor <pm@csr.city.ac.uk>
Thu, 2 May 96 11:05:20 BST
[Referring to Richard P.Feynmann: "What Do You Care What Other People
Think?", Unwin Hyman Ltd. (London), 1988, ISBN 0-04-440341-0 (first
published in USA in 1988 by W.W.Norton & Company Inc.)  Chapter "Fantastic
Figures" pp 177-188.]

> What were the odds?

10^-5 (i.e., 1 in 100,000) was NASA's "official" figure.  Feynman quotes the
range safety officer at Kennedy as having arrived privately at an estimate
of 1 in 100 (based on observation that 4% of unmanned flights failed, and an
optimistic assumption that manned craft must be safer).

Peter Mellor, Centre for Software Reliability, City University, Northampton
Square, London EC1V 0HB, UK. Tel: +44 (171) 477-8422 p.mellor@csr.city.ac.uk

   [Also commented on by
      smurf@noris.de (Matthias Urlichs).
     "Ratzka Wolfgang Dr." <ratzka@braun0.HRZ.Uni-Marburg.DE>
     Roy Murphy <murphy@panix.com>
   PGN]


Re: Odds of an accident for the Challenger (Perelman, RISKS-18.09)

<Paul_Green@vos.stratus.com>
Tue, 7 May 96 12:24 EDT
I have Volume I of the Report of the Presidential Commission on the Space
Shuttle Challenger Accident (U.S.  Gov't Printing Office, June 6th, 1986).
There are 5 volumes; Volume I is the summary and has 256 pages.  It is
well-written, easy-to-read, and remarkably free of technobabble.  Any large
library should have it; some of it is also online at <A
HREF="http://www.ksc.nasa.gov/shuttle/missions/51-l/docs/
rogers-commission/table-of-contents.html">Rogers Commission Report </A>.

Nowhere in this volume could I find a reference to the numerical odds
of a shuttle accident.  There are many statements that recognize that
there are risks that cannot be totally eliminated.  Conceivably, there
might be a calculation of the odds in one of the other volumes of the
report.  According to <A HREF="http://spacelink.msfc.nasa.gov/
NASA.Projects/Human.Space.Flight/Shuttle/Shuttle.Program.Changes.
Since.1986">NASA Press Release</A> there has been 1 failure in 74
flights (thru January 1996), for a reliability of 0.986.  (I asked a
web index tool to find references to "odds of a space shuttle
accident" to find these documents.)

Personally, I believe that every practicing engineer, and every manager in
an engineering organization, should read this report regularly.  It is both
enlightening and sobering on the difficulties of building reliable, complex
systems in the real world.

The accident was caused by "known problems" in a faulty design.  The
attempted resolution of the problems was poorly executed and poorly managed.
Safety concerns were not escalated up the management chain.  Known problems
were dismissed as "still within limits".  Launch constraints were waived at
the expense of safety.  Management reversed the position of its own
engineers.  All of this came out at the time, but perhaps some people who
were not around at the time have not heard about it.  I urge everyone to
read it.

Paul Green, Senior Technical Consultant, Stratus Computer, Inc., Marlboro, MA
01752   Paul_Green@stratus.com +1 508-460-2557  FAX: +1 508-460-0397


Re: Odds of an accident for the Challenger (Perelman, RISKS-18.09)

Dani Eder <ederd@bcstec.ca.boeing.com>
Thu, 2 May 1996 17:11:50 GMT
I have not seen that report, but in the late 1980's, when I was working on
Shuttle-derived launch vehicle studies, we did an in-house assessment
assuming the fixes post-Challenger were made correctly.  We got a figure of
1 in 70 per launch of losing an Orbiter, and 1 in 100 per launch in losing
the crew.  The difference in the two figures is because you can have a
landing that the crew can walk away from, but the landing was hard enough to
overstress the Orbiter structure, so that you have to write off the vehicle.

Dani Eder

  [Further comments from
    Ann_Holt@ftdetrck-ccmail.army.mil.  PGN]

Please report problems with the web pages to the maintainer

Top