The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 19 Issue 75

Thursday 21 May 1998


o Galaxy IV malfunction causes massive pager outages
Roy Rodenstein
o Galaxy IV and the risks of efficient technologies
Richard Cook
o Navy turns to off-the-shelf PCs to power ships
Chiaki Ishikawa
o Frankness on Frankston
Peter B. Ladkin
o Review of RISKS comments on Frankston
Peter B. Ladkin
o Re: Once again, I'm risking my life flying
Jim Wolper
o Info on RISKS (comp.risks)

Galaxy IV malfunction causes massive pager outages

Roy Rodenstein <>
Wed, 20 May 1998 10:03:39 -0400 (EDT)
Around 6PM on Tuesday, May 19th, the Galaxy IV satellite's "onboard control
system and a backup switch failed." The satellite reportedly provided pager
service to 80%+ of US customers, and also carried NPR, several television
networks, and Reuters news feeds.  A spokesman for PanAmSat (which owns the
satellite) has said contact does exist with the satellite but has not said
how soon service can be restored; apparently one option is to use a backup
satellite already in orbit, but it could take several days to reposition it.

This is a single-point-of-failure case with a twist. Although Galaxy IV
reportedly had a backup system, it also failed, which might make one wonder
how thoroughly backup systems are tested. If 80%+ of pagers in the US
have been affected, this is quite an egregious case of SPOF. On the
other hand, from the point of view of PanAmSat, adding points of failure
for services in space is not the same as doing so for services on earth.
As for pager companies, given that CBS quickly switched to Galaxy 7,
perhaps their backup plans were not as robust as they could have been.

Roy Rodenstein,
Future Computing Lab, GVU Center, Georgia Tech

    [Also noted by many others.  This case certainly brings up a lot of
    issues discussed here previously.  The event occurred just as my plane
    was leaving Dulles after my Senate testimony on infrastructural risks
    and vulnerabilities <>!  PGN]
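
Rodenstein's question about how thoroughly backup systems are tested can be
made concrete with a small sketch. It assumes a standby unit that fails
silently at a constant rate while dormant and a periodic test that always
reveals a dead unit; the model, the function names, and the numbers are
illustrative assumptions, not anything reported about Galaxy IV.

```python
import math


def backup_dead_probability(failure_rate_per_hour: float, hours_since_test: float) -> float:
    """Chance that a dormant backup has silently failed since its last good test.

    Assumes a constant failure rate (exponential model), a hypothetical
    simplification chosen for this illustration.
    """
    return -math.expm1(-failure_rate_per_hour * hours_since_test)


def mean_unavailability(failure_rate_per_hour: float, test_interval_hours: float) -> float:
    """Average of backup_dead_probability over one full test interval."""
    x = failure_rate_per_hour * test_interval_hours
    if x == 0.0:
        return 0.0
    # Mean of 1 - exp(-rate * t) for t in [0, T] is 1 - (1 - exp(-x)) / x.
    return 1.0 + math.expm1(-x) / x


if __name__ == "__main__":
    rate = 1e-5  # hypothetical: one latent failure per 100,000 dormant hours
    for interval in (100.0, 1_000.0, 10_000.0):
        print(interval, mean_unavailability(rate, interval))
```

When the rate times the interval is small, the average is roughly half
their product, so under these assumptions a backup tested ten times less
often is about ten times more likely to be dead when it is finally needed.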

Galaxy IV and the risks of efficient technologies

Richard Cook <ri-cook@UCHICAGO.EDU>
Wed, 20 May 1998 17:36:44 -0500
It will take some time to gauge the complete impact of the loss of the
communications satellite Galaxy IV. Some technical details of the narrow
locus of failure are available from PanAmSat. The
event will certainly be expensive if control of the satellite cannot be
reestablished. But more difficult to calculate are the larger costs of the
disruption of service. Particularly troubling is the loss of paging services
used for hospitals including (according to news reports) Johns Hopkins and

Radio news reports suggested that hospitals and other services were
retreating to simple telephone communications. It is unlikely that such a
retreat would be feasible. The widespread use of paging systems has produced
a highly distributed system that is nearly entirely dependent on these
methods of establishing communications. Work patterns themselves are now
distributed in hospitals in ways that make operation without paging systems
effectively impossible. Some hospitals in which I work no longer have any
overhead voice paging capability at all and the huge volumes of paging for
virtually every possible service and activity make even those archaic
remnants of a previous time woefully inadequate.

The outage points out the subtle dependencies that flow from the
applications of larger scale, highly integrated technologies, especially the
boundaries of failure. The benefits of these modern systems are largely in
efficiencies of work that they permit. When they fail, the impact can extend
in space and time far beyond the obvious, first order 'edges' of the
system. Hence, the term coined above: the "narrow locus of failure". This is
the technical system and its well defined and readily apparent functions. In
this incident, the NLF is the satellite and the pagers and the gasoline
pumps and the television nets, and so forth. But the larger locus of failure
here would include all the subsequent and serial effects.  These are
especially difficult to describe completely because they are so
distributed and diverse.

Predecessor systems were not robust, of course. Hard wired telephones failed
and the local doctor-in-the-hospital [not for nothing were they called
`residents'!] could be overwhelmed by the demands of a single ward or
floor. What was distinct about these earlier systems was the narrowly
restricted boundaries of their failure. The failure of a local telephone
system or a local resident was limited in scope. Not so with the new
systems. That this sort of failure occurs should give us real pause as we
consider the various proposals to use advanced technology for medical

There are a host of proposals to use advanced information technology for
electronic medical records, for so called 'telemedicine', for remote robotic
surgery, for drug dispensing and monitoring, for home infusion, etc. etc.
These applications are always touted by proselytizers as advances in quality
or capability to be employed for improved performance.  But the history of
technological applications shows that these advances are exploited mainly to
achieve greater efficiencies in production, and that the gains are quickly
eaten up by this use. That is to say, the advances are exploited not to
achieve a more robust system but a more efficient one -- and one with the
potential for large scale, catastrophic modes of failure. The example of
Galaxy IV is a demonstration of the potential of such highly technical

Richard I. Cook, MD, Cognitive Technologies Lab., Dept of Anesthesia and
Critical Care, University of Chicago, Chicago, IL 60637 1+773-702-5306

Navy turns to off-the-shelf PCs to power ships (Educom)

Chiaki Ishikawa <>
Thu, 21 May 1998 10:50:58 +0900 (JST)
The U.S. Navy, facing pressure from Congress to cut spending, is maintaining
its cutting edge by replacing expensive custom-built systems with
off-the-shelf products.  "If we insisted on military specs, we'd be a
generation behind, and they'd cost twice as much," says the intelligence
officer on the USS Coronado.  The new strategy, called IT-21 or Information
Technology for the 21st Century, is the brainchild of the Pacific Fleet
commander-in-chief Adm. Archie Clemins.  "If you use proprietary systems,
you can never stay current with technology," says Clemins.  Another
advantage is a shortened learning curve: "Everybody knows how to use the
technology so training costs are way down."  In addition, using
off-the-shelf systems makes it a lot easier to coordinate joint operations
with U.S. allies.  "Proprietary computers were too expensive for our
coalition partners."  The only downside is that the Navy may be losing some
of its computer brain power to the private sector: "Our people are very
valuable in the commercial world," says a spokesman.  (*St. Petersburg
Times*, 18 May 1998; Edupage, 19 May 1998)

I am not quite sure what the phrase "off the shelf PC" in the Educom
headline means, but I certainly hope that the Navy is not trusting weapon
control or cruise control to Windows 95.

Come to think of it, at least the quote from the Naval personnel doesn't
include "PC".  But then, "Everybody knows how to use the technology..."
suggests a well-known brand of software products.

I shudder to think that Win95 is used to control real-time embedded systems
and such...

Chiaki Ishikawa
Personal Media Corp.,   Shinagawa, Tokyo, Japan 142

Frankness on Frankston (RISKS-19.73)

"Peter B. Ladkin" <>
Wed, 20 May 1998 23:34:49 +0200
Well, Bob Frankston [19.73] is risking his life again, but not by flying....

Let's start out calmly. He's deluding himself if he thinks he should
feel safer on a plane with a handheld GPS rather than a sextant; or,
one takes it, a working compass. Four crucial points are:

1. A `magnetic direction indicator' is one of three required air
   data instruments on board every airplane, per FAR 91.205(a).
   The two others are altimeter and airspeed indicator. This
   requirement has been valid, and almost unquestioned, for a half
   century. These are three highly-reliable instruments with few
   failure modes, most (but not all) of which are visible or testable.

2. A compass has very few failure modes compared with that of a GPS,
   and multiple compasses have only one non-visible common failure
   mode, namely when flying in a heavy magnetic field or near either of
   the two Poles. GPS has failure modes that are not easily inspectable;
   e.g. 80-mile position shifts, and any signal disturbances, signals
   themselves having multiple failure modes. See recent Risks, and
   the comment by O'Connell, below.

3. A GPS and a compass have different functions. It's hard to keep the
   wings level on an aircraft with total power failure using a GPS
   (but it can be learned, according to Jim Wolper); you can easily
   with a compass (if it turns, your wings aren't level). Keeping wings
   level is essential for flying an airplane under instruments. Similarly,
   you fly a heading with a GPS by extrapolating; it's harder to tell from
   the GPS how well and accurately you are flying the heading, but this is
   essential information; radar vectors are also based on headings; ATC does
   not do it by GPS fixes. A GPS tells you where you are; a compass does not.

4. The DC-10 has two compasses, not three as Frankston suggests. If one
   is not functioning, then the aircraft is left with only one. That is,
   no backup; remember, this is one of the three essential air data
   instruments.

Now upping the shrill level: concerning point 2 I'm flabbergasted that
Frankston apparently thinks a handheld GPS is somehow equivalent in safety
to a working compass or a sextant. You can't jam a sextant or a compass
because there are no signals to jam, and you can tell by inspection if your
sextant is working. Celestial navigation works (at night!)  where other
things don't (as noted by Dave Alexander).

Points 1-4 may be well illustrated by example.  Suppose the DC-10
suffered a total electrical failure (say, like the Martinair B767 a
couple of years ago -- see my `incidents' compendium). Suppose also
that the East Coast suffers a power failure sufficient to take out
the radar (check out the 200+ hours outages suffered by En-Route ATCCs
in the US from Sept94-Sept95, the NTSB report concerning which appears
in my compendium). Suppose also that GPS signals were reported to be
jammed (see Risks-19.71 and earlier. Aficionados will notice that they
don't have to *be* jammed, they just have to be *reported* to be unreliable).
Could Frankston now explain how the DC-10 is supposed to keep flying in a
straight line, let alone to a particular identified area of the country,
without a working compass (or sextant :-)? If his conversation with the crew
didn't enlighten him, he might be advised to pick another airline. There's
probably an equal chance they were pulling his leg.

I would probably have wanted to pick another airline anyway - I'll admit to
a superstition about any DC-10 with a `cargo door problem' (this particular
story is a classic for engineering ethics courses, and there's a source book
on it). Another superstition is flying in V-tail Bonanzas in
light-to-moderate turbulence at over 125kts. I won't admit to any others.

A British pilot told me that most of his colleagues assume that IRS or radio
fixes will be available; therefore it is not standard company policy to
check the standby compass; and that the only couple of pilots he knew who
always did check were ex-727. As Charlie Brown would say: Good Grief!

Peter Ladkin

Review of RISKS comments on Frankston (RISKS-19.73)

"Peter B. Ladkin" <>
Wed, 20 May 1998 23:18:08 +0200
PGN graciously (or desperately) allowed me to compile the comments to Bob
Frankston's article, which include Frankston's reply and (editor's
perquisites) a pontifical postscript from me.

Alexander McClellan <> pointed
to an article on Transport Canada's WWW site: (Aviation Safety Letters,
Issue 3/97, "But I Could Hit a Hill..." )
which says:

"First of all, GPS is not infallible. As we've said many times in the
past, GPS satellites can transmit faulty signals and, unless you have an
installation certified for instrument flight rules (IFR) flight, you
won't be warned. Faulty satellites have caused 80-mi. position errors in
the past. Even if you have an IFR box, there will be times when there
just won't be enough satellites to navigate. What if this happens at a
critical point in your flight when the visibility is too poor to

Transport Canada was talking about hand-held GPS boxes.  Henry Spencer
<> commented on the reliability issue for commercial

[Spencer] Consumer GPS receivers are not necessarily designed to work
at airliner speeds.  While there is quite extensive informal use of
hand-held GPS receivers as auxiliary navigation aids, especially in
smaller aircraft which lack heavy-duty navigation systems of their
own, the FAA is quite rightly reluctant to officially sanction this
practice without some assurance that the receivers are actually
designed to work reliably in that environment.  Having a
supposedly-reliable navigation aid that is lying to you is much worse
than having to get by without it.

John T Faulks <>, who works in Lockheed Martin
FADEC Product Support, elaborated some of the questions an avionics
professional would ask about the requirements behind Frankston's proposal:

[Faulks] How should you add a GPS to the cockpit - do you want it to
talk with the other navigation systems, do they have available input
channels (ARINC or whatever)? Or do you want to sit it in the pilots
lap? Should they turn them off during taxi and takeoff - if there are
concerns about the passengers using electronics in the main cabin, the
risk in the cockpit is much much greater.

This concern was echoed by David Alexander <>,
an ex-military pilot working now on Systems Integration, who commented on
resiliency and integration problems as well as a (literal) war story about
sextant use:

[begin Alexander]
I for one am glad that the regulations are that tight. As
someone who builds resilient systems (and I also design and build High
Availability systems) [I am] aware of the need to retest the entire
configuration if you make a single change. [..] The fact that the
system is complex in itself adds a new dimension of risk [..] and you
cannot be certain that there has been no unintentional impact on the
other systems (you mean you believe the spec was right ?!!!) without

[..] don't forget that aircraft systems are normally specially
designed to reduce and remove spurious EMP that could affect other
systems and rely on external antennae. What with the potential RFI and
shielding I'm not sure I'd have confidence in a handheld GPS in a
cockpit of a big 'multi'.

[Speaking of sextants,] During the (1982) Falklands War
[one of my former military instructors] navigated a Vulcan bomber
(early 1960's design) of the RAF [on the longest bombing mission in
history] from Ascension Island to the Falkland Islands, successfully
bombed the runway there, and got back to Ascension using mainly a
sextant. The Vulcan only had a LORAN C navigation system that was not
much use in the South Atlantic.
[end Alexander]

Another tale (pilots love tales!) was contributed by George Bleyle
<>, a United Airlines A320 captain and check pilot, to confirm
the advantages of a sextant over even a compass:

[Bleyle] True story.......30+ years ago, the brother of a Navy navigator
friend (also a Navy nav) was assigned to VX-6, the Navy C-130 [otherwise
known as the Hercules turboprop transport aircraft PBL] squadron that
provided logistics support out of Christchurch, NZ, for the annual
wintering-over expeditions to the Antarctic. On one trip, after departing
the ice shelf in a near white-out for a return trip to Christchurch, and
climbing out to VMC [Visual Meteorological Conditions, that is, you can see
where you're going PBL] on top of an apparently endless cloud deck, the
aircraft suffered a TOTAL and complete electrical failure. No AC no DC
(after batteries depleted), no comm, no nav, no nothing; VMC on top with all
directions NORTH. Well, using just his periscopic sextant, HO-214 (the Air
Almanac), and a chart, he was able to continuously shoot the sun to get a
True Bearing to the sun, work backwards, and compute headings to fly to
Christchurch...  The aircraft arrived successfully, and he was awarded a
Navy Commendation, etc.  All done with a sextant and chart.  Nice work.

  [Spelling of Christchurch fixed in archive copy.  PGN]

Ryan O'Connell <> addressed reliability and,
indirectly, safety pithily:

[begin O'Connell]
[..] The compass is the most basic form of instrument on
an aircraft - if you don't have a reliable one, you're
stuffed. (Flying a standard instrument arrival (STAR) without one is
difficult at best)

[and, substantiating the point about GPS unreliability:]
A quick look through a database of Notices to Airmen (NOTAMs) reveals:
          NMR NID
          4605N 'S OF MONTE CENERI'

Vincent Dovydaitis <> noted the general covariance of
reliability with complexity, as well as the importance of determining
the boundaries of the engineering task, and the unsolved, often
sparsely addressed, cognitive issues that arise with technology change:

[begin Dovydaitis]
1. New technology is not necessarily better than old technology -- a GPS
(a highly complex instrument involving reception of satellite signals,
calculation (triangulation) and data processing) is not more reliable
than a compass (a magnetized needle floating in a fluid).

2. [..] A reliable GPS installation involves questions of structural
integrity (drilling holes in the skin of the plane to mount the
antenna), electrical integrity (connecting it to the plane's power
supply), and electromagnetic integrity (interference from or to other
devices), assuming you start with reliable GPS hardware and software.

3. System reliability -- the total navigation/communications system of
the plane must have a probability of failure of less than 10^-9. Not
easy to achieve, and requires elimination of single points of failure,
etc.  Which is why more than one compass is required in a commercial
plane, and why complex technology (like GPSs) must be backed up by
simpler technology (compasses).

4. The dominant display effect -- if a GPS driving a moving map and a
compass disagree, which is the pilot looking at?  Which does he believe?
[end Dovydaitis]
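
Dovydaitis's point 3 can be put in rough numbers. The sketch below assumes
per-flight-hour failure probabilities, fully independent units in the first
function, and a simple common-cause fraction in the second (often called a
beta-factor model); the function names and figures are hypothetical and are
not taken from any certification standard.

```python
def all_fail_independent(p_each: float, n_units: int) -> float:
    """Probability that every one of n fully independent units fails."""
    return p_each ** n_units


def all_fail_with_common_cause(p_each: float, n_units: int, common_fraction: float) -> float:
    """Approximate probability that all units fail when a fraction of each
    unit's failures comes from a shared cause that takes out all of them.

    common_fraction is a hypothetical parameter chosen for illustration.
    """
    independent_part = ((1.0 - common_fraction) * p_each) ** n_units
    common_part = common_fraction * p_each
    # Sum of the two contributions; a small-probability approximation.
    return min(1.0, independent_part + common_part)


if __name__ == "__main__":
    print(all_fail_independent(1e-3, 3))
    print(all_fail_with_common_cause(1e-3, 3, 0.01))
```

Three independent units that each fail with probability 10^-3 reach roughly
10^-9 together, but if even 1% of failures share a cause, the common-cause
term (about 10^-5) dominates. That is one reading of why a complex device is
backed up by a simpler, dissimilar one rather than by a copy of itself.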

Jeremy Leader <> considered safety, taking
to task Frankston's comparison with WWW development:

[Leader] Frankston ends by citing the Web as a "good example" of a
resilient distributed system [..]  I agree that the web is, over all,
a robust distributed system.  But to imply that aircraft should be
developed by "hacking" rather than "design" suggests that the penalty
for error in an aircraft is similar to the penalty for a web server
going down! [..] the design techniques used should be chosen in part
based on the cost of being wrong.  Hacking is great if you don't mind
occasional crashes.

Rob Borsari <> played the role of the offended pilot,
giving double meaning to the phrase `enough is enough':

[begin Borsari]
[..] The technology of flight navigation is very reliable
and accurate without using any "modern electronics" at all.  It is
quite possible and safe to navigate a plane using a magnetic compass
and a watch.  In fact all of the computer systems on the plane rely on
these two instruments for their baseline data.  [...]

"Strange systemic interactions [Frankston quote]" in a flight
navigation system means that you are given a position with an unknown
variance.  This can be fatal.  It is far better to know that you don't
know where you are than to incorrectly assume that you are given a
precise position when it is incorrect.

And here is the insulting part: Lousy navigation.  I flew a 1946
Cessna for many years using only a mag compass, a watch, and the most
basic of radio navigation systems, VORs.  [...] any trained
pilot can use those tools to locate his position and allow accurate

Here is a list of electronic navigation technologies in order of age:

Radio Range
VOR (Tacan for the military but that is a different story)

The only one not currently in use is the Radio Range.  Any one of them
provides a cross check to the pilot's basic navigation.  Without any of
them an aircraft can still be navigated safely.  In bad weather you  need
at least a VOR for navigation and ILS for precision navigation.
[end Borsari]
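
Borsari's compass-and-watch navigation is dead reckoning. A minimal sketch,
assuming a flat-earth approximation over a short leg, no wind, a constant
known speed, and a heading already corrected to true north; the function
name and units are choices made for this illustration.

```python
import math


def dead_reckon(x_nm: float, y_nm: float, true_heading_deg: float,
                speed_kt: float, minutes: float) -> tuple[float, float]:
    """Advance a position (east, north, in nautical miles) along a heading.

    Hypothetical simplifications: flat earth, no wind, constant speed.
    """
    distance_nm = speed_kt * minutes / 60.0
    theta = math.radians(true_heading_deg)
    # Headings are measured clockwise from north, so east uses sin, north uses cos.
    east = x_nm + distance_nm * math.sin(theta)
    north = y_nm + distance_nm * math.cos(theta)
    return east, north


if __name__ == "__main__":
    # Two legs: 30 minutes due east, then 15 minutes due north, at 120 knots.
    pos = dead_reckon(0.0, 0.0, 90.0, 120.0, 30.0)
    pos = dead_reckon(pos[0], pos[1], 0.0, 120.0, 15.0)
    print(pos)
```

Errors in heading, speed, and timing accumulate leg by leg, which is why
the radio aids in the list above serve as a cross check on the result.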

Many of these points were also addressed by Kyle Schmidt
<> and
David Eklund <>.

Bob Frankston <> replied:

[begin Frankston]
[...] avionics is just an example [...] to emphasize the flaws in the
philosophy of "whole system testing". It just doesn't scale in a world
where there are no systems that work in isolation.

The Internet is a great counter-example. It shifts the responsibility
for reliability (including defining what is meant by reliability) to
the end points and doesn't do more than try to get packets through
reasonably often and reasonably quickly.  See [..]

If we were better able to evolve systems and to add functionality
without decreasing reliability we would be able to take advantage of the
rapid evolution in capability, performance, cost and reliability,
we've seen in computer systems.  Reliability?? Yep. I know that
computers crash (or otherwise get befuddled) but I'm glad to have the
choice of having a very powerful system even if I do have to press
reset once in a while. It is wonderful that the Web allows bad HTML
and stale URLs.

I realize that these issues are as much about marketplace philosophy
as about technology. But a point of Risks is that technology doesn't
exist in isolation.  While it is useful to be aware of risks, the
issue is more one of tradeoffs.  Which risks does one choose and when
does the design paradigm have to change?
[end Frankston]

Now, one can observe that some discussants, including Frankston, focus
on reliability but barely mention safety. For those unsure of the difference,
I quote from Nancy Leveson's text Safeware:

[Leveson, p163] *Reliability* is the characteristic of an item
expressed by the probability that it will perform its required
function in the specified manner over a given time period and under
specified or assumed conditions.  [The second definition on p172 is
similar. PBL]

[Leveson, p175] An *accident* is an undesired and unplanned (but not
necessarily unexpected) event that results in (at least) a specified
level of loss.

[Leveson, p181] *Safety* is freedom from accidents or losses.

What's the difference? A car whose brakes always work but which often
won't start is safe but not reliable. A car whose brakes occasionally
don't work but which starts every time is probably more reliable but
certainly less safe.

Suppose, as many of us and also the FAA believe, safety is a major
issue in aviation data systems. Now, it's easy to see how braking
systems involve safety whereas cabin reading lights may not, partly one
may suspect because they play different roles. But since it has been
argued that a compass and a GPS can be used to play similar roles, and
these roles are informational rather than directly physically-causal,
is there an example of an important safety consideration that would not
also be a reliability issue? Indeed yes - the failure modes of the
compass are more visible and testable; the failure modes of the GPS
can be hidden. Knowing when you're broken is a safety issue.
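
The closing point, that knowing when you are broken is a safety issue,
suggests cross-checking independent sources. A minimal sketch, assuming a
compass heading and a GPS-derived ground track, both in degrees, and a
hypothetical tolerance; wind drift alone can make the two differ, so the
threshold here is illustrative only.

```python
def heading_difference_deg(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two directions, in the range 0 to 180."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def sources_disagree(compass_deg: float, gps_track_deg: float,
                     tolerance_deg: float = 10.0) -> bool:
    """True when the two sources differ by more than the tolerance.

    tolerance_deg is a hypothetical value chosen for this illustration.
    """
    return heading_difference_deg(compass_deg, gps_track_deg) > tolerance_deg


if __name__ == "__main__":
    print(sources_disagree(92.0, 95.0))
    print(sources_disagree(350.0, 15.0))
```

A disagreement flag says only that one source may be wrong, not which one;
deciding that is left to the pilot, which is the dominant-display question
Dovydaitis raises.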

Suppose Frankston were to be right in his more general point that
uncoordinated human market activity can ultimately ensure the emergent
property of reliability - his argument that the WWW is an example is
certainly worth debating. Those who think that such activity might
also ensure safety may wish to ponder whether seated passengers on
commercial jets tend to keep their safety belts fastened in cruise.

The last word on safety goes to Peter Mellor:

[Mellor] I do think it would be a good idea to point out to the
average frequent flier that seeing the pilot getting on board with a
toy GPS should inspire as much confidence as seeing her get on board
with a guide-dog.

edited by Peter Ladkin

Re: Once again, I'm risking my life flying (Frankston, RISKS-19.73)

Jim Wolper <>
Tue, 12 May 1998 16:42:16 -0600 (MDT)
Bob Frankston, in RISKS 19.73, writes about avionics, and raises issues
about complex systems.  I have changed the order of some of the quotes for
clarity.  I have omitted his arguments in favor of redundancy.

> But the new problem is (was) a bad compass. The third compass on the plane
> had to be replaced due to FAA rules [...] what is of concern is that they
> couldn't just go out to the store, buy a GPS, and place it in the cockpit.
> As a passenger, when I bring my GPS and PC, I've got technology far far
> ahead to the technology on the plane.

As Bob notes elsewhere, a transport aircraft is a complex system, but he
seems to ignore the role of the crew in its operation.  Here's an analogy:
since I am a pilot, I probably know more about navigation than the
average train crew, but my bringing a GPS and PC onto a train will
not improve its safety or reliability unless I know something about
trains, switches, signals, and the like.

> The reason that the systems can't be upgraded is that the whole plane
> would have to be recertified as a new aircraft [...]  There is something
> very wrong here. The engineering practices that are supposed to assure our
> safety seem to work to assure our lack of safety.

I would argue that GPS is no safer than other navigation
systems; its advantage is its efficiency.  It is certainly neither
failure proof nor fail-safe.  Nathaniel Bowditch (1773 - 1838), compiler
of the first compendium on navigation, wrote "A prudent navigator will
never limit himself to a single method, particularly one requiring
 ... a device that is subject to mechanical damage or loss."  The
most robust navigational device is the compass.   Speaking as a pilot,
if the chips (no pun intended) were down and I could only have one
navigational device I would choose the compass over the GPS
every time.

> I presume, though, that the mechanical systems try to be
> independent-enough to reduce the propagation of failures.

Actually, American 191, the DC-10 lost at O'Hare on May 25, 1979, uncovered
many interactions in the mechanical systems.

  [Corrected in ARCHIVE copy. PGN]

> But, if we think about the simple example of just placing a GPS in the
> cockpit and allowing the airplanes computer to use the data we have a very
> different model.

I would argue that this is not a simple example.  A GPS would need to be
thoroughly tested for system interactions, wiring would be needed, software
would be needed to interface with the existing navigational systems, company
training manuals would need to be rewritten (I work part-time for a charter
operator, and I am currently rewriting the training manual for one of the
aircraft types we operate; it is a rather large task), the Minimum Equipment
List (which must be consulted when there is inoperative equipment; it's why
the original flight couldn't depart without the spare compass) would need to
be revised, and maintenance and inspection procedures would need to be
revised.  I would be uncomfortable about omitting any one of these steps.

> Yes, there can be strange systemic interactions. But, instead, we have a
> situation that assures lousy navigation rather than permitting
> improvements when available.

The situation does not assure lousy navigation.  It assures good navigation
and avoids strange interactions.

> Understanding how to build such resilient distributed systems is still in
> the challenge category.


Jim Wolper ATP/PhD/CFI, Associate Professor of Mathematics,
Idaho State University; Pilot/Instructor, Avcenter, Inc.
