The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 20 Issue 49

Wednesday 21 July 1999

Contents

o Intercom hang-up caused 1997 train collision
Mark Brader
o Computer-based patient monitor problems: improvements still needed
John Doyle
o Statistical errors in medicine
John Doyle
o Centaur/Milstar Software Error
Peter B. Ladkin
o Small problem escalates into major disruption
Doug Moore
o Computer startup circuits
M. Simon
o Netcom partial e-mail outage
Keith A Rhodes
o junkfilter vs. xxx.lanl.gov
Thomas Roessler
o "Bright Light" POP-based spam filtering: a bad idea
Lauren Weinstein
o E-mail attachments and local names
Avi Rubin
o Ab van Poortvliet: Risks, Disasters, and Management
PGN
o REVIEW: "The Mythical Man-Month", Frederick P. Brooks Jr.
Rob Slade
o Info on RISKS (comp.risks)

Intercom hang-up caused 1997 train collision

<msbrader@interlog.com>
Fri, 16 Jul 1999 00:40:38 -0400 (EDT)
On November 19, 1997, two GO commuter trains collided at low speed in
Toronto's central station (Union Station).  One was stationary, with 800+
people on board, of whom 50+ were injured enough to be taken to hospital;
the other would have been the next train out of the same track, and only
two crew members were aboard.

<http://www.bst-tsb.gc.ca/eng/reports/rail/1997/er97t0299.html> is the
official report, and two things it discusses seem worth a write-up here.
The first is the "dwarf signals" used for shunting moves in the station area,
where it's not uncommon for trains to be sent onto an already-occupied track.
When a move was authorized, the dwarf signal would show green (15 mph okay)
if the track was actually clear AND the movement was in the preferred
direction for that track, but otherwise yellow (15 mph max, but must be
able to stop within half the distance you can see).

This is all right if everyone drives according to the rules, but it means
that yellow sometimes does and sometimes doesn't mean that there's a train
known to be on the track.  And a survey of train crews after the accident
revealed that they generally did not know this.

The second and I think more serious problem was the intercom.  Commuter
trains with diesel locomotives often have the locomotive permanently at one
end, and a driving cab in the car at the other end, allowing quick reversal
between trips, and that's what GO does.  On this occasion the train was
making a short reverse move to change tracks, and the engineer (driver)
elected to stay in the locomotive and drive from the rear for the second
move.  The conductor was to keep a lookout ahead from the cab car.

Now, there is a good deal of train radio traffic in the area, and GO trains
have an intercom between the two cabs, so the crew members naturally used
that instead of their radios.  To start a conversation they would have to
buzz the other cab for attention and wait for the other person to pick up
the handset.  The risk factor is that there was NO indication of when the
handset had been HUNG UP again -- and no official rules for intercoms, which
could have required someone to say when they were hanging up.

The conductor had been on the intercom a little before he sighted the other
train, which was still a safe 200 feet away, and he assumed the engineer
was still listening -- so when called for a stop, he didn't buzz first!
And then when he realized that the train wasn't braking, he didn't have
quite enough time left to correctly decide what it was best to try next.

Mark Brader, Toronto  msbrader@interlog.com


Computer-based patient monitor problems: improvements still needed

"Dr D John Doyle" <djdoyle@home.com>
Sat, 17 Jul 1999 14:46:39 -0400
As an anesthesiologist with 30 years of computer experience I sometimes
give thought to the design of the computer-based instrumentation on
which my patients lives depend. For example, when I put a patient to
sleep for heart surgery, I insert monitoring lines to follow cardiac
filling pressures, cardiac output pressures and other parameters such as
the electrical pacing and conduction properties of the heart. Usually
five or more signals are displayed in real-time on a high-resolution
display. This data is just as important to us as flight-related data is
to an aircraft pilot, and forms the basis on deciding, for example,
which drugs we administer to get out of clinical crises.

Regretfully, personal clinical experience has lead me to realize that
much needs to be done to make patient monitors trustworthy and
ergonomically sensible. Two examples illustrate the point.

I know of two computer-based monitor designs that occasionally "reboot"
in the middle of surgery for no apparent reason (possibly due to
electromagnetic interference from the electrosurgical apparatus, or even
nearby cellular phones, but possibly also software-related), with the
result that the patient is at increased risk during the reboot period
(where you are "flying blind"). The situation is especially frustrating
when the pressure-signal-related zero offset information is lost on
reboot and the pressure transducers must all be rezeroed!

Also, another monitoring system I use frequently will annunciate an
"asystole" (cardiac arrest) alarm when ever the electrocardiogram signal
falls below a certain amplitude. The fact that normal cardiac pressures
are still being generated is ignored by the alarm management software,
resulting in an obviously wrong diagnosis. This kind of haphazard and
ill-conceived alarm arrangement is the reason why many of my colleagues
globally disable all alarms at the beginning of surgery, so they can
concentrate on taking care of the patient rather than following up on
countless false alarms.

So much needs to be done!

D. John Doyle MD PhD FRCPC, Associate Professor, Department of Anesthesia,
University of Toronto  djdoyle@home.com  http://doyle.ibme.utorono.ca


Statistical errors in medicine

"John Doyle" <djdoyle@hotmail.com>
Tue, 20 Jul 1999 10:52:22 PDT
Since physicians are not in general particularly inclined towards
mathematics, some individuals have expressed concern that statistical or
other analytical errors may sometimes creep into medical research reports as
a result. It would appear that, in fact, this is truly a problem. Several
studies of statistical errors present in medical journals have been
undertaken by statistical experts; their results suggest that statistical
errors are frequent in published medical research reports, even in
"prestigious" journals such as The New England Journal of Medicine.

For example, in a British study [1] papers published in the British Journal
of Psychiatry in 1993 were reviewed. Sixty-five (40% of 164) papers
containing numerical results contained statistical errors. While many errors
were not serious, some were viewed to be sufficiently problematic to cast
doubt on the conclusions.  The author concluded that the statistical error
rate was "unacceptably high."

Similarly, in an American study [2] the authors reviewed all papers included
in the Clinical Articles section and transactions of societies sections of
the January through June 1994 issues of the American Journal of Obstetrics
and Gynecology (volume 170, numbers 1 to 6).  The authors concluded: "The
lack of complete and detailed listings of applied statistics made it
difficult to assess the appropriateness of more than half the studies
examined, suggesting a need for more detailed guidelines as to the listing
of statistical procedures used. Despite this fact, nearly one third of the
articles contained examples of statistics used inappropriately."

More recently, a study on the misuse of correlation and regression in three
top-rated medical journals found serious problems [3]. The authors noted:
"Fifteen categories of errors were identified of which eight were important
or common. These included: failure to define clearly the relevant sample
number; the display of potentially misleading scatterplots; attachment of
unwarranted importance to significance levels; and the omission of
confidence intervals for correlation coefficients and around regression
lines."

I suspect that one problem may be the availability of easy-to-use
statistical software that makes no requirement that the user actually
understand the underlying principles behind the tests employed. In any
event, it would appear that the general standard of statistics in medical
journals is shabby. Perhaps special emphasis should given to the necessity
for medical journals to have proper statistical refereeing of submitted
papers. Indeed, some journals, embarrassed by reports such as these, are
doing exactly that.

References

[1] McGuigan SM The use of statistics in the British Journal of Psychiatry.
Br J Psychiatry 1995 Nov;167(5):683-8

[2] Welch GE 2nd, Gabbe SG Review of statistics usage in the American
Journal of Obstetrics and Gynecology. Am J Obstet Gynecol 1996
Nov;175(5):1138-41

[3] Porter AM Misuse of correlation and regression in three medical
journals. J R Soc Med 1999 Mar;92(3):123-8

D. John Doyle MD PhD, Department of Anesthesia, University of Toronto
djdoyle@home.com  http://doyle.ibme.utoronto.ca


Centaur/Milstar Software Error (Re: RISKS-20.36 and 39)

"Peter B. Ladkin" <ladkin@rvs.uni-bielefeld.de>
Tue, 22 Jun 1999 18:43:22 +0200
As reporting in RISKS-20.36 (PGN) and 20.39 (Rhodes), software failed in a
Centaur upper stage of a Titan IVB rocket, launched from Cape Canaveral on
30 Apr 1999, resulting in the loss of an $800-million Milstar satellite.

*Aviation Week* (21 Jun 1999, p21) writes that Air Force officials said that
the error was in the Centaur's attitude control system software.  The
inertial nav unit `perceived' a zero roll rate, which was incorrect,
`creating attitude errors'. The attitude control system tried to correct
attitude, but the incorrect software parameter `prevented the system from
orienting the stage properly'. Attitude control system fuel ran out. The
USAF and LM don't yet know how the software error escaped detection during
verification.

Let me try to interpret this. I'm afraid I won't do very well. All these
control system bits, software or hardware, have inputs and outputs.  An
inertial nav unit (INU) calculates position only, using information from the
dynamics history of the vehicle. To say it `perceived' a zero roll rate is
to say either that something fed it zeros instead of correct information
about roll, or that it gave information out about a roll rate of zero, or
maybe both. To say that this `created attitude errors' is to say that the
control system responsible for controlling the attitude (ACS) failed
(`attitude errors'), and to subscribe this to faulty input from the INU
(`created'). Since the ACS tried to `correct' this, it must mean that
although the ACS was working on faulty information from the INU, it obtained
other input (from somewhere else?) or calculated the information internally
that contradicted this faulty information, and resolved the conflict in
favor of this new information, firing the attitude control thrusters (I
think that's what such things are called). But apparently the faulty info
from the INU still exerted an influence (`incorrect .. parameter prevented
the [ACS] from orienting the stage properly').

As far as I understand such things, when a sensor/info conflict is detected,
a system will act on the conflict by attempting to identify (or, if not,
simply arbitrate) the faulty information and lock it out. Or maybe, if
it's really sophisticated, use Byzantine resolution techniques, if ways have
been found of implementing these efficiently now. I don't know of any system
design which, having identified a conflict, proceeds to use both pieces of
conflicting information further, unless it's trying Byzantine resolution
techniques, which require four participants at least. I don't see the
four here, so this story, even though sparse, still doesn't make sense to
me yet. And I'm loathe to ascribe truth to a story which doesn't even make
sense. But I guess I'm grateful to AvWeek for trying.

Prof. Peter Ladkin, Univ. Bielefeld, Technische Fakultaet D-33501 Bielefeld
GERMANY  +49 (0)521 106-5326/5325 http://www.rvs.uni-bielefeld.de


Small problem escalates into major disruption

<dougmoore@ibm.net>
Sun, 18 Jul 1999 16:51:28 -0400
A very small fire at a major Bell switching centre caused at least 113,000
phone lines to go out in Toronto's downtown business section, including
lines to poison information centres and paramedics, and stopped bank
machines, electronic credit card systems, Interac, many computerized traffic
lights, nearly $1 billion of electronic stock exchange trading, and several
major data networks and internet portals, and disrupted phone services as
far away as Ottawa, Halifax,and Chicago.  911 emergency lines had reduced
capacity but did stay up.  The fire was caused by a technician dropping a
rod in a single switching chamber; which started to fry or melt, and smoke
filled the area and sprinklers went off.  (Later reports say it was electric
arcs, not fire.)  Power in that one part of the building was lost ---
putting some phones out of service.  Employees were unable to get close to
turn the power back on in that area.  (There was no remote power switch to
do it?)  According to reports in the Toronto Star, the emergency power
generator system was not designed to kick in unless power was lost to the
entire building, and other floors still had commercial power.  (Poor
planning?)  According to reports, a telephone company employee cut power to
the whole building so as to get the emergency power to kick in, which it
did, but fifteen minutes later the telephone switching system ran out of
power.  (Insufficient emergency power generating capacity?)  Now the entire
facility was out for another four and a half hours.

  Follow-up, Mon, 19 Jul 1999 06:14:03 -0400

According to a report in the *Toronto Star*, Bell's repairs to its switching
facility that failed in downtown Toronto could not be completed since the
building is kept locked during the weekend.  (Isn't security that prevents
restoration of service not in fact security?)  In what is said to be a
separate incident, about 960,000 residents of a very nearby city found
themselves cut off from 911 police emergency service.  Some phones that
tried to call the number were "frozen".  After about 9 hours, Bell managed
to "call forward" 911 calls to the non-emergency police number, and three
hours later full 911 service was restored.  The cause of the failure is,
apparently, unknown.


Computer startup circuits

"M. Simon" <mlsimon@mail.rkd.snds.com>
Fri, 16 Jul 1999 18:41:57 -0700
Starting up a computer system with all its necessary hardware in a known
state is a notoriously difficult problem. There are so many opportunities
for errors of omision.  There are many and difficult interactions of
hardware and software.

Because the circuits themselves are very simple the work is usually given to
novices.

The real problem is that so few engineers are cross trained.  Few hardware
engineers know software and vice versa.

I blame this on unnecessarily complicated software compilers. Minimum
competency in C takes six to twelve months. I can teach software competency
in my favorite language in 4 to 8 hours. (I did it with some engineers in my
mfg. plant about three weeks ago, so don't tell me I am smoking something)

If you are interested in the name of the language e-mail me.  I am not
interested in starting a general language flame war.

Simon


Netcom partial e-mail outage

"Keith A Rhodes" <rhodesk.aimd@gao.gov>
Fri, 16 Jul 1999 08:50:48 -0500
For about 24 hours after 7pm on 14 Jul 1999, Netcom users with user names
beginning with a, d, f, h, i, k, l, n, q, r, u, or v were unable to read
stored e-mail, because of a hardware failure in a file server in Netcom's
San Jose data center (now part of Mindspring).  On 10 May 1999, an outage
affected only users with names starting with the letter "d".  [Source: ZDNN
(http://www.zdnet.com/zdnn/), PGN-ed]


junkfilter vs. xxx.lanl.gov

Thomas Roessler <roessler@guug.de>
Mon, 19 Jul 1999 18:06:58 +0200
This is essentially a "for the record" contribution on a well-known risk:
False positives with content filtering.

Junkfilter (see http://www.pobox.com/~gsutter/junkfilter/) is a rather
sophisticated set of procmail rules and regular expression lists designed to
catch spam mail.  I've been using it for a couple of months now, and it does
a nice job in preventing at least some of that "make money fast" stuff from
coming through to me, while usually keeping false positives negligible.

Today, I started to obviously miss important messages on an internal mailing
list.  It turned out that junkfilter's body check module triggered on the
string "http://xxx", as in "http://xxx.lanl.gov/abs/quant-ph/9603004".  This
URL was contained in one list member's new .signature.

The underlying problem is, in this case, drawing conclusions from a
(partial) URL on the content referred to by this URL.  (One may also
criticize drawing conclusions from content referred to in e-mail messages to
such messages qualifying as "spam".  Anyway, it usually works. ;-)


"Bright Light" POP-based spam filtering: a bad idea

Lauren Weinstein <lauren@vortex.com>
Tue, 20 Jul 99 11:16 PDT
Greetings.  Bright Light Technologies (http://www.brightlight.com), which
sports an impressive list of technology partners and investors, has
introduced a new "free" service to users of POP-based e-mail (previously
Bright Light has apparently mainly worked through ISPs) that attempts to
filter out most unsolicited e-mail (SPAM) before it reaches the user.  They
do this by trying to detect spam flowing around the net and then applying
filtering rules.  Rejected messages are pushed aside and can be viewed later
if the user wishes, and lists of rejected messages are made available.

I'm a long time spam-fighter myself--I maintain a public spam blocking list
at http://www.vortex.com.  I'm more than willing to declare the concept of
trying to filter out spam (so long as there aren't too many false positives)
to be a good one.  Unfortunately, the method chosen by Bright Light for
end-user POP use is a potentially major invasion of privacy--ironic in light
of Bright Light's written statements that they want to "avoid the appearance
of violating email privacy" (exact quote).

The problem doesn't take a masters degree in Internet engineering to
understand.  To use their new POP service, you have to route ALL of your
inbound e-mail through Bright Light servers.  Your POP account accesses
Bright Light, then they login to your ISP to pick up your mail.  It passes
through Bright Light, and then to you.

From both a Privacy and Risks standpoint, it's hard to imagine a system
more primed for potential trouble.  Any centralization of e-mail handling
systems in this manner, funneling in e-mail from numerous ISPs, represents
an immense target for all manner of mischief--possibly even more attractive
to problems than the largest individual ISPs.  Systems failures and
overloading can happen.  Hackers can target the facilities.  And of course,
the concentration of e-mail traffic could make Bright Light the recipient of
choice for legal actions, by those seeking to track or access e-mail
messages for any number of purposes (an increasingly popular legal maneuver,
as you probably know).  The requirement to provide such information could
occur regardless of how little (or how much) of users' e-mail is "normally"
stored on disk at the service (as opposed to passing through) in the course
of routine operations.  With this service, users now have two entities with
which they have to entrust their e-mail--their "real" ISP, and Bright Light.

Bright Light has other products that send spam filtering rules directly to
ISPs with the spam blocking applied at the ISP level--such services don't
present these same concerns.  The fundamental problem with the new service,
aimed at individual POP e-mail users, is having the full text of users'
total incoming e-mail passing through a centralized third party e-mail
service outside of the users' direct control or affiliation.

This isn't rocket science--it should be obvious that this sort of
centralization of actual e-mail traffic flow is exactly the *wrong*
direction to be moving in.  I'd recommend thinking long and hard before
participating, as an end-user, in any third party service that asks you to
route all of your incoming e-mail through them.  Even with the best of
intentions (and I assume these on the part of Bright Light), and even with a
"free" service, the price is much too high.

My RealAudio "Vortex Daily Reality Report & Unreality Trivia Quiz" for
7/19/99 (see below for URL to the archive of segments) was devoted to this
topic.  Take care, all.

Lauren Weinstein <lauren@vortex.com>; Moderator, PRIVACY Forum
<http://www.vortex.com>; Member, ACM CCPP; Host, "Vortex Daily Reality
Report & Unreality Trivia Quiz" <http://www.vortex.com/reality>


E-mail attachments and local names

Avi Rubin <rubin@research.att.com>
Tue, 20 Jul 1999 17:46:58 GMT
I just had a mildly embarrassing incident because of a Netscape mail
"feature" when sending attachments. I'm sure that other mailers do this as
well. When attaching a file to an e-mail message, the local name of the file
is included in the attachment.  So, the recipient gets to see the contents
of the file, as you would hope, but also the local name that you gave
it. Imagine several possible embarrassing scenarios:

  Attachment: our.most.gullable.client.invoice.doc
  Attachment: proposal.with.doctored.data.xls
  Etc.

It seems that the user should be able to rename the attachment for the
purposes of the message or that the file should just be called attachment1,
attachment2, ...

Avi Rubin


Ab van Poortvliet: Risks, Disasters, and Management

"Peter G. Neumann" <neumann@csl.sri.com>
Thu, 15 Jul 99 17:29:07 PDT
A really interesting book on risks in transportation:

Risks, Disasters, and Management:
Understanding the Management of High Risks and
Its Consequences for Passenger Safety
by Ab van Poortvliet

Eburon Publishers, P.O. Box 2867, Delft, The Netherlands, 1999


REVIEW: "The Mythical Man-Month", Frederick P. Brooks Jr.

"Rob Slade, doting grandpa of Ryan and Trevor" <rslade@sprint.ca>
Fri, 16 Jul 1999 08:33:03 -0800
BKMYMAMO.RVW   990502

"The Mythical Man-Month", Frederick P. Brooks Jr., 1995,
0-201-83595-9, U$24.69
%A   Frederick P. Brooks Jr.
%C   P.O. Box 520, 26 Prince Andrew Place, Don Mills, Ontario M3C 2T8
%D   1995
%G   0-201-83595-9
%I   Addison-Wesley Publishing Co.
%O   U$24.69 416-447-5101 fax: 416-443-0948 bkexpress@aw.com
%P   308 p.
%T   "The Mythical Man-Month: 20th Anniversary Edition"

Brooks' work on software management is a classic, but, even with a quarter
million copies in print, it is more regarded than read.  A great many can
quote Brooks' Law ("Adding manpower to a late software project makes it
later") without knowing that it was Brooks' who said it.  Which is a pity.
I can fully agree with the promotional literature that says Brooks' work is
"timeless."  On the "influential" side, however, I have my doubts.  If
Brooks' truly had the influence he deserved, we would have fewer late
projects.

Brooks was originally writing from his experiences as project manager for
IBM's System/360, and later with OS/360.  It is remarkable how well his
ideas have stood the test of time.  While today we would long for an
operating system that was "bloated" to a mere 400K in size (and I still
haven't figured out the "tables" that keep being referred to), the concepts
outlined for project management are as sound today as when first set down.
And the objections raised against sound management and documentation were as
silly in 1975 as they are today.

Chapter one outlines the joy of programming, and any creative work, and how
tragically that joy can be drowned in the crush of a badly managed project.
Brooks shows very clearly how work can, and cannot, be partitioned in
chapter two.  A very interesting method of structuring a project team is
given in chapter three, slightly weakened by the ship and surgical analogies
which do not fully hold up in programming: software project teams are never
faced with immediate life and death decisions.  The problem of abstracting
all "interesting" work to team leaders is examined in chapter four.

Chapter five notes a tendency to try to overcompensate for prior missed
opportunities, leading to software bloat.  Team communication is looked at
two ways in chapters six and seven.  Project estimation, in chapter eight,
is shown to be a poorly practiced art.  Software bloat is revisited in
chapter nine, and documentation in ten.  Chapter eleven notes something we
have lost sight of in our reliance on demo software: it isn't meant to be a
mere GUI shell, but a working system that we make all the mistakes on.  The
need for tools, outlined in chapter twelve, is well accepted today, but the
insistence on common tools would pull more than a few programmers up short.
Modularity, promoted in chapter thirteen, is also praised today--but not
always used.  Chapter fourteen has a very solid grasp of the reasons for
project slip.  Documentation, of the right sort, is reviewed in chapter
fifteen, including an attack on the venerable flowchart.

The preceding chapters are all essentially unchanged from the original 1975
edition.  Chapter sixteen is Brooks' "No Silver Bullet" paper, wherein he
opined (in 1986) that programming was an inherently complex and difficult
task, and that no development in the next ten years would provide a
productivity increase on the level of an order of magnitude.  (It is
interesting that Java was unleashed upon the world at the end of that ten
year projection, but, three years later, it hasn't opened any programming
floodgates either.)  Brooks points out objections to "NSB" in chapter
seventeen, and answers them.  The first edition is recast in point form in
chapter eighteen.  Chapter nineteen analyses what was right and enduring in
the initial material, and what was wrong in the first place.

An enduring classic that deserves to be read and re-read.

copyright Robert M. Slade, 1999   BKMYMAMO.RVW   990502
rslade@vcn.bc.ca  rslade@sprint.ca  slade@victoria.tc.ca p1@canada.com
http://victoria.tc.ca/techrev    or    http://sun.soci.niu.edu/~rslade

Please report problems with the web pages to the maintainer

Top