The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 11 Issue 14

Wednesday 20 February 1991

Contents

o Another Computer Application fiasco story
Christopher Allen
o Software failures and safety of American (Red Cross) blood supply
Rob James
o MD-11 computer problems
Steve Bellovin
o More on very reliable systems
Anthony E. Siegman
o Re: Predicting system reliability
Henry Spencer
o Broadcast LANs and data sensitivity
Henry Spencer
o Re: software warranties (re: SysV bug)
Brian Kantor
David Lamb
Steve Eddins
Peter da Silva
Henry Spencer
Flint Pellett
o Info on RISKS (comp.risks)

Another Computer Application fiasco story

Christopher Allen <Allen@RELAY.PRIME.COM>
20 Feb 91 11:09:43 EST
The following copied from the (Manchester, UK) Guardian Weekly, Feb 17, 1991:

                         High-tech blackout

Whitehall's auditors are unable to verify Foreign Office spending last
year because of a breakdown in the main computer controlling the accounts.
John Bourn, the Comptroller and Auditor General, has refused to approve the
Foreign Office accounts, covering embassies, Nato, the United Nations, military
aid, the BBC World Service, and the British Council, because the ministry
cannot produce accurate evidence of the spending.

At one stage auditors found discrepancies totalling 458 million pounds between
the money granted to the Foreign Office and their own records.  After extensive
checking the auditors were left with 26 million pounds of imbalances in
accounts for embassies, external relations, the BBC and the British Council.

The trouble began when it was decided to replace its six-year-old computer
system with a new high-technology version.  Ministers allocated 560 thousand
pounds for the scheme, but ended up paying 937 thousand pounds employing a
software company, Memory Computers, which failed to deliver on time and went
into liquidation just after it did deliver.  Meanwhile, a hard disc shattered
inside the old computer, destroying all the information, and leaving officials
to rely on the new untried system.  Within months it started shutting down
unexpectedly, and inexplicably posting money to the wrong accounts.  All the
bookkeeping staff left and their replacements were not able to familiarise
themselves properly with the system to prevent further errors.

A consultant from the bankrupt software company is now working for the FO at a
salary of 53 thousand pounds a year to try to solve the problems.  MPs on the
Commons Public Accounts Committee are to summon Sir Patrick Wight, permanent
secretary at the FO, to explain the mess.

[FO=Foreign Office, MP=Member of Parliament, 1 pound ~ $2]


Software failures and safety of American (Red Cross) blood supply

<JAMESRC@QUCDN.QueensU.CA>
Wed, 20 Feb 1991 12:51 EST
Recently "60 minutes" presented a report on the American Red Cross and on the
accidental release of untested donations to transfusion centres.  The FDA
documented >2000 such cases, although (to my knowledge) no documented cases of
transfusion associated infection have yet been reported in the literature.
Some of the failures were associated with software problems in the various
regional AmCross centres, but that is as much as I know.

If anyone has further information on these (apparent) software failures,
I would appreciate additional information.

Rob James, Department of Community Health and Epidemiology, Queen's University
Kingston, Ontario, Canada K7l 2M1 613-542-3696


MD-11 computer problems

<smb@ulysses.att.com>
Wed, 20 Feb 91 15:50:02 EST
According to the AP, American Airlines has suspended flight certification
tests on a new MD-11 plane because of ``computer problems''.  That, and
some other problems with fuel economy, may lead the airline to refuse
delivery of a second MD-11.  No technical details on the computer problems
were given in the article; does anyone on RISKS know more?


More on very reliable systems

Anthony E. Siegman <siegman@sierra.Stanford.EDU>
Wed, 20 Feb 91 12:13:10 PST
   Anyone concerned with the subject of multiple correlated failures in systems
with very reliable individual components should look back at the incident some
years ago when a United Airlines jet lost all three engines simultaneous in
flight over the Caribbean south of Miami.

   As best I recall, a mechanic servicing the plane had made the same mistake,
leaving out a needed O-ring (!), on the oil pressure sensors in all three
engines.  He did this because the stockroom clerk, who normally installed the
O-rings on the sensors before handing them to the mechanic, was temporarily
away, so the mechanic went behind the counter and got the sensors himself.

  This incident seemed to have multiple classic elements:

   1.  Minor change in procedures had major consequences.

   2.  The problem was really a false alarm, i.e., the oil
       pressures were OK, just the sensor indications were wrong.

   3.  Confident claims that multiple jet engine failures were
       totally improbable proved completely wrong.

Oh, they did get one (or two?) of the engines restated, just in the
nick of time, however, and limped into Miami.


Re: Predicting system reliability

<henry@zoo.toronto.edu>
Tue, 19 Feb 91 23:40:50 EST
>Argument 2: A system as complex as SDI can never be evaluated in a way which
>would give reasonable grounds for claiming that it would work correctly when
>deployed.

Of course, just what constitutes "reasonable grounds" is itself something
that should be part of the specifications, and it is something that may
have to be justified.  None of the complex systems designed for fighting
nuclear wars -- including the ones whose supposed efficacy has preserved
the peace of the planet for circa 40 years -- has *ever* been evaluated
in such a way (i.e. under nuclear attack!).  The design of test criteria
for such systems all too easily becomes a second-order way of cooking up
fallacious "proofs" that the system is anywhere from trivial to impossible.

Our lives depend frequently on systems that cannot possibly be tested to the
"reasonable confidence" point before they are used... if you interpret that to
imply reasonable confidence under the worst conceivable conditions.  No
airliner is ever tested in six-sigma turbulence.  No building is tested in
once-per-century wind loads.  Space-shuttle payload limits are based on landing
weight for the "Return To Launch Site" abort mode... a procedure that has never
been tried and which some astronauts doubt is survivable.  Operating-system
kernel software is rarely stress-tested under truly severe operational loads.

(As an example of that last, one of the major RISC-processor manufacturers does
massive simulation of new designs, to the point where their machine rooms go
from fairly quiet to furiously active at the time in late evening when the
night's batch of simulation jobs fire up.  This sudden huge surge in load has,
I'm told, had a serendipitous side effect: at least once it uncovered the
existence of very short "interrupt windows" in kernel code, where erroneous
assumptions about the atomicity of operations caused system failures only if an
interrupt struck in a window about a hundred nanoseconds long.  (Specifically,
the programmers had assumed that incrementing an integer in memory was an
atomic operation, which is sort of true on single-processor CISCs but is rarely
true on multiprocessors or RISC systems.)  The code containing this botch is
now theoretically obsolete, but it was in wide production use before the
problem was discovered and is probably still in use here and there.)

In traditional engineering, it is routine to assess worst-case behavior based
on extrapolation from less severe testing.  A demand that the worst case be
tested is often a disguised call for a system's cancellation, since such
testing is seldom feasible for large systems.  The proper consideration is not
whether we can safely extrapolate from less severe tests, because we must rely
on such extrapolation and we already do; the questions are how best to do such
extrapolation and what form of testing must be done to permit confident
extrapolation.
                         Henry Spencer at U of Toronto Zoology   utzoo!henry


Broadcast LANs and data sensitivity

<henry@zoo.toronto.edu>
Tue, 19 Feb 91 23:45:37 EST
>How much of your data on the average network is really a security issue?
>I work here at Boeing and, at least in my area, sensitive data is not
>kept on the network ...

Unfortunately, this is probably a definition of "sensitive" which is too narrow
to be really applicable.  How much of your area's data could be posted on a
bulletin board in the bus station without upsetting Boeing or one of its
customers?  Probably not much.  Even material which is not "sensitive" in a
military sense is often of commercial value or significant to privacy.

                          Henry Spencer at U of Toronto Zoology    utzoo!henry


Re: software warranties (was: the SysV security bug)

Brian Kantor <brian@ucsd.Edu>
20 Feb 91 06:00:25 GMT
>From: rick@pavlov.ssctr.bcm.tmc.edu (Richard H. Miller)
>[As an aside, I have a difficult time understanding why a person who does not
>pay for software maintenance expects to have bugs fixed. If you choose to
>not pay for the service, then don't expect the vendor to fix it for free.]

I do expect the vendor to fix it for free.  When I pay for software, I do so
under the assumption that it would perform as specified, not that it sorta
might work kinda like what the manual said.

When you buy a device (say, a car) you are granted a warrantee that it's free
of defects, and will remain so for some length of time that is predicated
(disregarding, for the moment, marketing factors) on the designed length of
time it's to operate before it wears out.  Software does not wear out, although
it can become obsolete, so I would maintain that a piece of software which is
found, at any time, to not perform as specified at time of purchase is
defective and must be remedied by the vendor.  A manufacturer must remedy the
errors of his employees.  That you hire peoples' mistakes when you hire their
talents is one of the RISKS of doing business.

A company which is selling defective or adulterated materials should be made to
replace those materials with goods of the quality advertised.  I see no reason
that software should be exempt from this basic principle of fair dealing in
business.

Vendors who require paid maintenance agreements to repair faults where their
software does not perform as specified are ripping off their customers.  I
believe that it is unjust enrichment.

Does it seem fair to you that the product you bought does not work as specified
and could not have been tested by the maker of the product or he would have
found out that it didn't so work?  Not only have you been burned by buying
something that is broken, but you've had to act as the manufacturer's unpaid
quality control department, and they want you to pay for the privilege of
getting a working item!  Recall the definition of chutzpah - the man who kills
his parents then begs mercy from the court because he is now an orphan.

Now while I've overstated this for effect, I would really like people to think
about software warrantees, before the courts do it for us.  Ethics ARE
important: a professional should strive to be "an ornament to his profession."
    - Brian


serious bug in SVR3.2 gives root access easily (RISKS-11.13)

David Lamb <dalamb@avi.umiacs.umd.edu>
20 Feb 91 14:07:18 GMT
A *bug* is a defect in the product.  Why *shouldn't* the vendor fix it for
free?  Enhancements, I'll agree, ought to be paid for.  Maybe this is diverging
from what RISKS ought to talk about, but...What are the risks to society of
fostering an attitude that the vendor has no responsibility for defects like
the security bug mentioned?

"Due to a design defect, your 1986 Model X car blows up if hit from behind in
an accident.  What, you didn't buy the $4000/year maintenance agreement?**
Sorry, no free recall for you.  Buy a 1991 Model X."

**25% of purchace price per year: a software "maintenance" price I've seen a
few times.
                                                  David Alex Lamb


Charging for bug fixes (was: serious bug in SVR3.2 ... RISKS-11.10)

Steve Eddins <eddins@uicbert.eecs.uic.edu>
Wed, 20 Feb 91 14:37:22 GMT
I'll be the first to admit that I don't understand the economics of the
software industry and the in's and out's of software maintenance.  I'm just a
customer.  However, as a customer, I expect that when I pay for any product it
will work as advertised.  If it doesn't work, and the vendor wants to charge me
more money to make it work, I will cease purchasing from that vendor.

Steve Eddins    University of Illinois at Chicago, EECS Dept., M/C 154,
1120 SEO Bldg, Box 4348, Chicago, IL 60680  (312) 996-5771


Re: RISKS DIGEST 11.13

Peter da Silva <peter@taronga.hackercorp.com>
Wed, 20 Feb 1991 13:34:26 GMT
Poster #1:
> [As an aside, I have a difficult time understanding why a person who does not
> pay for software maintenance expects to have bugs fixed. If you choose to
> not pay for the service, then don't expect the vendor to fix it for free.]

Why not? The bug is the vendor's responsibility. Imagine Ford selling cars
with doors that unlocked if you hit them in the right place. Do you think
they would get away with only providing fixes to people with maintainance
agreements?

Poster #2:
> Because these systems are so 'cheap' (you can build a nice workstation for
> significantly less than $1000),

Could you explain how? My own system was more than that and I bought it
from a friend for a phenomenal price. The operating system alone is a
significant part of the cost.

I've seen 386SX platforms for $875. Add $375 for the cheapest runtime
only 2-user UNIX I've seen, and you're already at $1250. Add $300 for
the RAM and as much or more for the bigger hard drive, and you're in the
area of $2000. Now, $2000 I might be able to believe.


Re: serious bug in SVR3.2 gives root access easily (RISKS-11.10)

<henry@zoo.toronto.edu>
Wed, 20 Feb 91 13:12:44 EST
As I understand it, the people in question are not demanding free software
maintenance.  They are demanding that when they pay good money for software,
they get software that meets specifications at least to the extent of being
free of catastrophic flaws, and that if this cannot be assured at time of
purchase, some minimal effort is made to assure it when flaws are found.

                                         Henry Spencer at U of Toronto Zoology


Re: serious bug in SVR3.2 gives root access easily (RISKS-11.10)

Flint Pellett <flint@gistdev.gist.com>
20 Feb 91 16:31:20 GMT
I would say that there is no reason software should be any different than
anything else you buy.  If you buy a new appliance, you get a guarantee for 90
days against defects in materials and workmanship, and when you buy software
you ought to get a similar guarantee.  A bug like this is a defect in the
workmanship, and if you are still covered by the guarantee, you ought to get it
fixed free.  But if you are past the 90 days (or however long) and you didn't
buy a maintenance contract, then you are (and ought to be) just as much out of
luck as if you didn't buy a maintenance contract on your fridge.  Software
buyers are likely to start paying attention to how long the guarantee is for,
and not buy from companies with really short guarantee periods.

However, there is another category of safety related bugs which have always
been in a seperate class: If the gas tank in your Pinto tends to explode, or
the car tends to shift itself into gear by itself, Ford does a recall (often
because the Govt. ordered it to) and fixes it at their expense.  The laws about
automobile recalls are a lot stricter than the ones on Software Developers, but
if software companies can't clean up their act themselves, then what we are
going to end up with is a big Federal bureaucracy monitoring stuff like this
just like the one that monitors the automakers.

This particular bug is pretty clearly a "safety related" bug, in that failure
to fix it could result in substantial losses by customers.  I believe the
software vendors also have a responsibility to let their customers know about
it too, and need to mail something out to all their users they know about that
either gives them the fix or tells them how to get it, just like the recall
notices that you get if something is wrong with your car that could affect your
safety.  If too many vendors decide that it will cost too much now to act
responsibly, then they're going to end up having to deal with a bunch of
federal regulators who will make them act responsibly, and it will end up
costing them (and us software buyers) a lot more.

Flint Pellett, Global Information Systems Technology, Inc.,
1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165 uunet!gistdev!flint

Please report problems with the web pages to the maintainer

Top