The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 5 Issue 2

Friday, 12 June 1987

Contents

o Three gremlins on the loose: nukes, sharks, enlightened rockets
Dave Platt
o Yet another air-traffic-controller foul-up
Roy Smith
o National Crime Information Center access
PGN
o Yes, Virginia, There Are Software Problems
Nick Condyles
o Heisenbugs; Also, Risks of Supercomputers
Eugene Miya
o Info on RISKS (comp.risks)

Three gremlins on the loose: nukes, sharks, enlightened rockets

Dave Platt <dplatt@teknowledge-vaxc.ARPA>
Thu, 11 Jun 87 08:08:13 pdt
  [Whereas the following three incidents are not directly computer related,
  they are clearly RISKS related, providing further examples of the unexpected
  technical failures.  On examining my list of collected cases, I am astounded
  to discover how many of the cases were due to causes that (a) had never
  occurred before and (b) no one had even anticipated might occur.  PGN]

This morning's San Jose Mercury News contains three interesting
articles, almost one-after-another.  Their headlines, and a summary of
their contents:

* "Official says many U.S. atom bombs are faulty"

   More than one of every three nuclear weapons in the U.S. arsenal
doesn't work properly, [Sylvester Foley, asst. sec. of energy for
defense programs] said yesterday [during a breakfast meeting with
reporters].

   "Hypothetically, it could be catastrophic if you ever wanted to use
it and you pushed the button and nothing happened", said Foley.

                  { no comment... dcp }

 * "Jaws V: Sharks make light lunch of trans-Atlantic fiber-optic cables"

   Sharks seem to have taken a fancy for the new 1"-thick underwater
cables being strung across the Atlantic;  they've bitten through the
cable at least four times (cost-to-repair $250,000 per bite).  Similar
bitings have been reported in the Pacific.  Older-style copper-wire
cables apparently weren't attractive to sharks, and weren't bitten.

   [The NYTimes article suggests that the sharks seem to be attracted to the
   low electrical currents in the copper wires providing power for the
   fiberoptic repeaters.  The new cables are still shielded, but much thinner
   than previously.  PGN]

* "Lightning hits NASA launch pad inadvertently setting off 3 rockets"

   A lightning strike on a pad at NASA's launch facility at Wallops
Island, VA. ignited three small rockets and sent two of them hurtling
along their planned trajectories;  they were more than two miles out
before the NASA officials could prepare to track them.  The third
rocket wasn't in firing position, and splashed down in the ocean 300
feet from the pad.  It appears that a lightning bolt may have induced
a current pulse in the firing leads, triggering the igniters.

   One of the rockets was scheduled to have been launched shortly
thereafter to study night-time thunderstorms and their effect on the
atmosphere.

    [The weather-satellite ground station was also affected, despite
    surge suppressors and grounding wires that were supposed to protect
    it.  Weather data from GOES-West could not be received for 9 hours,
    and GOES-East was blacked out for 5.5 hours.  NYTimes, same day.  PGN]


Yet another air-traffic-controller foul-up

Roy Smith <cmcl2!phri!roy@seismo.CSS.GOV>
9 Jun 87 20:24:16 GMT
    From the NY Times, June 9, 1987, page A22:  "Coding Error at O'Hare
Brings Two Jets Much Too Close Together".  Comments/elipsis in [brackets]
are my summations or deletions from the original.

    "Two American Airlines jets were guided past each other at a
  fraction of the minimum legal distance of separation between them because
  of an error by an air traffic assistant, the National Transportation Safety
  Board said yesterday.  [This happened late last month.]

    "But Mr. Engen [chief of the F.A.A.] said he did not believe that
  the incident justified any permanent reduction of, or special restraints
  on, Chicago area traffic.  He said the F.A.A. had determined that existing
  procedures, "when properly applied, do maintain a safe level of operation
  in the Chicago area."  [...]

    "According to officials of the safety board and the F.A.A., here is
  what happened:

    "An air traffic assistant in the tower at Chicago's O'Hare
  Internation Airport assigned the wrong code to the crew of a United
  Airlines plane that was about to take off.  The code was the one for an
  American Airlines plane that took off three minutes later.

    "Result: Confusion

    "The result was that the full-fledged controller whose job it was
  to radio subsequent air traffic instructions to the two planes saw the
  United plane on his radar scope with the American plane's identification,
  altitude and other data.  The computer feeding this data to the radar scope
  rejected duplicate data for the American plane, rightly concluding that two
  planes could not have the same code.  The American plan showed up only as a
  bright radar blip, without any identification alongside.

    "Now, the controller wanted the United plane, with the American
  plane's data next to it, to climb to a new altitude.  When he radioed
  instructions to "American 637" to climb, the United crew ignored them, and
  the instruction were carried out by the real American 637.  [This caused
  American 637 to pass another American plane at 1/4 mile horizonal and 500
  feet vertical separation -- the rules call for 3 miles and 1000 feet.]

    "[The quick fix is to add more supervisory staff to the O'Hare
  tower.]


    The biggest question in my mind is why does the system just drop
one plane if it sees two with the same id?  Shouldn't bells ring and lights
flash when an obvious operator-error situation like this pops up?  Also,
why isn't the plane id part of the info sent by the on-board transponder?
Aren't these id's simply the flight numbers assigned by the airlines?  Why
should it require any manual intervention to assign an id to a blip?

Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016


National Crime Information Center access

Peter G. Neumann <Neumann@CSL.SRI.COM>
Fri 12 Jun 87 07:27:29-PDT
Also in the NYTimes on the same day as the bombs, the sharks, and the
enlightning rockets was a front-page item on a proposed expansion of the
NCIC database that would permit Federal, state, and local law-enforcement
agencies to exchange information on people who are suspected of a crime but
have not been charged.  A government advisory panel also recommended that
the NCIC should have access to other databases such as those of the
Securities and Exchange Commission, the Internal Revenue Service, the
Immigration and Naturalization Service, Social Security, and State
Department passport office.  Attempts to legislate controls were also noted
in the Times article, although concerns for protection of privacy seemed
not to be commensurate with inadequacies in the technology and practice of
protecting systems and networks from internal and external misuse.  PGN


Yes, Virginia, There Are Software Problems

Nick Condyles <condyles@talos.UUCP>
8 Jun 87 20:46:10 GMT
The following appeared in the Richmond Times-Dispatch June 8, 1987
Section B page 1.  The article pertains to the state of Virginia's
attempt to automate distribution of child support payments.


                 STATE PAYMENTS FOR SOFTWARE SYSTEM HALTED

    State officials have halted payments on a problem-plagued computer
software system used in distributing child support checks and say the
manufacturer will get no more money until the system works.

    The state secretary of human resources, Eva S. Teig, said Saturday that
the state has paid Unisys Inc. only $75,000 of the $395,000 total "because
we are not satisfied with the product.  Until we are satisfied, we will not
continue payments on the program."

    Ms. Teig's announcement came as she prepared to submit a confidential
report on the child-support automation problems to Gov. Gerald L. Baliles.
The report "is a look at what has happened and a recommendation on where I
think we should be going," she said.

    The software system has been blamed for massive snarls in the
distribution of child support checks.

    If state officials are unable to work out the problems, Ms. Teig said,
"we'll look at other alternatives.  We're committed to making it work."

    Del. Howard E. Copeland, D-Norfolk, last month asked Attorney General
Mary Sue Terry to investigate the state Department of Social Services'
purchase of the software and determine whether it has been acquired through
legal procedures.

    Bert Rohrer, a spokesman for the attorney general, refused to comment
on whether an inquiry is being conducted.

    The purchase of the software was part of the department's attempt to
take over collection of child support payments, a job that had been done by
the courts.  The centralization brought widespread complaints about lost
checks, delayed payments and improper seizure of tax refunds.

    Social Services Commissioner William L. Lukhard approved the purchase
of the software system from Unisys, formerly Sperry Corp., in May 1985.
Critics say the software never worked properly, causing further delays in
the already slow distribution of checks.

    A federal audit last month said operating the new computer system would
cost Virginia up to $7 million per year, more than three times the amount
originally anticipated.

    Representatives of Unisys and state computer experts are analyzing the
system to determine if it can be corrected and "some progress has been
made," Ms. Teig said.

    The state is prepared to take further action if the software cannot be
salvaged, Ms. Teig said.  She would not elaborate.

Several state legislators have suggested that Lukhard be replaced as social
services commissioner.

    "As I've said before, I think the best thing for the whole department
is someone who would come in with a new broom and sweep clean," said Del.
Frank D. Hargrove, R-Hanover.  "The buck stops at the boss's desk."

    Others defended the agency chief.  "I would be disappointed and upset
if he was driven out of office," said Sen. Joseph V. Gartlan Jr., D-Fairfax.
"I regard him as one of the best administrators in state government."


        Nick Condyles


Tue <Date: 09 Jun 87 13:58:02 PDT>
Tue, 9 Jun 87 17:42:20 PDT
Regarding your Contributions from the Editor in SEN on Heisenbugs
(RISKS and SEN v12, n 2, Apr. 1987):

I felt I should bring to your attention the major problems in the area
of concurrent or parallel computing.  I am concerned because too many people
are using the buzzword of "parallel computing" without fully understanding
the consequences (not just in the supercomputing disciplines).

Many people are making vast assumptions on the synchrony and determinism
of such systems and on the ability of new compilers to utilize vector
and parallel processing.  I wish to point this problem out because
as parallel architectures move into the work space, these problems will
increase and they are moving into areas such as SDI which have
serious consequences.

One sub-problem I have noticed in debugging parallel codes is that many of
the existing software engineering tools and methodologies are very
sequentially oriented.  I have noted sequential version numbering
of certain file systems (manufacturers will not be mentioned)
are great hinderances in working in highly parallel environments.

The problem of Heisenbugs is not entirely new in this realm, and
I would like to point out significant technical report released
some time ago on the problem:

%A Dale H. Grit
%A James R. McGraw
%T Programming Divide and Conquer on a Multiprocessor
%R UCRL-88710
%I Lawrence Livermore National Laboratory
%C Livermore, CA
%D May 1983
%X Test programming the Denelcor HEP using HEP asynchronous FORTRAN.
Has interesting insights to MIMD programming of the future.

This paper does have other antecedents (such as the Computing Surveys
article by Anita Jones on multiprocessor experiences [1980]), but the
above paper points out an attempt at debugging a program which completely
interfered with (improved) the functioning of a buggy program.  Languages
like Ada(tm) cannot be expected to completely solve these problems, nor
can their support environments, since these problems are still in the
realm of open research issues.

Heisenbugs won't be restricted to parallel programs.  Non-deterministic,
self-modifying artificial intelligence programs also have the potential
have showing these behaviors.

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

Additionally, I add the file I have on the risks of supercomputers:

Perspective:
Supercomputer is just a marketing term  -- Enrico Clementi (IBM)

Definition:
A supercomputer is typically defined as the fast computer at a given
point in time.
Another definition is any computer which turns a CPU-bound problem into
an I/O-bound problem. {Ken Batcher}

Risks of supercomputing

1) User bites off more computation than can chew: 27 hours weather for
24 hour forecast.  Supercomputers and parallel processors are O(n)
solutions for O(n^3) problems in some cases.  Danger of the space
filled by the computation (take the space or what ever).  It is also
known, now, that the NWS can be liable for weather forecasts.

2) Lack of software and the fact it is state of the art hardware
(tends to be not as well shaken out) not necessary fault-tolerant
(PEPE [a BMD computer] might have been an exception).  The typical
supercomputer application tends to be behind the software engineering
art.

3) A perceptual power trip problem: most powerful thing we have can
handle ANYTHING.  Stop thinking and do more computing.

4) Normal risks inherent in all computer analysis and simulation: bad data,
incorrect models, only faster, and in some cases harder to detect errors.

5) Many of the newest designs are using radically different architectures:
hypercubes, ensembles like the Connection machine, shared memory systems,
hardware semiphores like the HEP, data flow machines, ELI (or VLIW) machines,
vector processors and so forth.  Programming applications for these machines
has been described as programming in 1946.  A comment ascribed to Steve Squires
of DARPA attributed that perhaps only 20% of programmers out there could
make the transition to working on these new machines. It's not just loop
or recursion problems, consider staravation and deadlock in applications
programs.  People waiting for smart compilers better not hold their breath.
You should remain skeptical of any claim of sophistication (i.e., drop your
dusty deck in this machine).

A problem I have: tools designed for sequential programming environments
are difficult to use in parallel environments.  The VMS/RSX version numbering
system is a real pain because it has a serial numbering.  This is a bottleneck
for some of my programs.

Lastly:
n) The physical embodiment problem (Ken Thompson): harm by dropping out
of a plane.  (A mention of this problems with the Belle chess machine:
a specialied architecture, but not a super).

Please report problems with the web pages to the maintainer

Top