The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 10 Issue 33

Friday 7 September 1990

Contents

o Critical military computer systems
Clifford Johnson
o Complexity, reliability, and meaningless arguments
Nancy Leveson
o Re: "Wild Failure Modes" in Analog Systems
Jan Wolitzky
o Analog vs Digital Controls
Martin Ewing
o Chaos
Peter da Silva
o Re: Software bugs "stay fixed"?
Bruce Hamilton
K. M. Sandberg
Andrew Koenig
Michael Tanner
o Boot camping
Timothy VanFosson
o Info on RISKS (comp.risks)

Critical military computer systems

"Clifford Johnson" <A.CJJ@Forsythe.Stanford.EDU>
Fri, 7 Sep 90 10:19:42 PDT
The herein-debated list of critical computer applications, in which reliance on
computers is to be avoided includes, re defense, mere early warning systems.
Presumably, Space Command's rate of false alerts, and the Vincennes shootdown,
contribute to this opinion.  But there is an important nuance neglected in
challenging the warning systems -- early warning is clearly beneficial,
problems arise only when an immediate ("use-or-lose") decision to retaliate is
contingent upon it.  Thus, it is really the de facto computerization of
decision-to-shoot procedures that is at fault, not the neutral computerization
of warning information.

And so I would not avoid early warning systems, which can greatly assist taking
evasive or preparatory actions, but would squarely challenge the
computerization of command and control systems.  The leading example of such
damnably dangerous computerization is the under-development, half-billion
dollar Rapid Execution And Combat Targeting system, which will enable virtually
instantaneous launch of U.S. ICBM's within a couple of minutes, at all times.
This includes the introduction of PC's into launch silos, which will automate
launch code verification, and which will provide some sort of direct electronic
interface with the missiles.

Besides actualizing launch on warning and sudden first strike capabilities, the
implementation of REACT would seem to add to the risk of an accidental launch,
even without a flimsy attack warning.  (If launch codes are received at the
silos, standing orders require their immediate execution...)


Complexity, reliability, and meaningless arguments

Nancy Leveson <nancy@murphy.ICS.UCI.EDU>
Thu, 06 Sep 90 19:50:38 -0700
To save my having to mail this information individually to the many people
who have asked:

   The next meeting of SC 167 (the RTCA committee rewriting DO-178A) will be
   November 6-9 in Herndon VA (outside of D.C.).  You can get on the mailing
   list for notification of meetings by calling the RTCA (Radio Technical
   Commission for Avionics) at (202) 682-0266.

With regard to the complexity discussion, does the question of whether one
generic type of system is more complex or more reliable than another even
make any sense? The same function can be implemented in a simple or "complex"
way using any generic type of components.

Consider Rube Goldberg's design for a "simplified" pencil sharpener.  It starts
with a string attached to a kite flying outside a window.  When the window is
opened, the string lifts the door on a cage filled with moths allowing them to
escape and eat a red flannel shirt hanging above the cage.  As the weight of
the shirt decreases, a shoe (attached to the top of the shirt via a string
through a pulley) becomes heavier than the shirt and starts to move downward,
flipping a power switch on.  When the power goes on, an iron on top of some
pants on an ironing board burns a hole in the pants, creating smoke which
enters a hole in a tree trunk next to the ironing board, smoking out an opossum
which jumps into a basket from a higher hole in the tree, pulling a rope that
lifts a cage door allowing a woodpecker to chew the wood from the pencil
exposing the lead.  There is also an emergency knife which is always handy in
case the opossum or the woodpecker gets sick and can't work.

One could argue that Goldberg's simplified design has a larger number of
failure modes with a high probability of occurring and therefore will be less
reliable than more traditionally-designed pencil sharpeners.  However, his
design, although it may fail more often, has the backup knife which may result
in a higher probability of resulting in having a way to get your pencil
sharpened (even if a cat comes in through the open window and distracts the
opossum and the woodpecker) than a traditional pencil sharpener without the
knife.  So it is not only the number and probability of the failure modes that
counts, but also the ways you have provided for coping with component failure.

Consider also that a knife alone would be much more reliable than even a
regular pencil sharpener (especially one of the Ginzu knives that the TV
spokespeople tell me never get dull).  But it is definitely less safe in terms
of potential for drawing blood.  So if safety rather than reliability is your
higher priority goal ...

When comparing the reliability and safety of mechanical/analog systems and
digital systems, you need to consider:

  1)  Confidence and the ability to measure or assess reliability and
      safety in our systems may be more important than other factors.
      I would prefer to design critical systems with components having known
      failure modes and failure rates than those that MIGHT have lower failure
      rates, but also might have higher ones and I have NO way to determine
      this with high confidence.

  2)  Analog and mechanical designs are often reused and perfected over long
      periods of time.  Not only does this tend to eliminate design errors,
      but it allows for high confidence in the failure rates and the projected
      failure modes.  Do unexpected failure modes pop up occasionally that were
      not expected?  Sure, so what? -- the alternatives are worse.

  3)  Wearout failures are much easier to detect and protect yourself
      against (e.g., simple redundancy usually provides adequate protection)
      than design errors resulting in erroneous answers.

  4)  Tools and methods for building systems reliably and safely may be as
      important as other factors.  For example, system safety engineers have
      many time-honed procedures for assessing and enhancing safety in
      analog/mechanical systems but few of these have been extended to digital
      systems.  Same applies to mechanical engineers.  And they tend to be
      trained in using these procedures.

  5)  Because it is (seemingly) easy to provide a great deal of functionality
      with little increased cost or trouble, digital components tend to have
      greater functionality demanded of them (it is the usual argument for
      replacing mechanical/analog devices).  This increases the probability
      of design errors.

  6)  ... [lots of other complicating factors]


Re: "Wild Failure Modes" in Analog Systems (Hoover, RISKS-10.31)

<wolit@mhuxd.att.com>
Fri, 7 Sep 90 10:02 EDT
Might as well carry this nit-picking one level further.  As long as
your computer's transistors, capacitors, or whatever rely on electrons,
photons, or other quantum-mechanical wave/particles with discrete
states, you are justified in considering them to be digital.  But this
is all silly -- the implementation is irrelevant.  If you can treat
the computer as a black box that behaves digitally, why not label it
as such?

Jan Wolitzky, AT&T Bell Labs, Murray Hill, NJ; 201 582-2998 att!mhuxd!wolit
(Affiliation given for identification purposes only)


Analog vs Digital Controls

Martin Ewing <WING@Venus.YCC.Yale.Edu>
Thu, 6 Sep 90 22:27 EDT
Analog controls are not really the opposite of digital.  The main difference is
that digital logic often uses saturated transistors and obscure data coding as
a representation, or analog, of a physical parameter.  Digital systems do tend
to use an enormous number of transistors for even the simplest operations, but
they are integrated into a manageable number of chips.

Analog systems are plagued by poor gain calibrations, temperature drifts,
nonlinearities, and noise.  Nonlinearities can result in saturation and
"latch-up" behavior.  AC systems suffer from crosstalk, parasitic oscillations,
and lots of other ills.  A component failure can easily produce as drastic a
change in output as a digital failure might.

The "advantage" of analog systems is that they don't have software.  However,
they do have all the troubles listed above, which tend to limit functionality.
They also have circuit designers instead of programmers.

The safest control systems are passive ones, which use no analogs: reactors
that get less reactive at high temperatures and aircraft that fly themselves
with no control forces.

Martin Ewing, 203-432-4243, Ewing@Yale.Edu
Yale University Science & Engineering Computing Facility


Chaos

Peter da Silva <peter@ficc.ferranti.com>
Thu Sep 6 22:33:54 1990
> Thus the possible range of catastrophic effects are inherently greater in
> digital as opposed to analog systems.

Like the Tacoma Narrows bridge?

Peter da Silva.     +1 713 274 5180.     peter@ferranti.com


Re: Software bugs "stay fixed"? (RISKS-10.31, Parnas RISKS-10.32)

&ruce_Hamilton.OSBU_South@Xerox.com>
Thu, 6 Sep 1990 19:09:43 PDT
Re: "My perception is that they stay fixed, if they were actually fixed."

A nontrivial portion of the bugs we encounter in building and testing our large
systems are INTEGRATION (system-building) errors, where the wrong version of
some software was included.  Coding errors are only HALF the reason for
regression testing.

Bruce         BHamilton.osbuSouth@Xerox.COM        213/333-8075


Re: Software bugs "stay fixed"? (Parnas, RISKS-10.31)

K. M. Sandberg <sandberg@ipla01.hac.com>
7 Sep 90 11:37:16 GMT
One problem is that sometimes the source code is not managed properly and code
that has the bug is reintroduced when fixing another bug. Also it is possible
that the code was "shared" and used in other programs/subroutines or the logic
that caused the bug is still in the programmer's head.  Major updates to the
code could also lead to the reintroduction of the bug for several reasons
including some one removing the fix as it seems not to be needed (lack of
comments?)

In other words there are many things that could cause the bug to reappear when
it was really fixed. This is the real world where anything is possible
(Remember Murphy's Law).
                        Kemasa.


Re: Software bugs "stay fixed"? (RISKS-10.31)

<ark@research.att.com>
Fri, 7 Sep 90 09:29:12 EDT
I have had more experiences than I care to think about in which bugs have been
fixed, and fixed correctly, but then somehow the wrong version of the program
was sent to the user.

My `debugging rule number 0' is: before you go looking for a bug, make sure the
program you're looking at is the one you're running.  You'd be amazed how many
bugs have disappeared that way.
                                --Andrew Koenig


Re: Software bugs "stay fixed"? (Parnas, RISKS-10.32)

Michael tanner <mtanner@gmuvax2.gmu.edu>
Fri, 7 Sep 90 09:35:41 -0400
In practice the following occurs:

  1. Programmer A fixes a bug.  Some time later programmer B is given the same
     software to fix a different bug, or otherwise make changes.  He sees some
     extraneous code he doesn't understand, doesn't see how it could work, or
     whatever and in an attempt to clean up the program, deletes or changes
     it.  This turns out to be programmer A's bug fix, and the old bug is
     reintroduced.

  Or,

  2. Large systems get re-built occasionally, and sometimes with old versions
     of some routines, thus introducing old "fixed" bugs.

Users are accustomed to seeing old bugs resurface, and programmers often find
the above scenarios to be the cause.  Maybe good software practice would
prevent it, but it does happen.
                                                  -- mike

Michael C. Tanner, Dept. of Computer Science, George Mason University


Boot camping (Ultrix, Wortman, RISKS-10.32)

Timothy VanFosson <timv@cadfx.ccad.uiowa.edu>
Fri, 7 Sep 90 14:49:11 GMT
I too had a similar problem because of *my* fit of tidiness.  Although my
machine (a VS3100) would boot, certain login ids would would be required to go
through the login process (Xprompter) two or three times before they would
actually work.  I know this is true because my id was one of them.  I guess an
added risk to the situation is that you may go crazy trying to remember your
last three months' worth of passwords before you figure out that it is an OS
problem :-).

Timothy VanFosson, Senior Systems Analyst, University of Iowa
CAD-Research, 228 ERF, Iowa City, Iowa 52242     Phone : (319) 335-5728

Please report problems with the web pages to the maintainer

Top