The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 12 Issue 16

Monday 26 August 1991

Contents

o Pacific Bell "Message Center" failure in San Francisco Area
David Schachter
o More Risks of Computer Billing -- $22,000 water bill
PGN
o Risk Perception
Rodney Hoffman
o More on Houston Chronicle spacemail item
Joe Abernathy
o Internal computer fraud at Pinkerton
Rodney Hoffman
o P&G phone record search
Mark Seecof
o RISKS on trusting organizations like CERT
Jyrki Kuoppala
o TCAS sees ghosts
IEEE Spectrum article via Jim Horning
o More on the Lauda Air crash
Brian Acohido via Nancy Leveson
o Info on RISKS (comp.risks)

Pacific Bell "Message Center" failure in San Francisco Area, Aug. 1991

<llustig!david@decwrl.dec.com>
Sun Aug 25 23:25:21 1991
[David's comments follow my synopsising of a San Fran Chron article.  PGN]

  Pacific Bell's Message Center answering service broke down for 21 hours
  around the San Francisco Bay Area, affecting thousands of customers.  Two
  hardware cards converting voice to digital failed at the same time, just
  before noon on Thursday.  This was the longest outage since service began
  last November.  (There had been a four-hour outage last December.)  No
  messages could be recorded, and no recorded messages could be retrieved.
  However, no recorded messages were lost.  Previous problems had been in
  software, attributed to the "newness of the system".  Some grumbling was
  quoted about how "They're finding all these bugs at the expense of the
  customers."  [Source: San Francisco Chronicle, Saturday, August 24, 1991, page
  A10, headline "Pac Bell Message Center Breaks Down; Electronic answering
  service out of whack for 21 hours", By Dan Levy, Chronicle Staff Writer]

My comments:

1. Pacific Bell has been touting its residential voice mail as a more reliable
replacement for the answering machine.  They stopped the promotion for a time
after word got out that their system was losing about ten percent (!) of all
messages.

2. Pacific Bell's current promotion points out that answering machines are an
old technology, but voicemail is new.  Apparently, the company expects us to
believe that new == more reliable.

3. There are times when centralizing a function makes it more reliable.  This
doesn't appear to be one.  When the voicemail system went down, customers could
not even rush to a store to buy their own answering machine as a workaround, it
would appear.  And what voicemail customer would know about the failure?
Unlike an answering machine, which has a light to blink rapidly when the
machine detects a fault, residential voicemail does nothing, and since the
service is pitched as being "more reliable," why would you suspect it?

David Schachter       uucp:  ...!{decwrl,mips,sgi}!llustig!david


More Risks of Computer Billing -- $22,000 water bill

"Peter G. Neumann" <neumann@csl.sri.com>
Thu, 22 AUG 91 09:18:40 PDT
In Austin TX, Malcolm Graham received a water bill for $22,000, for using
almost 10 million gallons of water in one month.  The meter reading for the
month was slightly LESS than that for the previous month, which the computer
interpreted as wrap-around.  (A new meter had been installed between readings,
and not set properly.)  A manual review of unusually large bills failed to spot
that one.  A utility company spokesman said ``We have about 275,000 accounts
each month.  We just missed this one. If we only miss one a month, that's a
pretty good percentage.''  <Source: CITY SOAKS A CITIZEN FOR $22,000 BILL,
Article by Scott W. Wright [who'S got Right on it!], 1991 Cox News Service, 22
August 1991>
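
The wrap-around interpretation described above can be sketched in a few lines.
This is only an illustration, not the utility's actual billing code: the
function name, the 10,000,000-gallon dial capacity, and the plausibility
threshold are all assumptions chosen to match the figures in the story.

```python
def billed_usage(previous, current, dial_max=10_000_000, plausible_max=100_000):
    """Return (usage, needs_review) for two successive meter readings.

    dial_max and plausible_max are illustrative assumptions, not real
    utility parameters.
    """
    if current >= previous:
        usage = current - previous
    else:
        # A lower reading is interpreted as the dial rolling over dial_max.
        usage = dial_max - previous + current
    # Flag implausibly large usage for a human instead of billing it blindly.
    return usage, usage > plausible_max
```

With these assumed values, a previous reading of 5,000 and a current reading
of 4,990 gives a usage of 9,999,990 gallons, and the result is flagged for
review.  A genuine roll-over (9,999,900 followed by 200) gives 300 gallons and
is not flagged.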


Risk Perception

Rodney Hoffman <Hoffman.El_Segundo@Xerox.com>
Mon, 26 Aug 1991 08:22:31 PDT
In a lengthy `Los Angeles Times' article focusing on AIDS infection from doctor
to patient (JUDGING THE RISKS OF INFECTION, 26 August 1991, page A1), writer
Janny Scott begins by highlighting recent research findings about risk
perception:

  * Unusual and unknown risks are more terrifying than familiar ones,
    even though everyday risks claim more lives.

  * Risks undertaken voluntarily seem more tolerable and controllable
    than lesser risks imposed from outside.

  * Many people have difficulty understanding probability.

  * Familiar accidents may go barely noticed, while unfamiliar ones
    may provoke panic, particularly if they seem to set a precedent.

  * Experts and lay people value risks differently: experts count
    lives lost while the general public focuses on many other factors,
    including fairness and controllability.

  * Once people make a decision about the size of a risk, their minds
    are difficult to change.

  * "One thing people care a lot about is dread." (Peter Sandman, Rutgers)


More on Houston Chronicle spacemail item (Abernathy, RISKS-12.15)

Joe Abernathy <chron!magic322!edtjda@uunet.UU.NET>
Thu, 22 Aug 91 13:56:22 CDT
I've interviewed the NASA experiment manager since then, and she described
NASA's statements as overwrought. The shuttle was not mail bombed; applelink
was, and this event was misportrayed to Josh Quittner. The flight director was
upset because they didn't want anyone to know they were using applelink; but
the atlantis account on applelink was created explicitly to facilitate the
interest expressed by the network community. I suspect that the confusion stems
from the confluence of divergent interests at work.

            [... which can also result in multimile piecemeal spacemail.  PGN]


Internal computer fraud at Pinkerton

Rodney Hoffman <Hoffman.El_Segundo@Xerox.com>
Fri, 23 Aug 1991 08:36:00 PDT
A short item from the August 22 `Los Angeles Times':

      PINKERTON WORKER PLEADS GUILTY TO COMPUTER FRAUD

Pinkerton Security & Investigation Services, the 141-year-old detective agency
whose slogan is "the eye that never sleeps," was caught napping by an employee
who embezzled more than $1 million from the firm.

Marita Juse, 48, of Burbank[, California], pleaded guilty to computer fraud
this week in U.S. District Court in Los Angeles.  Between January, 1988, and
December, 1990, Juse wired $1.1 million of Pinkerton funds to her own account
and accounts of two fictitious companies.  Pinkerton discovered the theft
through a routine audit conducted after Juse left the firm Jan. 7, said Sally
Phillips, assistant general counsel for Pinkerton.

Juse also pleaded guilty to unrelated 1986 charges of conspiracy, theft of
government property and false claims in connection with a scheme to submit
false tax returns claiming refunds.  She faces a maximum sentence of 30 years
and millions of dollars in fines.


P&G phone record search (RISKS-12.14)

Mark Seecof <marks@capnet.latimes.com>
Thu, 22 Aug 91 13:49:50 -0700
To add injury to insult, the 2/3 of a million Ohio telephone subscribers who
had their records searched by P&G (or the prosecutors P&G suborned) will have
to PAY for the computer time and other costs of the search in their "regulated"
bills.  I think that some of the subscribers ought to petition the Ohio PUC to
disallow the charges, on the theory that the telephone company failed to carry
out its duty to attempt to minimize regulated costs when it (the telco) did not
try to have the subpoena quashed.


RISKS on trusting organizations like CERT

Jyrki Kuoppala <jkp@cs.hut.fi>
Thu, 22 Aug 1991 20:28:24 +0300
The subject of this story is the unresponsiveness of CERT and vendors to
security holes and the risk that this creates when someone thinks that the
holes will get fixed once they are reported to CERT.

The topic of how Unix system vendors, Unix system users and administrators, and
organizations like CERT should react to security holes which are found on
widespread Unix systems always seems to cause some controversy and lots of
discussion.

If you publish a security vulnerability widely, people will complain that you
are giving information on how to break into systems to possible `crackers'.  If
you don't publish it, it perhaps will never get fixed.

I'd like to report one story concerning one particular security vulnerability
which allows any ordinary user to gain unauthorized superuser privileges -
perhaps it can be useful to people studying the problems of what to do in case
of a surfacing security hole and how to do it.  This article probably has only
a small fraction of the facts that have happened concerning this and related
vulnerabilities.  But it probably isn't very different from many stories of
similar security holes.

This article doesn't contain technical information - I'll post the details in a
subsequent article in alt.security, comp.sys.sun & alt.sys.sun.

    May 1989

I send a bug report to Sun about a SunOS vulnerability concerning the SunRPC
service rpc.rwalld, the world-writability of /etc/utmp on SunOS and the fact
that tftpd is enabled and able to read and write the root filesystem on SunOS.
This bug report concerns SunOS 4.0.1 and previous versions.

The hole allows anyone to get in from the Internet as the superuser in a few
seconds on an out-of-the-box Sun.  As one suggested fix I recommend
write-protecting /etc/utmp.  I don't notify CERT - I think at the time I'm not
aware of CERT.  The hole is fixed in a subsequent OS release - I'm not sure,
but I think a separate fix is also published later.

    June 1989

I tell about the hole on the Sun-Spots mailing list (gatewayed as the Usenet
newsgroup comp.sys.sun) with some details blanked out and give suggested fixes.

A fix for the hole is published by Sun - I don't have records on when this
happened.

    September 1989

In a security-related bug report reporting also a few other holes, I send the
following to Sun and CERT (the Computer Emergency Response Team, an
organization established by the Defense Advanced Research Projects Agency,
DARPA, to address computer security concerns of research users of the
InterNet):

>5. /etc/utmp is world-writable.  This was one of the original causes
>of the rwall / wall / tftp hole, and probably takes part in other not
>yet surfaced security holes.
>
>FIX : chmod og-w /etc/utmp
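
The effect of the suggested `chmod og-w' can be checked and reproduced from a
script.  The sketch below is a minimal illustration using Python's standard
os and stat modules.  The two function names are hypothetical, and nothing
here is taken from Sun's or CERT's tooling.

```python
import os
import stat

def is_group_or_world_writable(path):
    """True if the group or other write bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))

def remove_group_and_world_write(path):
    """Clear the group and other write bits, like `chmod og-w path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWGRP | stat.S_IWOTH))
```

Calling is_group_or_world_writable("/etc/utmp") on an affected system would be
expected to return True before the fix and False after it.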

    October 1989

I send a somewhat `details-blanked' version of the above-mentioned bug report
to the Sun-Spots combined mailing list and newsgroup, including the note about
utmp.

    May 1990

A security hole with the program `comsat' which is used to report the arrival
of new mail to users (enabled by the `biff' program) is discovered.  The
vulnerability gives unauthorized users root access.  The hole is reported to
Sun through JPL's Sun software representative.  It is also reported to the
Internet Computer Emergency Response Team (CERT) and the DDN Security
Coordination Center (SCC).

CERT & Sun publish no notice about the hole, no fix is published.  In the NASA
internal notice the suggested fix is to just disable comsat.

    March 1991

I independently find the hole with `comsat' and report it to Sun and CERT.
They don't say it's been reported before, and seem to be somewhat unresponsive
about it.  At the same time, I publish a rough outline of the hole on the net,
and I am told about the previous bug reports.  Meanwhile, Sun says something
about a non-disclosure agreement that I should sign so I could get information
on a product which will fix the hole.

No notice to the net is made by Sun or CERT.  No fix is made available.

    April 1991

As nothing seems to happen, I get a bit frustrated and send more mail to Sun &
CERT:

>If you can't come up with at least some kind of a solution to the
>problem, perhaps someone on the Usenet can.  I'll post the detailed
>bug report & perhaps some additional suggestions of fixes to the
>Usenet newsgroup alt.security a month from now if a decent fix isn't
>available then.

There's some answer by email from CERT, some talk about what to do.  No answer
from Sun.  No notice to the net is made.  No fix is made available.

    August 1991

Still nothing has happened - no notice about the vulnerability has
been announced on the net.  Someone brings up /etc/hosts.equiv
containing '+' on comp.unix.admin.  I remember the promise I made and
write this article.

    Conclusions

From CERT's press release 12/13/88, the paragraph quoted verbatim:

>It will also serve as a focal point for the research community for
>identification and repair of security vulnerabilities, informal
>assessment of existing systems in the research community, improvement
>to emergency response capability, and user security awareness.  An
>important element of this function is the development of a network of
>key points of contact, including technical experts, site managers,
>government action officers, industry contacts, executive-level decision
>makers and investigative agencies, where appropriate.

In the light of this story (and some other experience with CERT) I don't think
CERT is doing a good job on `identification and repair of security
vulnerabilities'.  It is a good thing to have a central point to contact when
trouble arises or when you have a security hole to report, and apparently CERT
is doing a good job in acting as this central point, and distributing bug
reports to the vendors.

But I think that it is not enough.  We need something more to fix the holes -
as with this bug, it seems that when the vendor does nothing to fix things,
CERT also sits idle, promptly forwards the bug report to /dev/null and does
nothing.

    Solutions?

I suggest we make it a policy that anyone who sends a security hole report to
CERT and/or a vendor will send it to the Usenet some time (perhaps six months?
a year?) after the ack from CERT or the vendor.

Any more suggestions to solve the problem?


TCAS sees ghosts

Jim Horning <horning@Pa.dec.com>
Thu, 22 Aug 91 19:04:59 PDT
IEEE SPECTRUM, August 1991, page 58, Section "Faults & failures":

                               TCAS sees ghosts

A system that warns pilots of impending midair collisions is finally, after 30
years in development, being installed in the U.S. airline fleet.  The system,
called TCAS for traffic alert and collision avoidance system, sends a stream of
interrogation signals to the same equipment aboard nearby aircraft and from
their responses determines the planes' altitude, distance, and approach rate.

Plans call for all 4000 large aircraft in the United States to carry US$150,000
TCASs by the end of 1993.  But the phase-in suffered a short-lived--and
embarrassing--setback on May 2, when the Federal Aviation Administration (FAA)
ordered a shutdown of 200 of the 700 units that had been installed.  The 200
systems were seeing phantom aircraft and instructing pilots to evade planes
that simply were not there.

The cause was quickly identified as a software glitch.  More precisely, it
was a software gap--five lines of code missing from the faulty units.

Not subject to the problem were TCASs manufactured by the Bendix/King Division
of Allied Signal Inc., Baltimore, Md., and Honeywell Inc., Phoenix, Ariz.
These were allowed to continue in service.

However, TCASs made by Collins Defense Communications Division of Rockwell
International Corp., Dallas, were recalled so that the software could be
fixed.  The fix was simple: the units were reloaded with the correct program.

The problem arose in the course of testing, because Collins engineers had
temporarily disabled the program's range correlation function--a few brief
lines that compare a transponder's current response with previous ones and
discard any intended for other aircraft.  Without this filter, the system
can misinterpret a response as coming from a fast-approaching airplane.

After testing the systems, Collins shipped them to airline customers without
re-enabling the range correlation.  For the most part, the systems worked as
intended.  But in high-traffic areas where many airplanes are interrogating
each other--around Chicago, Dallas, and Los Angeles, particularly--ghosts
appeared frequently.  Pilots were misled, and air traffic controllers were
distracted from their routine tasks by the need to handle nonexistent
situations.

"A pilot would see the ghost image shoot across the screen because the on-board
system was accepting all the replies as other TCAS airplanes in the vicinity
interrogated the same TCAS transponder," Thomas Williamson, TCAS program
manager with the FAA in Washington, D.C., told IEEE SPECTRUM.

TCAS II, the system currently being installed, tells pilots to climb, dive,
or maintain the same altitude to avoid a collision.  It also displays nearby
planes on a small screen.  The system was first demonstrated in the early
1970s, but making it work reliably was difficult because of interference
by overlapping signals from multiple aircraft in crowded areas.  The
interference was eliminated by using directional antennas and variable-
strength interrogation signals and developing range-correlation software
to eliminate multiple responses.

In the range correlation scheme, the system notes the distance at which it
first receives a response from another aircraft--say 10 miles.  At the next
interrogation, the distance may be 9.5 miles.  The system would then expect
the next response to be at approximately 9 miles, and would set a range gate
so that it could look for a signal at that distance and calculate the closure
rate.  Without this correlation, the system becomes confused.
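
The range-gate idea in the paragraph above can be sketched as follows.  This
is a simplified illustration, not the Collins or FAA algorithm: the function
name, the linear prediction from the last two accepted replies, and the
0.25-mile gate width are all assumptions made for the example.

```python
def correlate(ranges, gate=0.25):
    """Keep only the replies (ranges in miles) consistent with one track.

    The first two replies establish the track.  Each later reply is kept
    only if it falls within `gate` miles of the linearly predicted range.
    The gate width is an illustrative assumption.
    """
    if len(ranges) < 2:
        return list(ranges)
    track = [ranges[0], ranges[1]]
    for r in ranges[2:]:
        # Predict the next range from the closure seen between the last two.
        predicted = track[-1] + (track[-1] - track[-2])
        if abs(r - predicted) <= gate:
            track.append(r)
        # Otherwise the reply is discarded as not belonging to this track.
    return track
```

Given replies at 10.0, 9.5, 9.0, 3.2 and 8.5 miles, the sketch keeps 10.0,
9.5, 9.0 and 8.5 and discards the 3.2-mile reply.  With the filter removed,
that stray reply would look like an aircraft closing almost six miles between
two interrogations.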

The FAA emphasized that the software fault did not pose a hazard.  TCAS is
a backup system; primary responsibility for avoiding midair collisions still
remains with the ground-based air traffic control systems.  Moreover, the
FAA pointed out that TCAS has proved its worth in more than 1 million hours
of operation.

"Had the problem involved TCAS software on a generic basis, then we would
really be concerned," Williamson said.  "But it was a breakdown in the quality
control procedures of a specific manufacturer."

For its part, Collins has promised customers that it will correct all 200
systems within 90 days after discovery of the problem.  "We'll be fully
operational across the board well within that time frame," said Charles
Wahag, Collins' manager of TCAS products.

Wahag defends Collins' quality control procedures, which were approved by
a team of FAA software experts.  "We had a simple human error where an
engineer misclassified the changes in the software," he told SPECTRUM.
"It didn't show up in our testing because one of the essential elements was
absent: you have to have many, many TCAS-equipped airplanes in the sky,"
as in the high-traffic-density areas where the ghost problem appeared.

To prevent similar omissions, Collins now requires that a committee of
software engineers review changes before a program is released.  "More than
one pair of eyes must review these things and make a decision," Wahag said.

COORDINATOR: George F. Watson
CONSULTANT: Robert Thomas, Rome Laboratory


More on the Lauda Air crash

<leveson@cs.washington.edu>
Sat, 24 Aug 91 10:06:45 -0700
   [The following item was abridged by Nancy Leveson, and further by me.
   Also, today's paper indicates the FAA has backed off on some of its
   restrictions.  PGN]

From the Seattle Times, Friday August 23, 1991 (excerpts)

        Flawed part in 767 may be flying on other jets
          by Brian Acohido, Times Aerospace Reporter

   More than 1,400 Boeing 747, 757, and 737 jetliners may be flying with the
same type of flawed thrust-reverser system as the ill-fated Lauda Air 767 that
crashed in Thailand last spring.  A thrust reverser inexplicably deployed on
that May 26 flight, possibly flipping the plane into an uncontrollable crash
dive.  All 223 passengers and crew members were killed.
   Officials at Boeing and the Federal Aviation Administration say only that
the matter is `under review' and that they are conferring about possible safety
implications for Boeing models other than 767s.  The use of thrust reversers on
late-model 767s was banned last week by the FAA.  Also last week, Boeing
alerted airlines worldwide that it may, at some point, recommend that the
reversers of these other models be inspected.
   Industry sources say it appears a dangerously flawed safety device that is
an integral part of the reversers in question may be the same one that is in
widespread use on other Boeing models as well.  The device is called an
electronically actuated auto-restow mechanism.  The flaw was discovered last
week, and was considered potentially hazardous enough to prompt the FAA to
order reversers deactivated on 168 late-model 767s.  The ban is in effect until
Boeing redesigns the device.  [... lots of stuff deleted about the use of it on
other planes, etc.] [...]
   `In my estimation, the suggestion is very, very strong that there is the
distinct possibility there could be further danger with these other aircraft,'
said aviation safety analyst Hal Sproggis, a retired 747 pilot.  [... more
stuff deleted about arguments between the NTSB and the FAA about what should be
done.]   [...]
   On Boeing jets, reversers work like this: A door on the engine cowling
slides open, simultaneously extending panels called `blocker doors,' which
deflect thrust up and out through the cowling opening.  In flight, the cowling
door is designed to remain closed, with the blocker doors retracted, stowed,
and locked.  Depending on the engine type, the reverser system is powered
either pneumatically using pressurized air, or, like the Lauda jet,
hydraulically using pressurized oil.
   The flawed auto-restow device is designed to detect the system becoming
unlocked in flight and to move quickly to restow and relock the system before
any significant control problem can occur.  According to industry sources, the
NTSB, and the FAA, here's how the complex device works:
   An electronic sensor monitors the cowling and alerts a computer if the
cowling door moves slightly in flight.  The computer then automatically opens
an `isolation valve' which permits pressurized oil or air to flow into the
reverser system.  This actuates a very crucial, and -- as was revealed last
week by the FAA -- dangerously flawed part called a `directional control valve'
or DCV.  The DCV directs the pressurized oil or air to retract the blocker
doors and shut the cowling door.  The DCV can sit in only two positions: extend
or retract.  In flight, it is supposed to always remain in the retract
position, ready to do its part in auto restow.
   In older Boeing aircraft, a mechanical part physically prevented the
directional control valve from moving off the retract position as long as the
plane was airborne.  But in newer Boeing jets, the auto-restow mechanism is
controlled and kept in the retract position by electronic means.  `The reason
they go for these electronic reversers is strictly economic,' safety expert
Sproggis said.  `It saves weight, and, in commercial aviation, weight is money.'
   When Boeing certified its electronically controlled reverser system, the
company assured the FAA that it was fail-safe.  As a result, the FAA never
required the company to calculate or test what might happen should a reverser
deploy in flight at a high altitude and high speed, as happened on the Lauda
flight.
   After the Lauda crash, Boeing tested the system anew.  An engineer wondered
what would happen if a simple O-ring seal on the DCV deteriorated, with small
bits getting into the hydraulic lines.  A test was run.  The result: the DCV
clogged in such a way that when the auto restow was activated, the DCV moved
off the retract to the extend position.  Thus, the computer thought it was
instructing the DCV to restow when, in fact, it was deploying the reverser.
   `I think they (Boeing officials) expected bits of the O-ring to run right
through the system and were shocked when they saw the reverser deploy,' said a
source close to the Lauda investigation.
   After learning of the results of the O-ring test, the FAA, which to that
point had rejected repeated exhortations from NTSB Chairman James Kolstad to
ban reverser use on 767s, did just that.
   Another revelation likely was a factor in the decision to ban reversers on
767s, sources said.
   After the Lauda crash, the FAA ordered reversers inspected on 55 767s
powered by Pratt & Whitney PW4000 engines -- the same airframe/engine
combination as the Lauda plane.  (Later, Boeing revealed that a total of
168 767s actually use the same electronically controlled reverser system.)
   As 767 inspection reports came in, a disturbing pattern of chafed wires
and out-of-adjustment auto-restow sensors emerged.  In fact, nine out of every
10 planes checked had sensors out of adjustment, the FAA reported.
   Moreover, a Seattle Times review of five years of `service-difficulty
reports,' or SDRs, filed by U.S. airlines with the FAA shows a similar pattern
of reverser troubles for 747s, 737s, and 757s.
   Airlines are required to file SDRs with the FAA showing how various problems
are dealt with.  Problems with reversers on Boeing planes are cited on 118
reports from Jan. 1, 1985 through June 25, 1991, including 44 reports on 737
reversers, 25 on 747s, four on 757s, and three on 767s.
   SDRs have been widely criticized for being something less than comprehensive
because of the wide leeway airlines are granted in deciding what to report.
Even so, the reports ranged from cockpit warning lights flickering inexplicably
and sensors repeatedly turning up out of adjustment, to numerous instances of
stuck or leaking reverser parts.  One case involved a 747 aborting a flight
after a reverser deployed and broke up with a loud bang.  The plane landed
safely.
   A pattern of out-of-adjustment sensors suggests that maintenance
instructions provided by Boeing to the airlines are not clear or perhaps that
the part is badly designed and susceptible to readily moving out of adjustment,
said industry sources.  More significantly, it suggests that the auto-stow
system may be activating unnecessarily -- or more slowly than it's supposed to --
due to a sensor that's out of adjustment, sources said.  [... more discussion
deleted about the risk on other Boeing planes]
   During the ill-fated Lauda flight, pilot Thomas Welsh, formerly of Seattle,
discussed with his Austrian co-pilot, Josef Thurner, the flickering of a
cockpit warning signal indicating a possible problem with one of the reversers.
Everything was being handled routinely until a second warning signal indicated
the left reverser had somehow deployed.  Two seconds later, a loud snap is
heard on the cockpit recorder, followed by swearing and the sound of warning
tones.  Thirty-nine seconds after the snap, the tape ends with the sound of a
bang.  The left engine was recovered from the wreckage with the reverser
deployed, evidence that the DCV was improperly positioned, perhaps because it
was contaminated, sources say.
   Sources said the valve could have become contaminated by something other
than a bad O-ring and that investigators also are exploring the possibility
that a stray electrical current, vibration or some other phenomenon moved
the DCV to the deploy position.  A key piece of evidence that could provide
the answer -- the left DCV -- was missing from the wreckage.

  This incident brings up some important issues:
    -- The role of the computer in this particular accident
    -- The role and procedures of the FAA in regulating aircraft
    -- The trend to removing mechanical safety interlocks in order to save
       weight and the way that such cost/benefit decisions are being made.

  Note that there will be a session at SIGSOFT '91 (Software in Critical
  Systems) in December on government standards and regulation and that Mike
  Dewalt of the FAA (his title is "National Resource Specialist -- Software")
  will be discussing certification and standards for commercial avionics
  software.
                                                                          Nancy
  Prof. Nancy G. Leveson, University of California
    (on sabbatical at Univ. of Washington)
