The Risks Digest

The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 23 Issue 24

Wednesday 3 March 2004

Contents

Risks of Leap Years and Dumb Digital Watches, quadrennial posting
Mark Brader
GAO's latest evaluation of DOD software development practice
James Paul
Trouble with Mars rover Spirit
Erling Kristiansen
RFID tags in new US notes explode when you try to microwave them
Michael Borek
State looks at false bills from AT&T
Peter Howe via Monty Solomon
California e-voting: did programmers even try it?
Joel Garry
Anti-Spam Solutions and Security, Neal Krawetz
Monty Solomon
Legal Mercedes driver jailed for 18 months
Stefan Lesser
Re: Stolen heart monitor
Dave Brunberg
Re: Buffer overflows and VMS
Stanley F. Quayle
Re: Buffer overflows and Burroughs/Unisys
Bill Hopkins
Re: A320 Incident
Peter B. Ladkin
Info on RISKS (comp.risks)

Risks of Leap Years and Dumb Digital Watches, quadrennial posting

<msb@vex.net (Mark Brader)>
Tue, 2 Mar 2004 19:33:50 -0500 (EST)

All right now, how many people reading this...

-> saw a previous version of this message in RISKS-6.34, 13.21, 17.81,
   and/or 20.83;

-> have watches that need to be set back a day because, unlike the
   smarter kind of digital watch, they went directly from February 28
   to March 1;

-> and *hadn't realized it yet*?

  [Please answer to yourself, but not to RISKS.  We don't really need a
  hand count.  PGN]


GAO's latest evaluation of DOD software development practice

<"Paul, James" <James.Paul@mail.house.gov>>
Tue, 2 Mar 2004 14:17:46 -0500

The General Accounting Office (GAO) today released the following report:

  Defense Acquisitions: Stronger Management Practices Are Needed to Improve
  DOD's Software-Intensive Weapon Acquisitions.  GAO-04-393, 1 Mar 2004
  http://www.gao.gov/cgi-bin/getrpt?GAO-04-393

Quoting from Highlights of GAO-04-393:

"Software developers and acquirers at firms that GAO visited use three
fundamental management strategies to ensure the delivery of high-quality
products on time and within budget: working in an evolutionary environment,
following disciplined development processes, and collecting and analyzing
meaningful metrics to measure progress.  When these strategies are used
together, leading firms are better equipped to improve their software
development processes on a continuous basis.  An evolutionary approach sets
up a more manageable environment - one in which expectations are realistic
and developers are permitted to make incremental improvements.  The customer
benefits because the initial product is available sooner and at a lower,
more predictable cost.  This avoids the pressure to incorporate all the
desired capabilities into a single product right away.  Within an
evolutionary environment, there are four phases that are common to software
development: setting requirements, establishing a stable design, writing
code, and testing.  At the end of each of these phases, developers must
demonstrate that they have acquired the right knowledge before proceeding to
the next development phase.  To provide evidence that the right knowledge
was captured, leading developers emphasize the use of meaningful metrics,
which helps developers, managers, and acquirers to measure progress.  These
metrics focus on cost, schedule, the size of a project, performance
requirements, testing, defects, and quality.  "In a review of five DOD
programs, GAO found that outcomes were mixed for software-intensive
acquisitions.  The F/A-18 C/D, a fighter and attack aircraft, and the
Tactical Tomahawk missile had fewer additional cost and schedule delays.
For these programs, developers used an evolutionary approach, disciplined
processes, and meaningful metrics.  In contrast, the following programs,
which did not follow these management strategies, experienced schedule
delays and cost growth: F/A-22, an air dominance aircraft; Space-Based
Infrared System, a missile-detection satellite system; and Comanche, a
multimission helicopter...."

Why GAO Did This Study

	"The Department of Defense (DOD) has been relying increasingly on
computer software to introduce or enhance performance capabilities of major
weapon systems.  To ensure successful outcomes, software acquisition
requires disciplined processes and
practices.  Without such discipline, weapon programs encounter difficulty in
meeting cost and schedule targets.  For example, in fiscal year 2003, DOD
might have spent as much as $8 billion to rework software because of
quality-related issues.
	"GAO was asked to identify the practices used by leading companies
to acquire software and to analyze the causes of poor outcomes of selected
DOD programs.  GAO also was asked to evaluate DOD's efforts to develop
programs for improving software acquisition processes and to assess how
those efforts compare with leading companies' practices."

What GAO Recommends

"GAO recommends that the Secretary of Defense direct the military services
and agencies to adopt specific controls to improve software acquisition
outcomes.  These practices should be incorporated into DOD policy, software
process improvement plans, and development contracts.  DOD concurred with
two revised recommendations and partially concurred with two others."


Trouble with Mars rover Spirit

<Erling.Kristiansen@esa.int>
Tue, 2 Mar 2004 13:55:38 +0100

A good explanation of what happened to Spirit is at
  http://www.eetimes.com/sys/news/OEG20040220S0046

In brief:
- A software upload took place in order to correct some problem
- A utility to delete files belonging to the old software was uploaded, but
the upload failed
- This failure was apparently forgotten or ignored, resulting in less file
space being available for experiment data than anticipated
- When the file system overflowed, a reboot occurred. This, apparently, was
the intended behaviour
- The reboot could not complete due to insufficient available file space
- An infinite loop of reboots was entered

RISKS, as I perceive them:
- Relying on mission planners, working on assumed rather than actual file
space, to not overflow file system
- A file system that does not fail gracefully when overflowed
- A boot sequence that requires resources that may become unavailable


RFID tags in new US notes explode when you try to microwave them

<mikkeles@netscape.net (Michael Borek)>
Tue, 02 Mar 2004 09:54:25 -0500

"Prison Planet" (http://www.prisonplanet.com/022904rfidtagsexplode.html) is
carrying a story of an experiment on microwaving the new US $20.00 bills
which contain RFIDs.  (The site includes pictures of the results.)

Apparently, the new bills were setting off every RFID monitor they passed
through!  Wrapping the bills in aluminium foil stopped this behaviour, but
they decided to try microwaving the bills.  This led to the RFIDs exploding
and burning the face of Andrew Jackson's picture.

This could become quite common a problem as word passes that microwaving can
destroy RFIDs without the proviso that this may (will) damage the goods.

(A discussion is also being carried on Slashdot:
http://slashdot.org/article.pl?sid=04/03/02/0535225&mode=nested)


State looks at false bills from AT&T

<Monty Solomon <monty@roscom.com>>
Tue, 2 Mar 2004 12:24:52 -0500

Company denies telemarket scheme, Peter J. Howe, *The Boston Globe* 2 Mar 2004

Massachusetts utility regulators said yesterday that they are investigating
a pattern of AT&T Corp. allegedly sending bogus bills to people who are not
customers of the company, then trying to sell them AT&T phone service when
they call to complain.

After similar concerns emerged in upstate New York last week, the
Massachusetts Department of Telecommunications and Energy said yesterday it
has received more than 30 complaints since January from Bay State residents
who said they got bills from AT&T although they have never had AT&T service
or canceled it months or years earlier.  ...

http://www.boston.com/business/globe/articles/2004/03/02/state_looks_at_false_bills_from_att/


California e-voting: did programmers even try it?

<"Joel Garry" <JoelG@AnabolicInc.com>>
Tue, 2 Mar 2004 10:36:32 -0800

I think I electronically voted at a polling place in the California Primary
today.  California has an odd primary rule where, if you register
non-partisan, you are allowed to put bad data into the party of your choice,
if they have chosen to let non-partisans do so.  So I decided on a
particular party, inserted the card, chose "large print," and was presented
with page 1 of 8, a blank screen.  It seems the program dynamically assigns
the various items to the pages, which evaluates to 4 pages with normal type,
and 7 for large type.  So for large type it doubles the number of regular
type pages and presents the blank page first.  Poll workers did not seem to
understand this, and the help line for them to call was continually busy.

  [This RISK intentionally left blank.  JG]


Anti-Spam Solutions and Security, Neal Krawetz

<Monty Solomon <monty@roscom.com>>
Tue, 2 Mar 2004 00:47:23 -0500

Dr. Neal Krawetz, Anti-Spam Solutions and Security
SecurityFocus, last updated 26 Feb 2004,
http://www.securityfocus.com/infocus/1763

In a recent survey, 93% of respondents reported dissatisfaction with the
large volume of unsolicited e-mail (spam) they receive.  The problem has
grown to the point where nearly 50% of the world's e-mail is spam, yet only
a few hundred groups are responsible.  Many anti-spam solutions have been
proposed and a few have been implemented. Unfortunately, these solutions do
not prevent spam as much as they interfere with every-day e-mail
communications.

The problems posed by spam have grown from simple annoyances to significant
security issues. The deluge of spam costs up to an estimated $20 billion
each year in lost productivity -- according to the same document, spam
within a company can cost between $600 and $1,000 per year for every user.


Legal Mercedes driver jailed for 18 months

<Stefan Lesser <stefan.lesser@burdadigital.de>>
Tue, 2 Mar 2004 12:40:57 +0100 (W. Europe Standard Time)

Re: Solving e-mail problems economically (RISKS-23.22)

> The [...] virus writers are no more responsible for the amounts of junk
> [...] than I am responsible for the damage caused by an automobile whose
> driver does not observe my bicycle until the last second and manoeuvres
> suddenly.

In recent German jurisdiction, exactly that happened.

The driver of a Mercedes Coupe was jailed for 18 months after the death of a
woman and her 2-year-old daughter. Apparently the mother, who was driving on
the leftmost lane of a 3-lane Autobahn, was frightened by the fast approach
of the Coupe and swerved right across all three lanes into a tree. The cars
didn't even touch each other.  Although at the time of the accident the
Mercedes was going at nearly 155mph, there was nothing illegal about that as
there are no "global" speed limits on the German Autobahnen.

Apart from the somewhat irritating court decision, which is open for
retrial, there now seems to be the actual RISK of being prosecuted for the
mis-reactions of others...  [Also, death threats against the judge!  PGN]

English URL: http://www.timesonline.co.uk/article/0,,588-1006376,00.html
German  URL: http://www.stuttgarter-nachrichten.de/stn/page/detail.php/600908

Stefan Lesser, Support Center Muenchen, Burda Digital Systems GmbH,
Am Kestendamm 2, 77652 Offenburg  http://www.burdadigital.de  +49/89/9250-3433

  [Interesting case.  Check out the details.  PGN]


Re: Stolen heart monitor

<"Dave Brunberg" <DBrunber@FBLEOPOLD.com>>
Tue, 2 Mar 2004 08:56:50 -0500

>>The idea of an implantable medical device apparently requiring physical
>>reconfiguration (at least) to talk to an external monitor implies a level of
>>trust in the reliability of the external device which is seriously scary.
>>The RISKS hardly need pointing out here...

I think the RISKS of allowing unauthenticated remote reprogrammability of an
implanted medical device may be just as scary.  One way of reducing that
RISK may be to have some sort of an "emergency broadcast" safe mode in which
a new external monitor could identify itself to the implant and authorize
through a highly secure key which would require knowledge of a passphrase to
transmit.  Of course, you'd really have to remember to change the default
password....

David W. Brunberg, Engineering Supervisor, The F.B. Leopold Company, Inc.


Re: Buffer overflows and VMS (Levine, RISKS-23.22)

<"Stanley F. Quayle" <stan@stanq.com>>
Tue, 02 Mar 2004 09:55:28 -0500

> IIRC the late VAX/VMS systems also had built in buffer overflow prevention
> features, probably a lesson learned from Multics.

In fact, all systems running OpenVMS (formerly known as VAX/VMS) do.  Memory
protection was built into the design of both the VAX and Alpha processors.
The Itanium version of OpenVMS uses the processor's built-in in memory
protection, as well.

Pages can be set to be most any combination of read, write, or execute, with
different protections depending on operating mode: user, supervisor,
executive, or kernel.

C programmers moving to OpenVMS quickly discover what the ACCVIO error
message means.

Stanley F. Quayle, P.E. N8SQ  +1 614-868-1363  Fax: +1 614 868-1671
8572 North Spring Ct. NW, Pickerington, OH  43147


Re: Buffer overflows and Burroughs/Unisys (Gobeski, RISKS-23.22)

<"Bill Hopkins" <whopkins@wmi.com>>
Tue, 2 Mar 2004 12:02:47 -0500

Keith Gobeski's note on the Burroughs stack architecture's improvements over
many of its successors (compliment stolen from Wirth, IIRC) reminds me of
the discussions in the late '70s of whether and how it could support the
hot, new C language.  Eventually it could, mostly, but it was ugly.

The key to a language-oriented architecture is preserving the language's
abstractions (in this case the data array abstraction) in the run-time
environment.  The Algol abstraction is clean: an array and an index combine
properly if the index is within the declared range; otherwise it is illegal.
An array row descriptor identified the memory segment and the limit, so an
out-of-range access caused an exception.  There was no other way for a
program to get at the memory, so it was inherently safe.

Since C's memory abstraction is basically the hardware address, and
addresses can be manipulated arbitrarily, any reasonably complete C
implementation had to abandon tagged segments as array rows, instead putting
all arrays in a single segment, keeping an allocation map in a
locally-managed stack, etc. in order to generate an index for a memory
access.  All the (parallel, fast) hardware assist for arrays was lost,
replaced by (sequential, slower) software Some of the pathological C
constructions were still impossible, of course; you couldn't force it to
execute data.  As Unix (itself a reaction to Multics OS complexity) became
trendy, who knew this was a Good Thing?

The rise of the processor chip (and byte-oriented memory, and college
courses in C) put an end to the experiments outside Burroughs/Unisys, as
chip areas forced simple architectures (some actually *called* RISCs)
without support for elegant and useful abstractions.  Heck, software can
always make up for hardware deficiencies, right?

(A 1982 ACM Computer Architecture News paper, "The Architecture of the
Burroughs B5000 --20 Years Later and Still Ahead of the Times?" by Alastair
J. W. Mayer, is still relevant another 20 years later.  It's on his web site
at http://www.ajwm.net/amayer/papers/B5000.html)


Re: A320 Incident (Ladkin, RISKS-21.48, June 2001)

<"Peter B. Ladkin" <ladkin@rvs.uni-bielefeld.de>>
Tue, 02 Mar 2004 11:41:42 +0100

In RISKS-21.48, 21 June 2001, I reported on an incident to a Lufthansa
A320. The A320 is a "fly-by-wire" aircraft, in that primary control is
effected through computers and electrics rather than mechanical means.

The captain's (CAP) sidestick controller was miswired during maintenance so
that a "bank right" command initiated a "bank left" control signal and vice
versa. This was discovered on take-off, when the captain corrected a left
wing dip due to turbulent air flow with right sidestick movement ("bank
right") and the aircraft's left wing dipped further, reportedly coming
within two meters of touching the ground. The copilot took priority control
(a feature of the electronic control architecture) and recovered the
aircraft. The crew flew up a few thousand feet altitude, familiarised
themselves with the problem as best they could, and returned to land the
aircraft.

It turned out that two wires connecting the CAP's sidestick to one Elevator
and Aileron Computer (ELAC), of which there are two, had been
reverse-connected during maintenance, and the fault had been discovered
neither by post-maintenance check, nor by post-maintenance cross-check, nor
by the flight crew's pre-take-off control system check.

I had suggested in my Risks 21.48 note that I was puzzled by the partial
reports of the incident then available. The final report was published as
report 5X004-0/01 in April 2003 by the German Federal Bureau of Aircraft
Accidents Investigation (german acronym BFU) and is available in English at
http://www.bfu-web.de/berichte/01_5x004efr.pdf Thanks to John Sampson for
bringing it to my attention.

Salient facts are as follows. During previous flights, one of the two ELACs
failed. Maintenance found a defect in the X-TALK-BUS between ELAC Nos 1 and
2, found to be "caused by a bent connection pin (Pin 6K) in the plug segment
AE of the socket for the ELAC no. 1." The attempt to replace just the pin
failed and it was decided to replace the plug segment AE. There was no
suitable spare plug AE for this series of airplane in stock, and the AE
segments they had were not compatible with the remaining installed segments,
so it was decided to replace all four plug segments AA, AB, AD, AE with a
compatible set. This meant that "in a confined space approx. 420 assigned
connector pins had to be reconnected."

The method chosen was "ONE TO ONE", whereby "the wires were disconnected one
after the other from the old plug and immediately connected to the new one."
The mechanics used the wrong wiring diagram.

How could this be? Well, an airplane and its equipment are identified by
serial number (SN). The manufacturer knows what equipment was installed at
build. Subsequently, the manufacturer issues Service Bulletins (SB) for
modifications to installed equipment, and these SBs have different grades of
urgency. Some are only "recommended", for example. So the installed
equipment is identified by SN, and further by the log of which SBs have been
accomplished. The applicable wiring diagram on p2 of the Airplane Wiring
Manual (AWM) contained a designation that said it was applicable to
airplanes with an "effectivity range" of 013-018 and those with effectivity
001-012 on which SB 27-1030 had been accomplished. P4 of the AWM was
applicable to those airplanes with effectivity range 001 to 034 on which
SB27-1030 and SB27-1084 had been accomplished. SB27-1084 had been
accomplished, but not SB27-1030, and the aircraft had effectivity 017.
Hence p2 was applicable, but the mechanics thought p4 was applicable as
SB27-1084 had been accomplished.

Each numbered wire consists of a twisted red-blue pair. In segment AE, the
"Monitor Channel" is connected by pair 0603. The Control Channel is
connected by pair 0597. P2 specifies that these wires must be
cross-connected (blue to red, red to blue) between the sidestick and ELAC
plug. P4 specifies that these wires must be connected straight through (red
to red, blue to blue).

Furthermore, in the Aircraft Wiring List (AWL), the twisted pairs are always
assigned in the order red, then blue, in the alphanumeric sequence of plug
segment coordinates, except for these two wires. Wire 0603 is assigned blue
then red to the pins 3C/3D, and wire 0597 blue then red to 15J/15K. Why? The
manufacturer wanted to effect a uniform wiring for all its FBW airplanes,
and from a certain type series on, the A320 wiring was planned to be
identical to that of the A330 and A340. An interchange of colors was
accepted for a certain transition period, and this aircraft belonged to the
transition series.

The BFU report points out that, had only the AE segment been exchanged, only
the Monitor Channel would have been falsely connected, and with high
probability an error message would have appeared on the cockpit aircraft
monitoring display (ECAM). It doesn't say at which point this message would
have appeared - at check, at cross-check (both performed only with the right
sidestick), or at pre-take-off check (about which I speculate that maybe
only the right side stick operation checked again - see last paragraph).

The process of reconnecting 400-odd wires was a "major action on the control
system", and the manufacturer Airbus requires in AMM 20-52-10 that a
continuity check be performed on each individual wire, followed by an
operational or functional test of the related function. This action was
orally cancelled by maintenance supervisors upon inquiry by the mechanics,
the reason being that the functional test to be performed after maintenance
would reveal wiring errors. Well, it didn't.

It was also required to perform a functional check and a control system
check independently of each other. Well, they were performed simultaneously,
and the check person "had not been informed sufficiently about the previous
work flow", in particular that the reconnected wires had not been measured
as required.

Further, the control system test and functional test were performed only
from the right sidestick, not from both, and a visual comparison check of
the control surfaces was not performed. The report points out that the
manufacturer's instruction in AMM-27-93-00-710-050 is ambiguous. It talks
about how to perform the test with "the side stick". There are two. The
mechanics told the investigators that it did not matter which sidestick was
used to perform the tests, since "as both ELACs were connected to each
other[,] possible faults of the one or the other ELAC would surely be
indicated. This statement indicates lacking system knowledge of the
mechanics."

The cross-check staff member also used the same system documents to conduct
his cross-check that were used by the staff member who conducted the first
check. Regulations require a second set of documents to be used, to assure
independence, which was thereby lost.

The BFU points out that reconnecting all 400* wires of the ELAC plug "was
connected to a high risk of errors." It also says that "a complicated and
complex documentation system which tus is difficult to handle increases the
risk of mistakes. The 173 procedural instructions valid for the area
concerned contain many cross references making handling considerably more
difficult. It was very time-consuming to find out which procedural
instructions were relevant to the tasks to be performed."

The BFU also points out that quality assurance and monitoring, including
checks of the maintenance organisation by the LBA (the german regulator)
were inadequate.

After starting the engines, the AFTER START CHECKLIST for flight crew
apparently only contained the instruction that the lateral flight controls
were to be checked for full deflection, but not for the correct direction of
deflection.

The report illustrates the "causal chain" through the "Swiss Cheese Model"
of James Reason. The "holes" that "line up" and allow the accident to happen
are:
1. "Quality assurance: insufficient support of the work flow,
   misinterpretation of regulations";
2. "Documentation: complex, difficult-to-handle working documents;
   accomplished works was [sic] misdocumented";
3. "Mistakes: inverted connection of 2 pairs of wires on ELAC plugs";
4. "Test procedures: use of incorrect documentation wrongly accomplished
   tests; severity of action was not kept in mind";
5. "Flight Operation: Checklist are [sic] insufficient; aileron deflection
   were [sic] not checked for correct deflection" leading to
   "Occurrence: "Serious Incident" Aircraft reacts inverted to the input
   of the left sidestick at the time of the take off".

These factors correspond roughly to the statement of causes and contributing
factors.

In my RISKS-21.48 note, I recounted my puzzlement engendered by the partial
accounts then available, on the basis
(a) hat the plugs were standardised, and that
(b) a mistaken wire-up on ELAC 1 would have caused command signals
    in the reverse sense to those detected by ELAC 2 and the three Spoiler
    Elevator Computers (SEC), and I felt this should have been detected by
    the aircraft monitoring systems.

Concerning (a), the report makes clear that the plug wiring was by no means
standardised. The airplane belonged to a "transitional series" in which two
wiring pairs were to be cross-connected, and the mechanics thought they
should be connected straight-through, thanks to confusion over the
appropriate wiring diagram.

Concerning (b), the control signal discrepancy - ELAC 1 sensing a "bank
left" command and and ELAC 2 and the three SECs sensing "bank right" - was
not detected by the aircraft monitoring systems and displayed on the ECAM
during test because the left sidestick was not tested. However, had CAP
checked sidestick deflection during pre-take-off check, this discrepancy
would surely have been triggered. Had only the first officer checked the
lateral controls, the discrepancy would not have shown. The report says that
"according to statements of the crew, this check was accomplished pursuant
to the valid procedures." I wonder whether the "valid procedures" require
both pilots to perform pre-take-off flight control checks?

So the report leaves me still puzzled. If the CAP's sidestick had been moved
in the direction of lateral control at any time before takeoff, then the two
ELACs would have received contradictory sensor information, and ELAC No. 1
would have received sensor information contradicting that received by the
three SECs. I also suppose that both pilots should to perform pre-take-off
control checks, since sidestick operation is independent. So we are either
to suppose that a standard comparison across multiple channels is not
performed by the control system architecture, or else that CAP did not
perform a control check before departure and therefore either the
pre-take-off checklist procedures omit an important requirement not noted by
the BFU, or that the crew lied about performing the check according to
procedures. It would have been more satisfactory had the report sorted these
possibilities out.

Peter B. Ladkin, University of Bielefeld, http://www.rvs.uni-bielefeld.de

Please report problems with the web pages to the maintainer

Top