All right now, how many people reading this... -> saw a previous version of this message in RISKS-6.34, 13.21, 17.81, and/or 20.83; -> have watches that need to be set back a day because, unlike the smarter kind of digital watch, they went directly from February 28 to March 1; -> and *hadn't realized it yet*? [Please answer to yourself, but not to RISKS. We don't really need a hand count. PGN]
The General Accounting Office (GAO) today released the following report: Defense Acquisitions: Stronger Management Practices Are Needed to Improve DOD's Software-Intensive Weapon Acquisitions. GAO-04-393, 1 Mar 2004 http://www.gao.gov/cgi-bin/getrpt?GAO-04-393 Quoting from Highlights of GAO-04-393: "Software developers and acquirers at firms that GAO visited use three fundamental management strategies to ensure the delivery of high-quality products on time and within budget: working in an evolutionary environment, following disciplined development processes, and collecting and analyzing meaningful metrics to measure progress. When these strategies are used together, leading firms are better equipped to improve their software development processes on a continuous basis. An evolutionary approach sets up a more manageable environment - one in which expectations are realistic and developers are permitted to make incremental improvements. The customer benefits because the initial product is available sooner and at a lower, more predictable cost. This avoids the pressure to incorporate all the desired capabilities into a single product right away. Within an evolutionary environment, there are four phases that are common to software development: setting requirements, establishing a stable design, writing code, and testing. At the end of each of these phases, developers must demonstrate that they have acquired the right knowledge before proceeding to the next development phase. To provide evidence that the right knowledge was captured, leading developers emphasize the use of meaningful metrics, which helps developers, managers, and acquirers to measure progress. These metrics focus on cost, schedule, the size of a project, performance requirements, testing, defects, and quality. "In a review of five DOD programs, GAO found that outcomes were mixed for software-intensive acquisitions. The F/A-18 C/D, a fighter and attack aircraft, and the Tactical Tomahawk missile had fewer additional cost and schedule delays. For these programs, developers used an evolutionary approach, disciplined processes, and meaningful metrics. In contrast, the following programs, which did not follow these management strategies, experienced schedule delays and cost growth: F/A-22, an air dominance aircraft; Space-Based Infrared System, a missile-detection satellite system; and Comanche, a multimission helicopter...." Why GAO Did This Study "The Department of Defense (DOD) has been relying increasingly on computer software to introduce or enhance performance capabilities of major weapon systems. To ensure successful outcomes, software acquisition requires disciplined processes and practices. Without such discipline, weapon programs encounter difficulty in meeting cost and schedule targets. For example, in fiscal year 2003, DOD might have spent as much as $8 billion to rework software because of quality-related issues. "GAO was asked to identify the practices used by leading companies to acquire software and to analyze the causes of poor outcomes of selected DOD programs. GAO also was asked to evaluate DOD's efforts to develop programs for improving software acquisition processes and to assess how those efforts compare with leading companies' practices." What GAO Recommends "GAO recommends that the Secretary of Defense direct the military services and agencies to adopt specific controls to improve software acquisition outcomes. These practices should be incorporated into DOD policy, software process improvement plans, and development contracts. DOD concurred with two revised recommendations and partially concurred with two others."
A good explanation of what happened to Spirit is at http://www.eetimes.com/sys/news/OEG20040220S0046 In brief: - A software upload took place in order to correct some problem - A utility to delete files belonging to the old software was uploaded, but the upload failed - This failure was apparently forgotten or ignored, resulting in less file space being available for experiment data than anticipated - When the file system overflowed, a reboot occurred. This, apparently, was the intended behaviour - The reboot could not complete due to insufficient available file space - An infinite loop of reboots was entered RISKS, as I perceive them: - Relying on mission planners, working on assumed rather than actual file space, to not overflow file system - A file system that does not fail gracefully when overflowed - A boot sequence that requires resources that may become unavailable
"Prison Planet" (http://www.prisonplanet.com/022904rfidtagsexplode.html) is carrying a story of an experiment on microwaving the new US $20.00 bills which contain RFIDs. (The site includes pictures of the results.) Apparently, the new bills were setting off every RFID monitor they passed through! Wrapping the bills in aluminium foil stopped this behaviour, but they decided to try microwaving the bills. This led to the RFIDs exploding and burning the face of Andrew Jackson's picture. This could become quite common a problem as word passes that microwaving can destroy RFIDs without the proviso that this may (will) damage the goods. (A discussion is also being carried on Slashdot: http://slashdot.org/article.pl?sid=04/03/02/0535225&mode=nested)
Company denies telemarket scheme, Peter J. Howe, *The Boston Globe* 2 Mar 2004 Massachusetts utility regulators said yesterday that they are investigating a pattern of AT&T Corp. allegedly sending bogus bills to people who are not customers of the company, then trying to sell them AT&T phone service when they call to complain. After similar concerns emerged in upstate New York last week, the Massachusetts Department of Telecommunications and Energy said yesterday it has received more than 30 complaints since January from Bay State residents who said they got bills from AT&T although they have never had AT&T service or canceled it months or years earlier. ... http://www.boston.com/business/globe/articles/2004/03/02/state_looks_at_false_bills_from_att/
I think I electronically voted at a polling place in the California Primary today. California has an odd primary rule where, if you register non-partisan, you are allowed to put bad data into the party of your choice, if they have chosen to let non-partisans do so. So I decided on a particular party, inserted the card, chose "large print," and was presented with page 1 of 8, a blank screen. It seems the program dynamically assigns the various items to the pages, which evaluates to 4 pages with normal type, and 7 for large type. So for large type it doubles the number of regular type pages and presents the blank page first. Poll workers did not seem to understand this, and the help line for them to call was continually busy. [This RISK intentionally left blank. JG]
Dr. Neal Krawetz, Anti-Spam Solutions and Security SecurityFocus, last updated 26 Feb 2004, http://www.securityfocus.com/infocus/1763 In a recent survey, 93% of respondents reported dissatisfaction with the large volume of unsolicited e-mail (spam) they receive. The problem has grown to the point where nearly 50% of the world's e-mail is spam, yet only a few hundred groups are responsible. Many anti-spam solutions have been proposed and a few have been implemented. Unfortunately, these solutions do not prevent spam as much as they interfere with every-day e-mail communications. The problems posed by spam have grown from simple annoyances to significant security issues. The deluge of spam costs up to an estimated $20 billion each year in lost productivity — according to the same document, spam within a company can cost between $600 and $1,000 per year for every user.
Re: Solving e-mail problems economically (RISKS-23.22) > The [...] virus writers are no more responsible for the amounts of junk > [...] than I am responsible for the damage caused by an automobile whose > driver does not observe my bicycle until the last second and manoeuvres > suddenly. In recent German jurisdiction, exactly that happened. The driver of a Mercedes Coupe was jailed for 18 months after the death of a woman and her 2-year-old daughter. Apparently the mother, who was driving on the leftmost lane of a 3-lane Autobahn, was frightened by the fast approach of the Coupe and swerved right across all three lanes into a tree. The cars didn't even touch each other. Although at the time of the accident the Mercedes was going at nearly 155mph, there was nothing illegal about that as there are no "global" speed limits on the German Autobahnen. Apart from the somewhat irritating court decision, which is open for retrial, there now seems to be the actual RISK of being prosecuted for the mis-reactions of others... [Also, death threats against the judge! PGN] English URL: http://www.timesonline.co.uk/article/0,,588-1006376,00.html German URL: http://www.stuttgarter-nachrichten.de/stn/page/detail.php/600908 Stefan Lesser, Support Center Muenchen, Burda Digital Systems GmbH, Am Kestendamm 2, 77652 Offenburg http://www.burdadigital.de +49/89/9250-3433 [Interesting case. Check out the details. PGN]
>>The idea of an implantable medical device apparently requiring physical >>reconfiguration (at least) to talk to an external monitor implies a level of >>trust in the reliability of the external device which is seriously scary. >>The RISKS hardly need pointing out here... I think the RISKS of allowing unauthenticated remote reprogrammability of an implanted medical device may be just as scary. One way of reducing that RISK may be to have some sort of an "emergency broadcast" safe mode in which a new external monitor could identify itself to the implant and authorize through a highly secure key which would require knowledge of a passphrase to transmit. Of course, you'd really have to remember to change the default password.... David W. Brunberg, Engineering Supervisor, The F.B. Leopold Company, Inc.
> IIRC the late VAX/VMS systems also had built in buffer overflow prevention > features, probably a lesson learned from Multics. In fact, all systems running OpenVMS (formerly known as VAX/VMS) do. Memory protection was built into the design of both the VAX and Alpha processors. The Itanium version of OpenVMS uses the processor's built-in in memory protection, as well. Pages can be set to be most any combination of read, write, or execute, with different protections depending on operating mode: user, supervisor, executive, or kernel. C programmers moving to OpenVMS quickly discover what the ACCVIO error message means. Stanley F. Quayle, P.E. N8SQ +1 614-868-1363 Fax: +1 614 868-1671 8572 North Spring Ct. NW, Pickerington, OH 43147
Keith Gobeski's note on the Burroughs stack architecture's improvements over many of its successors (compliment stolen from Wirth, IIRC) reminds me of the discussions in the late '70s of whether and how it could support the hot, new C language. Eventually it could, mostly, but it was ugly. The key to a language-oriented architecture is preserving the language's abstractions (in this case the data array abstraction) in the run-time environment. The Algol abstraction is clean: an array and an index combine properly if the index is within the declared range; otherwise it is illegal. An array row descriptor identified the memory segment and the limit, so an out-of-range access caused an exception. There was no other way for a program to get at the memory, so it was inherently safe. Since C's memory abstraction is basically the hardware address, and addresses can be manipulated arbitrarily, any reasonably complete C implementation had to abandon tagged segments as array rows, instead putting all arrays in a single segment, keeping an allocation map in a locally-managed stack, etc. in order to generate an index for a memory access. All the (parallel, fast) hardware assist for arrays was lost, replaced by (sequential, slower) software Some of the pathological C constructions were still impossible, of course; you couldn't force it to execute data. As Unix (itself a reaction to Multics OS complexity) became trendy, who knew this was a Good Thing? The rise of the processor chip (and byte-oriented memory, and college courses in C) put an end to the experiments outside Burroughs/Unisys, as chip areas forced simple architectures (some actually *called* RISCs) without support for elegant and useful abstractions. Heck, software can always make up for hardware deficiencies, right? (A 1982 ACM Computer Architecture News paper, "The Architecture of the Burroughs B5000 --20 Years Later and Still Ahead of the Times?" by Alastair J. W. Mayer, is still relevant another 20 years later. It's on his web site at http://www.ajwm.net/amayer/papers/B5000.html)
In RISKS-21.48, 21 June 2001, I reported on an incident to a Lufthansa A320. The A320 is a "fly-by-wire" aircraft, in that primary control is effected through computers and electrics rather than mechanical means. The captain's (CAP) sidestick controller was miswired during maintenance so that a "bank right" command initiated a "bank left" control signal and vice versa. This was discovered on take-off, when the captain corrected a left wing dip due to turbulent air flow with right sidestick movement ("bank right") and the aircraft's left wing dipped further, reportedly coming within two meters of touching the ground. The copilot took priority control (a feature of the electronic control architecture) and recovered the aircraft. The crew flew up a few thousand feet altitude, familiarised themselves with the problem as best they could, and returned to land the aircraft. It turned out that two wires connecting the CAP's sidestick to one Elevator and Aileron Computer (ELAC), of which there are two, had been reverse-connected during maintenance, and the fault had been discovered neither by post-maintenance check, nor by post-maintenance cross-check, nor by the flight crew's pre-take-off control system check. I had suggested in my Risks 21.48 note that I was puzzled by the partial reports of the incident then available. The final report was published as report 5X004-0/01 in April 2003 by the German Federal Bureau of Aircraft Accidents Investigation (german acronym BFU) and is available in English at http://www.bfu-web.de/berichte/01_5x004efr.pdf Thanks to John Sampson for bringing it to my attention. Salient facts are as follows. During previous flights, one of the two ELACs failed. Maintenance found a defect in the X-TALK-BUS between ELAC Nos 1 and 2, found to be "caused by a bent connection pin (Pin 6K) in the plug segment AE of the socket for the ELAC no. 1." The attempt to replace just the pin failed and it was decided to replace the plug segment AE. There was no suitable spare plug AE for this series of airplane in stock, and the AE segments they had were not compatible with the remaining installed segments, so it was decided to replace all four plug segments AA, AB, AD, AE with a compatible set. This meant that "in a confined space approx. 420 assigned connector pins had to be reconnected." The method chosen was "ONE TO ONE", whereby "the wires were disconnected one after the other from the old plug and immediately connected to the new one." The mechanics used the wrong wiring diagram. How could this be? Well, an airplane and its equipment are identified by serial number (SN). The manufacturer knows what equipment was installed at build. Subsequently, the manufacturer issues Service Bulletins (SB) for modifications to installed equipment, and these SBs have different grades of urgency. Some are only "recommended", for example. So the installed equipment is identified by SN, and further by the log of which SBs have been accomplished. The applicable wiring diagram on p2 of the Airplane Wiring Manual (AWM) contained a designation that said it was applicable to airplanes with an "effectivity range" of 013-018 and those with effectivity 001-012 on which SB 27-1030 had been accomplished. P4 of the AWM was applicable to those airplanes with effectivity range 001 to 034 on which SB27-1030 and SB27-1084 had been accomplished. SB27-1084 had been accomplished, but not SB27-1030, and the aircraft had effectivity 017. Hence p2 was applicable, but the mechanics thought p4 was applicable as SB27-1084 had been accomplished. Each numbered wire consists of a twisted red-blue pair. In segment AE, the "Monitor Channel" is connected by pair 0603. The Control Channel is connected by pair 0597. P2 specifies that these wires must be cross-connected (blue to red, red to blue) between the sidestick and ELAC plug. P4 specifies that these wires must be connected straight through (red to red, blue to blue). Furthermore, in the Aircraft Wiring List (AWL), the twisted pairs are always assigned in the order red, then blue, in the alphanumeric sequence of plug segment coordinates, except for these two wires. Wire 0603 is assigned blue then red to the pins 3C/3D, and wire 0597 blue then red to 15J/15K. Why? The manufacturer wanted to effect a uniform wiring for all its FBW airplanes, and from a certain type series on, the A320 wiring was planned to be identical to that of the A330 and A340. An interchange of colors was accepted for a certain transition period, and this aircraft belonged to the transition series. The BFU report points out that, had only the AE segment been exchanged, only the Monitor Channel would have been falsely connected, and with high probability an error message would have appeared on the cockpit aircraft monitoring display (ECAM). It doesn't say at which point this message would have appeared - at check, at cross-check (both performed only with the right sidestick), or at pre-take-off check (about which I speculate that maybe only the right side stick operation checked again - see last paragraph). The process of reconnecting 400-odd wires was a "major action on the control system", and the manufacturer Airbus requires in AMM 20-52-10 that a continuity check be performed on each individual wire, followed by an operational or functional test of the related function. This action was orally cancelled by maintenance supervisors upon inquiry by the mechanics, the reason being that the functional test to be performed after maintenance would reveal wiring errors. Well, it didn't. It was also required to perform a functional check and a control system check independently of each other. Well, they were performed simultaneously, and the check person "had not been informed sufficiently about the previous work flow", in particular that the reconnected wires had not been measured as required. Further, the control system test and functional test were performed only from the right sidestick, not from both, and a visual comparison check of the control surfaces was not performed. The report points out that the manufacturer's instruction in AMM-27-93-00-710-050 is ambiguous. It talks about how to perform the test with "the side stick". There are two. The mechanics told the investigators that it did not matter which sidestick was used to perform the tests, since "as both ELACs were connected to each other[,] possible faults of the one or the other ELAC would surely be indicated. This statement indicates lacking system knowledge of the mechanics." The cross-check staff member also used the same system documents to conduct his cross-check that were used by the staff member who conducted the first check. Regulations require a second set of documents to be used, to assure independence, which was thereby lost. The BFU points out that reconnecting all 400* wires of the ELAC plug "was connected to a high risk of errors." It also says that "a complicated and complex documentation system which tus is difficult to handle increases the risk of mistakes. The 173 procedural instructions valid for the area concerned contain many cross references making handling considerably more difficult. It was very time-consuming to find out which procedural instructions were relevant to the tasks to be performed." The BFU also points out that quality assurance and monitoring, including checks of the maintenance organisation by the LBA (the german regulator) were inadequate. After starting the engines, the AFTER START CHECKLIST for flight crew apparently only contained the instruction that the lateral flight controls were to be checked for full deflection, but not for the correct direction of deflection. The report illustrates the "causal chain" through the "Swiss Cheese Model" of James Reason. The "holes" that "line up" and allow the accident to happen are: 1. "Quality assurance: insufficient support of the work flow, misinterpretation of regulations"; 2. "Documentation: complex, difficult-to-handle working documents; accomplished works was [sic] misdocumented"; 3. "Mistakes: inverted connection of 2 pairs of wires on ELAC plugs"; 4. "Test procedures: use of incorrect documentation wrongly accomplished tests; severity of action was not kept in mind"; 5. "Flight Operation: Checklist are [sic] insufficient; aileron deflection were [sic] not checked for correct deflection" leading to "Occurrence: "Serious Incident" Aircraft reacts inverted to the input of the left sidestick at the time of the take off". These factors correspond roughly to the statement of causes and contributing factors. In my RISKS-21.48 note, I recounted my puzzlement engendered by the partial accounts then available, on the basis (a) hat the plugs were standardised, and that (b) a mistaken wire-up on ELAC 1 would have caused command signals in the reverse sense to those detected by ELAC 2 and the three Spoiler Elevator Computers (SEC), and I felt this should have been detected by the aircraft monitoring systems. Concerning (a), the report makes clear that the plug wiring was by no means standardised. The airplane belonged to a "transitional series" in which two wiring pairs were to be cross-connected, and the mechanics thought they should be connected straight-through, thanks to confusion over the appropriate wiring diagram. Concerning (b), the control signal discrepancy - ELAC 1 sensing a "bank left" command and and ELAC 2 and the three SECs sensing "bank right" - was not detected by the aircraft monitoring systems and displayed on the ECAM during test because the left sidestick was not tested. However, had CAP checked sidestick deflection during pre-take-off check, this discrepancy would surely have been triggered. Had only the first officer checked the lateral controls, the discrepancy would not have shown. The report says that "according to statements of the crew, this check was accomplished pursuant to the valid procedures." I wonder whether the "valid procedures" require both pilots to perform pre-take-off flight control checks? So the report leaves me still puzzled. If the CAP's sidestick had been moved in the direction of lateral control at any time before takeoff, then the two ELACs would have received contradictory sensor information, and ELAC No. 1 would have received sensor information contradicting that received by the three SECs. I also suppose that both pilots should to perform pre-take-off control checks, since sidestick operation is independent. So we are either to suppose that a standard comparison across multiple channels is not performed by the control system architecture, or else that CAP did not perform a control check before departure and therefore either the pre-take-off checklist procedures omit an important requirement not noted by the BFU, or that the crew lied about performing the check according to procedures. It would have been more satisfactory had the report sorted these possibilities out. Peter B. Ladkin, University of Bielefeld, http://www.rvs.uni-bielefeld.de
Please report problems with the web pages to the maintainer