Opinions vs. Facts in RISKS Reports (re Aviation Accidents)
-----------------------------------------------------------

Everyone is entitled to opinions and to facts. Keeping the two distinctly separated is the basis of good reporting -- including the reports and contributions to RISKS. RISKS readers are best served by being able to tell one from the other, and to tell what is based on opinions or rumors and what on facts. Two examples follow.

In RISKS-3.27 Stephen Little reported on "one major accident in which the pilot followed the drill for a specific failure, as practiced on the simulator, only to crash because a critical common-mode feature of the system was neither understood, nor incorporated in the simulation." Since this is very important evidence of a "major accident" (with possible/probable loss of hundreds of lives) I tried to follow up on it and offered to pursue this report. The best way to verify such a report is by reference to the official NTSB (National Transportation Safety Board) accident investigation report. Therefore, I volunteered to pursue this reference myself if anyone could give me details such as the approximate date, the place (country, for example), or the make and type of the aircraft. My plea for this information appeared in RISKS-3.34, on 8/9/1986.

In response, one RISKS reader provided me with a pointer to what he vaguely remembered to be such a case. After pursuing the original report we both found that the pilot (Capt. John Perkins, of United Airlines) claimed that [computer based] simulator training helped him and his crew survive a windshear encounter -- not the kind of story the RISKS community finds to be of interest. (The long discussion about the F-16 does not relate to this topic, since it concentrated on what the simulator software should do and what the aircraft software should do, rather than on the fidelity of the simulator and on its training value.)

If the original report about that computer-induced major accident is based on facts -- let's find them; we tried but did not succeed. If it is based on rumors -- let's say so explicitly.

A more recent RISKS (3.72) has another report, this time by a pilot, Peter Ladkin, who also provides the place and the make and type of the aircraft (just as I asked for). His report says:

    "An example of a deliberate override that led to disaster: An Eastern
    Airlines 727 crashed in Pennsylvania with considerable loss of life,
    when the pilots were completing an approach in instrument conditions
    (ground fog), 1000 feet lower than they should have been at that
    stage. They overrode the altitude alert system when it gave warning."

I found it very interesting. The mention of the aircraft type and the location are helpful hints for pursuing such accidents. However, I failed to locate any information about that "Eastern Airlines 727 [which] crashed in Pennsylvania". I (and Eastern Airlines, too) know of only two losses of Eastern Airlines 727's -- neither in Pennsylvania: one at JFK (windshear) and one in La Paz, Bolivia (flying into a mountain, in IFR conditions). However, I do know of the 9/11/1974 Eastern Airlines crash of a DC-9 in Charlotte, North Carolina -- which, I guess, is what Peter Ladkin's report is about. This guess may be wrong. I APOLOGIZE TO PETER LADKIN IF I DID NOT GUESS THE RIGHT ACCIDENT.
According to the NTSB accident report (NTSB-AAR-75-9) about the DC-9 in Charlotte: "The probable cause of the accident was the flightcrew's lack of altitude awareness at critical points during the approach due to poor cockpit discipline in that the crew did not follow prescribed procedures." [They were too low, and too fast.]

The report also mentions that "The flightcrew was engaged in conversations not pertinent to the operation of the aircraft. These conversations covered a number of subjects, from politics to used cars, and both crew members expressed strong views and mild aggravation concerning the subjects discussed. The Safety Board believes that these conversations were distractive and reflected a casual mood and a lax cockpit atmosphere, which continued throughout the remainder of the approach and which contributed to the accident."

What also contributed to the accident is that "the captain did not make the required callout at the FAF [Final Approach Fix], which should have included the altitude (above field elevation)". They also failed to make other mandatory callouts. Another possible contributing factor was confusion between QNE and QFE altitudes (the former is referenced to sea level, the latter to the field elevation). [This may be the 1,000' confusion mentioned in Peter Ladkin's report.] "The terrain warning alert sounded at 1,000 feet above the ground but was not heeded by the flightcrew" (which is typical of many airline pilots, who regard this signal as more of a nuisance than a warning).

Question: What did Ladkin mean by "An example of a deliberate override that led to disaster: ..... They overrode the altitude alert system when it gave warning"? According to the NTSB they simply did not pay attention to it. According to the Ladkin report they DELIBERATELY OVERRODE it, which implies explicitly taking some positive action to override it. It is hard to substantiate this suggestion. Not paying attention is not a "deliberate override" as promised in the first line of the Ladkin report, just as flying under VFR conditions into the ground is not "a deliberate override of the visual cues" -- it is poor practice. (The only thing DELIBERATE in that cockpit was the discussion of used cars!) Does this example contribute to the RISKS discussion about "deliberate override"?

In summary: Starting from wrong "facts" based on third-hand vague recollections is not always the best way to develop theories. Again, the RISKS readers are best served by more accurate reporting. They deserve it.

Danny Cohen.
In "New Scientist", 18-Sep-86, C.A.R. Hoare discusses mathematical techniques for improving the reliability of programs, especially life-critical ones. The following somewhat arbitrary excerpts (quoted without permission) include some interesting ideas: But computers are beginning to play an increasing role in "life-critical applications", situations where the correction of errors on discovery is not an acceptable option - for example, in control of industrial processes, nuclear reactors, weapons systems, oil rigs, aero engines and railway signalling. The engineers in charge of such projects are naturally worried about the correctness of the programs performing these tasks, and they have suggested several expedients for tackling the problem. Let me give some examples of four proposed methods. The first method is the simplest. I illustrate it with a story. When Brunel's ship the SS Great Britain was launched into the River Thames, it made such a splash that several spectators on the opposite bank were drowned. Nowadays, engineers reduce the force of entry into the water by rope tethers which are designed to break at carefully calculated intervals. When the first computer came into operation in the Mathematish Centrum in Amsterdam, one of the first tasks was to calculate the appropriate intervals and breaking strains of these tethers. In order to ensure the correctness of the program which did the calculations, the programmers were invited to watch the launching from the first row of the ceremonial viewing stand set up on the opposite bank. They accepted and they survived. ... [1.5 pages omitted] I therefore suggest that we should explore an additional method, which promises to increase the reliability of programs. The same method has assisted the reliability of designs in other branches of engineering, namely the use of mathematics to calculate the parameters and check, the soundness of a design before passing it for construction and installation. Alan Turing first made this suggestion some 40 years ago; it was put into practice, on occasion, by the other great pioneer of computing, John von Neumann. Shigeru Igarashi and Bob Floyd revived the idea some 20 years ago, providing the groundwork for a wide and deep research movement aimed at developing the relevant mathematical techniques. Wirth, Dijkstra, Jones, Gries and many others, (including me) have made significant contributions. Yet, as far as I know, no one has ever checked a single safety-critical program using the available mathematical methods. What is more, I have met several programmers and managers at various levels of a safety-critical project who have never even heard of the possibility that you can establish the total correctness of computer programs by the normal mathematical techniques of modelling, calculation and proof. Such total ignorance would seem willful, and perhaps it is. People working on safety-critical projects carry a heavy responsibility. If they ever get to hear of a method which might lead to an improvement in reliability, they are obliged to investigate it in depth. This would give them no time to complete their current projects on schedule and within budget. I think that this is the reason why no industry and no profession has ever voluntarily and spontaneously developed or adopted an effective and relevant code of safe practice. 
Even voluntary codes are established only in the face of some kind of external pressure or threat, arising from public disquiet, fostered by journals and newspapers and taken up by politicians. A mathematical proof is, technically, a completely reliable method of ensuring the correctness of programs, but this method could never be effective in practice unless it is accompanied by the appropriate attitudes and managerial techniques. These techniques are in fact based on the same ideas that have been used effectively in the past.

It is not practical or desirable to punish errors in programming by instant death. Nevertheless, programmers must stop regarding error as an inevitable feature of their daily lives. Like surgeons or airline pilots, they must feel a personal commitment to adopt techniques that eliminate error, and feel the appropriate shame and resolution to improve when they fail. In a safety-critical project, every failure should be investigated by an impartial enquiry, with powers to name the programmer responsible and forbid that person any further employment on safety-critical work. In cases of proven negligence, criminal sanctions should not be ruled out. In other engineering disciplines, these measures have led to marked improvement in personal and professional responsibility, and in public safety. There is no reason why programmers should be granted further immunity...

... [1 page, to end of article, omitted]
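[To illustrate the kind of reasoning Hoare is advocating, here is a minimal sketch -- not from the article -- of a trivial function annotated with a specification, a loop invariant, and a termination argument. The function and its invariant are invented for illustration; in practice the annotations would be proved mathematically, not merely spot-checked with assert().]

    /* Specification:  { n >= 0 }  sum(n)  { result = 0 + 1 + ... + n } */
    #include <assert.h>
    #include <stdio.h>

    static long sum(long n)
    {
        long i = 0, s = 0;
        /* Invariant: s = 0 + 1 + ... + i, equivalently 2*s = i*(i+1),
         * and 0 <= i <= n.  It holds on entry, since i = s = 0.      */
        while (i < n) {
            i = i + 1;
            s = s + i;
            assert(2 * s == i * (i + 1));   /* invariant preserved */
        }
        /* On exit i = n, so the invariant yields the postcondition.
         * Termination: n - i is a non-negative integer that decreases
         * on every iteration -- total, not just partial, correctness. */
        return s;
    }

    int main(void)
    {
        assert(sum(0) == 0);
        printf("sum(10) = %ld\n", sum(10));   /* prints sum(10) = 55 */
        return 0;
    }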
CP-6 has a further problem when first loaded, encountered recently at Wilfrid Laurier University. A check is made to ensure that front-end processors (FEPs) are up and running, but not that they contain the correct software... the consequence in W.L.U.'s case was that after loading version C01 for testing and then rebooting the C00 software, they left the C01 software in the FEPs. Unfortunately, this resulted (for whatever reason) in disk record writes being interpreted as disk record deletes. The problem became apparent when using the editor, which performs direct disk updates... but its severity was not at first appreciated... the system was brought down very rapidly when it was.... Ian Davis.
The recent notice about title-indexing (article titles must include all important article keywords in their first five words) struck a real chord in me. My current job is maintaining and updating Dartmouth College's automated card catalog. We have a database of over 800,000 records, all completely free-text searchable (EVERY WORD in every record is indexed). We are beginning to suffer storage limitations, and are exploring our options. However, if we tried to suggest anything so restrictive as "five keywords per title", we'd have a revolution on our hands.

The instance cited seems to me to be a clear example of shaping the task to suit the tools at hand. Somebody out there ought to be ashamed of him/herself. At the very least, the notice explaining why articles' titles must be rewritten should have 1. been extremely apologetic, and 2. given a time by which this temporary limitation would no longer apply. As it stands, the system sounds as if it is going to be less useful than some of the available conventional journal indexes -- what incentive does this give for using it? Tsk, tsk.
> From: firstname.lastname@example.org > Small, straightforward problems with very little complexity in the > logic (e.g., just a series of mathematical equations) may not say much > about the reliability of large, complex systems. And there, of course, lies the heart of the structured programming movement. You improve reliability by reducing the complexity of program logic. You turn a large, complex system into a small, straightforward system by building it in layers, each of which makes use of primitives defined in the layer below. The reason it may not be as effective as many have hoped is that even simple, straightforward programs often turn out to have bugs... scott preece, gould/csd - urbana, uucp: ihnp4!uiucdcs!ccvaxa!preece
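[To make the layering point concrete, here is a small invented C sketch: each layer is written only against the interface of the layer below, so no single piece is large or complex. The buffer, the log, and all names are hypothetical, chosen purely for illustration.]

    #include <stdio.h>
    #include <string.h>

    /* Layer 0: raw storage primitive -- a bounded append-only buffer. */
    static char store[256];
    static size_t used;

    static int raw_append(const char *data, size_t n)
    {
        if (used + n > sizeof store)
            return -1;                 /* one simple, checkable rule */
        memcpy(store + used, data, n);
        used += n;
        return 0;
    }

    /* Layer 1: a line-oriented log, built only on raw_append(). */
    static int log_line(const char *msg)
    {
        if (raw_append(msg, strlen(msg)) != 0)
            return -1;
        return raw_append("\n", 1);
    }

    /* Layer 2: the application uses only log_line(); it never touches
     * the buffer directly, so each layer can be examined separately. */
    int main(void)
    {
        log_line("system started");
        log_line("system stopped");
        fwrite(store, 1, used, stdout);
        return 0;
    }

Of course, as Preece notes, even a layer this simple can hide a bug; the layering only bounds how much logic must be inspected at once.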
> From: hplabs!sdcrdcf!darrelj@ucbvax.Berkeley.EDU (Darrel VanBuer) > The thing is software DOES wear out in the sense that it loses its > ability to function because the world continues to change around it... ---------- That's like saying "People do live forever in the sense that some of their atoms linger." The sense you depend on is not in the words you use. "Becoming obsolete" is NOT the same thing as "wearing out." The word "wear" is in there for a reason. Software does not suffer wear (though storage media do). The only exception I can think of would be demonstration packages that self-destruct after a set number of uses. Words are important; if you smear their meaning, you lose the ability to say exactly what you mean. This is a risk the computing profession has contributed to disproportionately. scott preece
The recent discussions on manual overrides for airplane landing gear and car brakes have all been ignoring a fundamental issue: To compute the expected cost/risk of having/not having an automated system, you need more than just a few gedanken experiments; you need some estimates of the probabilities of various situations, and, in each of those situations, the expected costs of using or not using the automatic systems.

Here's a simple, well-known example: Some people claim they don't wear seat belts because, in an accident, they might be trapped in a burning car, or one sinking into a lake. Is this a valid objection? Certainly; it COULD happen. But the reality is that such accidents are extremely rare, while accidents in which seat belts contribute positively are quite common. So, on balance, the best you can do is wear seat belts. Of course, if you are in some very special situation - doing a stunt that involves driving a car slowly across a narrow, swaying bridge over a lake, for example - the general statistics fail and you might properly come to a different conclusion.

In the United States, how many people regularly drive on gravel roads? Perhaps for those relatively few who do, an override for the automatic brake system, or even a car WITHOUT such a system, might make sense. Perhaps the costs for all those people who almost never drive on gravel roads can be shown to be trivial. There certainly ARE costs; every additional part adds cost, weight, something that can break; plus, there's another decision the driver might not want to be burdened with. And there are "external" costs: An uncontrolled, skidding car could easily injure someone besides the driver who chose to override the ABS.

Accidents in general are fairly low-probability events. As such, they have to be reasoned about carefully - our intuitions on such events are usually based on too little data to be worth much. Also, since we have little direct experience, we are more likely to let emotional factors color our thinking. The thought of being trapped in a burning or sinking car is very disturbing to most people, so they weight such accidents much more heavily than their actual probability of occurrence merits.

It's also worth remembering another interesting statistic (I wish I knew a reference): When asked, something like 80% of American male drivers assert that their driving abilities are "above average". Given such a population of users, there are risks in providing overrides of safety systems. -- Jerry
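[Jerry's calculation can be made concrete. Below is a small C sketch of the expected-cost comparison he outlines; every probability and cost in it is hypothetical, chosen only to show how a rare-but-feared scenario is swamped by a common one.]

    #include <stdio.h>

    struct scenario {
        const char *name;
        double prob;          /* probability of this accident type (made up) */
        double cost_belted;   /* expected harm with a belt (arbitrary units) */
        double cost_unbelted; /* expected harm without one */
    };

    int main(void)
    {
        /* Hypothetical: belts help in the common crash but hurt in the
         * rare fire-or-submersion case that worries people.           */
        struct scenario s[] = {
            { "ordinary collision", 1.0e-4, 10.0, 100.0 },
            { "fire or submersion", 1.0e-7, 80.0,  50.0 },
        };
        double belted = 0.0, unbelted = 0.0;
        for (int i = 0; i < 2; i++) {
            belted   += s[i].prob * s[i].cost_belted;
            unbelted += s[i].prob * s[i].cost_unbelted;
        }
        printf("expected cost, belted:   %g\n", belted);    /* 0.001008 */
        printf("expected cost, unbelted: %g\n", unbelted);  /* 0.010005 */
        /* The rare scenario barely moves either total: the common case
         * dominates, which is why "it COULD happen" is not enough.    */
        return 0;
    }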
> ..... Yet, perhaps such vehicles should have a switch to disable
> anti-lock and allow conventional braking. Imagine trying to stop quickly
> with anti-lock brakes on a gravel road...

But the whole point of anti-lock brakes is to avoid skidding when traction is lost. If the vehicle skids, it'll hit the cow. Overrides, as has been said before, allow incompetent operators to substitute their opinions for facts. Brint
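[For readers following this exchange, here is a toy sketch, entirely invented, of the anti-lock principle Brint is defending: release brake pressure only when a wheel is turning far slower than the car is moving, i.e. is about to lock and skid. The threshold and "sensor" values are hypothetical. The gravel objection is that on loose surfaces a locked wheel can reportedly dig in and stop shorter, a case this simple rule does not consider.]

    #include <stdio.h>

    struct state {
        double vehicle_speed;   /* estimated ground speed */
        double wheel_speed;     /* speed implied by wheel rotation */
    };

    /* Returns 1 to keep braking, 0 to momentarily release pressure. */
    static int brake_command(struct state s)
    {
        if (s.vehicle_speed <= 0.0)
            return 1;                  /* stopped: nothing to modulate */
        double slip = (s.vehicle_speed - s.wheel_speed) / s.vehicle_speed;
        return slip > 0.2 ? 0 : 1;     /* >20% slip: incipient lockup */
    }

    int main(void)
    {
        struct state rolling = { 20.0, 19.0 };  /* wheel tracking the car */
        struct state locking = { 20.0,  4.0 };  /* wheel nearly stopped   */
        printf("rolling wheel: %s\n", brake_command(rolling) ? "brake" : "release");
        printf("locking wheel: %s\n", brake_command(locking) ? "brake" : "release");
        return 0;
    }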
Chuck Fry's argument for override provisions in automated controls on cars makes a lot of sense. Frankly, though, I'd rather see as few new automatic controls as we can manage with. I live in the Buffalo area--heavy industry with cobwebs on it--and people here are driving cars that ought to have been junked last year. Airplanes get first-class maintenance, or at least second-class. With cars it's different; when something breaks, many people just can't afford to have it fixed. The simpler a car's design, the longer a poor man can keep it running safely. Maybe I'm being cynical, but I believe that so simple an improvement as putting brake lights on rear windshields will prevent far more accidents than any amount of intermediary computerization.

  [Since deregulation, you might be surprised to learn that the airlines,
  like everyone else, believe in cutting expenses to the bone. Maintenance
  may or may not be what it was. I have seen several reports that it is
  not, although it is certainly nowhere near so bad as with autos. PGN]