From SF Chronicle wire services, 28 Aug 1986: White Sands Missile Range, NM -- A rocket carrying a scientific payload for NASA was destroyed 50 seconds after launch because its guidance system failed... The loss of the $1.5 million rocket was caused by a mistake in the installation of a ... resistor of the wrong size in the guidance system. "It was an honest error," said Warren Gurkin... "This rocket has been a good rocket, and we continue to have a lot of faith in it." Saturday's flight was the 27th since the first Aries was launched in 1973, and it was the third failure.
Computer-Aided Engineering, Penton Publishing, Cleveland OH, April 1986, page 4: "Impressive computer analysis, however, may tempt some engineers into developing designs that barely exceed maximum expected operational loads. In these cases there is no room for error, no allowance for slight miscalculations, no tolerance for inaccuracy. In engineering parlance, the design is 'close to the line'. The reasoning, of course, is that relatively small safety factors are justified because computer analysis is so accurate. The major flaw in this logic, however, lies in the fact that the initial mathematical model set up by the designer may itself contain gross inaccuracies... These errors are carried through the entire analysis by the computer, of course, which uses the model as the sole basis for its calculations... And wrong answers are easily obscured by flashy color graphics and high-speed interactive displays. In most cases, the engineer must be extremely familiar with the design and the programs used in its development to spot errors in results." -- John K. Krouse, editor

Annette C. Bauman, DDN-PMO Test & Evaluation Branch, DDN Network Software Test Director
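The arithmetic behind "close to the line" is simple enough to sketch. The load and capacity numbers below are invented for illustration; they show how a thin safety factor leaves no room for even a modest error in the underlying model:

    # Hypothetical numbers, for illustration only.
    # Safety factor = designed capacity / maximum expected load.
    max_load = 100.0      # maximum expected operational load (arbitrary units)
    capacity = 105.0      # a design "close to the line": only 5% above the load

    safety_factor = capacity / max_load          # 1.05, a very thin margin

    # If the designer's mathematical model understated the true load by a
    # modest 10%, the computer's "accurate" analysis was accurate only
    # about the wrong model, and the margin is gone.
    true_load = max_load * 1.10                  # 110.0

    print("nominal safety factor:", safety_factor)
    print("design holds" if capacity >= true_load else "design fails")

A traditional safety factor of 1.5 or 2 absorbs exactly this kind of modeling error; the editorial's point is that confidence in the analysis tool is no substitute for that margin.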
>....Their dispensing machines cannot be cheated in this way, because they have
>a steel door in front of the machine which does not open until you insert a
>valid plastic card.

People who swindle ATMs don't have cash cards?????

ATM swindles don't seem to have caught on in the UK too much yet (at least not that I've heard of), but the new "vandal-proof" phone boxes, which have special money compartments, seem to be rather more vulnerable. I have heard reports of people touring regions of the UK on a regular basis emptying these phones.

Another interesting scam at the moment (which I presume swept the US long ago....), and which is not illegal, is that of beating quiz machines. Teams of three "experts" (sport, TV/film, and general knowledge, usually) tour pubs and play the video quiz machines. These have money prizes, and the teams simply strip them of everything in them by answering all the questions. Most landlords are now removing these games as they are losing money.......
John Ellenby (of Grid Systems) told me that they installed just such a mechanism into an operating system they were building and used it to distinguish between the various operators who used the console. The operators could never work out how the system "knew" who they were. (I may say that I am not totally convinced, however, particularly in a non-keyboard-oriented society such as the UK, where very few people can actually type properly.)
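For readers who have not seen such a mechanism, here is a minimal sketch of keystroke-rhythm identification; the operator names, profiles, and matching rule are all invented for illustration, not anything Ellenby described:

    from statistics import mean

    # Hypothetical per-operator profiles: typical inter-keystroke
    # interval in seconds, learned from past sessions.
    profiles = {"alice": 0.12, "bob": 0.21, "carol": 0.35}

    def identify(intervals):
        """Guess which operator produced these inter-keystroke intervals."""
        observed = mean(intervals)
        # Attribute the session to the operator whose typical rhythm is
        # closest to what was just observed.
        return min(profiles, key=lambda op: abs(profiles[op] - observed))

    # A burst of fast, even typing is attributed to the fastest typist.
    print(identify([0.11, 0.13, 0.12, 0.10]))   # -> alice

A real implementation would use per-digraph timings and a proper statistical distance rather than a single mean, but even this crude rule separates typists whose rhythms differ as much as the profiles above.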
Nancy Leveson's comment (on PGN's comment on human error in RISKS-3.43) makes some very good points. We do need to discuss the terms we use to describe the various ways systems fail, if only because system safety, and especially software safety, are fairly young fields. And it seems natural for practitioners of a science, young or not, to disagree on what they are talking about. (Recall the discussion a few years ago in SEN on what the term "software engineering" meant and whether what software engineers did was really engineering.)

But what scientists say in these discussions about science may not be science, at least in the sense of experimental science; it is more like philosophy, especially when the talk is about "causes". Aristotle, for one, talked a lot about causes and categories. When we are urged to constrain our use of "cause" ("Trying to simplify and ascribe accidents to one cause will ALWAYS be misleading. Worse, it leads us to think that by eliminating one cause, we have then done everything necessary to eliminate accidents (e.g., train the operators better, replace the operators by computer, etc.)"), we are being given a prescription, something value-laden. (I don't mean to imply that science is or should be value-free.) The implication in the prescription seems to be that we (those interested in software and system safety) should avoid using "cause" in a certain way, or else we are in danger of seducing ourselves, as well as everybody else not specifically so interested (the public), into a dangerous (unsafe) way of thinking.

One way of supplementing the philosophical or prescriptive bent of our discussion about the fundamental words is to look at how other disciplines use the same words. For example, structural engineers seem to be doing a lot of thinking about what we would call safety. They even say "Human error is the major cause of structural failures." (Nowak and Carr, "Classification of Human Errors," in Structural Safety Studies, American Society of Civil Engineers, 1985.) It may be that our discussions about the basic words we use can be helped by consulting similar areas in more traditional types of engineering.

There is another prescriptive aspect to the subject of constraining our discourse as raised by Nancy, namely not admitting into that discourse statements from certain sources. ("Also, the nature of the mass media, such as newspapers, is to simplify. This is one of the dangers of just quoting newspaper articles about computer-related incidents. When one reads accident investigation reports by government agencies, the picture is always more complicated.") Our thinking about this prescription may also benefit from looking at other engineering disciplines to see how they investigate and report on failures and what criteria and categories (the jargon word is "methodology") they use, implicitly or explicitly, in assigning causes to failure.

"Over-simplified" might be the best adjective to describe some of the contributions to RISKS from newspapers; one doesn't know whether to believe them or not. A problem may arise when writers on safety start to quote SEN and the safety material collected there, most of which is previewed here on RISKS, as authoritative sources on computer and other types of failures. The question is whether SEN's credibility is being lessened or the newspapers' enhanced by the one being the source for the other.
Compare some of the newspaper stories reproduced on this list with the lucidity and thoroughness of Garman's report on "The 'Bug' Heard 'Round the World" (SEN, Oct. 1981). That seems a model for a software engineering analysis and report of a failure. We might compare it to other thorough engineering analyses of failures, say the various commissions' reports on Three Mile Island or the NBS (no chauvinism intended) report on the skywalk collapse at the Hyatt Regency in Kansas City. (The report of the Soviet government on Chernobyl will perhaps bear reading, too.) If we evolve some kind of standard for analyzing and reporting system failure, we'll be able to categorize the trustworthiness of newspaper reports and, for that matter, any other failure reports, so that their appearance on RISKS will not necessarily count as an endorsement, either in our own minds or in the mind of the public.

Ken Dymond, NBS
Nancy Leveson's... (ad infinitum?) [but not quite yet ad nauseam!]

From Alan Wexelblat's comment on my comment on ... (RISKS-3.44):
>... she denies that there are "human errors" but believes that
>there are "management errors." It seems that the latter is simply
>a subset of the former (at least until we get computer managers).

At some risk of belaboring a somewhat insignificant point, after reading [Alan's message] it is clear to me that I did not make myself very clear. So let me try again to make a more coherent statement.

I did not mean to deny that there are human errors; in fact, the problem is that all "errors" are human errors. I divide the world of things that can go wrong into human errors and random hardware failures (or "acts of God," in the words of the insurance companies). My real quibble is with the term "computer errors." Since I do not believe that computers can perform acts of volition (they tend to slavishly and often frustratingly follow directions, to my frequent chagrin), erroneous actions on the part of computers must stem either from errors made by programmers and/or software engineers (who, for the most part, are humans despite rumors to the contrary), from underlying hardware failures, or from a combination of both. I suppose we could also include operator errors such as "pushing the wrong button" or "following the wrong procedure," either as part of "computer errors" or as a separate category.

The point is that the term "computer error" includes everything (or nothing, depending on how you want to argue), and the term "human error" includes most everything and overlaps with most of the computer errors. The term "computer error" is also misleading since, to me (and apparently to others, since they tend to talk about human errors vs. computer errors and to imply that we will get rid of human errors by replacing humans with computers), it seems to imply some sort of volition on the part of the computer, as if it were acting on its own, without any human influence, to do these terrible things. That is why I do not find these terms particularly useful for diagnosing the causes of accidents or devising preventive measures.

I was just trying to suggest a breakdown of these terms into more useful subcategories, not to deny that there are "human errors" (in fact, just the opposite). And to be useful, we probably need to further understand and subdivide my four or five categories: design flaws, random hardware failures, operational errors, and management errors (along with the possibility of including production or manufacturing errors for hardware components). Note that three of the first four of these are definitely human errors, and manufacturing errors could be either human-caused (most likely) or random.

Actually, I thought the part of my original comment that followed the quibbling about terms was much more interesting...

Nancy Leveson, ICS Dept., University of California, Irvine
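Leveson's proposed breakdown is concrete enough to write down as a small taxonomy. The sketch below uses names invented for illustration, not her terminology:

    from enum import Enum

    class FailureCategory(Enum):
        # Category names are invented for illustration.
        DESIGN_FLAW = "design flaw"
        OPERATIONAL_ERROR = "operational error"
        MANAGEMENT_ERROR = "management error"
        MANUFACTURING_ERROR = "manufacturing error"
        RANDOM_HARDWARE_FAILURE = "random hardware failure"

    # Three of the first four categories are unambiguously human errors;
    # manufacturing errors may be human-caused or random; only random
    # hardware failures are "acts of God." Note that "computer error"
    # appears nowhere as a category of its own.
    HUMAN_CAUSED = {
        FailureCategory.DESIGN_FLAW,
        FailureCategory.OPERATIONAL_ERROR,
        FailureCategory.MANAGEMENT_ERROR,
    }

Once the categories are spelled out this way, the point of the quibble is visible: "computer error" dissolves into human errors plus random hardware failures.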