New York Times Reports Shuttle Software Patch

Excerpted from the New York Times 5-5-89 edition by Henry E. Hardy

[...] In checking the Magellan's control systems two weeks ago, engineers detected and corrected one potentially catastrophic problem. A design flaw was found in the software for the spacecraft's computer. If the craft were to lose its proper orientation to the Sun and the Earth, the flaw could have prevented the spacecraft from regaining its bearings. The result could have been the loss of the spacecraft, as it failed to get enough solar energy to run its electronics and thus could no longer keep its antenna pointed at Earth.

Project officials said engineers devised a "patch," a substitute set of instructions, to override the design flaw. John H. Gerpheide, the Magellan project manager at the Jet Propulsion Laboratory in Pasadena, Calif., said: "We're convinced that we've got a good fix on the problem. The fix has been tested and thoroughly reviewed. We don't have any concern at all."

Errors in computer instructions were said to have been the cause of the failure of Phobos 1, an unmanned Soviet spacecraft, as it was headed to Mars last September. The spacecraft tumbled out of control. And the craft, unable to keep its solar panels pointed to the Sun, ran out of electricity.

The companion craft, Phobos 2, made it to an orbit of Mars and then failed as it was maneuvering to drop scientific instruments on the tiny Martian moon Phobos. Soviet scientists who were here to view the launching of the Atlantis said the cause of the Phobos 2 loss was still unclear.
Here is a NASA press release on a self-diagnostic *IN FLIGHT* maintenance system. I am sure this would be useful in tracking down problems, but I can only imagine the problems that could arise from trusting such a system. "But the computer said it was fixed"

-David Robinson

NASA FLIES FIRST AIRCRAFT SELF-DIAGNOSTICS SYSTEM (RELEASE: 89-69)

In the first flight in a joint NASA/USAF program that promises self-repairing flight controls and lower maintenance costs in future aircraft, computers aboard the NASA Ames-Dryden F-15 Flight Research Aircraft were able to correctly identify and isolate in flight a simulated failure in the flight control system.

Flight control system failures can and do occur during flight. When this happens, costly ground maintenance diagnostic tests are conducted to try to identify the failure so that appropriate corrective actions may be taken. In many cases, the failure cannot be identified during ground tests because the actual flight conditions are not duplicated. With the new expert system technology, failures can be identified and isolated before landing and be fixed immediately.

The first simulated failure was an angle-of-attack sensor. The maintenance diagnostic system correctly identified the failure and isolated the problem. Future tests will incorporate other failures.

"This is a real breakthrough in flight control system maintenance diagnosis for future aircraft," says F-15 Flight Research Aircraft Project Manager Dr. James Stewart. "Newer digitally-controlled aircraft are more complex. However, digital controls allow this type of computer programming which will reduce the maintenance cost of future digitally-controlled aircraft."

The maintenance diagnostic system is the first technology to be tested in the Self-Repairing Flight Control Program. The other technologies, scheduled to begin flight tests this fall, include failure detection, identification and reconfiguration of the flight control system.
An example of the need for reconfiguration is when a tail surface fails in flight. The flight control system will reconfigure (repair) itself so that other surfaces take over the function of the failed tail surface. Also, a pilot alert system will tell the pilot what the problem is and what the new configuration and flight envelope are after the system has self-repaired.

This program is being conducted by NASA's Ames-Dryden Flight Research Facility, Edwards, Calif., and is sponsored by the Air Force Wright Research and Development Center, Wright-Patterson Air Force Base, Ohio. The prime contractor, McDonnell Aircraft Company, St. Louis, Mo., with the General Electric Aircraft Control System Division, Binghamton, N.Y., designed and developed the maintenance diagnostic system for use in the NASA program.
I found the fuss about Northrop's statement that the first B-2 will be a production aircraft to be unjustified. The mere fact that they say it is not a prototype does not mean that it is not a prototype. My dictionary defines "prototype" as a first or early example of something. Nobody is claiming that the first will be the second or a late example of the B-2. Whatever they call it, the risks will be the same. Some organisations and spokesmen will use any excuse to stir up a controversy.

David L. Parnas
Software engineers, please do not be misled. It is not the generation nor even the design that requires a prototype. The prototype is part of the design. The design is not complete without it. All assertions to the contrary are fallacious and dangerous. Engineering of any kind is risky enough without this kind of foolishness.

William Hugh Murray, Fellow, Information System Security, Ernst & Whinney
There is a subtlety here that people not intimate with military aviation may not appreciate. This is not as big an innovation as Northrop is claiming. It has been common for quite some time to build even the first one of a new aircraft in "production" tooling. Although it does save a lot of time, it also contributes greatly to realistic testing. Things like production processes affect the result; it simply is not possible, in practice, for a hand-built prototype to accurately represent production hardware. Proper testing requires hardware built with production tooling. This practice started with the USAF's "Cook-Craigie plan" techniques in the 1950s. (Many of the critics quoted in the Philadelphia Inquirer article are making fools of themselves because they don't understand this.)

Doing this also helps a lot when one wants to get production moving rapidly after testing; part of Cook-Craigie is a scheme whereby early stages of production ramp up fully while later stages concentrate on getting the first few aircraft out the door, the hope being that any modifications that are needed will not affect the early stages badly.

Inevitably, this sort of thing involves risk that production tooling will need to be torn up and revised because testing finds problems, and that half-built aircraft may need expensive revisions or even scrapping. Efforts are made to get the thing right the first time, and to get good test results as quickly as possible. Sometimes it works well; sometimes not.

As in other such production innovations, after early successes there was a tendency for later projects to get the outline right -- first aircraft built in production tooling, first stages of production rolling early -- while forgetting important unorthodox details like the emphasis on intensive early testing. The result is failures, which tend to be blamed on bad luck or the inherent difficulty of the problem rather than on bad management.
(Another production innovation which suffered the same fate was concurrency: designing all the pieces of the hardware simultaneously, relying on good interface documentation to make sure they all work together. When it works, it gets hardware out the door much sooner than step-by-step methods. It worked well for the early ICBM programs because (a) they consistently funded multiple parallel development efforts for anything deemed risky, and (b) they didn't choose between them until hardware was available to be tested. Many later programs adopted concurrent development without these important (and expensive) details, the result being a lot of failures.)

The ultimate end product of remembering the successes but forgetting the details is what's going on at Northrop: the conviction that it's possible to get everything right the first time, so no modifications will be needed and full-scale production can start immediately. That *is* folly, but not because there aren't any prototypes.

Henry Spencer at U of Toronto Zoology
Rich Neitzel's anecdotes do not justify his conclusion. In every case that he mentioned there were standards that were clear enough and substantive enough that, in his opinion, they were not met by the products in question. We have no such standards in software. He points out that we will never eliminate fraud and incompetence. The fact that there will always be people who cheat does not mean that we should not have standards. Au contraire!

Dave Parnas
"A bad standard is better than nothing. It gives you something to violate." [A quote from the Hammer Forum, 1986]

"Standards are like motherhood: They should not come too soon, and there should be an identifiable father." [CAPT John Nichols, USN, 1968]

Gordon Bell said it best in his seminal essay, "Standards Can Help Us," IEEE COMPUTER, June 1984, pp. 71-78. Serious readers should review that article.

The practical application of the tongue-in-cheek advice from Hammer '86, and from Nichols and Bell, is that at the least, standards, even flawed ones, give us the basis for further discussion, based on something more than just personal taste.

I remember discussing computer system reliability with representatives of UNIVAC, early in this decade. These capable folks had arguably one of the best systems of that era - the UNIVAC 1100 series. Its ancestors had succeeded despite a history of frequent crashes, which were then common in the industry. Newer systems, with designs unhampered by demands for "backwards compatibility" with older models, and free to use newer technology, were beginning to demonstrate an order of magnitude more reliable performance; e.g., one crash a quarter instead of one per week. (Actually, the old systems were worse than that, and the new ones better.)

I lost that argument, because the others were convinced that "bigger means less reliable." An analogy about jumbo jets vs. smaller planes did not help. It was only when the competition [viz., IBM] introduced the 308x series, with its multi-megabyte diagnostic code, and MTBF in months vice hours, that the argument was over.

I'm not suggesting a "standard for reliability." I am saying that the effort to make standards can help us appreciate the diversity of our needs, and the adoption of standards can raise our level of expectations.
Generally rather similar to the risks involved in using other "standard" tools like compilers, assemblers, text editors, front-panel switches :-), operating systems, etc.: there is always a chance that the tool has not been fully tested and will do the wrong thing silently, or that it will not catch user errors that it is supposed to catch.

>Is it reasonable to set some criteria ...

Lengthy use tells you something about the average density of bugs in the code, but won't necessarily tell you about the one bug that's in precisely the wrong place. Thorough validation suites are better, although rarer. Better yet are validation suites for the *application*, ones which do their best to stress its components. (Note, this is not the same as "black box" validation suites written with no knowledge of said components.)

The fact is, even well-proven tools can have obscure bugs lurking in them. Case in point: the C compiler in V7 Unix, an unambitious compiler written by a very good programmer and exhaustively shaken down by widespread use, had a bug in its 32-bit-divide routine that was not found until people -- specifically, some of my users -- stumbled over it. The code made some assumptions about the hardware that were true, at least most of the time, of older pdp11 processors but not of the new 11/44 we had. The most interesting part was that my fix for the problem appears to have also cured some much rarer misbehavior found even on older processors. The values returned by that routine may have been wrong, occasionally, all along.

One simply cannot afford to place implicit trust in *any* of the tools used to build a sensitive application. As with "end to end" arguments in networking, to be sure that the final product is right, one must test it directly and not rely on trusted tools.

Henry Spencer at U of Toronto Zoology
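To make the point about component-level validation concrete, here is a minimal sketch in Python. It is not the V7 routine or the actual fix; `udiv32`, its `bits` parameter, and `validate` are hypothetical names invented for illustration. The idea is only to check a divide routine directly against an independent reference (here Python's own integer arithmetic) on boundary values and random inputs, rather than trusting it because it has been in use for a long time.

```python
import random

MASK32 = 0xFFFFFFFF


def udiv32(dividend: int, divisor: int, bits: int = 32) -> tuple[int, int]:
    """Hypothetical routine under test: shift-and-subtract unsigned division.

    `bits` exists only so the demonstration can plant a bug: bits=31
    ignores the top bit of the dividend.
    """
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, then subtract the divisor if it fits.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


def validate(divide, trials: int = 10_000, seed: int = 0) -> list[tuple[int, int]]:
    """Return every (dividend, divisor) pair on which `divide` disagrees
    with the independent reference (Python's // and %)."""
    rng = random.Random(seed)
    # Boundary values chosen with knowledge of the component: word and
    # half-word limits, and the sign bit of a 32-bit word.
    edges = [0, 1, 2, 0x7FFF, 0x8000, 0xFFFF, 0x10000,
             0x7FFFFFFF, 0x80000000, MASK32]
    cases = [(a, b) for a in edges for b in edges if b != 0]
    cases += [(rng.getrandbits(32), rng.getrandbits(32) or 1)
              for _ in range(trials)]
    return [(a, b) for a, b in cases if divide(a, b) != (a // b, a % b)]


if __name__ == "__main__":
    print("correct routine, failures:", len(validate(udiv32)))
    buggy = lambda a, b: udiv32(a, b, bits=31)  # planted bug
    print("buggy routine, failures:  ", len(validate(buggy)))
```

The planted bug only shows up for dividends with the top bit set, which is the kind of "one bug in precisely the wrong place" that lengthy ordinary use can miss and a suite aimed at the component's boundaries is more likely to hit.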
About a month ago, I moved to my office, in my new job. A Sun 3/140 workstation was on my desk. Being more of a theoretical person, I barely touched it. The first day at work hadn't finished and I had a strong discomfort in my left eye. When I went home, I had to take my contact lenses out immediately. Next day at work, the discomfort became even stronger. I have been wearing contact lenses for seven years now but had never felt something like that before. This continued during the whole week. On Saturday, I was surprised to discover that the discomfort had almost gone. On Sunday, I was feeling perfect. This prompted me to think that it had something to do with my work. The arrangement on my desk is as follows (view from the top):

    |----------------||----------------------------------|
    |   ----------   ||                                  |
    |   :        :   ||                                  |
    |   :  SUN   :   ||             My desk              |
    |   :        :   ||                                  |
    |   ----------   ||     o o                          |
    |   |keyboard|   ||                                  |
    |                                                    |
    |----------------||-----------|----------|-----------|
                                   \===<>===/   <-- Me !
                                   ----------

As you can see (you should be able to :-)), my head is in the path of the stream of air coming from the ventilation holes (:) of the Sun. This air not only has a large enough speed (if you put your hand in front of my face, you can sense the draft) but also is dryer, hence has lower relative humidity, than the rest of the air in the room. This means that it dries my contact lenses a lot faster than usual. I think this was the cause of the discomfort.

In order to try my hypothesis, on Monday morning, I blocked the ventilation holes, using a triangular calendar that was on my desk, as follows (front view, as if you are sitting in front of the computer):

      |------------|
      |            |
      |   Screen   |
      |            |
      |------------|
     --------------          ____   <--- One sheet of the calendar
    |----------------|        ^
    :    SUN CPU     :       / \    <-- Calendar
    |----------------|      /   \

As you can see, the draft is now redirected sideways, away from me, and I don't have any problems anymore. The Sun should be O.K. since the one inch minimum ventilation clearance, required by the manufacturers, is satisfied.

Of course, you could argue that if I was hacking away on my Sun, the draft would not fall on me, which makes this a RISK of NOT using a computer :-).

Periklis Andreas Tsahageas
European Computer-Industry Research Centre
Arabellastrasse 17, D-8000 Muenchen 81, West Germany
+49 (89) 92 69 91 09
Europe: email@example.com ...!unido!ecrcvax!periklis
Not only is it possible to pop off the grey cover and use the diagnostic modulars that are commonly found on the sides of houses (or the nearest telephone pole), but it is also possible to access nearly every line in an area of a city fairly easily. Simply find one of the grey boxes usually at the base of a phone line pole where the cable switches from above ground to below ground. These junction boxes will usually have x0,000 twisted pair lines connected together and hanging over a metal bar. Simply pick one, and patch in w/ a pair of alligator clips. This can also be done inside of the black covers hanging on telephone poles. If you really want to create some havoc, cut a couple of twisted pairs in really hard to reach places...

On a related note, a story: Our neighbors hired some people to come in and clean their house. It was done in a rather odd fashion; you would call a friend of the cleaners, who would then tell the cleaner that someone had called for them. The cleaner would then return the call -- but would never have a number that they could be reached at. No big deal, they were using a pay phone.... wrong. The person had a phone w/ alligator clips instead of an RJ-11 male connector in a bucket. Acting like they were pruning the bushes, they would tap a neighbor's line and call whenever they needed.... It really does happen!
Whenever my local telephone company (Illinois Bell) installs new service or alters existing service, they move the telephone network interface outside. They do this to simplify their access for testing. When I had a second telephone line installed at my home, the installer was about to replace the existing network interface in the basement with a gray box outside. It took me some time to convince the installer to put the network interface inside.

Mike Akre, AT&T, Lisle, IL
I have just discovered that our local Electricity Supply Company is using PCs, and now even X-windowed VMS & Unix systems, to bring circuits up and down: an iconic display allows a mouse-click and a keyboard confirm to activate a circuit breaker, through comms links to the grid.

Apparently NO physical token exchange is used between linespeople and controllers: a verbal confirmation, coupled with somebody watching the breaker come in or out, is all that is used.

Perhaps I'm being too paranoid, but if I were a linesman I'd want to see the key for that segment in my hands before I climbed the tower, much as permanent way crews do (used to?) for track repairs, or train drivers for bi-directional single-line working.

Should automated systems maintain physical key/token exchanges from the past? Is there an electronic "equivalent" that could be used instead?

On the plus side, they're using radar scans of lightning strikes and the PC network to offer some predictive services: they try to direct line-crews to be on the alert *before* a storm reaches their section.

On the whole I think the use of computers, especially bitmapped displays, is beneficial in this area: they can condense a lot of information into one screen, in a simple and intuitive form. Of course, providing some active control has an "inverted" effect: simple mistakes can propagate out into catastrophes.

George Michaelson, Prentice Computer Centre, Queensland University, St Lucia, QLD 4067
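One possible shape for an electronic "equivalent" of the physical key is sketched below in Python. This is purely illustrative and makes no claim about how any real grid control system works; `SectionInterlock`, `InterlockError` and the crew and section names are all hypothetical. The sketch keeps the two properties that make a physical token useful: only one party can hold it at a time, and the controller's software refuses to close the breaker while a crew holds it.

```python
class InterlockError(Exception):
    """Raised when an operation would violate the token rule."""


class SectionInterlock:
    """Hypothetical software stand-in for the key to one line section."""

    def __init__(self, section: str) -> None:
        self.section = section
        self.breaker_closed = True   # True means the section is energized
        self.token_holder = None     # crew currently holding the token, if any

    def open_breaker(self) -> None:
        """Controller de-energizes the section."""
        self.breaker_closed = False

    def take_token(self, crew: str) -> None:
        """Crew claims the token; only allowed once the breaker is open."""
        if self.breaker_closed:
            raise InterlockError(f"{self.section}: breaker still closed")
        if self.token_holder is not None:
            raise InterlockError(
                f"{self.section}: token already held by {self.token_holder}")
        self.token_holder = crew

    def return_token(self, crew: str) -> None:
        """Only the crew that took the token may hand it back."""
        if self.token_holder != crew:
            raise InterlockError(f"{self.section}: {crew} does not hold the token")
        self.token_holder = None

    def close_breaker(self) -> None:
        """Controller re-energizes; refused while any crew holds the token."""
        if self.token_holder is not None:
            raise InterlockError(
                f"{self.section}: token held by {self.token_holder}")
        self.breaker_closed = True


if __name__ == "__main__":
    section = SectionInterlock("section-12")
    section.open_breaker()
    section.take_token("crew-A")
    try:
        section.close_breaker()      # a stray mouse-click at the console
    except InterlockError as err:
        print("refused:", err)
    section.return_token("crew-A")
    section.close_breaker()
    print("energized again:", section.breaker_closed)
```

A software record like this is still weaker than a key in the linesman's hand: it depends on the comms link, on the crew being correctly identified, and on the interlock code itself being right, which is exactly the kind of trust the question above is asking about.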
Last week, I went to our local bank to do a money transfer to an American bank. Though the clerk had a terminal on her desk, she fetched a typewriter and had to type everything on a paper form. This surprised me a bit as the bank says they are very up to date with technology, but that's something else. I gave her a piece of paper with all the addresses on the one side (recipient and bank), and on the other side, there was the bank routing number and the account number. I thought she had done everything all right, took my copy of the form, paid the thing and went home.

When I came home, the bank had already closed as it was 18.25. I took the form out and discovered that she had not filled in the bank routing number. This made me curious and I took out my little paper and checked the form... BINGO! I found out that she had written the address of the recipient correctly, but had written the zip code of the bank as the account number and no bank routing number, except: Household Bank, Columbus, Ohio.

Next morning I ran to the bank and told them about this. They immediately called their headquarters, as all those forms are sent off after the bank closes. Luckily, they could take the form out of the stack and everything was all right. The clerk said: "Oh, there was something on the back side, too? I only read the front side." She didn't even notice!

But imagine if the computer had done this immediately... either the computer would have rejected the input or I could have paid the whole thing twice..

Konrad Neuwirth, Fernkorngasse 44/2/4, A-1100 Wien, Austria
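As a rough illustration of the "computer would have rejected the input" branch, here is a small Python sketch of the checks an entry screen could make. The form layout, field names and sample values are all invented for this example and do not describe the bank's system. The one non-hypothetical piece is the check digit on nine-digit US bank routing numbers, which uses the weights 3, 7, 1 repeated and requires the weighted sum to be a multiple of 10.

```python
def routing_number_ok(value: str) -> bool:
    """True if `value` is nine ASCII digits whose 3-7-1 weighted sum is a
    multiple of 10 (the US routing number check-digit rule)."""
    digits = value.strip()
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    weights = (3, 7, 1) * 3
    return sum(w * int(d) for w, d in zip(weights, digits)) % 10 == 0


def check_transfer_form(form: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the form may be sent.
    The field names are hypothetical."""
    problems = []
    account = form.get("account_number", "").strip()
    if not routing_number_ok(form.get("routing_number", "")):
        problems.append("routing number missing or fails its check digit")
    if not account:
        problems.append("account number missing")
    elif account == form.get("bank_zip", "").strip():
        problems.append("account number is identical to the bank's zip code")
    return problems


if __name__ == "__main__":
    # Made-up values reproducing the clerk's mistake: zip code typed into
    # the account field, routing number left blank.
    as_typed = {
        "bank_name": "Household Bank, Columbus, Ohio",
        "bank_zip": "12345",
        "routing_number": "",
        "account_number": "12345",
    }
    for problem in check_transfer_form(as_typed):
        print("rejected:", problem)
```

Checks like these catch a blank or mistyped routing number, but a wrong account number that happens to look plausible would still go through, so they complement a careful human reading of the form instead of replacing it.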