Around 6PM on Tuesday, May 20th the Galaxy IV satellite's "onboard control system and a backup switch failed." The satellite reportedly provided pager service to 80%+ of US customers, and also carried NPR, several television networks, and Reuters news feeds. A spokesman for PanAmSat (which owns the satellite) has said contact does exist with the satellite but has not said how soon service can be restored- apparently one option is to use a backup satellite already in orbit, but it could take several days to reposition it. This is a single-point-of-failure case with a twist. Although Galaxy IV reportedly had a backup system it failed, which might make one wonder how thoroughly backup systems are tested. If 80%+ of pagers in the US have been affected, this is quite an egregious case of SPOF. On the other hand, from the point of view of PanAmSat, adding points of failure for services in space is not the same as doing so for services on earth. As for pager companies, given that CBS quickly switched to Galaxy 7, perhaps their backup plans were not as robust as they could have been. http://www.cnn.com/TECH/space/9805/20/satellite.outage/ Roy Rodenstein, email@example.com Future Computing Lab, GVU Center, Georgia Tech [Also noted by many others. This case certainly brings up a lot of issues discussed here previously. The event occurred just as my plane was leaving Dulles after my Senate testimony on infrastructural risks and vulnerabilities <http://www.csl.sri.com/neumann/senate98.html>! PGN]
It will take some time to gauge the complete impact of the loss of a communications satellite Galaxy IV. Some technical details of the narrow locus of failure are available at PanAmSat (http://www.panamsat.com). The event will certainly be expensive if control of the satellite cannot be reestablished. But more difficult to calculate are the larger costs of the disruption of service. Particularly troubling is the loss of paging services used for hospitals including (according to news reports) Johns Hopkins and others. Radio news reports suggested that hospitals and other services were retreating to simple telephone communications. It is unlikely that such a retreat would be feasible. The widespread use of paging systems has produced a highly distributed system that is nearly entirely dependent on these methods of establishing communications. Work patterns themselves are now distributed in hospitals in ways that make operation without paging systems effectively impossible. Some hospitals in which I work no longer have any overhead voice paging capability at all and the huge volumes of paging for virtually every possible service and activity make even those archaic remnants of a previous time woefully inadequate. The outage points out the subtle dependencies that flow from the applications of larger scale, highly integrated technologies, especially the boundaries of failure. The benefits of these modern systems are largely in efficiencies of work that they permit. When they fail, the impact can extend in space and time far beyond the obvious, first order 'edges' of the system. Hence, the term coined above: the "narrow locus of failure". This is the technical system and its well defined and readily apparent functions. In this incident, the NLF is the satellite and the pagers and the gasoline pumps and the television nets, and so forth. But the larger locus of failure here would include all the subsequent and serial effects. These are especially difficult to describe completely because of they are so distributed and diverse. Predecessor systems were not robust, of course. Hard wired telephones failed and the local doctor-in-the-hospital [not for nothing were they called `residents'!] could be overwhelmed by the demands of a single ward or floor. What was distinct about these earlier systems was the narrowly restricted boundaries of their failure. The failure of a local telephone system or a local resident was limited in scope. Not so with the new systems. That this sort of failure occurs shoudl give us real pause as we consider the various proposals to use advanced technology for medical applications. There are a host of proposals to use advanced information technology for electronic medical records, for so called 'telemedicine', for remote robotic surgery, for drug dispensing and monitoring, for home infusion, etc. etc. These applications are always touted by proselytizers as advances in quality or capability to be employed for improved performance. But the history of technological applications shows that these advances are exploited mainly to achieve greater efficiencies in production, and that the gains are quickly eaten up by this use. That is to say, the advances are exploited not to achieve a more robust system but a more efficient one -- and one with the potential for large scale, catastrophic modes of failure. The example of Galaxy IV is a demonstration of the potential of such highly technical systems. Richard I. Cook, MD, Cognitive Technologies Lab., Dept of Anesthesia and Critical Care, University of Chicago, Chicago, IL 60637 1+773-702-5306
The U.S. Navy, facing pressure from Congress to cut spending, is maintaining its cutting edge by replacing expensive custom-built systems with off-the-shelf products. "If we insisted on military specs, we'd be a generation behind, and they'd cost twice as much," says the intelligence officer on the USS Coronado. The new strategy, called IT-21 or Information Technology for the 21st Century, is the brainchild of the Pacific Fleet commander-in-chief Adm. Archie Clemins. "If you use proprietary systems, you can never stay current with technology," says Clemins. Another advantage is a shortened learning curve: "Everybody knows how to use the technology so training costs are way down." In addition, using off-the-shelf systems makes it a lot easier to coordinate joint operations with U.S. allies. "Proprietary computers were too expensive for our coalition partners." The only downside is that the Navy may be losing some of its computer brain power to the private sector: "Our people are very valuable in the commercial world," says a spokesman. (*St. Petersburg Times*, 18 May 1998; Edupage, 19 May 1998) I am not quite sure what the phrase in the Educom headline "off the shelf PC", but I certainly wish that the Navy is not trusting weapon control or cruise control to Windows 95. Come to think of it at least the quote from the Naval personnel doesn't include "PC". But then, "Everybody knows how to use the technology..." suggests a well-known brand of software products. I shudder to think that Win95 is used to control real-time embedded systems and such... Chiaki Ishikawa firstname.lastname@example.org.NoSpam Personal Media Corp., Shinagawa, Tokyo, Japan 142
Well, Bob Frankston [19.73] is risking his life again, but not by flying.... Let's start out calmly. He's deluding himself if he thinks he should feel safer on a plane with a handheld GPS rather than a sextant; or, one takes it, a working compass. Four crucial points are: 1. A `magnetic direction indicator' is one of three required air data instruments on board every airplane, per FAR 91.205(a). The two others are altimeter and airspeed indicator. This requirement has been valid, and almost unquestioned, for a half century. These are three highly-reliable instruments with few failure modes, most (but not all) of which are visible or testable. 2. A compass has very few failure modes compared with that of a GPS, and multiple compasses have only one non-visible common failure mode, namely when flying in a heavy magnetic field or near either of the two Poles. GPS has failure modes that are not easily inspectable; e.g. 80-mile position shifts, and any signal disturbances, signals themselves having multiple failure modes. See recent Risks, and the comment by O'Connell, below. 3. A GPS and a compass have different functions. It's hard to keep the wings level on an aircraft with total power failure using a GPS (but it can be learned, according to Jim Wolper); you can easily with a compass (if it turns, your wings aren't level). Keeping wings level is essential for flying an airplane under instruments. Similarly, you fly a heading with a GPS by extrapolating; it's harder to tell from the GPS how well and accurately you are flying the heading, but this is essential information; radar vectors are also based on headings; ATC does not do it by GPS fixes. A GPS tells you where you are; a compass does not. 4. The DC-10 has two compasses, not three as Frankston suggests. If one is not functioning, then the aircraft is left with only one. That is, no backup; remember, this is one of the three essential air data instruments. Now upping the shrill level: concerning point 2 I'm flabbergasted that Frankston apparently thinks a handheld GPS is somehow equivalent in safety to a working compass or a sextant. You can't jam a sextant or a compass because there are no signals to jam, and you can tell by inspection if your sextant is working. Celestial navigation works (at night!) where other things don't (as noted by Dave Alexander). Points 1-4 may be well illustrated by example. Suppose the DC-10 suffered a total electrical failure (say, like the Martinair B767 a couple of years ago -- see my `incidents' compendium). Suppose also that the East Coast suffers a power failure sufficient to take out the radar (check out the 200+ hours outages suffered by En-Route ATCCs in the US from Sept94-Sept95, the NTSB report concerning which appears in my compendium). Suppose also that GPS signals were reported to be jammed (see Risks-19.71 and earlier. Afficionados will notice that they don't have to *be* jammed, they just have to be *reported* to be unreliable). Could Frankston now explain how the DC-10 is supposed to keep flying in a straight line, let alone to a particular identified area of the country, without a working compass (or sextant :-)? If his conversation with the crew didn't enlighten him, he might be advised to pick another airline. There's probably an equal chance they were pulling his leg. I would probably have wanted to pick another airline anyway - I'll admit to a superstition about any DC-10 with a `cargo door problem' (this particular story is a classic for engineering ethics courses, and there's a source book on it). Another superstition is flying in V-tail Bonanzas in light-to-moderate turbulence at over 125kts. I won't admit to any others. A British pilot told me that most of his colleagues assume that IRS or radio fixes will be available; therefore it is not standard company policy to check the standby compass; and that the only couple of pilots he knew who always did check were ex-727. As Charlie Brown would say: Good Grief! Peter Ladkin email@example.com http://www.rvs.uni-bielefeld.de
PGN graciously (or desperately) allowed me to compile the comments to Bob Frankston's article, which include Frankston's reply and (editor's perquisites) a pontifical postscript from me. Alexander McClellan <firstname.lastname@example.org> pointed to an article on Transport Canada's WWW site: (Aviation Safety Letters, Issue 3/97, "But I Could Hit a Hill..." ) http://www.tc.gc.ca/aviation/syssafe/asl-397/english/butico~1.htm which says: "First of all, GPS is not infallible. As we've said many times in the past, GPS satellites can transmit faulty signals and, unless you have an installation certified for instrument flight rules (IFR) flight, you won't be warned. Faulty satellites have caused 80-mi. position errors in the past. Even if you have an IFR box, there will be times when there just won't be enough satellites to navigate. What if this happens at a critical point in your flight when the visibility is too poor to map-read?" Transport Canada was talking about hand-held GPS boxes. Henry Spencer <email@example.com> commented on the reliability issue for commercial avionics: [Spencer] Consumer GPS receivers are not necessarily designed to work at airliner speeds. While there is quite extensive informal use of hand-held GPS receivers as auxiliary navigation aids, especially in smaller aircraft which lack heavy-duty navigation systems of their own, the FAA is quite rightly reluctant to officially sanction this practice without some assurance that the receivers are actually designed to work reliably in that environment. Having a supposedly-reliable navigation aid that is lying to you is much worse than having to get by without it. John T Faulks <firstname.lastname@example.org>, who works in Lockheed Martin FADEC Product Support, elaborated some of the questions an avionics professional would ask about the requirements behind Frankston's proposal: [Faulks] How should you add a GPS to the cockpit - do you want it to talk with the other navigation systems, do they have available input channels (ARINC or whatever)? Or do you want to sit it in the pilots lap? Should they turn them off during taxi and takeoff - if there are concerns about the passengers using electronics in the main cabin, the risk in the cockpit is much much greater. This concern was echoed by David Alexander <email@example.com>, an ex-military pilot working now on Systems Integration, who commented on resiliency and integration problems as well as a (literal) war story about sextant use: [begin Alexander] I for one am glad that the regulations are that tight. As someone who builds resilient systems (and I also design and build High Availability systems) [I am] aware of the need to retest the entire configuration if you make a single change. [..] The fact that the system is complex in itself adds a new dimension of risk [..] and you cannot be certain that there has been no unintentional impact on the other systems (you mean you believe the spec was right ?!!!) without testing. [..] don't forget that aircraft systems are normally specially designed to reduce and remove spurious EMP that could affect other systems and rely on external antennae. What with the potential RFI and shielding I'm not sure I'd have confidence in a handheld GPS in a cockpit of a big 'multi'. [Speaking of sextants,] During the (1982) Falklands War [one of my former military instructors] navigated a Vulcan bomber (early 1960's design) of the RAF [on the longest bombing mission in history] from Ascension Island to the Falklands Islands, successfully bombed the runway there, and got back to Ascension using mainly a sextant. The Vulcan only had a LORAN C navigation system that was not much use in the South Atlantic. [end Alexander] Another tale (pilots love tales!) was contributed by George Bleyle <firstname.lastname@example.org>, a United Airlines A320 captain and check pilot, to confirm the advantages of a sextant over even a compass: [Bleyle] True story.......30+ years ago, the brother of a Navy navigator friend (also a Navy nav) was assigned to VX-6, the Navy C-130 [otherwise known as the Hercules turboprop transport aircraft PBL] squadron that provided logistics support out of Christchurch, NZ, for the annual wintering-over expeditions to the Antarctic. On one trip, after departing the ice shelf in a near white-out for a return trip to Christchurch, and climbing out to VMC [Visual Meteorological Conditions, that is, you can see where you're going PBL] on top of an apparently endless cloud deck, the aircraft suffered a TOTAL and complete electrical failure. No AC no DC (after batteries depleted), no comm, no nav, no nothing; VMC on top with all directions NORTH. Well, using just his periscopic sextant, HO-214 (the Air Almanac), and a chart, he was able to continuously shoot the sun to get a True Bearing to the sun, work backwards, and compute headings to fly to Christchurch... The aircraft arrived successfully, and he was awarded a Navy Commendation, etc. All done with a sextant and chart. Nice work. [Spelling of Christchurch fixed in archive copy. PGN] Ryan O'Connell <email@example.com> addressed reliability and, indirectly, safety pithily: [begin O'Connell] [..] The compass is the most basic form of instrument on an aircraft - if you don't have a reliable one, you're stuffed. (Flying a standard instrument arrival (STAR) without one is difficult at best) [and, substantiating the point about GPS unreliability:] A quick look through a database of Notices to Airmen (NOTAMs) reveals: KZDC (Washington) A0123/97: GLOBAL POSITION SYSTEM PSEUDO RANDOM NOISE 1 U/S KZHU (Houston) A0405/98: GLOBAL POSITIONING SYSTEM UNRELIABLE WI 300 NMR TCS KZLA (Los Angeles) A0325/98: GLOBAL POSITIONING SYSTEM SIGNAL UNREL WI 257 NMR NID LSAS (Switzerland) A0236/97: GPS SIGNAL UNREL FOR NAV WI SWITZERLAND S OF 4605N 'S OF MONTE CENERI' Vincent Dovydaitis <firstname.lastname@example.org> noted the general covariance of reliability with complexity, as well as the importance of determining the boundaries of the engineering task, and the unsolved, often sparsely addressed, cognitive issues that arise with technology change: [begin Dovydaitis] 1. New technology is necessarily better than old technology -- a GPS (a highly complex instrument involving reception of satellite signals, calculation (triangulation) and data processing) is not more reliable than a compass (a magnetized needle floating in a fluid). 2. [..] A reliable GPS installation involves questions of structural integrity (drilling holes in the skin of the plane to mount the antenna), electrical integrity (connecting it to the plane's power supply), and electromagnetic integrity (interference from or to other devices), assuming you start with reliable GPS hardware and software. 3. System reliability -- the total navigation/communications system of the plane must have a probability of failure of less than 10^-9. Not easy to achieve, and requires elimination of single points of failure, etc. Which is why more than one compass is required in a commercial plane, and why complex technology (like GPSs) must be backed up by simpler technology (compasses). 4. The dominant display effect -- if a GPS driving a moving map and a compass disagree, which is the pilot looking at? Which does he believe? [end Dovydaitis] Jeremy Leader <email@example.com> considered safety, taking to task Frankston's comparison with WWW development: [Leader] Frankston ends by citing the Web as a "good example" of a resilient distributed system [..] I agree that the web is, over all, a robust distributed system. But to imply that aircraft should be developed by "hacking" rather than "design" suggests that the penalty for error in an aircraft is similar to the penalty for a web server going down! [..] the design techniques used should be chosen in part based on the cost of being wrong. Hacking is great if you don't mind occasional crashes. Rob Borsari <firstname.lastname@example.org> played the role of the offended pilot, giving double meaning to the phrase `enough is enough': [begin Borsari] [..] The technology of flight navigation is very reliable and accurate with out using any "modern electronics" at all. It is quite possible and safe to navigate a plane using a magnetic compass and a watch. In fact all of the computer systems on the plane rely on these two instruments for their baseline data. [...] "Strange systemic interactions [Frankston quote]" in a flight navigation system means that you are given a position with an unknown variance. This can be fatal. It is far better to know that you don't know where you are then to incorrectly assume that you are given a precise position when it is incorrect. And here is the insulting part: Lousy navigation. I flew a 1946 Cessna for many years using only a mag compass a watch and the most basic of radio navigation systems VORs. [...] any trained pilot can use those tools to locate his position and allow accurate navigation. Here is a list of electronic navigation technologies in order of age: Radio Range ADF ILS / GS VOR (Tacan for the military but that is a different story) DME Omega LORAN GPS The only one not currently in use is the Radio Range. Any one of them provides a cross check to the pilot's basic navigation. Without any of them an aircraft can still be navigated safely. In bad weather you need at least a VOR for navigation and ILS for precision navigation. [end Borsari] Many of these points were also addressed by Kyle Schmidt <Kyle.Schmidt@Toronto.Messier-Dowty.gmsmail.com> and David Eklund <DavidEklund@compuserve.com>. Bob Frankston <Bob_Frankston@frankston.com> replied: [begin Frankston] [...] avionics is just an example [...] to emphasize the flaws in the philosophy of "whole system testing". It just doesn't scale in a world where there are no systems that work in isolation. The Internet is a great counter-example. It shifts the responsibility for reliability (including defining what is meant by reliability) to the end points and doesn't do more than try to get packets through reasonably often and reasonably quickly. See http://www.reed.com/Papers/EndtoEnd.html [..] If we were better able to evolve systems and to add functionality without decreasing reliability we would be to take advantage of the rapid evolution in capability, performance, cost and reliability, we've seen in computer systems. Reliability?? Yep. I know that computers crash (or otherwise get befuddled) but I'm glad to have the choice of having a very powerful system even if I do have to press reset once in a while. It is wonderful that the Web allows bad HTML and stale URLs. I realize that these issues are as much about marketplace philosophy as about technology. But a point of Risks is that technology doesn't exist in isolation. While it is useful to be aware of risks, the issue is more one of tradeoffs. Which risks does one choose and when does the design paradigm have to change? [end Frankston] Now, one can observe that some discussants, including Frankston, focus on reliability but barely mention safety. For those unsure of the difference, I quote from Nancy Leveson's text Safeware: [Leveson, p163] *Reliability* is the characteristic of an item expressed by the probability that it will perform its required function in the specified manner over a given time period and under specified or assumed conditions. [The second definition on p172 is similar. PBL] [Leveson, p175] An *accident* is an undesired and unplanned (but not necessarily unexpected) event that results in (at least) a specified level of loss. [Leveson, p181] *Safety* is freedom from accidents or losses. What's the difference? A car whose brakes always work but which often won't start is safe but not reliable. A car whose brakes occasionally don't work but which starts every time is probably more reliable but certainly less safe. Suppose, as many of us and also the FAA believe, safety is a major issue in aviation data systems. Now, it's easy to see how braking systems involve safety whereas cabin reading lights may not, partly one may suspect because they play different roles. But since it has been argued that a compass and a GPS can be used to play similar roles, and these roles are informational rather than directly physically-causal, is there an example of an important safety consideration that would not also be a reliability issue? Indeed yes - the failure modes of the compass are more visible and testable; the failure modes of the GPS can be hidden. Knowing when you're broken is a safety issue. Suppose Frankston were to be right in his more general point that uncoordinated human market activity can ultimately ensure the emergent property of reliability - his argument that the WWW is an example is certainly worth debating. Those who think that such activity might also ensure safety may wish to ponder whether seated passengers on commercial jets tend to keep their safety belts fastened in cruise. The last word on safety goes to Peter Mellor: [Mellor] I do think it would be a good idea to point out to the average frequent flier that seeing the pilot getting on board with a toy GPS should inspire as much confidence as seeing her get on board with a guide-dog. edited by Peter Ladkin
Bob Frankston, in RISKS 19.73, writes about avionics, and raises issues about complex systems. I have changed the order of some of the quotes for clarity. I have omitted his arguments in favor of redundancy. > But the new problem is (was) a bad compass. The third compass on the plane > had to be replaced due to FAA rules [...] what is of concern is that they > couldn't just go out to the store, buy a GPS, and place it in the cockpit. > As a passenger, when I bring my GPS and PC, I've got technology far far > ahead to the technology on the plane. As Bob notes elsewhere, a transport aircraft is a complex system, but he seems to ignore the role of the crew in its operation. Here's an analogy: since I am a pilot, I probably know more about navigation than the average train crew, but my bringing a GPS and PC onto a train will not improve its safety or reliability unless I know something about trains, switches, signals, and the like. > The reason that the systems can't be upgraded is that the whole plane > would have to be recertified as a new aircraft [...] There is something > very wrong here. The engineering practices that are supposed to assure our > safety seem to work to assure our lack of safety. I would argue that GPS is no safer than other navigation systems; its advantage is its efficiency. It is certainly neither failure proof nor fail-safe. Nathaniel Bowditch (1773 - 1838), compiler of the first compendium on navigation, wrote "A prudent navigator will never limit himself to a single method, particularly one requiring ... a device that is subject to mechanical damage or loss." The most robust navigational device is the compass. Speaking as a pilot, if the chips (no pun intended) were down and I could only have one navigational device I would choose the compass over the GPS every time. > I presume, though, that the mechanical systems try to be > independent-enough to reduce the propagation of failures. Actually, American 191, the DC-10 lost at O'Hare in May 25, 1979, uncovered many interactions in the mechanical systems. [Corrected in ARCHIVE copy. PGN] > But, if we think about the simple example of just placing a GPS in the > cockpit and allowing the airplanes computer to use the data we have a very > different model. I would argue that this is not a simple example. A GPS would need to be thoroughly tested for system interactions, wiring would be needed, software would be needed to interface with the existing navigational systems, company training manuals would need to be rewritten (I work part-time for a charter operator, and I am currently rewriting the training manual for one of the aircraft types we operate; it is a rather large task), the Minimum Equipment List (which must be consulted when there is inoperative equipment; it's why the original flight couldn't depart without the spare compass) would need to be revised, and maintenance and inspection procedures would need to be revised. I would be uncomfortable about omitting any one of these steps. > Yes, there can be strange systemic interactions. But, instead, we have a > situation that assures lousy navigation rather than permitting > improvements when available. The situation does not assure lousy navigation. It assures good navigation and avoids strange interactions. > Understanding how to build such resilient distributed systems is still in > the challenge category. Exactly. Jim Wolper ATP/PhD/CFI, Associate Professor of Mathematics, Idaho State University; Pilot/Instructor, Avcenter, Inc.
Please report problems with the web pages to the maintainer