Brian Randell forwarded me Jon Jacky's report from COMPASS '88 and asked me to comment on it. My comments can be summarized in one word: FRIGHTENED! I am frightened to see attendees at a conference devoted to safety issues debating topics (the 10**-9 figure and the like) while ignoring, or wanting to ignore, their real origin and their meaning: no significant contribution to the mortality rate in industrialized countries (see the statistics from the WHO, the World Health Organization). I am frightened to see them confusing evaluation of a product with evaluation of a process: as a direct consequence, such reliability goals cannot, by their very nature, be statistically assessed for a given product. I am frightened to see specialists in safety who do not seem aware of one of the major criticisms of the Inquiry Commission after the Challenger accident, namely that NASA had neglected quantitative evaluations. Jean-Claude Laprie, LAAS-CNRS, Toulouse, France.
I shall let Bev have the last word in this argument, especially since I cannot figure out what we are arguing about. As far as I can tell, we are in violent agreement. But he does issue a challenge to which I feel I should respond, i.e., >Maybe Nancy could tell us how she would reduce catastrophic failure rates to >acceptable levels and demonstrate the achievement of such levels? I do not believe that we should be talking about catastrophic failures at all. Catastrophic failures or accidents usually involve aspects of the environment that are not under the control of the designer of the system. Computers do not usually have catastrophic failures (unless one counts the problems of electrical shock or fire, which are usually minimal). I prefer the word used by safety engineers, i.e., hazards: states of the system being designed that could lead to accidents or catastrophic failures given certain environmental conditions. As a component of a larger, potentially hazardous system, computer software can certainly contribute to the system hazards and thus has some hazardous (but not necessarily catastrophic) failure modes. Considering only catastrophic failures limits the problem too much. So how can software-related hazards be eliminated or reduced? The practice of System Safety Engineering provides some direction. Details will be in my book (coming out next year from Addison-Wesley), and my survey article also describes this in less detail, but briefly: the first step is to identify the hazards of the system being designed. System safety engineers call this a Preliminary Hazard Analysis. This part of the process often involves knowledge of previous systems and accidents, and requires more creativity for one-of-a-kind or first-time systems. However, even these can be analyzed using information about general types of hazards such as electrical shock, chemical effects, or radiation exposure.
Analysis techniques, e.g., Failure Modes and Effects Analysis (although this is used more often for reliability analysis than for safety analysis) and Fault Tree Analysis, are used to determine plausible events that could create hazards. Design procedures are then used to try to eliminate or minimize these hazards by preventing the precipitating events from occurring (i.e., designing them out), minimizing the probability of their occurrence, or minimizing their chance of leading to the hazard. Bev should note, here, that probability is involved and thus measurement or estimation. But the key is that measurement alone leads us to a fatalistic yes/no choice, whereas measurement combined with design allows us further options in attempting to find designs that have acceptable risk. My books and reference materials are currently somewhere over the state of Kansas, so system safety engineers may need to correct some of the following. But the general goal of system safety design is to eliminate single events that can lead to hazards and to minimize the probability of multiple events or sequences of events leading to a hazard. Hence the desire of the engineer with whom I spoke to get a number for the probability of a particular type of software behavior. Since I did not know of any way that he could obtain this number with the amount of certainty he and I felt was necessary when a nuclear plant meltdown was involved, I thought it best for him to use 1.0 for this probability and design around this event (i.e., design the system so that the event would not cause a hazard). Engineers have various ways of eliminating hazardous events (what I sloppily referred to in my previous message as "catastrophic failure modes") from systems, or of minimizing their probability of occurrence. For example, the hazard of electrical shock can be eliminated by designing a purely mechanical system. This is a rather extreme solution, however, and often unacceptable.
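[The gate arithmetic behind this kind of fault-tree analysis can be sketched as follows. This is only a minimal illustration: the event names and probabilities are invented, and it assumes the basic events are statistically independent, which is exactly the assumption that common-mode failures violate.]

```python
# Minimal fault-tree gate arithmetic, assuming independent basic events.
# Event names and probabilities below are invented for illustration.

def and_gate(*probs):
    """All input events must occur: multiply the probabilities."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(*probs):
    """At least one input event occurs: 1 minus product of complements."""
    p = 1.0
    for q in probs:
        p *= 1.0 - q
    return 1.0 - p

# Top event (the hazard) requires BOTH a precipitating event AND a
# failed guard, so requiring multiple independent failures drives the
# hazard probability well below either input probability:
p_precipitating_event = 1e-2   # assumed per-demand probability
p_guard_fails = 1e-3           # assumed per-demand probability
p_hazard = and_gate(p_precipitating_event, p_guard_fails)

# An OR gate models alternative routes to the same hazard:
p_either_route = or_gate(p_hazard, 5e-6)
```

[This also shows why designing with 1.0 for an unmeasurable software event is conservative: an AND gate with a 1.0 input reduces the hazard probability to that of the remaining, measurable guards.]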
Another, less extreme solution is to use interlocks to prevent certain events or to ensure proper sequencing of events. For example, a door over a high-voltage area is often used to protect against accidents. The door can be designed so that when it is open (and the high-voltage equipment exposed), the circuit is broken. Note that the use of interlocks is often simply a matter of designing the system so that multiple independent failures are required for hazards to arise: the interlocks themselves may fail. Assuming that common-point failure modes (which engineers have techniques for identifying) are eliminated or minimized, risk will be reduced. Probabilistic Risk Assessment (PRA) can be used to determine whether the risk is acceptable, but even for non-computerized systems PRA is controversial and criticized as inexact. The most accepted use of PRA is for comparison of alternative designs. PRA has been used primarily in the assessment of the safety of nuclear power plants, and there is at least one interesting critique of this use published by the Union of Concerned Scientists. Several accidents I have heard about occurred when computer subsystems replaced electromechanical subsystems without replacing the interlocks that had maintained adequate levels of risk. The same mechanical interlocks or back-up systems can be included to protect against computer errors (probably the safest approach, since we can assess the risk involved fairly accurately using historical information), and/or various kinds of interlocks can be included in the software. In the latter case, software reliability assessment is involved, since the reliability of the software interlocks will be crucial. Since these software interlocks are often quite simple and limited in required functionality, I believe (and I realize that others might not agree with me) that high reliability MAY be achievable by using sophisticated software engineering techniques including, perhaps, formal methods.
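[A software interlock of the sort described is often small enough to state in a few lines. The sketch below is entirely hypothetical: the permissive names are invented, not taken from any real system, and the point is only the fail-safe structure, in which the hazardous command goes through only when every independent permissive holds.]

```python
# Hypothetical software interlock sketch. The permissives below are
# invented examples; the fail-safe rule is that ANY missing permissive
# blocks the hazardous action.

def interlock_permits(door_closed, pressure_ok, operator_armed):
    # Every independent permissive condition must hold.
    return door_closed and pressure_ok and operator_armed

def command_high_voltage(requested_on, door_closed, pressure_ok,
                         operator_armed):
    """Energize only when requested AND the interlock permits;
    otherwise remain in the de-energized safe state."""
    return requested_on and interlock_permits(
        door_closed, pressure_ok, operator_armed)
```

[The small, self-contained shape of such a function is what makes the claim of achievable high reliability for the safety-critical kernel plausible, in contrast to the whole control program.]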
I might be willing to accept subjective assessments of risk here if the software is simple enough, whereas Bev might not. If subjective types of assessment (including the use of formal methods) are not acceptable, then we may be forced to rely on mechanical back-ups, probably in addition to the software interlocks (again, the goal is to require as many independent failures as possible for a hazard to occur). [Note that some hazards are unavoidable, and the design goal then becomes minimizing the amount of time spent in the hazardous state.] Probably neither Bev nor I would agree to the use of another computer as the backup, even if the code is written by a separate group of people, because of the problem of eliminating common failure modes, i.e., non-independent failures (e.g., the same software requirements specification). The point is that the reliability of the entire software system (which may be quite large and complex) need not be ultra-high; only the smaller safety-critical parts need it. If the software is designed, as is common with software engineers who do not know enough about safety, so that it is ALL potentially safety-critical, then the problem becomes quite overwhelming, and Bev and I (and others) find ourselves with planes or other devices that we do not feel safe using. I hope I have not made too many mistakes in this brief summary of System Safety Engineering. If I have, I am sure that the other Risks readers will correct me :-). Nancy
The telephone feature of delivering the calling number to the terminating line is part of a group of features called 'CLASS', although there are other ways it could be done in certain special cases. There are a number of Bellcore publications that describe it in some detail. Among these are TR-TSY-000031 on the basic feature, (TA) 000030 on the signalling between office and customer terminal, 000391 on the feature to block delivery of the calling number, 000218 on selective call reject, and (TA) 000220, also related to selective call reject. TAs are an early version of TRs. If you don't find one in a reference, look for the other. There are several other TRs that relate to these features, but this list should sate most of us. Calling number delivery, selective call reject, and calling number delivery blocking all depend on 'Signalling System 7' (SS7), which is just beginning to be deployed among local exchanges, although some of the long-distance carriers are much farther along. Among other advantages, SS7 enables the transfer of much more information between network nodes than was previously generally available. This should allow the introduction of many new network services in the near future. On the other hand, CLASS, and calling number delivery in particular, will not likely become common until large areas are cut over to SS7, since otherwise they would not work much of the time (only within the local switching office, or among offices that had already implemented SS7). It looks to me as if a subscriber to calling number delivery gets telemetry intended to allow display of the calling number concurrently with ringing. I suppose proper customer premises equipment could pick this off and feed it into a computer, or use it to determine what to do with the call, e.g., route to an answering machine only if the call is not long distance. If the number isn't available, as would be the case if the originating and terminating offices were not linked by SS7, the telemetry sends ten 0s.
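[Customer premises equipment might act on this telemetry roughly as follows. This is a sketch based only on my reading above: the ten-digit field and the ten-0s "unavailable" code come from the post, while the area-code test for "long distance" and the routing policy are invented assumptions, not anything from the TRs.]

```python
# Sketch of customer-premises handling of calling-number telemetry.
# Ten 0s = number unavailable (offices not linked by SS7), per the
# reading above; field width and routing policy are assumptions.

def classify_calling_number(digits):
    if digits == "0" * 10:
        return ("unavailable", None)
    return ("delivered", digits)

def route_call(digits, home_area_code="301"):
    """Ring the phone only for delivered numbers in the (hypothetical)
    home area code; everything else goes to the answering machine."""
    status, number = classify_calling_number(digits)
    if status == "delivered" and number.startswith(home_area_code):
        return "ring"
    return "answering_machine"
```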
If the number is available but the originator is blocking delivery, it sends ten 1s. Calling number delivery blocking is itself a CLASS feature that can be set by a service order or, depending upon the tariffed offering, turned on or off on a per-call basis. How it is offered, if at all, is up to the local telco and PUC. The TR makes it look to me as if it is not available to party-line subscribers; I think there is a technical reason for this. Selective call reject allows the subscriber to set up a list of up to N directory numbers (N might be on the order of 6 to 24) whose calls would be sent to 'treatment' instead of ringing the subscriber's phone. A caller using blocking could be put on this list after one call by using a control that says, in effect, "add the last caller to my list," but that number could not be read from the list by the subscriber. It doesn't look to me as if the blocking code itself can be put on the list; maybe somebody else knows a way or has tried it. Call reject can also be turned on or off, and can be maintained from either a DTMF or dial phone. There might be something here for everybody: if I can block delivery of my number and Mr. Townson can send me to treatment, we would be almost as well off as with Internet addressing from Bitnet to Portal. The foregoing opinions and interpretations are mine, not my employer's. My interpretations of the referenced documents are based on a cursory reading; they probably contain some errors. John McHarry McHarry%BNR.CA.Bitnet@wiscvm.wisc.edu
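[In outline, the reject-list behavior described might look like the sketch below. Everything here is illustrative, drawn only from the description above: the capacity, the interface, and the hidden-number rule are my rendering, not the TR's. The key point it captures is that the switch, not the subscriber, holds the actual numbers, so a blocked caller can be rejected without the subscriber ever seeing the number.]

```python
# Sketch of selective call reject as described above. The switch keeps
# the real calling numbers, so "add the last caller" works even when
# the caller blocked delivery; the subscriber never reads the list.

class RejectList:
    def __init__(self, capacity=6):       # N is illustrative
        self.capacity = capacity
        self._numbers = set()             # never shown to the subscriber
        self._last_caller = None

    def record_call(self, real_number):
        # The switch always knows the real number, even when blocked.
        self._last_caller = real_number

    def add_last_caller(self):
        """Subscriber control: 'add the last caller to my list'."""
        if self._last_caller is None or len(self._numbers) >= self.capacity:
            return False
        self._numbers.add(self._last_caller)
        return True

    def should_reject(self, real_number):
        return real_number in self._numbers   # send to 'treatment'
```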
A co-worker of mine called the Police last year to report a burglar alarm in his neighborhood which was going off. (He lives in Baltimore County, Maryland.) The dispatcher automatically received the phone number, his name, and an address. The 911 dispatcher read back the address that was displayed. It was where they had lived two (2!) years previously. When they moved, they kept the old phone number and gave the phone company the new address. Unfortunately, the change of address was not passed on to 911. Although it would be nice to have 911 come if you were in trouble and could only lift the phone, I would like them to arrive at the current address. (I know the people who live at my old address do not know my current address, although I assume they have a current phone book. Since I am listed, they could direct the police to my home.) Quite a waste of time, especially in an emergency.
The Alameda County phone book has a privacy notice right below the 911 number which warns callers about ANI and advises them to use the regular 7-digit number if they don't want their number displayed on the dispatcher's console. -Al Stangenberger
Here's another scam for ANI. Set up a free phone service: time and weather, point-spread predictions, a sports score line, a Dow Jones business news brief. It's just a taped message someone can call into. Now set up a PC to capture the ANI information on people who call. Take the diskette of phone numbers to a service that offers CNA (customer name and address) and presto! You have yet another profiled mailing list ready to be sold to hungry marketers of sports equipment, business journals, etc. "Where'd they get MY name?" you ask. You'll never know. ANI is going to be big business. Just north of Atlanta is one of the new AT&T regional billing centers. Their goal is to fully integrate ANI with their customer inquiry department. So when you call 1-800-whatever, the AT&T rep will answer, "Good morning, Mr. Jones, how's the weather in Macon? I'll bet you're calling about that collect call to Bogota." They'll have your name, address, and billing info on the screen in front of them as they answer your call. Hmmm... try forwarding your calls to AT&T. What will happen? Brent Laminack (gatech!itm!brent)