(Quoting the Canadian federal Transportation Safety Board <http://www.bst-tsb.gc.ca/eng/reports/rail/1997/Index.html>):

On 3 Sep 1997, at approximately 0150 mountain daylight time, VIA Rail Canada Inc. Train No. 2, travelling eastward at 67 mph, derailed at Mile 7.5 of the Canadian National Wainwright Subdivision, near Biggar, Saskatchewan. Thirteen of nineteen cars and the two locomotives derailed. Seventy-nine of the 198 passengers and crew on board were injured, 1 fatally and 13 seriously. Approximately 600 feet of main track was destroyed.

The Board determined that the derailment immediately followed the fracture of the lead axle on the trailing locomotive. The axle fractured as a result of an overheated traction motor suspension bearing that failed due to a lack of lubrication. An on-board hot-bearing monitoring system detected the overheated bearing 29 hours before the derailment and sounded an alarm. Various operating and maintenance employees attempted to diagnose the warning, but inadequate knowledge and training, coupled with miscommunication, led to the erroneous conclusion that the failure was in the warning system, and the crew disconnected it.

(A full report is also available at the URL above.)

Observation: While unquestioning faith in the reliability of computers can sometimes prove fatal, the reverse is also true. "Computer error" has become such a truism that humans are often more likely to believe in the integrity of mechanical systems than in that of computer systems. As a result, they may ignore or even defeat computer-generated warnings of mechanical failure, with consequences like those in the above report.

Bruce Martin

DARPA leads fight against domain-name hackers
Edupage Editors <email@example.com>
Thu, 27 Aug 1998 13:57:08 -0400

The Defense Advanced Research Projects Agency (DARPA) has awarded a $1.4 million contract to Network Associates to develop a cryptographic authentication system for the Internet's domain-address system. The new system will enable the Net's routing points to verify the origin of any given Web page, preventing hackers from corrupting Web page caches or rerouting domain traffic altogether. It will not, however, prevent hackers from breaking into individual Web servers and changing pages. "That's not part of this particular approach," says the director of Network Associates' TIS Labs. The company is working with the Internet Software Consortium, which will distribute the security system to Unix vendors when it becomes commercially available. Beta versions are expected to be ready in about six months, with a final product on the market in about 18 months. (*TechWeb*, 26 Aug 1998; Edupage, 27 August 1998)

[As noted in RISKS many times before, pervasive authentication would be an enormous necessary step toward meaningful security. PGN]

World Wide War on child pornographers
Edupage Editors <firstname.lastname@example.org>
Thu, 3 Sep 1998 14:00:17 -0400

Law enforcement agents in 14 countries raided about 200 suspected members of a worldwide Internet child pornography [group]; in the United States, the U.S. Customs Service seized computers from 32 suspects in 22 states belonging to an organization called the Wonderland Club. Customs officials say the Wonderland Club required all of its members to possess thousands of sexually explicit images of children, some as young as 18 months, and some showing club members sexually molesting their own relatives. Attorney James X. Dempsey of the Center for Democracy and Technology, a civil liberties organization, says: "This shows that law enforcement already has plenty of power. The Internet only facilitates crime the way the automobile facilitates crime. Like any tool, it has pluses and minuses." (*The Washington Post*, 3 Sep 1998; Edupage, 3 September 1998)

Sing a Song of Software
Edupage Editors <email@example.com>
Tue, 1 Sep 1998 12:11:31 -0400

To help combat software piracy in China, where (according to the Business Software Alliance) as much as 96% of all business software is pirated, Microsoft has joined a Hong Kong-based recording company to co-produce a pop music CD called "Lai" ("Come Along") that urges listeners to preserve the purity of cyberspace by using only legal software. (*USA Today*, 31 Aug 1998; Edupage, 1 September 1998)

[The article cites BSA's observation that only Vietnam has a worse piracy rate: 98%. PGN]

[Is Come-Along a Hum-A-Long Cass-CD? PGN]

Software not capable of ruining company
Stefan Leue <firstname.lastname@example.org>
Thu, 27 Aug 1998 14:48:55 -0400 (EDT)

The German weekly "Der Spiegel" reports today in "Spiegel Online" (www.spiegel.de) that SAP America Inc. has been sued for $500 million, for "gross negligence", by the bankruptcy trustees overseeing the liquidation of drug distributor FoxMeyer Corp. Allegedly, SAP had promised FoxMeyer that SAP R/3 would be suitable to handle FoxMeyer's order processing needs, a promise the software did not live up to. The trustees consider the unsuitability of the software a contributing factor to FoxMeyer's bankruptcy. Yahoo!Finance reports similar facts (http://biz.yahoo.com/finance/980826/foxmeyer_d_1.html).

Spiegel Online quotes Oliver Finger, an analyst with German DG-Bank, as saying that he does not see negative consequences for SAP, and that the suit has no basis. Finger adds that he does not believe that "software is suitable to drive a company into ruin".

Any counterexamples?

Stefan Leue

Consultant is sued for expected Y2K computer malfunction
Keith Rhodes <email@example.com>
Mon, 31 Aug 1998 10:40:37 -0500

[NOTE: And they're off! Now it's consultants for the plaintiff in the lead; now it's the consultants for the defendant.]

The clothing retailer J. Baker Inc. is demanding reimbursement from Andersen Consulting for the cost of a computer system installed in 1991, anticipating its Y2K malfunction. Andersen asked the court to rule that it met all of its contractual obligations. This is considered a landmark case, especially if it is determined that Baker never specified Y2K compliance when it specified the system requirements. [Source: Melody Petersen, Consultant Is Sued for Expected Computer Malfunction in 2000, *The New York Times*, 31 Aug 1998, PGN Abstracting]

MS databases lose data; MS loses source code to DOS
Bear Giles <firstname.lastname@example.org>
Thu, 27 Aug 1998 14:08:15 -0600 (MDT)

It's bad enough that Microsoft databases lose data, but now Microsoft claims, in court, that it has lost the crucial source code necessary to prove Caldera's allegation that Microsoft did in fact, as implied by an internal 30 September 1991 memo that Microsoft does not dispute, actively sabotage Windows 3.1 if it is launched from any product competing with MS-DOS.

Caldera is involved as the current legal owner of DR DOS, an increasingly popular alternative to MS-DOS which was knocked out of the market after the introduction of Windows 3.1 due to the flakiness of the DR DOS/Windows 3.1 combination. (Not to imply that MS DOS/Windows 3.1 was particularly stable.)

Since it lost the source code, Microsoft appears to be claiming that there's no contempt of court in failure to provide the documentation (since it no longer exists) and the judge should dismiss the case as without merit. No word on whether Microsoft's next defense will be that it stored the source code for Windows 3.1 in an Access database.

As an historical footnote, it's my understanding that the smoking gun memo was discovered in the 1995 DoJ investigation of Microsoft's business practices. That raises some obvious questions about what the current round will uncover.

References:
Wall Street Journal (27 Aug 1998?)
http://www.news.com/News/Item/0,4,25763,00.html?st.ne.4.head
http://www.zdnet.co.uk/news/1998/34/ns-5364.html
http://www.caldera.com

Bear Giles <email@example.com>

Near-loss of SOHO spacecraft attributed to operational errors
Craig DeForest <firstname.lastname@example.org>
Thu, 3 Sep 1998 22:25:15 GMT

RISKS readers will remember that on 24 Jun 1998, the international billion-dollar SOHO satellite lost contact with Earth and began spinning in an uncontrolled fashion. [RISKS-19.87,90] An Investigative Board was established by ESA and NASA to determine the cause of the disruption. That board has now released its final report, which is available on the web at http://umbra.nascom.nasa.gov/soho/SOHO_final_report.html for perusal by interested parties. It is a very interesting case study of a failure in complex systems management.

The proximal cause of the loss was a mis-identification of a faulty gyroscope: two redundant gyroscopes, one of which had been spun down(!), gave conflicting signals about the spacecraft roll rate, and the ops team switched off the functioning gyro. The spun-down gyro became SOHO's only information about roll attitude, causing SOHO to spin itself up on the roll axis until the pre-programmed pitch and yaw control laws became unstable. This was the last in a series of glitches in the operational timeline on the 24th of June; the full story is available at the above web site.

There were many other factors leading to the loss. The report reads like a roll call of well-known RISKy behaviors, including a staffing level too low for periods of intensive operations; lack of fully trained personnel due to staffing turnover; an overly ambitious operational schedule; individual procedure changes made without adequate systems level review; lack of validation and testing of the planned sequence of operations; failure to carefully consider discrepancies in available data; and emphasis on science return at the expense of spacecraft safety. The board "strongly recommends that [ESA and NASA] proceed ... with a comprehensive review of SOHO operations ... prior to the resumption of SOHO normal operations".

Contact with SOHO has since been re-established and, following thawing of the frozen hydrazine rocket fuel on board, full attitude control is expected within a couple of weeks, allowing recommissioning and testing of the spacecraft and instruments.
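
[The gyroscope mis-identification described above illustrates a general property of redundant sensors: two channels that disagree reveal that something is wrong, but not which channel is wrong. The following is a minimal sketch of that reasoning in Python. It is not taken from the SOHO report and does not model SOHO's on-board logic; the function name, the channel names, and the tolerance are all hypothetical.]

```python
from statistics import median

def isolate_fault(readings, tolerance=0.1):
    """Classify a dict of redundant sensor readings (hypothetical helper).

    Returns a (status, suspects) pair:
      ("ok", [])            all channels agree within the tolerance
      ("isolated", [...])   a minority of channels deviates from the median
      ("ambiguous", [...])  channels disagree, but no majority exists
    """
    names = list(readings)
    values = list(readings.values())

    # All channels agree: nothing to isolate.
    if max(values) - min(values) <= tolerance:
        return "ok", []

    # With fewer than three channels there is no majority to vote with,
    # so a disagreement cannot be attributed to either channel.
    if len(values) < 3:
        return "ambiguous", names

    # With three or more channels, treat the median as the reference
    # and flag the channels that deviate from it.
    mid = median(values)
    outliers = [n for n in names if abs(readings[n] - mid) > tolerance]
    if len(outliers) * 2 < len(names):
        return "isolated", outliers
    return "ambiguous", names


# Two channels (names are illustrative only): the fault is detected
# but cannot be pinned on either one.
print(isolate_fault({"gyro_a": 0.0, "gyro_b": 2.5}))

# A third, independent reference lets the odd one out be identified.
print(isolate_fault({"gyro_a": 0.0, "gyro_b": 2.5, "reference_c": 2.45}))
```

[In the two-channel case the decision about which unit to trust has to come from outside the data, for example from knowing that one unit had been spun down, which is the kind of information the report says was not carefully considered.]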

Can your laptop blow you out of the sky?
"Peter G. Neumann" <email@example.com>
Wed, 26 Aug 98 11:15:03 PDT

In March 1998, the Portable Rechargeable Battery Association wrote to the FAA, claiming that INCETE power ports in use in at least 1700 aircraft can result in exploding batteries. Power supply manufacturers claim there are no recorded cases of such explosions. FAA is sponsoring a conference in San Diego this week in an effort to pursue this matter further. [Source: An article by Mark Eddo, ZDTV, 15 Aug 1998: http://www.zdnet.com/zdnn/stories/zdnn_smgraph_display/0,3441,2131636,00.html]

Re: FLIR for Cadillacs
Kirk or Diane Kerekes <firstname.lastname@example.org>
Wed, 26 Aug 1998 09:57:14 -0500

The California codes are on-line. A search of them via FindLaw reveals no hits for "night vision" or "night scope". A further search of all state government sites via FindLaw found no hits indicating that any state had banned night vision equipment, except for the occasional hunting restriction.

Kirk Kerekes, Red Gate Ranch email@example.com

Another risk of e-mail
<Bob_Frankston@frankston.com>
Wed, 26 Aug 1998 11:42 -0400

[This is the response I received from Bob after I queried him regarding a bounce on RISKS-19.93. PGN]

While debugging my own software, which undigestifies each issue of the Risks Digest as it arrives in my local mailbox, I seem to have generated nondelivery messages to the submitters.

This is the kind of mistake that makes me sympathetic to the problems of others in building large systems. These things happen; the issue is more one of learning the lessons each time, and of how one recovers, than of trying to prevent all possible accidents.

The other observation is that the mistakes one makes on a large corporate system and on one's own system are not all that dissimilar. Computers provide an individual with the capability of making simple mistakes with large impact.

Re: USS Yorktown: The risk of assumption is the assumption of risk
Alun Jones <firstname.lastname@example.org>
Thu, 27 Aug 1998 11:02:53 -0500

As a software developer, I'm constantly bemused by users observing behaviour in my programs, and thereby assuming that they are fully aware of the way in which the program was written, and what are "easy" changes to request.

However, it isn't merely users that assume they are fully aware of a problem's causes on very small amounts of information - several recent posters to RISKS seem to have been guilty of exactly the same assumption: that since "problem A caused effect Z for me," all cases of effect Z are caused by problem A.

I'm going to pick on the Yorktown discussion simply because it's the easiest for people to find here - it's very recent, and no-one seems to have posted with any more technical information than was presented in Government Computer News <URL:http://www.gcn.com/gcn/1998/July13/cov2.htm>

There are two references here to "divide by zero" - the first comes from the Giffin memo, which (from the portions quoted) appears to be an attempt by a non-technical person to describe the relatively technical cause of failure. The second is from DiGiorgio, who is noted as contradicting other parts of the Navy's description of the incident(s).

Neither description, however, mentions whether the particular crash involved was an application fault or an OS fault - the term "blue screen", in particular, is notable in its absence. The only mention of the word "crash", indeed, is in the Giffin memo, in a section that other RISKS readers have already pointed out as being so technically inaccurate as to be totally unreliable.

And yet, this one news report is thrown out as 'proof' that NT has blue screen problems on a large scale - I'm not suggesting that it doesn't, merely that this report does not have remotely enough accurate information to draw a valid conclusion as to whether the problem was the operating system, or the (apparently distributed) application. If the latter is the case, then we can bay for Microsoft's blood until the cows come home, force the Navy to change their default "Off the shelf" operating system, and still find ourselves with an immobilised Yorktown once the culprit application is ported to the otherwise more stable operating system.

The final acts of assumption come from:

Phil Edwards, RISKS 19.91: "My reading of the story is that NT Server blue-screened for no apparent reason" (from no relevant data)

Martin Ward, RISKS 19.91: "a decent operating system should not be capable of being crashed by an application program" (who said this was the cause?)

Mike Williams, RISKS 19.92: "I would guess that the application was originally developed and tested on Intel machines ... but ... was ported ... to an Alpha machine with no fp masking by default. Cue crash on first fp divide by zero." (apparently missing the line in the original report that stated the system was running on "dual 200-MHz Pentium Pros from Intergraph Corp")

These are by no means the most technically inaccurate RISKS articles I have seen, but they do point out how ready we are to assume, from very little data, that we thoroughly understand the cause of a problem. Guesses are all fine and dandy, as they occasionally lead to solutions faster than more rigorous analysis - but unless they are backed up with some supporting data, they are nothing more than guesses.

Alun Jones, Texas Imperial Software, 1602 Harvest Moon Place
Cedar Park TX 78613 Phone +1 (512) 378 3246

In related news, let's hope that PGN's assurance that "R2K" issues are of no concern is more than just a guess. :-)

[Be careful how you misquote Dave Kristol's question and PGN's response! Although I certainly manage to find risks in almost everything, there seems to be nothing in the RISKS preparation and delivery process that cares about dates, and hence nothing that will prevent RISKS from transcending Y2K EDITORIALLY. Nothing in the foregoing to the contrary notwithstanding, I certainly expect that there will be some problems. Although a RISKS issue is a RISKS issue is a RISKS issue, e-mail systems may prevent me from receiving all your horror stories of Y2K failures, and problems with editors, file systems, operating systems, mailers, networking protocols, and everything else that is out of my control may make it impossible for you to RECEIVE the first few issues of the year 2000 (assuming that we are still going strong). But I hope that I will be busily cranking out Y2K reports from all quarters, assuming that you all are able to send them to me! PGN]

Re: USS Yorktown
JON STRAYER <JSTRAYER@ssg.ci.in.ameritech.com>
Thu, 27 Aug 1998 10:46:22 -0500

Last year, the Navy selected Microsoft Corp.'s Windows NT 4.0 for an automation program intended to reduce the need for sailors. Before the incident, the Navy called the Yorktown experiment a success that eliminated the need for 44 enlisted sailors and four officers and saved as much as $2.7 million a year. After the incident, Navy officials blame the problem on human error and the database system, not Windows NT, and say no computer system is failure-free. (Navy officials also said future Smart Ships will have backup computers for use during a failure.) [PGN-ed abstractions from the *Wall Street Journal* (via NewsEdge Corp) "Was it Windows or human error?"]

Points that scare me:

1. Blaming the crash on human error (as if that will ever go away).

2. Having a single point of failure that can leave the ship dead in the water for hours. That would be fatal in combat.

3. Having the damage control functions depend on something as fragile as Windows NT. As late as 1985 the Navy was using sound-powered phones for interior communications during combat because they were very robust and easy to repair. I doubt you can say the same about this new system.

4. "The system responded as it should." Can Capt. Hamilton really be that ignorant? The system was supposed to crash?

There are only three reasonable numbers in software engineering:
0 - You can't have it
1 - You can have one
Infinity - You can have as many as your system can handle

Re: USS Yorktown (Ward, RISKS-19.91)
"Mark Hull-Richter" <email@example.com>
Tue, 18 Aug 1998 12:35:13 -0700

> "The whole point is that a decent operating system should not be capable of
> being crashed by an application program." - Martin Ward

I couldn't agree more.

We run our primary application here on an HP3000 with MPE/iX version 5.5. For about four months straight this year we were plagued by one to four system crashes a month, something we had never seen before, or at least not in the 2.5 years I have been here.

When HP's analysts finally traced the problem down, it seemed that one of our supplemental applications was an executable binary file that had a lock word on it. (A lock word is an older method of providing a modest means of security against unauthorized access to the file.)

It seems that the program encountered an error situation (we think it was an i/o time-out caused by a premature logoff) and trapped to the OS, which proceeded to attempt to handle the error. In the course of so doing, it went to provide a stack trace to the user's screen, but, since the executable had a lock word, the OS tried to query the user for the lock word to get permission to dump the stack trace. The user was not there, causing another error, only this time in the OS code. Parts of the stack were overwritten and the system crashed.

Without trying to start a "my OS is better than yours" war, one has to wonder - what constitutes a "decent" operating system and how does one know one has such?

Mark A. Hull-Richter, CDB Infotek, 6 Hutton Centre Drive, Santa Ana, CA 92707
Manager, Middle-Tier Software (714)708-2000x143 firstname.lastname@example.org

Re: USS Yorktown (Spencer, RISKS-19.92)
William Todd <email@example.com>
Tue, 18 Aug 1998 17:52:38 -0400

>> People need to be trained in the use of those backup systems [...]
> As has been noted in connection with airliners, there is a difficult problem
> of keeping the operators skilled in manual control when they seldom exercise
> it in normal operation. It might be better to make partially-manual control
> the norm, and reserve full automation as the emergency backup.

The submarines I was sailing on in the mid-70s had a feature known as automatic depth control. It worked quite well, keeping the ship on ordered depth without any work on the part of the planesmen (the two sailors who control movement of the sub's planes and thus its depth). Outside of occasionally making sure it was still working, we never used it. It takes skill to control the boat's depth, and the skill must be constantly exercised.

We did use it on occasion when we had, for one reason or another, to be at periscope depth, where the sea surface has a great effect on depth keeping. On rare occasion the sea state was high enough (> sea state 7-8) that the planesmen could not adequately control depth. We then would use the depth control system. In three years those conditions occurred no more than two or three times.

Re: USS Yorktown (Williams, RISKS-19.92)
"Phil Edwards" <firstname.lastname@example.org>
Wed, 19 Aug 1998 13:24:56 +0100

> I would guess that the application was originally developed and tested
> on Intel machines with the default fp exception masking, [...] ported
> to an Alpha machine with no fp masking by default. [...]

Some evidence in support of this possibility can be found in the US Navy's current IT Standards Guidance document, linked at http://www.doncio.navy.mil/links/IPTs/Information_Technology_Standards_Guidance/ . (The Atlantic and Pacific fleets standardised on 'commercial off-the-shelf' solutions and Windows NT in 1997, as part of the 'IT-21' initiative whose results included the Yorktown crash; the ITSG, adopted in June, in effect makes this a formal service-wide policy).

Section 126.96.36.199 of the ITSG - Computer Resources/Computing Hardware/Component Technologies/CPU - lists processors under two headings: "Clients & Servers" and "Servers & Special Purpose". The Pentium II is the CPU of choice for "Clients & Servers", although "Pentium substitutes like AMD K6 and Cyrix M2 are viable and may be considered". Under the "Servers & Special Purpose" heading we read:

"The MIPS RISC processor is from the Silicon Graphics, MIPS Group. Alpha is produced by DEC. PA-RISC is from Hewlett-Packard. Ultra Sparc is a Sun Microsystems processor. The PowerPC is from IBM/Motorola. These systems are suitable as servers or special purpose workstations where PCs are not able to perform the required function."

This implies to me that the Yorktown may well have been running an Alpha box -- for performance or for stability (!) - which choked on an app developed for Intel.

The ITSG's Computing Hardware section is an odd document in many ways. MIPS, Alpha, PA-RISC, Ultra Sparc and PowerPC are all given availability dates from the present day to 2003; Intel's Merced processor is not given a date but classified as 'emerging'. Several lines are devoted to the Pentium Pro's MMX support ("It is recommended that any new Pentium processor support MMX technology").

As these examples may suggest, the ITSG's position on computing hardware in general is remarkably PC-centric. The following is from 7.1.3 (Computer Resources/Overview/General Philosophy):

"The general philosophy for implementing computing resources in the DON [Department of the Navy] is the concept of homogeneous clients and heterogeneous servers. Homogeneous clients facilitate providing a consistent interface between the user and the system and serve to make system support and maintenance less complex. It is also beneficial if servers are homogeneous as well. However, servers should be implemented in such a way that they perform their function transparent to the user. Restricting the introduction of new server technology could choke innovation and prevent users from taking advantage of advances in computing such as massively parallel processors.

"Workgroup servers that support general command needs should be homogeneous using the same technology as the client. In today's environment, the de facto standard client-server computing technology is Microsoft's Windows NT. Current DON guidance is to develop all new applications such that the client can operate these applications at full capability using a Personal Computer (PC) running Windows NT." [endquote]

The logic is unclear, to put it kindly: the benefits of standardisation are established, but the only argument for standardising *on Windows NT* is that it's the "de facto standard". The "concept" of "heterogeneous servers" doesn't make much of an impact, either. All in all, one gets the impression of a "general philosophy" which has been rather hastily rewritten - a Risky undertaking.

Phil Edwards email@example.com, @ntexplorer.com
Editor, NEWS/400.uk and Windows NTexplorer
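
[The "fp exception masking" guess quoted at the top of this message turns on a distinction that a small model may make clearer. Under IEEE 754 arithmetic, a divide by zero whose exception is masked quietly yields an infinity (or a NaN for 0/0), whereas an unmasked exception traps, and a program that does not handle the trap stops. The sketch below is only a Python model of those two policies. It does not reproduce the behaviour of any particular processor, of Windows NT, or of the Yorktown application; the function and parameter names are hypothetical.]

```python
import math

def fp_divide(numerator, denominator, masked):
    """Model a floating-point divide under two exception policies.

    masked=True  : return the IEEE 754 default result (inf or NaN)
    masked=False : raise, standing in for a hardware trap
    """
    if denominator == 0.0:
        if not masked:
            # Unmasked: the operation traps instead of returning a value.
            raise FloatingPointError("floating-point divide by zero")
        if numerator == 0.0 or math.isnan(numerator):
            # 0/0 has no meaningful value; the default result is NaN.
            return math.nan
        # Nonzero/0 gives an infinity whose sign follows both operands.
        sign = math.copysign(1.0, numerator) * math.copysign(1.0, denominator)
        return math.copysign(math.inf, sign)
    return numerator / denominator


# Masked: the computation carries on with an infinite value.
print(fp_divide(1.0, 0.0, masked=True))

# Unmasked: the same operation stops the computation unless it is caught.
try:
    fp_divide(1.0, 0.0, masked=False)
except FloatingPointError as exc:
    print("trapped:", exc)
```

[Code that has only ever run under the first policy may never have had a handler written for the second, which is the substance of the guess being discussed.]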