[Wed 5 Nov 86 15:38] The San Diego Union carried an article from the AP newswire. A tape recording of President Reagan urging voters to go out and vote Republican went haywire and continuously called phone lines at a hospital in Texas. Over a six-hour period, several of the hospital phone lines received a phone call every three minutes.
Today (November 6th) is the first day that there has NOT been an item in my paper about some computer failure or other problem resulting from the Big Bang! Accordingly, it seems like a good time to take stock, and report what has been going on.

But first I would like to deal with a comment Jerry Saltzer made about my original posting. I quoted a newspaper article which referred to the TOPIC terminal network used by the Stock Exchange as being

> . . . six years old and considered fairly antiquated by today's standards.

and Jerry Saltzer replied

> I wonder who it is that considers that system as antiquated? Another
> perspective says that a complex system that has been running for six
> years is just beginning to be seasoned enough that its users can have
> some confidence in it...

Well, it was Sir Nicholas Goodison, the chairman of the Stock Exchange, rather than a computer scientist, who said that TOPIC was antiquated, although perhaps he was influenced in this view by his technical staff. He was also quoted as "having breathed a sigh of relief" when he heard that the problems were only with TOPIC and not the brand new and expensive (18 million pounds?) SEAQ system. To its credit, as far as I know, SEAQ has not failed yet, although it has been taken out of service in the interests of fairness on several occasions when TOPIC has broken - some people can access SEAQ directly and this would give them an unfair advantage.

Anyway, TOPIC probably was very stable ("tried and trusted" was another phrase in the article I quoted) until the Stock Exchange started tinkering with it just before the Big Bang. Indeed, according to an article in Computing (Oct 30th), the Stock Exchange "opened an electronic gateway" allowing access to detailed SEAQ price information by an additional 7,500 screens at the last minute, effectively quadrupling the load. The rest is history.
As far as the technology being antiquated goes, I believe that TOPIC provides a video feed (Teletext) whereas SEAQ provides a digital feed, and perhaps it is significant that it was the TOPIC/SEAQ link that failed. Apparently, video is much less convenient for wiring up a dealing room so that you can switch information between desks flexibly. So perhaps, in that limited sense, TOPIC is indeed antiquated, but the real problem was caused by the tinkerers, as Jerry said. However, I think that to some extent, the issue here is akin to the recent discussions about whether software rots. What changes are the assumptions a system makes about its environment, and the Big Bang certainly produced a radically new environment.

Anyway, back to what's been happening since last Monday (Big Bang day). TOPIC went down again on Tuesday at lunchtime, but since then has been reasonably well behaved thanks to various emergency measures designed to minimise the load. In particular, there are restrictions on the time of day that you can enter new pricing information, and the page refresh rate has been decreased. The Stock Exchange anticipated a 50% increase in demand, but the load actually doubled. The Sunday Times quoted the figure of 2.2 million page requests/day (as opposed to 500,000 on NASDAQ, a comparable system on Wall Street). Two new computers have been ordered to add to the eight which already support the network, and should increase the capacity by 50%.

On Monday, a malfunction replaced the British Aerospace share prices with those for Bass (a brewery).

But perhaps the most serious problem is the backlog of unmatched trade reports which will have to be sorted out before accounts can be settled. At the weekend, after one week's trading, there were 55,000 such unmatched records, and even worse, despite working at it all weekend, only 2,000 were resolved.
By Tuesday, there were at least another 4,000, bringing the total to 59,000, and 15 security firms are reported to be having difficulties with the new settlement system. It is difficult to put these figures into perspective without knowing the total number of trades in a week. 55,000 seems pretty big to me, and is apparently five times the average, but then 11,000 also seems pretty big! A semi-informed guess would be that 55,000 represents about 30% of the week's trading. The main reason for the backlog is a power failure at a computer bureau last week, but human error caused by lack of familiarity with the new systems, and "insufficient decimal precision" have also been blamed.

So with nothing in the paper today, everything appears calm, but as the Independent put it yesterday, "behind the scenes, officials are faced with nightmarish problems". The next big test of the system will be in December when trading starts in 6 billion pounds worth of British Gas shares, the biggest share issue ever, aimed at getting as many shareholders as possible (7 million people have expressed an interest!). I think the dealers might just be going back to the deserted trading floor of the Stock Exchange...

[Sources: Computing, Sunday Times, Independent]

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne.
UUCP ...!ukc!cheviot!robert
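
One way to put the backlog figures above into perspective is to work through the arithmetic they imply. The sketch below uses only the numbers quoted in the item. The 30% share is the author's own semi-informed guess, and the idea that the backlog would keep being cleared at 2,000 records per weekend is purely an illustrative assumption, not something the sources claim.

```python
# Figures quoted in the item above.
unmatched_at_weekend = 55_000   # unmatched trade reports after one week
resolved_over_weekend = 2_000   # records cleared by working all weekend
multiple_of_average = 5         # "apparently five times the average"
guessed_share_of_week = 0.30    # the author's semi-informed guess

# The usual backlog implied by "five times the average".
average_backlog = unmatched_at_weekend / multiple_of_average

# Total weekly trades implied by the 30% guess.
implied_weekly_trades = unmatched_at_weekend / guessed_share_of_week

# ASSUMPTION (illustrative only): clearing continues at the weekend rate
# and no new unmatched records arrive in the meantime.
weekends_to_clear = unmatched_at_weekend / resolved_over_weekend

print(f"average backlog:        {average_backlog:,.0f}")
print(f"implied trades per week: {implied_weekly_trades:,.0f}")
print(f"weekends to clear:      {weekends_to_clear:.1f}")
```

On these assumptions the usual backlog is 11,000 records, the week's trading comes to roughly 183,000 trades, and the weekend's clearing rate would need 27.5 such weekends to work through the 55,000.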
This is my favourite Big Bang story and comes from the not entirely serious Backbytes column of Computing (Oct 30th), reproduced without permission.

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne.
UUCP ...!ukc!cheviot!robert

"Dog days for dire Stratus" (c) Computing

As the blue touch paper for the Big Bang was finally lit this week, one company that must have allowed itself a sigh of relief is fault-tolerant computer manufacturer Stratus.

The trouble is that, while stockbroker companies are usually delighted with their Stratus machines, they [the companies] have an unfortunate habit of demonstrating the non-stop capabilities to clients by wrenching out a circuit board while the computer is in operation.

Over recent months this habit has caused havoc at the UK customer assistance centre of Stratus in downtown Hounslow, Middlesex. All Stratus computers sold in the UK are linked to the centre by autodial modem. In the case of any part apparently 'failing', red lights flash in the centre and the requisite replacement is hastily dispatched, complete with service engineer.

With the boom in fault-tolerant sales as financial institutions geared up for Big Bang, the 'cry wolf' situation began to get out of hand. Desperate engineers have now solved the problem by placing a timing delay in the alarm system to allow sticky fingered stockbrokers time to put the board back.

With computer-based dealing starting for real this week and keeping everyone in the financial institutions well occupied, Backbytes is sure that the problem will disappear anyway.
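
The "timing delay in the alarm system" described above amounts to a grace period: a fault is reported only if it persists long enough. The following is a minimal sketch of that idea, not Stratus's actual mechanism, which the article does not describe. The class name, method names, board labels and the 60-second grace period are all hypothetical.

```python
from typing import Dict, List


class DelayedAlarm:
    """Report a missing board only if it stays missing for grace_s seconds.

    Hypothetical illustration of a timing delay; times are plain seconds
    supplied by the caller so the logic is easy to follow and to test.
    """

    def __init__(self, grace_s: float) -> None:
        self.grace_s = grace_s
        self._missing_since: Dict[str, float] = {}

    def board_removed(self, board: str, now: float) -> None:
        # Note when the fault was first seen; do not dispatch anything yet.
        self._missing_since.setdefault(board, now)

    def board_restored(self, board: str) -> None:
        # The board is back, so any pending alarm for it is cancelled.
        self._missing_since.pop(board, None)

    def due_alarms(self, now: float) -> List[str]:
        # Boards whose fault has outlasted the grace period.
        return sorted(
            board
            for board, since in self._missing_since.items()
            if now - since >= self.grace_s
        )


alarm = DelayedAlarm(grace_s=60)

# A demonstration to a client: board pulled and replaced within 20 seconds.
alarm.board_removed("cpu-0", now=0)
alarm.board_restored("cpu-0")
demo_alarms = alarm.due_alarms(now=100)      # nothing to dispatch

# A genuine failure: the board never comes back.
alarm.board_removed("disk-1", now=100)
early_alarms = alarm.due_alarms(now=130)     # still inside the grace period
late_alarms = alarm.due_alarms(now=160)      # grace period has expired
```

The cost of such a delay is that a genuine failure is also reported late, so the grace period has to be short compared with the time the duplicated hardware can safely run without its spare.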
My father once told me about a semi-truck that was being used to test an experimental microprocessor-controlled engine. Apparently the micro would crash (the computer, not the truck) whenever the truck was driven near the local airport. It was finally determined that the cause was EMI from a radar transmitter at the airport. Fortunately, when the micro crashed the engine simply died, although one can easily imagine worse consequences.

I'm told that they now test their experimental systems by simply driving them past the Voice of America transmitter near Cincinnati. If the system can operate under the conditions there, then they believe that it should operate almost anywhere!

/Don

[A new definition of "exhaustive testing"? PGN]
The required redundancy/diversity can be and is achieved for software and for hardware, e.g.:

In nuclear reactor systems the redundant data processing systems -- old fashioned hardwired systems as well as computerised systems -- are in redundant, strictly separated rooms, sometimes even different parts of the building. The same applies for the cabling, which is routed different ways ASAP from the instrumentation points. (This is at least true for current reactors in Germany.)

If redundant software is developed using design diversity or n-version-programming properly, in connection with a certain amount of robustness and checking involved, not all versions will always suffer the same way from some strange events. The more you know about these events, the more you can do about it and make your system more fault-tolerant.

Udo Voges, Kernforschungszentrum Karlsruhe, firstname.lastname@example.org
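
The n-version idea mentioned above can be sketched as a simple majority voter over independently written versions of the same function. This is an illustration under stated assumptions rather than how any reactor protection system is built: the function names are hypothetical, the three "versions" are toy examples, and results are assumed to be hashable values that can be compared exactly.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(
    versions: Sequence[Callable[[int], Hashable]], x: int
) -> Optional[Hashable]:
    """Run every version on x and return the strict-majority result, if any."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just survivors.
    return value if count > len(versions) / 2 else None


# Three hypothetical, independently written versions of "square a number".
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_c(x: int) -> int:
    return abs(x) * x   # deliberately faulty for negative inputs


def always_crashes(x: int) -> int:
    raise ValueError("simulated failure")


agreed = majority_vote([square_a, square_b, square_c], 4)           # all agree
outvoted = majority_vote([square_a, square_b, square_c], -3)        # faulty one outvoted
no_majority = majority_vote([square_a, always_crashes, always_crashes], 5)
```

The voter only helps when the versions fail differently, which is the point made above about design diversity: versions that share the same mistake will agree on the wrong answer.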