"A new computer system used to process benefits payments has been scrapped at a cost to the taxpayer of (UK) 141M pounds, the BBC has learned. The IT project, key to streamlining payments by the UK Department for Work and Pensions (DWP), was quietly axed at an internal meeting last month. ... It is the latest in a long series of computer problems for the government." [Source: BBC News, 5 Sept 2006] http://news.bbc.co.uk/1/hi/uk_politics/5315280.stm [Phillip Hammond, the Conservatives' shadow work and pensions secretary, is quoted: "It is pretty disgraceful that after two and half years of spending public money on this project, the government has walked away from it. We never hear of somebody actually losing their job because they have failed to implement a project they were responsible for." PGN-ed]
The taxi route for commercial jets at Blue Grass Airport was altered a week before Comair Flight 5191 took the wrong runway and crashed, killing all but one of the 50 people aboard. Both the old and new taxiways to reach the main commercial runway cross over the shorter general aviation runway, where the commuter jet tried to take off on 27 Aug 2006. [Source: Crash Probe Focuses on Use of Shorter Runway, Richard Fausset and Alan C. Miller, *Los Angeles Times*, 28 Aug 2006; PGN-ed; more details in subsequent reports] http://www.latimes.com/news/nationworld/nation/la-082806plane,0,242799.story?coll=la-home-headlines
Diego Latella, The Case of Patriots in the Gulf War. [in Italian] MAGAZINE: SAPERE - Ed. Dedalo srl - www.edizionidedalo.it Directors: C. Bernardini and F. Lenci This paper addresses the controversy over the performance of the Patriot ATBM system during the 1991 Gulf War. The controversy was initiated by the seminal work of Prof. T.A. Postol and his colleagues at MIT, in which several aspects of Patriot performance were analysed and the declarations of Army officials as well as the press were questioned. The paper is aimed at the general (although motivated) public more than at specialists. It starts by giving a brief introduction to the technical features of the system and its development history. The dramatic scarcity of data concerning the events in the Gulf War is then addressed, and reasons for it are discussed. The most significant failures of the system during the Gulf War are presented, in contrast to the initially overly positive assessments of the success of the system during the war. The discussion is then broadened to include issues from the debate involving Postol's group, GAO, the Army, Raytheon researchers, and the Panel On Public Affairs of the American Physical Society (which, incidentally, judged the work of the MIT group very positively). Some personal closing remarks are presented on the use of computers in war, regretting that not much has changed since SDI in the expectations that many people, including researchers, still place on computers, despite the lessons we should have learned about their practical as well as conceptual limitations. The paper includes a rich bibliography with more than sixty references. Dott. Diego Latella, Ist. di Scienza e Tecnologie dell'Informazione A. Faedo I56124, Pisa, ITALY +39 0503152982 http://www.isti.cnr.it/People/D.Latella [The translation into English is Diego's, although it has been PGN-ed. Even with my limited ability to read Italian, the original article appears to be very well researched. PGN]
The following is an extract from an article based on the series "Trust me I'm an economist", BBC2. (Second episode 7pm on 25th August 2006.) The author and presenter is Tim Harford, a *Financial Times* columnist and author of "The Undercover Economist". Supermarkets package their cheapest products to look more like famine relief than something you'd want to pay for. It's not because they can't afford sexy packaging even for their cheapest foods - it's because they want to persuade richer customers to buy something more expensive instead. Economists call this "product sabotage" and it can reach extreme levels. In the hi-tech world it is common to produce a high-specification product, sold at a premium price, and then sell the same product more cheaply with some of the functions disabled. Intel did this with its 486 computer chip in the early 1990s, and IBM did it with a printer: the economy version for home users was simply the top-of-the-range model with a chip in it to slow it down. These tactics might seem sneaky or unethical, and they certainly don't go down well with customers. Yet frustrating as it is, product sabotage is often the cheapest way to produce two different versions of a product. For the hi-tech industry the alternative is to design the whole product twice. And two different versions are what you need if you want to reach price-sensitive customers. The full article is on: http://newsvote.bbc.co.uk/mpapps/pagetools/print/news.bbc.co.uk/2/hi/business/5274352.stm During the 1970s when working for ICL, I was told by customer support engineers that the 'conversion' of a 1902 mini-mainframe to the faster 1902A model was to snip one connecting wire on the back-plane. Plus ca change ... Peter Mellor; Mobile: 07914 045072; e-mail: MellorPeter@aol.com
Any fool could tell you that [mixing] Wikis and policy making could only result in this kind of mess. Currently I am considering the possibility of bulk e-mailing UK parliamentarians (or UK PMO / Royal Palace) for an undisclosed client. In my case the matter concerns an international broadcasting linkage between Canada and the UK. So few political wikis exist to get your MP's or MLA's attention that bulk e-mail and bulk faxing have become the only solid and workable alternative. Max Power, CEO, Power Broadcasting, http://HireMe.geek.nz/ British MP falls foul of wiki-d pranksters A British Government Minister may have thought he was keeping up with modern trends when he put a draft policy on the Internet on Friday, but he was soon left red-faced when hundreds of pranksters defaced it. Weblogging, techno-savvy Environment Secretary David Miliband, tipped as a bright young spark in Prime Minister Tony Blair's administration, had put a draft "environment contract" on his department's website, setting out social responsibilities for people, government and businesses. But embarrassed administrators were forced to haul it down after more than 170 cyber-jokers trashed the document by adding in bizarre paragraphs for fun. The page used "wiki" editing techniques, which allow readers to alter the content. A heading of "Who are the parties to the environmental contract?" became, "Where is the party for the environmental contract? Can I come? Will there be cake? Hooray!" Another asked: "What would an environmental contract for energy look like? Will it look like my face? My beautiful face?" The tricky question of "what tools can be used to deliver the environmental contract?" received the answer: "Spade, Organic Yoghurt Stirrer, Old washing up liquid bottle, Sticky Back Plastic." Meanwhile, a list of tools that "create the right incentive frameworks" was doctored to include "Big stick" and "Owl magnet".
Some of the Internet pranksters put the boot into the Government when monkeying around with the text. Under a list of things citizens should do, one wag added: "Pay a higher proportion of their income to the government, and see little tangible improvement in their standard of living". One passage said everyone had the capacity to tackle environmental problems, but that people were too often dissuaded by "doubts about whether our actions will make any difference". One joker swiftly tagged on: "Besides which we just can't help but meddle, interfere, impose our views on others, and generally use taxpayers' resources in ways that are wasteful except in our own self-aggrandisement". Word about the document spread like wildfire across several Internet weblogs. Administrators were forced into action and left a message of their own: "Please note - the Wiki has been 'locked' for the time being to prevent editing. "Thanks to everyone for their interest so far - do visit again and continue the discussion. In the meantime, you're welcome to read the comments and materials submitted." A spokeswoman for the red-faced Department for Environment Food and Rural Affairs said the page was an experiment. "It's unfortunate that these things do happen. We are currently looking at security on the site," she said.
[This has not been widely reported in the mainstream press, which, given the gravity of what happened shocks me rather more than what happened - dww] On August 17, 2006, there was a class two incident that occurred at the Swedish atomic reactor Forsmark. A short circuit in the electricity network caused a problem inside the reactor and it needed to be shut down immediately, using emergency backup electricity. However, in two of the four generators, which run on AC, the AC/DC converters died. They disconnected, leaving the reactor in a state where the operators did not know what the current state of the system was for approx. 20 minutes. A meltdown could have occurred, such as we had in Chernobyl. In Sweden, the government immediately shut down all reactors that were built similarly until the problem could be cleared up. In Germany, people were insisting that Brunsbüttel was built similarly, but the company operating the reactor (in both cases, Forsmark and Brunsbüttel: Vattenfall) insisted it was not the same. When it was discovered that Brunsbüttel was indeed the same, the German environmental minister, Sigmar Gabriel, threatened to shut it down right away. But he has been pacified and the reactor is still running. This seems to be a very similar problem to the LA situation - the emergency systems had not been tested with the grid electricity going off. Additionally, it appears that Brunsbüttel has had three incidents in 2002 pertaining to the emergency electricity system. According to the taz from August 31, 2006, there exists a list of 260 security problems with Brunsbüttel which the ministry in Kiel is keeping under wraps. The ministry says that a list does exist, but refuses to publish it at the same time it is telling Vattenfall that it is not communicating its problems properly. The risks involved here are very seldom but very lethal - a core meltdown is no joke.
An extremely technical report can be found here: http://www.neimagazine.com/story.asp?sectionCode=132&storyCode=2038313 Prof. Dr. Debora Weber-Wulff, FHTW Berlin, FB 4, Treskowallee 8, 10313 Berlin GERMANY +49-30-5019-2320 http://www.f4.fhtw-berlin.de/people/weberwu/
This happened many years ago now, but might be of interest as an example of failing to look at the whole system. We had a computer center with a large battery backup. The 3-phase AC power was converted to DC, which kept the batteries fully charged. The power for the servers was converted from DC to 3-phase AC. One day the latter converter broke, so we only got 2 phases of power. (There was no interruption in the mains.) We then, to our horror, discovered that there were no switches installed to bypass the faulty converter, and we thus had to close down all servers. The obvious remedy was to install those switches. Nobody had thought of the possibility that the converters might fail. Not surprisingly, as these converters are used in millions of telephone exchanges all over the world, and had an impressive MTBF. Kurt Fredriksson
A data centre in London, UK suffered a total power outage on Sunday, 23 Jul 2006 when the incoming supply and the in-house stand-by generation failed. Other events contributed to a loss of service lasting 11 hours and four minutes. The events ran as follows: At 10:56 a public supply 132kV to 11kV transformer failed with consequential failure in the 40kV to 132kV transformer feeding the area of London that feeds the data centre. Incoming power failed on all six cables. Generators 1 and 3 started OK, but generator 2 did not, owing to insufficient air pressure to engage the starter motors. A stand-by diesel-powered air-compressor was tried, but could not maintain pressure in the air-starter piping. Generator 1 began to overheat and shut down, and generator 3 shut down shortly afterwards due to a high load condition. While the generators were running, the supply was repeatedly in and out of tolerance. When out of tolerance, the UPS batteries provided the primary power source and this drained the power from the batteries even when the generators were running. As a result, and just before generators 1 and 3 failed, UPS3 shut down safely and went into bypass mode. No fuses were blown. When the generated power was finally lost, UPS3 reverted to a non-bypass configuration. The remaining UPSs then went into normal battery back-up and carried the load for 22 minutes, at which time the UPS modules began to discharge their batteries. UPS1 then shut down on a "battery low" alarm for one module. When the second module reached "battery low", there was no power available on the module and it shut down without going into bypass. This cascaded until the last module was taking the entire load and, as this discharged, the voltage and frequency dropped, resulting in the inverter thyristors failing to switch off. This caused both inverter and battery fuses to blow as well as one of the inverter thyristors themselves. UPS 2 exhibited a similar failure.
All UPSs were then off and auto bypass was not enabled. When the mains were restored, raw mains was not provided directly to the PDUs. As a consequence, the input breaker shunt-tripped. This could not be reset until the UPS power had been restored at the PDU level, and then each input breaker had to be manually re-set. While this was going on, Building Management System (BMS) connectivity was lost and the fire alarm went off. The Fire Brigade arrived and evacuated the building. An hour later, the Fire Brigade had confirmed the safety of the building, the fire alarm system had been taken off-line and staff had re-entered the building. Commercial power was restored at 13:42. All DC and HVAC systems normalised without intervention, but the UPS systems had to be restored manually. About one hour after this, UPSs 1 and 3 were brought back online. A further hour later UPS 2 was brought back online. Half an hour later, data centre personnel began to restore various UPS PDUs to operation. Some 90 minutes after this, UPS power was briefly restored to the BMS and the Integrated Management and Monitoring System (IMMS) began to report sporadic alarms. By 20:00 all UPSs were back online. Two hours later, service had been restored, albeit that some customers were suffering odd failures the following day - possibly owing to the loss of power to equipment that had been previously running continuously for thousands of hours. Some years before, an international bank in London suffered a power outage while having its UPS serviced. They had two levels of battery UPS with stand-by generation through diesel-powered alternators. Level One UPS was scheduled for maintenance. It was taken down and (Sod's Law at work!) within minutes the incoming supply failed (as a result of "JCB-fade"). "Never mind!" the engineers cried, "There is always the Level Two UPS".
Sadly, this did not take up the load, owing to the previously undetected failure of an "AND" circuit that noted the absence of incoming mains power and a low voltage condition on UPS1. "Never mind!" the engineers cried again. "The stand-by generators will start and carry the load." They didn't! The reason? The stand-by generators required a signal from UPS1 ... which was down for maintenance. Attempts to manually start the SBGs failed, firstly because the batteries were flat (!) and secondly (after replacing the batteries) because the manual-start process still required a signal from UPS1 to release an interlock. Some five hours of blackout was experienced before the incoming supply was restored. Michael "Streaky" Bacon
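The bank's interlock chain can be sketched as a small truth table. This is only an illustrative boolean model built from the account above, not the bank's actual design: the function and parameter names are invented for the sketch, battery run-time is ignored, and automatic and manual generator start are lumped together because both needed the UPS1 signal.

```python
from itertools import product


def load_powered(mains_ok, ups1_up, and_circuit_ok, starter_batt_ok,
                 have_generators=True):
    """Hypothetical model: is the load supplied in this state?"""
    # Level Two UPS takes over only if its AND circuit detects the condition.
    ups2_takes_over = (not mains_ok) and and_circuit_ok
    # Generator start (auto or manual) needs the UPS1 signal; manual start
    # also needs charged starter batteries.
    generators_run = (have_generators and (not mains_ok)
                      and ups1_up and starter_batt_ok)
    return mains_ok or ups1_up or ups2_takes_over or generators_run


def blackout_states(have_generators=True):
    """List (ups1_up, and_circuit_ok, starter_batt_ok) states with no power
    to the load, given that the mains have failed."""
    states = []
    for ups1_up, and_ok, batt_ok in product([True, False], repeat=3):
        if not load_powered(False, ups1_up, and_ok, batt_ok, have_generators):
            states.append((ups1_up, and_ok, batt_ok))
    return states


if __name__ == "__main__":
    print(blackout_states(have_generators=True))
    print(blackout_states(have_generators=False))
```

In this simplified model the two lists are identical: because the generators depend on UPS1, they remove no blackout state that UPS1 alone would not already cover. Replacing the flat starter batteries changes nothing either, which matches the sequence of events described.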
[...] > Redundancy isn't hard. Engineering is hard. But not impossible. Stephen Fairfax was able to think of a wide range of possibly untested failure modes for this system while dashing off an email. If suitably paid, I've no doubt he (and many others) could think of a comprehensive, even exhaustive, list and design and implement a suitable test programme. But who would be willing to pay? The results of losing a data centre rarely compare to the results of losing a Boeing or a skyscraper or a power plant. Of course occasionally they do. I wonder how well tested *those* data centres are? Merlyn Kline
On Sat, 12 Aug 2006, Lauren Weinstein wrote: > I found it rather amusing, in a "sad commentary" sort of way. I disagree with the "sad commentary" part of this. To explain: a UPS is for risk management, not risk elimination. You decide two numbers: (a) how long you expect to need internal power, and (b) how long it takes to do a controlled shutdown. Add them together, and add some more for growth and luck. The numbers you come up with then reduce the risk to an "acceptable" level. But there's a twist to the first number. For example, in a hospital computer system you only need batteries for long enough to get the diesel generator started. You only need enough diesel to keep you going until you are sure you can get a refill. Exactly the same reasoning has happened with that telecom node (possibly after the fact, of course). Robert de Bath <robert$ @ debath.co.uk> <http://www.debath.co.uk/>
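The sizing rule described above amounts to a one-line calculation. The sketch below is a minimal illustration only: the function name, the 25% margin and the sample figures are assumptions chosen for the example, not values from the posting.

```python
def required_runtime_minutes(bridge_minutes, shutdown_minutes,
                             margin_fraction=0.25):
    """Battery run-time to provision, following the rule of thumb:
    (a) time you expect to need internal power, plus
    (b) time for a controlled shutdown, plus some margin for growth and luck.
    The default margin is an arbitrary illustrative assumption."""
    if bridge_minutes < 0 or shutdown_minutes < 0 or margin_fraction < 0:
        raise ValueError("durations and margin must be non-negative")
    return (bridge_minutes + shutdown_minutes) * (1 + margin_fraction)


if __name__ == "__main__":
    # Hypothetical hospital case: batteries only bridge the generator start.
    print(required_runtime_minutes(bridge_minutes=2, shutdown_minutes=10))
```

With these made-up inputs the result is 15.0 minutes. The "twist" is that number (a) shrinks to the generator start time once a generator is available, and the same reasoning then applies one level up to the fuel supply.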
Here's a link to an interesting analysis that's rather more extensive than others I've seen, perhaps in part because it relies on subscription-only sources to Japanese news sites: http://japaninc.typepad.com/terries_take/2006/08/index.html#top The particularly good bit: The technical causes of the batteries overheating were well explained in a recent Nikkei interview of a professor at Kyoto University, who is an expert on battery technology. He points out that there are two possible, complementary reasons for the Dell notebook fires, one of which offsets some of the blame from Sony. Firstly, there was the well publicized manufacturing failure, consisting of metal particles that were introduced into the battery electrolyte and which can eventually lead to internal short circuits and thus overheating. Sony takes full responsibility for this. The second reason, however, is probably not so well known, but allows Sony to share the blame with Dell and Apple. Apparently some PC designs by both companies push the Lithium-ion battery technology past its safe point by virtue of the fast recharging cycle the makers have implemented. According to the professor, when Lithium cells are exposed to rapid charging, they can form metal fragments through chemical reaction between the electrodes and a high concentration of Lithium atoms. Once formed, these conductive metallic fragments can penetrate the plastic separator between the positive and negative electrodes, causing major short-circuits and thus catastrophic over-heating. This failure in circuit design is probably why Sony investors are betting that the company won't have to cover the entire cost of the recall. Curt Sampson <firstname.lastname@example.org> +81 90 7737 2974
*Computerworld* rediscovers something that is well known in business: the official data may be well secured in databases, but sensitive data goes to trusted employees who reorganize it in spreadsheets and BI tools that are not as well secured against data breaches. http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9002950&source=rss_topic17 Story found thanks to http://socrates.berkeley.edu:7077/it-security/
Finding and fixing this particular defect actually comes at almost zero cost. Here's what happens: 1. Professional tester designs high-fidelity test, including a step that involves powering down of the fuel pump. 2. During a test case review with the system engineers, someone says, "Hey, we can't do that, if the diesel engine is run dry, it'll break." 3. Professional tester says, "Yes, okay, but let's just suppose we *did* run the test. Guess what I learned while designing the test? The fuel pump is connected to utility power with no fail-over to generator power. So, when utility power fails, the pump stops, which means the diesel soon is running without fuel, which means that not only does the diesel engine become damaged, but we don't get our backup power." 4. System engineers say, "Ohhhhh." System engineers leave test case review, go off and solve problem. 5. Professional tester, without running a single test, saves the organization thousands, potentially millions of dollars. I have seen scenarios like this happen dozens of times in my career being a professional tester and managing professional testers. Amateur testers--i.e., people who do not make a study and profession of the field of testing--will usually miss situations like this. Now, I will grant you that there are plenty of instances where a truly high-fidelity test *is* judged by management to be cost-prohibitive. For example, some people do not performance test in completely accurate test environments, which casts a lot of doubt on their performance test results. (By the way, please note that we are talking about *software* testing here, so this is a case where the combinatorial explosion is not actually what gets you into trouble; in fact, the combinatorial explosion is not that difficult to deal with.) The explanation is usually, "It'll cost too much to replicate the production environment." 
Of course, that was exactly the reason why NASA didn't test the effect of foam strikes on shuttle wings, which were going on for years before the US lost one very expensive shuttle and seven very expensive--indeed, to their families, priceless--astronauts. I bet that foam strike test that was proposed to be run at Southwest Texas Research Institute--and cancelled due to cost considerations--looks like a bargain to those same NASA managers now.... Rex Black Consulting Services, Inc. 31520 Beck Road, Bulverde, TX 78163 USA CTO, Pure Testing, Pvt Ltd +1 (830) 438-4830 www.rexblackconsulting.com
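The fuel-pump finding in the test-review scenario above is the kind of defect a simple dependency walk can expose on paper, before any test is run. The following is a hedged sketch: the component names and the dependency table are invented to mirror the story, and "independent_pump_supply" is a purely hypothetical fix, not something described in the posting.

```python
def transitive_dependencies(component, depends_on):
    """Return everything the component needs, directly or indirectly.
    depends_on maps a component name to the components it requires."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # the seen set also guards against cycles
                seen.add(dep)
                stack.append(dep)
    return seen


# As described in the scenario: the pump hangs off utility power.
as_built = {
    "backup_power": ["diesel_engine"],
    "diesel_engine": ["fuel_pump"],
    "fuel_pump": ["utility_power"],
}

# One hypothetical remedy: feed the pump from a source that does not
# depend on the utility supply.
reworked = {
    "backup_power": ["diesel_engine"],
    "diesel_engine": ["fuel_pump"],
    "fuel_pump": ["independent_pump_supply"],
}

if __name__ == "__main__":
    for name, table in (("as built", as_built), ("reworked", reworked)):
        deps = transitive_dependencies("backup_power", table)
        print(name, "-> backup depends on utility power:",
              "utility_power" in deps)
```

If the thing the backup is meant to replace shows up among the backup's own dependencies, the redundancy is illusory. That is the same conclusion the tester reached in step 3 without powering anything down.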
Brave New Ballot: The Battle to Safeguard Democracy in the Age of Electronic Voting Aviel D. Rubin Morgan Road Books (Doubleday/Random House) $24.95 ISBN 0-7679-2210-7 Avi Rubin performs a true patriotic duty with this book. He shows that without voter-verified records, votes can be lost, election outcomes can come into doubt, and public cynicism in the political process surely grows. *Brave New Ballot* is an interesting story of a talented computer scientist who found himself in an adventure because of his dogged effort to make America's voting technology consistent with her democracy. U.S. Representative Rush Holt (D-NJ) This book is a very readable introduction to the ongoing challenges. Although it is written as a rather personal narrative in a largely nontechnical manner, it captures many of the important issues underlying the need for and general lack of integrity of the election process.