[From Johnmac's blog: <http://johnmacrants.blogspot.com>]

T-Mobile's Sidekick Smart Phone Service, powered by Microsoft's Danger Data Service, has been out of commission for over a week, and now the users are warned that their data, stored on Danger's servers, may have been lost and that the data that remains on their Sidekick devices is in jeopardy, putting customers' contact and calendar information at risk of disappearing.

Some johnmac comments:

1. There was never a problem like this prior to the Microsoft acquisition of Danger.
2. There has been little media coverage of this problem although I suspect that multi-thousands of users are affected.
3. It would seem that, given all of its technical expertise, Microsoft could come up with some way to replicate the original Danger SideKick to Danger backup. Failing that, it should be able to provide a USB backup to Outlook.
4. Perhaps Google can jump in with a Sidekick to G-Mail, G-Calendar, etc. If so, game over and a lot of Androids get sold.*

- - - - - - - - - - - -

The latest missive:

Sidekick customers, during this service disruption, please DO NOT remove your battery, reset your Sidekick, or allow it to lose power.

Updated: 10/10/2009 12:35 PM PDT

T-MOBILE AND MICROSOFT/DANGER STATUS UPDATE ON SIDEKICK DATA DISRUPTION

Dear valued T-Mobile Sidekick customers:

T-Mobile and the Sidekick data services provider, Danger, a subsidiary of Microsoft, are reaching out to express our apologies regarding the recent Sidekick data service disruption. We appreciate your patience as Microsoft/Danger continues to work on maintaining platform stability, and restoring all services for our Sidekick customers.
Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low. As such, we wanted to share this news with you and offer some tips and suggestions to help you rebuild your personal content. You can find these tips in our Sidekick Contacts FAQ. We encourage you to visit the Forums on a regular basis to access the latest updates as well as FAQs regarding this service disruption. In addition, we plan to communicate with you on Monday (Oct. 12) the status of the remaining issues caused by the service disruption, including the data recovery efforts and the Download Catalog restoration which we are continuing to resolve. We also will communicate any additional tips or suggestions that may help in restoring your content. We recognize the magnitude of this inconvenience. Our primary efforts have been focused on restoring our customers' personal content. We also are considering additional measures for those of you who have lost your content to help reinforce how valuable you are as a T-Mobile customer. We continue to advise customers to NOT reset their device by removing the battery or letting their battery drain completely, as any personal content that currently resides on your device will be lost. Once again, T-Mobile and Microsoft/Danger regret any and all inconvenience this matter has caused. 
[One of my closest associates reacted to this, and said, "Who would want to use a system called `Danger'?" PGN]

[...]

[Johnmac's message also included an incisive item by Robert X. Cringely, Microsoft screwup puts T-Mobile users in Danger. PGN]
http://www.infoworld.com/d/adventures-in-it/microsoft-screwup-puts-t-mobile-users-in-danger-482?source=IFWNLE_nlt_blogs_2009-10-12
Daniel Eran Dilger, Microsoft's Danger SideKick data loss casts dark on cloud computing, 11 Oct 2009
http://www.roughlydrafted.com/2009/10/11/microsofts-danger-sidekick-data-loss-casts-dark-on-cloud-computing/

Microsoft has demonstrated that the dark side of cloud computing has no silver linings. After a major server outage occurred on its watch last weekend, users dependent on the company have just been informed that their personal data and photos "has almost certainly been lost."

While occasional service outages have hit nearly everyone in the business, knocking Google's Gmail offline for hours, plunging RIM's BlackBerrys into the dark, or leaving Apple's MobileMe web apps unreachable to waves of users, Microsoft's high profile outage has impacted users in the worst possible way: the company has unrecoverably lost nearly all of its users' data, and now has no alternative backup plan for recovering any of it a week later.

The outage and data loss affects all SideKick customers of the Danger group Microsoft purchased in early 2008. Danger maintained a significant online services business for T-Mobile's SideKick users. All of T-Mobile's SideKick phone users rely on Danger's online service to supply applications such as contacts, calendars, IM and SMS, media player, and other features of the device, and to store the data associated with those applications.

When Microsoft's Danger servers began to fall offline last Friday October 2, users across the country couldn't even use the services; even after functionality was beginning to be brought back on Tuesday October 6, users still didn't have their data back.
This Saturday, after a week of efforts to solve the crisis, T-Mobile finally announced to its SideKick subscribers: "Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger." A new report from Engadget says that T-Mobile has suspended sales of its SideKick models and is warning: "Sidekick customers, during this service disruption, please DO NOT remove your battery, reset your Sidekick, or allow it to lose power." ... [Also noted by Ben Moore. PGN]
Daniel Eran Dilger, Microsoft's Sidekick/Pink problems blamed on dogfooding and sabotage, 12 Oct 2009 Additional insiders have stepped forward to shed more light into Microsoft's troubled acquisition of Danger, its beleaguered Pink Project, and what has become one of the most high profile Information Technology disasters in recent memory. The sources point to longstanding management issues, a culture of "dogfooding," and evidence that could suggest the issue was a deliberate act of sabotage. AppleInsider previously broke the story that Microsoft's Roz Ho launched an exploratory group to determine how the company could best reach the consumer smartphone market, identified Danger as a viable acquisition target, and then made a series of catastrophic mistakes that resulted in both the scuttling of any chance that Pink prototypes would ever appear, as well as allowing Danger's existing datacenter to fail spectacularly, resulting in lost data across the board for T-Mobile's Sidekick users. ... http://www.roughlydrafted.com/2009/10/12/microsofts-sidekickpink-problems-blamed-on-dogfooding-and-sabotage/
T-Mobile's "Sidekick" mobile service uses a backend system provided by Microsoft, and seemingly aptly named "Danger." [Will Robinson was not mentioned, but...] Danger has lost ALL the customers' stored data. The only copy remaining is that remaining on the mobile device itself. "our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low." RISKS: Backups are good, working backups *far* better. If you run a cloud-based service, you can ruin *many* more people's days than anyone with a mere departmental failed server ever can.
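The distinction between a backup and a *working* backup can be made concrete with a restore check. What follows is only a minimal sketch, assuming the backup is a plain directory copy that has been restored somewhere alongside the original; the names `tree_digests` and `verify_backup` are hypothetical and have nothing to do with the Danger or T-Mobile systems.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(
                path.read_bytes()
            ).hexdigest()
    return digests


def verify_backup(original, restored):
    """Return the relative paths that are missing from, or differ in, the restored copy."""
    want = tree_digests(original)
    have = tree_digests(restored)
    return sorted(name for name, digest in want.items() if have.get(name) != digest)
```

An empty list from `verify_backup` means every original file came back byte-for-byte from the restore. A non-empty list is the warning one would want long before the primary copy fails.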
The *LA Times* reports that patients at Cedars-Sinai Medical Center were hit with excess radiation from CT brain scans. The FDA has issued an alert: "Over an 18-month period, 206 patients at a particular facility received radiation doses that were approximately eight times the expected level. Instead of receiving the expected dose of 0.5 Gy (maximum) to the head, these patients received 3-4 Gy." <http://www.fda.gov/medicaldevices/safety/alertsandnotices/ucm185898.htm>

A) Old Risks come back Yet Again; note this went on for 18 months.
B) I assume the employees around such emitting devices still wear film badges or other dosimeters; maybe patients should do so as well....

[Also noted by Brian Harvey and Lauren Weinstein. PGN]
http://www.washingtonpost.com/wp-dyn/content/article/2009/10/10/AR2009101000813.html?hpid=sec-health
From the better-late-than-never department: http://rondam.blogspot.com/2009/09/time-machine-time-bomb.html Summary: plugging in a new eSATA drive can cause you to silently lose ALL your Time Machine backups.
-- and what that means for the way we communicate [Source: Jessica E. Vascellaro, *Wall Street Journal*, 12 Oct 2009; PGN-ed]

E-mail has had a good run as king of communications. But its reign is over. In its place, a new generation of services is starting to take hold -- services like Twitter and Facebook and countless others vying for a piece of the new world. And just as e-mail did more than a decade ago, this shift promises to profoundly rewrite the way we communicate -- in ways we can only begin to imagine. ...
http://online.wsj.com/article/SB10001424052970203803904574431151489408372.html
The risks of assumptions: In RISKS-25.80, Steve Bellovin muses entertainingly on the chaotic state of the flight status systems he encountered, and the foibles he ascribes to poor systems design. However, the piece is based on the assumption that the design purpose of the flight information system is to provide passengers with accurate real time data on the status of the flight. If this is the purpose then his thoughts are valid. If on the other hand the purpose of the system is to give to the passenger information about the flight that the airline wishes the passenger to know -- as part of a strategy to manage passenger expectations -- then the system may well be doing what its designers created it to do.

For example, displaying times to the minute leaves the passenger with an impression of precision -- a perception that an airline might want to create in the mind of the passenger. However, changing that "precise" timing in real time as the accurate flight status changed in the database would spoil the impression. I can also visualise situations where the airline would not want the "real" status of the flight displayed in real time on the annunciations at the airport.

Disney manages queues brilliantly to minimise the negative psychological effects of the wait. They know when to tell you it's a long wait, and when to "fold the queue" out of visual sight to minimise its apparent length. Are the airline flight systems intended to do something similar? What would they display if something went badly wrong? Was it really poor systems design; was the airline using it badly; or did they have a purpose that was not apparent or espoused?

I suspect Steve is right -- the end to end system was broken. But it did get me to thinking about what the airline might have specified as the design parameters for the system.
I have noticed similar problems with flight status. For the most part, I do not believe that this is a computer problem. I believe that the airlines have cut their personnel so far as to barely have enough people to do what is needed if everything goes right. When anything goes wrong, there are not enough people to do what needs to be done. At some point, someone has to manually enter that a plane has left the gate, or arrived at a gate. This seems to be a low priority (and it probably should be) compared to getting people on and off the plane, for example. Of course, the various databases should be connected to the web sites and the in-airport displays, but I think that is a more minor problem in this case. Of course, I have no actual knowledge of how any of this works.
It is immensely informative (as well as hugely entertaining) to see two paragons of design excellence such as Donald Norman and PGN arguing issues of simplicity versus complexity. I could only wish Henry Petroski and Edward Tufte would chime in, too... I'm reminded of the deliciously dynamic conversation of architectural design between Richard Meier's Getty Center and the context provided by its central garden independently designed by Robert Irwin. Perhaps all design arises from the synthesis of contending views.

It seems to this acolyte that a view helpful to understanding the context of complex systems and their need for simple user interfaces is to realize that the user is not external to the system. A car is not acting as a car until its driver takes the controls. The UI is only one of many interfaces, each critically important and benefiting from principles of design elegance.

One certainly agrees that quotes mean more in context. Frost's "good fences make good neighbors" was what the neighbor said - the poet said "something there is that doesn't love a wall". On the other hand, Einstein's original quote was "...the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." Einstein was describing physics, not engineering. In engineering corners must often be cut. In physics they may never be.

There are more risks in remaining silent than in hazarding the occasional ironic remark.

Rob Seaman, National Optical Astronomy Observatory
Complex talk about Complex Machinery

I'm sorry to see the bearer of our golden standard on interfaces, Don Norman, plunging headlong into such shallow waters [RISKS-25.80]. At issue is John Maeda's quote [RISKS-25.79] in a Lexus ad in The New Yorker: "Digital technology will enable the creation of ultra-complex machines, processes, and imagery. But that amazing technology will be framed in an elegant and simple form that makes it user-friendly. The more complex the machinery, the simpler the interface will be." [italics mine]. This last sentence, without more context, explanation, or scope of applicability, is worse than a simple conundrum; it is a disservice to public understanding of the perils of complexity that the RISKS forum, as I've known it, serves to explore.

(I admit that there are special circumstances, such as in the design of aids for the visually, mentally or physically handicapped, wherein a more severe handicap might seem to require more complicated machinery and a simpler interface. Even then, with more complexity, more things can go wrong: we should have indicators to know that something has gone amiss with a breathing tube or robotic arm, and the means to do something about it - automatically, of course, which could save the day, or the person, until these indicators or controls or automated actuators themselves malfunction, etc.)

To print that quote was surely bad judgment on the part of Maeda, or Lexus, or *The New Yorker*, or some combination of them, I don't know which. It may have been wrong of me to call it exactly as I saw it, an unintended parody, suggesting that complexity of machinery and the complexity of its interface are inversely related. I had assumed that RISKS readers would see somewhat as I did. And, as for PGN's puns, look folks, let's fix what we can fix.
[Quite a few readers noted that the A380 item was three years old, and slipped by Wendell and PGN. Apologies. The A380 has of course been flying quite noticeably for quite a while now. PGN]

My blushes, as Holmes said to Dr Watson. It's odd that I didn't check the date, for I often complain that a Web page hides the answer to When? At least the moral of the story holds good -- which is to say bad. Ad astra per aspera.

[Mea culpa from PGN for not noticing the old URL and checking it. I have spent the last three weeks filling up at least six dumpsters for recycling and 30 large boxes for saving almost 60 years of accumulated paper, so that my office could be emptied enough for an earthquake retrofit. I've been massively preoccupied, and am now a preoccupant of a temporary (and also nearly empty) office for the next three weeks or so. I'll have to rely on our aggressive backup system in case my desktop does not survive the construction. I'm finally forced to live in a paperless world for a while. PGN]
We have been through all this before, in thread "Failure Taxonomy (Discussion of Terms)" in January 1997. [Indeed. And yet the ensuing discussion was also repeated, as seen in a few selected responses that follow — included in case we have some new readers such as the T-mobile/Danger/MS folks, the CAT scan folks, or others, with apologies to old readers (who can skip the next three messages). PGN] All *design* faults show no physical degradation. That doesn't make the fault merely a matter of opinion. If any artifact is stated to carry out some function, and it doesn't, then it is flawed - independently of whether the failure was caused by erroneous software or a broken wire. The distinction that Paul Robinson seeks to make is false. A bridge may be rusty, or a wire corroded, yet still be fit for purpose. The evidence of the fault is the consequent failure to perform as required, not the corrosion. So long as the requirements are clear, any deviation from the requirements is an objective fault that does not depend on anyone's opinion. If the requirements are not clear, then whether the observed behaviour is faulty or not *is* a matter of opinion, whether the system is mechanical or software-controlled. Let's keep post-modernism out of engineering.
I suspect that we may be mixing two types of failure.

When we design software or bridges, we write specifications. If we specify a width of 'n' metres for a bridge, but a survey reveals a different width, then the bridge does not meet its specification -- i.e. it is faulty. Similarly, when designing software, we may (and should) specify precise behaviour. If the software fails to meet that specification, then it is faulty. We apply code inspections and various types of testing to determine how well the specification is met. Since perfection is not possible, there will be gaps in our specification. However such a "specification bug" is a different thing from a failure to meet the specification.

Quite different from the above is "decay". A bridge may rust, components may bend or break. Metal fatigue and a multitude of other factors may reduce the usefulness of a product that was originally of acceptable quality. In general, software does not experience this [1]. If your software works correctly, it will continue to work correctly. However its usefulness may decline over time due to external factors. The computer on which it runs may become obsolete. Peripherals may malfunction. The problem, which the software solves, may change [2].

In the first type of failure, software and other engineering endeavours share a good deal of similarity. In the second, they seem to share less.

[1] Sometimes data or configuration files become progressively more corrupt, giving software the appearance of decay. This is sometimes known as "bit rot".

[2] A bridge has an analogue to the "problem change" situation. e.g., a bridge with a particular capacity may become less useful as changed traffic patterns create a need for higher capacity.
I'm not sure I follow this argument. If the point here is that the concept of "defect" in software becomes meaningless when we narrow our field of vision sufficiently, then that's true; it's meaningless to say that a solitary 0 or 1 is "defective". But that's just as true for the non-computing examples given here; it's meaningless to say that a single proton/electron/neutron is "rusty". "Error" in software is a matter of human opinion. But then, so is the general consensus that brakes should be able to stop a car and bridges shouldn't fall down.
Disproof by counterexample: binary diff against a program that works correctly will quantitatively show the bits that are different. If I don't know anything about corrosion or bridges, your claim that those brown spots on the cables are bad is going to be your opinion and nothing more — to me. Or, looking at it from the other side, I write code for scientists. Often enough I don't have enough domain knowledge to tell if the numbers my code produces are correct or not. I have to ask for someone else's opinion on that and trust them if they say my code needs replacing.
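The binary diff in the counterexample above can be sketched in a few lines. This is an illustration only, assuming both program images are available as equal-length byte strings; the function name `differing_bits` is hypothetical.

```python
def differing_bits(known_good, candidate):
    """Count the bits that differ between two equal-length byte strings."""
    if len(known_good) != len(candidate):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 in every bit position where the two bytes disagree.
    return sum(bin(a ^ b).count("1") for a, b in zip(known_good, candidate))
```

The count is objective, but it only measures distance from the reference image. Deciding which image deserves to be called the one "that works correctly" is the step that still needs someone with domain knowledge.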