The RISKS Digest

Forum on Risks to the Public in Computers and Related Systems

ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Volume 25 Issue 81

Monday 12 October 2009

Contents

Microsoft's Danger Data Service disrupts users
John F. McMullen
Microsoft's Danger SideKick and cloud computing
Daniel Eran Dilger via Monty Solomon
Microsoft's Sidekick due to dogfooding/sabotage
Daniel Eran Dilger via Monty Solomon
Cloud Danger, literally... M$ loses T-mobile data
David Lesher
Excess CAT scan radiation -- the return of Therac 25?
David Lesher
A Time Machine time bomb
Ron Garret
Why E-mail No Longer Rules
Jessica E. Vascellaro via Monty Solomon
Re: Airline status display follies
Peter R Cook
Arthur Flatau
Re: The risks of being cute
Rob Seaman
Ken Knowlton
Re: The computers did it -- differently
Wendell Cochran
Re: Software never fails, people decide that it does
Martyn Thomas
Michael Smith
Geoffrey Brent
Dimitri Maziuk
Info on RISKS (comp.risks)

Microsoft's Danger Data Service disrupts users

"John F. McMullen" <johnmac13@gmail.com>
Mon, 12 Oct 2009 18:54:31 -0400

[From Johnmac's blog:  <http://johnmacrants.blogspot.com>]

T-Mobile's Sidekick Smart Phone Service, powered by Microsoft's Danger
Data Service, has been out of commission for over a week, and now the users
are warned that their data, stored on Danger's servers, may have been lost
and that the data that remains on their Sidekick devices is in jeopardy,
putting customers' contact and calendar information at risk of disappearing.

Some johnmac comments:
1. There was never a problem like this prior to the Microsoft acquisition
   of Danger.
2. There has been little media coverage of this problem although I suspect
   that multi-thousands of users are affected.
3. It would seem that, given all of its technical expertise, Microsoft could
   come up with some way to replicate the original Danger SideKick to Danger
   backup. Failing that, it should be able to provide a USB backup to
   Outlook.
4. Perhaps Google can jump in with a Sidekick to G-Mail, G-Calendar, etc. If
   so, game over and a lot of Androids get sold.

 - - - - - - - - - - - -

The latest missive:

Sidekick customers, during this service disruption, please DO NOT remove
your battery, reset your Sidekick, or allow it to lose power.
Updated: 10/10/2009 12:35 PM PDT

T-MOBILE AND MICROSOFT/DANGER STATUS UPDATE ON SIDEKICK DATA DISRUPTION

Dear valued T-Mobile Sidekick customers:

T-Mobile and the Sidekick data services provider, Danger, a subsidiary of
Microsoft, are reaching out to express our apologies regarding the recent
Sidekick data service disruption.  We appreciate your patience as
Microsoft/Danger continues to work on maintaining platform stability, and
restoring all services for our Sidekick customers.

Regrettably, based on Microsoft/Danger's latest recovery assessment of their
systems, we must now inform you that personal information stored on your
device - such as contacts, calendar entries, to-do lists or photos - that is
no longer on your Sidekick almost certainly has been lost as a result of a
server failure at Microsoft/Danger. That said, our teams continue to work
around-the-clock in hopes of discovering some way to recover this
information. However, the likelihood of a successful outcome is extremely
low. As such, we wanted to share this news with you and offer some tips and
suggestions to help you rebuild your personal content. You can find these
tips in our Sidekick Contacts FAQ. We encourage you to visit the Forums on a
regular basis to access the latest updates as well as FAQs regarding this
service disruption.

In addition, we plan to communicate with you on Monday (Oct. 12) the status
of the remaining issues caused by the service disruption, including the data
recovery efforts and the Download Catalog restoration which we are
continuing to resolve. We also will communicate any additional tips or
suggestions that may help in restoring your content.

We recognize the magnitude of this inconvenience. Our primary efforts have
been focused on restoring our customers' personal content. We also are
considering additional measures for those of you who have lost your content
to help reinforce how valuable you are as a T-Mobile customer.  We continue
to advise customers to NOT reset their device by removing the battery or
letting their battery drain completely, as any personal content that
currently resides on your device will be lost.  Once again, T-Mobile and
Microsoft/Danger regret any and all inconvenience this matter has caused.

  [One of my closest associates reacted to this, and said,
  "Who would want to use a system called `Danger'?"  PGN]

johnmac@acm.org johnmac13@gmail.com johnmac@sdf.lonestar.org
johnmac@panix.com,  johnmac@echonyc.com johnmac13@mac.com
 jmcmullen@monroecollege.edu johnmac@alumni.iona.edu [...]

  [Johnmac's message also included an incisive item by Robert X. Cringely,
  Microsoft screwup puts T-Mobile users in Danger.  PGN]
http://www.infoworld.com/d/adventures-in-it/microsoft-screwup-puts-t-mobile-users-in-danger-482?source=IFWNLE_nlt_blogs_2009-10-12


Microsoft's Danger SideKick and cloud computing (Daniel Eran Dilger)

Monty Solomon <monty@roscom.com>
Mon, 12 Oct 2009 19:22:50 -0400

Daniel Eran Dilger, Microsoft's Danger SideKick data loss casts dark on
cloud computing, 11 Oct 2009
http://www.roughlydrafted.com/2009/10/11/microsofts-danger-sidekick-data-loss-casts-dark-on-cloud-computing/

Microsoft has demonstrated that the dark side of cloud computing has no
silver linings. After a major server outage occurred on its watch last
weekend, users dependent on the company have just been informed that their
personal data and photos "has almost certainly been lost."

While occasional service outages have hit nearly everyone in the business,
knocking Google's Gmail offline for hours, plunging RIM's BlackBerrys into
the dark, or leaving Apple's MobileMe web apps unreachable to waves of
users, Microsoft's high profile outage has impacted users in the worst
possible way: the company has unrecoverably lost nearly all of its users'
data, and now has no alternative backup plan for recovering any of it a week
later.

The outage and data loss affects all SideKick customers of the Danger group
Microsoft purchased in early 2008. Danger maintained a significant online
services business for T-Mobile's SideKick users.  All of T-Mobile's SideKick
phone users rely on Danger's online service to supply applications such as
contacts, calendars, IM and SMS, media player, and other features of the
device, and to store the data associated with those applications.

When Microsoft's Danger servers began to fall offline last Friday October 2,
users across the country couldn't even use the services; even after
functionality was beginning to be brought back on Tuesday October 6, users
still didn't have their data back. This Saturday, after a week of efforts to
solve the crisis, T-Mobile finally announced to its SideKick subscribers:

"Regrettably, based on Microsoft/Danger's latest recovery assessment of
their systems, we must now inform you that personal information stored on
your device - such as contacts, calendar entries, to-do lists or photos -
that is no longer on your Sidekick almost certainly has been lost as a
result of a server failure at Microsoft/Danger."

A new report from Engadget says that T-Mobile has suspended sales of its
SideKick models and is warning: "Sidekick customers, during this service
disruption, please DO NOT remove your battery, reset your Sidekick, or allow
it to lose power." ...

  [Also noted by Ben Moore.  PGN]


Microsoft's Sidekick due to dogfooding/sabotage (Daniel Eran Dilger)

Monty Solomon <monty@roscom.com>
Mon, 12 Oct 2009 23:01:13 -0400

Daniel Eran Dilger, Microsoft's Sidekick/Pink problems blamed on dogfooding
and sabotage, 12 Oct 2009

Additional insiders have stepped forward to shed more light on Microsoft's
troubled acquisition of Danger, its beleaguered Pink Project, and what has
become one of the most high profile Information Technology disasters in
recent memory.

The sources point to longstanding management issues, a culture of
"dogfooding," and evidence that could suggest the issue was a deliberate act
of sabotage.

AppleInsider previously broke the story that Microsoft's Roz Ho launched an
exploratory group to determine how the company could best reach the consumer
smartphone market, identified Danger as a viable acquisition target, and
then made a series of catastrophic mistakes that resulted in both the
scuttling of any chance that Pink prototypes would ever appear, as well as
allowing Danger's existing datacenter to fail spectacularly, resulting in
lost data across the board for T-Mobile's Sidekick users. ...

http://www.roughlydrafted.com/2009/10/12/microsofts-sidekickpink-problems-blamed-on-dogfooding-and-sabotage/


Cloud Danger, literally... M$ loses T-mobile data

"David Lesher" <wb8foz@panix.com>
Sun, 11 Oct 2009 15:43:17 -0400 (EDT)

T-Mobile's "Sidekick" mobile service uses a backend system provided by
Microsoft, and seemingly aptly named "Danger." [Will Robinson was not
mentioned, but...]

Danger has lost ALL the customers' stored data. The only remaining copy
is the one on the mobile device itself.

"our teams continue to work around-the-clock in hopes of discovering some
way to recover this information. However, the likelihood of a successful
outcome is extremely low."

RISKS:

Backups are good, working backups *far* better.

If you run a cloud-based service, you can ruin *many* more people's days
than anyone with a mere departmental failed server ever can.
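
As a rough illustration of the difference between a backup and a *working*
backup, here is a minimal Python sketch that copies a file, restores it to a
scratch location, and compares checksums. The function names and the
copy-based "restore" step are assumptions made for this example; nothing
here describes how Danger's service actually stored or backed up data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir, restore it to a scratch directory,
    and report whether the restored bytes match the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_copy = backup_dir / source.name
    shutil.copy2(source, backup_copy)

    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / source.name
        # Hypothetical stand-in for whatever a real restore procedure does.
        shutil.copy2(backup_copy, restored)
        return sha256_of(restored) == sha256_of(source)
```

The point of the sketch is only that the check exercises the restore path,
not just the copy; a backup that has never been restored is untested.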


Excess CAT scan radiation -- the return of Therac 25?

David Lesher <wb8foz@panix.com>
Sun, 11 Oct 2009 14:32:34 -0400

The *LA Times* reports that patients at Cedars-Sinai Medical Center were hit
with excess radiation from CT brain scans.

The FDA has issued an alert: "Over an 18-month period, 206 patients at a
particular facility received radiation doses that were approximately eight
times the expected level.  Instead of receiving the expected dose of 0.5 Gy
(maximum) to the head, these patients received 3-4 Gy."
<http://www.fda.gov/medicaldevices/safety/alertsandnotices/ucm185898.htm>
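
The arithmetic behind the quoted figures can be checked directly. The short
Python sketch below uses only the numbers in the alert (0.5 Gy expected
maximum, 3-4 Gy received); the function and constant names are made up for
this illustration.

```python
def overdose_factor(received_gy: float, expected_gy: float) -> float:
    """Ratio of the received dose to the expected dose (both in gray)."""
    if expected_gy <= 0:
        raise ValueError("expected dose must be positive")
    return received_gy / expected_gy


EXPECTED_MAX_GY = 0.5            # expected maximum dose quoted in the alert
RECEIVED_RANGE_GY = (3.0, 4.0)   # received doses quoted in the alert

low, high = (overdose_factor(d, EXPECTED_MAX_GY) for d in RECEIVED_RANGE_GY)
print(f"{low:.0f}x to {high:.0f}x the expected maximum")
```

So the quoted range works out to six to eight times the expected maximum,
which squares with the alert's "approximately eight times" at the upper end.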

A) Old Risks come back Yet Again; note this went on for 18 months.

B) I assume the employees around such emitting devices still wear film
badges or other dosimeters; maybe patients should do so as well....

  [Also noted by Brian Harvey and Lauren Weinstein.  PGN]
http://www.washingtonpost.com/wp-dyn/content/article/2009/10/10/AR2009101000813.html?hpid=sec-health


A Time Machine time bomb

Ron Garret <ron@flownet.com>
Sat, 10 Oct 2009 03:37:27 -0700

 From the better-late-than-never department:

http://rondam.blogspot.com/2009/09/time-machine-time-bomb.html

Summary: plugging in a new eSATA drive can cause you to silently lose ALL
your Time Machine backups.


Why E-mail No Longer Rules (Jessica E. Vascellaro)

Monty Solomon <monty@roscom.com>
Mon, 12 Oct 2009 11:09:54 -0400

-- and what that means for the way we communicate

[Source: Jessica E. Vascellaro, *Wall Street Journal*, 12 Oct 2009; PGN-ed]

E-mail has had a good run as king of communications. But its reign is over.
In its place, a new generation of services is starting to take hold --
services like Twitter and Facebook and countless others vying for a piece of
the new world. And just as e-mail did more than a decade ago, this shift
promises to profoundly rewrite the way we communicate -- in ways we can only
begin to imagine. ...
http://online.wsj.com/article/SB10001424052970203803904574431151489408372.html


Re: Airline status display follies (Bellovin, RISKS-25.80)

Peter R Cook <PCook@wisty.plus.com>
Sat, 10 Oct 2009 10:29:18 +0100

The risks of assumptions:

In RISKS-25.80, Steve Bellovin muses entertainingly on the chaotic state of
the flight status systems he encountered, and the foibles he ascribes to
poor systems design.

However, the piece is based on the assumption that the design purpose of the
flight information system is to provide passengers with accurate real time
data on the status of the flight. If this is the purpose then his thoughts
are valid.

If on the other hand the purpose of the system is to give to the passenger
information about the flight that the airline wishes the passenger to know
-- as part of a strategy to manage passenger expectations -- then the system
may well be doing what its designers created it to do.

For example, displaying times to the minute leaves the passenger with an
impression of precision - a perception that an airline might want to create
in the mind of the passenger. However, changing that "precise" timing in real
time as the accurate flight status changed in the database would spoil the
impression.

I can also visualise situations where the airline would not want the "real"
status of the flight displayed in real time on the annunciations at the
airport. Disney manages queues brilliantly to minimise the negative
psychological effects of the wait. They know when to tell you it's a long
wait, and when to "fold the queue" out of visual sight to minimise its
apparent length. Are the airline flight systems intended to do something
similar?  What would they display if something went badly wrong?

Was it really poor systems design; was the airline using it badly; or did
they have a purpose that was not apparent or espoused? I suspect Steve is
right -- the end to end system was broken. But it did get me to thinking
about what the airline might have specified as the design parameters for the
system.


Re: Airline status display follies (Bellovin, RISKS-25.80)

Arthur Flatau <flataua@acm.org>
Mon, 12 Oct 2009 11:32:36 -0500

I have noticed similar problems with flight status.  For the most part, I do
not believe that this is a computer problem.  I believe that the airlines
have cut their personnel so far as to barely have enough people to do what
is needed if everything goes right.  When anything goes wrong, there are not
enough people to do what needs to be done.  At some point, someone has to
manually enter that a plane has left the gate, or arrived at a gate.
This seems to be a low priority (and it probably should be) compared to
getting people on and off the plane, for example.

Of course, the various databases should be connected to the web sites and
in-airport displays, but I think that is a more minor problem in this case.

Of course, I have no actual knowledge of how any of this works.


Re: The risks of being cute (RISKS-25.79,80)

Rob Seaman <seaman@noao.edu>
Sun, 11 Oct 2009 11:16:56 -0700

It is immensely informative (as well as hugely entertaining) to see two
paragons of design excellence such as Donald Norman and PGN arguing issues
of simplicity versus complexity.  I could only wish Henry Petroski and
Edward Tufte would chime in, too...

I'm reminded of the deliciously dynamic conversation of architectural design
between Richard Meier's Getty Center and the context provided by its central
garden independently designed by Robert Irwin.  Perhaps all design arises
from the synthesis of contending views.

It seems to this acolyte that a view helpful to understanding the context of
complex systems and their need for simple user interfaces is to realize that
the user is not external to the system.  A car is not acting as a car until
its driver takes the controls.  The UI is only one of many interfaces, each
critically important and benefiting from principles of design elegance.

One certainly agrees that quotes mean more in context.  Frost's "good fences
make good neighbors" was what the neighbor said - the poet said "something
there is that doesn't love a wall".

On the other hand, Einstein's original quote was "...the supreme goal of all
theory is to make the irreducible basic elements as simple and as few as
possible without having to surrender the adequate representation of a single
datum of experience."  Einstein was describing physics not engineering.  In
engineering corners must often be cut.  In physics they may never be.

There are more risks in remaining silent than in hazarding the occasional
ironic remark.

Rob Seaman, National Optical Astronomy Observatory  seaman@hanksville.org


Re: The risks of being cute (RISKS-25.79,80)

Ken Knowlton <KCKnowlton@aol.com>
Sun, 11 Oct 2009 20:35:47 EDT

Complex talk about Complex Machinery

I'm sorry to see the bearer of our golden standard on interfaces, Don
Norman, plunging headlong into such shallow waters [RISKS-25.80]. At issue
is John Maeda's quote [RISKS-25.79] in a Lexus ad in The New Yorker:

  "Digital technology will enable the creation of ultra-complex machines,
  processes, and imagery. But that amazing technology will be framed in an
  elegant and simple form that makes it user-friendly. The more complex the
  machinery, the simpler the interface will be."  [italics mine].

This last sentence, without more context, explanation, or scope of
applicability, is worse than a simple conundrum; it is a disservice to
public understanding of the perils of complexity that the RISKS forum, as
I've known it, serves to explore.

(I admit that there are special circumstances, such as in the design of aids
for the visually, mentally or physically handicapped, wherein a more severe
handicap might seem to require more complicated machinery and a simpler
interface. Even then, with more complexity, more things can go wrong: we
should have indicators to know that something has gone amiss with a
breathing tube or robotic arm, and the means to do something about it -
automatically, of course, which could save the day, or the person, until
these indicators or controls or automated actuators themselves malfunction,
etc.)

To print that quote was surely bad judgment on the part of Maeda, or Lexus,
or *The New Yorker*, or some combination of them, I don't know which. It may
have been wrong of me to call it exactly as I saw it, an unintended parody,
suggesting that complexity of machinery and the complexity of its interface
are inversely related. I had assumed that RISKS readers would see somewhat
as I did.

And, as for PGN's puns, look folks, let's fix what we can fix.


Re: The computers did it -- differently (RISKS-25.80)

Wendell Cochran <atrypa@eskimo.com>
Sat, 10 Oct 2009 09:07:19 -0700

  [Quite a few readers noted that the A380 item was three years old, and
  slipped by Wendell and PGN.  Apologies.  The A380 has of course been
  flying quite noticeably for quite a while now.  PGN]

My blushes, as Holmes said to Dr Watson.

It's odd that I didn't check the date, for I often complain that a Web page
hides the answer to When?

At least the moral of the story holds good -- which is to say bad.

Ad astra per aspera.

  [Mea culpa from PGN for not noticing the old URL and checking it.  I have
  spent the last three weeks filling up at least six dumpsters for recycling
  and 30 large boxes for saving almost 60 years of accumulated paper, so
  that my office could be emptied enough for an earthquake retrofit.  I've
  been massively preoccupied, and am now a preoccupant of a temporary (and
  also nearly empty) office for the next three weeks or so.  I'll have
  to rely on our aggressive backup system in case my desktop does not
  survive the construction.  I'm finally forced to live in a paperless
  world for a while.  PGN]


Re: Software never fails, people decide that it does (RISKS-25.80)

Martyn Thomas <martyn@thomas-associates.co.uk>
Sat, 10 Oct 2009 11:28:37 +0100

We have been through all this before, in the thread "Failure Taxonomy
(Discussion of Terms)" in January 1997.  [Indeed.  And yet the ensuing
discussion was also repeated, as seen in a few selected responses that
follow -- included in case we have some new readers such as the
T-mobile/Danger/MS folks, the CAT scan folks, or others, with apologies
to old readers (who can skip the next three messages).  PGN]

All *design* faults show no physical degradation. That doesn't make the
fault merely a matter of opinion. If any artifact is stated to carry out
some function, and it doesn't, then it is flawed - independently of whether
the failure was caused by erroneous software or a broken wire.

The distinction that Paul Robinson seeks to make is false. A bridge may be
rusty, or a wire corroded, yet still be fit for purpose. The evidence of the
fault is the consequent failure to perform as required, not the corrosion.

So long as the requirements are clear, any deviation from the requirements
is an objective fault that does not depend on anyone's opinion. If the
requirements are not clear, then whether the observed behaviour is faulty or
not *is* a matter of opinion, whether the system is mechanical or
software-controlled.

Let's keep post-modernism out of engineering.


Re: Software never fails, people decide that it does (RISKS-25.80)

Michael Smith <emmenjay@zip.com.au>
Tue, 13 Oct 2009 01:56:12 +1100

I suspect that we may be mixing two types of failure.

When we design software or bridges, we write specifications.  If we specify
a width of 'n' metres for a bridge, but a survey reveals a different width,
then the bridge does not meet its specification -- i.e. it is faulty.

Similarly, when designing software, we may (and should) specify precise
behaviour.  If the software fails to meet that specification, then it is
faulty.  We apply code inspections and various types of testing to determine
how well the specification is met.  Since perfection is not possible, there
will be gaps in our specification.  However such a "specification bug" is a
different thing from a failure to meet the specification.

Quite different from the above is "decay".  A bridge may rust, components
may bend or break.  Metal fatigue and a multitude of other factors may
reduce the usefulness of a product that was originally of acceptable
quality.

In general, software does not experience this [1].  If your software works
correctly, it will continue to work correctly.  However its usefulness may
decline over time due to external factors.  The computer on which it runs
may become obsolete.  Peripherals may malfunction.  The problem, which the
software solves, may change [2].

In the first type of failure, software and other engineering endeavours
share a good deal of similarity.  In the second, they seem to share less.

[1] Sometimes data or configuration files become progressively more corrupt,
giving software the appearance of decay.  This is sometimes known as "bit
rot".

[2] A bridge has an analogue to the "problem change" situation.  e.g., a
bridge with a particular capacity may become less useful as changed traffic
patterns create a need for higher capacity.


Re: Software never fails, people decide that it does (RISKS-25.80)

Geoffrey Brent <gpbrent@optusnet.com.au>
Sat, 10 Oct 2009 21:35:08 +1100

I'm not sure I follow this argument. If the point here is that the concept
of "defect" in software becomes meaningless when we narrow our field of
vision sufficiently, then that's true; it's meaningless to say that a
solitary 0 or 1 is "defective". But that's just as true for the
non-computing examples given here; it's meaningless to say that a single
proton/electron/neutron is "rusty".

"Error" in software is a matter of human opinion. But then, so is the
general consensus that brakes should be able to stop a car and bridges
shouldn't fall down.


Re: Software never fails, people decide that it does (RISKS-25.80)

Dimitri Maziuk <dmaziuk@bmrb.wisc.edu>
Sat, 10 Oct 2009 12:51:31 -0500

Disproof by counterexample: binary diff against a program that works
correctly will quantitatively show the bits that are different.
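
A minimal Python sketch of that counterexample, assuming a known-good
reference binary is available and that both inputs have the same length (the
function name and the sample bytes are invented for illustration):

```python
def differing_bits(reference: bytes, candidate: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    if len(reference) != len(candidate):
        raise ValueError("this sketch only compares inputs of equal length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1 bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(reference, candidate))


good = bytes([0b10110000, 0b00001111])
bad = bytes([0b10110001, 0b00001100])
print(differing_bits(good, bad))  # 1 bit in the first byte + 2 in the second = 3
```

The count is objective in the sense that anyone running the comparison gets
the same number; deciding which of the two binaries is the "correct" one is
a separate question.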

If I don't know anything about corrosion or bridges, your claim that those
brown spots on the cables are bad is going to be your opinion and nothing
more -- to me.

Or, looking at it from the other side, I write code for scientists.  Often
enough I don't have enough domain knowledge to tell if the numbers my code
produces are correct or not. I have to ask for someone else's opinion on
that and trust them if they say my code needs replacing.
