It turns out that they developed a program called Chaos Monkey. They have also announced that they are going to release it as open source. This is great news if your company is planning on deploying applications to the cloud. A lack of this type of planning is generally what has plagued the companies that ran into trouble when Amazon Web Services had issues in the past.
While surfing around the other day looking for stories about the Amazon outage, I came across this story about how SmugMug managed to come through it basically unaffected. The article is written by Don MacAskill; according to his LinkedIn profile, he is Co-Founder, CEO, & Chief Geek at SmugMug. The article covers a lot of information, and I warn you now that it is a long one. I suggest reading it so that you can start understanding the limitations of the cloud and what can be done to work around them. (Tip: Read the links to the forum posts. A lot of the comments are pretty bad.)
When you're done with that one, check out this post from Joe Stump, a former Digg employee who is currently at SimpleGeo. He talks about how people should have planned for this type of event. It's a great read with awesome points. (Also much shorter ;) )