Wed, 18 Mar 2009

What I Learned From the Windows Azure Malfunction

I like to think that there’s a lesson to be learned in every experience, especially when things are going wrong.  As many of you know, Windows Azure malfunctioned last Friday, March 13th.  I just finished posting a summary of what happened and what we’re doing about it to the Windows Azure blog.  Go there if you want the details.

What I wanted to do here was talk about what I personally learned from the experience:

  1. Overcommunication is key when things go wrong.
  2. That being said, all communication is an opportunity to miscommunicate.
  3. Friday the 13th truly is unlucky.

As the guy who was responsible for communicating during the malfunction and afterwards, I had a chance to make mistakes on both of the serious points above.  In communicating via the forum and on Twitter with our customers, I didn’t communicate often enough.  Even when there wasn’t much to say, it would probably have been good to post more frequent updates just letting everyone know we were continuing to investigate and address the problem.  I’m going to work with the rest of the Windows Azure team to make sure I have enough information to post frequent and accurate updates the next time there’s an issue (large or small).

More communication would have been better, but even in the communication I made, I managed to miscommunicate at least once.  A lot of the press about the malfunction has been picking up on the fact that at the end of the forum thread, I said we’d be doing a post-mortem but that I probably wouldn’t do any follow-up communication until after the MIX conference.  I wanted to avoid overpromising when I didn’t know how much work we had to do to fully understand the issue, but my comment left the impression that we weren’t taking the issue seriously and weren’t taking communication seriously.

To clear up any miscommunication, I want to let everyone know that even though Windows Azure is in a Community Technology Preview (CTP), we react to malfunctions in the system with all appropriate seriousness and care.  If you have concerns about anything, feel free to contact me via Twitter or email any time, or leave a comment here.

With that, I’m off to prepare for my MIX session.  If you’re in town for the conference, stop by my session and say hi!