Fri, 02 Jul 2010

Deleting Windows Azure Queue Messages: Handling Exceptions

This week’s episode of Cloud Cover (scheduled to go live tomorrow morning) is all about Windows Azure queues. It’s a long episode, but it’s packed with interesting technical content. While Ryan and I were discussing queues and concurrency, a detail came up that I wasn’t sure how to give guidance on, so I committed to following up before the show went live.

To understand the situation, remember that Windows Azure queues provide reliable queueing: the service guarantees that no message is lost without being handled by a consumer. As a result, consuming a queue message is a two-step process. First, the consumer dequeues the message, specifying a visibility timeout. The message stays in the queue, but it is invisible and can’t be retrieved by other consumers until that timeout expires. When the consumer is finished with the message, it deletes it. If, however, the consumer fails to finish processing the message (for example, because it crashes), the visibility timeout expires and the message reappears on the queue. This is what guarantees that the message will eventually be handled.
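To make the two-step protocol concrete, here is a minimal in-memory sketch. `SimulatedQueue` and its methods are hypothetical names invented for illustration; this is not the Windows Azure storage client API, and time is passed in explicitly so the behavior is deterministic. It models only what the paragraph above describes: getting a message hides it without removing it, and a message nobody deletes becomes visible again once its visibility timeout has passed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of reliable queueing; NOT the Windows Azure API.
// The current time is passed in explicitly so the behavior is deterministic.
public class SimulatedQueue
{
    private class Entry
    {
        public string Id;
        public string Content;
        public DateTime VisibleAt; // the message is invisible until this time
    }

    private readonly List<Entry> entries = new List<Entry>();
    private int nextId = 1;

    public void AddMessage(string content, DateTime now)
    {
        entries.Add(new Entry { Id = (nextId++).ToString(), Content = content, VisibleAt = now });
    }

    // Step 1: return the first visible message and hide it for visibilityTimeout.
    // The message stays in the queue; it is only hidden. Returns null if nothing is visible.
    public string GetMessage(TimeSpan visibilityTimeout, DateTime now, out string id)
    {
        foreach (Entry e in entries)
        {
            if (e.VisibleAt <= now)
            {
                e.VisibleAt = now + visibilityTimeout;
                id = e.Id;
                return e.Content;
            }
        }
        id = null;
        return null;
    }

    // Step 2: the consumer deletes the message once it has finished processing it.
    public bool DeleteMessage(string id)
    {
        return entries.RemoveAll(e => e.Id == id) > 0;
    }

    public int Count { get { return entries.Count; } }
}
```

For example, a consumer that gets the message with a 30-second timeout at time t hides it; another `GetMessage` at t + 10 seconds returns null, while one at t + 31 seconds receives the same message again. That redelivery is what makes the following scenario possible.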

This leads us to the situation Ryan and I were discussing on the show. The scenario is as follows:

  1. Instance #1 dequeues a message and starts working on it.
  2. The message’s visibility timeout expires, making it visible again on the queue.
  3. Instance #2 dequeues the message and starts working on it.
  4. Instance #1 finishes working on the message and tries to delete it.

At that last step, the queue service returns an error. This is because the first instance no longer “owns” the message; it has already been delivered to another instance. (The underlying mechanism is a pop receipt, which is invalidated when the message is dequeued a second time.)
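The four-step scenario can be replayed against a small model of a single message and its pop receipt. This is a sketch under stated assumptions: `SimulatedMessage`, `Dequeue`, and `Delete` are hypothetical, the receipt format is made up, and the only behavior it borrows from the real service is that each dequeue issues a fresh pop receipt and that a delete with a stale receipt fails with the “MessageNotFound” error code used in the code below.

```csharp
using System;

// Hypothetical model of one queue message and its pop receipt; NOT the Windows Azure API.
public class SimulatedMessage
{
    public string Id { get; private set; }
    private string popReceipt;   // receipt issued by the most recent dequeue (null until then)
    private DateTime visibleAt;  // the message is invisible until this time
    private bool deleted;
    private int receiptCounter;

    public SimulatedMessage(string id, DateTime now)
    {
        Id = id;
        visibleAt = now;
    }

    // Dequeue: succeeds only while the message is visible. Each success issues a NEW
    // pop receipt, which silently invalidates any receipt handed out earlier.
    public string Dequeue(TimeSpan visibilityTimeout, DateTime now)
    {
        if (deleted || now < visibleAt) return null;
        visibleAt = now + visibilityTimeout;
        receiptCounter++;
        popReceipt = Id + "-receipt-" + receiptCounter;
        return popReceipt;
    }

    // Delete: returns null on success, or the error code that a stale receipt produces.
    public string Delete(string receipt)
    {
        if (deleted || popReceipt == null || receipt != popReceipt) return "MessageNotFound";
        deleted = true;
        return null;
    }
}
```

Replaying the steps with a 30-second timeout: instance #1 dequeues at time t and gets receipt 1; at t + 45 seconds the message is visible again, so instance #2 dequeues it and gets receipt 2; instance #1’s `Delete` with receipt 1 now returns “MessageNotFound”, while instance #2’s `Delete` with receipt 2 succeeds.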

The question I couldn’t answer on the fly during the show was how to accurately detect this error and handle it in code. After some discussion with the storage team, this is the .NET code I’m recommending people use to identify this error:


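// Requires: using Microsoft.WindowsAzure.StorageClient;
// q is the CloudQueue; msg is the CloudQueueMessage we dequeued earlier.
try
{
    q.DeleteMessage(msg);
}
catch (StorageClientException ex)
{
    // ExtendedErrorInformation may be null if the response carried no error details.
    if (ex.ExtendedErrorInformation != null &&
        ex.ExtendedErrorInformation.ErrorCode == "MessageNotFound")
    {
        // The pop receipt is no longer valid: another consumer has dequeued the message.
        // Ignore it, or log it (so we can tune the visibility timeout).
    }
    else
    {
        // Not the error we were expecting; rethrow, preserving the stack trace.
        throw;
    }
}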

It would be nice if the storage client library included a constant for “MessageNotFound,” as it does for a number of other common error codes, but we can be confident that’s the right string by consulting the documentation on Queue Service Error Codes.

Note that I’m not just checking for the HTTP 404 status code, because a 404 can indicate other problems too (like an incorrect queue name). Checking for the “MessageNotFound” error code is more specific, and thus the better test.
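If you’d rather not scatter the string literal through your code, one option is a small helper of your own. `QueueErrors`, `MessageNotFound`, `IsNotFoundStatus`, and `IsStalePopReceipt` are hypothetical names, not part of the storage client library. The sketch contrasts a status-only check with an error-code check: a “QueueNotFound” error (for example, from a misspelled queue name) also comes back as HTTP 404, so only the error-code test tells the two cases apart.

```csharp
using System;

// Hypothetical helper: these names are NOT part of the Windows Azure storage client library.
public static class QueueErrors
{
    // Error code string listed in the Queue Service Error Codes documentation.
    public const string MessageNotFound = "MessageNotFound";

    // Naive check: true for ANY 404, including QueueNotFound (e.g., a wrong queue name).
    public static bool IsNotFoundStatus(int httpStatusCode)
    {
        return httpStatusCode == 404;
    }

    // Specific check: true only when the error code says the message (or its pop receipt) is gone.
    public static bool IsStalePopReceipt(string errorCode)
    {
        return string.Equals(errorCode, MessageNotFound, StringComparison.Ordinal);
    }
}
```

With a helper like this, the condition in the catch block could read `QueueErrors.IsStalePopReceipt(ex.ExtendedErrorInformation.ErrorCode)`, still guarded by the null check on `ExtendedErrorInformation`.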

Now go watch the latest Cloud Cover episode!