Wed, 12 Nov 2008

Using Numbers as Keys in Windows Azure

Why my blog will break in the year 6831

Every entity in Windows Azure tables has two special properties (columns) called the partition key and the row key.

Together, the partition key and the row key form a unique identifier for the entity.  They also define the order in which entities are returned in queries.  For example, in my blog engine, I’m using DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks to ensure that my blog posts are returned in reverse chronological order.

One thing that’s critical to note is that both keys are strings.  That means that the sort order is lexicographical.  To get the right sort behavior for numbers, it’s then necessary to make those numbers fixed-length by zero-padding them.
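Here’s a toy illustration (my own example, not code from the blog engine) of what an ordinal string sort does to unpadded numbers:

    var keys = new[] { "9", "10", "100" };
    Array.Sort(keys, StringComparer.Ordinal);
    // keys is now { "10", "100", "9" } -- numeric order would be 9, 10, 100.
    // Zero-padded to "009", "010", "100", the same sort comes out right.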

I got this only half right when coding my blog.  I used string.Format("{0:d10}", DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).  The {0:d10} says to zero-pad the integer so that it’s at least 10 digits long.  Unfortunately, I have no idea where I got the number 10 from.  I must have miscalculated or misremembered how many digits are required to store ticks.  Let’s fire up IronPython and do the calculation:

>>> from System import DateTime
>>> DateTime.MaxValue.Ticks
3155378975999999999L
>>> # Anyone remember how to find the log-base-10 of a number?  Here's one way...
>>> DateTime.MaxValue.Ticks.ToString().Length
19
>>> # Aha!  I need 19!  I'm using 10, so when the length gets down to 18, I'll sort wrong.  When will that happen?
>>> doomsday = DateTime(DateTime.MaxValue.Ticks - 999999999999999999)
>>> doomsday.ToString()
'2/15/6831 2:13:20 PM'

I think it would be irresponsible to let things start failing in 6831, so to my future fans, let it be known that I’ve fixed this issue in my blog.  For the blog entries themselves, no data changes were necessary: recent timestamps already come to 19 digits under the DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks formula, so all I had to do was make sure the code zero-pads to 19 digits from now on.  Existing posts don’t need to change.
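For the record, the corrected entry key computation looks something like this (a sketch of just the relevant line):

    // Reverse-chronological row key, now zero-padded to the full 19 digits:
    string rowKey = string.Format("{0:d19}", DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks);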

Actually, it will break in 3169!

For comments, I use forward chronological order (just DateTime.UtcNow.Ticks), and current timestamps are 18-digit numbers.  Ticks first reach 19 digits at DateTime(1000000000000000000), which falls in November of 3169!  That’s a lot sooner than the entries will break, and unlike the entries, correcting it requires fixing up existing data.
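Going forward, the comment keys get the same 19-digit treatment (again, a sketch of just the relevant line):

    // Forward-chronological comment row key, zero-padded to 19 digits as well:
    string commentKey = string.Format("{0:d19}", DateTime.UtcNow.Ticks);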

What I needed to do was read each comment from my blog, update the row key to 19 digits, and save the changes.  But because the row key is part of the identity of an entity, it can’t be edited in place; the update has to be a delete followed by a re-add.  Here’s the code I ended up with:

            var svc = new BlogDataServiceContext();
            // Snapshot first, so we aren't enumerating the query while issuing requests.
            var comments = svc.BlogCommentTable.ToList();
            // Delete every comment under its old (18-digit) row key...
            foreach (var comment in comments)
            {
                svc.DeleteObject(comment);
            }
            svc.SaveChanges(); // once saved, the deleted entities are no longer tracked
            // ...then re-add each one under its zero-padded, 19-digit row key.
            foreach (var comment in comments)
            {
                comment.RowKey = Int64.Parse(comment.RowKey).ToString("d19");
                svc.AddObject("BlogCommentTable", comment);
            }
            svc.SaveChanges();

Of course, I also had to update the blog code itself to use 19-digit numbers in the future.

Fairly straightforward, but there’s a catch.  If you imagine (as I like to when nobody’s around) that my blog is incredibly popular, and people are visiting all the time and leaving comments… then I’m going to have a problem.  If I update the blog code first, then new comments will have the extra zero at the front, while the old comments are still getting updated.  If I instead update the existing comments first, then they’ll have the extra zero at the front, but new comments that come in before I’ve rolled out the blog code change won’t.  Either way, my blog will be briefly wrong.

How to do the change properly?

There are two ways to handle this.  Let’s call them “the right way” and “the way I did it.”  The way I did it was simply to update the blog code and run my local migration code at the same time.  Swapping in the new blog code once it’s deployed to the cloud takes only a few seconds, as does running the migration, so comments can only sort wrong if someone submits one during those few seconds (and if they refresh the browser a few seconds later, it will fix itself).

The right way to handle this is a staged update.  First, update the code so that it stores new values the correct way (19 digits) but can handle both kinds of data, perhaps by explicitly sorting the comments by numeric value after retrieval.  Then run the script that updates all the old comments.  Finally, roll out a second code update that goes back to relying on the order the table service returns.
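That transitional read path might look something like the following sketch (assuming, as in the migration code above, that comments expose their row key as a RowKey string):

    // Tolerate 18- and 19-digit keys at once by sorting numerically after retrieval:
    var comments = svc.BlogCommentTable
                      .ToList()                             // pull the comments down first
                      .OrderBy(c => Int64.Parse(c.RowKey)); // numeric, not lexicographic, order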

Aside about the importance of testing

I actually broke everything for a bit in the midst of this update…  I had a typo in my code: I used “BlogEntryTable” when I meant “BlogCommentTable,” so I effectively moved all the comments into the entries table, which made even loading the front page of my blog raise an exception.  I had to quickly write a bit of new code to move the comments back, and my blog was effectively down for a few minutes while I figured it all out (a worse typo could have resulted in data loss).  I guess it’s important to set up test data and try operations there first. :-)

Has anyone written a utility yet to copy all the table data from one account to another?  It could be used to create test data to experiment on safely or to take a backup snapshot.
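I imagine the heart of such a utility would just be a read-and-re-add loop between two contexts.  A rough sketch (the constructor arguments are hypothetical, and a real tool would need paging and error handling):

    var source = new BlogDataServiceContext(/* account A endpoint + credentials */);
    var destination = new BlogDataServiceContext(/* account B endpoint + credentials */);
    foreach (var comment in source.BlogCommentTable.ToList())
    {
        destination.AddObject("BlogCommentTable", comment);
    }
    destination.SaveChanges();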