Wed, 05 Nov 2008

Windows Azure Blog Source Code from PDC

You can now download the source code as it stood at the end of my PDC talk.

There’s a README.txt in the source zip that explains what you need to do to get things running. Basically you need to provide StorageClient.dll (it’s not included, but you can build it from the SDK sample), and you need to manually run devtablegen to set up the local storage.

What does it do?

To catch up everyone who didn’t watch my talk: this demo is a Windows Azure application with basic create/view functionality for blog posts (stored in table storage) and images (stored in blob storage). In addition, when you create a post that references an external image via an <img/> tag, a worker role fetches the remote image, copies it into blob storage, and fixes up the references in the blog post to point to the copy. The idea is to avoid blog posts with broken image tags when someone removes the original (remote) image.
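The worker role's first job is to find the external images a post references. Here is a minimal sketch of that step, assuming a regex-based scan similar to the one used for the replacement later in this post. The ImageUrlFinder class and FindImageUrls method are hypothetical names, not code from the PDC download; the sketch only looks at quoted src attributes and treats any absolute http/https URL as a candidate to copy.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the PDC source): finds the src URLs of <img/> tags
// in a post's HTML so a worker role could decide which remote images to copy.
public static class ImageUrlFinder
{
    // Group 1 captures the quoted value of an img tag's src attribute.
    private static readonly Regex ImgSrc = new Regex(
        "<img[^>]*src\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns absolute http/https image URLs in document order, without duplicates.
    public static List<string> FindImageUrls(string html)
    {
        var result = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            string src = m.Groups[1].Value;
            // Relative URLs (and non-web schemes) are skipped.
            if (Uri.TryCreate(src, UriKind.Absolute, out Uri uri)
                && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
                && !result.Contains(src))
            {
                result.Add(src);
            }
        }
        return result;
    }
}
```

A regex is fragile for HTML in general (unquoted attributes, tags inside comments, and so on); an HTML parser would be more robust, but the regex keeps the sketch consistent with the demo's own approach.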

An SDK bug

Why do you have to run devtablegen manually? It’s actually a bug on our part. I reference the same data models from both the web role and the worker role. If you choose the “Create Test Storage Tables” option in Visual Studio, it ends up passing the DLL to devtablegen twice, which then dies with an ugly error. We’ll fix the bug; in the meantime, the workaround is to run devtablegen from the command line.

My own bugs

Image replacement didn’t work when I tried it live at PDC, and unfortunately I deleted the data before getting back to this to investigate, so I’m not 100% sure what went wrong. One issue I’ve identified is that I don’t handle chunked transfer encoding. To fetch the image and put it in blob storage, I’m using the following code:

    var response = WebRequest.Create(url).GetResponse();
    var props = new BlobProperties(Guid.NewGuid().ToString()) { ContentType = response.ContentType };
    container.CreateBlob(props, new BlobContents(Utilities.CopyStream(response.GetResponseStream(),
        (int)response.ContentLength)), false);

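This code, it turns out, is a bit naïve. A chunked response carries no Content-Length header, so response.ContentLength is -1 and the length I pass to CopyStream is meaningless. I need a smarter CopyStream utility that can handle not knowing the content length up front.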
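A sketch of such a utility, assuming that buffering the whole image in memory is acceptable (reasonable for typical image sizes). StreamUtil.ReadFully is a hypothetical name; the real Utilities.CopyStream signature isn't shown here, so this doesn't try to match it. It reads until Read returns 0 and uses the content length only as an optional sizing hint.

```csharp
using System;
using System.IO;

// Hypothetical replacement for Utilities.CopyStream: reads a stream to the end
// without needing its length in advance, so it still works when
// response.ContentLength is -1 (e.g. chunked transfer encoding).
public static class StreamUtil
{
    public static byte[] ReadFully(Stream input, long expectedLength = -1)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Pre-size the buffer when the length is known; otherwise let MemoryStream grow.
        using (var output = expectedLength > 0 && expectedLength <= int.MaxValue
            ? new MemoryStream((int)expectedLength)
            : new MemoryStream())
        {
            var buffer = new byte[8192];
            int read;
            // Stop only when the stream reports end-of-stream (Read returns 0),
            // not after a fixed number of bytes.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            // ToArray returns only the bytes actually written, not the spare capacity.
            return output.ToArray();
        }
    }
}
```

In the fetch code above, the call would become something like new BlobContents(StreamUtil.ReadFully(response.GetResponseStream(), response.ContentLength)), assuming BlobContents can be constructed from a byte array (worth checking against the StorageClient sample you built).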

A second bug is that I’m not escaping a string I’m using in a regular expression. I use this code to replace existing <img/> tags with new ones referencing the copied image:

    return Regex.Replace(html, string.Format("(<img[^>]*src\\s*=\\s*[\"']){0}", origUrl),
        string.Format("$1{0}", newUrl), RegexOptions.IgnoreCase);

Notice that I’m not escaping origUrl, so regex metacharacters in it are treated as pattern syntax rather than literal text. A question mark (which appears in a lot of URLs) makes the preceding character optional, and a dot matches any character, so the pattern can fail to match the very URL it’s looking for. Fortunately, there’s a Regex.Escape() method which should fix this problem quickly and easily. (But I left the broken version in the source, since I wanted to show the code exactly as it was in my talk.)
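A sketch of the escaped version, assuming the same pattern as the talk code; ImageRewriter and ReplaceImageUrl are hypothetical names. Besides running Regex.Escape on origUrl, it also guards the replacement string: in .NET substitutions a $ is special, so any $ in newUrl is doubled, and ${1} is used instead of $1 so that a newUrl beginning with a digit can't be misread as part of the group number.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical fixed-up version of the replacement shown above.
public static class ImageRewriter
{
    public static string ReplaceImageUrl(string html, string origUrl, string newUrl)
    {
        // Regex.Escape turns characters such as '?' and '.' into literals.
        string pattern = string.Format("(<img[^>]*src\\s*=\\s*[\"']){0}", Regex.Escape(origUrl));

        // "$$" is a literal '$' in a .NET replacement string; "${1}" refers
        // unambiguously to group 1 (the "<img ... src=" prefix).
        string replacement = "${1}" + newUrl.Replace("$", "$$");

        return Regex.Replace(html, pattern, replacement, RegexOptions.IgnoreCase);
    }
}
```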

It’s quite likely there are other bugs… drop me a note if you find something else.
