Thu, 23 Sep 2010

Web Page Image Capture in Windows Azure

In this week’s upcoming episode of Cloud Cover, Ryan and I will show http://webcapture.cloudapp.net, a little app that captures images of web pages, like the capture of http://silverlight.net you see on the right.

When I’ve seen people on the forum or in email asking about how to do this, they’re usually running into trouble using IECapt or the .NET WebBrowser object. This is most likely due to the way applications are run in Windows Azure (inside a job object, where a number of UI-related things don’t work). I’ve found that CutyCapt works great, so that’s what I used.

Using Local Storage

The application uses CutyCapt, a Qt- and WebKit-based web page rendering tool. Because that tool writes its output to a file, I’m using local storage on the VM to save the image and then copying the image to its permanent home in blob storage.
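As a rough sketch of the local storage step, the helper below builds a per-capture output path under a local storage root. In a role, that root would typically come from `RoleEnvironment.GetLocalResource(...).RootPath` in the service runtime library. The resource name `Captures` and the helper name `BuildOutputPath` are assumptions for illustration, not taken from the app's source.

```csharp
using System;
using System.IO;

public static class CapturePaths
{
    // Hypothetical helper: returns a .png path for one capture under the
    // given local storage root, named after the capture's GUID.
    // In a role, localRoot might be
    //   RoleEnvironment.GetLocalResource("Captures").RootPath
    // where "Captures" is an assumed LocalStorage resource name declared
    // in ServiceDefinition.csdef.
    public static string BuildOutputPath(string localRoot, string guid)
    {
        return Path.Combine(localRoot, guid + ".png");
    }
}
```

Naming the file after the same GUID used for the blob keeps concurrent captures from overwriting each other and makes the later upload and delete straightforward.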

This is the meat of the backend processing:


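// Run CutyCapt (deployed under approot) to render the page into a local file.
using (var proc = new Process()
{
    StartInfo = new ProcessStartInfo(
        Environment.GetEnvironmentVariable("RoleRoot") + @"\approot\CutyCapt.exe",
        string.Format(@"--url=""{0}"" --out=""{1}""", url, outputPath))
    {
        // Launch the executable directly rather than through the Windows shell.
        UseShellExecute = false
    }
})
{
    proc.Start();
    proc.WaitForExit();
}

// If no file was produced, the capture failed and there is nothing to upload.
if (File.Exists(outputPath))
{
    // Copy the image to its permanent home in blob storage, then clean up local storage.
    var blob = container.GetBlobReference(guid);
    blob.Properties.ContentType = "image/png";
    blob.UploadFile(outputPath);
    File.Delete(outputPath);
}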

Combining Roles

Typically, this sort of architecture (a web UI which creates work that can be done asynchronously) is accomplished in Windows Azure with two roles. A web role will present the web UI and enqueue work on a queue, and a worker role will pick up messages from that queue and do the work.

For this application, I decided to combine those two things into a single role. It’s a web role, and the UI part of it looks like anything else (ASP.NET MVC for the UI, and a queue to track the work). The interesting part is in WebRole.cs. Most people don’t realize that the entire role instance lifecycle is available in web roles just as it is in worker roles. Even though the template you use in Visual Studio doesn’t do it, you can simply override Run() as you do in a worker role and put all your work there. The code that I pasted above is in the Run() method in WebRole.cs.
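To make the shape of that concrete, here is a minimal sketch of the polling step, not the app's actual code. The queue and the capture step are passed in as delegates so the sketch compiles without the Azure SDK. `CaptureWorker`, `ProcessNext`, `dequeue`, and `capture` are hypothetical names, and the commented `Run()` override shows roughly how a web role's `RoleEntryPoint` subclass might call it.

```csharp
using System;

public static class CaptureWorker
{
    // Hypothetical helper: handles at most one queued URL.
    // tryDequeue returns the next URL, or null when the queue is empty.
    // Returns true if a URL was processed, false if there was nothing to do.
    public static bool ProcessNext(Func<string> tryDequeue, Action<string> capture)
    {
        var url = tryDequeue();
        if (url == null)
        {
            return false;
        }
        capture(url);
        return true;
    }
}

// In WebRole.cs, the override could then look roughly like this, where
// dequeue and capture stand for whatever reads the queue and runs CutyCapt:
//
// public class WebRole : RoleEntryPoint
// {
//     public override void Run()
//     {
//         while (true)
//         {
//             if (!CaptureWorker.ProcessNext(dequeue, capture))
//             {
//                 Thread.Sleep(1000); // back off while the queue is empty
//             }
//         }
//     }
// }
```

Keeping the per-message logic in its own method also makes the later move to a separate worker role a matter of calling the same method from that role's `Run()`.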

If I later want to separate the front-end from the back-end, I can just copy the code from Run() into a new worker role.

Get the Code

You can download the full source code here: http://cdn.blog.smarx.com/files/WebCapture_source.zip, but note that it’s missing CutyCapt.exe. You can download the most recent version of CutyCapt.exe here: http://cutycapt.sourceforge.net. Just drop it in the root of the web role, and everything should build and run properly.

Watch Cloud Cover!

Be sure to watch the Cloud Cover episode about http://webcapture.cloudapp.net, as well as all of our other fantastic episodes.

If you have ideas about other things you’d like to see covered on the show, be sure to ping us on Twitter (@cloudcovershow) to let us know.