Wed, 26 Jan 2011

Windows Azure Startup Tasks: Tips, Tricks, and Gotchas

In my last post, I gave a brief introduction to Windows Azure startup tasks and how to build one. The reason I’ve been thinking lately about startup tasks is that I’ve been writing them and helping other people write them. In fact, this blog is currently running on Ruby in Windows Azure, and this is only possible because of an elevated startup task that installs Application Request Routing (to use as a reverse proxy) and Ruby.

Through the experience of building some non-trivial startup tasks, I’ve learned a number of tips, tricks, and gotchas that I’d like to share, in the hopes that they can save you time. Here they are, in no particular order.

Batch files and Visual Studio

I mentioned this in my last post, but it appears that text files created in Visual Studio are saved with a byte order mark at the start of the file, and cmd.exe reads those bytes as part of the first command, so the batch file fails to run. To work around this, I create my batch files in Notepad first. (Subsequently editing in Visual Studio seems to be fine.) This isn’t really specific to startup tasks, but it’s a likely place to run into this gotcha.

[UPDATE 1/26/2011] If you do File/Save As…, you can click the little down arrow and choose a different encoding. “Unicode (UTF8 without signature) codepage 65001” works fine for saving batch files. Thanks, Travis Pettijohn, for emailing me this tip!

Debug locally with “start /w cmd”

This is a tip Ryan Dunn shared with me, and I’ve found it helpful. If you put start /w cmd in a batch file, it will pop up a new command window and wait for it to exit. This is useful when you’re testing your startup tasks locally, as it gives you a way to try out commands in exactly the same context as the startup task itself. It’s a bit like setting a breakpoint and using the “immediate” panel in Visual Studio.

Remember to take this out before you deploy to the cloud, as it will cause the startup task to never complete.
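
One way to make that harder to forget is to guard the debug prompt behind a switch. This is only a sketch: DEBUG_STARTUP is a hypothetical environment variable that I'm assuming you set on your development machine and never in the cloud, and I'm also assuming the local emulator passes it through to the startup task.

```bat
rem DEBUG_STARTUP is a hypothetical variable, not something Windows Azure defines.
rem Assumption: it is set only on the development machine.
if defined DEBUG_STARTUP start /w cmd
```

If the variable is not defined, the line does nothing and the startup task continues as usual.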

Make it a background task so you can use Remote Desktop

“Simple” startup tasks (the default task type) are generally what you want, because they run to completion before any of your other code runs. However, they also run to completion before Remote Desktop is set up (also via a startup task). That means that if your startup task never finishes, you don’t have a chance to use RDP to connect and debug.

A tip that will save you lots of debugging frustration is to set your startup task’s type to “background” (just during development/debugging). A background task doesn’t block the rest of role startup, which means RDP will still get configured even if your startup task fails to complete.
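
As a rough illustration, the task type is set on the Task element in ServiceDefinition.csdef. The role name and script name below are placeholders, not values from my project:

```xml
<!-- ServiceDefinition.csdef (fragment). "MyRole" and "startup.cmd" are placeholders. -->
<WorkerRole name="MyRole">
  <Startup>
    <!-- taskType="simple" is the default; "background" lets role startup continue. -->
    <Task commandLine="startup.cmd" executionContext="elevated" taskType="background" />
  </Startup>
</WorkerRole>
```

Switch taskType back to "simple" once the task works, so your role code doesn't start before the task has finished.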

Log to a file

Sometimes (particularly for timing issues), it’s hard to reproduce an error in a startup task. You’ll be much happier if you log everything to a local file, by doing something like this in your startup task batch file:

command1 >> log.txt 2>> err.txt
command2 >> log.txt 2>> err.txt
...

Then you can RDP into the role later and see what happened. (Bonus points if you configure Windows Azure Diagnostics to copy these log files off to blob storage!)
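
Here is a slightly fuller sketch of the same idea, which adds timestamps and exit codes so that timing issues are easier to spot. command1 is a placeholder for your real step, and the log file names are just the ones used above.

```bat
@echo off
rem Run from the script's own directory so log.txt and err.txt land next to it.
cd /d "%~dp0"

rem Putting the redirection first keeps a trailing digit in the message
rem from being parsed as a file handle (as in "1>>").
>>log.txt echo [%date% %time%] startup task begin

rem command1 is a placeholder for a real installer or script.
call command1 >>log.txt 2>>err.txt
>>log.txt echo [%date% %time%] command1 exited with code %errorlevel%
```

The exit code line is often the quickest way to tell whether a step failed or simply never ran.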

Executing PowerShell scripts

To execute an unsigned PowerShell script (the sort you’re likely to include as a startup task), you need to configure PowerShell first to allow this. In PowerShell 2.0, you can simply launch PowerShell from a batch file with powershell -ExecutionPolicy Unrestricted ./myscript.ps1. This will work fine in Windows Azure if you’re running with osFamily="2", which gives you an operating system image based on Windows Server 2008 R2. If you’re using osFamily="1", though, you’ll have PowerShell 1.0, which doesn’t include this handy command-line argument.

For PowerShell 1.0, the following one-liner should tell PowerShell to allow any scripts, so run this in your batch file before calling PowerShell:

reg add HKLM\Software\Microsoft\PowerShell\1\ShellIds\Microsoft.PowerShell /v ExecutionPolicy /d Unrestricted /f

(I haven’t actually tested that code yet… but I found it on the internet, so it must be right.)

Using the Service Runtime from PowerShell

In Windows Azure, you’ll find a PowerShell snap-in that lets you interact with the service runtime APIs. There’s a gotcha with using it, though, which is that the snap-in is installed asynchronously, so it’s possible for your startup task to run before the snap-in is available. To work around that, I suggest the following code (from one of my startup scripts), which simply loops until the snap-in is available:

Add-PSSnapin Microsoft.WindowsAzure.ServiceRuntime
while (!$?) {
    sleep 5
    Add-PSSnapin Microsoft.WindowsAzure.ServiceRuntime
}

(That while loop condition amuses me greatly. It means “Did the previous command fail?”)

Using WebPICmdline to run 32-bit installers

[UPDATE 1/25/2011 11:47pm] Interesting timing! WebPI Command Line just shipped. If you scroll to the bottom of the announcement, you’ll see instructions for running it in Windows Azure. There’s an AnyCPU version of the binary that you should use. However, it doesn’t address the problem I’m describing here.

Using WebPICmdline ~~(in CTP now, but about to ship)~~ is a nice way to install things (particularly PHP), but running in the context of an elevated Windows Azure startup task, it has a surprising gotcha that’s difficult to debug. Elevated startup tasks run as NT AUTHORITY\SYSTEM, which is a special user in a number of ways. One way it’s special is that its user profile is under the system32 directory. This is special, because on 64-bit machines (like all VMs in Windows Azure), 64-bit processes see the system32 directory, but 32-bit processes see the SysWOW64 directory instead.

When the Web Platform Installer (via WebPICmdline or otherwise) downloads something to execute locally, it stores it in the current user’s “Local AppData,” which is typically under the user profile. This will end up under system32 in our elevated startup task, because WebPICmdline is a 64-bit process. The gotcha comes when WebPI executes that code. If that code is a 32-bit self-extracting executable, it will try to read itself (to unzip), but it gets redirected to the SysWOW64 directory, where the file doesn’t exist. This doesn’t work, and will typically pop up a dialog box (that you can’t see) and hang until you dismiss it (which you can’t do).

This is a weird combination of things, but the point is that many applications or components that you install via WebPICmdline will run into this issue. The good news is that there are a couple of simple fixes. One fix is to use a normal user (instead of system) to run WebPICmdline. David Aiken has a blog post called “Running Azure startup tasks as a real user,” which describes exactly how to do this.

I have a different solution, which I find simpler if all you need to do is work around this issue. I simply change the location of “Local AppData.” Here’s how to do that. (I split the two reg add commands across lines for readability, using a trailing ^ as the batch line-continuation character. There are exactly four commands.)

md "%~dp0appdata"
reg add "hku\.default\software\microsoft\windows\currentversion\explorer\user shell folders" ^
    /v "Local AppData" /t REG_EXPAND_SZ /d "%~dp0appdata" /f
"%~dp0webpicmdline" /AcceptEula /Products:PHP53 >>log.txt 2>>err.txt
reg add "hku\.default\software\microsoft\windows\currentversion\explorer\user shell folders" ^
    /v "Local AppData" /t REG_EXPAND_SZ /d %%USERPROFILE%%\AppData\Local /f

[UPDATE 1/31/2011] Prepended the %~dp0 to the webpicmdline call. Sometimes I just put a “cd %~dp0” at the top of the script to avoid gotchas around locations like this.

Use PsExec to try out the SYSTEM user

PsExec is a handy tool from Sysinternals for starting processes in a variety of ways. (And I’m not just saying that because Mark Russinovich now works down the hall from me.) One thing you can do with PsExec is launch a process as the system user. Here’s how to launch a command prompt in an interactive session as system:

psexec -s -i cmd

You can only do this if you’re an administrator (which you are when you use remote desktop to connect to a VM in Windows Azure), so be sure to elevate first if you’re trying this on your local computer.