blog.smarx.com

Steve Marx's blog about cloud development

Windows Azure Worker Role to Deal with Spam

As promised, today I added a worker role to asynchronously process comments and attempt to detect spam, and I invite you to test it out!  See the bottom of this post for details.

Design

Here’s a flow diagram I drew on my whiteboard:

The steps are:

  1. A comment comes in via my blog.
  2. The comment gets stored in a Windows Azure table.
  3. A reference to the comment gets stored in a Windows Azure queue.
  4. (Some time later) a worker role picks up the queue item and retrieves the comment from table storage.
  5. The worker talks to TypePad AntiSpam and asks whether the comment is spam or not.
  6. The worker updates the comment table to reflect the result of the spam test.
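To make the six steps concrete, here is a minimal in-memory sketch of the same flow. It assumes a `Dictionary` stands in for the comment table, a `Queue<string>` stands in for the Windows Azure queue, and a hypothetical `CheckIsSpam` function stands in for TypePad AntiSpam. None of these are the Windows Azure storage API; they only show where the synchronous part ends and the asynchronous part begins.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins (not the Windows Azure storage API):
// "table" maps "PartitionKey/RowKey" to a comment, and "queue" holds
// references to comments that still need a spam check.
var table = new Dictionary<string, (string Author, string Body, bool IsSpam)>();
var queue = new Queue<string>();

// Hypothetical stand-in for the TypePad AntiSpam call.
bool CheckIsSpam(string author, string body) => author == "viagra-test-123";

// Steps 1-3 (web role, synchronous): store the comment, then enqueue a reference to it.
void SubmitComment(string partitionKey, string rowKey, string author, string body)
{
    var key = $"{partitionKey}/{rowKey}";
    table[key] = (author, body, false); // IsSpam starts out false, so the comment shows immediately
    queue.Enqueue(key);
}

// Steps 4-6 (worker role, asynchronous): dequeue, look up, check, and update.
void ProcessQueue()
{
    while (queue.Count > 0)
    {
        var key = queue.Dequeue();
        if (table.TryGetValue(key, out var comment))
        {
            table[key] = (comment.Author, comment.Body, CheckIsSpam(comment.Author, comment.Body));
        }
    }
}

SubmitComment("my-post", "0001", "alice", "Nice post!");
SubmitComment("my-post", "0002", "viagra-test-123", "Buy now");
Console.WriteLine(table["my-post/0002"].IsSpam); // False: the worker hasn't run yet
ProcessQueue();
Console.WriteLine(table["my-post/0002"].IsSpam); // True
```

In the real system the web role and worker role run in separate processes, so the queue is what lets `SubmitComment` return without waiting for `ProcessQueue`.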

Note that after step (3), the synchronous portion is done, so the website remains responsive.  (There's no need to wait for the spam check, which could be slow in principle, even though it's quite fast in practice.)  The IsSpam property defaults to false, so the comment shows up right away, giving immediate feedback that the comment was submitted successfully.
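A minimal sketch of the display side, assuming the page simply shows every comment whose IsSpam flag is false (the post doesn't show the actual query, and the rows below are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comment rows for one post; every new comment starts with IsSpam = false.
var comments = new List<(string RowKey, string Author, bool IsSpam)>
{
    ("0001", "alice", false),
    ("0002", "viagra-test-123", false), // not yet checked by the worker
};

// The page shows every comment not (yet) flagged as spam.
IEnumerable<string> VisibleAuthors() =>
    comments.Where(c => !c.IsSpam).Select(c => c.Author);

Console.WriteLine(string.Join(", ", VisibleAuthors())); // alice, viagra-test-123

// Later, the worker flags the second comment, and it drops off the page.
comments[1] = (comments[1].RowKey, comments[1].Author, true);
Console.WriteLine(string.Join(", ", VisibleAuthors())); // alice
```

The trade-off of defaulting to "not spam" is that spam is briefly visible until the worker catches up, in exchange for genuine comments never waiting on the check.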

The big advantage of this architecture is loose coupling.  Because the spam check is asynchronous, the blog itself can continue to function without it.  That means that if TypePad AntiSpam has downtime (or my worker role has a bug in it), normal use of my blog won’t be disrupted.  It also means that if I later plug in a more sophisticated (and slower) analysis, I don’t have to worry about my comment form responding slowly or my front-end getting bogged down.
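To illustrate why an outage in the spam service doesn't reach the blog, here's a minimal sketch with a hypothetical in-memory queue and a hypothetical checker that throws while the service is "down". A failed check just leaves the comment's reference queued for a later attempt, and the comment itself was already stored and visible. (The real worker gets a similar effect from Windows Azure queues because it deletes a message only after processing it; a message that is read but never deleted becomes visible again after its visibility timeout.)

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory queue holding one comment reference.
var queue = new Queue<string>();
queue.Enqueue("my-post/0001");

// Hypothetical spam service that may be down.
bool serviceUp = false;
bool CheckIsSpam(string key) =>
    serviceUp ? false : throw new InvalidOperationException("spam service unavailable");

// Handle one queued reference; remove it only after the check succeeds.
bool TryProcessOne()
{
    if (queue.Count == 0) return false;
    var key = queue.Peek();
    try
    {
        var isSpam = CheckIsSpam(key);
        Console.WriteLine($"{key}: IsSpam = {isSpam}");
        queue.Dequeue(); // success: the reference is no longer needed
        return true;
    }
    catch (InvalidOperationException ex)
    {
        // Failure: leave the reference queued for a later attempt.
        Console.WriteLine($"{key}: check failed ({ex.Message}); will retry");
        return false;
    }
}

TryProcessOne();                // fails; the reference stays queued
serviceUp = true;
TryProcessOne();                // succeeds; the reference is removed
Console.WriteLine(queue.Count); // 0
```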

I can also scale the roles independently.  I’m using two instances of the web role right now, but there’s no need for more than one worker role instance, since comments arrive at a rate of fewer than 50 an hour.

Implementation

In my last post, I described the changes I made to my data model and the blog code.  One additional change is a one-liner to enqueue work when a comment has been stored:

    QueueStorage.Create(StorageAccountInfo.GetDefaultQueueStorageAccountFromConfiguration())
        .GetQueue("commentqueue")
        .PutMessage(new Message(string.Format("{0}/{1}", comment.PartitionKey, comment.RowKey)));
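The queue message is just the string "PartitionKey/RowKey". Here is a sketch of both ends of that format using hypothetical helper names. It relies on the fact that Windows Azure table keys may not contain '/', so a well-formed message splits into exactly two parts.

```csharp
using System;

// Hypothetical helpers for the "PartitionKey/RowKey" message body.
string ToMessage(string partitionKey, string rowKey) => $"{partitionKey}/{rowKey}";

// Table keys may not contain '/', so a well-formed message splits into exactly two parts.
bool TryParseMessage(string message, out string partitionKey, out string rowKey)
{
    var parts = message.Split('/');
    if (parts.Length != 2)
    {
        partitionKey = rowKey = "";
        return false; // malformed message
    }
    partitionKey = parts[0];
    rowKey = parts[1];
    return true;
}

var msg = ToMessage("my-post", "0001");
Console.WriteLine(msg); // my-post/0001
if (TryParseMessage(msg, out var pk, out var rk))
{
    Console.WriteLine($"PartitionKey={pk}, RowKey={rk}"); // PartitionKey=my-post, RowKey=0001
}
```

Storing only a reference, rather than the whole comment, keeps the message small and makes the table the single source of truth.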

The worker role code is quite simple.  To talk to TypePad AntiSpam, I used the Akismet .Net 2.0 API project on CodePlex, with a minor change to point it at TypePad AntiSpam instead.  This is nearly all of the worker's code (omitting the function that converts my comment object to an AkismetComment):

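    using System;
    using System.Linq;
    using System.Threading;
    // Also needs the StorageClient sample library from the Windows Azure SDK and
    // the Akismet .Net 2.0 API (modified to talk to TypePad AntiSpam).

    public override void Start()
    {
        var q = QueueStorage.Create(StorageAccountInfo.GetDefaultQueueStorageAccountFromConfiguration())
            .GetQueue("commentqueue");
        var akismet = new Akismet("<KEY DELETED>", "http://blog.smarx.com", "blog.smarx/2");
        if (!akismet.VerifyKey())
        {
            throw new ArgumentException("Invalid key.");
        }

        var svc = new BlogDataServiceContext();

        while (true)
        {
            var msg = q.GetMessage();
            if (msg != null)
            {
                // The message body is "PartitionKey/RowKey", as enqueued by the web role.
                var split = msg.ContentAsString().Split('/');
                var partitionkey = split[0];
                var rowkey = split[1];

                var comment = (from c in svc.BlogCommentTable
                               where c.PartitionKey == partitionkey
                                  && c.RowKey == rowkey
                               select c).FirstOrDefault();
                if (comment != null)
                {
                    // GetAkismetComment returns null if the comment is for a
                    // non-existent blog entry; treat that as spam without asking
                    // TypePad AntiSpam. Otherwise, let the service decide.
                    var akismetComment = GetAkismetComment(comment);
                    if (akismetComment == null || akismet.CommentCheck(akismetComment))
                    {
                        comment.IsSpam = true;
                        svc.UpdateObject(comment);
                        svc.SaveChanges();
                    }
                }

                // Delete only after processing; if the worker dies first, the
                // message reappears after its visibility timeout and is retried.
                q.DeleteMessage(msg);
            }
            else
            {
                // Nothing queued; wait a second before polling again.
                Thread.Sleep(1000);
            }
        }
    }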
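Inside the loop, the spam decision boils down to a small rule: a comment is spam if it targets a non-existent entry (GetAkismetComment returned null) or if the service says so. Pulling that rule into a pure function makes it testable without a queue or a network call. This is a sketch; `ShouldMarkAsSpam` and the `serviceSaysSpam` delegate are hypothetical stand-ins, not part of the Akismet library.

```csharp
using System;

// The worker's spam rule as a pure function. 'entryExists' corresponds to
// GetAkismetComment returning non-null; 'serviceSaysSpam' is a hypothetical
// stand-in for akismet.CommentCheck. '||' short-circuits, so the service is
// not called for comments on non-existent entries.
bool ShouldMarkAsSpam(bool entryExists, Func<bool> serviceSaysSpam) =>
    !entryExists || serviceSaysSpam();

int calls = 0;
bool FakeCheck() { calls++; return false; } // hypothetical: the service says "not spam"

Console.WriteLine(ShouldMarkAsSpam(false, FakeCheck)); // True (no service call)
Console.WriteLine(ShouldMarkAsSpam(true, FakeCheck));  // False
Console.WriteLine(calls);                              // 1
```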

Try it!

You can test this out yourself.  Post a comment with the author “viagra-test-123” and watch it disappear within a couple of seconds.  (Akismet and TypePad AntiSpam both treat this author name as a hard-coded spam indicator for testing.)


Comments

2008-11-26 07:06 GMT
Hi Steve,

A suggestion and a question:
1. Let people enter their email, so you can show their Gravatar.
2. How many instances of the worker role are created by the platform? One? Can it be controlled?

Thanks
Arik
2008-11-26 12:18 GMT
Arik, good idea to enable Gravatars... I'll consider adding that.

I'm only using one instance for my worker role now, but yes, that's configurable in your .cscfg file and can be changed at runtime via the web portal.
2008-11-26 14:07 GMT
Steve,

Great diagram. It seems that people who play with Azure love Paint diagrams :)

For more Azure Paint-based diagrams, please check out my blog :)
2008-11-26 19:19 GMT
Chris, I actually drew this on the whiteboard in my office and took a picture. :-)

(Obviously I then imported it into GIMP, cropped it, and made the background actually look white.)

I'm very proud of my artistic skills. And now I have some inspiration from your blog too! :-)
2008-11-26 20:53 GMT
Excellent, nice technique with the background; must try that myself at some point.

Oh yeah, I really like your spam methodology. Much better to clean up spam afterwards and have genuine comments appear straightaway, rather than the other way around.
2009-01-22 22:16 GMT
Hi Steve,

Thanks for sharing that idea! Interesting use for a worker in the Cloud!

Btw, I'm not skilled enough in GIMP to do what you did with your whiteboard picture. That's why I use scanr.com :-)
