Anybody else getting 'alphabet soup' comment spam?

Hosiah · 01-22-2007, 07:18 PM

This is by far the most puzzling spam case I've ever seen. It's a comment spammer who hits an average of two times per day and leaves a comment on my CAPTCHA-protected blog. Here is a typical example:

Code:

NAME:    bibqslpm
URL:     http://ajmxwkmw.com
MESSAGE:  mssmlgmz http://kvkmgblm.com auuhhidm nyziwkeg  {URL=http://yfajajbf.com}wcrynwpa{/URL}  <a href="http://iodvknjr.com">cocdtpwi</a> 
IP:      219.14.96.8
DATE:    1/22/07
TIME:    16:24:08

The curly brackets are really square brackets, but I change them so the forum here doesn't parse them.

Don't bother checking anything there, because it's all 100% bogus every time. Random IPs from all over the world, URLs that don't even exist, random times and 'alphabet soup' in all fields. No sense, no logic, different letters every time. Sometimes without the BB code URL.

I've tried Googling for 'random letter spam' and other catchy-sounding phrases, but no real luck. I've only found one other site discussing this, and they're baffled as well!

Not to mention that it gets past the CAPTCHA system, which has otherwise been 100% effective so far in stopping bots.

So, can anybody suggest where to look for info on this? I'm sure a sysadmin or two out there must know what this is. I have a couple of ideas for how to stop it (PHP checking the validity of the URL, screening the BB code syntax, etc.), but nothing that couldn't be gotten around. Right now, I'd be happy merely to know the purpose of it.
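
For the URL idea, here is a minimal sketch of the check, written in shell rather than PHP just to show the logic. The function names `url_host` and `url_resolves` are made up for illustration, and `getent hosts` is assumed to be available (true on typical Linux systems, not everywhere). It only tests whether the host name resolves in DNS, so a spammer who posts real domains would get straight past it.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of any existing comment script.

# Print just the host name of a URL: drop the scheme, the path/query/fragment,
# any user@ prefix, and a trailing :port.
url_host() {
  printf '%s\n' "$1" | sed -E \
    -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
    -e 's|[/?#].*$||' \
    -e 's|^[^@]*@||' \
    -e 's|:[0-9]+$||'
}

# Succeed (exit 0) only if the URL's host name resolves.
# Assumption: getent is installed; it exits non-zero when the lookup fails.
url_resolves() {
  getent hosts "$(url_host "$1")" > /dev/null
}
```

Something like `url_resolves "http://ajmxwkmw.com" || echo "reject"` would then be the whole test, at the cost of a DNS lookup on every comment.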

bigrigdriver · 01-23-2007, 12:03 AM

Yes. I get several per day. The best I've been able to come up with is that there is frequently something in common, a string of two or three letters, that I can focus on.

So, in my /etc/mail/spamassassin/local.cf, I blacklist based on the common part of the URL. Something like:

blacklist_from *@*yq*

flags all those with yq in common. There are a few others with common strings in them. SpamAssassin has thus far been very good at flagging them, and Postfix puts them in the spam folder.

Now if I could just figure out how to make procmail/postfix just delete the damn things and leave the once-in-a-while kind of spam to be sorted out.

Hosiah · 01-27-2007, 06:14 AM

My progress so far:

This appears to be a different program from the one bigrigdriver reported, as only once in several examples did the 'yq' combination pop up.

I have, however, found other patterns. For one, all of the random strings are 8 letters long! I've saved a log file of each incident, and when I strip out HTML tags and the "http://" and ".com" parts, I end up with neat columns of 8-letter strings.
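
A rough sketch of that clean-up step, plus a test for the 8-letter pattern, could look like the following. The function names `extract_words` and `looks_like_soup` are invented here, and the markup it strips is only what shows up in the sample above (an `<a href=...>` link and a square-bracket URL tag), so treat it as an illustration rather than a complete tag stripper.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Read text on stdin, drop "http://", ".com" and the link markup seen in the
# sample comment, and print the remaining words in lowercase, one per line.
extract_words() {
  tr 'A-Z' 'a-z' |
    sed -E -e 's#https?://##g' \
           -e 's#\.com##g' \
           -e 's#<a href=|</a>|\[url=|\[/url\]##g' |
    tr -cs 'a-z' '\n' |   # every run of non-letters becomes one line break
    sed '/^$/d'           # drop the empty line left by leading punctuation
}

# Succeed (exit 0) only if stdin has at least one word and every word
# is exactly 8 letters long.
looks_like_soup() {
  extract_words |
    awk 'length($0) != 8 { bad = 1 } END { exit (NR == 0 || bad) }'
}
```

Piping the NAME, URL and MESSAGE fields of a comment through `looks_like_soup` gives a yes/no answer; an ordinary comment fails the test as soon as it contains a word of any other length.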

Furthermore, I wrote a letter-frequency script. I might as well post:

Code:

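#!/bin/bash
# Count how often each letter a-z occurs in the file given as $1
# (case-insensitive) and write the counts, most frequent first,
# to ./frequency_count as "COUNT : LETTER" lines.

for LETTER in a b c d e f g h i j k l m n o p q r s t u v w x y z
do
  # grep -o prints every match on its own line, so wc -l counts occurrences
  LETTER_COUNT=$(tr 'A-Z' 'a-z' < "$1" | grep -o "$LETTER" | wc -l)
  echo "$LETTER_COUNT : $LETTER"
done | sort -nr > frequency_count

exit 0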

and running it on the file gives me a startlingly different pattern from normal English usage. The 10 most frequent letters by the spambot are bwagpmjizl, as opposed to the standard English distribution of etaoinshrd.

So this gives me three ways to kick it out (and I can hard-code them into my PHP comment script, so they get rejected on the posting attempt): (1) Check the URL and make sure it exists, (2) check for all fields to show the 8-letter word pattern, (3) check for letter-frequency. On the letter-frequency part, I hope to create some kind of "fuzzy-match" method, because I don't want to block legitimate users with bad spelling (or 'AOLbonics')... but having the letter 'z' in the top ten letters is of course a dead giveaway!
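
One possible shape for the "fuzzy-match" part, as a sketch only: take the ten most frequent letters in a comment and count how many of them also appear in the English top ten, etaoinshrd. The function names `top_letters` and `english_overlap` are made up for this example, and any cutoff applied to the score is an assumption that would need tuning against real comments.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Print the N most frequent letters on stdin as one string, most frequent
# first (default N = 10). Ties are broken alphabetically.
top_letters() {
  tr 'A-Z' 'a-z' | tr -cd 'a-z' | fold -w1 |
    sort | uniq -c | sort -k1,1nr -k2 | head -n "${1:-10}" |
    awk '{ printf "%s", $2 } END { print "" }'
}

# Print how many of the text's ten most frequent letters are also among
# the ten most frequent letters in English (0 to 10).
english_overlap() {
  top_letters 10 | tr -cd 'etaoinshrd' | wc -c | tr -d ' '
}
```

The spambot's top ten, bwagpmjizl, shares only 'a' and 'i' with etaoinshrd, so it scores 2. A bad speller writing real English should still score well above that, although very short comments will give noisy counts either way.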

I hope not just to solve this problem, but to come up with a general method for kicking out random-letter spam of all kinds, since I bet that these will become more frequent as a means to undermine Bayesian filtering.