Linux - Security: This forum is for all security-related questions.
Questions, tips, system compromises, firewalls, etc. are all included here.


01-22-2007, 07:18 PM #1
Registered: Sep 2004
Location: Des Moines, Iowa
Distribution: Slackware, Mandriva, Debian derivatives, +BSD/ Solaris/ Minix/ plan9/ GNU/HURD...
Posts: 185

Rep: Reputation: 31
Anybody else getting 'alphabet soup' comment spam?

This is by far the most puzzling spam case I've ever seen. It's a comment spammer who hits an average of two times per day and leaves a comment on my CAPTCHA-protected blog. Here is a typical example:

NAME:    bibqslpm
MESSAGE:  mssmlgmz auuhhidm nyziwkeg  {URL=}wcrynwpa{/URL}  <a href="">cocdtpwi</a> 
DATE:    1/22/07
TIME:    16:24:08
The curly brackets are really square brackets; I changed them so the forum here doesn't parse them.

Don't bother checking anything there, because it's all 100% bogus every time. Random IPs from all over the world, URLs that don't even exist, random times and 'alphabet soup' in all fields. No sense, no logic, different letters every time. Sometimes the BB code URL is missing.

I've tried Googling for 'random letter spam' and other catchy-sounding phrases, but no real luck. I've only found one other site discussing this, and they're baffled as well!

Not to mention the CAPTCHA system: so far it has been 100% effective at stopping every other bot, yet these comments still get through.

So, can anybody suggest where to look for info on this? I'm sure a sysadmin or two out there must know what this is. I have a couple of ideas for stopping it (PHP checking the validity of the URL, screening the BB code syntax, etc.), but nothing that couldn't be gotten around. Right now, I'd be happy just to know the purpose of it.
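Screening the BB code could be as simple as rejecting any comment that carries a BB-code link, assuming the blog itself never renders BB code (so no legitimate comment needs [URL] tags). A minimal shell sketch; has_bbcode_link is a hypothetical name, and the same regular expression could be moved into the PHP comment script.

```shell
#!/bin/sh
# Hypothetical helper: has_bbcode_link TEXT
# Succeeds (exit status 0) if TEXT contains a BB-code link such as
# [URL=...]...[/URL] or [url]...[/url] on one line, case-insensitively.
has_bbcode_link() {
  printf '%s\n' "$1" | grep -qiE '\[url(=[^]]*)?\].*\[/url\]'
}

# Example: the bot's message format, with the square brackets restored
msg='mssmlgmz auuhhidm nyziwkeg [URL=]wcrynwpa[/URL] <a href="">cocdtpwi</a>'
if has_bbcode_link "$msg"; then
  echo "reject: BB-code link in comment"
fi
```

The bot's habit of sending both BB code and an HTML link is what makes this cheap to catch, though as noted it is easy to get around.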
01-23-2007, 12:03 AM #2
LQ Addict
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian stable
Posts: 5,908

Rep: Reputation: 356
Yes. I get several per day. The best I've been able to come up with is that there is frequently something in common, a string of two or three letters, that I can focus on.

So, in a .cf file under /etc/mail/spamassassin/, I blacklist based on the common part of the sender's address. Something like:

blacklist_from *@*yq*

flags all senders with yq in their address. A few other common strings get the same treatment. SpamAssassin has so far been very good at flagging them, and postfix puts them in the spam folder.
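To find candidate strings like yq in the first place, here is a shell sketch that counts how many collected spam domains contain each two-letter combination, most common first. It assumes a plain text file with one sender domain per line (gathered from the spam folder by hand); count_bigrams is a hypothetical name, and the output is only a list of candidates to review before anything goes into blacklist_from.

```shell
#!/bin/sh
# Hypothetical helper: count_bigrams FILE
# FILE is assumed to hold one spam sender domain per line.
# For every two-letter lowercase combination, print how many lines contain it
# (each line counted at most once), most common first.
count_bigrams() {
  tr '[:upper:]' '[:lower:]' < "$1" |
  awk '{
    split("", seen)   # forget which pairs this line has already counted
    for (i = 1; i < length($0); i++) {
      pair = substr($0, i, 2)
      if (pair ~ /^[a-z][a-z]$/ && !(pair in seen)) {
        seen[pair] = 1
        count[pair]++
      }
    }
  }
  END { for (pair in count) print count[pair], pair }' |
  sort -k1,1nr -k2,2
}

# Example use (spam_domains.txt is a hypothetical file name):
# count_bigrams spam_domains.txt | head
```

Counting each pair once per domain keeps a single name like "yqyqyq.com" from inflating the score.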

Now if I could just figure out how to make procmail/postfix delete the damn things outright and leave only the once-in-a-while kind of spam to be sorted out.
01-27-2007, 06:14 AM #3
Registered: Sep 2004
Location: Des Moines, Iowa
Distribution: Slackware, Mandriva, Debian derivatives, +BSD/ Solaris/ Minix/ plan9/ GNU/HURD...
Posts: 185

Original Poster
Rep: Reputation: 31
My progress so far:

This appears to be a different program from the one bigrigdriver reported, as only once in several examples did the 'yq' combination pop up.

I have, however, found other patterns. For one, all of the random strings are 8 letters long! I've saved a log file of each incident, and when I strip out HTML tags and the "http://" and ".com" parts, I end up with neat columns of 8-letter strings.
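The stripping step can be sketched in shell roughly like this. It assumes each saved log looks like the example comment in the first post (one FIELD: value line per field, square-bracket BB code and an HTML link in the message); soup_tokens is a hypothetical name, only the NAME and MESSAGE fields are examined, and link targets are kept as words once "http://" and ".com" are removed.

```shell
#!/bin/sh
# Hypothetical helper: soup_tokens LOGFILE
# Assumes the log holds one "FIELD:  value" line per field, like the example
# comment in the first post, with square-bracket BB code and HTML links.
# Prints every word from the NAME and MESSAGE fields, one per line, after
# removing markup, "http://" and ".com" (link targets are kept as words).
soup_tokens() {
  grep -E '^(NAME|MESSAGE):' "$1" |
  sed -E \
    -e 's/^[A-Z]+:[[:space:]]*//' \
    -e 's/<a href="([^"]*)">/ \1 /g' \
    -e 's/\[URL=([^]]*)\]/ \1 /g' \
    -e 's/<[^>]*>|\[[^]]*\]/ /g' \
    -e 's#https?://##g' \
    -e 's/\.com//g' |
  tr -cs '[:alpha:]' '\n' |
  grep -v '^$'
}

# Example use (comment.log is a hypothetical saved log): print any word that
# is NOT exactly eight lowercase letters; no output means the comment fits.
# soup_tokens comment.log | grep -vxE '[a-z]{8}'
```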

Furthermore, I wrote a letter-frequency script. I might as well post it:


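#!/bin/sh
# Count how often each letter a-z occurs in the file named by $1
# (case-insensitive) and write the counts, largest first, to frequency_count.
rm -f temp   # start from an empty scratch file
for LETTER in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
  LETTER_COUNT=$(tr '[:upper:]' '[:lower:]' < "$1" | grep -o "$LETTER" | wc -l)
  echo $LETTER_COUNT ":" $LETTER >> temp
done

sort -gr temp > frequency_count
rm temp

exit 0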
Running it on the log file gives me a pattern startlingly different from normal English usage. The 10 most frequent letters in the spambot's output are bwagpmjizl, compared with etaoinshrd for standard English.
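One rough way to turn that comparison into a single number: take the ten most frequent letters of a text and count how many of them are also in etaoinshrd. A minimal sketch with hypothetical function names; for the spambot's bwagpmjizl only a and i match, so the score is 2, while longer ordinary English text should usually score much higher. Where to put the cut-off is a judgment call, and very short comments have too few letters for the ranking to mean much.

```shell
#!/bin/sh
# Hypothetical helper: top_letters FILE
# Print the ten most frequent letters in FILE (case-insensitive) as one string,
# most frequent first; ties are broken alphabetically.
top_letters() {
  tr '[:upper:]' '[:lower:]' < "$1" | grep -o '[a-z]' |
  sort | uniq -c | sort -k1,1nr -k2,2 | head -n 10 |
  awk '{ printf "%s", $2 } END { print "" }'
}

# Hypothetical helper: english_overlap LETTERS
# Print how many characters of LETTERS are among "etaoinshrd",
# the ten most common letters in English text.
english_overlap() {
  printf '%s\n' "$1" | grep -o '[etaoinshrd]' | wc -l | tr -d ' '
}

# Example: the spambot's top ten from the log analysis above.
english_overlap bwagpmjizl   # prints 2 (only "a" and "i" match)

# For a saved comment (comment.log is a hypothetical file name):
# english_overlap "$(top_letters comment.log)"
```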

So this gives me three ways to kick it out (and I can hard-code them into my PHP comment script, so they get rejected on the posting attempt): (1) check that the URL actually exists, (2) check whether all fields follow the 8-letter word pattern, (3) check the letter frequency. For the letter frequency, I hope to create some kind of "fuzzy-match" method, because I don't want to block legitimate users with bad spelling (or 'AOLbonics')... but having the letter 'z' in the top ten letters is of course a dead giveaway!
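Check (2) could look something like this as a shell sketch. It assumes the markup (BB code and HTML tags) has already been stripped from each field, and all_fields_soup is a hypothetical name; in the real comment script the same test would be done in PHP on the submitted fields before the comment is saved.

```shell
#!/bin/sh
# Hypothetical helper: all_fields_soup FIELD...
# Succeeds (exit status 0) only if every field is non-blank and every
# whitespace-separated word in it is exactly eight lowercase letters a-z.
# Markup (BB code, HTML tags) is assumed to be stripped beforehand.
all_fields_soup() {
  [ "$#" -gt 0 ] || return 1
  for field in "$@"; do
    # A blank field does not fit the pattern.
    printf '%s\n' "$field" | grep -q '[^[:space:]]' || return 1
    # Put one word per line; fail if any word is not exactly 8 letters a-z.
    if printf '%s\n' "$field" | tr -s '[:space:]' '\n' | grep -v '^$' |
       grep -qvxE '[a-z]{8}'; then
      return 1
    fi
  done
  return 0
}

# Example: NAME and MESSAGE text from the first post (links removed)
if all_fields_soup "bibqslpm" "mssmlgmz auuhhidm nyziwkeg"; then
  echo "reject: every field is 8-letter alphabet soup"
fi
```

Requiring every field to match is what keeps this from catching a real visitor who happens to be named "jonathan".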

I hope not just to solve this problem but to come up with a general method for kicking out random-letter spam of all kinds, since I bet this kind of spam will become more common as a means of undermining Bayesian filtering.


All times are GMT -5.
