Setup of FuzzyOcr plugin for spamassassin

Every day spammers invent new techniques to bypass spam filters. Modern spam filters cope well with text mail using regex rules and Bayesian classifiers, but they are useless when spammers send messages with an attached image and random, non-spam text in the message body. A solution to this problem is already available: the FuzzyOcr plugin for SpamAssassin.

This plugin checks for specific keywords in image/gif, image/jpeg or image/png attachments, using gocr (an optical character recognition program). It can be used to detect spam that puts all the real spam content in an attached image, while the mail itself contains only random text and random HTML, without any URLs or identifiable information. It also does approximate matching on words, so errors in recognition or attempts to obfuscate the text inside the image will not cause the detection to fail. It can be easily extended, because all the words reside in a simple plain text file.
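
To make the idea of approximate matching concrete, here is a minimal sketch. It is not FuzzyOcr's own code: the plugin presumably does its matching through the String::Approx Perl module that the install steps pull in, whereas this sketch swaps in a plain Levenshtein edit distance computed in awk. The function name edit_distance is made up for illustration.

```shell
# edit_distance A B: print the Levenshtein distance between two words.
# Illustrative only; this is not the matching code the plugin uses.
edit_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    # Row 0: turning an empty prefix of a into b[1..j] costs j insertions.
    for (j = 0; j <= lb; j++) prev[j] = j
    for (i = 1; i <= la; i++) {
      cur[0] = i
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        m = prev[j] + 1                                   # deletion
        if (cur[j-1] + 1 < m) m = cur[j-1] + 1            # insertion
        if (prev[j-1] + cost < m) m = prev[j-1] + cost    # substitution
        cur[j] = m
      }
      for (j = 0; j <= lb; j++) prev[j] = cur[j]
    }
    print prev[lb]
  }'
}
```

For example, edit_distance stock st0ck prints 1, so a matcher that tolerates one or two edits per word would still recognise a keyword that gocr misread or that a spammer deliberately distorted.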

Setting it up on Ubuntu takes a few simple steps:
  1. apt-get install gocr netpbm imagemagick libstring-approx-perl
  2. mkdir /tmp/fuzzyocr
  3. cd /tmp/fuzzyocr
  4. apt-get source libungif-bin
  5. wget http://users.own-hero.net/~decoder/fuzzyocr/{giftext-segfault.patch,fuzzyocr-latest.tar.gz}
  6. patch libungif4-4.1.4/util/giftext.c ./giftext-segfault.patch
  7. cd libungif4-4.1.4
  8. dpkg-buildpackage -rfakeroot -us -uc
  9. cd ..
  10. dpkg -i libungif4g_4.1.4-1_i386.deb libungif-bin_4.1.4-1_i386.deb
  11. tar xzf fuzzyocr-latest.tar.gz
  12. mkdir -p /usr/local/lib/site_perl
  13. cp FuzzyOcr-2.3b/FuzzyOcr.pm /usr/local/lib/site_perl
  14. cp FuzzyOcr-2.3b/FuzzyOcr.cf /etc/spamassassin
  15. cp FuzzyOcr-2.3b/FuzzyOcr.words.sample /etc/spamassassin/FuzzyOcr.words

Steps 4-10 are required only because a segfault was discovered in the giftext utility from that package. Now all you have to do is enable the FuzzyOcr plugin in SpamAssassin and tweak your word list.
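
Before moving on, it may be worth confirming that the helper programs installed above are actually on the PATH. The following is a rough sketch; the function name missing_tools is hypothetical, and the list of commands is an assumption based on the packages from step 1 and the rebuilt libungif-bin, not an authoritative list of what the plugin calls.

```shell
# missing_tools NAME...: print each command name that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed mapping: gocr from the gocr package, giftext from libungif-bin,
# convert from imagemagick, giftopnm from netpbm.
missing_tools gocr giftext convert giftopnm
```

If the last command prints nothing, all four programs were found.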

Edit /etc/spamassassin/FuzzyOcr.cf and:
  • Remove the line loadplugin FuzzyOcr FuzzyOcr.pm
  • Set focr_pre314 to 1 (only if your SpamAssassin version is older than 3.1.4)
  • Set focr_logfile to /var/log/FuzzyOcr.log

Add the following line to /etc/spamassassin/v312.pre:

  • loadplugin FuzzyOcr /usr/local/lib/site_perl/FuzzyOcr.pm
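
A quick way to double-check these edits is to list the settings that are actually active and then let SpamAssassin parse its configuration. This is only a sketch: the helper name active_settings is made up for illustration, and it assumes the settings sit at the start of a line, optionally indented.

```shell
# active_settings FILE...: print uncommented loadplugin and focr_* lines.
active_settings() {
  grep -E '^[[:space:]]*(loadplugin|focr_)' "$@"
}

# Usage on the files edited above:
#   active_settings /etc/spamassassin/FuzzyOcr.cf /etc/spamassassin/v312.pre
#   spamassassin --lint    # parses the configuration and reports problems
```

The loadplugin line should show up only for v312.pre, with the full path to FuzzyOcr.pm.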

To test the FuzzyOcr plugin, you can use the image spam samples in FuzzyOcr-2.3b/samples:

[denis@sun:test]$ spamassassin -t FuzzyOcr-2.3b/samples/animated-gif.eml
...
...
Content analysis details:   (24.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.8 EXTRA_MPART_TYPE       Header has extraneous Content-type:...type= entry
 0.7 DATE_IN_PAST_06_12     Date: is 6 to 12 hours before Received: date
 2.8 TVD_FW_GRAPHIC_ID1     BODY: TVD_FW_GRAPHIC_ID1
 0.0 HTML_MESSAGE           BODY: HTML included in message
  20 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "alert" in 4 lines
                            "charts" in 1 lines
                            "symbol" in 1 lines
                            "alert" in 4 lines
                            "stock" in 2 lines
                            "company" in 3 lines
                            "trade" in 1 lines
                            "meridia" in 1 lines
                            "growth" in 1 lines
                            (18 word occurrences found)
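
To run the same check over every sample at once, something like the sketch below could be used. It assumes that all samples carry the .eml extension, as animated-gif.eml does, and that the rule shows up in the report table in the same layout as above. The helper name has_fuzzy_ocr is hypothetical.

```shell
# has_fuzzy_ocr: read a "spamassassin -t" report on stdin and succeed if
# the FUZZY_OCR rule appears as a row of the content analysis table.
has_fuzzy_ocr() {
  grep -Eq '^[[:space:]]*-?[0-9.]+[[:space:]]+FUZZY_OCR[[:space:]]'
}

# Assumption: every sample message in the directory ends in .eml.
for msg in FuzzyOcr-2.3b/samples/*.eml; do
  [ -e "$msg" ] || continue          # skip if the glob matched nothing
  if spamassassin -t "$msg" | has_fuzzy_ocr; then
    echo "hit:  $msg"
  else
    echo "miss: $msg"
  fi
done
```

Any message reported as a miss is a candidate for a closer look at /var/log/FuzzyOcr.log and at the word list.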

7 responses to «Setup of FuzzyOcr plugin for spamassassin»

 Maciej Sołtysiak commented, on January 22, 2007 at 7:44 p.m.:

Cool, thanks to this I managed to fix my fuzzyocr setup :-)
Thanks. However, I am using the new branch. v2.3 is deprecated.

 mike commented, on February 22, 2007 at 3:31 p.m.:

Set focr_pre314 to 1 ONLY if you are running a version < 3.1.4.

(from FuzzyOCR manual)

mike
