
Is anyone else peeved at the fact that SpamAssassin is still the de-facto standard for spam filtering?

Works pretty well for post-queue but as soon as you try to pre-queue filter anything you're in deep trouble.

Has anyone tried compiling SpamAssassin or writing a faster version of it in another language? It's been a long time since I played with perllibs, but I seem to remember being able to load Perl code into C programs.



    >Has anyone tried compiling SpamAssassin or writing a faster version of it in another language?
How fast do you want it? I've been pumping roughly 20,000 emails per day through SpamAssassin via maia[0], on a relatively moderate VPS. Greylisting in front of it handles a huge portion of the load.

[0] https://github.com/technion/maia_mailguard

Edit: Been a while since I checked. Actually yesterday's number was 98,000.
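
For readers unfamiliar with how greylisting takes load off a content filter, here is a minimal in-memory sketch of the general idea: temporarily defer the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accept a retry after a delay. The class name, the 300-second delay, and the 30-day expiry are assumptions for illustration, not settings from maia or any particular greylist daemon.

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay=300, max_age=30 * 86400):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.max_age = max_age      # forget triplets older than this (assumed value)
        self.first_seen = {}        # triplet -> timestamp of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return "defer" (temporary failure) or "accept" for this attempt."""
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.max_age:
            # Unknown or expired triplet: record it and ask the client to retry.
            self.first_seen[key] = now
            return "defer"
        if now - seen < self.min_delay:
            # Retry came too quickly; keep deferring.
            return "defer"
        return "accept"
```

A mail server would map "defer" to a temporary (4xx) SMTP reply. Legitimate servers retry later and get through, while senders that never retry do not reach the content filter at all.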


It's been made clear to me now that I need to look closer at greylisting in postfix before mails are passed to the proxy_filter.

But if you're interested I'm talking around 62k mails a day. My experience with this amount has led me to use 64G RAM on each MX but that's only to handle a certain incident with very high load. Usually RAM usage is much lower than 64G and there's plenty of IO cache available.


SpamAssassin is old and outdated. SpamAssassin is like Apache, and rspamd is like nginx. Give rspamd a shot.


I have used ASSP for a few years. It's tricky to set up but does a great job once you get there. http://sourceforge.net/projects/assp/


http://dspam.nuclearelephant.com/

It's a bit picky and horribly documented, but extremely fast and low on resources.


Except database size. It will grow to gigs and gigs and gigs and gigs. Dspam can be very difficult to manage with a 400 GB token database when you have a large system. (The last version I used was 3.9; maybe they have improved this?)


I use SpamAssassin in a prequeue SMTP proxy and it works pretty well, but it's nearly the last check after many other lightweight tests.
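
The "cheap tests first, SpamAssassin last" ordering can be sketched roughly as below. The check names and the message fields are hypothetical, and `content_scan` is only a stand-in for the expensive SpamAssassin call. It records when it runs so you can see that it is skipped whenever an earlier check already rejected the message.

```python
def run_checks(message, checks):
    """Run checks in order and stop at the first one that returns a verdict."""
    for name, check in checks:
        verdict = check(message)
        if verdict is not None:
            return name, verdict
    return None, "accept"


# Hypothetical lightweight checks: return None to pass, or a verdict string.
def has_valid_helo(message):
    return None if "." in message.get("helo", "") else "reject"


def recipient_exists(message):
    known = {"me@example.org"}  # assumed list of local recipients
    return None if message.get("rcpt") in known else "reject"


scanned = []  # records which messages reached the expensive scan


def content_scan(message):
    # Stand-in for the costly content filter (e.g. a SpamAssassin call).
    scanned.append(message["rcpt"])
    return None


CHECKS = [
    ("helo", has_valid_helo),
    ("rcpt", recipient_exists),
    ("content", content_scan),
]
```

With this ordering, a message with a bogus HELO is rejected before the content scan is ever invoked.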


Greylisting is the #1 standard for spam filtering.


Um ... maybe. I run a greylist daemon and I wrote about its effectiveness a few months ago (http://boston.conman.org/2015/04/12.1), which also goes into how effective SPF would be had I actually used it in accepting incoming email.

I also wrote about the effectiveness of the various blacklists out there as well (http://boston.conman.org/2015/05/11.1).


(The TL;DR of the following: SpamAssassin still works very well, and can be used on the incoming SMTP connection just fine ("pre-queue" as you prefer to call it).)

I used SpamAssassin ca. 2000-2008, then moved to Gmail, and recently went back to maintaining my non-@gmail.com addresses with my own mail setup, again with SA. In the last 30 days I got 43 spams delivered to my spam folders, 16 spams delivered to nonexistent addresses at my domains (captured to prevent backscatter), and 101 spams rejected right away at the SMTP level due to a high enough score, and IIRC zero delivered to my inbox (IIRC even across the last 2-3 months). I'm not sure why so few spams are even being sent to my handful of non-@gmail.com addresses; I've been using some of them on mailing lists for about 8 years now, too. (My single @gmail.com address gets about 40 spams in the Gmail spam folder per day.) I got 2 false "half-positives" in the last 30 days from the same company before training them (and 0 false positives to the spam folder). I say half-positives since I'm filtering mails with a low enough score to a "possible spam" folder, with the idea of reducing the amount of work for checking.
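
The score-based tiers described above (reject at SMTP time, spam folder, "possible spam" folder, inbox) could be sketched like this. The threshold values are assumptions for illustration only. The comment does not state the actual cutoffs; 5.0 is merely SpamAssassin's stock `required_score` default.

```python
def route(score, reject_at=10.0, spam_at=5.0, possible_at=2.0):
    """Map a spam score to a delivery decision.

    All three thresholds are hypothetical and should be tuned per site.
    """
    if score >= reject_at:
        return "reject"          # refuse during the SMTP conversation
    if score >= spam_at:
        return "spam"            # deliver to the spam folder
    if score >= possible_at:
        return "possible-spam"   # deliver to the "possible spam" folder
    return "inbox"
```

The middle "possible-spam" tier is what makes a "half-positive" cheap: a borderline legitimate mail lands in a small folder that is quick to review rather than among the clear spam.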

It took some effort to get everything working well. That's not the fault of SpamAssassin, apart from some hair-pulling over its relatively messy setup (quite complex and not very clean, so it needs a calm mind when doing a non-standard setup). I wrote some software[0] for this, though some people will shake their heads that I'm using djbdns and Qmail (my motivation for the latter is that I know its workings and that it's the original backend for qpsmtpd).

[0] https://github.com/pflanze/better-qmail-remote, https://github.com/pflanze/mailmover, (and https://github.com/pflanze/tinydns-scm for generating tinydns configurations programmatically using Scheme, including SPF records, once I get around to cleaning up the code and pushing it here)

> as soon as you try to pre-queue filter anything you're in deep trouble.

I'm using qpsmtpd as the incoming SMTP server, which is also written in Perl, but that doesn't matter: SpamAssassin offers a daemon (spamd), which even qpsmtpd uses, so you just execute the small "spamc" program and pass the mail on stdin (or use a library that reimplements the protocol in your language of choice). Given this, I see no reason to be in deep trouble, and rejecting spam during the SMTP stage works just fine for me.
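
As a rough sketch of the "execute spamc and pass the mail on stdin" approach: this assumes spamd is running locally and spamc is on the PATH, and it relies on spamc's `-c` (check only) mode, which prints `score/threshold` and exits with status 1 when the message is judged spam. The function names are my own, not part of any SpamAssassin API.

```python
import subprocess


def parse_check_output(text):
    """Parse the "score/threshold" line printed by `spamc -c`."""
    score, _, threshold = text.strip().partition("/")
    return float(score), float(threshold)


def check_message(raw_message, spamc="spamc"):
    """Pipe a raw message (bytes) to spamc and return (is_spam, score, threshold).

    Assumes a local spamd is reachable with spamc's default settings.
    """
    proc = subprocess.run(
        [spamc, "-c"],
        input=raw_message,
        stdout=subprocess.PIPE,
        check=False,  # exit status 1 means "spam", not a failure
    )
    score, threshold = parse_check_output(proc.stdout.decode("ascii", "replace"))
    return proc.returncode == 1, score, threshold
```

Note that spamc prints `0/0` when processing fails, so a production caller would want to treat that case separately rather than as a clean message.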

PS. Yes, you can also embed Perl in a C program, but why deal with the complexities of mixing languages in the same process when scanning a mail takes several seconds of real time and a sizable fraction of a second of CPU time? The IPC overhead is completely negligible by comparison.


Thanks for the extensive response, but with around 62k mails a day on each MX, I think pre-queue filtering requires too much RAM.

It's been made clear to me, thanks to this thread, that I first and foremost need to explore greylisting in postfix before mail is passed to the proxy_filter. Secondly, I should explore dspam and rspamd as faster alternatives to SA.



