
Am I the only one wondering why Craigslist wouldn't use a Bayesian filter? Plus a posting delay, so spammers can't retry quickly.


The problems I see with a Bayesian classifier in this application are:

1. Spammers get instant feedback on whether their post was flagged. When they send me email spam, for example, they have no idea how badly they're doing.

2. Spammers have access to a large ham database to draw from -- all the posts that were not filtered. With email, they never see any user's highly individualized ham database.

3. Bayesian classifiers require a significant amount of manual intervention, especially given 1 and 2. I'm happy to correct classification errors in my email when they happen at a rate of roughly 1/200,000, but if I had to stay on top of dozens of false negatives and false positives every day, I just wouldn't. Very quickly the database would become too noisy to be of much use.
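To make point 2 concrete, here is a rough sketch of a word-count naive Bayes filter and of what happens when a spammer pads a post with text copied from public ham. Everything in it is made up for illustration: the `NaiveBayesFilter` class, the four training posts, and the add-one smoothing are my assumptions, not anything Craigslist is known to run.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Return log(P(spam | text) / P(ham | text)); > 0 means 'looks like spam'."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the prior odds, then add one log-likelihood ratio per word.
        score = math.log(self.docs["spam"] / self.docs["ham"])
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + vocab_size)
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + vocab_size)
            score += math.log(p_spam / p_ham)
        return score


# Hypothetical training data: the ham posts are public, so a spammer can read them.
ham_posts = [
    "selling used bike good condition",
    "two bedroom apartment for rent downtown",
]
spam_posts = ["cheap pills buy now", "buy cheap watches now"]

nb = NaiveBayesFilter()
for post in ham_posts:
    nb.train(post, "ham")
for post in spam_posts:
    nb.train(post, "spam")

plain = "buy cheap pills now"
# The spammer appends text lifted straight from the visible ham posts.
padded = plain + " " + " ".join(ham_posts)

plain_score = nb.spam_log_odds(plain)
padded_score = nb.spam_log_odds(padded)
print(f"plain:  {plain_score:+.2f}")   # positive: classified as spam
print(f"padded: {padded_score:+.2f}")  # negative: slips through as ham
```

With this toy data the unpadded post scores as spam and the padded one scores as ham. A real filter would have far more training data and would not flip this easily, but the direction of the effect is the same, and point 1 means the spammer can keep adjusting the padding until a post gets through.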



