
Bayesian spam filter for comments, feedbacks form ...


Quoc Huy Nguyen Dinh

Friday 16 April 2010 7:45:54 am

I'm working on an extension that automatically filters comments, feedback forms, and other user postings using a Bayesian spam algorithm.

The filter becomes more accurate as you teach it what is spam and what is ham, so its results can be somewhat erratic at first.
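To illustrate why accuracy improves with training, here is a minimal sketch of a word-level naive Bayes classifier in Python. It is not the code of the extension or of any PHP library; the class name `NaiveBayesFilter`, its methods, and the add-one smoothing are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy naive Bayes spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        # Per-label word frequencies and number of trained messages.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            return 0.5  # nothing learned yet, so the filter cannot decide
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        tokens = self.tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of trained messages with this label.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

An untrained filter returns 0.5 for every message, which matches the erratic early behaviour: the estimate only moves away from a coin flip once messages have been marked as spam or ham with `train()`.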

I was wondering whether the community finds this interesting. If so, I will share it when it is ready.

Nicolas Pastorino

Monday 19 April 2010 12:07:20 am

Hi!

This is definitely interesting! Do you have technical details on this solution?

Cheers!

--
Nicolas Pastorino
Director Community - eZ
Member of the Community Project Board

eZ Publish Community on twitter: http://twitter.com/ezcommunity

t : http://twitter.com/jeanvoye
G+ : http://plus.tl/jeanvoye

Quoc Huy Nguyen Dinh

Wednesday 21 April 2010 4:18:37 am

I won't be re-inventing the wheel here.

There are several PHP classes that do this. My plan is to use the following library for the extension:

http://www.phpclasses.org/package/4236-PHP-Detect-spam-in-text-using-Bayesian-techniques.html

It uses a DB table to store what the script learns.

My extension would have:

  • a module that loads all comments / feedback (should be customizable) and lets you mark each one as SPAM or HAM and send it for learning.
  • a workflow event that analyzes each posted comment / feedback (customizable) against the learned base.
  • a way to moderate messages marked as spam or ham.
  • a modified DB table that allows a separate base for each siteaccess.
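As a rough sketch of the last point, the learned token counts could be keyed by siteaccess so that each site keeps its own base. The example below uses Python with SQLite purely for illustration; the table name `spam_tokens`, its columns, and the helper functions are hypothetical and do not reflect the schema of the PHP library linked above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) per-siteaccess table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spam_tokens (
               siteaccess TEXT NOT NULL,
               token      TEXT NOT NULL,
               spam_count INTEGER NOT NULL DEFAULT 0,
               ham_count  INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (siteaccess, token)
           )"""
    )
    return conn


def learn(conn, siteaccess, tokens, is_spam):
    """Increment the spam or ham counter of each token for one siteaccess."""
    # The column name comes from a fixed pair, never from user input.
    column = "spam_count" if is_spam else "ham_count"
    for token in tokens:
        conn.execute(
            "INSERT OR IGNORE INTO spam_tokens (siteaccess, token) VALUES (?, ?)",
            (siteaccess, token),
        )
        conn.execute(
            f"UPDATE spam_tokens SET {column} = {column} + 1 "
            "WHERE siteaccess = ? AND token = ?",
            (siteaccess, token),
        )
    conn.commit()


def counts(conn, siteaccess, token):
    """Return (spam_count, ham_count) for a token within one siteaccess."""
    row = conn.execute(
        "SELECT spam_count, ham_count FROM spam_tokens "
        "WHERE siteaccess = ? AND token = ?",
        (siteaccess, token),
    ).fetchone()
    return row if row else (0, 0)
```

With the siteaccess as part of the primary key, a word that one site marks as spam does not affect how another site scores it.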

Sebastiaan van der Vliet

Wednesday 21 April 2010 5:03:23 am

Why not use Akismet? http://code.google.com/p/ezakismet/ & http://akismet.com/

Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.

Quoc Huy Nguyen Dinh

Thursday 22 April 2010 6:31:14 am

Excellent. That might save me hours of dev time :-D

I will test it. Thanks for sharing.