Talk:Bayesian message classification

From Citizendium
This article is basically copied from an external source and has not been approved.
Checklist and Archives
 Workgroup category Computers [Editors asked to check categories]
 Talk Archive none  English language variant Not specified

It's more complicated than that.

This article gives the impression that Bayesian spam filtering is done in one particular way, namely by treating the probability of each word independently. That is not the only possible way to do Bayesian spam filtering, and I don't think it's the way it's usually (or always) done. Another way is to look at the probabilities of phrases. Yet another is to look at the probabilities of certain combinations of words, regardless of where in the message each word appears. For example, the word "interest" might not by itself increase the spam score (or not by much), but if it appears in the same message as "mortgage" and "house", it might add significantly to the probability that the message is about mortgages. The message would then get a Bayesian spam score based on the user's previous reactions to other messages about mortgages. In other words, it can be done in two steps, using Bayes' theorem at each step. --Catherine Woodgold 21:22, 2 May 2007 (CDT)
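
To make the two-step idea above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not a description of how any real filter works. Step 1 uses Bayes' theorem to estimate the probability of each topic given the words present in a message. Step 2 combines those topic probabilities with a per-topic spam rate, which is assumed to have been estimated from the user's earlier reactions. All names (train_topic_model, topic_posterior, spam_probability) and all training data are made up for the example. Two simplifications are worth noting: step 1 still treats words as independent within a topic, and step 2 is a plain weighted average over topics, which assumes the spam decision depends on the words only through the topic.

```python
import math
from collections import Counter


def train_topic_model(labelled_messages):
    """Count topics and per-topic word presence.

    labelled_messages: iterable of (set_of_words, topic) pairs (hypothetical data).
    """
    topic_counts = Counter()
    word_counts = {}
    vocab = set()
    for words, topic in labelled_messages:
        topic_counts[topic] += 1
        word_counts.setdefault(topic, Counter()).update(set(words))
        vocab |= set(words)
    return topic_counts, word_counts, vocab


def topic_posterior(words, model):
    """Step 1: P(topic | words) via Bayes' theorem (word presence/absence model)."""
    topic_counts, word_counts, vocab = model
    total = sum(topic_counts.values())
    log_scores = {}
    for topic, n in topic_counts.items():
        score = math.log(n / total)  # log prior P(topic)
        for w in vocab:
            # Laplace-smoothed estimate of P(word present | topic)
            p = (word_counts[topic][w] + 1) / (n + 2)
            score += math.log(p if w in words else 1 - p)
        log_scores[topic] = score
    # Normalise in a numerically stable way
    top = max(log_scores.values())
    unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}


def spam_probability(words, model, spam_given_topic):
    """Step 2: weight each topic's spam rate by the topic posterior."""
    posterior = topic_posterior(words, model)
    return sum(posterior[t] * spam_given_topic[t] for t in posterior)


# Made-up training data: two messages about mortgages, two about other things.
model = train_topic_model([
    ({"interest", "mortgage", "house"}, "mortgage"),
    ({"mortgage", "house", "rate"}, "mortgage"),
    ({"interest", "hobby", "club"}, "other"),
    ({"meeting", "club", "agenda"}, "other"),
])

# Assumed P(spam | topic), as if learned from the user's past reactions.
spam_given_topic = {"mortgage": 0.9, "other": 0.1}

p_alone = spam_probability({"interest"}, model, spam_given_topic)
p_combo = spam_probability({"interest", "mortgage", "house"}, model, spam_given_topic)
print(p_alone, p_combo)
```

With this toy data, "interest" on its own yields a lower spam score than "interest" together with "mortgage" and "house", which is the effect described in the comment above.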

Why is this article marked external?

If it's from another source, which source is it? Greg Woodhouse 12:10, 30 June 2007 (CDT)