Multinomial Naïve Bayes

2012-04-09T19:00:05+00:00

I am watching lectures on Natural Language Processing. In the week 3 lectures, the professor talks about text classification. I found that writing the formulas out in words helps.

Goal: given a training set, classify each document in the test set into a class.

We need to calculate two kinds of probabilities. Say we have a training set of 5 documents, with 3 documents of class 'A' and 2 documents of class 'B'.
1) Probability of class 'A', given the training set = number of documents classified as 'A' / total number of documents in the training set (here, 3/5)
2) Probability of each word in the vocabulary, given a class

Vocabulary (V) - unique words in the training set
Assume there are 3 unique words in the training set: hello, world, goodbye
All documents of class 'A' are merged into one, and the same is done for class 'B'
Probability of the word 'hello' given class 'A' = (number of times 'hello' occurs in documents classified as 'A' + 1) / (total number of words in documents classified as 'A' + number of words in V)
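
To check my understanding, here is a minimal Python sketch of the two calculations above. The five toy documents and the names (training_set, train) are my own assumptions for illustration, not something from the lectures.

```python
from collections import Counter

# Hypothetical toy training set: 3 documents of class 'A', 2 of class 'B'.
training_set = [
    ("hello world", "A"),
    ("hello hello", "A"),
    ("world hello", "A"),
    ("goodbye world", "B"),
    ("goodbye goodbye", "B"),
]

def train(documents):
    """Return (priors, word_probs, vocabulary) for a list of (text, label) pairs."""
    class_doc_counts = Counter(label for _, label in documents)

    # Merge all documents of a class into one bag of word counts.
    word_counts = {label: Counter() for label in class_doc_counts}
    for text, label in documents:
        word_counts[label].update(text.split())

    # Vocabulary: unique words across the whole training set.
    vocabulary = sorted({w for counts in word_counts.values() for w in counts})

    # Prior: documents in the class / total documents.
    priors = {label: n / len(documents) for label, n in class_doc_counts.items()}

    # Word probability: (count in class + 1) / (total words in class + size of V).
    word_probs = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        word_probs[label] = {
            w: (counts[w] + 1) / (total_words + len(vocabulary))
            for w in vocabulary
        }
    return priors, word_probs, vocabulary

priors, word_probs, vocabulary = train(training_set)
print(priors["A"])               # 3 / 5
print(word_probs["A"]["hello"])  # (4 + 1) / (6 + 3)
```

With these toy documents, 'hello' occurs 4 times among the 6 words of class 'A', so its probability is 5/9. The word 'goodbye' never occurs in class 'A', yet it still gets 1/9 rather than zero because of the + 1.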

Then, we tackle the test set and figure out which class gets the highest score, which is proportional to its probability. How?
1) Use the prior probability of a class, say 'A'
2) Take each word in the test document, and use the probability of that word in that class ('A') raised to the power of the word's frequency in the test document
3) And, just multiply... The class with the largest product wins.
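
The classification steps can be sketched in Python as well. The numbers below are hypothetical: priors for 3 'A' and 2 'B' documents, and made-up smoothed word probabilities over hello, world and goodbye. Skipping words that are not in the vocabulary is my own assumption about how to handle them.

```python
from collections import Counter

# Hypothetical values for illustration only.
priors = {"A": 3 / 5, "B": 2 / 5}
word_probs = {
    "A": {"hello": 5 / 9, "world": 3 / 9, "goodbye": 1 / 9},
    "B": {"hello": 1 / 7, "world": 2 / 7, "goodbye": 4 / 7},
}

def score(text, label):
    """Class prior times each word's probability raised to its frequency."""
    result = priors[label]
    for word, freq in Counter(text.split()).items():
        # Assumption: words outside the vocabulary are simply skipped.
        if word in word_probs[label]:
            result *= word_probs[label][word] ** freq
    return result

def classify(text):
    """Pick the class whose score is largest."""
    return max(priors, key=lambda label: score(text, label))

print(classify("hello hello world"))  # A
print(classify("goodbye world"))      # B
```

For longer documents the product of many small probabilities can underflow to zero, so in practice people usually add logarithms of the probabilities instead of multiplying them. The winning class stays the same.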