I am watching lectures on Natural Language Processing. In the week 3 lectures, the professor talks about text classification. I found that writing the formulas in words helps.
Given a training set, classify a test set into a class.
We need to calculate two probabilities. Say we have a training set of 5 documents, with 3 documents of class 'A' and 2 documents of class 'B'.
1) Probability of class 'A', given the training set = number of documents classified as 'A' / total number of documents in the training set (here 3/5)
2) Probability of each word in the vocabulary, given a class
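The two estimates can be sketched in Python as below. This is only an illustration: the five toy documents, the function names, and the use of add-one smoothing (so that a word never seen in a class does not get probability zero) are my assumptions, not something taken from the lectures.

```python
from collections import Counter

# Hypothetical toy training set: 3 documents of class 'A', 2 of class 'B'.
training_set = [
    ("good great fun", "A"),
    ("great good good", "A"),
    ("fun great", "A"),
    ("bad boring", "B"),
    ("boring bad bad", "B"),
]


def estimate_priors(docs):
    """P(class) = number of documents in the class / total number of documents."""
    class_counts = Counter(label for _, label in docs)
    total = len(docs)
    return {label: count / total for label, count in class_counts.items()}


def estimate_word_probs(docs):
    """P(word | class) for every vocabulary word.

    Assumption: add-one smoothing, i.e.
    (count of word in class + 1) / (total words in class + vocabulary size).
    """
    word_counts = {}
    for text, label in docs:
        word_counts.setdefault(label, Counter()).update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    probs = {}
    for label, counts in word_counts.items():
        denominator = sum(counts.values()) + len(vocabulary)
        probs[label] = {w: (counts[w] + 1) / denominator for w in vocabulary}
    return probs


priors = estimate_priors(training_set)
word_probs = estimate_word_probs(training_set)
```

With this toy data the prior of 'A' is 3/5, and the word probabilities within each class add up to 1.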
Then we tackle the test set and figure out which class has the maximum (proportional) probability. How?
1) Use the prior probability of a class, say 'A'
2) Take each word in the test set and use the probability of that word in that class ('A') ^ frequency of the word in the test set
3) And just multiply...
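Here is a minimal Python sketch of the scoring steps. The priors and word probabilities are made-up numbers standing in for values estimated from a training set, the names `class_score` and `classify` are hypothetical, and skipping words that were never seen in training is my own simplifying assumption.

```python
from collections import Counter

# Hypothetical numbers, as if already estimated from a training set.
priors = {"A": 0.6, "B": 0.4}
word_probs = {
    "A": {"good": 0.5, "bad": 0.1, "fun": 0.4},
    "B": {"good": 0.2, "bad": 0.6, "fun": 0.2},
}


def class_score(text, label):
    """prior(label) * product of P(word | label) ** frequency of the word in text."""
    score = priors[label]
    for word, freq in Counter(text.split()).items():
        if word in word_probs[label]:  # assumption: ignore words unseen in training
            score *= word_probs[label][word] ** freq
    return score


def classify(text):
    """Return the class with the highest score."""
    return max(priors, key=lambda label: class_score(text, label))


test_document = "good good fun"
scores = {label: class_score(test_document, label) for label in priors}
predicted = classify(test_document)
```

The scores are only proportional to the class probabilities (they do not add up to 1), which is enough for picking the maximum. For long documents the product gets very small, so summing logarithms of the same terms is a common alternative.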