What Makes Naive Bayes Classification So Naive? How Does the Naive Bayes Classifier Work?
If you logically split "Naive Bayes Classification" into its parts, you end up with two terms that make more sense independently:
 Naive
 Bayes' Theorem
So, if we go through these two terms separately and then combine the concepts, Naive Bayes Classification becomes easy to understand, doesn't it?
"Naive"?
 Inexperienced
 Just like a naive (inexperienced) little child who makes some assumptions that are not completely true.
So, what is this assumption all about?
 The "assumption of predictor independence"
 i.e. the presence of a particular feature in a class is unrelated to the presence of any other feature.
How does the Naive Bayes classifier differ from other classifiers?
 A big difference!
 Unlike many other classifiers, which assume or look for some correlation among features, the Naive Bayes classifier completely abandons the concept of correlation.
 Seems illogical, doesn't it? That is exactly why we call it "Naive".
We are in a probabilistic world!
Let's give this "independence assumption" a mathematical shape: if two events A and B are independent, then P(A AND B) = P(A) x P(B)
e.g.
Assumption: if the guy's name is Vito Corleone AND he is from Sicily, then he is The Godfather.
Independence assumption: P(The guy is The Godfather) = P(The guy's name is Vito Corleone) x P(The guy is from Sicily)
Let's say you see a guy, and you are 80% sure that his name is Vito Corleone AND 90% sure that he is from Sicily. So, with our independence assumption, what are the chances that he is The Godfather?
P(The guy's name is Vito Corleone) = 0.8, P(The guy is from Sicily) = 0.9
or
P(The guy is The Godfather) = P(The guy's name is Vito Corleone) x P(The guy is from Sicily)
P(The guy is The Godfather) = 0.8 x 0.9 = 0.72
There is a 72% chance that he is The Godfather! Respect!
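The multiplication above can be sketched in a few lines of Python (using the made-up probabilities from the example):

```python
# Under the naive independence assumption, the joint probability of two
# events is simply the product of their individual probabilities.
p_name = 0.8    # P(the guy's name is Vito Corleone)
p_sicily = 0.9  # P(the guy is from Sicily)

# P(name AND Sicily) under independence
p_godfather = p_name * p_sicily
print(round(p_godfather, 2))  # 0.72
```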
Bayes Theorem?
 Probability of an event given that another event has already occurred.
 Example: the probability of occurrence of Event2, given that another event Event1 has already occurred; or in mathematical language, find P(Event2 | Event1).
Already given information
 P(Event1) = 30%, P(Event2) = 20%
 P(Event1 | Event2), the probability of Event1 given that Event2 has already occurred, = 45%
so
P(Event2 | Event1) = P(Event1 | Event2) x P(Event2) / P(Event1)
P(Event2 | Event1) = 0.45 x 0.2 / 0.3 = 0.30
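The same arithmetic, wrapped in a small Python helper (a sketch, using the numbers from the example above):

```python
def bayes_posterior(p_e1_given_e2, p_e2, p_e1):
    """Bayes' theorem: P(E2 | E1) = P(E1 | E2) * P(E2) / P(E1)."""
    return p_e1_given_e2 * p_e2 / p_e1

# P(Event1 | Event2) = 0.45, P(Event2) = 0.20, P(Event1) = 0.30
posterior = bayes_posterior(p_e1_given_e2=0.45, p_e2=0.20, p_e1=0.30)
print(round(posterior, 2))  # 0.3
```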
Time to combine the concepts of Naive Independent assumption and Bayes Theorem into one
 Let's forecast rain for tomorrow
 Predictors / independent variables: Humidity (H), Atmospheric Pressure (AP)
 Probability that rain will happen ( Rain = Yes ) if
 Humidity (H) = High
 Atmospheric Pressure (AP) = Low

According to Bayes' Theorem
P(Rain = Yes | H = High, AP = Low) =
P(H = High, AP = Low | Rain = Yes) x P(Rain = Yes) / P(H = High AND AP = Low)
According to the independence assumption of predictors
P(H = High AND AP = Low) can also be written as P(H = High) x P(AP = Low)
and
P(H = High, AP = Low | Rain = Yes) can also be written as P(H = High | Rain = Yes) x P(AP = Low | Rain = Yes)
After combining both concepts
P(Rain = Yes | H = High, AP = Low) =
P(H = High | Rain = Yes) x P(AP = Low | Rain = Yes) x P(Rain = Yes) / ( P(H = High) x P(AP = Low) )
Similarly, we can calculate the other options, e.g. P(Rain = No | H = High, AP = Low) or P(Rain = Yes | H = Low, AP = Low) or P(Rain = Yes | H = Low, AP = High) and so on.
 The option with the maximum probability is the class the classifier predicts.
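Here is a minimal Python sketch of that argmax step. Note that every prior and likelihood value below is invented purely for illustration; in practice they would be estimated from training data:

```python
# Hypothetical probabilities (NOT from a real dataset).
priors = {"Yes": 0.3, "No": 0.7}   # P(Rain = cls)
likelihoods = {                    # P(feature | Rain = cls)
    "Yes": {"H=High": 0.8, "AP=Low": 0.7},
    "No":  {"H=High": 0.4, "AP=Low": 0.2},
}

def nb_score(cls, features):
    # Numerator of Bayes' theorem: P(cls) x product of P(feature | cls).
    # The denominator P(H=High) x P(AP=Low) is identical for every class,
    # so it can be dropped when we only need the argmax.
    score = priors[cls]
    for f in features:
        score *= likelihoods[cls][f]
    return score

observed = ["H=High", "AP=Low"]
prediction = max(priors, key=lambda cls: nb_score(cls, observed))
print(prediction)  # "Yes": 0.3 x 0.8 x 0.7 = 0.168 beats 0.7 x 0.4 x 0.2 = 0.056
```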
How does the Naive Bayes classifier perform so well with a wrong assumption?
 The predictor-independence assumption constrains the model significantly, which as a result makes it much less prone to getting stuck in local minima.
 Since the predictors are treated as independent, interactions among them are not modeled, so it needs relatively little training data. As a result, it performs well even with small datasets and with missing data.
 Because it employs a very simple hypothesis function, it exhibits very high bias but relatively low variance, which prevents it from overfitting its training data.
 It's not very sensitive to irrelevant features.
When should you use the Naive Bayes classifier?
 It performs magnificently in multiclass prediction.
 It also works very well in text classification:
 Spam filtering
 Sentiment analysis
 Recommendation systems: the Naive Bayes classifier along with collaborative filtering makes a great recommendation system.
 You may prefer it when you need short model training time. It's fast. So fast.
 Unlike neural networks and many other classifiers, it is not a black box. It is easy to understand and interpret.
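To make the text-classification point concrete, here is a minimal from-scratch sketch of a naive Bayes spam filter with Laplace smoothing. The four training messages are invented toy data, not a real corpus:

```python
import math
from collections import Counter

# Toy training corpus (invented for illustration).
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

vocab = {word for text, _ in train for word in text.split()}
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

def log_posterior(text, label):
    # log P(label) + sum over words of log P(word | label),
    # with Laplace (+1) smoothing so an unseen word never zeroes out a class.
    total_words = sum(word_counts[label].values())
    lp = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in text.split():
        lp += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return lp

def classify(text):
    # Working in log space turns the naive product of probabilities into a
    # sum, which avoids floating-point underflow on long messages.
    return max(("spam", "ham"), key=lambda label: log_posterior(text, label))

print(classify("free money"))  # spam
```

Training here is just counting words per class, which is why naive Bayes is so fast compared to iterative learners.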
Since Naive Bayes Classification is based on a probabilistic approach, the concept and its applications are never going to fade away.
Don't you agree? I'd much appreciate it if you let me know your thoughts in the comments below. Please do share the post with your friends as well!
Comments
 Q: Can we use Naive Bayes for regression?
 A: In regression models, I believe you can just use the Bayesian evaluation as an additional feature.
 Q: Should I make any additional assumption about the distribution, such as normality?
 Comment: Excellent explanation! For Naive Bayes, for the first time ever I see my logic working in a sequential way.
 Comment: Explanation using Vito Corleone? Respect!