Bayes' rule
Bayes' rule is one of the most important formulas.
The formula in English:
Bayes' rule: the probability of a certain class given a datapoint (or document) is equal to the probability of the datapoint given the class, multiplied by the probability of the class, divided by the probability of the datapoint/document.
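In symbols, writing c for the class and d for the datapoint/document:

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$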
In order to assign a class to a data point we consider 3 things:
- p(d|c) The probability of the datapoint given a certain class. The likelihood.
- p(c) The probability that a certain class appears (regardless of the datapoint). The prior.
- p(d) The probability that a certain datapoint appears (regardless of the class). The evidence.
- You can also think of p(c) as what we expect about the class before seeing any data. In some situations certain classes are expected to occur much more often. This is the prior: your preconceptions.
- You can also think of p(d|c) as what the evidence says about how likely a class is based on a datapoint. This is the likelihood: it encodes how likely it is that the instance was generated by a given class.
This serves as a model of how to think. You have your bias, what you expect: this is p(c), the prior. You also have the evidence in front of you. If you are overly convinced of something you might discard the evidence, but if you only look at the evidence (the frequentist approach) you discard the important factor of what you already believe.
So when people tried to show that the sun is at the centre of the solar system, the prior belief that the earth was at the centre was so strong that the evidence was almost not enough.
The result, p(c|d), is how likely it is that you have an example of class c given the instance you are seeing. This is the posterior, and it is what we are after: it is what you get when you run the formula.
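A quick worked example may make this concrete; the spam-filter numbers below are made up purely for illustration:

```python
# Hypothetical spam-filter numbers (invented for illustration):
# 20% of emails are spam, the word "free" appears in 60% of spam
# emails, and "free" appears in 16% of all emails.
p_c = 0.20          # p(c):   prior probability of the class "spam"
p_d_given_c = 0.60  # p(d|c): likelihood of seeing "free" given spam
p_d = 0.16          # p(d):   overall probability of seeing "free"

# Bayes' rule: posterior = likelihood * prior / evidence
p_c_given_d = p_d_given_c * p_c / p_d
print(p_c_given_d)  # ~0.75 -> an email containing "free" is spam with probability 3/4
```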
Naive Bayes then works by calculating:
$$\hat{c} = \text{argmax}_{c \in C}\; P(c) \prod_{f \in F} P(f \mid c)$$
So you take as your prediction the class that maximizes the posterior, using only the top part of the Bayes calculation: the prior times the product of the probabilities of the features given the class.
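A minimal sketch of this decision rule in Python, assuming toy probability tables (in practice the priors and likelihoods are estimated from training counts):

```python
from math import prod

# Toy tables, invented for illustration; in a real classifier these
# come from counting class and feature frequencies in training data.
priors = {"spam": 0.2, "ham": 0.8}               # p(c)
likelihoods = {                                  # p(f|c) per feature f
    "spam": {"free": 0.60, "meeting": 0.05},
    "ham":  {"free": 0.05, "meeting": 0.40},
}

def predict(features):
    # argmax over classes c of p(c) * product of p(f|c)
    return max(priors, key=lambda c: priors[c] * prod(likelihoods[c][f] for f in features))

print(predict(["free"]))     # spam: 0.2*0.60 = 0.12  beats  ham: 0.8*0.05 = 0.04
print(predict(["meeting"]))  # ham:  0.8*0.40 = 0.32  beats  spam: 0.2*0.05 = 0.01
```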
Because you multiply a lot of probabilities, all of them at most 1, the product quickly becomes extremely small, and floating-point arithmetic underflows it to zero.
So instead we can move from probability space to log space, and when we do that the multiplication becomes a sum:
$$\hat{c} = \text{argmax}_{c \in C}\; \log P(c) + \sum_{f \in F} \log P(f \mid c)$$
You can leave out the division by p(d) because the denominator is the same for every class, so it does not change which class wins the argmax. The independence assumption is what justifies the other step: writing the likelihood of the whole datapoint as a product over its individual features.
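A short sketch of why the log-space move matters: multiplying many small probabilities underflows to 0.0 in floating point, while summing their logs stays representable. The likelihood values here are arbitrary, chosen only to trigger the underflow:

```python
from math import log

probs = [1e-5] * 100   # 100 small feature likelihoods

product = 1.0
for p in probs:
    product *= p
print(product)         # 0.0 -- the true value 1e-500 underflows the float range

log_sum = sum(log(p) for p in probs)
print(log_sum)         # ~-1151.3 -- the same quantity, safely represented in log space
```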
Although I have tried my best to make sure this summary is correct, I take no responsibility for mistakes that might lead to you getting a lower grade.
If you see anything that you think might be wrong, please create an issue on the GitHub repository or, even better, create a pull request 😄
If you appreciate my summaries and want to thank me, you can support me here:
Every model is wrong, but some models are useful.