# How to implement Naive Bayes from scratch with Python

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=TLInuAorxqE
- **Date:** 17.09.2022
- **Duration:** 14:37
- **Views:** 44,967

## Description

In the 6th lesson of the Machine Learning from Scratch course, we will learn how to implement the Naive Bayes algorithm.

You can find the code here: https://github.com/AssemblyAI-Examples/Machine-Learning-From-Scratch

Previous lesson: https://youtu.be/kFwe2ZZU7yw
Next lesson: https://youtu.be/Rjr62b_h7S4

Welcome to the Machine Learning from Scratch course by AssemblyAI.
Thanks to libraries like Scikit-learn we can use most ML algorithms with a couple of lines of code. But knowing how these algorithms work inside is very important. Implementing them hands-on is a great way to achieve this. 

And most of them are easier to implement than you'd think.

In this course, we will learn how to implement these 10 algorithms.
We will quickly go through how the algorithms work and then implement them in Python using the help of NumPy.

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬

🖥️ Website: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=scratch06
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️  Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#MachineLearning #DeepLearning

## Contents

### [0:00](https://www.youtube.com/watch?v=TLInuAorxqE) Introduction

Welcome to another video of the Machine Learning from Scratch course, presented by AssemblyAI. In this series we implement popular machine learning algorithms using only built-in Python functions and NumPy. In this lesson we learn about Naive Bayes. As always, we start with a short theory section and then we jump to the code, so let's get started. Naive Bayes is a probabilistic classifier

### [0:20](https://www.youtube.com/watch?v=TLInuAorxqE&t=20s) Naive Bayes

based on applying Bayes' theorem with strong (also called naive) independence assumptions between the features. So let's learn about Bayes' theorem first.

### [0:34](https://www.youtube.com/watch?v=TLInuAorxqE&t=34s) Bayes' Theorem

It says that the probability of an event A given another event B can be calculated as the probability of B given A, times the probability of A, divided by the probability of B. If we transfer this to our case with class labels and features, then the probability of y given X is the probability of X given y, times the probability of y, divided by the probability of X:

P(y|X) = P(X|y) · P(y) / P(X)

Here y is the class label we want to predict and X is the feature vector.

Then we make the assumption that the features are mutually independent. For example, if we want to predict whether someone takes the bus or walks, and we have two features, whether it's raining and the distance to the destination, then we assume these two features are independent. In reality this is often not the case, but the assumption still works really well for this classifier; that's also why we call it a naive assumption. With this assumption we can split P(X|y) into a product of the single feature vector components, P(x1|y) · P(x2|y) · …, and these per-feature probabilities are much easier to set up. So now we want to select the class label y.
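
To make Bayes' theorem concrete, here is a tiny sketch of the bus-or-walk example from the transcript. All probability values below are made up purely for illustration; they are not from the video.

```python
# Bayes' theorem on the bus-vs-walk example.
# All probabilities here are invented illustration values.
p_rain_given_bus = 0.8   # P(B|A): chance of rain given that someone takes the bus
p_bus = 0.3              # P(A): prior probability of taking the bus
p_rain = 0.4             # P(B): overall probability of rain

# P(A|B) = P(B|A) * P(A) / P(B)
p_bus_given_rain = p_rain_given_bus * p_bus / p_rain
print(p_bus_given_rain)  # 0.6
```

So under these made-up numbers, observing rain raises the probability of taking the bus from the prior 0.3 to a posterior of 0.6.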

### [2:04](https://www.youtube.com/watch?v=TLInuAorxqE&t=124s) Select class with highest posterior probability

We want to select the class with the highest posterior probability. P(y|X) is also called the posterior, and this is the formula we've just seen. So we select y as the argmax of the posterior, and then we can simplify a little. First we can get rid of P(X), because it does not depend on y at all, so we just throw it away. Then we apply a little trick: all these probabilities are values between 0 and 1, and if we multiply them the number can become very small and we can run into inaccuracies. So instead of the product we apply the logarithm, which turns the product into a sum. This gives the final formula for y: the argmax over y of log(P(x1|y)) + … + log(P(xn|y)) + log(P(y)).

Now we need to know how to calculate P(y) and P(xi|y); these are called the prior and the class conditional. P(y), the prior probability, is simply the frequency of each class: we count how often each class label occurs. P(xi|y), the class conditional probability, we model with a Gaussian distribution, whose formula involves the mean and the standard deviation (or, squared, the variance). A plot of Gaussian distributions for different means and standard deviations shows why this is often a good choice for modeling probabilities.

This is all we need to code it up, so let's summarize the steps. In the training step we calculate the mean, the variance, and the prior (the class frequency) for each class from our training set. In the prediction step we calculate the posterior for each class with the formula we've just seen, plugging the Gaussian formula in for those probabilities, and then we simply choose the class with the highest posterior probability. That's it, so let's jump to the code.

First we import NumPy, of course, and create a class that we call NaiveBayes. We don't need an __init__ function here, because there are no parameters to configure. Instead we want a fit method, which gets the training samples and the training labels, and a predict method with self and the test samples.

Let's start with fit. First we get the number of samples and the number of features. We assume X and y are already NumPy ndarrays, so the data is in the correct format and we can extract this with X.shape (with a lowercase s). Then we get the unique classes and store them in self._classes = np.unique(y), and the number of different classes as len(self._classes). The first thing we want to do is calculate the mean, the variance, and the prior for each class, so we initialize these with zeros: self._mean = np.zeros with shape (n_classes, n_features), and with dtype np.float64 (this is the default, but it makes clear that we work with floats here). We copy this for the variance, self._var, and for the priors, self._priors, where we only want n_classes entries, one prior per class. Now we calculate them: for idx, c in enumerate(self._classes), we take only the samples of this class, X_c = X[y == c], and assign the mean, variance, and prior. self._mean[idx, :] (the current index for this class, and all columns, i.e. all features) is simply X_c.mean(axis=0); we do the same for the variance with X_c.var (these are built-in NumPy functions); and for the prior, self._priors[idx] is X_c.shape[0] (how many samples this class has) divided, as a float, by the total number of samples. That's all we have to do in the fit method.

Let's go on with predict. Here y_pred equals a list comprehension that calls a helper function _predict on each single feature vector: self._predict(x) for x in X, and we return this as np.array(y_pred). Then we create _predict, which gets self and only one small x, a single sample. Here we want to calculate the posteriors, so we initialize an empty list and write a comment: calculate the posterior probability for each class. For idx, c in enumerate(self._classes), the same enumeration as before, we first calculate the prior. Looking back at the formula, we have the logarithm of the prior plus the logs of all the class conditionals. So prior = np.log(self._priors[idx]), since we already calculated the priors, and for the posterior we take np.sum over np.log of the Gaussian values; for this we create a helper function that we call self._pdf, for probability density function, which gets the class index and x. Then we add the prior, posterior = posterior + prior, and append it to the list with posteriors.append(posterior). In the end we return the class with the highest posterior: return self._classes[np.argmax(posteriors)]. That's all we need for predict.

Now we only need the probability density function. We define _pdf, which gets self, the class index, and x. Looking briefly at the Gaussian formula again, with the means and the variances, we split it into numerator and denominator. First we get the mean, mean = self._mean[class_idx], and the variance, var = self._var[class_idx]. The numerator is np.exp of minus (x minus the mean) to the power of 2, divided by 2 times the variance, with parentheses set accordingly. The denominator is np.sqrt of 2 times np.pi times the variance. Then we return the numerator divided by the denominator, and we are done.

Now we can test this. I already prepared some code for testing; you can also find the whole code on GitHub. We import datasets and train_test_split from scikit-learn, add a helper function to calculate the accuracy, call datasets.make_classification to create a toy dataset with 1,000 samples, 10 features, and two classes, and split it into training and testing sets. Then we create our NaiveBayes classifier, call fit with the training samples, call predict with the test samples, and calculate the accuracy by comparing y_test and the predictions. When we run this, we see the accuracy is 96.5%, so it works pretty well. That's all, I hope you enjoyed this, and I hope to see you in the next lesson.
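
The underflow problem that motivates the log trick in the lecture is easy to demonstrate. The sketch below is illustrative only: it uses a made-up vector of 200 small per-feature probabilities rather than values from a real classifier.

```python
import numpy as np

# 200 made-up per-feature probabilities of 0.01 each.
probs = np.full(200, 0.01)

# Multiplying them directly underflows float64: 0.01**200 = 1e-400,
# far below the smallest representable positive double (~5e-324).
product = np.prod(probs)

# Summing the logs keeps the value in a perfectly safe range.
log_sum = np.sum(np.log(probs))

print(product)   # 0.0 (underflow)
print(log_sum)   # 200 * log(0.01), about -921.03
```

Since argmax only compares the posteriors, taking the logarithm changes none of the predictions while avoiding this underflow.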

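Putting the whole walkthrough together, here is a self-contained sketch of the classifier as described. Names such as `X_c` and `_pdf` follow the spoken description; the demo at the end uses a small hand-rolled Gaussian dataset instead of the video's scikit-learn `make_classification` call, so that only NumPy is needed.

```python
import numpy as np


class NaiveBayes:
    # No __init__ needed: the model has no hyperparameters to configure.

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)

        # One row of statistics per class: mean, variance, and prior frequency.
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        for idx, c in enumerate(self._classes):
            X_c = X[y == c]                      # only the samples of class c
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            self._priors[idx] = X_c.shape[0] / float(n_samples)

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        # Calculate the log-posterior for each class, then pick the largest.
        posteriors = []
        for idx, c in enumerate(self._classes):
            prior = np.log(self._priors[idx])
            posterior = np.sum(np.log(self._pdf(idx, x))) + prior
            posteriors.append(posterior)
        return self._classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        # Gaussian probability density for each feature of x.
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-((x - mean) ** 2) / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator


# Quick demo on a synthetic two-class dataset (a stand-in for the video's
# make_classification call, so this file needs only NumPy).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-2.0, scale=1.0, size=(200, 4))
X1 = rng.normal(loc=2.0, scale=1.0, size=(200, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

nb = NaiveBayes()
nb.fit(X, y)
preds = nb.predict(X)
accuracy = np.mean(preds == y)
print(f"accuracy: {accuracy:.3f}")  # well-separated classes, so near 1.0
```

Note the division by P(X) never appears in the code: since it is the same for every class, dropping it does not change which class wins the argmax.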
---
*Source: https://ekstraktznaniy.ru/video/12964*