# How to implement KNN from scratch with Python

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=rTEtEy5o3X0
- **Date:** 11.09.2022
- **Duration:** 9:23
- **Views:** 126,290

## Description

In the first lesson of the Machine Learning from Scratch course, we will learn how to implement the K-Nearest Neighbours algorithm. Being one of the simpler ML algorithms, it is a great way to kick off our deep dive into ML algorithms.

You can find the code here: https://github.com/AssemblyAI-Examples/Machine-Learning-From-Scratch

Previous lesson: https://youtu.be/p1hGz0w_OCo
Next lesson: https://youtu.be/ltXSoduiVwY

Welcome to the Machine Learning from Scratch course by AssemblyAI.
Thanks to libraries like Scikit-learn, we can use most ML algorithms with a couple of lines of code. But knowing how these algorithms work inside is very important, and implementing them hands-on is a great way to achieve this.

And mostly, they are easier to implement than you'd think.

In this course, we will learn how to implement these 10 algorithms.
We will quickly go through how the algorithms work and then implement them in Python using the help of NumPy.

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬

🖥️ Website: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=scratch01
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️  Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Are KNN and K-means the same thing?
No. KNN is a supervised learning algorithm whereas K-means is a clustering algorithm. 

#MachineLearning #DeepLearning

## Contents

### [0:00](https://www.youtube.com/watch?v=rTEtEy5o3X0) Segment 1 (00:00 - 05:00)

The first algorithm we're going to look into is KNN, or k-nearest neighbors. How KNN works: given a data point, you calculate that data point's distance from all the other data points in your dataset, and then you take the closest k points. This k is a hyperparameter that the user determines. In regression, you get the result by averaging the values of the k nearest neighbors; in classification, you get the label of the data point by a majority vote of the k nearest neighbors. Let's see this on an example. Say the green values we have here are one group and the red values are another group, and then we have a new data point, the yellow point. What we do is compute the distance of the yellow point to all the other data points in our dataset and take the k closest ones; let's say for this example k is 3. In this case it's classification, so we take the majority vote: all three neighbors are green, which means this point also needs to be green.

So let's see how we can implement this algorithm in Python. All right, let's start building the KNN algorithm. I'm going to make it a class, and in the initialization function I of course need to pass self. Since this is a k-nearest neighbors algorithm, k is determined when the model is created, so I also pass a k value; for now the default value for k is 3, and we store k. This class is going to have a fit function and a predict function. In the fit function we don't really need to do much: basically all we have to do is keep the X and y datasets, which I of course also need to pass to the fit function. The predict function is where we do all the calculations: computing the distance between a data point and all the other data points, finding the closest ones, and then getting the prediction from that. To the predict function we pass the test dataset, that is, the data points you want predictions for. What I'm going to do is create a helper function, another predict function, that takes a single data point. In predict, I say the predictions are the result of calling this helper function for each of the examples in the dataset passed in, and then I return those predictions. In the helper function I calculate the distance of this little x, one single data point, to all the points in our X_train, and then return the label based on the k nearest neighbors. The main things I need to do here are to compute the distances, then get the closest k, and finally determine the label by majority vote. To compute the distance I'm going to use the Euclidean distance, so let's create a euclidean_distance global function that, given two arrays, gives us the distance between them: the NumPy square root of the NumPy sum of (x1 - x2) squared. Of course I also need to import NumPy for this distance, and then I return the distance. So here I calculate the distances: each distance is between the x that was passed to this function and each value in self.X_train. From there I'm going to use argsort from NumPy on the distances, and after they're sorted I take the first k
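Put together, the class narrated in this segment can be sketched roughly as follows. The names (`euclidean_distance`, `_predict`, `X_train`) follow the video's narration; the exact code is in the linked repository, so treat this as an approximation:

```python
from collections import Counter

import numpy as np


def euclidean_distance(x1, x2):
    # Straight-line distance between two feature vectors
    return np.sqrt(np.sum((x1 - x2) ** 2))


class KNN:
    def __init__(self, k=3):
        # k is the hyperparameter: how many neighbors get a vote
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "fitting" just stores the training data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # Predict a label for every sample in the test set
        return [self._predict(x) for x in X]

    def _predict(self, x):
        # Distance from x to every training point
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # argsort gives the indices that would sort the distances;
        # the first k are the k closest training points
        k_indices = np.argsort(distances)[: self.k]
        # Labels of those neighbors
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Majority vote: most_common(1) returns [(label, count)]
        return Counter(k_nearest_labels).most_common(1)[0][0]
```

On a toy two-cluster dataset, a point near the first cluster gets that cluster's label, which matches the yellow-point example from the start of the lesson.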

### [5:00](https://www.youtube.com/watch?v=rTEtEy5o3X0&t=300s) Segment 2 (05:00 - 09:00)

of these distances, or rather their indices. What argsort does is basically tell you where the indices from the original array would end up after sorting, so when you take the first k you're effectively getting the indices of the k closest neighbors of the data point we're working with. That gives me the indices, and then I get their labels, the nearest labels, from y_train for each index in the closest indices. To get the most common class label I'm going to use the Counter data structure from the collections library; it's just going to make things a bit easier for us. I take the k nearest labels and ask for the most common one, and then basically all I need to do is return this most common label.

So let's see if everything works as intended. I've already imported the iris dataset from scikit-learn, so let's see what the dataset looks like first. All right, it looks like there are three separate clusters of labels. The next thing I want to do is create a classifier with KNN, and of course I need to import KNN here since I just created it: from knn import KNN. What we need to pass it is the k value; let's say 5 for now. Then we call the fit function on X_train and y_train, and then we make predictions by passing X_test, which gives me some predictions. But let's see what these predictions look like first. All right, this is one prediction that we get. As you might remember, we are getting it from the Counter's most_common function, and what it returns is a list of (label, count) pairs: how many times each label occurred and what the name of the label is. Instead of that, of course, we need to return only the name of the label and nothing else, so I'm going to have to select the first element of the list and then the first value inside that tuple, and that gives me the labels. So let's run this again and see. Okay, now it's giving me actual labels: either 0, 1, or 2. Now I also want to calculate the accuracy to see if it's working well or not, and that is actually quite easy to do. I'll just say accuracy is the count of how many times the predictions are the same as y_test, divided by the number of data points in y_test, and then we print this. Let's see: 0.96. That's pretty good already for something we implemented in, what, 10 minutes or so. That's great; it means our KNN is working. Don't forget that you can get this code through our GitHub repository; the link is in the description. If you have any questions, leave a comment. I hope you liked this video, and I will see you in the next lesson.
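As a small illustration of the two details discussed in this segment: `Counter.most_common(1)` returns a list of `(label, count)` tuples, which is why the code indexes `[0][0]` to get just the label, and accuracy is simply the fraction of predictions that match the true labels. The toy arrays below are made up for illustration, not taken from the iris run in the video:

```python
from collections import Counter

import numpy as np

# most_common(1) returns a list with one (label, count) tuple,
# so [0][0] extracts just the label itself
k_nearest_labels = [1, 0, 1, 1, 2]
vote = Counter(k_nearest_labels).most_common(1)
print(vote)        # [(1, 3)]
label = vote[0][0]
print(label)       # 1

# Accuracy: fraction of predictions equal to the true labels
predictions = np.array([0, 1, 2, 1, 0])
y_test = np.array([0, 1, 2, 2, 0])
accuracy = np.sum(predictions == y_test) / len(y_test)
print(accuracy)    # 0.8
```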

---
*Source: https://ekstraktznaniy.ru/video/13009*