Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebyshev Distance, Jaccard Distance & Cosine Similarity: Solved Examples by Vidya Mahesh Huddar
The following distance measures are used in machine learning and data mining:
Euclidean Distance
Manhattan Distance
Minkowski Distance
Chebyshev Distance
Cosine Similarity
Jaccard Distance
Table of contents (2 segments)
Segment 1 (00:00 - 05:00)
Welcome back. In this video, we are going to understand distance measures in machine learning with simple solved examples. Distance measures are used to find how similar or different data points are. They play a very important role in clustering, classification, and recommendation systems. The smaller the distance, the more similar the data points. The following distance measures are used in machine learning and data mining: Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance, cosine similarity, and Jaccard distance. We will work through all of these one by one.
So, first we'll consider the Euclidean distance. Euclidean distance is the shortest straight-line distance between two data points. If A(x1, y1) and B(x2, y2) are our two data points, then the distance between A and B is d(A, B) = sqrt((x2 - x1)^2 + (y2 - y1)^2). To understand this, we will take a simple example with two data points, A(2, 3) and B(5, 7). The distance between A and B is sqrt((5 - 2)^2 + (7 - 3)^2). 5 - 2 is 3 and 7 - 3 is 4, so that is sqrt(3^2 + 4^2) = sqrt(9 + 16) = sqrt(25), which is equal to 5. This is the distance using the Euclidean distance method.
So, next we'll consider the Manhattan distance. Manhattan distance is the sum of the absolute differences between the coordinates of the points. If A(x1, y1) and B(x2, y2) are the two data points, then the Manhattan distance is d(A, B) = |x2 - x1| + |y2 - y1|. To understand this, we'll take one example with A(2, 3) and B(5, 7).
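As a minimal sketch of the Euclidean and Manhattan formulas stated so far, the following Python snippet evaluates both for A(2, 3) and B(5, 7). The function names and the choice to represent points as tuples are assumptions made for illustration; they are not from the video.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of the absolute differences between corresponding coordinates."""
    return sum(abs(q - p) for p, q in zip(a, b))


# Points used throughout the video's examples.
A = (2, 3)
B = (5, 7)

print(euclidean_distance(A, B))  # 5.0
print(manhattan_distance(A, B))  # 7
```

Both functions iterate over paired coordinates, so the same sketch should also work for points with more than two dimensions, provided both points have the same length.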
So, the Manhattan distance between A and B is |5 - 2| + |7 - 3|. 5 - 2 is 3 and 7 - 3 is 4, so 3 + 4 is equal to 7.
So, next we'll consider the Minkowski distance. Minkowski distance is a generalized distance measure that includes the Manhattan and Euclidean distances as special cases (p = 1 and p = 2, respectively). If A(x1, y1) and B(x2, y2) are the two data points, then the Minkowski distance is d(A, B) = (|x2 - x1|^p + |y2 - y1|^p)^(1/p). To understand this, we'll take one example. Here A(2, 3) and B(5, 7) are given with p = 3. So the distance is (|5 - 2|^3 + |7 - 3|^3)^(1/3), because here p is 3. That is (27 + 64)^(1/3) = 91^(1/3). Once you simplify this, we get the Minkowski distance between A and B as approximately 4.498.
So, next we will consider the Chebyshev distance. Chebyshev distance is the maximum absolute difference between the corresponding coordinates of two points. To calculate it, we use the formula d(A, B) = max(|x2 - x1|, |y2 - y1|). Here is an example with A(2, 3) and B(5, 7). The distance between A and B is the maximum of |5 - 2| and |7 - 3|. 5 - 2 is 3 and 7 - 3 is 4, and between 3 and 4 the maximum value is 4. So, the Chebyshev distance between A and B is 4.
Next, we will consider the cosine similarity. Cosine similarity is a measure that calculates the similarity
Segment 2 (05:00 - 08:00)
between two vectors based on the angle between them. The formula we use is: cosine similarity of A and B = (A · B) / (length of A × length of B), that is, the dot product of A and B divided by the product of their lengths. To understand this, we'll take one simple example with A(2, 3) and B(5, 7). The dot product A · B is 2 × 5 + 3 × 7 = 31. The length of a vector is the square root of the sum of the squares of its coordinates, so the length of A is sqrt(2^2 + 3^2) = sqrt(13) and the length of B is sqrt(5^2 + 7^2) = sqrt(74). So the cosine similarity is 31 / (sqrt(13) × sqrt(74)). Once you simplify this, we get the cosine similarity between A and B as approximately 0.9995.
Finally, we'll consider the Jaccard distance. Jaccard distance measures the dissimilarity between two sets based on their intersection and union. It is calculated as one minus the Jaccard similarity, which is the ratio of the size of the intersection to the size of the union of the sets. If A and B are two sets, then the Jaccard similarity between A and B is |A ∩ B| / |A ∪ B|, the cardinality of the intersection of A and B divided by the cardinality of the union of A and B. Here we have the sets A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. First, we need to find the Jaccard similarity. A ∩ B is the set of common elements present in both sets; here, the common elements are 3 and 4. A ∪ B is the set of all elements present in A or B; those are 1, 2, 3, 4, 5, and 6. So, if you count, the cardinality of A ∩ B is 2 and the cardinality of A ∪ B is 6. So, 2 divided by 6 is approximately 0.333.
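The cosine similarity and Jaccard similarity calculations above can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch assumes non-zero vectors and a non-empty union; otherwise the divisions would fail.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths.

    Assumes neither vector is all zeros (their lengths must be non-zero).
    """
    dot = sum(p * q for p, q in zip(a, b))
    length_a = math.sqrt(sum(p * p for p in a))
    length_b = math.sqrt(sum(q * q for q in b))
    return dot / (length_a * length_b)


def jaccard_similarity(s, t):
    """Size of the intersection divided by the size of the union.

    Assumes at least one of the sets is non-empty.
    """
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)


print(round(cosine_similarity((2, 3), (5, 7)), 4))               # 0.9995
print(round(jaccard_similarity({1, 2, 3, 4}, {3, 4, 5, 6}), 3))  # 0.333
```

Unlike the distance measures, both values here are similarities: a value closer to 1 means the vectors or sets are more alike.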
So, once you have the Jaccard similarity, the Jaccard distance is one minus the Jaccard similarity, which is 1 - 0.333 = 0.667. In this video, we have covered different distance measures with solved examples, step by step. I hope the concept is clear. If you like the video, please like, share, and subscribe. Press the bell icon for regular updates. Thank you for watching.
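As a recap, here is a small Python sketch of the Minkowski, Chebyshev, and Jaccard distance calculations from the video. The function names are hypothetical, points are assumed to be equal-length tuples, and the sets are assumed to have a non-empty union. The sketch also checks that p = 1 and p = 2 reproduce the Manhattan and Euclidean results.

```python
def minkowski_distance(a, b, p):
    """Sum of |difference|^p over coordinates, raised to the power 1/p."""
    return sum(abs(q - r) ** p for r, q in zip(a, b)) ** (1 / p)


def chebyshev_distance(a, b):
    """Largest absolute difference between corresponding coordinates."""
    return max(abs(q - r) for r, q in zip(a, b))


def jaccard_distance(s, t):
    """One minus the Jaccard similarity (intersection size over union size)."""
    s, t = set(s), set(t)
    return 1 - len(s & t) / len(s | t)


A = (2, 3)
B = (5, 7)

print(round(minkowski_distance(A, B, 3), 3))  # 4.498
print(minkowski_distance(A, B, 1))            # 7.0 (Manhattan case)
print(minkowski_distance(A, B, 2))            # 5.0 (Euclidean case)
print(chebyshev_distance(A, B))               # 4
print(round(jaccard_distance({1, 2, 3, 4}, {3, 4, 5, 6}), 3))  # 0.667
```

As p grows, the Minkowski distance approaches the Chebyshev distance, which is why the p = 3 result (about 4.498) already sits close to 4.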