As previously discussed in the post on the Pythagorean theorem and Euclidean distance (link to blog), those methods are computationally expensive because they involve squaring and taking square roots. This raises the question of whether there is another way to compute the distance. Here we answer that question with a derivation and some examples.
Let’s say we have two data points, B(x1, y1) and C(x2, y2), and we find the distance between them using the Euclidean distance or the Pythagorean theorem.
But why are we calculating the distance between these two points in the first place? The answer is: to see how similar the two points are. In that case, can’t we measure the angle between them instead, the angle COB, where O is the origin? The smaller the angle, the closer the points are and the more similar they are, and vice versa. But this analogy does not hold everywhere; let’s look at another example.
Here the angle COB is the same as before, but the distance between the points is much larger. What causes this change? The distance of the points from the origin. As long as the points are at the same distance from the origin, the analogy with the angle works fine.
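To make this concrete, here is a small Python sketch. The points and the helper names angle_at_origin_deg and euclidean are made up for illustration; they are not taken from the figures. It shows two pairs of points with the same angle at the origin but very different distances between them:

```python
import math

def angle_at_origin_deg(p, q):
    """Angle between the lines O-p and O-q, in degrees (0 to 180)."""
    diff = abs(math.degrees(math.atan2(q[1], q[0]) - math.atan2(p[1], p[0])))
    # Fold angles above 180 degrees back into the 0..180 range.
    return 360.0 - diff if diff > 180.0 else diff

def euclidean(p, q):
    """Straight-line distance between p and q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical points: B and C both lie at distance 1 from the origin.
B = (1.0, 0.0)
C = (0.0, 1.0)
# C_far points in the same direction as C but is 10 times farther away.
C_far = (0.0, 10.0)

print(round(angle_at_origin_deg(B, C), 2), round(euclidean(B, C), 2))
print(round(angle_at_origin_deg(B, C_far), 2), round(euclidean(B, C_far), 2))
```

Both lines print the same angle, while the second distance is much larger, which is why the angle alone is only a fair measure when the points are equally far from the origin.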
Finding angle COB:
Let’s look back at the figure for reference:
Let’s assume:
angle COD = θ2 and angle BOA = θ1
According to the figure,
angle COB = θ2 - θ1, so the value we want is cos(angle COB) = cos(θ2 - θ1) (we choose cos(θ) because angles COD and BOA are formed by the base and the hypotenuse, and cos(θ) = base/hypotenuse)
and cos(θ2 - θ1) = cos(θ2).cos(θ1) + sin(θ2).sin(θ1) (because cos(A - B) = cos(A)cos(B) + sin(A)sin(B))
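If you want to convince yourself of this identity numerically, a quick Python check looks like the sketch below. The two angles (20 and 65 degrees) are arbitrary values chosen for illustration:

```python
import math

# Arbitrary example angles, converted to radians for the math module.
theta1 = math.radians(20.0)
theta2 = math.radians(65.0)

# Left side: cosine of the angle difference.
lhs = math.cos(theta2 - theta1)
# Right side: the expanded form cos(A)cos(B) + sin(A)sin(B).
rhs = math.cos(theta2) * math.cos(theta1) + math.sin(theta2) * math.sin(theta1)

# The two sides agree up to floating-point rounding.
print(math.isclose(lhs, rhs))
```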
Getting the unknown values:
cos(θ2) = x2 / √(x2² + y2²) (base/hypotenuse in triangle COD)
cos(θ1) = x1 / √(x1² + y1²) (base/hypotenuse in triangle BOA)
sin(θ2) = y2 / √(x2² + y2²) (perpendicular/hypotenuse in triangle COD)
sin(θ1) = y1 / √(x1² + y1²) (perpendicular/hypotenuse in triangle BOA)
Putting the values back:
cos(angle COB) = [x2 / √(x2² + y2²)] . [x1 / √(x1² + y1²)] + [y2 / √(x2² + y2²)] . [y1 / √(x1² + y1²)]
We also know that both points are at the same length from the origin, so the two hypotenuses are equal: √(x1² + y1²) = √(x2² + y2²). The product of the two square roots is therefore x2² + y2².
Thus:
cos(angle COB) = x2.x1 / (x2² + y2²) + y2.y1 / (x2² + y2²)
cos(angle COB) = (x2.x1 + y2.y1) / (x2² + y2²)
This is the formula for cosine similarity. The smaller the angle, the more similar the points are, since we know that
cos(0°) = 1 and cos(90°) = 0.
Thus, if the angle between two points is 0°, they lie in exactly the same direction, and at equal length from the origin they are identical.
If the angle between them is 90°, we can say the points are not similar at all.
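The two extreme cases can be checked with a short Python sketch. The function name cosine_similarity_2d and the sample points are assumptions made for illustration. The function divides by both lengths, so it also works when the two points are not at the same length from the origin:

```python
import math

def cosine_similarity_2d(b, c):
    """Cosine of the angle at the origin between 2D points b and c."""
    x1, y1 = b
    x2, y2 = c
    dot = x2 * x1 + y2 * y1
    # Product of the two lengths from the origin (the two hypotenuses).
    # Note: this raises ZeroDivisionError if either point is the origin.
    return dot / (math.hypot(x1, y1) * math.hypot(x2, y2))

# Same direction: the angle is 0 degrees, so the similarity is 1.
same = cosine_similarity_2d((3.0, 4.0), (3.0, 4.0))
# Perpendicular directions: the angle is 90 degrees, so the similarity is 0.
perpendicular = cosine_similarity_2d((3.0, 4.0), (-4.0, 3.0))

print(same, perpendicular)
```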
Why cosine similarity
We have done all these calculations, but the problem seems to remain. We calculated the cosine similarity because the Euclidean distance or the Pythagorean theorem requires squaring and taking square roots, yet cosine similarity uses them as well. So why do all of this?
The squaring and rooting in cosine similarity is needed only for the length of each point from the origin. If we take points whose length from the origin is 1, we don’t need to square or take roots at all, because the result of both operations is still 1.
Thus the formula for cosine similarity, when the length from the origin is 1, is:
cos(angle COB) = x2.x1 + y2.y1
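As a rough Python sketch of this idea (the names normalize and unit_cosine_similarity and the sample points are hypothetical), each point is scaled to length 1 once, and after that every comparison is just two multiplications and one addition:

```python
import math

def normalize(p):
    """Scale a 2D point so its length from the origin becomes 1."""
    length = math.hypot(p[0], p[1])
    return (p[0] / length, p[1] / length)

def unit_cosine_similarity(b, c):
    """Cosine similarity for points already at length 1 from the origin."""
    return c[0] * b[0] + c[1] * b[1]

# Hypothetical points, normalized once up front.
B = normalize((3.0, 4.0))
C = normalize((4.0, 3.0))

# No squaring or square root is needed at comparison time.
similarity = unit_cosine_similarity(B, C)
print(round(similarity, 2))
```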
This requires only multiplication and addition, which take far less computation than the Euclidean distance or the Pythagorean theorem. In this example we calculated cosine similarity in a 2-dimensional plane, but we can easily do the same in higher dimensions. For two points B(x1, y1, z1) and C(x2, y2, z2) whose length from the origin is 1, the formula for 3D cosine similarity is:
cosine similarity = x2.x1 + y2.y1 + z2.z1
This is very simple compared to the Euclidean distance or the Pythagorean theorem in higher dimensions. Thus we use cosine similarity whenever we can.
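The same pattern extends to any number of dimensions: multiply matching coordinates and add the products. Below is a minimal Python sketch; the function name cosine_similarity and the sample points are assumptions for illustration. It divides by both lengths so that it also works for points that are not at length 1:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle at the origin between two n-dimensional points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Multiply matching coordinates and add the products.
    dot = sum(x * y for x, y in zip(a, b))
    # Lengths from the origin; for points at length 1 both values are 1.
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

# Hypothetical 3D points: the second is the first scaled by 2 (same direction).
same_direction = cosine_similarity((1.0, 2.0, 2.0), (2.0, 4.0, 4.0))
# Two points along different axes are perpendicular.
perpendicular = cosine_similarity((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

print(same_direction, perpendicular)
```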
If you haven’t yet gone through the post on Euclidean distance and the Pythagorean theorem, you can visit: link to blog
In the future, we will explore practical uses of cosine similarity.