Accounting for word context

Now that we understand the concept of word2vec and the intuition behind it, let’s move forward and generalize the same concept to a sequence of words. So, based on our current knowledge, to get the words ready for further mathematical analysis and modeling, we map each word to a d-dimensional vector, right?

https://cdn-images-1.medium.com/max/800/1*ethIZevzvxqyiKL9pKgECw.png

w2v Diagram
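To make the “one fixed vector per word” idea concrete, here is a minimal Python sketch (the vocabulary, dimension, and vector values are all made up for illustration, not trained word2vec embeddings): every word gets exactly one d-dimensional vector, no matter which sentence it shows up in.

```python
# A toy sketch of word-to-vector mapping: one fixed d-dimensional vector
# per word, regardless of the sentence the word appears in.
import numpy as np

d = 4  # embedding dimension (kept tiny here for illustration)
vocabulary = ["paris", "bat", "baseball", "cave"]

rng = np.random.default_rng(seed=0)
word_to_vec = {word: rng.normal(size=d) for word in vocabulary}

# "bat" maps to the same vector whether the sentence is about animals or sports.
print(word_to_vec["bat"])
```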

But… 😐

As you probably noticed, this way of mapping each word to a single vector is pretty restrictive 😐! Ya ask why? Imagine you have a dictionary 📖 and you look up the meaning of the word “bat”. How many different meanings does it have? Doesn’t it depend on the context the word appears in?

https://cdn-images-1.medium.com/max/800/1*MMZHkl5fSjWMhIojXM0KYA.png

I’m pretty sure you now understand why I said that considering only a single vector per word is beyond restrictive: it does not take the context of the surrounding words into account! Therefore we need to build a framework by which we can modify this mapping in a way that takes the meaning of the surrounding words into account, right? 🌱 To do that, we first need to get familiar with the concept of the Inner Product.

Remember I mentioned that within the concept of word2vec, we map each word to a d-dimensional vector, where each of those d dimensions is associated with a particular meaning/topic? Now keep reading👇

https://cdn-images-1.medium.com/max/800/1*ct2Fz3MQcdLrsIokGQdwGQ.png

Inner product of two words/codes

The purpose of the inner product is to quantify the relation/similarity between words! The way it does this is by taking the dot product of two vectors.

https://cdn-images-1.medium.com/max/800/1*6qM3cmt6hXMHGnyLg5bZOQ.png

Here we have two vectors, C1 and C2, each of which has d components. We take the first component of C1 and the first component of C2 and multiply them together, and we do the same for all d components! After that, we sum everything up, and that is the inner product! So the inner product is gonna give us a single number that tells us how related the two words are.
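Here is that computation as a tiny Python sketch (the 3-dimensional vectors C1 and C2 are made-up values, just to show the mechanics): multiply the two vectors component by component, then sum the products.

```python
# Inner (dot) product: component-wise multiplication followed by a sum.
import numpy as np

c1 = np.array([0.9, -0.2, 0.4])  # toy word vector C1 (made-up values)
c2 = np.array([0.8, -0.1, 0.5])  # toy word vector C2 (made-up values)

inner_product = np.sum(c1 * c2)  # 0.9*0.8 + (-0.2)*(-0.1) + 0.4*0.5 = 0.94
print(inner_product)             # equivalent to np.dot(c1, c2)
```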

⁉️🧠 Why is the dot product gonna be positive and large if the two words are similar?

Ya remember my notional example of Paris🗼? Now imagine we have two similar words, word1 and word2, each with a 10-dimensional vector associated with it (just for simplicity; in practice the dimension can get up to 256🤯). When we say these words are similar, then we know that their associated vectors are gonna be similar, right? If so, then it means that the corresponding components of the two vectors are similar (each pair sharing the same sign, either + or -)! So in the example below, because these two word vectors are similar, you can see that the 1st component of both is positive!
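Here is a small numeric illustration of that claim, with made-up 10-dimensional vectors (not trained embeddings): when the corresponding components of two vectors share the same signs, the component-wise products are mostly positive, so the dot product comes out large and positive; for an unrelated vector the signs clash and the sum stays small or even negative.

```python
# Toy 10-dimensional vectors (made-up values) for two "similar" words
# and one unrelated word.
import numpy as np

word1     = np.array([ 0.7,  0.5, -0.3,  0.8, -0.6,  0.4,  0.2, -0.5,  0.9,  0.1])
word2     = np.array([ 0.6,  0.4, -0.2,  0.7, -0.5,  0.5,  0.3, -0.4,  0.8,  0.2])
unrelated = np.array([-0.6,  0.3,  0.7, -0.5,  0.4, -0.3,  0.6,  0.2, -0.7,  0.5])

print(np.dot(word1, word2))      # 2.74  -> large and positive: similar words
print(np.dot(word1, unrelated))  # -1.80 -> negative: unrelated words
```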