When we learn about seq2seq neural networks, one term we should know is Teacher Forcing.
When we train a seq2seq model, the decoder generates one token at a time, building the output sequence step by step.
At each step, it needs a previous token as input. With teacher forcing, instead of feeding the decoder its own (possibly wrong) prediction from the last step, we feed it the ground-truth token from the training data. This prevents early mistakes from compounding and makes training faster and more stable.
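The idea can be sketched in a toy training loop. This is a minimal illustration, not a real model: `decoder_step` is a hypothetical stand-in for an actual decoder, and `teacher_forcing_ratio` is a common (assumed) knob for mixing teacher forcing with the model's own predictions.

```python
import random

def decoder_step(prev_token, state):
    # A real decoder would run an RNN/transformer step and predict
    # the next token; this toy version just echoes its input.
    return prev_token, state

def decode_sequence(target_tokens, teacher_forcing_ratio=0.5):
    """Decode one step at a time, choosing each step's input either
    from the ground truth (teacher forcing) or the model's own output."""
    state = None
    prev_input = "<sos>"  # start-of-sequence token
    outputs = []
    for gold in target_tokens:
        pred, state = decoder_step(prev_input, state)
        outputs.append(pred)
        # Teacher forcing: feed the ground-truth token as the next input;
        # otherwise feed the model's own prediction back in.
        if random.random() < teacher_forcing_ratio:
            prev_input = gold
        else:
            prev_input = pred
    return outputs
```

With `teacher_forcing_ratio=1.0` the decoder always sees the gold tokens; with `0.0` it runs free, as it would at inference time.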
To compare the decoder's hidden state with the encoder's hidden states, we need a similarity score.
Two common approaches to calculate this are:
Cosine similarity
Dot product
Cosine Similarity
It computes the dot product of the two vectors and then normalizes by the product of their magnitudes, so the score depends only on the angle between the vectors, not on their lengths.
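The two scores can be written in a few lines of plain Python; here the hidden states are represented as simple lists of floats.

```python
import math

def dot(u, v):
    # Raw dot product: sensitive to both direction and magnitude.
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Dot product normalized by the product of the magnitudes,
    # so only the angle between the vectors matters.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

h_enc = [1.0, 2.0]   # toy encoder hidden state
h_dec = [2.0, 4.0]   # toy decoder hidden state (same direction, twice as long)
print(dot(h_enc, h_dec))                # 10.0
print(cosine_similarity(h_enc, h_dec))  # 1.0 — parallel vectors
```

Note how the dot product grows with vector length while cosine similarity stays at 1.0 for parallel vectors, which is exactly the normalization trade-off between the two scores.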