Cosine Similarity vs Dot Product in Attention Mechanisms


Originally published at dev.to

To compare the encoder's hidden states with the decoder's hidden state, attention needs a similarity score.

Two common approaches to calculate this are:

  • Cosine similarity
  • Dot product

Cosine Similarity

It takes the dot product of the two vectors and then divides by the product of their magnitudes, so the result depends only on the angle between the vectors, not on their lengths.

Example

Encoder output:

[-0.76, 0.75]

Decoder output:

[0.91, 0.38]

Cosine similarity ≈ -0.39

  • Close to 1 → very similar → strong attention
  • Close to 0 → not related
  • Negative → opposite → low attention
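The example above can be verified with a short NumPy sketch (the two vectors are the encoder and decoder outputs from the example):

```python
import numpy as np

# Encoder and decoder outputs from the example above
encoder = np.array([-0.76, 0.75])
decoder = np.array([0.91, 0.38])

# Cosine similarity: dot product divided by the product of the magnitudes
cosine = np.dot(encoder, decoder) / (np.linalg.norm(encoder) * np.linalg.norm(decoder))
print(round(cosine, 2))  # -0.39
```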

This is useful when:

  • Values can vary a lot in size
  • You want a consistent scale (-1 to 1)

The drawback is the extra cost: normalizing each score requires square roots and a division, and in attention that overhead is often unnecessary.


Dot Product

Dot product is much simpler. It does the following:

  • Multiply corresponding values
  • Add them up

Example

(-0.76 × 0.91) + (0.75 × 0.38) = -0.41
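The same two steps (multiply corresponding values, add them up) in NumPy, using the example vectors. Note this is just the unnormalized numerator of the cosine score above:

```python
import numpy as np

encoder = np.array([-0.76, 0.75])
decoder = np.array([0.91, 0.38])

# Multiply corresponding values, then sum: (-0.76 * 0.91) + (0.75 * 0.38)
score = np.dot(encoder, decoder)
print(round(score, 2))  # -0.41
```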

Dot product is preferred in attention because:

  • It’s fast
  • It’s simple
  • It gives good relative scores

Even though the scores are not normalized, the model can still learn:

  • Which words are more important
  • Which words to ignore

