Cosine Similarity vs Dot Product in Attention Mechanisms


Originally published at dev.to

To compare the encoder's hidden states with the decoder's hidden state, attention needs a similarity score.

Two common approaches to calculate this are:

  • Cosine similarity
  • Dot product

Cosine Similarity

It takes the dot product of the two vectors and then divides by the product of their magnitudes, so the result depends only on the angle between the vectors, not on their lengths.

Example

Encoder output:

[-0.76, 0.75]

Decoder output:

[0.91, 0.38]

Cosine similarity ≈ -0.39

  • Close to 1 → very similar → strong attention
  • Close to 0 → not related
  • Negative → opposite → low attention
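The example above can be verified with a short NumPy sketch (the two vectors are the encoder and decoder outputs from the example):

```python
import numpy as np

# Encoder and decoder outputs from the example above
encoder = np.array([-0.76, 0.75])
decoder = np.array([0.91, 0.38])

# Cosine similarity: dot product divided by the product of the magnitudes
cosine = np.dot(encoder, decoder) / (np.linalg.norm(encoder) * np.linalg.norm(decoder))
print(round(cosine, 2))  # -0.39
```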

This is useful when:

  • Values can vary a lot in size
  • You want a consistent scale (-1 to 1)

The drawback is the extra cost: normalizing each score requires square roots and a division, and in attention that overhead is often unnecessary.


Dot Product

Dot product is much simpler. It does the following:

  • Multiply corresponding values
  • Add them up

Example

(-0.76 × 0.91) + (0.75 × 0.38) = -0.41
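The same two steps (multiply corresponding values, add them up) in NumPy, using the example vectors. Note this is just the unnormalized numerator of the cosine score above:

```python
import numpy as np

encoder = np.array([-0.76, 0.75])
decoder = np.array([0.91, 0.38])

# Multiply corresponding values, then sum: (-0.76 * 0.91) + (0.75 * 0.38)
score = np.dot(encoder, decoder)
print(round(score, 2))  # -0.41
```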

Dot product is preferred in attention because:

  • It’s fast
  • It’s simple
  • It gives good relative scores

Even though the scores are not normalized, the model can still learn:

  • Which words are more important
  • Which words to ignore

