Attention Is All You Need - Part 5

— Originally published at dev.to

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.

In the previous article, we discussed step 2 of the transformer model: positional encoding.

In this article, we will discuss step 3 of the transformer model: multi-head attention.

Why didn't traditional RNN models work for long sentences?

Before 2017, we were using RNN and LSTM models for NLP tasks.

Basically, these models took in words one at a time, so the context available while processing any single word was very limited.

For example, let's assume there are 3 words and the model processes them one by one.

So, the first sentence is talking about a river bank.

The river bank.
The United Bank

The next one is about United Bank, which has nothing to do with a river. But since we have only applied embedding and positional encoding so far, the model has a very low probability of understanding which meaning of "bank" is intended.

Here is an example of what the vector might look like.
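To make the problem concrete, here is a minimal sketch in plain Python. The 4-dimensional embedding values are made up for illustration (they are not from any trained model), and the helper names `positional_encoding` and `encode` are hypothetical. The sinusoidal formula follows the one given in the paper. The point is only to show that, after embedding plus positional encoding, "bank" gets exactly the same vector in both sentences.

```python
import math

# Hypothetical 4-dimensional embeddings, invented for this illustration only.
EMBEDDINGS = {
    "the":    [0.1, 0.3, 0.2, 0.0],
    "river":  [0.9, 0.1, 0.4, 0.7],
    "united": [0.2, 0.8, 0.6, 0.1],
    "bank":   [0.5, 0.5, 0.9, 0.3],
}
D_MODEL = 4


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the same frequency term.
        angle = pos / (10000 ** ((i - i % 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def encode(sentence):
    """Return one vector per token: embedding + positional encoding."""
    tokens = sentence.lower().split()
    return [
        [e + p for e, p in zip(EMBEDDINGS[tok], positional_encoding(pos, D_MODEL))]
        for pos, tok in enumerate(tokens)
    ]


bank_in_river = encode("The river bank")[2]
bank_in_united = encode("The United Bank")[2]

print(bank_in_river)
print(bank_in_river == bank_in_united)  # True
```

Because "bank" is the same token at the same position in both sentences, nothing in its vector reflects whether the neighbouring word was "river" or "United". That missing link between a word and its neighbours is what attention is meant to supply.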

How does a single attention head work?

A single attention head works by determining how much focus a specific token (word) in a sequence should place on other tokens to better understand its own context.

Let's take an example: "The cat sat on the mat."

For the token "sat", the attention head might learn to pay high attention to "cat" and "mat" because they are directly related to "sat".
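The idea can be sketched with scaled dot-product attention, the operation the paper uses inside each head. This is a rough illustration, not a trained model: the 2-dimensional key, value, and query vectors below are hand-picked assumptions chosen so that "sat" ends up favouring "cat" and "mat", and `attend` is a hypothetical helper name. In a real transformer these vectors come from learned projection matrices.

```python
import math

TOKENS = ["the", "cat", "sat", "on", "the", "mat"]

# Hand-picked key vectors, one per token (assumed values, not learned).
KEYS = [
    [0.0, 0.1],  # the
    [2.0, 0.0],  # cat
    [0.5, 0.5],  # sat
    [0.0, 0.2],  # on
    [0.0, 0.1],  # the
    [1.8, 0.0],  # mat
]
# For simplicity, reuse the keys as the value vectors.
VALUES = [list(k) for k in KEYS]
# Hand-picked query vector for the token "sat".
QUERY_SAT = [2.0, 0.0]


def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(keys[0])
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return output, weights


output, weights = attend(QUERY_SAT, KEYS, VALUES)
for token, weight in zip(TOKENS, weights):
    print(f"{token:>4}: {weight:.3f}")
```

With these assumed vectors, the two largest weights land on "cat" and "mat", which mirrors the behaviour described above. The output vector for "sat" is then a blend of the value vectors, dominated by the tokens it attends to most.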

Let's understand this in detail in the next article by actually implementing it.

Reference: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

git-lrc

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc
