Attention Is All You Need - Part 1

— Originally published at dev.to

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback to help improve the product.

In this article, I am going to discuss the paper "Attention Is All You Need", which introduced the Transformer architecture. The Transformer has since become one of the most important models in the field of NLP.

The paper was published in 2017 by Google researchers.

What is the background?

The goal of machine learning is to learn a mapping from input to output.

For example:
In house price prediction, the input is features such as square footage, bedrooms, bathrooms, and locality, and the output is the price.

In email spam detection, the input is email text and the output is spam or not spam.

These mappings were learned by neural networks.

A neural network is a sequence of layers, each transforming the output of the previous layer into its own output.
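
To make the idea of "a sequence of layers" concrete, here is a minimal sketch in plain Python. The weights are made-up numbers chosen for illustration, not trained values, and the helper names (linear, relu, forward) are my own, not from the paper.

```python
# A toy network: each "layer" is a function, and the network applies them in order.
# The weights below are made-up numbers for illustration, not trained values.

def linear(weights, bias):
    """Return a layer that computes (weights . x + bias) for each output unit."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer

def relu(x):
    """Element-wise max(0, value)."""
    return [max(0.0, v) for v in x]

def forward(layers, x):
    """Feed x through the layers; each layer receives the previous layer's output."""
    for layer in layers:
        x = layer(x)
    return x

network = [
    linear([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0]),  # 2 inputs -> 2 hidden units
    relu,
    linear([[2.0, 1.0]], [0.5]),                     # 2 hidden units -> 1 output
]

output = forward(network, [2.0, 1.0])
print(output)
```

In a real model the weights are learned from data rather than written by hand, but the flow of data through the layers is the same.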

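But for sequential input such as text, earlier neural networks had a major limitation: they struggled to capture long-range dependencies, for example a word whose meaning depends on another word many positions earlier.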

What problem did they solve?

Earlier models were based on Recurrent Neural Networks (RNNs), which process one token per time step.

This caused two main problems:

  1. They struggled to capture long-range dependencies in the input.
  2. They could not process the input in parallel, because each step depends on the result of the previous step.
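
Both problems can be seen in a toy recurrent loop. This is only a sketch under simplifying assumptions: the hidden state is a single number, and the weights w_h and w_x are made-up values, whereas a real RNN uses learned weight matrices.

```python
import math

def rnn_step(h_prev, x, w_h=0.5, w_x=1.0):
    """One recurrent step: the new hidden state depends on the previous one.
    w_h and w_x are hypothetical scalar weights chosen for illustration."""
    return math.tanh(w_h * h_prev + w_x * x)

def run_rnn(inputs):
    """Process the tokens strictly in order and return every hidden state."""
    h = 0.0
    states = []
    for x in inputs:  # step t cannot start until step t-1 has produced h
        h = rnn_step(h, x)
        states.append(h)
    return states

# Only the first token carries a signal; the remaining tokens are zeros.
states = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0])
for t, h in enumerate(states):
    print(f"step {t}: hidden state = {h:.4f}")
```

The loop cannot be parallelized across time steps because each call needs the previous hidden state. In this toy setup the trace of the first token also shrinks at every step, which gives a rough intuition for why information from far back in the sequence is hard to keep.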

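How was this solved?

The Transformer architecture was introduced to solve these problems.

It is based on the attention mechanism, which allows the model to focus on the most relevant parts of the input sequence. Because attention relates every token to every other token directly, the model does not have to walk through the sequence one step at a time.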
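
The paper defines its core operation, scaled dot-product attention, as softmax(QK^T / sqrt(d_k)) V. Below is a minimal plain-Python sketch of that formula. The query, key, and value vectors are hand-written toy numbers; in the actual model they come from learned projections of the input, which this sketch leaves out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How well does this query match each key?
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy data (made up for illustration): three positions, one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print("weights:", [round(w, 3) for w in weights[0]])
print("output: ", [round(o, 3) for o in outputs[0]])
```

The query matches the first and third keys more strongly than the second, so those positions get larger weights. Each query is handled independently of the others, which is why all positions can be computed in parallel.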

This is a simple explanation for now, so I will wrap up this article here.

Conclusion

In this article, we discussed the background of the Attention Is All You Need paper and the problem it solved.

In the next article, we will discuss the Transformer architecture in detail with an example.

Reference: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

git-lrc

Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc
