Can we make LLMs generate faster than their current speed?

Question

Nikhilesh Tayal (Leader) · posted Jan 19 · Originally published at www.linkedin.com · 1 min read

The problem

Most AI models write text one token (roughly one word) at a time, and each new token has to wait for the one before it.

That keeps answers good, but makes them slow.
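
To make the bottleneck concrete, here is a minimal sketch of word-by-word generation. The `NEXT_WORD` table is a made-up stand-in for a real model, and the function name is hypothetical. The point is only that every step needs the output of the previous step, so the loop cannot be spread across parallel workers.

```python
# Toy stand-in for a language model: given the last word, return the next one.
# (Hypothetical lookup table for illustration, not a real model.)
NEXT_WORD = {
    "<start>": "The",
    "The": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
    "the": "mat",
    "mat": "<end>",
}


def generate_word_by_word(max_steps=20):
    """Generate words strictly in order; step i depends on step i - 1."""
    words = []
    last = "<start>"
    for _ in range(max_steps):
        nxt = NEXT_WORD[last]  # needs the previous word before it can run
        if nxt == "<end>":
            break
        words.append(nxt)
        last = nxt
    return words


words = generate_word_by_word()
print(" ".join(words), "| sequential steps:", len(words))
```

With this toy table, six words cost six sequential steps. Longer answers grow the same way.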

Some models try to write everything at once to be faster.

But then:

they lose memory efficiency
the text feels less connected
compute cost goes up

So right now the choice is fast or good, not both.

The solution: ReFusion

ReFusion doesn’t work word by word.

It works in small chunks of text, called slots.

How it works:

First, the model plans which chunks can be written together
Then it writes those chunks in parallel
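
The two steps above can be sketched as a small loop. This is an illustration under loud assumptions, not ReFusion's actual algorithm: the `DEPENDS_ON` table, the `plan` rule ("a slot is ready once the slots it depends on are written"), and `write_slot` are all hypothetical stand-ins for decisions a trained model would make.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency table: slot -> slots that must be written first.
# A real model would decide this itself; it is hard-coded here for illustration.
DEPENDS_ON = {"A": set(), "B": set(), "C": {"A"}, "D": {"B", "C"}}


def plan(done):
    """Step 1: pick every unwritten slot whose dependencies are already written."""
    return sorted(
        slot for slot, deps in DEPENDS_ON.items()
        if slot not in done and deps <= done
    )


def write_slot(slot):
    """Step 2 (per slot): stand-in for the model generating that slot's text."""
    return f"<text for slot {slot}>"


def plan_and_write():
    """Alternate planning and parallel writing until every slot is filled."""
    done, rounds = {}, []
    with ThreadPoolExecutor() as pool:
        while len(done) < len(DEPENDS_ON):
            batch = plan(set(done))
            # All slots in one batch are handed to the worker pool together.
            texts = list(pool.map(write_slot, batch))
            done.update(zip(batch, texts))
            rounds.append(batch)
    return rounds


print(plan_and_write())
```

In this toy setup, slots A and B go out together in the first round, then C, then D: three rounds instead of four one-by-one steps. The thread pool only illustrates the idea of "written together"; a real model would batch the work on the accelerator instead.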

For example, take this sentence to write:

“The cat sat on the mat.”

Step 1: Plan slots

The model decides:

Slot 1: “The cat”

Slot 2: “sat on”

Slot 3: “the mat”

Step 2: Write slots in parallel

All three are generated at the same time:

Slot 1 → “The cat”

Slot 2 → “sat on”

Slot 3 → “the mat”

Step 3: Combine

Final sentence:

“The cat sat on the mat.”
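
The same walkthrough as a runnable toy. Everything here is a simplification: `plan_slots` just cuts a known sentence into two-word slots, and `write_slot` copies words instead of calling a model, so the names and the fixed slot size are assumptions for illustration only. It also assumes words inside a slot are still written one after another.

```python
from concurrent.futures import ThreadPoolExecutor

SENTENCE = "The cat sat on the mat."


def plan_slots(sentence, words_per_slot=2):
    """Step 1: split the target into fixed-size slots (hypothetical planner)."""
    words = sentence.rstrip(".").split()
    return [words[i:i + words_per_slot] for i in range(0, len(words), words_per_slot)]


def write_slot(slot_words):
    """Step 2: write one slot, word by word (stand-in for the model)."""
    written = []
    for word in slot_words:
        written.append(word)
    return " ".join(written)


def combine(slot_texts):
    """Step 3: join the slots back into one sentence."""
    return " ".join(slot_texts) + "."


slots = plan_slots(SENTENCE)
with ThreadPoolExecutor(max_workers=len(slots)) as pool:
    slot_texts = list(pool.map(write_slot, slots))  # all slots written together
result = combine(slot_texts)

# Idealized step counts: one word per step, all slots running at once.
word_by_word_steps = sum(len(s) for s in slots)
slot_parallel_steps = max(len(s) for s in slots)

print(result)
print("word by word:", word_by_word_steps, "steps | slots in parallel:", slot_parallel_steps, "steps")
```

In this idealized count the sentence needs 2 sequential steps instead of 6. That is a best-case picture for the toy, not a measured speedup for any real model.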

The result

Faster than the strong models we use today
Quality stays almost the same

Do you think decoding speed, not model size, might be the next big frontier?

2 Comments


YasirAwan4831 · Answer 1 · 2026-01-20T12:31:56+0000

Interesting idea! It seems the focus now will be on making the model faster and smarter rather than making it bigger.
