Multimodal AI has a hidden problem.

Images → one tokenizer

Videos → another

3D → completely different setup

And it gets worse:

  • Models that generate visuals don’t really understand them

  • Models that understand visuals can’t generate them well

So instead of one intelligent system,

we end up with a stack of disconnected capabilities.

Apple is taking a very different approach with its new model, AToken.

Instead of adding more pieces, it removes them.

  • One tokenizer

  • One encoder

  • Works across images, videos, and 3D

The core idea:

Treat all visual data in a unified format.

Images → (x, y)

Videos → (t, x, y)

3D → (x, y, z)

Each of these is a slice of the same (t, x, y, z) coordinates, so everything becomes part of a single 4D token space.
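
A minimal sketch of that idea, to make the mapping concrete. The helper names are hypothetical, and filling unused axes with 0 is an assumption made for illustration, not a detail confirmed from AToken itself. It only shows how three kinds of coordinates can share one (t, x, y, z) layout:

```python
from itertools import product

# Hypothetical helpers. Filling the unused axes with 0 is an illustrative
# assumption, not AToken's documented behavior.

def image_coords(height, width):
    """Image patches (x, y) -> (t, x, y, z) with t = 0 and z = 0."""
    return [(0, x, y, 0) for y, x in product(range(height), range(width))]

def video_coords(frames, height, width):
    """Video patches (t, x, y) -> (t, x, y, z) with z = 0."""
    return [
        (t, x, y, 0)
        for t, y, x in product(range(frames), range(height), range(width))
    ]

def asset3d_coords(occupied_voxels):
    """3D voxels (x, y, z) -> (t, x, y, z) with t = 0."""
    return [(0, x, y, z) for x, y, z in occupied_voxels]

def varying_axes(coords):
    """Return the names of the axes that actually vary in a coordinate list."""
    names = ("t", "x", "y", "z")
    return [n for i, n in enumerate(names) if len({c[i] for c in coords}) > 1]

image = image_coords(2, 3)
video = video_coords(2, 2, 2)
shape = asset3d_coords([(0, 0, 0), (1, 0, 2), (1, 1, 1)])

# Every modality now uses the same 4-tuple layout, so a single encoder
# could consume all of them as one token stream.
batch = image + video + shape
assert all(len(c) == 4 for c in batch)
```

Here `varying_axes(image)` reports only x and y, `varying_axes(video)` adds t, and `varying_axes(shape)` reports x, y, z: same format, different slices of it.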

So the same model can:

  • Understand

  • Generate

  • Reconstruct

Across all formats.

And the real unlock:

Data leverage.

We have massive image datasets.

But very limited video and 3D data.

With a shared model:

→ Learning transfers across modalities

→ Less data needed overall

→ Faster capability growth

This is much like what happened with LLMs.

One tokenizer → text, code, conversations, everything.

Now we’re seeing the same shift in vision.

From:

“different models for different media”

To:

one model that understands the visual world.

Nikhilesh is an entrepreneur, teacher, and tech nerd.
He is an IIT Kharagpur alumnus and a Google Developer Expert for AI, with 14,000+ followers on LinkedIn.