Ubuntu 26.04 LTS lands in about a week.
It is the first Ubuntu LTS expected to put a pure Wayland GNOME desktop front and center for a massive wave of users. That matters, because a lot of Linux AI tooling is about to run face-first into reality.
Most Linux computer-use tools still assume one of three things:
- X11
- a fake desktop running in Xvfb or VNC
- raw evdev access with elevated privileges
That stack is already dusty. On a modern GNOME Wayland desktop, it is not the native path, and in a lot of cases it is not the right path at all.
So I built one that does it properly.
It is called portal-use.
The Problem: Most "Linux Computer Use" Is Still X11 Theater
When people say they have "Linux computer use" working today, what they usually mean is one of these:
```python
import subprocess

# Move to (x, y) and left-click -- this works only because xdotool
# talks directly to the X server.
subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
```
That works on X11 because xdotool talks directly to the X server.
On a pure Wayland session, that assumption breaks. Wayland clients do not get to inject input into arbitrary applications just because they exist. That is part of the security model, not an accidental limitation.
So the usual workaround becomes:
- Run Xvfb
- Start a VNC server
- Point the AI agent at the fake display
- Pretend that is the user's desktop
It is not.
That is not the real desktop. It does not have the user's actual windows, actual session state, actual login context, actual filesystem workflow, or actual desktop environment. It is a simulated stage set.
The other common path is direct evdev injection:
- needs root or custom udev rules
- writes directly to /dev/input/event*
- breaks when device ordering changes
- bypasses the desktop stack entirely
That is not a native integration either. It is a kernel-level shortcut held together with trust and duct tape.
Wayland already has a proper stack for this.
The Right Stack
A real Wayland-native computer use implementation looks like this:
AI agent → MCP server → XDG Desktop Portal → PipeWire + libei/EIS → GNOME compositor
That is the correct architecture.
Here is the breakdown.
Screen capture
For screenshots and screen streaming, the compositor exposes frames through XDG Desktop Portal ScreenCast, backed by PipeWire.
The app requests access.
The user approves it.
The compositor exposes a PipeWire node.
Frames come through as a proper modern desktop stream.
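As a sketch of how a consumer might pull frames from that node, here is a helper that builds a GStreamer pipeline description around pipewiresrc. This is illustrative, not portal-use's actual capture code; the appsink name is arbitrary, and real portal streams typically also need pipewiresrc's fd property set to the PipeWire remote fd the portal returns.

```python
def build_capture_pipeline(node_id: int) -> str:
    """Build a GStreamer pipeline description that pulls frames from
    the PipeWire node the ScreenCast portal handed back.

    node_id is the stream's PipeWire node id from the portal's Start
    response. Note: pipewiresrc's `path` property is deprecated in
    newer GStreamer in favor of `target-object`.
    """
    return (
        f"pipewiresrc path={node_id} ! "
        "videoconvert ! video/x-raw,format=RGB ! "
        "appsink name=sink"
    )
```

The resulting string can be handed to Gst.parse_launch(), after which frames are pulled from the appsink.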
Input injection
For keyboard and mouse control, the app uses XDG Desktop Portal RemoteDesktop.
Once approved, the portal hands back an EIS file descriptor.
That gets consumed by libei, which is the proper user-space mechanism for emulated input on Wayland compositors that support it.
Consent and control
The compositor remains in charge the entire time.
- The user approves access
- The session is mediated
- Access can be revoked
- No root is required
- No /dev/input hacks are required
- No X11 compatibility layer is required
That is how this should work.
What portal-use Does
portal-use is an MCP-compatible computer-use server for Wayland desktops that uses the real Linux desktop stack instead of pretending X11 still runs the world.
It uses:
- XDG Desktop Portal for access brokering
- PipeWire for screen capture
- libei / EIS for input injection
- Starlette + Uvicorn for the server
- MCP StreamableHTTP for AI tool integration
The result is a computer-use server that can control a real GNOME Wayland desktop without X11, without Xvfb, and without root.
What Actually Took Time
The architecture looks straightforward on paper. The real work was in the implementation details nobody bothers to write down.
ei_device_start_emulating() Is Mandatory
This one wasted an absurd amount of time.
The libei docs mention ei_device_start_emulating(), but they do not make it clear that input events are silently dropped if you send them before that call happens at the right point in the event sequence.
No warning.
No exception.
No log message.
Your events just disappear.
The sequence matters:
1. EI_EVENT_SEAT_ADDED
2. EI_EVENT_DEVICE_ADDED
3. EI_EVENT_DEVICE_RESUMED
Only after EI_EVENT_DEVICE_RESUMED should you call:
```c
ei_device_start_emulating(device, sequence);
```
Only then should you send motion, button, or keyboard events.
If you treat DEVICE_ADDED as "ready", your automation looks connected but does nothing.
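The gating logic above can be sketched as a tiny state machine. The event names and class here are hypothetical stand-ins for libei's real ei_get_event loop; the point is that readiness flips only on DEVICE_RESUMED, and anything sent earlier is silently dropped, exactly as libei behaves.

```python
# Hypothetical stand-ins for libei's event constants.
SEAT_ADDED, DEVICE_ADDED, DEVICE_RESUMED = range(3)


class EmulatedDevice:
    """Tracks device state so nothing is sent before
    ei_device_start_emulating() would be legal."""

    def __init__(self):
        self.ready = False
        self.sent = []

    def handle_event(self, event: int) -> None:
        # DEVICE_ADDED alone is NOT "ready"; only DEVICE_RESUMED is.
        if event == DEVICE_RESUMED:
            # Real code calls ei_device_start_emulating(device, seq) here.
            self.ready = True

    def send_motion(self, dx: float, dy: float) -> bool:
        if not self.ready:
            return False  # mimic libei's silent drop
        self.sent.append((dx, dy))
        return True
```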
c_float vs c_double
The libei pointer motion functions expect double.
If you pass the wrong ctypes type, your coordinates get mangled in a way that looks like bad math, bad scaling, or bad monitor transforms.
Wrong:

```python
_libei.ei_device_pointer_motion_absolute(dev, x, y)
```

Right:

```python
_libei.ei_device_pointer_motion_absolute(
    dev,
    ctypes.c_double(x),
    ctypes.c_double(y),
)
```
A tiny ABI mismatch can turn precise clicks into ghost clicks that land near the target but not on it. That kind of bug is extra annoying because it makes you distrust your entire coordinate pipeline.
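You can reproduce the same ABI hazard without libei at all. The sketch below uses glibc's pow(double, double) as a stand-in, assuming a Linux system where libm.so.6 is loadable; declaring argtypes is the defensive fix, because ctypes then converts Python floats correctly and rejects mismatched wrapper types at call time.

```python
import ctypes
from ctypes import c_double, c_float

# Stand-in for libei: glibc's pow(double, double).
libm = ctypes.CDLL("libm.so.6")
libm.pow.restype = c_double

# Without argtypes, ctypes passes whatever width you hand it. A c_float
# puts 4 bytes where the callee reads 8, so the values are mangled.
bad = libm.pow(c_float(2.0), c_float(10.0))  # mangled, not 1024.0

# Declaring argtypes makes ctypes do the right conversion.
libm.pow.argtypes = [c_double, c_double]
good = libm.pow(2.0, 10.0)  # 1024.0
```

The same argtypes declaration on ei_device_pointer_motion_absolute would have caught the original bug at the call site instead of producing ghost clicks.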
Clicks Worked, but the Cursor Did Not Move
This one was sneaky.
At one point, clicks worked correctly. Menus opened. Hover states triggered. Navigation happened. The compositor was clearly receiving the events.
But the cursor itself did not visibly move.
On GNOME 50, those are not always the same thing.
There is a difference between:
- where the compositor delivers pointer interaction
- where the visible hardware cursor overlay appears on screen
The fix was to register two devices:
- one absolute pointer device for exact event delivery
- one relative pointer device for visual cursor motion
The move function sends motion to both.
The relative device updates the visible cursor.
The absolute device anchors exact click coordinates.
That split was the missing piece.
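A sketch of that split, with lists standing in for the real libei calls (ei_device_pointer_motion for the relative device, ei_device_pointer_motion_absolute for the absolute one); the class name and structure are illustrative, not portal-use's actual code:

```python
class PointerPair:
    """Two-device workaround: the absolute device anchors exact click
    coordinates, the relative device drags the visible cursor."""

    def __init__(self):
        self.x = self.y = 0.0   # last position sent
        self.rel_events = []    # deltas -> relative device
        self.abs_events = []    # coordinates -> absolute device

    def move_to(self, x: float, y: float) -> None:
        # Relative delta first, so the on-screen cursor visibly travels.
        self.rel_events.append((x - self.x, y - self.y))
        # Then the absolute event, so clicks land exactly at (x, y).
        self.abs_events.append((x, y))
        self.x, self.y = x, y
```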
The Hot Corner Trap
After homing the cursor to (0, 0), the pointer ends up in the top-left corner.
On GNOME, that is the Activities hot corner.
So the very first slight motion can trigger the overview and derail the session before the agent has even started doing useful work.
The fix was simple: move to a safe location immediately after connecting.
```python
safe_x = width * 0.05
safe_y = height * 0.5
```
Tiny detail. Huge quality-of-life improvement.
persist_mode Can Deadlock on GNOME
The ScreenCast portal supports persistence options so that users do not have to re-approve access constantly.
In practice, on GNOME 46 through 50, setting persist_mode in a combined RemoteDesktop + ScreenCast flow can cause SelectSources to hang forever without returning a response.
So the production workaround became:
- try session persistence with a timeout
- if it hangs, remove persist_mode
- continue without it
That means access is approved once per login session rather than once forever. Not ideal, but completely livable.
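The fallback can be expressed with asyncio.wait_for. In this sketch, portal is a hypothetical async wrapper around the D-Bus SelectSources call, and the 5-second default is an assumption, not a value taken from portal-use:

```python
import asyncio


async def select_sources(portal, options: dict, timeout: float = 5.0) -> dict:
    """Try SelectSources with persist_mode; if the portal never answers,
    retry without it so the session can still proceed."""
    try:
        return await asyncio.wait_for(
            portal.select_sources(options), timeout=timeout
        )
    except asyncio.TimeoutError:
        # GNOME 46-50 can hang here in combined RemoteDesktop +
        # ScreenCast flows, so drop persistence and continue.
        retry = {k: v for k, v in options.items() if k != "persist_mode"}
        return await portal.select_sources(retry)
```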
The Production Shape
The best UX ended up being a user daemon.
The architecture now looks like this:
systemd user service
→ portal-use server
→ portal session established at login
→ consent granted once
→ persistent MCP endpoint on localhost
Then the AI client connects like this:
Claude Code / Claude Desktop
→ HTTP MCP connection
→ portal-use
→ real Wayland desktop
That solves the worst part of portal-based automation: repeated startup friction. Approve once at login, then the session stays alive.
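A minimal sketch of such a user unit, assuming an entry point under ~/.local/bin; the ExecStart path and port flag are hypothetical, so adjust them to the actual install location:

```ini
# ~/.config/systemd/user/portal-use.service -- illustrative only
[Unit]
Description=portal-use MCP server
After=graphical-session.target
PartOf=graphical-session.target

[Service]
ExecStart=%h/.local/bin/portal-use --port 8765
Restart=on-failure

[Install]
WantedBy=graphical-session.target
```

Enable it with `systemctl --user enable --now portal-use.service`, and the portal consent prompt appears once per login.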
What Works Right Now
Tested on Ubuntu 26.04 RC with GNOME 50 and an NVIDIA RTX 5070.
Current working features:
- full desktop screenshots via PipeWire
- exact-coordinate left, right, and middle click
- double-click
- click-and-drag
- scroll in all directions
- full keyboard input, including modifiers and combos
- computer_zoom for high-resolution crops of small UI regions
- visible cursor movement on screen
- works with Claude Code and Claude Desktop
- daemon mode with consent once per login session
I have already used it to search for and launch apps, open Files from a dock context menu, and navigate Chrome, all on a real Wayland desktop with no X11 in the stack.
Why This Matters
Ubuntu 26.04 LTS will be the supported Linux desktop for years.
That means developers, enterprises, researchers, and power users are going to be running modern Wayland desktops as the default, not as an experiment. AI tools that still rely on X11 assumptions are going to age badly and fast.
On macOS, AI desktop control uses the native accessibility stack.
On Windows, it uses native automation APIs.
On Linux, too many projects still jump straight to Xvfb and VNC like the real desktop is unavailable.
But the real desktop is available.
GNOME already ships the pieces:
- xdg-desktop-portal-gnome
- PipeWire
- libei
- a consent-based compositor-controlled workflow
The stack exists.
The protocols exist.
The tooling just has not caught up.
A fake X display inside a container is not a real Linux desktop integration. It is a workaround that happens to run on Linux.
What I Want to Port Next
portal-use is part of a broader effort: porting AI desktop tooling to the native Wayland stack before Ubuntu 26.04 LTS makes the old assumptions impossible to ignore.
The target architecture looks like this:
| Layer | Native Linux Stack |
| --- | --- |
| Screen capture | PipeWire via XDG Desktop Portal ScreenCast |
| Input injection | libei via XDG Desktop Portal RemoteDesktop |
| Window info | AT-SPI2 accessibility tree |
| OCR | Tesseract on zoomed crops |
| Integration | MCP |
If you are building AI tooling for Linux desktops, stop building for the compatibility layer.
Build for Wayland.
Use the portal.
Use the compositor-mediated path.
Use the stack Linux desktops are actually shipping.
Get It
```shell
git clone https://github.com/johnohhh1/portal_use
cd portal_use
bash install.sh
```
Requires: Ubuntu 26.04+ or another GNOME Wayland compositor, plus Claude Code or Claude Desktop.
Issues, PRs, and testing notes are welcome, especially from people trying it on KDE Plasma Wayland.
Built on Ubuntu 26.04 RC + GNOME 50 + RTX 5070, because somebody had to.