Ubuntu 26.04 LTS lands in about a week.
It is the first Ubuntu LTS expected to put a pure Wayland GNOME desktop front and center for a massive wave of users. That matters, because a lot of Linux AI tooling is about to run face-first into reality.
Most Linux computer-use tools still assume one of three things:
- X11
- a fake desktop running in Xvfb or VNC
- raw evdev access with elevated privileges
That stack is already dusty. On a modern GNOME Wayland desktop, it is not the native path, and in a lot of cases it is not the right path at all.
So I built one that does it properly.
It is called portal-use.
The Problem: Most "Linux Computer Use" Is Still X11 Theater
When people say they have "Linux computer use" working today, what they usually mean is one of these:
```python
import subprocess

# Move to (x, y) and left-click -- this works only because xdotool
# talks directly to the X server.
subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
```
That works on X11 because xdotool talks directly to the X server.
On a pure Wayland session, that assumption breaks. Wayland clients do not get to inject input into arbitrary applications just because they exist. That is part of the security model, not an accidental limitation.
So the usual workaround becomes:
- Run Xvfb
- Start a VNC server
- Point the AI agent at the fake display
- Pretend that is the user's desktop
It is not.
That is not the real desktop. It does not have the user's actual windows, actual session state, actual login context, actual filesystem workflow, or actual desktop environment. It is a simulated stage set.
The other common path is direct evdev injection:
- needs root or custom udev rules
- writes directly to /dev/input/event*
- breaks when device ordering changes
- bypasses the desktop stack entirely
That is not a native integration either. It is a kernel-level shortcut held together with trust and duct tape.
Wayland already has a proper stack for this.
The Right Stack
A real Wayland-native computer use implementation looks like this:
AI agent → MCP server → XDG Desktop Portal → PipeWire + libei/EIS → GNOME compositor
That is the correct architecture.
Here is the breakdown.
Screen capture
For screenshots and screen streaming, the compositor exposes frames through XDG Desktop Portal ScreenCast, backed by PipeWire.
The app requests access.
The user approves it.
The compositor exposes a PipeWire node.
Frames come through as a proper modern desktop stream.
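As a sketch of how a consumer might pull frames from that node, here is a helper that builds a GStreamer pipeline description around pipewiresrc. This is illustrative, not portal-use's actual capture code; the appsink name is arbitrary, and real portal streams typically also need pipewiresrc's fd property set to the PipeWire remote fd the portal returns.

```python
def build_capture_pipeline(node_id: int) -> str:
    """Build a GStreamer pipeline description that pulls frames from
    the PipeWire node the ScreenCast portal handed back.

    node_id is the stream's PipeWire node id from the portal's Start
    response. Note: pipewiresrc's `path` property is deprecated in
    newer GStreamer in favor of `target-object`.
    """
    return (
        f"pipewiresrc path={node_id} ! "
        "videoconvert ! video/x-raw,format=RGB ! "
        "appsink name=sink"
    )
```

The resulting string can be handed to Gst.parse_launch(), after which frames are pulled from the appsink.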
Input injection
For keyboard and mouse control, the app uses XDG Desktop Portal RemoteDesktop.
Once approved, the portal hands back an EIS file descriptor.
That gets consumed by libei, which is the proper user-space mechanism for emulated input on Wayland compositors that support it.
Consent and control
The compositor remains in charge the entire time.
- The user approves access
- The session is mediated
- Access can be revoked
- No root is required
- No /dev/input hacks are required
- No X11 compatibility layer is required
That is how this should work.
What portal-use Does
portal-use is an MCP-compatible computer-use server for Wayland desktops that uses the real Linux desktop stack instead of pretending X11 still runs the world.
It uses:
- XDG Desktop Portal for access brokering
- PipeWire for screen capture
- libei / EIS for input injection
- Starlette + Uvicorn for the server
- MCP StreamableHTTP for AI tool integration
The result is a computer-use server that can control a real GNOME Wayland desktop without X11, without Xvfb, and without root.
What Actually Took Time
The architecture looks straightforward on paper. The real work was in the implementation details nobody bothers to write down.
ei_device_start_emulating() Is Mandatory
This one wasted an absurd amount of time.
The libei docs mention ei_device_start_emulating(), but they do not make it clear that input events are silently dropped if you send them before that call happens at the right point in the event sequence.
No warning.
No exception.
No log message.
Your events just disappear.
The sequence matters:
1. EI_EVENT_SEAT_ADDED
2. EI_EVENT_DEVICE_ADDED
3. EI_EVENT_DEVICE_RESUMED
Only after EI_EVENT_DEVICE_RESUMED should you call:
```c
ei_device_start_emulating(device, sequence);
```
Only then should you send motion, button, or keyboard events.
If you treat DEVICE_ADDED as "ready", your automation looks connected but does nothing.
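The gating logic above can be sketched as a tiny state machine. The event names and class here are hypothetical stand-ins for libei's real ei_get_event loop; the point is that readiness flips only on DEVICE_RESUMED, and anything sent earlier is silently dropped, exactly as libei behaves.

```python
# Hypothetical stand-ins for libei's event constants.
SEAT_ADDED, DEVICE_ADDED, DEVICE_RESUMED = range(3)


class EmulatedDevice:
    """Tracks device state so nothing is sent before
    ei_device_start_emulating() would be legal."""

    def __init__(self):
        self.ready = False
        self.sent = []

    def handle_event(self, event: int) -> None:
        # DEVICE_ADDED alone is NOT "ready"; only DEVICE_RESUMED is.
        if event == DEVICE_RESUMED:
            # Real code calls ei_device_start_emulating(device, seq) here.
            self.ready = True

    def send_motion(self, dx: float, dy: float) -> bool:
        if not self.ready:
            return False  # mimic libei's silent drop
        self.sent.append((dx, dy))
        return True
```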
c_float vs c_double
The libei pointer motion functions expect double.
If you pass the wrong ctypes type, your coordinates get mangled in a way that looks like bad math, bad scaling, or bad monitor transforms.
Wrong:

```python
_libei.ei_device_pointer_motion_absolute(dev, x, y)
```

Right:

```python
_libei.ei_device_pointer_motion_absolute(
    dev,
    ctypes.c_double(x),
    ctypes.c_double(y),
)
```
A tiny ABI mismatch can turn precise clicks into ghost clicks that land near the target but not on it. That kind of bug is extra annoying because it makes you distrust your entire coordinate pipeline.
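You can reproduce the same ABI hazard without libei at all. The sketch below uses glibc's pow(double, double) as a stand-in, assuming a Linux system where libm.so.6 is loadable; declaring argtypes is the defensive fix, because ctypes then converts Python floats correctly and rejects mismatched wrapper types at call time.

```python
import ctypes
from ctypes import c_double, c_float

# Stand-in for libei: glibc's pow(double, double).
libm = ctypes.CDLL("libm.so.6")
libm.pow.restype = c_double

# Without argtypes, ctypes passes whatever width you hand it. A c_float
# puts 4 bytes where the callee reads 8, so the values are mangled.
bad = libm.pow(c_float(2.0), c_float(10.0))  # mangled, not 1024.0

# Declaring argtypes makes ctypes do the right conversion.
libm.pow.argtypes = [c_double, c_double]
good = libm.pow(2.0, 10.0)  # 1024.0
```

The same argtypes declaration on ei_device_pointer_motion_absolute would have caught the original bug at the call site instead of producing ghost clicks.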
Clicks Worked, but the Cursor Did Not Move
This one was sneaky.
At one point, clicks worked correctly. Menus opened. Hover states triggered. Navigation happened. The compositor was clearly receiving the events.
But the cursor itself did not visibly move.
On GNOME 50, those are not always the same thing.
There is a difference between:
- where the compositor delivers pointer interaction
- where the visible hardware cursor overlay appears on screen
The fix was to register two devices:
- one absolute pointer device for exact event delivery
- one relative pointer device for visual cursor motion
The move function sends motion to both.
The relative device updates the visible cursor.
The absolute device anchors exact click coordinates.
That split was the missing piece.
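A sketch of that split, with lists standing in for the real libei calls (ei_device_pointer_motion for the relative device, ei_device_pointer_motion_absolute for the absolute one); the class name and structure are illustrative, not portal-use's actual code:

```python
class PointerPair:
    """Two-device workaround: the absolute device anchors exact click
    coordinates, the relative device drags the visible cursor."""

    def __init__(self):
        self.x = self.y = 0.0   # last position sent
        self.rel_events = []    # deltas -> relative device
        self.abs_events = []    # coordinates -> absolute device

    def move_to(self, x: float, y: float) -> None:
        # Relative delta first, so the on-screen cursor visibly travels.
        self.rel_events.append((x - self.x, y - self.y))
        # Then the absolute event, so clicks land exactly at (x, y).
        self.abs_events.append((x, y))
        self.x, self.y = x, y
```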
The Hot Corner Trap
After homing the cursor to (0, 0), the pointer ends up in the top-left corner.
On GNOME, that is the Activities hot corner.
So the very first slight motion can trigger the overview and derail the session before the agent has even started doing useful work.
The fix was simple: move to a safe location immediately after connecting.
```python
safe_x = width * 0.05
safe_y = height * 0.5
```
Tiny detail. Huge quality-of-life improvement.
persist_mode Can Deadlock on GNOME
The ScreenCast portal supports persistence options so that users do not have to re-approve access constantly.
In practice, on GNOME 46 through 50, setting persist_mode in a combined RemoteDesktop + ScreenCast flow can cause SelectSources to hang forever without returning a response.
So the production workaround became:
- try session persistence with a timeout
- if it hangs, remove persist_mode
- continue without it
That means access is approved once per login session rather than once forever. Not ideal, but completely livable.
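The fallback can be expressed with asyncio.wait_for. In this sketch, portal is a hypothetical async wrapper around the D-Bus SelectSources call, and the 5-second default is an assumption, not a value taken from portal-use:

```python
import asyncio


async def select_sources(portal, options: dict, timeout: float = 5.0) -> dict:
    """Try SelectSources with persist_mode; if the portal never answers,
    retry without it so the session can still proceed."""
    try:
        return await asyncio.wait_for(
            portal.select_sources(options), timeout=timeout
        )
    except asyncio.TimeoutError:
        # GNOME 46-50 can hang here in combined RemoteDesktop +
        # ScreenCast flows, so drop persistence and continue.
        retry = {k: v for k, v in options.items() if k != "persist_mode"}
        return await portal.select_sources(retry)
```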
The Production Shape
The best UX ended up being a user daemon.
The architecture now looks like this:
systemd user service
→ portal-use server
→ portal session established at login
→ consent granted once
→ persistent MCP endpoint on localhost
Then the AI client connects like this:
Claude Code / Claude Desktop
→ HTTP MCP connection
→ portal-use
→ real Wayland desktop
That solves the worst part of portal-based automation: repeated startup friction. Approve once at login, then the session stays alive.
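A minimal sketch of such a user unit, assuming an entry point under ~/.local/bin; the ExecStart path and port flag are hypothetical, so adjust them to the actual install location:

```ini
# ~/.config/systemd/user/portal-use.service -- illustrative only
[Unit]
Description=portal-use MCP server
After=graphical-session.target
PartOf=graphical-session.target

[Service]
ExecStart=%h/.local/bin/portal-use --port 8765
Restart=on-failure

[Install]
WantedBy=graphical-session.target
```

Enable it with `systemctl --user enable --now portal-use.service`, and the portal consent prompt appears once per login.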
What Works Right Now
Tested on Ubuntu 26.04 RC with GNOME 50 and an NVIDIA RTX 5070.
Current working features:
- full desktop screenshots via PipeWire
- exact-coordinate left, right, and middle click
- double-click
- click-and-drag
- scroll in all directions
- full keyboard input, including modifiers and combos
- computer_zoom for high-resolution crops of small UI regions
- visible cursor movement on screen
- works with Claude Code and Claude Desktop
- daemon mode with consent once per login session
I have already used it to search for and launch apps, open Files from a dock context menu, and navigate Chrome, all on a real Wayland desktop with no X11 in the stack.
Why This Matters
Ubuntu 26.04 LTS will be the supported Linux desktop for years.
That means developers, enterprises, researchers, and power users are going to be running modern Wayland desktops as the default, not as an experiment. AI tools that still rely on X11 assumptions are going to age badly and fast.
On macOS, AI desktop control uses the native accessibility stack.
On Windows, it uses native automation APIs.
On Linux, too many projects still jump straight to Xvfb and VNC like the real desktop is unavailable.
But the real desktop is available.
GNOME already ships the pieces:
- xdg-desktop-portal-gnome
- PipeWire
- libei
- a consent-based compositor-controlled workflow
The stack exists.
The protocols exist.
The tooling just has not caught up.
A fake X display inside a container is not a real Linux desktop integration. It is a workaround that happens to run on Linux.
What I Want to Port Next
portal-use is part of a broader effort: porting AI desktop tooling to the native Wayland stack before Ubuntu 26.04 LTS makes the old assumptions impossible to ignore.
The target architecture looks like this:
| Layer | Native Linux Stack |
| --- | --- |
| Screen capture | PipeWire via XDG Desktop Portal ScreenCast |
| Input injection | libei via XDG Desktop Portal RemoteDesktop |
| Window info | AT-SPI2 accessibility tree |
| OCR | Tesseract on zoomed crops |
| Integration | MCP |
If you are building AI tooling for Linux desktops, stop building for the compatibility layer.
Build for Wayland.
Use the portal.
Use the compositor-mediated path.
Use the stack Linux desktops are actually shipping.
Get It
```shell
git clone https://github.com/johnohhh1/portal_use
cd portal_use
bash install.sh
```
Requires: Ubuntu 26.04+ or another GNOME Wayland compositor, plus Claude Code or Claude Desktop.
Issues, PRs, and testing notes are welcome, especially from people trying it on KDE Plasma Wayland.
Built on Ubuntu 26.04 RC + GNOME 50 + RTX 5070, because somebody had to.