Isolating Xwayland in a VM - Thomas Leonard's blog

In my last post, Qubes-lite with KVM and Wayland, I described setting up a Qubes-inspired Linux system that runs applications in virtual machines. A Wayland proxy running in each VM connects its applications to the host Wayland compositor over virtwl, allowing them to appear on the desktop alongside normal host applications. In this post, I extend this to support X11 applications using Xwayland.

Table of Contents

Overview
Introduction to X11
Running Xwayland
The X11 protocol
Initialising the window manager
Windows
Performance
Pointer events
Keyboard events
Pointer cursor
Selections
Drag-and-drop
Bonus features
Conclusions

(This post also appeared on Hacker News.)

Overview

A graphical desktop typically allows running multiple applications on a single display (e.g. by showing each application in a separate window). Client applications connect to a server process (usually on the same machine) and ask it to display their windows.

Until recently, this service was an X server, and applications would communicate with it using the X11 protocol. However, on newer systems the display is managed by a Wayland compositor, using the Wayland protocol.

Many older applications haven't been updated yet. Xwayland can be used to allow unmodified X11 applications to run in a Wayland desktop environment. However, setting this up wasn't as easy as I'd hoped. Ideally, Xwayland would completely isolate the Wayland compositor from needing to know anything about X11:

[Figure: Fantasy Xwayland architecture]

However, it doesn't work like this. Xwayland handles X11 drawing operations, but it doesn't handle lots of other details, including window management (e.g. telling the Wayland compositor what the window title should be), copy-and-paste, and selections. Instead, the Wayland compositor is supposed to connect back to Xwayland over the X11 protocol and act as an X11 window manager to provide the missing features:

[Figure: Actual Xwayland architecture]

This is a problem for several reasons:

It means that every Wayland compositor has to implement not only the new Wayland protocol, but also the old X11 protocol.
The compositor is part of the trusted computing base (it sees all your keystrokes and window contents) and this adds a whole load of legacy code that you'd need to audit to have confidence in it.
It doesn't work when running applications in VMs, because each VM needs its own Xwayland service and existing compositors can only manage one.

Because Wayland (unlike X11) doesn't allow applications to mess with other applications' windows, we can't have a third-party application act as the X11 window manager. It wouldn't have any way to ask the compositor to put Xwayland's surfaces into a window frame, because Xwayland is a separate application.

There is another way to do it, however. As I mentioned in the last post, I already had to write a Wayland proxy (wayland-proxy-virtwl) to run in each VM and relay Wayland messages over virtwl, so I decided to extend it to handle Xwayland too. As a bonus, the proxy can also be used even without VMs, avoiding the need for any X11 support in Wayland compositors at all. In fact, I found that doing this avoided several bugs in Sway's built-in Xwayland support.

Sommelier already has support for this, but it doesn't work for the applications I want to use. For example, popup menus appear in the center of the screen, text selections don't work, and it generally crashes after a few seconds (often with the error xdg_surface has never been configured). So instead I'd been using ssh -Y vm from the host to forward X11 connections to the host's Xwayland, managed by Sway. That works, but it's not at all secure.

Introduction to X11

Unlike Wayland, where applications are mostly unaware of each other, X is much more collaborative. The X server maintains a tree of windows (rectangles) and the applications manipulate it. The root of the tree is called the root window and fills the screen. You can see the tree using the xwininfo command, like this:

$ xwininfo -tree -root

xwininfo: Window id: 0x47 (the root window) (has no name)

  Root window id: 0x47 (the root window) (has no name)
  Parent window id: 0x0 (none)
     9 children:
     0x800112 "~/Projects/wayland/wayland-proxy-virtwl": ("ROX-Filer" "ROX-Filer")  2184x2076+0+0  +0+0
        1 child:
        0x800113 (has no name): ()  1x1+-1+-1  +-1+-1
     0x800123 (has no name): ()  1x1+-1+-1  +-1+-1
     0x800003 "ROX-Filer": ()  10x10+-100+-100  +-100+-100
     0x800001 "ROX-Filer": ("ROX-Filer" "ROX-Filer")  10x10+10+10  +10+10
        1 child:
        0x800002 (has no name): ()  1x1+-1+-1  +9+9
     0x600002 "main.ml (~/Projects/wayland/wayland-proxy-virtwl) - GVIM1": ("gvim" "Gvim")  1648x1012+0+0  +0+0
        1 child:
        0x600003 (has no name): ()  1x1+-1+-1  +-1+-1
     0x600007 (has no name): ()  1x1+-1+-1  +-1+-1
     0x600001 "Vim": ("gvim" "Gvim")  10x10+10+10  +10+10
     0x200002 (has no name): ()  1x1+0+0  +0+0
     0x200001 (has no name): ()  1x1+0+0  +0+0

This tree shows the windows of two X11 applications, ROX-Filer and GVim, as well as various invisible utility windows (mostly 1x1 or 10x10 pixels in size).

Applications can create, move, resize and destroy windows, draw into them, and request events from them. The X server also allows arbitrary data to be attached to windows in properties. You can see a window's properties with xprop. Here are some of the properties on the GVim window:

$ xprop -id 0x600002
WM_HINTS(WM_HINTS):
		Client accepts input or input focus: True
		Initial state is Normal State.
		window id # of group leader: 0x600001
_NET_WM_WINDOW_TYPE(ATOM) = _NET_WM_WINDOW_TYPE_NORMAL
WM_NORMAL_HINTS(WM_SIZE_HINTS):
		program specified minimum size: 188 by 59
		program specified base size: 188 by 59
		window gravity: NorthWest
WM_CLASS(STRING) = "gvim", "Gvim"
WM_NAME(STRING) = "main.ml (~/Projects/wayland/wayland-proxy-virtwl) - GVIM1"
...

The X server itself doesn't know anything about e.g. window title bars. Instead, a window manager process connects and handles that. A window manager is just another X11 application. It asks to be notified when an application tries to show ("map") a window inside the root, and when that happens it typically creates a slightly larger window (with room for the title bar, etc) and moves the other application's window inside that.

This design gives X a lot of flexibility. All kinds of window managers have been implemented, without needing to change the X server itself. However, it is very bad for security. For example:

Open an xterm.
Use xwininfo to find its window ID (you need the nested child window, not the top-level one).
Run xev -id 0x80001b -event keyboard in another window (using the ID you got above).
Use sudo or similar inside xterm and enter a password.

As you type the password into xterm, you should see the characters being captured by xev. An X application can easily spy on another application, send it synthetic events, etc.

Running Xwayland

Xwayland is a version of the xorg X server that treats Wayland as its display hardware. If you run it as e.g. Xwayland :1 then it opens a single Wayland window corresponding to the X root window, and you can use it as a nested desktop. This isn't very useful, because these windows don't fit in with the rest of your desktop. Instead, it is normally used in rootless mode, where each child of the X root window may have its own Wayland window.

$ WAYLAND_DEBUG=1 Xwayland :1 -rootless
[3991465.523]  -> wl_display@1.get_registry(new id wl_registry@2)
[3991465.531]  -> wl_display@1.sync(new id wl_callback@3)
...

When run this way, however, no windows actually appear. If we run DISPLAY=:1 xterm then we see Xwayland creating some buffers, but no surfaces:

[4076460.506]  -> wl_shm@4.create_pool(new id wl_shm_pool@15, fd 9, 540)
[4076460.520]  -> wl_shm_pool@15.create_buffer(new id wl_buffer@24, 0, 9, 15, 36, 0)
[4076460.526]  -> wl_shm_pool@15.destroy()
...

We need to run Xwayland as Xwayland :1 -rootless -wm FD, where FD is a socket we will use to speak the X11 protocol and act as a window manager.

It's a little hard to find information about Xwayland's rootless mode, because "rootless" has two separate common meanings in xorg:

Running xorg without root privileges.
Using xorg's miext/rootless extension to display application windows on some other desktop.

After a while, it became clear that Xwayland's rootless mode isn't either of these, but a third xorg feature also called "rootless".

The X11 protocol

libxcb provides C bindings to the X11 protocol, but I wanted to program in OCaml. Luckily, the X11 protocol is well documented, and generating the messages directly didn't look any harder than binding libxcb, so I wrote a little OCaml library to do this (ocaml-x11).

At first, I hard-coded the messages. For example, here's the code to delete a property on a window:

module Delete = struct
  [%%cstruct
    type req = {
      window : uint32_t;
      property : uint32_t;
    } [@@little_endian]
  ]

  let send t window property =
    Request.send_only t ~major:19 sizeof_req @@ fun r ->
    set_req_window r window;
    set_req_property r property
end

I'm using the cstruct syntax extension to let me define the exact layout of the message body. Here, it generates sizeof_req, set_req_window and set_req_property automatically.

After a bit, I discovered that there are XML files in xcbproto describing the X11 protocol. This provides a Python library for parsing the XML, which you can use by writing a Python script for your language of choice. For example, this glorious 3394 line Python script generates the C bindings. After studying this script carefully, I decided that hard-coding everything wasn't so bad after all.

I ended up having to implement more messages than I expected, including some surprising ones like OpenFont (see x11.mli for the final list). My implementation came to 1754 lines of OCaml, which is quite a bit shorter than the Python generator script, so I guess I still came out ahead!

In the X11 protocol, client applications send requests and the server sends replies, errors and events. Most requests don't produce replies, but can produce errors. Replies and errors are returned in request order, so if you see a response to a later request, you know all previous ones succeeded. If you care about whether a request succeeded, you may need to send a dummy request that generates a reply after it. Since message sequence numbers are 16-bit, after sending 0xffff consecutive requests without replies, you should send a dummy one with a reply to resynchronise (but window management involves lots of round-trips, so this isn't likely to be a problem for us). Events can be sent by the server at any time.

Unlike Wayland, which is very regular, X11 has various quirks. For example, every event has a sequence number at offset 2, except for KeymapNotify.
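The sequence-number handling described above might be sketched as follows. This is only an illustration, not code from ocaml-x11: the names event_sequence and widen_sequence are hypothetical, and a little-endian connection is assumed. The one protocol fact relied on is that KeymapNotify has event code 11.

```ocaml
(* Event code of KeymapNotify in the core X11 protocol. *)
let keymap_notify = 11

(* Extract the 16-bit sequence number from a raw 32-byte event, assuming a
   little-endian connection. Returns [None] for KeymapNotify, which has no
   sequence field. The top bit of the code marks events sent via SendEvent,
   so it is masked off before comparing. *)
let event_sequence (ev : Bytes.t) : int option =
  let code = Bytes.get_uint8 ev 0 land 0x7f in
  if code = keymap_notify then None
  else Some (Bytes.get_uint16_le ev 2)

(* Widen a 16-bit sequence number from the wire to a full counter, given the
   full number of the last request we sent ([last_sent], hypothetical).
   Picks the most recent value not after [last_sent] whose low 16 bits match.
   This is only unambiguous while fewer than 0x10000 requests are outstanding,
   which is why the occasional dummy round-trip is needed. *)
let widen_sequence ~last_sent seq16 =
  let candidate = (last_sent land lnot 0xffff) lor seq16 in
  if candidate > last_sent then candidate - 0x10000 else candidate
```

For example, with last_sent at 0x10005, a wire value of 0x0003 widens to 0x10003, while 0xfffe must refer to the earlier request 0xfffe.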

Initialising the window manager

Using Xwayland -wm FD actually prevents any client applications from connecting at all at first, because Xwayland then waits for the window manager to be ready before accepting any client connections.

To fix that, we need to claim ownership of the WM_S0 selection. A "selection" is something that can be owned by only one application at a time. Selections were originally used to track ownership of the currently-selected text, and later also used for the clipboard. WM_S0 means "Window Manager for Screen 0" (Xwayland only has one screen).

(* Become the window manager. This allows other clients to connect. *)
let* wm_sn = intern t ~only_if_exists:false ("WM_S" ^ string_of_int i) in
X11.Selection.set_owner x11 ~owner:(Some root) ~timestamp:`CurrentTime wm_sn

Instead of passing things like WM_S0 as strings in each request, X11 requires us to first intern the string. This returns a unique 32-bit ID for it, which we use in future messages. Because intern may require a round-trip to the server, it returns a promise, and so we use let* instead of let to wait for that to resolve before continuing. let* is defined in the Lwt.Syntax module, as an alternative to the more traditional >>= notation.
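To make the interning step concrete, here is a rough sketch of how an InternAtom request could be laid out on the wire. It is not the ocaml-x11 implementation: the function names are made up for illustration, it uses plain Bytes rather than cstruct, and it assumes a little-endian connection. The layout follows the core protocol's InternAtom request (major opcode 16).

```ocaml
(* Round [n] up to a multiple of 4; X11 requests are padded to 4-byte units. *)
let pad4 n = (n + 3) land lnot 3

(* Encode a core-protocol InternAtom request, assuming little-endian byte
   order. The server's reply carries the 32-bit atom for [name]. *)
let intern_atom_request ~only_if_exists name =
  let n = String.length name in
  let len = 8 + pad4 n in
  let b = Bytes.make len '\000' in
  Bytes.set_uint8 b 0 16;                                 (* major opcode *)
  Bytes.set_uint8 b 1 (if only_if_exists then 1 else 0);
  Bytes.set_uint16_le b 2 (len / 4);                      (* length in 4-byte units *)
  Bytes.set_uint16_le b 4 n;                              (* length of the name *)
  Bytes.blit_string name 0 b 8 n;                         (* name, zero-padded *)
  b
```

Interning "WM_S0" this way produces a 16-byte request: an 8-byte header followed by the 5-byte name padded to 8 bytes.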

This lets our clients connect. However, Xwayland still isn't creating any Wayland surfaces. By reading the Sommelier code and stepping through Xwayland with a debugger, I found that I needed to enable the Composite extension.

Composite was originally intended to speed up redraw operations, by having the server keep a copy of every top-level window's pixels (even when obscured), so that when you move a window it can draw it right away without asking the application for help. The application's drawing operations go to the window's buffer, and then the buffer is copied to the screen, either automatically by the X server or manually by the window manager. Xwayland reuses this mechanism, by turning each window buffer into a Wayland surface. We just need to turn that on:

let* composite = X11.Composite.init x11 in
let* () = X11.Composite.redirect_subwindows composite ~window:root ~update:`Manual in

This says that every child of the root window should use this system. Finally, we see Xwayland creating Wayland surfaces:

-> wl_compositor@5.create_surface id:+28

Now we just need to make them appear on the screen!

Windows

As usual for Wayland, we need to create a role object and attach it to the surface. This tells Wayland whether the surface is a window or a dialog, for example, and lets us set the title, etc.

But first we have a problem: we need to know which X11 window corresponds to each Wayland surface. For example, we need the title, which is stored in a property on the X11 window. Xwayland does this by sending the new window a ClientMessage event of type WL_SURFACE_ID containing the Wayland ID. We don't get this message by default, but it seems that selecting SubstructureRedirect on the root does the trick.

SubstructureRedirect is used by window managers to intercept attempts by other applications to change the children of the root window. When an application asks the server to e.g. map a window, the server just forwards the request to the window manager. Operations performed by the window manager itself do not get redirected, so it can just perform the same request the client wanted, or make any changes it requires.

In our case, we don't actually need to modify the request, so we just re-perform the original map operation:

let event_handler = object (_ : X11.Event.handler)
  method map_request ~window = X11.Window.map x11 window

  method client_message ~window ~ty body =
      if ty = wl_surface_id then (
        let wayland_id = Cstruct.LE.get_uint32 body 0 in
        Log.info (fun f -> f "X window %a corresponds to Wayland surface %ld" X11.Window.pp window wayland_id);
        pair_when_ready ~x11 t window wayland_id
      )

Having two separate connections to Xwayland is quite annoying, because messages can arrive in any order. We might get the X11 ClientMessage first and need to wait for the Wayland create_surface, or we might get the create_surface first and need to wait for the ClientMessage.

An added complication is that not all Wayland surfaces correspond to X11 windows. For example, Xwayland also creates surfaces representing cursor shapes, and these don't have X11 windows. However, when we get the ClientMessage we can be sure that a Wayland message is on the way, so I just pause the X11 event handling until that has arrived:

(* We got an X11 message saying X11 [window] corresponds to Wayland surface [wayland_id].
   Turn [wayland_id] into an xdg_surface. If we haven't seen that surface yet, wait until it appears
   on the Wayland socket. *)
let rec pair_when_ready ~x11 t window wayland_id =
  match Hashtbl.find_opt t.unpaired wayland_id with
  | None ->
    Log.info (fun f -> f "Unknown Wayland object %ld; waiting for surface to be created..." wayland_id);
    let* () = Lwt_condition.wait t.unpaired_added in
    pair_when_ready ~x11 t window wayland_id
  | Some { client_surface = _; host_surface; set_configured } ->
    Log.info (fun f -> f "Setting up Wayland surface %ld using X11 window %a" wayland_id X11.Xid.pp window);
    Hashtbl.remove t.unpaired wayland_id;
    Lwt.async (fun () -> pair t ~set_configured ~host_surface window);
    Lwt.return_unit

Another complication is that Wayland doesn't allow you to attach a buffer to a surface until the window has been "configured". Doing so is a protocol error, and Sway will disconnect us if we try! But Xwayland likes to attach the buffer immediately after creating the surface.

To avoid this, I use a queue:

Xwayland asks to create a surface.
We forward this to Sway, add its ID to the unpaired map, and create a queue for further events.
Xwayland asks us to attach a buffer, etc. We just queue these up.
We get the ClientMessage over the X11 connection and create a role for the new surface.
Sway sends us a configure event, confirming it's ready for the buffer.
We forward the queued events.

However, this creates a new problem: if the surface isn't a window then the events will be queued forever. To fix that, when we get a create_surface we also do a round-trip on the X11 connection. If the window is still unpaired when that returns then we know that no ClientMessage is coming, and we flush the queue.
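The queueing scheme can be sketched like this. It is a simplified model rather than the proxy's real code: the type and function names are hypothetical, messages are an abstract type, and the Lwt plumbing is left out.

```ocaml
(* State of one surface's pending requests. *)
type 'a pending =
  | Queued of 'a Queue.t   (* not yet configured: hold requests back *)
  | Ready                  (* configured, or known not to be a window *)

type 'a surface = { mutable state : 'a pending; forward : 'a -> unit }

(* [forward] is whatever relays a request on to the compositor. *)
let create_surface forward = { state = Queued (Queue.create ()); forward }

(* Called for each request from Xwayland on this surface. *)
let request s msg =
  match s.state with
  | Queued q -> Queue.add msg q
  | Ready -> s.forward msg

(* Called when the compositor's configure event arrives, or when the X11
   round-trip completes without a ClientMessage for this surface. *)
let flush s =
  match s.state with
  | Ready -> ()
  | Queued q ->
    s.state <- Ready;
    Queue.iter s.forward q
```

Both exits from the queued state go through flush, so a surface that turns out to be a cursor image is released by the same path as a window that has just been configured.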

X applications like to create dummy windows for various purposes (e.g. receiving clipboard data), and we need to avoid showing those. They're normally set as override_redirect so the window manager doesn't handle them, but Xwayland redirects them anyway (it needs to because otherwise e.g. tooltips wouldn't appear at all). I'm trying various heuristics to detect this, e.g. that override_redirect windows with a size of 1x1 shouldn't be shown.

If Sway asks us to close a window, we need to relay that to the X application using the WM_DELETE_WINDOW protocol, if it supports that:

let toplevel = Xdg_surface.get_toplevel xdg_surface @@ object
    inherit [_] Xdg_toplevel.v1

    method on_close _ =
      Lwt.async (fun () ->
          let* x11 = t.x11 in
          let* wm_protocols = X11.Atom.intern x11 "WM_PROTOCOLS"
          and* wm_delete_window = X11.Atom.intern x11 "WM_DELETE_WINDOW" in
          let* protocols = X11.Property.get_atoms x11 window wm_protocols in
          if List.mem wm_delete_window protocols then (
            let data = Cstruct.create 8 in
            Cstruct.LE.set_uint32 data 0 (wm_delete_window :> int32);
            Cstruct.LE.set_uint32 data 4 0l;
            X11.Window.send_client_message x11 window ~fmt:32 ~propagate:false ~event_mask:0l ~ty:wm_protocols data;
          ) else (
            X11.Window.destroy x11 window
          )
        )
  end

Wayland defaults to using client-side decorations (where the application draws its own window decorations). X doesn't do that, so we need to turn it off (if the Wayland compositor supports the decoration manager extension):

t.decor_mgr |> Option.iter (fun decor_mgr ->
    let decor = Xdg_decor_mgr.get_toplevel_decoration decor_mgr ~toplevel @@ object
        inherit [_] Xdg_decoration.v1
        method on_configure _ ~mode:_ = ()
      end
    in
    Xdg_decoration.set_mode decor ~mode:Xdg_decoration.Mode.Server_side
  )

Dialog boxes are more of a problem. Wayland requires every dialog box to have a parent window, but X11 doesn't. To handle that, the proxy tracks the last window the user interacted with and uses that as a fallback parent if an X11 window with type _NET_WM_WINDOW_TYPE_DIALOG is created without setting WM_TRANSIENT_FOR. That could be a problem if the application closes that window, but it seems to work.
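The fallback rule might look something like this. It is a sketch under assumptions: choose_parent and its arguments are illustrative names, and window IDs are modelled as plain int32 values.

```ocaml
type window = int32  (* stand-in for an X11 window ID *)

(* Pick the Wayland parent for a newly mapped X11 window.
   [transient_for] is the WM_TRANSIENT_FOR property, if set;
   [last_active] is the window the user most recently interacted with;
   [is_dialog] says whether the type is _NET_WM_WINDOW_TYPE_DIALOG. *)
let choose_parent ~is_dialog ~transient_for ~last_active : window option =
  match transient_for with
  | Some _ -> transient_for                           (* the application told us *)
  | None -> if is_dialog then last_active else None   (* guess, for dialogs only *)
```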

Performance

I noticed a strange problem: scrolling around in GVim had long pauses once a second or so, corresponding to OCaml GC runs. This was surprising, as OCaml has a fast incremental garbage collector, which is not normally a problem for interactive programs. Besides, I'd been using the proxy with the (Wayland) Firefox and xfce4-terminal applications for 6 months without any similar problem.

Using perf showed that Linux was spending a huge amount of time in release_pages. The problem is that Xwayland was sharing lots of short-lived memory pools with the proxy. Each time it shares a pool, we have to ask the VM host for a chunk of memory of the same size. We map both pools into our address space and then copy each frame across (this is needed because we can't export guest memory to the host).

Normally, an application shares a single pool and just refers to regions within it, so we just map once at startup and unmap at exit. But Xwayland was creating, sharing and discarding around 100 pools per second while scrolling in GVim! Because these pools take up a lot of RAM, OCaml was (correctly) running the GC very fast, freeing them in batches of 100 or so each second.

First, I tried adding a cache of host memory, but that only solved half the problem: freeing the client pool was still slow.

Another option is to unmap the pools as soon as we get the destroy message, to spread the work out. Annoyingly, OCaml's standard library doesn't let you free memory-mapped memory explicitly (see the Add BigArray.Genarray.free PR for the current status), but adding this myself with a bit of C code would have been easy enough. We only touch the memory in one place (for the copy), so manually checking it hadn't been freed would have been pretty safe.

Then I noticed something interesting about the repeated log entries, which mostly looked like this:

-> wl_shm@4.create_pool id:+26 fd:(fd) size:8368360
-> wl_shm_pool@26.create_buffer id:+28 offset:0 width:2090 height:1001 stride:8360 format:1
-> wl_shm_pool@26.destroy 
<- wl_display@1.delete_id id:26
-> wl_buffer@28.destroy 
<- wl_display@1.delete_id id:28

Xwayland creates a pool, allocates a buffer within it, destroys the pool (so it can't create more buffers), and then deletes the buffer. But it never uses the buffer for anything!

So the solution was simple: I just made the host buffer allocation and the mapping operations lazy. We force the mapping if a pool's buffer is ever attached to a surface, but if not we just close the FD and forget about it. It would be more efficient if Xwayland only shared the pools when needed, though.
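The idea of deferring the expensive work can be illustrated with OCaml's built-in lazy values. This is a toy model, not the proxy's code: Bytes.create stands in for allocating host memory and mapping both pools, and the counter exists only to show how often that happens.

```ocaml
(* Count how many (expensive) mappings actually happen, for illustration. *)
let mappings = ref 0

type pool = { size : int; mapping : Bytes.t Lazy.t }

(* Creating a pool only records how to map it; nothing is allocated yet. *)
let create_pool size =
  { size; mapping = lazy (incr mappings; Bytes.create size) }

(* Attaching one of the pool's buffers to a surface forces the mapping. *)
let attach pool = ignore (Lazy.force pool.mapping)

(* True if the mapping was ever forced, i.e. there is something to unmap. *)
let was_mapped pool = Lazy.is_val pool.mapping
```

A pool that is created and destroyed without ever being attached never runs the body of the lazy value, so it costs almost nothing.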

Pointer events

Wayland delivers pointer events relative to a surface, so we simply forward these on to Xwayland unmodified and everything just works.

I'm kidding - this was the hardest bit! When Xwayland gets a pointer event on a window, it doesn't send it directly to that window. Instead, it converts the location to screen coordinates and then pushes the event through the old X event handling mechanism, which looks at the X11 window stack to decide where to send it.

However, the X11 window stack (which we saw earlier with xwininfo -tree -root) doesn't correspond to the Wayland window layout at all. In fact, Wayland doesn't provide us any way to know where our windows are, or how they are stacked.

Sway seems to handle this via a backdoor: X11 applications do get access to location information even though native Wayland clients don't. This is one of the reasons I want to get X11 support out of the compositor - I want to make sure X11 apps don't have any special access. Sommelier has a solution though: when the pointer enters a window we raise it to the top of the X11 stack. Since it's the topmost window, it will get the events.

Unfortunately, the raise request goes over the X11 connection while the pointer events go over the Wayland one. We need to make sure that they arrive in the right order. If the computer is running normally, this isn't much of a problem, but if it's swapping or otherwise struggling it could result in events going to the wrong place (I temporarily added a 2-second delay to test this). This is what I ended up with:

Get a Wayland pointer enter event from Sway.
Pause event delivery from Sway.
Flush any pending Wayland events we previously sent to Xwayland by doing a round-trip on the Wayland connection.
Send a raise on the X11 connection.
Do a round-trip on the X11 connection to ensure the raise has completed.
Forward the enter event on the Wayland connection.
Unpause the event stream from Sway.

At first I tried queuing up just the pointer events, but that doesn't work because e.g. keyboard events need to be synchronised with pointer events. Otherwise, if you e.g. Shift-click on something then the click gets delayed but the Shift doesn't and it can do the wrong thing. Also, Xwayland might ask Sway to destroy the window while we're entering it, and Sway might confirm the deletion. Pausing the whole event stream from Sway fixes all these problems.
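To show why pausing the whole stream preserves ordering, here is a small synchronous model of the sequence above. It is not how the proxy is written (that uses Lwt and real sockets): the two round-trips are passed in as plain functions, events are strings, and all the names are invented for the example.

```ocaml
type proxy = {
  mutable paused : bool;
  held : string Queue.t;       (* compositor events received while paused *)
  mutable log : string list;   (* everything sent to Xwayland, newest first *)
}

let create () = { paused = false; held = Queue.create (); log = [] }

let emit t what = t.log <- what :: t.log

(* Any event from the compositor: held back while we are paused. *)
let from_compositor t ev =
  if t.paused then Queue.add ev t.held else emit t ("wayland: " ^ ev)

(* The pointer-enter sequence from the list above. *)
let on_pointer_enter t ~wayland_roundtrip ~x11_roundtrip window =
  t.paused <- true;
  wayland_roundtrip ();                  (* flush earlier Wayland events *)
  emit t ("x11: raise " ^ window);
  x11_roundtrip ();                      (* make sure the raise has completed *)
  emit t ("wayland: enter " ^ window);
  t.paused <- false;
  Queue.iter (fun ev -> emit t ("wayland: " ^ ev)) t.held;
  Queue.clear t.held
```

If a Shift key press and a button press arrive from the compositor while the round-trips are in progress, both are delivered after the enter event and in their original order.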

The next problem was how to do the two round-trips. For X11 we just send an Intern request after the raise and wait to get a reply to that. Wayland provides the wl_display.sync method to clients, but we're acting as a Wayland server to Xwayland, not a client. I remembered that Wayland's xdg-shell extension provides a ping from the server to the client (the compositor can use this to detect when an application is not responding). Unfortunately, Xwayland has no reason to use this extension because it doesn't deal with window roles. Luckily, it uses it anyway (it does need it for non-rootless mode and doesn't bother to check).

wl_display.sync works by creating a fresh callback object, but xdg-shell's ping just sends a pong event to a fixed object, so we also need a queue to keep track of pings in flight so we don't get confused between our pings and any pings we're relaying for Sway. Also, xdg-shell's ping requires a serial number and we don't have one. But since Xwayland is the only app this needs to support, and it doesn't look at that, I cheat and just send zero.
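A queue of pings in flight could be modelled roughly like this. The names are hypothetical and the callback stands in for whatever resumes the waiting code; it relies only on pongs coming back in the same order as the pings were sent.

```ocaml
type ping =
  | Ours of (unit -> unit)   (* our own round-trip: run this when the pong arrives *)
  | Relayed of int32         (* a ping from the compositor, with its serial *)

let in_flight : ping Queue.t = Queue.create ()

(* Record a ping we are about to send to Xwayland. *)
let sent ping = Queue.add ping in_flight

(* Handle a pong from Xwayland. The head of the queue tells us what it
   answers. [reply_to_compositor] is a hypothetical function that forwards
   the pong for a relayed ping. *)
let on_pong ~reply_to_compositor () =
  match Queue.take_opt in_flight with
  | None -> ()                                   (* unexpected pong: ignore *)
  | Some (Ours resume) -> resume ()
  | Some (Relayed serial) -> reply_to_compositor serial
```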

And that's how to get pointer events to go to the right window with Xwayland.

Keyboard events

A very similar problem exists with the keyboard. When Wayland says the focus has entered a window we need to send a SetInputFocus over the X11 connection and then send the keyboard events over the Wayland one, requiring another two round-trips to synchronise the two connections.

Pointer cursor

Some applications set their own pointer shape, which works fine. But others rely on the default and for some reason you get no cursor at all in that case. To fix it, you need to set a cursor on the root window, which applications will then inherit by default. Unlike Wayland, where every application provides its own cursor bitmaps, X very sensibly provides a standard set of cursors, in a font called cursor (this is why I had to implement OpenFont). As cursors have two colours and a mask, each cursor is two glyphs: even numbered glyphs are the image and the following glyph is its mask:

(* Load the default cursor image *)
let* cursor_font = X11.Font.open_font x11 "cursor" in
let* default_cursor = X11.Font.create_glyph_cursor x11
    ~source_font:cursor_font ~mask_font:cursor_font
    ~source_char:68 ~mask_char:69
    ~bg:(0xffff, 0xffff, 0xffff)
    ~fg:(0, 0, 0)
in
X11.Window.create_attributes ~cursor:default_cursor ()
|> X11.Window.change_attributes x11 root

Selections

The next job was to get copying text between X and Wayland working.

In X11:

When you select something, the application takes ownership of the PRIMARY selection.
When you click the middle button or press Shift-Insert, the application requests PRIMARY.
When you press Ctrl-C, the application takes ownership of the CLIPBOARD selection.
When you press Ctrl-V it requests CLIPBOARD.

It's quite neat that adding support for a Windows-style clipboard didn't require changing the X server at all. Good forward-thinking design there.

In Wayland, things are not so simple. I have so far found no fewer than four separate Wayland protocols for copying text:

gtk_primary_selection supports copying the primary selection, but not the clipboard.
wp_primary_selection_unstable_v1 is identical to gtk_primary_selection except that it renames everything.
wl_data_device_manager supports clipboard transfers but not the primary selection.
zwlr_data_control_manager_v1 supports both, but it's for a "privileged client" to be a clipboard manager.

gtk_primary_selection and wl_data_device_manager both say they're stable, while the other two are unstable. However, Sway dropped support for gtk_primary_selection a while ago, breaking many applications (luckily, I had a handy Wayland proxy and was able to add some adaptor code to route gtk_primary_selection messages to the new "unstable" protocol).

For this project, I went with wp_primary_selection_unstable_v1 and wl_data_device_manager. On the Wayland side, everything has to be written twice for the two protocols, which are almost-but-not-quite the same. In particular, wl_data_device_manager also has a load of drag-and-drop stuff you need to ignore.

For each selection (PRIMARY or CLIPBOARD), we can be in one of two states:

An X11 client owns the selection (and we own the Wayland selection).
A Wayland client owns the selection (and we own the X11 selection).

When we own a selection we proxy requests for it to the matching selection on the other protocol.

At startup, we take ownership of the X11 selection, since there are no X11 apps running yet.
When we lose the X11 selection it means that an X11 client now owns it and we take the Wayland selection.
When we lose the Wayland selection it means that a Wayland client now owns it and we take the X11 selection.
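The ownership rules above amount to a two-state machine per selection. A minimal sketch, with type and function names chosen for illustration only:

```ocaml
type owner =
  | X11_client      (* an X11 app owns it; we own the Wayland selection *)
  | Wayland_client  (* a Wayland app owns it; we own the X11 selection *)

type event = Lost_x11 | Lost_wayland

(* No X11 apps are running at startup, so we begin by owning the X11 side. *)
let initial = Wayland_client

(* Losing one of our selections tells us who the real owner now is. *)
let next = function
  | Lost_x11 -> X11_client
  | Lost_wayland -> Wayland_client

(* Which selection we must claim on entering a state. *)
let selection_to_take = function
  | X11_client -> `Wayland
  | Wayland_client -> `X11
```

One instance of this state would be kept for PRIMARY and another for CLIPBOARD.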

One good thing about the Wayland protocols is that you send the data by writing it to a normal Unix pipe. For X11, we need to write the data to a property on the requesting application's window and then notify it about the data. And we may need to split it into multiple chunks if there's a lot of data to transfer.
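Splitting a large transfer into pieces is simple in itself; a sketch is below. The chunks function and the max_size parameter are illustrative, and the X11 side of an incremental transfer (writing each piece to the property and waiting for the requester to consume it) is not shown.

```ocaml
(* Split [data] into pieces of at most [max_size] bytes, in order.
   Assumes [max_size] is positive. *)
let chunks ~max_size data =
  let len = String.length data in
  let rec aux off acc =
    if off >= len then List.rev acc
    else
      let n = min max_size (len - off) in
      aux (off + n) (String.sub data off n :: acc)
  in
  aux 0 []
```

For instance, chunks ~max_size:4 "hello world" gives ["hell"; "o wo"; "rld"].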

A strange problem I had was that, while pasting into GVim worked fine, xterm would segfault shortly after trying to paste into it. This turned out to be a bug in the way I was sending the notifications. If an X11 application requests the special TEXT target, it means that the sender should choose the exact format. You write the property with the chosen type (e.g. UTF8_STRING), but you must still send the notification with the target TEXT. xterm is a C application (thankfully no longer set-uid!) and seems to have a use-after-free bug in the timeout code.
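The distinction between the property type and the notification target can be written down as a small function. This is a sketch: atoms are shown as strings for readability (real code would use interned IDs), and the record and function names are made up.

```ocaml
(* What we send in answer to a selection request. *)
type reply = {
  property_type : string;   (* type of the property holding the data *)
  notify_target : string;   (* target named in the notification event *)
}

(* For the special TEXT target the sender picks the concrete format, but
   the notification must still name the target that was requested. *)
let reply_for ~requested_target =
  match requested_target with
  | "TEXT" -> { property_type = "UTF8_STRING"; notify_target = "TEXT" }
  | t -> { property_type = t; notify_target = t }
```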

Drag-and-drop

Sadly, I wasn't able to get this working at all. X itself doesn't know anything about drag-and-drop and instead applications look at the window tree to decide where the user dropped things. This doesn't work with the proxy, because Wayland doesn't tell us where the windows really are on the screen.

Even without any VMs or proxies, drag-and-drop from X applications to Wayland ones doesn't work, because the X app can't see the Wayland window and the drop lands on the X window below (if any).

Bonus features

In the last post, I mentioned several other problems, which have also now been solved by the proxy:

HiDPI works

Wayland's support for high resolution screens is a bit strange. I would have thought that applications really only need to know two things:

The size in pixels of the window.
The size in pixels you want some standard thing (e.g. a normal-sized letter M).

Some systems instead provide the size of the window and the DPI (dots-per-inch), but this doesn't work well. For example, a mobile phone might be high DPI but still want small text because you hold it close to your face, while a display board will have very low DPI but want large text.

Wayland instead redefines the idea of pixel to be a group of pixels corresponding to a single pixel on a typical 1990s display. So if you set your scale factor to 2 then 1 Wayland pixel is a 2x2 grid of physical pixels. If you have a 1000x1000 pixel window, Wayland will tell the application it is 500x500 but suggest a scale factor of 2. If the application supports HiDPI mode, it will double all the numbers and render a 1000x1000 image and things work correctly. If not, it will render a 500x500 pixel image and the compositor will scale it up.
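The arithmetic in that example is just integer scaling in both directions. As a small sketch (the function names are illustrative):

```ocaml
(* Size the compositor reports for a window covering [w] x [h] physical pixels. *)
let logical_size ~scale (w, h) = (w / scale, h / scale)

(* Buffer size a HiDPI-aware client should render for a given logical size. *)
let buffer_size ~scale (w, h) = (w * scale, h * scale)
```

With a scale factor of 2, a 1000x1000 physical window is reported as 500x500, and a HiDPI-aware client renders a 1000x1000 buffer for it.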

Since Xwayland doesn't support this, it just draws everything too small and Sway scales it up, creating a blurry and unusable mess. This might be made worse by subpixel rendering, which doesn't cope well with being scaled.

With the proxy, the solution is simple enough: when talking to Xwayland we just scale everything back up to the real dimensions, scaling all coordinates as we relay them:

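(* Convert Wayland coordinates to the real pixel coordinates used by Xwayland. *)
let scale_to_client t (x, y) =
  x * t.config.xunscale,
  y * t.config.xunscale

(* The compositor has chosen a new window size: pass it on to X11, scaled up. *)
method on_configure _ ~width ~height ~states:_ =
  let width = Int32.to_int width in
  let height = Int32.to_int height in
  (* A zero size means the compositor has no preference, so leave the window alone. *)
  if width > 0 && height > 0 then (
    Lwt.async (fun () ->
        let (width, height) = scale_to_client t (width, height) in
        X11.Window.configure x11 window ~width ~height ~border_width:0
      )
  )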

This tends to make things sharp but too small. Fortunately, X applications already have their own ways to handle high-resolution screens. For example, you can set Xft.dpi to make all the fonts bigger. I run the proxy like this, which works for me:

wayland-proxy-virtwl --x-display=0 --xrdb Xft.dpi:150 --x-unscale=2

However, there is a problem. The Wayland specification says:

The new size of the surface is calculated based on the buffer size transformed by the inverse buffer_transform and the inverse buffer_scale. This means that at commit time the supplied buffer size must be an integer multiple of the buffer_scale. If that's not the case, an invalid_size error is sent.

Let's say we have an X11 image viewer that wants to show a 1001-pixel-high image in a 1001-pixel-high window. This isn't allowed by the spec, which can only handle even-sized windows when the scale factor is 2. Regular Wayland applications already have to deal with that somehow, but for X11 applications it becomes our problem.

I tried rounding down, but that has a bad side-effect: if GTK asks for a 1001-pixel-high menu and gets a 1000-pixel allocation, it switches to squashed mode and draws two big bumper arrows at the top and bottom of the menu, which you must use to scroll it. It looks very silly.

I also tried rounding up, but tooltips look bad with any rounding. Either one border is missing, or it's double thickness. Luckily, it seems that Sway doesn't actually enforce the rule about surfaces being a multiple of the scale factor. So, I just let the application attach a buffer of whatever size it likes to the surface and it seems to work!
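For reference, here is a minimal sketch of the rule from the specification and of the two rounding strategies. The function names are invented for this example, and the rounding helpers assume non-negative sizes:

```ocaml
(* True if a buffer of this size satisfies the rule quoted from the spec. *)
let valid_buffer_size ~scale (w, h) = w mod scale = 0 && h mod scale = 0

(* Round [n] down or up to a multiple of [scale] (assumes n >= 0). *)
let round_down ~scale n = n - (n mod scale)
let round_up ~scale n = round_down ~scale (n + scale - 1)

let () =
  (* The 1001-pixel-high window is not valid at scale 2... *)
  assert (not (valid_buffer_size ~scale:2 (1001, 1001)));
  (* ...and either fix changes the size the application asked for. *)
  assert (round_down ~scale:2 1001 = 1000);
  assert (round_up ~scale:2 1001 = 1002)
```

Either way the application ends up with a window one pixel away from what it asked for, which is what produces the squashed menus and the missing or doubled tooltip borders.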

The only problem I had was that when using unscaling, the mouse pointer in GVim would get lost. Vim hides it when you start typing, but it's supposed to come back when you move the mouse. The problem seems to be that it hides it by creating a 1x1 pixel cursor. Sway decides this isn't worth showing (maybe because it's 0x0 in Wayland-pixels?), and sends Xwayland a leave event saying the cursor is no longer on the screen. Then when Vim sets the cursor back, Xwayland doesn't bother updating it, since it's not on screen!

The solution was to stop applying unscaling to cursors. They look better doubled in size, anyway. True, this does mean that the sharpness of the cursor changes as you move between windows, but you're unlikely to notice this due to the far more jarring effect of Wayland cursors also changing size and shape at the same time.

Ring-buffer logging

Even without a proxy to complicate things, Wayland applications often have problems. To make investigating this easier, I added a ring-buffer log feature. When on, the proxy keeps the last 512K or so of log messages in memory, and will dump them out on demand.
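The idea behind a ring-buffer log can be sketched as follows. This is a simplified illustration rather than the proxy's actual code; the `ring` type and the `create`, `add` and `dump` functions are names chosen for this example, and `create` assumes a size greater than zero.

```ocaml
(* A fixed-size ring buffer that keeps only the most recent log text. *)
type ring = {
  buf : Bytes.t;
  mutable pos : int;       (* next write position *)
  mutable wrapped : bool;  (* true once old data has been overwritten *)
}

let create size = { buf = Bytes.create size; pos = 0; wrapped = false }

(* Append [msg], overwriting the oldest bytes once the buffer is full. *)
let add t msg =
  String.iter (fun c ->
      Bytes.set t.buf t.pos c;
      t.pos <- t.pos + 1;
      if t.pos = Bytes.length t.buf then (t.pos <- 0; t.wrapped <- true)
    ) msg

(* Return the retained text, oldest first. *)
let dump t =
  let newest = Bytes.sub_string t.buf 0 t.pos in
  if t.wrapped then
    Bytes.sub_string t.buf t.pos (Bytes.length t.buf - t.pos) ^ newest
  else newest

let () =
  let t = create 8 in
  add t "hello ";
  add t "world";
  (* Only the last 8 bytes survive. *)
  assert (dump t = "lo world")
```

Logging this way costs almost nothing while things are working, since nothing is written to disk until a dump is requested.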

To use it, you run the proxy with e.g. -v --log-ring-path ~/wayland.log. When something odd happens (e.g. an application crashes, or opens its menus in the wrong place) you can dump out the ring buffer and see what just happened with:

echo dump-log > /run/user/1000/wayland-1-ctl

I also added some filtering options (e.g. --log-suppress motion,shm) to suppress certain classes of noisy messages.

Vim windows open correctly

One annoyance with Sway is that Vim's window always appears blank (even when running on the host, without any proxy). You have to resize it before you can see the text.

My proxy initially suffered from the same problem, although only intermittently. It turned out to be because Vim sends a ConfigureRequest with its desired size and then waits for the confirmation message. Since Sway is a tiling window manager, it ignores the new size and no event is generated. In this case, an X11 window manager is supposed to send a synthetic ConfigureNotify, so I just got the proxy to do that and the problem disappeared (I confirmed this by adding a sleep to Vim's gui_mch_update).
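The decision the window manager has to make can be sketched like this. The `geometry` and `response` types and the `respond` function are hypothetical, introduced only to show the logic; the proxy's real X11 bindings look different.

```ocaml
(* Hypothetical types; the real proxy's X11 bindings differ. *)
type geometry = { x : int; y : int; width : int; height : int }

type response =
  | Real_configure of geometry    (* apply the change; the X server then notifies the client *)
  | Synthetic_notify of geometry  (* nothing changed, so we must notify the client ourselves *)

(* [granted] is the geometry the window manager decides to allow.
   A tiling compositor may ignore the client's request entirely,
   in which case [granted] is the same as [current]. *)
let respond ~current ~granted =
  if granted = current then Synthetic_notify current
  else Real_configure granted

let () =
  let g = { x = 0; y = 0; width = 800; height = 600 } in
  (* Request ignored: the client still needs to hear back. *)
  assert (respond ~current:g ~granted:g = Synthetic_notify g)
```

Without the `Synthetic_notify` case, a client that waits for confirmation of its request would never receive one.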

By the way, the GVim start-up code is quite interesting. The code path to opening the window goes through three separate functions which each define a static int recursive = 0 and then proceed to behave differently depending on how many times they've been reentered - see gui_init for an example!

Copy-and-paste without ^M characters

The other major annoyance with Sway is that copy-and-paste doesn't work correctly (Sway bug #1839). Using the proxy avoids that problem completely.

Conclusions

I'm not sure how I feel about this project. It ended up taking a lot longer than I expected, and I could probably have ported several X11 applications to Wayland in the same time. On the other hand, I now have working X support in the VMs with no need for ssh -Y from the host, plus support for HiDPI in Wayland, mouse cursors that are large enough to see easily, windows that open reliably, text pasting that works, and I can get logs whenever something misbehaves.

In fact, I'm now also running an instance of the proxy directly on the host to get the same benefits for host X11 applications. Setting this up is actually a bit tricky: you want to start Sway with DISPLAY=:0 so that every application it spawns knows it has an X11 display, but if you set that then Sway thinks you want it to run nested inside an X window provided by the proxy, which doesn't end well (or, indeed, at all).

Having all the legacy X11 support in a separate binary should make it much easier to write new Wayland compositors, which might be handy if I ever get some time to try that. It also avoids having many thousands of lines of legacy C code in the highly-trusted compositor code.

If Wayland had an official protocol for letting applications know the window layout then I could make drag-and-drop between X11 applications within the same VM work, but it still wouldn't work between VMs or to Wayland applications, so it's probably not worth it.

Having two separate connections to Xwayland creates a lot of unnecessary race conditions. A simple solution might be a Wayland extension that allows the Wayland server to say "please read N bytes from the X11 socket now", and likewise in the other direction. Then messages would always arrive in the order in which they were sent.

The code is all available at https://github.com/talex5/wayland-proxy-virtwl if you want to try it. It works with the applications I use when running under Sway, but will probably require some tweaking for other programs or compositors. Here's a screenshot of my desktop using it:

Screenshot of my desktop

The windows with [dev] in the title are from my Debian VM, while [com] is a SpectrumOS VM I use for email, etc. Gitk, GVim and ROX-Filer are X11 applications using Xwayland, while Firefox and xfce4-terminal are using plain Wayland proxying.