Wayland

The Wayland Protocol(s)

Every now and then I see people arguing about X vs Wayland. I don't particularly care if people prefer one or the other, but a lot of the things people say about Wayland are wrong, or at least give an incorrect impression through omission, confusing terminology, or otherwise. Since those impressions can cause concrete problems, this is my attempt to clarify things.

Wayland is a protocol that allows Wayland clients to communicate with a Wayland server. The protocol is defined in an XML file, wayland.xml, which lives in this git repository. The details of exactly how the requests and events are transmitted are usually called the "wire protocol", which we don't really care about for the purposes of this article; it's just an implementation detail.
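To make "the protocol is an XML file" concrete, here is a minimal sketch that lists the requests and events of one interface. The embedded fragment is an abridged excerpt modelled on the wl_display interface, with descriptions and enums left out; the summarize helper is just an illustrative name, not part of any Wayland tooling.

```python
import xml.etree.ElementTree as ET

# Abridged fragment modelled on wl_display from wayland.xml.
# Descriptions and enums are omitted; the real file is the reference.
PROTOCOL_XML = """
<protocol name="wayland">
  <interface name="wl_display" version="1">
    <request name="sync">
      <arg name="callback" type="new_id" interface="wl_callback"/>
    </request>
    <request name="get_registry">
      <arg name="registry" type="new_id" interface="wl_registry"/>
    </request>
    <event name="error">
      <arg name="object_id" type="object"/>
      <arg name="code" type="uint"/>
      <arg name="message" type="string"/>
    </event>
    <event name="delete_id">
      <arg name="id" type="uint"/>
    </event>
  </interface>
</protocol>
"""


def summarize(xml_text):
    """Map each interface name to the names of its requests and events."""
    root = ET.fromstring(xml_text)
    summary = {}
    for iface in root.findall("interface"):
        summary[iface.get("name")] = {
            "requests": [r.get("name") for r in iface.findall("request")],
            "events": [e.get("name") for e in iface.findall("event")],
        }
    return summary


for name, parts in summarize(PROTOCOL_XML).items():
    print(name, parts)
```

Requests go from client to server and events go from server to client. Other protocols, such as xdg-shell, are described in XML files with the same structure.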

If you look at this file closely, you might notice that there's not very much in it. For example, it doesn't say anything about how you create a window. You need an extension to do this: another protocol called xdg-shell, from the wayland-protocols package, which lives in this git repository. This protocol lets you do most of the "normal GUI app" type of things, but it's just another XML file. Every compositor implements this protocol; it would be pretty much completely useless otherwise.

I've seen people say "Wayland needs an extension to do x/y/z, which is such a basic thing, how awful". I don't think this really means anything. It's just taking the connotations of the word "extension" out of the context where it acquired those connotations. Here it just means a protocol that isn't wayland.xml, nothing more. For example, it doesn't mean anything to say that the xdg-decoration protocol is "just an extension", any more than it means anything to say that the X Keyboard Extension (XKB) is "just an extension".

However, some people use "technically Wayland has nothing to do with x/y/z" as an excuse to design software in a way that I'm not convinced makes sense.

Compositing

A Wayland compositor is not the same thing as an X compositor. At a basic level, compositing means combining buffers to put them onto the screen, but it also has some other meanings. "Wayland always does compositing" also means that compositors reserve the right to edit client buffers before putting them on the screen, and there's no way for clients to avoid that possibility. The easiest way to composite buffers is to copy them into a larger buffer, then display that on the screen.

Notably, compositing doesn't mean the same thing, or have the same implications, as being forced to run an X compositor constantly. On X, you might run a compositor to add shadows or other visual effects, or to avoid screen tearing (though I don't think that's always necessary). There are downsides to running a compositor with X, namely that it increases latency.

However, Wayland's design allows you to reduce the latency associated with avoiding screen tearing to an arbitrarily short delay. With a naive vsync implementation, if a client takes 3ms to render its output, and your monitor displays frames every 16ms, then you'll be displaying a 13ms old frame by the time you put it on the screen. Wayland compositors send frame callbacks, which means you can delay the 3ms of rendering until just before the frame is needed, minimising latency. For example, if the compositor sends a frame callback to the client 4ms before putting the frame on the screen, the frame will be only 1ms old when it's displayed.
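The latency arithmetic can be written out as a small sketch. The function name and the "missed deadline" handling are illustrative assumptions; the numbers are the ones used in the text.

```python
def frame_age_ms(render_ms, start_before_scanout_ms):
    """Age of a frame at scanout, given how long before scanout rendering began.

    Returns None if rendering would not finish in time (the deadline is missed).
    """
    if render_ms > start_before_scanout_ms:
        return None
    return start_before_scanout_ms - render_ms


REFRESH_MS = 16  # monitor displays a frame every 16 ms
RENDER_MS = 3    # client needs 3 ms to render

# Naive vsync: rendering starts a full refresh period before scanout.
naive_age = frame_age_ms(RENDER_MS, REFRESH_MS)

# Frame callback sent 4 ms before scanout: rendering starts much later.
callback_age = frame_age_ms(RENDER_MS, 4)

print(naive_age, callback_age)  # 13 1
```

The closer the callback is to the deadline, the fresher the frame, but a callback that arrives less than 3ms before scanout would leave this client too little time to render.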

Some people also treat the word "compositing" as the opposite of "tearing". Screen tearing is possible with the tearing-control protocol. Clients can request "asynchronous presentation", meaning tearing is possible if you submit a buffer during display scanout. Reading through the protocol we can see that "The compositor is free to [...] ignore this hint", meaning that it can refuse the client's request. In general, the compositor always has this kind of veto power, so users can enable or disable this behaviour as they wish. I think this protocol wasn't created earlier because people thought that most of the reasons to permit tearing didn't apply (a position I also hold), but even if they don't, some people want to permit tearing in certain programs, so it was created to allow them to do so (which I also believe is a good thing).

Direct scanout is an important feature of compositors. Often, it's unnecessary to copy buffers. The simplest example is fullscreen windows. The compositor can simply send the buffer directly to the screen. This idea can be extended to more complicated cases with more work and more code, which you can research if you're interested. ("dmabuf feedback" and "KMS planes" are useful keywords.) These optimisations are important because they mean that the compositor is only doing hard work when it's actually necessary. In many cases the compositor will take shortcuts, even if it has the capability to do complicated things. When this includes "skipping the compositing step entirely", complaining about "forced compositing" seems to miss the point.
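As a rough sketch of the fullscreen case only, a compositor's decision might look something like the following. The Surface type and the check are hypothetical simplifications; a real compositor also has to consider buffer formats, modifiers, and what the display hardware accepts.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    """Hypothetical, simplified description of a visible surface."""
    x: int
    y: int
    width: int
    height: int
    opaque: bool


def can_scan_out_directly(surfaces, output_width, output_height):
    """True if a single opaque surface exactly covers the output.

    In that case the client's buffer can be shown as-is, with no copy.
    """
    if len(surfaces) != 1:
        return False
    s = surfaces[0]
    return (
        s.opaque
        and (s.x, s.y) == (0, 0)
        and (s.width, s.height) == (output_width, output_height)
    )


fullscreen_game = [Surface(0, 0, 1920, 1080, opaque=True)]
desktop = fullscreen_game + [Surface(100, 100, 640, 480, opaque=True)]

print(can_scan_out_directly(fullscreen_game, 1920, 1080))  # True
print(can_scan_out_directly(desktop, 1920, 1080))          # False
```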

Security

If a program is running as your user, it can do anything your user has permission to do. If this includes editing your compositor config and reloading the compositor, then it doesn't really make sense to have a line saying "don't let this program take screenshots", since it has permission to edit that line anyway. If you want to run software securely, you have to use some kind of sandboxing mechanism. Wayland isn't automatically secure, but it can be secured, since the protocols don't rely on potentially security-sensitive operations to function. The security-context protocol defines a way for compositors to restrict clients based on information received from a sandboxing engine, like Flatpak.

The misconception that I'm addressing here is the idea that protocols to take screenshots or record the screen somehow defeat the security of Wayland, because they allow arbitrary clients to read pixel data. This simply isn't true, because access to these protocols can be restricted with a sandboxing mechanism, which is necessary for a secure system in the first place.
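A minimal sketch of that idea: the compositor decides which globals to advertise depending on whether the client connected through a sandbox. The policy set and function name are hypothetical, and "org.flatpak" is only an example engine name. Real compositors choose their own policy.

```python
# Hypothetical policy: interfaces a compositor might hide from sandboxed clients.
PRIVILEGED_INTERFACES = {
    "zwlr_screencopy_manager_v1",
    "zwlr_layer_shell_v1",
}


def visible_globals(all_globals, sandbox_engine=None):
    """Return the globals to advertise to one client.

    sandbox_engine is None for a client that connected without a security
    context, and a string naming the sandbox engine otherwise.
    """
    if sandbox_engine is None:
        return list(all_globals)
    return [g for g in all_globals if g not in PRIVILEGED_INTERFACES]


ALL_GLOBALS = [
    "wl_compositor",
    "wl_shm",
    "xdg_wm_base",
    "zwlr_screencopy_manager_v1",
]

print(visible_globals(ALL_GLOBALS))
print(visible_globals(ALL_GLOBALS, sandbox_engine="org.flatpak"))
```

A sandboxed client in this sketch never sees the screen capture global, so it cannot bind it, while a trusted screenshot tool still can.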

Colour Management

There are two parts to colour management. The first is compositors supporting transforming the output for a monitor via an ICC profile, which is just a feature in a program, and doesn't require any new protocols. I think Weston is the only compositor that has this feature at the time of writing.

The second part of colour management is negotiating colour spaces with clients, which does require a protocol extension; one is being developed. If you're interested in tracking that work it shouldn't be hard to find out what's going on. HDR is also related to this sort of stuff.

Input Methods

In general, IMEs work by circumventing window systems. Ibus and Fcitx can both be used this way. If you look at a guide to set one of these up, you'll likely find instructions to set environment variables like GTK_IM_MODULE=ibus, which allows GTK apps to find and talk to Ibus. This works regardless of whether you're on X or Wayland.

However, this is a pretty bad way to solve the problem — what if your app doesn't use any kind of GUI library? There are two Wayland protocols, input-method, which is used by IMEs, and text-input, which is used by clients that receive IME input (i.e. anything you type into), which allow you to write in all the languages you'd expect. This is saner and more reliable, but less well supported.

Since the input-method protocol is reasonably simple, it's easy to write an IME compared to writing a big "IME framework" like Ibus, or even writing a plugin for an IME framework. Unsurprisingly, lots of people have written them. They're usually called something like wlpinyin, with "pinyin" substituted for whatever input method you want to use.

Networking

Waypipe is similar to ssh -X, but with Wayland.

In Wayland, not much data is transferred over the wire protocol. Instead of sending pixel data directly over the wire, you might pass a file descriptor as a handle to the buffer containing the pixels. To transparently run your client on another computer from the compositor, you can use a proxy, like Waypipe. From the author's blog: "File descriptors [...] are replaced by messages that tell the remote Waypipe instance to create a file descriptor with matching properties".
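The "pass a handle, not the pixels" idea can be shown with plain Unix sockets. This is a sketch of the general mechanism (descriptor passing over a Unix socket), not of how libwayland or Waypipe are implemented; the helper names and the tiny message are made up for illustration. It assumes Python 3.9+ on a Unix-like system.

```python
import mmap
import os
import socket
import tempfile


def send_buffer_handle(sock, pixels):
    """Store pixels in an anonymous file and send only its file descriptor."""
    f = tempfile.TemporaryFile()
    f.write(pixels)
    f.flush()
    # The message itself is three bytes; the pixel data is not copied into it.
    socket.send_fds(sock, [b"buf"], [f.fileno()])
    return f  # caller keeps the file open until it is done with the buffer


def receive_buffer(sock, size):
    """Receive a descriptor and read the buffer it refers to via mmap."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    with mmap.mmap(fds[0], size, access=mmap.ACCESS_READ) as view:
        data = bytes(view)
    os.close(fds[0])
    return msg, data


client, compositor = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pixels = bytes([255, 0, 0, 255]) * 4  # four red RGBA pixels

handle = send_buffer_handle(client, pixels)
msg, received = receive_buffer(compositor, len(pixels))

handle.close()
client.close()
compositor.close()
print(msg, received == pixels)
```

A descriptor only means something on the machine that created it, which is why a network proxy has to recreate an equivalent one on the other side.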

All of this is just software, and doesn't require any protocol extensions. That means you can add features without having to coordinate an ecosystem. For example, some clients, like text editors, will usually update only a small part of their window every frame, whereas others, like games, usually have frames that are completely different. Waypipe accommodates both of these use-cases well. Wayland's damage tracking infrastructure makes it easy to find out what part of the buffer has been updated, and only transfer the new pixels, minimising data transferred for small updates. You can also use generic compression like LZ4 to further reduce bandwidth requirements. For clients which constantly update the whole buffer, you can use a lossy video codec like H264 or AV1, and set a bitrate depending on available bandwidth.
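Here is a small sketch of the "only send what changed" approach. It assumes damage is reported as a range of rows and uses zlib from the standard library as a stand-in for LZ4; the function names are illustrative and this is not Waypipe's actual code.

```python
import zlib


def encode_update(new, stride, damage):
    """Compress only the damaged rows of a buffer.

    damage is (first_row, row_count), as reported by the client.
    """
    first_row, row_count = damage
    start = first_row * stride
    end = start + row_count * stride
    return first_row, zlib.compress(new[start:end])


def apply_update(old, update, stride):
    """Patch the receiver's copy of the buffer with a decoded update."""
    first_row, payload = update
    patch = zlib.decompress(payload)
    start = first_row * stride
    return old[:start] + patch + old[start + len(patch):]


STRIDE = 16                      # bytes per row in this toy buffer
old_frame = bytes(STRIDE * 8)    # 8 rows, all zero
new_frame = bytearray(old_frame)
new_frame[3 * STRIDE:4 * STRIDE] = b"\xff" * STRIDE  # only row 3 changes
new_frame = bytes(new_frame)

update = encode_update(new_frame, STRIDE, damage=(3, 1))
remote_frame = apply_update(old_frame, update, STRIDE)

print(remote_frame == new_frame, len(update[1]), "bytes sent for", len(new_frame))
```

For a text editor that changes one line, the payload stays small. For a game that redraws everything, the damage covers the whole buffer and a video codec becomes the better tool.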

Wayland is generic enough to be used over a network, and doesn't mandate that you implement this any particular way. The protocols say "what to do" instead of "how to do it", which means you can be flexible in your approach, and any bad design decisions you make in some software don't become critical pillars of legacy code that other applications rely on.

Global Keybinds

In Wayland, when keys are pressed, either it's a keybind used by the compositor, in which case the compositor consumes the key event, or it isn't, and the key event is forwarded to a client. Usually, only one client sees each keypress. This means that clients can't implement global keybinds by listening to all keypresses and waiting for one that they care about.

If you were implementing global keybinds as a Wayland protocol, which is perfectly possible, it would probably work like this:

This avoids the problem with "who should get this key event", which was the reason for how keypresses were handled in Wayland in the first place. Right now, this protocol doesn't exist. I think that's partially because people say it would be impossible, or automatically insecure, which isn't true. That discourages people, and because they're discouraged, nobody bothers. Obviously, a full specification would be more complicated and handle more edge cases like "what if two clients want to bind the same key" (sometimes that might be exactly what you wanted, so it shouldn't be disallowed outright), but the point is that it's possible and doesn't conflict with anything about how Wayland works fundamentally.

The mechanism of implementing global keybinds by listening to all keypresses is not available, but that doesn't mean the feature is fundamentally impossible to implement. Saying "Wayland won't ever do global keybinds" is misleading. Repeating this incorrect idea affects client developers, who instead of proposing and standardising a protocol, put "Wayland doesn't support this" into an issue tracker and leave it at that. Obviously, they have no obligation to do this, but the idea that things like this are something fundamental about Wayland, rather than "no one's written that yet", means that it's less likely that someone will write it.
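To show that compositor-mediated keybinds fit the routing model described in this section, here is a toy simulation. Everything in it is hypothetical: no such protocol exists, and the class, the register_keybind request, and the key names are invented for illustration.

```python
class Compositor:
    """Toy model of key routing with a hypothetical keybind request."""

    def __init__(self):
        self.bindings = {}   # key -> list of (client, action)
        self.focused = None  # client that receives ordinary key events

    def register_keybind(self, client, action, key):
        """Hypothetical request: a client asks to be told about an action."""
        self.bindings.setdefault(key, []).append((client, action))

    def key_pressed(self, key):
        """Return the (client, message) deliveries for one keypress."""
        if key in self.bindings:
            # Bound keys are consumed; the focused client never sees them.
            return [(c, ("action", a)) for c, a in self.bindings[key]]
        if self.focused is not None:
            return [(self.focused, ("key", key))]
        return []


comp = Compositor()
comp.focused = "browser"
comp.register_keybind("voice-chat", "push-to-talk", "BTN_SIDE")

print(comp.key_pressed("BTN_SIDE"))  # only voice-chat is notified
print(comp.key_pressed("KEY_A"))     # ordinary key goes to the browser
```

The voice chat client never sees raw keypresses; it only learns that its action fired. Two clients binding the same key both get notified in this sketch, which is one possible answer to that edge case.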

On a personal note, one problem I've always had is that I'd try to bind a key on my mouse to push-to-talk in voice chat apps, but that key was also passed to clients, so when I'd go to speak, I'd also accidentally tell my browser to go to the previous tab. Being able to bind keys in a way that doesn't pass the key to other clients would be nice. The opposite would also be useful, but I don't feel any need to explain that since it's already the de-facto default.

Notable Protocols

Many of these protocols are notable because they do things people commonly claim(ed) are impossible, or that you can't do directly using Wayland. Unfortunately, some of the people making those claims develop Wayland software, and are misleading in ways that are only obvious if you already know quite a lot about Wayland, which is unlikely for any random person. Why wouldn't you believe someone who works on a compositor when they say "Wayland will never support server-side decorations"? Why wouldn't you think that when they said "Wayland" they actually meant Wayland in general instead of GNOME's compositor, Mutter? Developers have a responsibility to communicate clearly, and with good intentions.

It's hard for me to believe that these kinds of claims are made in good faith, when they're by people who are ostensibly experts on the subject. Even if they are in good faith, then they still have a responsibility to be accurate. The consequences of being misleading by accident are the same as doing it on purpose.

It's hard for me to believe that when someone says "technically Wayland has nothing to do with screensharing", that they would also say "technically Wayland has nothing to do with opening windows". It's hard for me to believe that they would use the second statement (which is equally correct — and equally misleading and pedantic), to justify using a side-channel like Pipewire to implement an equivalent of the xdg-shell protocol extension. Wayland has nothing in particular to do with anything — it's just IPC. Acting as if using that IPC, to get data about Wayland surfaces from a Wayland compositor into a Wayland client over a Wayland protocol, is somehow using Wayland for something it "technically has nothing to do with" is completely ridiculous.

xdg-decoration

This protocol was originally developed by KDE, and a later, slightly tweaked version of it was included in the xdg- namespace in wayland-protocols. "Decorations" refers to things like window borders and titlebars. Client-side decorations (CSD) are drawn by the client, and server-side decorations (SSD) are drawn by the compositor. This protocol allows the client and server to negotiate who will draw the decorations. CSD vs SSD has historically been a source of a lot of controversy, but this is essentially a solved problem now, as far as Wayland is concerned. That doesn't mean every compositor does something reasonable, but that's their problem, not Wayland's.

cursor-shape

This protocol was added to wayland-protocols in February 2023. There were some problems with cursor themes in the past, which led to some apps not supporting cursor themes at all. However, the compositor always knows the cursor theme, because it needs to draw a cursor over its own surfaces. This protocol allows clients to tell the compositor "use the 'grabbing hand' cursor" and the compositor will display that cursor. This means instead of passing around buffers containing cursor images, clients can just set the cursor shape by name. Considering how simple this solution is, the heatedness of arguments about this topic doesn't seem appropriate.

wlr-layer-shell

This protocol was developed in early 2018, so it has been available for 5 years at time of writing. It can be used to build desktop components, like taskbars, lock screens and screensavers (combined with the input-inhibitor protocol), wallpaper programs, overlays, application launchers, notification popups, and more. All compositors that want people to be able to use these kinds of apps have implemented this protocol.

ext-session-lock

This protocol was submitted to wayland-protocols in December 2021, and merged about a month later. It can also be used to make screensavers, but it differs from the previous wlr-layer-shell plus input-inhibitor approach by specifying that the screen stays locked if the client dies. This means it's not a security bug if your program crashes (though it would be better if it didn't).

ext-screencopy

This protocol has been submitted to wayland-protocols, but not yet merged. It lets you record the screen and take screenshots. It supports damage tracking for video and allows capturing specific application windows. It's based on the wlr-screencopy protocol, which has been available since 2018, and implemented in clients like wf-recorder, and in OBS through the wlrobs plugin (available since early 2019). The ext- version of the protocol is an improved version of the wlr- one, since the ext- namespace didn't exist when it was originally written.