Linux provides the KMS (Kernel Mode Setting) API to let applications query and configure display settings. It's used by Wayland compositors and other programs that need to configure the hardware directly. I found the C API a little verbose and hard to follow, so I made libdrm-ocaml, which lets us run commands interactively in a REPL.
We'll start by discovering what hardware is available and how it's currently configured, then configure a monitor to display a simple bitmap, and then finally render a 3D animation. The post should be a useful introduction to KMS even if you don't know OCaml.
(This post also appeared on Hacker News.)
Table of Contents
- Running it yourself
- Querying the current state
- Making changes
- 3D rendering
- Linux VTs
- Debugging
- Conclusions
Running it yourself
If you want to follow along, you'll need to install libdrm-ocaml and an interactive REPL like utop. With Nix, you can set everything up like this:
git clone https://github.com/talex5/libdrm-ocaml
cd libdrm-ocaml
nix develop
dune utop
You should see a utop # prompt, where you can enter OCaml expressions.
Use ;; to tell the REPL you've finished typing and it's time to evaluate. For example, entering 1 + 1;; makes utop print - : int = 2.
Alternatively, you can install things using opam (OCaml's package manager):
opam install libdrm utop
utop
Then, at the utop prompt enter #require "libdrm";; (including the leading #).
Querying the current state
Before changing anything, we'll start by discovering what hardware is available.
I'll introduce the API as we go along, but you can check the API reference docs if you want more information.
Finding devices
To list available graphics devices:
libdrm scans the /dev/dri/ directory looking for devices.
It uses stat to find the device major and minor numbers and uses the virtual /sys filesystem to get information about each one.
This is a PCI device, and the information corresponds to the values from lspci, e.g.
$ lspci -nns 0:1:0.0
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] [1002:67ff] (rev ff)
Each graphics device can have a primary and a render node. The primary node gives full access to the device, including configuring monitors, while the render node just allows applications to render scenes to memory. In the last post I was using the render node to create a 3D image, and then sending it to the Wayland compositor for display. This time we'll be doing the display ourselves, so we need to open the primary node:
To check the driver version:
If you're familiar with the C API, this corresponds to the drmGetVersion function,
and Drm.Device.list corresponds to drmGetDevices2;
I reorganised things a bit to make better use of OCaml's modules.
Listing resources
Let's see what resources we've got to play with:
Note: The Kernel Mode Setting functions are in the Drm.Kms module.
The C API calls these functions drmMode*, but I found that confusing as
e.g. drmModeGetResources sounds like you're asking for the resources of a mode.
A CRTC is a CRT Controller (CRT, short for Cathode Ray Tube, is a historical name for a monitor) and typically controls a single monitor. Framebuffers provide image data to a CRTC (we create framebuffers as needed). Connectors correspond to physical connectors (e.g. where you plug in a monitor cable). An Encoder encodes data from the CRTC for a particular connector.
Resources diagram (simplified)
Connectors
To save a bit of typing, I'll create an alias for the Drm.Kms module:
module K = Drm.Kms;;
You could also open Drm.Kms to avoid needing any prefix, but I'll keep using K for clarity.
To get details for the first connector (the head of the list):
This is DisplayPort connector 1 (usually called DP-1) and it's currently Connected.
The connector also says which modes are available on the connected monitor.
I was lucky in that the first connector was the one I'm using,
but really we should get all the connectors and filter them to find the connected ones.
List.map can be used to run get on each of them:
Then to filter:
We'll investigate c, the first connected one:
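The map-then-filter step can also be tried without any graphics hardware. The sketch below is hypothetical: the connector record, its fields and get_connector are stand-ins invented for illustration (the real connector type has many more fields), and a lookup table plays the role of K.Connector.get dev.

```ocaml
(* Hypothetical stand-ins for illustration only; these are not libdrm-ocaml's types. *)
type connection = Connected | Disconnected | Unknown_connection

type connector = {
  id : int;
  name : string;
  connection : connection;
}

(* Stand-in for K.Connector.get dev: look up the details for one connector ID. *)
let get_connector table id : connector = List.assoc id table

(* Fetch details for every connector, then keep only those with a monitor attached. *)
let connected_connectors table ids =
  ids
  |> List.map (get_connector table)
  |> List.filter (fun c -> c.connection = Connected)

let () =
  let table =
    [ (71, { id = 71; name = "DP-1"; connection = Connected });
      (73, { id = 73; name = "DP-2"; connection = Disconnected }) ]
  in
  connected_connectors table [ 71; 73 ]
  |> List.iter (fun c -> Printf.printf "%s (%d) is connected\n" c.name c.id)
```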
A note on IDs
In the libdrm C API, IDs are just integers. To avoid mix-ups, I made them distinct types in the OCaml API. For example, if you try to use an encoder ID as a connector ID:
Normally this is what you want, but for interactive use it's annoying that you can't just pass a plain integer.
You can get any kind of ID with Drm.Id.of_int (e.g. K.Connector.get dev (Drm.Id.of_int 71)),
but that's still a bit verbose, so you might prefer to (re)define a prefix operator for it, e.g.
(note: ! is the only single-character prefix operator available in OCaml)
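To show how distinct ID types can cost nothing at runtime, here is a hypothetical sketch using phantom types: every ID is an int underneath, but an empty marker type records what it refers to. The Id module, the marker types and the describe_* functions are invented for this example and are not libdrm-ocaml's actual definitions.

```ocaml
(* Hypothetical sketch of distinct ID types sharing an int representation;
   this is not libdrm-ocaml's actual Drm.Id module. *)
module Id : sig
  type 'kind t
  val of_int : int -> 'kind t
  val to_int : 'kind t -> int
end = struct
  type 'kind t = int
  let of_int x = x
  let to_int x = x
end

(* Empty marker types: they have no values and only record what an ID refers to. *)
type connector
type encoder

(* Hypothetical functions that each accept only one kind of ID. *)
let describe_connector (id : connector Id.t) = Printf.sprintf "connector %d" (Id.to_int id)
let describe_encoder (id : encoder Id.t) = Printf.sprintf "encoder %d" (Id.to_int id)

(* Shorthand for interactive use: !71 becomes whichever kind of ID is expected. *)
let ( ! ) = Id.of_int

let () =
  print_endline (describe_connector !71);
  print_endline (describe_encoder !70)
  (* Rejected by the type checker:
     describe_connector (Id.of_int 70 : encoder Id.t) *)
```

Note that redefining ! like this hides the standard operator for dereferencing ref cells for the rest of the session.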
Modes
Modes are shown in abbreviated form in the connector output. To see the full list:
Note: I annotated various pretty-printer functions with [@@ocaml.toplevel_printer],
which causes utop to use them by default to display values of the corresponding type.
For example, showing a list of modes uses this short summary form.
Displaying an individual mode shows all the information.
Here's the first mode:
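A mode's refresh rate isn't an independent setting: it follows from the pixel clock and the total size of each frame, including the blanking intervals. A minimal sketch, assuming a hypothetical record with just those three fields (the kernel reports the clock in kHz) and illustrative numbers for a 3840x2160 mode at about 60 Hz, not values copied from my output:

```ocaml
(* Hypothetical subset of a mode's timing fields; the values below are illustrative. *)
type timings = {
  clock_khz : int;  (* pixel clock in kHz *)
  htotal : int;     (* pixels per line, including horizontal blanking *)
  vtotal : int;     (* lines per frame, including vertical blanking *)
}

(* Frames per second = pixels per second / pixels per frame.
   (Ignores interlaced and double-scan modes, which change the calculation.) *)
let refresh_hz t =
  float_of_int t.clock_khz *. 1000. /. float_of_int (t.htotal * t.vtotal)

let () =
  let t = { clock_khz = 533_250; htotal = 4000; vtotal = 2222 } in
  Printf.printf "%.3f Hz\n" (refresh_hz t)
```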
Properties
Some resources can also have extra properties.
Use get_properties to fetch them:
Linux only returns a subset of the properties until you enable the atomic feature. Let's turn that on now:
(Module.(expr) is a short-hand that brings all of Module's symbols into scope for expr,
so we don't have to repeat the module name for both set and atomic)
And getting the properties again, we now have an extra CRTC_ID,
telling us which controller this connector is currently using:
Encoders
The Linux documentation says:
Those are really just internal artifacts of the helper libraries used to implement KMS drivers. Besides that they make it unnecessarily more complicated for userspace to figure out which connections between a CRTC and a connector are possible, and what kind of cloning is supported, they serve no purpose in the userspace API. Unfortunately encoders have been exposed to userspace, hence can’t remove them at this point. Furthermore the exposed restrictions are often wrongly set by drivers, and in many cases not powerful enough to express the real restrictions.
OK. Well, let's take a look anyway:
Note: We need Option.get here because a connector might not have an encoder set yet.
Where the C API uses 0 to indicate no resource,
the OCaml API uses None to force us to think about that case.
As the documentation says, the encoder is mainly useful to get the CRTC ID:
We could instead have got that directly from the connector using its properties:
CRT Controllers
An active CRTC has a mode set (presumably from the connector's list of supported modes), and a framebuffer with the image to be displayed.
If I keep calling Crtc.get, I see that it is sometimes showing framebuffer 93 and sometimes 94.
My Wayland compositor (Sway) updates one framebuffer while the other is being shown, then switches which one is displayed.
Framebuffers
My CRTC is currently displaying the contents of framebuffer 93:
A framebuffer has up to 4 framebuffer planes (not to be confused with CRTC planes; see later), each of which references a buffer object (also known as a BO and referenced with a GEM handle).
This framebuffer is using the XR24 format, where there is a single BO with 32 bits for each pixel
(8 for red, 8 green, 8 blue and 8 unused).
Some formats use e.g. a separate buffer for each component
(or a different part of the same buffer, using offset).
Modern graphics cards also support format modifiers, but my card is too old so I just get None.
Linux's fourcc.h header file describes the various formats and modifiers.
Modifiers seem to be mainly used to specify the tiling.
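Format names such as XR24 and AR24 are four ASCII characters packed into a 32-bit integer, with the first character in the lowest byte, as the fourcc_code macro in the fourcc header does. A small sketch of the packing and unpacking (the function names are made up for illustration):

```ocaml
(* Pack a four-character format name into an integer, first character in the lowest byte. *)
let fourcc s =
  if String.length s <> 4 then invalid_arg "fourcc: need exactly 4 characters";
  let byte i = Char.code s.[i] lsl (8 * i) in
  byte 0 lor byte 1 lor byte 2 lor byte 3

(* And back again, e.g. for printing a format code. *)
let fourcc_to_string code =
  String.init 4 (fun i -> Char.chr ((code lsr (8 * i)) land 0xff))

let () =
  Printf.printf "XR24 = 0x%08x\n" (fourcc "XR24");
  print_endline (fourcc_to_string 0x34325241)
```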
I don't have permission to see the buffer object, so it appears as (handle = None).
The pitch is the number of bytes from one row to the next (also known as the stride).
Here, the 15360 is simply the width (3840) multiplied by the 4 bytes per pixel.
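As a worked example of the pitch arithmetic, assuming a single-plane format with 4 bytes per pixel such as XR24 (the helper names are illustrative). Drivers may pad rows, so real code should use the pitch the kernel reports rather than assume width * 4:

```ocaml
(* Pitch (stride) arithmetic for single-plane formats with 4 bytes per pixel. *)
let bytes_per_pixel = 4

(* The smallest possible pitch: no padding at the end of each row. *)
let min_pitch ~width = width * bytes_per_pixel

(* Byte offset of pixel (x, y), given the plane's offset and its actual pitch. *)
let pixel_offset ~offset ~pitch ~x ~y = offset + (y * pitch) + (x * bytes_per_pixel)

let () =
  Printf.printf "pitch for 3840 pixels: %d bytes\n" (min_pitch ~width:3840);
  Printf.printf "pixel (10, 2) is at byte %d\n"
    (pixel_offset ~offset:0 ~pitch:15360 ~x:10 ~y:2)
```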
CRTC planes
In fact, Crtc.get is an old API that only covers the basic case of a single framebuffer.
In reality, a CRTC can combine multiple CRTC planes, which for some reason aren't returned with the other resources
and must be requested separately:
(note: you need to enable "atomic" mode before requesting planes; we already did that above)
A lot of these planes aren't being used (don't have a CRTC), which we can check for with a helper function:
Looks like Sway is using two planes at the moment:
More information is available as properties:
- Plane 52 is a Primary plane and is using framebuffer 93 (as we saw before).
- Plane 55 is a Cursor plane, using framebuffer 98 (and the AR24 format, with alpha/transparency).
A plane chooses which part of the framebuffer to show (SRC_X, SRC_Y, SRC_W and SRC_H)
and where it should appear on the screen (CRTC_X, CRTC_Y, CRTC_W and CRTC_H).
The source values are in 16.16 format (i.e. shifted left 16 bits).
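The 16.16 conversion is just a shift: the top 16 bits hold whole pixels and the low 16 bits a fraction of a pixel, while the CRTC_* values are plain pixel counts. A minimal sketch (the helper names are illustrative):

```ocaml
(* 16.16 fixed point: whole pixels in the top 16 bits, fraction in the low 16 bits. *)
let to_16_16 px = px lsl 16
let of_16_16 v = v asr 16                  (* whole pixels, rounded down *)
let to_float v = float_of_int v /. 65536.  (* including the fractional part *)

let () =
  let src_w = to_16_16 3840 in
  Printf.printf "SRC_W for 3840 pixels = %d (0x%x)\n" src_w src_w;
  Printf.printf "back to pixels: %d\n" (of_16_16 src_w);
  Printf.printf "0x8000 is %.1f of a pixel\n" (to_float 0x8000)
```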
Oddly, Plane.get returned crtc_x,crtc_y = 0,0 for both planes, but
the properties show the correct cursor location (CRTC_X = 3105; CRTC_Y = 1518;).
Having the cursor on a separate plane avoids having to modify the main screen image whenever the mouse pointer moves. This is good for latency (especially if the GPU is busy rendering something else at the time) and power consumption (the GPU can stay powered down), and it allows an application's buffer to be shown full screen without the compositor needing to modify it.
You might also have some Overlay planes,
which can be useful for displaying video.
My graphics card seems to be too old for that.
Expanded resources diagram
Here's an expanded diagram showing some more possibilities:
- Some framebuffer formats take the input data from multiple buffers.
- A framebuffer can be shared by multiple CRTCs (perhaps with each plane showing a different part of it).
- A CRTC can have multiple planes (e.g. primary and cursor).
- A single CRTC can show the same image on multiple monitors.
Making changes
If I try turning off the CRTC (by setting the mode to None) from my desktop environment, it fails:
The reason is that I'm currently running a graphical desktop and Sway owns the device
(so my dev is not the DRM "master"):
That can be fixed by switching to a different VT (e.g. with Ctrl-Alt-F2) and running it there. However, this will result in a second problem: I won't be able to see what I'm doing!
If you have a second computer then you can SSH in and test things out from there, but for simplicity we'll leave the utop REPL at this point and write some programs instead.
For example, query.ml shows the information we discovered above:
dune exec -- ./examples/query.exe
Non-atomic mode setting
Linux provides two ways to configure modes: the old non-atomic API and the newer atomic one.
examples/nonatomic.ml contains a simple example of the older (but simpler) API.
It starts by finding a device (the first one with a primary node supporting KMS), then
finds all connected connectors (as we did above), and calls show_test_page on each one:
restoring_afterwards stores the current configuration, runs the callback,
and then puts things back to normal when that finishes (or you press Ctrl-C).
The program waits for 2 seconds after showing the test page before exiting.
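The save/run/restore pattern can be sketched with the standard library's Fun.protect, which runs a cleanup function however the callback exits. This is a generic sketch with hypothetical ~save and ~restore callbacks, not the code from examples/nonatomic.ml; Sys.catch_break makes Ctrl-C raise an exception so that the cleanup runs in that case too.

```ocaml
(* Generic "restore afterwards" sketch; ~save and ~restore are hypothetical callbacks. *)
let restoring_afterwards ~save ~restore fn =
  let saved = save () in
  (* Turn Ctrl-C (SIGINT) into a Sys.Break exception instead of an immediate
     exit, so that the ~finally handler below still gets a chance to run. *)
  Sys.catch_break true;
  Fun.protect ~finally:(fun () -> restore saved) fn

let () =
  let current = ref "desktop" in
  restoring_afterwards
    ~save:(fun () -> !current)
    ~restore:(fun s -> current := s)
    (fun () ->
      current := "test page";
      Printf.printf "showing: %s\n" !current);
  Printf.printf "restored: %s\n" !current
```

Sys.catch_break is process-wide and stays enabled after the function returns, which is usually fine for a small test program like this.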
show_test_page finds the CRTC (as we did above),
takes the first supported mode, creates a test framebuffer of that size,
and configures the CRTC to display it:
If the connector doesn't have a CRTC, we could find a suitable one and use that, but for simplicity the example just skips such connectors.
To run the example (switch away from any graphical desktop first or it won't work):
dune exec -- ./examples/nonatomic.exe
Dumb buffers
Typically the pixel data to be displayed comes from some complex rendering pipeline,
but Linux also provides dumb buffers for simple cases such as testing.
The Test_image.create function used above creates a dumb buffer with a test pattern:
Dumb.create allocates memory for the image data.
Dumb.map makes it appear in host-memory as an OCaml bigarray.
The loop sets each 32-bit int in the image to some colour c.
Then we wrap this data up as an XR24-format framebuffer with a single plane:
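The fill loop described above can be sketched with a plain Bigarray standing in for the mapped dumb buffer. This assumes 32-bit elements, one per pixel, with rows pitch bytes apart; make_image and xr24 are names invented for the example, not libdrm-ocaml functions. DRM formats are defined as little-endian, so writing native int32 values like this also assumes a little-endian CPU.

```ocaml
(* XR24 (XRGB8888): a 32-bit value laid out as unused:red:green:blue, 8 bits each. *)
let xr24 ~r ~g ~b =
  Int32.of_int (((r land 0xff) lsl 16) lor ((g land 0xff) lsl 8) lor (b land 0xff))

(* Fill a width x height image whose rows are [pitch] bytes apart.
   A plain Bigarray stands in for the memory that Dumb.map would provide. *)
let make_image ~width ~height ~pitch colour_at =
  let stride = pitch / 4 in  (* row length in 32-bit pixels *)
  let img = Bigarray.Array1.create Bigarray.int32 Bigarray.c_layout (stride * height) in
  Bigarray.Array1.fill img 0l;  (* any padding at the ends of rows stays black *)
  for y = 0 to height - 1 do
    for x = 0 to width - 1 do
      img.{(y * stride) + x} <- colour_at x y
    done
  done;
  img

let () =
  (* A horizontal red gradient, 256 x 4 pixels, with no row padding. *)
  let img = make_image ~width:256 ~height:4 ~pitch:(256 * 4) (fun x _y -> xr24 ~r:x ~g:0 ~b:0) in
  Printf.printf "pixel (255, 0) = 0x%08lx\n" img.{255}
```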
Atomic mode setting
examples/atomic.ml demonstrates the newer atomic API:
The steps are:
- Use set_exn atomic to enable atomic mode.
- Create an atomic request (rq).
- Use show_test_page to populate it with the desired property changes.
- (optional) Check that it will work (~test_only:true).
- Commit the changes (Atomic_req.commit).
The advantage here is that either all changes are successfully applied at once or nothing changes. This avoids various problems with flickering or trying to roll back partial changes.
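To make the all-or-nothing behaviour concrete, here is a toy model using only the standard library (it is not the Atomic_req API): every change is validated before any is applied, and ~test_only stops after validation. The property names and the validity rule are hypothetical.

```ocaml
(* A toy model of atomic commits; the "state" maps property names to values. *)
module Props = Map.Make (String)

(* Validate every change first; apply them only if all of them are acceptable. *)
let commit ~valid ?(test_only = false) state changes =
  match List.find_opt (fun (name, value) -> not (valid name value)) changes with
  | Some (name, _) -> Error name                (* nothing is applied *)
  | None when test_only -> Ok state             (* checked, but not applied *)
  | None ->
    Ok (List.fold_left (fun s (name, value) -> Props.add name value s) state changes)

let () =
  let state = Props.(empty |> add "FB_ID" 93 |> add "CRTC_W" 3840) in
  (* Hypothetical rule standing in for the driver's checks. *)
  let valid name value = not (name = "CRTC_W" && value > 16384) in
  (match commit ~valid state [ ("FB_ID", 200); ("CRTC_W", 99999) ] with
   | Error name -> Printf.printf "rejected %s; FB_ID is still %d\n" name (Props.find "FB_ID" state)
   | Ok _ -> print_endline "applied");
  match commit ~valid state [ ("FB_ID", 200) ] with
  | Ok s -> Printf.printf "applied; FB_ID is now %d\n" (Props.find "FB_ID" s)
  | Error name -> Printf.printf "rejected %s\n" name
```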
show_test_page needs a couple of modifications.
First, we have to find a plane (rather than using the old Crtc.set which assumes a single plane),
and then we set the plane's FB_ID property to the new framebuffer in the request:
For the example, I actually set more properties and defined an operator to make the code a bit neater:
In libdrm-ocaml, properties are typed, so you can't forget to convert the source values to fixed point format.
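One way such typing can be expressed is with a GADT, where each property constructor carries the type of its value. This is a hypothetical sketch, not libdrm-ocaml's actual property definitions; the Fixed module keeps its representation abstract so that a plain int can't be passed where a 16.16 value is expected.

```ocaml
(* A 16.16 fixed-point value; the only way to make one is Fixed.of_pixels. *)
module Fixed : sig
  type t
  val of_pixels : int -> t
  val raw : t -> int
end = struct
  type t = int
  let of_pixels px = px lsl 16
  let raw v = v
end

(* Each constructor records the OCaml type of that property's value. *)
type _ prop =
  | Fb_id : int prop        (* framebuffer ID *)
  | Src_w : Fixed.t prop    (* source width: 16.16 fixed point *)
  | Crtc_w : int prop       (* on-screen width: whole pixels *)

(* Convert a typed value to the name and raw integer the kernel expects. *)
let encode : type a. a prop -> a -> string * int =
 fun prop value ->
  match prop with
  | Fb_id -> ("FB_ID", value)
  | Src_w -> ("SRC_W", Fixed.raw value)
  | Crtc_w -> ("CRTC_W", value)

let () =
  let name, raw = encode Src_w (Fixed.of_pixels 3840) in
  Printf.printf "%s = %d\n" name raw
  (* encode Src_w 3840 would not compile: 3840 is an int, not a Fixed.t. *)
```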
3D rendering
The examples above use a dumb-buffer, but it's fairly simple to replace that with a Vulkan buffer.
The code in the last post exported the image memory from Vulkan as a dmabuf FD and sent it to the Wayland compositor.
Now, instead of sending it we just need to import it into our device (with Drm.Dmabuf.to_handle)
and use that handle instead of the dumb-buffer one.
I added a simple surface abstraction to the test code, wrapping the Window module's API
so that the rendering code doesn't need to care whether it's rendering to a Wayland window or directly to the screen.
Then I made a Vt module implementing the new Surface.t type for rendering directly to a Linux VT.
To get the animation working, I used K.Crtc.page_flip to update the framebuffer (I could also have used the atomic API).
The kernel waits until the encoder has finished sending the current frame before switching to the new one,
which avoids tearing.
We also need to ask the kernel to tell us when this happens, which is done by setting the optional ~event argument to some number.
You can read events from the device file and parse them with Drm.Event.parse.
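The flip-then-wait cycle can be modelled as a simple two-buffer swap chain. This toy sketch makes no libdrm calls: request_flip marks where the real program would ask the kernel for a page flip with an event, and on_flip_complete is what would run after reading that event from the device. The record and function names are invented for the example.

```ocaml
(* Toy model of page flipping with two framebuffers (no libdrm calls). *)
type 'fb swapchain = {
  mutable front : 'fb;          (* being scanned out *)
  mutable back : 'fb;           (* free for rendering *)
  mutable flip_pending : bool;  (* waiting for the flip-complete event *)
}

(* In a real program this is where the page flip would be requested, with an
   event so that the kernel tells us when it has happened. *)
let request_flip sc =
  if sc.flip_pending then invalid_arg "request_flip: previous flip still pending";
  sc.flip_pending <- true

(* Handle the flip-complete event: the new frame is now on screen, and the old
   front buffer can be reused for rendering. *)
let on_flip_complete sc =
  let old_front = sc.front in
  sc.front <- sc.back;
  sc.back <- old_front;
  sc.flip_pending <- false

let () =
  let sc = { front = "fb A"; back = "fb B"; flip_pending = false } in
  for frame = 1 to 3 do
    (* ... render the next frame into sc.back here ... *)
    request_flip sc;
    on_flip_complete sc;  (* normally triggered by an event read from the device *)
    Printf.printf "frame %d: showing %s\n" frame sc.front
  done
```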
If you want to try it, this should produce an animated room:
git clone https://github.com/talex5/vulkan-test -b kms-3d
cd vulkan-test
nix develop
make download-example
dune exec -- ./src/main.exe 10000 viking_room.obj viking_room.png
If run with $WAYLAND_DISPLAY set, it will open a Wayland window (as before),
but if run from a text console then it should render the animation directly using KMS.
Linux VTs
When the user switches to another virtual terminal (e.g. with Ctrl-Alt-F3),
we should call Drm.Device.drop_master to give up being the master,
allowing the application running on the new terminal to take over.
We should also switch the VT to KD_GRAPHICS mode while using it,
to stop the kernel trying to manage it.
I didn't implement either of these features, but see How VT-switching works for details.
Debugging
If you get an unhelpful error code from the kernel (e.g. EINVAL), enabling debug messages is often helpful.
Writing 4 to /sys/module/drm/parameters/debug enables KMS debug messages, which can be seen in the dmesg output.
Write 0 to the file afterwards to turn the messages off again.
modinfo -p drm lists the various options.
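The debug parameter is a bit mask with one bit per category, so 4 selects just the KMS messages. A small sketch for combining categories and writing the result; the bit values are those printed by modinfo -p drm (newer kernels list more categories, so check its output on your system), and writing the file needs root.

```ocaml
(* Debug categories for /sys/module/drm/parameters/debug. *)
type category = Core | Driver | Kms | Prime | Atomic

let bit = function
  | Core -> 0x01    (* drm core code *)
  | Driver -> 0x02  (* drm controller code *)
  | Kms -> 0x04     (* modesetting code *)
  | Prime -> 0x08   (* prime code *)
  | Atomic -> 0x10  (* atomic code *)

let mask categories = List.fold_left (fun acc c -> acc lor bit c) 0 categories

(* Needs root. Write 0 again afterwards to switch the messages off. *)
let set_drm_debug value =
  let oc = open_out "/sys/module/drm/parameters/debug" in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc (string_of_int value))

let () = Printf.printf "KMS + ATOMIC = %d\n" (mask [ Kms; Atomic ])
```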
Conclusions
I hope that being able to explore the libdrm API interactively from the OCaml top-level made it easier to learn how Linux manages displays. As with using Vulkan from OCaml, a lot of the noise from C is removed, and I think the essentials of what is going on are easier to see.
I used ocaml-ctypes for the C bindings, and this was my first time using it in "stubs" mode (where it pre-generates C bindings from OCaml definitions). This has the advantage that the C type checker checks that the definitions are correct, and it worked well. Dune's Stub Generation feature generates the build rules for this semi-automatically.
Deciding what OCaml types to use for the C types was quite difficult.
For example, C has many different integer types (int, long, uint32_t, etc),
but using lots of types is more painful in OCaml where e.g. + only works on int.
I used OCaml's int type when possible, and other types only when the value might not fit
(e.g. an image size on a 32-bit platform might not fit into an OCaml int, which is one bit shorter).
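For instance, here is a sketch of checking whether a 64-bit value from C fits into an OCaml int on the current platform (Sys.int_size is 63 on 64-bit platforms and 31 on 32-bit ones). The helper name and the example image size are illustrative.

```ocaml
(* Does a 64-bit value fit into this platform's native OCaml int? *)
let fits_in_int (v : int64) =
  Int64.compare v (Int64.of_int min_int) >= 0
  && Int64.compare v (Int64.of_int max_int) <= 0

let () =
  (* Hypothetical example: bytes needed for a 16384 x 16384 image at 4 bytes per pixel.
     This is 2^30, which is one more than max_int on a 32-bit platform. *)
  let size = Int64.(mul (mul 16384L 16384L) 4L) in
  Printf.printf "int_size = %d, %Ld fits in an int: %b\n" Sys.int_size size (fits_in_int size)
```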
The C API is somewhat inconsistent about types.
e.g. drmModePageFlipTarget takes a uint32_t target_vblank argument for the sequence number,
while page_flip_handler confirms the event by giving it as unsigned int sequence.
Meanwhile, the sequence_handler event gives it as uint64_t sequence.
I'm not sure what happens if the sequence number gets too large to fit in a 32-bit integer.
Anyway, I think I understand mode setting a lot better now, and I'm getting faster at debugging graphics problems on Linux (e.g. when element-desktop failed to start recently after I updated it).
Thanks to the OCaml Software Foundation for sponsoring this work.