A Unikernel Firewall for QubesOS - Thomas Leonard's blog

QubesOS provides a desktop operating system made up of multiple virtual machines, running under Xen. To protect against buggy network drivers, the physical network hardware is accessed only by a dedicated (and untrusted) "NetVM", which is connected to the rest of the system via a separate (trusted) "FirewallVM". This firewall VM runs Linux, processing network traffic with code written in C.

In this blog post, I replace the Linux firewall VM with a MirageOS unikernel. The resulting VM uses safe (bounds-checked, type-checked) OCaml code to process network traffic, uses less than a tenth of the memory of the default FirewallVM, boots several times faster, and should be much simpler to audit or extend.

Table of Contents

Qubes
- Qubes networking
- Problems with FirewallVM
A Unikernel Firewall
Summary

(This post also appeared on Reddit and Hacker News.)

Qubes

QubesOS is a security-focused desktop operating system that uses virtual machines to isolate applications from each other. The screenshot below shows my current desktop. The windows with green borders are running Fedora in my "comms" VM, which I use for Gmail and similar trusted sites (with NoScript). The blue windows are from a Debian VM which I use for software development. The red windows are another Fedora VM, which I use for general browsing (with Flash, etc) and running various untrusted applications:

Another Fedora VM ("dom0") runs the window manager and drives most of the physical hardware (mouse, keyboard, screen, disks, etc).

Networking is a particularly dangerous activity, since attacks can come from anywhere in the world and handling network hardware and traffic is complex. Qubes therefore uses two extra VMs for networking:

NetVM drives the physical network device directly. It runs network-manager and provides the system tray applet for configuring the network.
FirewallVM sits between the application VMs and NetVM. It implements a firewall and router.

The full system looks something like this:

The lines between VMs in the diagram above represent network connections. If NetVM is compromised (e.g. by exploiting a bug in the kernel module driving the wifi card) then the system as a whole can still be considered secure - the attacker is still outside the firewall.

Besides traditional networking, all VMs can communicate with dom0 via some Qubes-specific protocols. These are used to display window contents, tell VMs about their configuration, and provide direct channels between VMs where appropriate.

Qubes networking

There are three IP networks in the default configuration:

192.168.1.* is the external network (to my house router).
10.137.1.* is a virtual network connecting NetVM to the firewalls (you can have multiple firewall VMs).
10.137.2.* connects the app VMs to the default FirewallVM.

Both NetVM and FirewallVM perform NAT, so packets from "comms" appear to NetVM to have been sent by the firewall, and packets from the firewall appear to my house router to have come from NetVM.

Each of the AppVMs is configured to use the firewall (10.137.2.1) as its DNS resolver. FirewallVM uses an iptables rule to forward DNS traffic to its resolver, which is NetVM.

Problems with FirewallVM

After using Qubes for a while, I found a number of things about the default FirewallVM that I'm unhappy about:

It runs a full Linux system, which uses at least 300 MB of RAM. This seems excessive.
It takes several seconds to boot.
There is a race somewhere in setting up the DNS redirection. Adding some debugging output to track down the bug made it disappear.
The iptables configuration is huge and hard to understand.

There is another, more serious, problem. Xen virtual network devices are implemented as a client ("netfront") and a server ("netback"), which are Linux kernel modules in sys-firewall. In a traditional Xen system, the netback driver runs in dom0 and is fully trusted. It is coded to protect itself against misbehaving client VMs. Netfront, by contrast, assumes that netback is trustworthy. The Xen developers consider only bugs in netback to be security critical.

In Qubes, NetVM acts as netback to FirewallVM, which acts as a netback in turn to its clients. But in Qubes, NetVM is supposed to be untrusted! So, we have code running in kernel mode in the (trusted) FirewallVM that is talking to and trusting the (untrusted) NetVM!

For example, as the Qubes developers point out in Qubes Security Bulletin #23, the netfront code that processes responses from netback uses the request ID quoted by netback as an index into an array without even checking if it's in range (they have fixed this in their fork).

What can an attacker do once they've exploited FirewallVM's trusting netfront driver? Presumably they now have complete control of FirewallVM. At this point, they can simply reuse the same exploit to take control of the client VMs, which are running the same trusting netfront code!

A Unikernel Firewall

I decided to see whether I could replace the default firewall ("sys-firewall") with a MirageOS unikernel. A Mirage unikernel is an OCaml program compiled to run as an operating system kernel. It pulls in just the code it needs, as libraries. For example, my firewall doesn't require or use a hard disk, so it doesn't contain any code for dealing with block devices.

If you want to follow along, my code is on GitHub in my qubes-mirage-firewall repository. The README explains how to build it from source. For testing, you can also just download the mirage-firewall-bin-0.1.tar.bz2 binary kernel tarball. dom0 doesn't have network access, but you can proxy the download through another VM:

[tal@dom0 ~]$ cd /tmp
[tal@dom0 tmp]$ qvm-run -p sys-net 'wget -O - https://github.com/talex5/qubes-mirage-firewall/releases/download/0.1/mirage-firewall-bin-0.1.tar.bz2' > mirage-firewall-bin-0.1.tar.bz2
[tal@dom0 tmp]$ tar tf mirage-firewall-bin-0.1.tar.bz2 
mirage-firewall/
mirage-firewall/vmlinuz
mirage-firewall/initramfs
mirage-firewall/modules.img
[tal@dom0 ~]$ cd /var/lib/qubes/vm-kernels/
[tal@dom0 vm-kernels]$ tar xf /tmp/mirage-firewall-bin-0.1.tar.bz2

The tarball contains vmlinuz, which is the unikernel itself, plus a couple of dummy files that Qubes requires to recognise it as a kernel (modules.img and initramfs).

Create a new ProxyVM named "mirage-firewall" to run the unikernel:

You can use any template, and make it standalone or not. It doesn't matter, since we don't use the hard disk.
Set the type to ProxyVM.
Select sys-net for networking (not sys-firewall).
Click OK to create the VM.
Go to the VM settings, and look in the "Advanced" tab.
- Set the kernel to mirage-firewall.
- Turn off memory balancing and set the memory to 32 MB or so (you might have to fight a bit with the Qubes GUI to get it this low).
- Set VCPUs (number of virtual CPUs) to 1.

(this installation mechanism is obviously not ideal; hopefully future versions of Qubes will be more unikernel-friendly)

You can run mirage-firewall alongside your existing sys-firewall and you can choose which AppVMs use which firewall using the GUI. For example, to configure "untrusted" to use mirage-firewall:

You can view the unikernel's log output from the GUI, or with sudo xl console mirage-firewall in dom0 if you want to see live updates.

If you want to explore the code but don't know OCaml, a good tip is that most modules (.ml files) have a corresponding .mli interface file which describes the module's public API (a bit like a .h file in C). It's usually worth reading those interface files first.

I tested initially with Qubes 3.0 and have just upgraded to the 3.1 alpha. Both seem to work.

Booting a Unikernel on Qubes

Qubes runs on Xen and a Mirage application can be compiled to a Xen kernel image using mirage configure --xen. However, Qubes expects a VM to provide three Qubes-specific services and doesn't consider the VM to be running until it has connected to each of them. They are qrexec (remote command execution), gui (displaying windows on the dom0 desktop) and QubesDB (a key-value store).

I wrote a little library, mirage-qubes, to implement enough of these three protocols for the firewall (the GUI does nothing except handshake with dom0, since the firewall has no GUI).

Here's the full boot code in my firewall, showing how to connect the agents:

unikernel.ml

let start () =
  let start_time = Clock.time () in
  Log_reporter.init_logging ();
  (* Start qrexec agent, GUI agent and QubesDB agent in parallel *)
  let qrexec = RExec.connect ~domid:0 () in
  let gui = GUI.connect ~domid:0 () in
  let qubesDB = DB.connect ~domid:0 () in
  (* Wait for clients to connect *)
  qrexec >>= fun qrexec ->
  let agent_listener = RExec.listen qrexec Command.handler in
  gui >>= fun gui ->
  Lwt.async (fun () -> GUI.listen gui);
  qubesDB >>= fun qubesDB ->
  Log.info "agents connected in %.3f s (CPU time used since boot: %.3f s)"
    (fun f -> f (Clock.time () -. start_time) (Sys.time ()));
  (* Watch for shutdown requests from Qubes *)
  let shutdown_rq = OS.Lifecycle.await_shutdown () >>= fun (`Poweroff | `Reboot) -> return () in
  (* Set up networking *)
  let net_listener = network qubesDB in
  (* Run until something fails or we get a shutdown request. *)
  Lwt.choose [agent_listener; net_listener; shutdown_rq] >>= fun () ->
  (* Give the console daemon time to show any final log messages. *)
  OS.Time.sleep 1.0

After connecting the agents, we start a thread watching for shutdown requests (which arrive via XenStore, a second database) and then configure networking.

Tips on reading OCaml

let x = ... defines a variable.
let fn args = ... defines a function.
Clock.time is the time function in the Clock module.
() is the empty tuple (called "unit"). It's used for functions that don't take arguments, or return nothing useful.
~foo is a named argument. connect ~domid:0 is like connect(domid = 0) in Python.
promise >>= f calls function f when the promise resolves. It's like promise.then(f) in JavaScript.
foo () >>= fun result -> is the asynchronous version of let result = foo () in.
return x creates an already-resolved promise (it does not make the function return).
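To make these tips concrete, here is a minimal sketch that uses only the standard library. The `promise` type below is a toy stand-in for Lwt (it is always already resolved), and `connect` and `start` are illustrative names, not the real agent API:

```ocaml
(* A toy "promise" that is always already resolved, standing in for Lwt. *)
type 'a promise = Resolved of 'a

(* [return x] wraps a value in an already-resolved promise. *)
let return x = Resolved x

(* [p >>= f] passes the promise's value to [f] once it is available. *)
let ( >>= ) (Resolved x) f = f x

(* [~domid] is a named argument; the final [()] is the unit argument. *)
let connect ~domid () = return (Printf.sprintf "agent for dom%d" domid)

(* The asynchronous version of [let agent = connect ~domid:0 () in ...]. *)
let start () =
  connect ~domid:0 () >>= fun agent ->
  return (String.length agent)
```

Note that the final `return` only wraps the result in a promise; it is an ordinary function call, not a control-flow statement.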

Networking

The general setup is simple enough: we read various configuration settings (IP addresses, netmasks, etc) from QubesDB, set up our two networks (the client-side one and the one with NetVM), and configure a router to send packets between them:

unikernel.ml

  (* Set up networking and listen for incoming packets. *)
  let network qubesDB =
    (* Read configuration from QubesDB *)
    let config = Dao.read_network_config qubesDB in
    Logs.info "Client (internal) network is %a"
      (fun f -> f Ipaddr.V4.Prefix.pp_hum config.Dao.clients_prefix);
    (* Initialise connection to NetVM *)
    Uplink.connect config >>= fun uplink ->
    (* Report success *)
    Dao.set_iptables_error qubesDB "" >>= fun () ->
    (* Set up client-side networking *)
    let client_eth = Client_eth.create
      ~client_gw:config.Dao.clients_our_ip
      ~prefix:config.Dao.clients_prefix in
    (* Set up routing between networks and hosts *)
    let router = Router.create
      ~client_eth
      ~uplink:(Uplink.interface uplink) in
    (* Handle packets from both networks *)
    Lwt.join [
      Client_net.listen router;
      Uplink.listen uplink router
    ]

OCaml notes

config.Dao.clients_our_ip means the clients_our_ip field of the config record, as defined in the Dao module.
~client_eth is short for ~client_eth:client_eth - i.e. pass the value of the client_eth variable as a parameter also named client_eth.
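A small self-contained sketch of both notes follows. The `Dao` module here is a cut-down stand-in that uses plain strings instead of real IP address types, and `create` is a hypothetical function that exists only to take labelled arguments:

```ocaml
module Dao = struct
  type network_config = {
    clients_our_ip : string;   (* plain strings stand in for IP address types *)
    clients_prefix : string;
  }
end

(* A hypothetical function with two labelled arguments. *)
let create ~client_gw ~prefix = client_gw ^ " in " ^ prefix

let config =
  { Dao.clients_our_ip = "10.137.2.1"; clients_prefix = "10.137.2.0/24" }

(* Field access qualified by the module that defines the record. *)
let a =
  create ~client_gw:config.Dao.clients_our_ip ~prefix:config.Dao.clients_prefix

(* Punning: [~prefix] is short for [~prefix:prefix]. *)
let b =
  let client_gw = config.Dao.clients_our_ip in
  let prefix = config.Dao.clients_prefix in
  create ~client_gw ~prefix
```

Both `a` and `b` are built from the same arguments; only the call syntax differs.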

The Xen virtual network layer

At the lowest level, networking requires the ability to send a blob of data from one VM to another. This is the job of the Xen netback/netfront protocol.

For example, consider the case of a new AppVM (Xen domain ID 5) being connected to FirewallVM (4). First, dom0 updates its XenStore database (which is shared with the VMs). It creates two directories:

/local/domain/4/backend/vif/5/0/
/local/domain/5/device/vif/0/

Each directory contains a state file (set to 1, which means initialising) and information about the other end.

The first directory is monitored by the firewall (domain 4). When it sees the new entry, it knows it has a new network connection to domain 5, interface 0. It writes information about the features it supports to the directory and sets the state to 2 (init-wait).

The second directory will be seen by the new domain 5 when it boots. It tells it that it has a network connection to dom 4. The client looks in dom 4's backend directory and waits for the state to change to init-wait, then checks the supported features. It allocates memory to share with the firewall, tells Xen to grant access to dom 4, and writes the ID for the grant to the XenStore directory. It sets its own state to 4 (connected).

When the firewall sees the client is connected, it reads the grant refs, tells Xen to map those pages of memory into its own address space, and sets its own state to connected too. The two VMs can now use the shared memory to exchange messages (blocks of data up to 64 KB).
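The handshake above can be summarised in a few lines of OCaml. This is only a sketch of the three states and two directory layouts mentioned in the text; the type and function names are made up for illustration and are not taken from mirage-net-xen:

```ocaml
(* The connection states mentioned above, with their numeric codes. *)
type state = Initialising | Init_wait | Connected

let int_of_state = function
  | Initialising -> 1
  | Init_wait -> 2
  | Connected -> 4

let state_of_int = function
  | 1 -> Some Initialising
  | 2 -> Some Init_wait
  | 4 -> Some Connected
  | _ -> None   (* other codes are not modelled in this sketch *)

(* XenStore directory watched by the backend (the firewall). *)
let backend_path ~backend_domid ~frontend_domid ~device_id =
  Printf.sprintf "/local/domain/%d/backend/vif/%d/%d/"
    backend_domid frontend_domid device_id

(* XenStore directory seen by the frontend (the client VM). *)
let frontend_path ~frontend_domid ~device_id =
  Printf.sprintf "/local/domain/%d/device/vif/%d/" frontend_domid device_id

(* The client waits until the backend's state file reads init-wait. *)
let backend_ready raw_state = state_of_int raw_state = Some Init_wait
```

For the example in the text, `backend_path ~backend_domid:4 ~frontend_domid:5 ~device_id:0` gives the first directory and `frontend_path ~frontend_domid:5 ~device_id:0` gives the second.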

The reason I had to find out about all this is that the mirage-net-xen library only implemented the netfront side of the protocol. Luckily, Dave Scott had already started adding support for netback and I was able to complete that work.

Getting this working with a Mirage client was fairly easy, but I spent a long time trying to figure out why my code was making Linux VMs kernel panic. It turned out to be an amusing bug in my netback serialisation code, which only worked with Mirage by pure luck.

However, this did alert me to a second bug in the Linux netfront driver: even if the ID netback sends is within the array bounds, that entry isn't necessarily valid. Sending an unused ID would cause netfront to try to unmap someone else's grant-ref. Not exploitable, perhaps, but another good reason to replace this code!
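To illustrate the two checks involved, here is a minimal sketch of a table of outstanding requests indexed by an ID supplied by the untrusted peer. It is not the mirage-net-xen code; the names are hypothetical. OCaml array accesses are bounds-checked by default anyway, but doing the check explicitly lets us report the error instead of raising an exception:

```ocaml
(* One slot per outstanding request; [None] marks an unused slot. *)
type 'a pending = 'a option array

let create n : 'a pending = Array.make n None

(* Record a request we sent under the given ID. *)
let add (t : 'a pending) id req = t.(id) <- Some req

(* Look up the request that a response claims to answer. The ID comes from
   the untrusted peer, so check both the range and that the slot is in use. *)
let take (t : 'a pending) id =
  if id < 0 || id >= Array.length t then Error `Out_of_range
  else
    match t.(id) with
    | None -> Error `Unused_slot
    | Some req ->
      t.(id) <- None;   (* each request may be answered only once *)
      Ok req
```

The first check corresponds to the missing range check described in the security bulletin, and the second to the unused-ID case above.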

The Ethernet layer

It might seem like we're nearly done: we want to send IP (Internet Protocol) packets between VMs, and we have a way to send blocks of data. However, we must now take a little detour down Legacy Lane...

Operating systems don't expect to send IP packets directly. Instead, they expect to be connected to an Ethernet network, which requires each IP packet to be wrapped in an Ethernet "frame". Our virtual network needs to emulate an Ethernet network.

In an Ethernet network, each network interface device has a unique "MAC address" (e.g. 01:23:45:67:89:ab). An Ethernet frame contains source and destination MAC addresses, plus a type (e.g. "IPv4 packet").

When a client VM wants to send an IP packet, it first broadcasts an Ethernet ARP request, asking for the MAC address of the target machine. The target machine responds with its MAC address. The client then transmits an Ethernet frame addressed to this MAC address, containing the IP packet inside.

If we were building our system out of physical machines, we'd connect everything via an Ethernet switch, like this:

This layout isn't very good for us, though, because it means the VMs can talk to each other directly. Normally you might trust all the machines behind the firewall, but the point of Qubes is to isolate the VMs from each other.

Instead, we want a separate Ethernet network for each client VM:

In this layout, the Ethernet addressing is completely pointless - a frame simply goes to the machine at the other end of the link. But we still have to add an Ethernet frame whenever we send a packet and remove it when we receive one.
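Adding and removing the frame is mechanical. The sketch below shows the idea using plain strings and the standard library only; it is not how the Mirage network stack represents frames, and the function names are illustrative. The header is 14 bytes: destination MAC, source MAC, then a two-byte type:

```ocaml
(* EtherType values for the payload types that matter here. *)
type ethertype = IPv4 | ARP | IPv6

let ethertype_to_int = function IPv4 -> 0x0800 | ARP -> 0x0806 | IPv6 -> 0x86dd

let ethertype_of_int = function
  | 0x0800 -> Some IPv4
  | 0x0806 -> Some ARP
  | 0x86dd -> Some IPv6
  | _ -> None

let header_len = 14   (* 6-byte destination + 6-byte source + 2-byte type *)

(* Wrap [payload] in an Ethernet header. MAC addresses are 6-byte strings. *)
let add_header ~dst ~src typ payload =
  assert (String.length dst = 6 && String.length src = 6);
  let t = ethertype_to_int typ in
  let hdr = Bytes.create header_len in
  Bytes.blit_string dst 0 hdr 0 6;
  Bytes.blit_string src 0 hdr 6 6;
  Bytes.set hdr 12 (Char.chr (t lsr 8));      (* type, big-endian *)
  Bytes.set hdr 13 (Char.chr (t land 0xff));
  Bytes.to_string hdr ^ payload

(* Split a frame into (type, destination, payload); [None] if too short. *)
let parse frame =
  if String.length frame < header_len then None
  else
    let dst = String.sub frame 0 6 in
    let t = (Char.code frame.[12] lsl 8) lor Char.code frame.[13] in
    let payload = String.sub frame header_len (String.length frame - header_len) in
    Some (ethertype_of_int t, dst, payload)
```

An unknown type is reported as `None` in the first component, so the caller can log it and drop the frame.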

And we still have to implement the ARP protocol for looking up MAC addresses. That's the job of the Client_eth module (dom0 puts the addresses in XenStore for us).

As well as sending queries, a VM can also broadcast a "gratuitous ARP" to tell other VMs its address without being asked. Receivers of a gratuitous ARP may then update their ARP cache, although FirewallVM is configured not to do this (see /proc/sys/net/ipv4/conf/all/arp_accept). For mirage-firewall, I just log what the client requested but don't let it update anything:

client_eth.ml

let input_gratuitous t frame =
  let open Arpv4_wire in
  let spa = Ipaddr.V4.of_int32 (get_arp_spa frame) in
  let sha = Macaddr.of_bytes_exn (copy_arp_sha frame) in
  match lookup t spa with
  | Some real_mac when Macaddr.compare sha real_mac = 0 ->
      Log.info "client suggests updating %s -> %s (as expected)"
        (fun f -> f (Ipaddr.V4.to_string spa) (Macaddr.to_string sha));
  | Some other_mac ->
      Log.warn "client suggests incorrect update %s -> %s (should be %s)"
        (fun f -> f (Ipaddr.V4.to_string spa) (Macaddr.to_string sha) (Macaddr.to_string other_mac));
  | None ->
      Log.warn "client suggests incorrect update %s -> %s (unexpected IP)"
        (fun f -> f (Ipaddr.V4.to_string spa) (Macaddr.to_string sha))

I'm not sure whether or not Qubes expects one client VM to be able to look up another one's MAC address. It sets /qubes-netmask in QubesDB to 255.255.255.0, indicating that all clients are on the same Ethernet network. Therefore, I wrote my ARP responder to respond on behalf of the other clients to maintain this illusion. However, it appears that my Linux VMs have ignored the QubesDB setting and used a netmask of 255.255.255.255. Puzzling, but it should work either way.

Here's the code that connects a new client virtual interface (vif) to our router (in Client_net):

client_net.ml

(** Connect to a new client's interface and listen for incoming frames. *)
let add_vif { Dao.domid; device_id; client_ip } ~router ~cleanup_tasks =
  Netback.make ~domid ~device_id >>= fun backend ->
  Log.info "Client %d (IP: %s) ready" (fun f ->
    f domid (Ipaddr.V4.to_string client_ip));
  ClientEth.connect backend >>= or_fail "Can't make Ethernet device" >>= fun eth ->
  let client_mac = Netback.mac backend in
  let iface = new client_iface eth client_ip client_mac in
  Router.add_client router iface;
  Cleanup.on_cleanup cleanup_tasks (fun () -> Router.remove_client router iface);
  let fixed_arp = Client_eth.ARP.create ~net:router.Router.client_eth iface in
  Netback.listen backend (fun frame ->
    match Wire_structs.parse_ethernet_frame frame with
    | None -> Log.warn "Invalid Ethernet frame" Logs.unit; return ()
    | Some (typ, _destination, payload) ->
        match typ with
        | Some Wire_structs.ARP -> input_arp ~fixed_arp ~eth payload
        | Some Wire_structs.IPv4 -> input_ipv4 ~client_ip ~router frame payload
        | Some Wire_structs.IPv6 -> return ()
        | None -> Logs.warn "Unknown Ethernet type" Logs.unit; Lwt.return_unit
  )

OCaml note: { x = 1; y = 2 } is a record (struct). { x = x; y = y } can be abbreviated to just { x; y }. Here we pattern-match on a Dao.client_vif record passed to the function to extract the fields.

The Netback.listen at the end runs a loop that communicates with the netfront driver in the client. Each time a frame arrives, we check the type and dispatch to either the ARP handler or, for IPv4 packets, the firewall code. We don't support IPv6, since Qubes doesn't either.

client_net.ml

let input_arp ~fixed_arp ~eth request =
  match Client_eth.ARP.input fixed_arp request with
  | None -> return ()
  | Some response -> ClientEth.write eth response

(** Handle an IPv4 packet from the client. *)
let input_ipv4 ~client_ip ~router frame packet =
  let src = Wire_structs.Ipv4_wire.get_ipv4_src packet |> Ipaddr.V4.of_int32 in
  if src = client_ip then Firewall.ipv4_from_client router frame
  else (
    Log.warn "Incorrect source IP %a in IP packet from %a (dropping)"
      (fun f -> f Ipaddr.V4.pp_hum src Ipaddr.V4.pp_hum client_ip);
    return ()
  )

OCaml note: |> is the "pipe" operator. x |> fn is the same as fn x, but sometimes it reads better to have the values flowing left-to-right. You can also think of it as the synchronous version of >>=.

Notice that we check the source IP address is the one we expect. This means that our firewall rules can rely on client addresses.
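As a standalone illustration of that check, here is a sketch that reads the source address straight out of a raw IPv4 header (bytes 12 to 15) and compares it with the address assigned to the client. It uses plain strings and integers rather than the Mirage wire-format libraries, and all names are illustrative:

```ocaml
(* Read the 32-bit source address from an IPv4 header (bytes 12 to 15).
   Addresses are held in OCaml ints, assuming a 64-bit platform. *)
let ipv4_src packet =
  let byte i = Char.code packet.[12 + i] in
  (byte 0 lsl 24) lor (byte 1 lsl 16) lor (byte 2 lsl 8) lor byte 3

(* Parse dotted-quad notation into the same integer representation. *)
let ip_of_string s =
  Scanf.sscanf s "%d.%d.%d.%d" (fun a b c d ->
    (a lsl 24) lor (b lsl 16) lor (c lsl 8) lor d)

(* Accept a packet only if it claims the address assigned to this client. *)
let check_source ~client_ip packet =
  if String.length packet < 20 then `Drop "truncated IPv4 header"
  else if ipv4_src packet = client_ip then `Accept
  else `Drop "incorrect source IP"
```

Because the comparison is against the address of the interface the packet arrived on, one client cannot impersonate another by forging the source field.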

There is similar code in Uplink, which handles the NetVM side of things:

uplink.ml

let connect config =
  let ip = config.Dao.uplink_our_ip in
  Netif.connect "tap0" >>= or_fail "Can't connect uplink device" >>= fun net ->
  Eth.connect net >>= or_fail "Can't make Ethernet device for tap" >>= fun eth ->
  Arp.connect eth >>= or_fail "Can't add ARP" >>= fun arp ->
  Arp.add_ip arp ip >>= fun () ->
  let netvm_mac = Arp.query arp config.Dao.uplink_netvm_ip >|= function
    | `Timeout -> failwith "ARP timeout getting MAC of our NetVM"
    | `Ok netvm_mac -> netvm_mac in
  let my_ip = Ipaddr.V4 ip in
  let interface = new netvm_iface eth netvm_mac config.Dao.uplink_netvm_ip in
  return { net; eth; arp; interface; my_ip }

let listen t router =
  Netif.listen t.net (fun frame ->
    (* Handle one Ethernet frame from NetVM *)
    Eth.input t.eth
      ~arpv4:(Arp.input t.arp)
      ~ipv4:(fun _ip -> Firewall.ipv4_from_netvm router frame)
      ~ipv6:(fun _ip -> return ())
      frame
  )

OCaml note: Arp.input t.arp is a partially-applied function. It's short for fun x -> Arp.input t.arp x.
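Here is a tiny standalone example of the same idea. `input` and `dispatch` are hypothetical stand-ins for an ARP handler and a dispatcher that takes a one-argument callback:

```ocaml
(* A stand-in for a handler that takes its state first and the frame second. *)
let input arp_state frame = Printf.sprintf "%s handled %s" arp_state frame

(* A dispatcher that expects a one-argument callback. *)
let dispatch ~arpv4 frame = arpv4 frame

(* Partially applied: the state is supplied now, the frame later. *)
let partial = dispatch ~arpv4:(input "arp0") "frame1"

(* The same thing with the function written out in full. *)
let explicit = dispatch ~arpv4:(fun x -> input "arp0" x) "frame1"
```

Both calls produce the same string, since `input "arp0"` is already a function waiting for its last argument.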

Here we just use the standard Eth.input code to dispatch on the frame. It checks that the destination MAC matches ours and dispatches based on type. We couldn't use it for the client code above because there we also want to handle frames addressed to other clients, which Eth.input would discard.

Eth.input extracts the IP packet from the Ethernet frame and passes that to our callback, but the NAT library I used likes to work on whole Ethernet frames, so I ignore the IP packet (_ip) and send the frame instead.

The IP layer

Once an IP packet has been received, it is sent to the Firewall module (either ipv4_from_netvm or ipv4_from_client, depending on where it came from).

The process is similar in each case:

Check if we have an existing NAT entry for this packet. If so, it's part of a conversation we've already approved, so perform the translation and send it on its way. NAT support is provided by the handy mirage-nat library.
If not, collect useful information about the packet (source, destination, protocol, ports) and check against the user's firewall rules, then take whatever action they request.
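The two steps can be sketched with a hash table standing in for the NAT table. This is not how mirage-nat works internally; it is a simplified model in which a flow is just a pair of addresses and ports, and the names, addresses and starting port number are made up for the example:

```ocaml
(* A flow is identified by source and destination address and port.
   Plain strings stand in for real address types. *)
type flow = { src : string; sport : int; dst : string; dport : int }

type outcome = Forward of flow | Dropped of string

type t = {
  table : (flow, flow) Hashtbl.t;   (* flow as received -> flow as sent on *)
  our_ip : string;                  (* address used as the rewritten source *)
  mutable next_port : int;          (* next unused source port *)
}

let create ~our_ip = { table = Hashtbl.create 16; our_ip; next_port = 40000 }

(* Add entries for an outbound flow and for the replies we expect to it. *)
let add_nat t flow =
  let port = t.next_port in
  t.next_port <- port + 1;
  let out = { flow with src = t.our_ip; sport = port } in
  let reply_in =
    { src = flow.dst; sport = flow.dport; dst = t.our_ip; dport = port } in
  let reply_out = { reply_in with dst = flow.src; dport = flow.sport } in
  Hashtbl.replace t.table flow out;
  Hashtbl.replace t.table reply_in reply_out;
  out

(* Step 1: use an existing NAT entry. Step 2: otherwise consult the rules. *)
let handle t ~rules flow =
  match Hashtbl.find_opt t.table flow with
  | Some translated -> Forward translated
  | None ->
    match rules flow with
    | `NAT -> Forward (add_nat t flow)
    | `Drop reason -> Dropped reason
```

With a `rules` function that always drops for packets from outside, a reply is forwarded only once an outbound packet has created the matching table entry.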

Here's the code that takes a client IPv4 frame and applies the firewall rules:

firewall.ml

let ipv4_from_client t frame =
  match Memory_pressure.status () with
  | `Memory_critical -> (* TODO: should happen before copying and async *)
      Log.warn "Memory low - dropping packet" Logs.unit;
      return ()
  | `Ok ->
  (* Check for existing NAT entry for this packet *)
  match translate t frame with
  | Some frame -> forward_ipv4 t frame  (* Some existing connection or redirect *)
  | None ->
  (* No existing NAT entry. Check the firewall rules. *)
  match classify t frame with
  | None -> return ()
  | Some info -> apply_rules t Rules.from_client info

Qubes provides a GUI that lets the user specify firewall rules. It then encodes these as Linux iptables rules and puts them in QubesDB. This isn't a very friendly format for non-Linux systems, so I ignore this and hard-code the rules in OCaml instead, in the Rules module:

(** Decide what to do with a packet from a client VM.
    Note: If the packet matched an existing NAT rule then this isn't called. *)
let from_client = function
  | { dst = (`External _ | `NetVM) } -> `NAT
  | { dst = `Client_gateway; proto = `UDP { dport = 53 } } -> `NAT_to (`NetVM, 53)
  | { dst = (`Client_gateway | `Firewall_uplink) } -> `Drop "packet addressed to firewall itself"
  | { dst = `Client _ } -> `Drop "prevent communication between client VMs"
  | { dst = `Unknown_client _ } -> `Drop "target client not running"

(** Decide what to do with a packet received from the outside world.
    Note: If the packet matched an existing NAT rule then this isn't called. *)
let from_netvm = function
  | _ -> `Drop "drop by default"

For packets from clients to the outside world we use the NAT action to rewrite the source address so the packets appear to come from the firewall (via some unused port). DNS queries sent to the firewall get redirected to NetVM (UDP port 53 is DNS). In both cases, the NAT actions update the NAT table so that we will forward any responses back to the client. Everything else is dropped, with a log message.

I think it's rather nice the way we can use OCaml's existing support for pattern matching to implement the rules, without having to invent a new syntax. Originally, I had a default-drop rule at the end of from_client, but OCaml helpfully pointed out that it wasn't needed, as the previous rules already covered every case.
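To try this out in isolation, the rules need some types to match against. The definitions below are simplified guesses made for this sketch (strings instead of IP addresses, and only the fields the rules look at); the real definitions in the firewall's source may differ:

```ocaml
(* Simplified, hypothetical packet description. *)
type ports = { dport : int }

type host = [
  | `External of string
  | `NetVM
  | `Client_gateway
  | `Firewall_uplink
  | `Client of string
  | `Unknown_client of string
]

type info = { dst : host; proto : [ `UDP of ports | `TCP of ports | `ICMP ] }

(* The same rules as above. Because [host] is a closed type, the compiler
   can check that every possible destination is covered. *)
let from_client = function
  | { dst = (`External _ | `NetVM); _ } -> `NAT
  | { dst = `Client_gateway; proto = `UDP { dport = 53 } } -> `NAT_to (`NetVM, 53)
  | { dst = (`Client_gateway | `Firewall_uplink); _ } ->
    `Drop "packet addressed to firewall itself"
  | { dst = `Client _; _ } -> `Drop "prevent communication between client VMs"
  | { dst = `Unknown_client _; _ } -> `Drop "target client not running"
```

Deleting one of the cases makes the compiler warn that the match is not exhaustive and print an example of an unmatched value, while adding a catch-all case at the end produces an "unused match case" warning.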

The incoming policy is to drop everything that wasn't already allowed by a rule added by the out-bound NAT.

I don't know much about firewalls, but this scheme works for my needs. For comparison, the Linux iptables rules currently in my sys-firewall are:

[user@sys-firewall ~]$ sudo iptables -vL -n -t filter
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       udp  --  vif+   *       0.0.0.0/0            0.0.0.0/0            udp dpt:68
55336   83M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
    0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
35540   23M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vif0.0 *       0.0.0.0/0            0.0.0.0/0           
    0     0 DROP       all  --  vif+   vif+    0.0.0.0/0            0.0.0.0/0           
  519 33555 ACCEPT     udp  --  *      *       10.137.2.12          10.137.1.1           udp dpt:53
   16  1076 ACCEPT     udp  --  *      *       10.137.2.12          10.137.1.254         udp dpt:53
    0     0 ACCEPT     tcp  --  *      *       10.137.2.12          10.137.1.1           tcp dpt:53
    0     0 ACCEPT     tcp  --  *      *       10.137.2.12          10.137.1.254         tcp dpt:53
    0     0 ACCEPT     icmp --  *      *       10.137.2.12          0.0.0.0/0           
    0     0 DROP       tcp  --  *      *       10.137.2.12          10.137.255.254       tcp dpt:8082
  264 14484 ACCEPT     all  --  *      *       10.137.2.12          0.0.0.0/0           
  254 16404 ACCEPT     udp  --  *      *       10.137.2.9           10.137.1.1           udp dpt:53
    2   130 ACCEPT     udp  --  *      *       10.137.2.9           10.137.1.254         udp dpt:53
    0     0 ACCEPT     tcp  --  *      *       10.137.2.9           10.137.1.1           tcp dpt:53
    0     0 ACCEPT     tcp  --  *      *       10.137.2.9           10.137.1.254         tcp dpt:53
    0     0 ACCEPT     icmp --  *      *       10.137.2.9           0.0.0.0/0           
    0     0 DROP       tcp  --  *      *       10.137.2.9           10.137.255.254       tcp dpt:8082
  133  7620 ACCEPT     all  --  *      *       10.137.2.9           0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 32551 packets, 1761K bytes)
 pkts bytes target     prot opt in     out     source               destination         

[user@sys-firewall ~]$ sudo iptables -vL -n -t nat
Chain PREROUTING (policy ACCEPT 362 packets, 20704 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  829 50900 PR-QBS     all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  362 20704 PR-QBS-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 116 packets, 7670 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      vif+    0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0           
  945 58570 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain PR-QBS (1 references)
 pkts bytes target     prot opt in     out     source               destination         
  458 29593 DNAT       udp  --  *      *       0.0.0.0/0            10.137.2.1           udp dpt:53 to:10.137.1.1
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            10.137.2.1           tcp dpt:53 to:10.137.1.1
    9   603 DNAT       udp  --  *      *       0.0.0.0/0            10.137.2.254         udp dpt:53 to:10.137.1.254
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            10.137.2.254         tcp dpt:53 to:10.137.1.254

Chain PR-QBS-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         

[user@sys-firewall ~]$ sudo iptables -vL -n -t mangle
Chain PREROUTING (policy ACCEPT 12090 packets, 17M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 11387 packets, 17M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 703 packets, 88528 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 6600 packets, 357K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 7303 packets, 446K bytes)
 pkts bytes target     prot opt in     out     source               destination         

[user@sys-firewall ~]$ sudo iptables -vL -n -t raw
Chain PREROUTING (policy ACCEPT 92093 packets, 106M bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  vif20.0 *      !10.137.2.9           0.0.0.0/0           
    0     0 DROP       all  --  vif19.0 *      !10.137.2.12          0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 32551 packets, 1761K bytes)
 pkts bytes target     prot opt in     out     source               destination         

[user@sys-firewall ~]$ sudo iptables -vL -n -t security
Chain INPUT (policy ACCEPT 11387 packets, 17M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 659 packets, 86158 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 6600 packets, 357K bytes)
 pkts bytes target     prot opt in     out     source               destination

I find it hard to tell, looking at these tables, exactly what sys-firewall's security policy will actually do.

Evaluation

I timed start-up for the Linux-based "sys-firewall" and for "mirage-firewall" (after shutting them both down):

[tal@dom0 ~]$ time qvm-start sys-firewall
--> Creating volatile image: /var/lib/qubes/servicevms/sys-firewall/volatile.img...
--> Loading the VM (type = ProxyVM)...
--> Starting Qubes DB...
--> Setting Qubes DB info for the VM...
--> Updating firewall rules...
--> Starting the VM...
--> Starting the qrexec daemon...
Waiting for VM's qrexec agent......connected
--> Starting Qubes GUId...
Connecting to VM's GUI agent: .connected
--> Sending monitor layout...
--> Waiting for qubes-session...
real    0m9.321s
user    0m0.163s
sys     0m0.262s

[tal@dom0 ~]$ time qvm-start mirage-firewall
--> Loading the VM (type = ProxyVM)...
--> Starting Qubes DB...
--> Setting Qubes DB info for the VM...
--> Updating firewall rules...
--> Starting the VM...
--> Starting the qrexec daemon...
Waiting for VM's qrexec agent.connected
--> Starting Qubes GUId...
Connecting to VM's GUI agent: .connected
--> Sending monitor layout...
--> Waiting for qubes-session...
real    0m1.079s
user    0m0.130s
sys     0m0.192s

So, mirage-firewall starts in 1 second rather than 9. However, most of even that one second is spent in Qubes code running in dom0. xl list shows:

[tal@dom0 ~]$ sudo xl list
Name                     ID   Mem VCPUs      State   Time(s)
dom0                      0  6097     4     r-----     623.8
sys-net                   4   294     4     -b----      79.2
sys-firewall             17  1293     4     -b----       9.9
mirage-firewall          18    30     1     -b----       0.0

I guess sys-firewall did more work after telling Qubes it was ready, because Xen reports it used 9.9 seconds of CPU time. mirage-firewall uses too little time for Xen to report anything.

Notice also that sys-firewall is using 1293 MB with no clients (it's configured to balloon up or down; it could probably go down to 300 MB without much trouble). I gave mirage-firewall a fixed 30 MB allocation, which seems to be enough.

I'm not sure how mirage-firewall compares with Linux for transmission performance, but it can max out my 30 Mbit/s Internet connection with its single CPU, so it's unlikely to matter.
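
To put the figures above side by side, here is a small calculation. The boot times come from the two time outputs and the memory sizes from xl list; the 300 MB value is the rough estimate mentioned in the text, not a measurement.

```ocaml
(* Figures copied from the `time` and `xl list` output above. *)
let linux_boot_s = 9.321
let mirage_boot_s = 1.079
let linux_mem_mb = 1293.
let linux_min_mem_mb = 300.   (* rough estimate from the text, not measured *)
let mirage_mem_mb = 30.

let ratio a b = a /. b

let () =
  Printf.printf "boot speed-up:           %.1fx\n"
    (ratio linux_boot_s mirage_boot_s);
  Printf.printf "memory vs current usage: %.1f%%\n"
    (100. *. ratio mirage_mem_mb linux_mem_mb);
  Printf.printf "memory vs 300 MB guess:  %.1f%%\n"
    (100. *. ratio mirage_mem_mb linux_min_mem_mb)
```

On these numbers the unikernel starts about 8.6 times faster and uses a little over 2% of the memory sys-firewall currently has allocated.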

Exercises

I've only implemented the minimum set of features I need to use it as my firewall. The great thing about having a simple unikernel is that you can modify it easily. Here are some suggestions you can try at home (easy ones first):

Change the policy to allow communication between client VMs.
Query the QubesDB /qubes-debug-mode key. If present and set, set logging to debug level.
Edit command.ml to provide a qrexec command to add or remove rules at runtime.
When a packet is rejected, add the frame to a ring buffer. Edit command.ml to provide a "dump-rejects" command that returns the rejected packets in pcap format, ready to be loaded into wireshark. Hint: you can use the ocaml-pcap library to read and write the pcap format.
All client VMs are reported as Client to the policy. Add a table mapping IP addresses to symbolic names, so you can e.g. allow DevVM to talk to TestVM or control access to specific external machines.
mirage-nat doesn't do NAT for ICMP packets. Add support, so ping works (see https://github.com/yomimono/mirage-nat/issues/15).
Qubes allows each VM to have two DNS servers. I only implemented the primary. Read the /qubes-secondary-dns and /qubes-netvm-secondary-dns keys from QubesDB and proxy the secondary server too.
Implement port knocking for new connections.
Add a Reject action that sends an ICMP rejection message.
Find out what we're supposed to do when a domain shuts down. Currently, we set the netback state to closed, but the directory in XenStore remains. Who is responsible for deleting it?
Update the firewall to use the latest version of the mirage-nat library, which has extra features such as expiry of old NAT table entries.
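
As a starting point for the symbolic-names exercise, here is one possible shape for the table and a policy that uses it. This is only a sketch: the host and action types, the classify and policy functions, and the mapping of addresses to names are all hypothetical and do not come from the mirage-firewall source. Only the names DevVM and TestVM are taken from the exercise text.

```ocaml
(* Hypothetical types for this sketch; not the firewall's real ones. *)
type host =
  | Named of string     (* a client VM with an entry in the table *)
  | Unknown_client      (* any other client VM *)

type action = Accept | Drop

(* Example table. Which address belongs to which VM is made up. *)
let names = [
  "10.137.2.9", "DevVM";
  "10.137.2.12", "TestVM";
]

(* Look up an address, falling back to the anonymous case. *)
let classify ip =
  match List.assoc_opt ip names with
  | Some name -> Named name
  | None -> Unknown_client

(* Allow DevVM to open connections to TestVM; drop all other
   client-to-client traffic, including the reverse direction. *)
let policy ~src ~dst =
  match classify src, classify dst with
  | Named "DevVM", Named "TestVM" -> Accept
  | _ -> Drop

let () =
  let show = function Accept -> "accept" | Drop -> "drop" in
  print_endline (show (policy ~src:"10.137.2.9" ~dst:"10.137.2.12"));
  print_endline (show (policy ~src:"10.137.2.12" ~dst:"10.137.2.9"))
```

Matching on string names keeps the sketch short; a real version would probably want a variant per VM so that the compiler catches typos in the policy.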

Finally, Qubes Security Bulletin #4 says:

Due to a silly mistake made by the Qubes Team, the IPv6 filtering rules have been set to ALLOW by default in all Service VMs, which results in lack of filtering for IPv6 traffic originating between NetVM and the corresponding FirewallVM, as well as between AppVMs and the corresponding FirewallVM. Because the RPC services (rpcbind and rpc.statd) are, by default, bound also to the IPv6 interfaces in all the VMs by default, this opens up an avenue to attack a FirewallVM from a corresponding NetVM or AppVM, and further attack another AppVM from the compromised FirewallVM, using a hypothetical vulnerability in the above mentioned RPC services (chained attack).

What changes would be needed to mirage-firewall to reproduce this bug?
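
One way to start thinking about it is to model a firewall that dispatches on the Ethernet type field and drops whatever it doesn't recognise. That structure is an assumption made for this sketch, not a description of the actual mirage-firewall code, and the verdict type and both functions are hypothetical. The EtherType values are the standard ones.

```ocaml
(* Hypothetical verdicts for an incoming Ethernet frame. *)
type verdict =
  | To_ipv4_firewall    (* hand the packet to the IPv4 rules *)
  | To_arp              (* answer or learn from ARP *)
  | Pass_unfiltered     (* forward without consulting any rules *)
  | Drop

(* Default-deny dispatch: only explicitly named protocols get through. *)
let classify ethertype =
  match ethertype with
  | 0x0800 -> To_ipv4_firewall            (* IPv4 *)
  | 0x0806 -> To_arp                      (* ARP *)
  | _ -> Drop                             (* everything else, including IPv6 *)

(* What reproducing the bulletin's behaviour would require in this model:
   an explicit extra case that lets IPv6 (0x86DD) through unfiltered. *)
let classify_with_ipv6_bug ethertype =
  match ethertype with
  | 0x86DD -> Pass_unfiltered
  | other -> classify other

let () =
  assert (classify 0x86DD = Drop);
  assert (classify_with_ipv6_bug 0x86DD = Pass_unfiltered)
```

In this model the bug cannot arise by omission: someone has to write the IPv6 case, and there would also need to be an IPv6 stack with services listening behind it.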

Summary

QubesOS provides a desktop environment made from multiple virtual machines, isolated using Xen. It runs the network drivers (which it doesn't trust) in a Linux "NetVM", which it assumes may be compromised, and places a "FirewallVM" between that and the VMs running user applications. This design is intended to protect users from malicious or buggy network drivers.

However, the Linux kernel code running in FirewallVM is written with the assumption that NetVM is trustworthy. It is fairly likely that a compromised NetVM could successfully attack FirewallVM. Since FirewallVM and the client VMs all run Linux, it is likely that the same exploit would then allow the client VMs to be compromised too.

I used MirageOS to write a replacement FirewallVM in OCaml. The new virtual machine contains almost no C code (little more than malloc, printk, the OCaml GC and libm), and should therefore avoid problems such as the unchecked array bounds problem that recently affected the Qubes firewall. It also uses less than a tenth of the minimum memory of the Linux FirewallVM, boots several times faster, and when it starts handling network traffic it is already fully configured, avoiding e.g. any race setting up firewalls or DNS forwarding.

The code is around 1000 lines of OCaml, and makes it easy to follow the progress of a network frame from the point where the network driver reads it from a Xen shared memory ring, through the Ethernet handling, to the IP firewall code, to the user firewall policy, and then finally to the shared memory ring of the output interface.

The code has only been lightly tested (I've just started using it as the FirewallVM on my main laptop), but will hopefully prove easy to extend (and, if necessary, debug).