Option handling with OCaml polymorphic variants

Since we selected OCaml as the new language for 0install, I've been steadily converting the old Python code across. We now have more than 10,000 lines of OCaml, so I thought it was time to share what I've learnt.

OCaml is actually a pretty small language. Once you've read the short tutorials you know most of the language. However, I did skip one interesting feature during my first attempts:

There are also "polymorphic variants" which allow the same field name to be used in different structures, but I haven't tried using them.

I've since found a good use for them: handling command-line options.

The problem

The 0install command has many subcommands (0install run, 0install download, etc.), which accept different but overlapping sets of options. Running a command happens in two phases: first we parse the options, then we pass them to a handler function. We split parsing from handling because the tab-completion and help systems also need to know which options go with which command.

Using plain (non-polymorphic) variants, I originally implemented it roughly like this (simplified), with a single type listing all the possible options:

type zi_option =
  | Refresh		(* --refresh *)
  | Show		(* --show *)
  | Wrapper of string	(* --wrapper=echo *)

Each command handler takes a list of options and processes them:

let handle_run options =
  let refresh = ref false in
  let wrapper = ref None in
  ListLabels.iter options ~f:(function
    | Refresh -> refresh := true
    | Wrapper w -> wrapper := Some w
    | _ -> assert false   (* can't happen *)
  );
  (* use refresh/wrapper... *)

let handle_download options =
  let refresh = ref false in
  let show = ref false in
  ListLabels.iter options ~f:(function
    | Refresh -> refresh := true
    | Show -> show := true
    | _ -> assert false   (* can't happen *)
  );
  (* use refresh/show... *)

Each handler function has the same type: zi_option list -> unit (they take a list of options and return nothing).

Finally, there is a table of sub-commands, giving the parser and handler for each one:

let subcommands = [
  ("run", (parse_run, handle_run));
  ("download", (parse_download, handle_download));
]

But those assert false lines are worrying. An assert false means the programmer believes the code can't be executed, but didn't manage to convince the compiler. If we declare that a subcommand accepts a flag but forget to implement it, the program will crash at runtime with an Assert_failure exception (this isn't as unlikely as it sounds, because we declare options in groups, so adding an option to a group affects several subcommands).
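To make the risk concrete, here is a minimal runnable sketch of the plain-variant design. parse_download and its simplified "split at '='" parsing are hypothetical (not 0install's real parser), and the handler returns the flags it read instead of unit so that the outcome can be checked. The parser has been extended to accept --wrapper but the handler has not, and the compiler accepts it anyway:

```ocaml
(* Plain variants: a single type lists every possible option. *)
type zi_option =
  | Refresh            (* --refresh *)
  | Show               (* --show *)
  | Wrapper of string  (* --wrapper=echo *)

(* Hypothetical parser for "download". Suppose --wrapper was added to an
   option group that "download" uses, so this parser now accepts it too. *)
let parse_download args =
  List.map (fun arg ->
    match String.split_on_char '=' arg with
    | ["--refresh"] -> Refresh
    | ["--show"] -> Show
    | ["--wrapper"; w] -> Wrapper w
    | _ -> failwith ("download: unknown option " ^ arg)) args

(* The handler was never updated for Wrapper; the catch-all case keeps the
   compiler quiet. It returns the flags only so the result can be checked. *)
let handle_download options =
  let refresh = ref false in
  let show = ref false in
  List.iter (function
    | Refresh -> refresh := true
    | Show -> show := true
    | _ -> assert false  (* supposedly unreachable *)
  ) options;
  (!refresh, !show)
```

For example, handle_download (parse_download ["--refresh"]) returns (true, false), while handle_download (parse_download ["--wrapper=echo"]) compiles just as happily but raises Assert_failure when it runs.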

Polymorphic variants

Polymorphic variants are written with a back-tick/grave before them, and you don't need to declare them before use. For example, we can define handle_run like this:

let handle_run options =
  let refresh = ref false in
  let wrapper = ref None in
  ListLabels.iter options ~f:(function
    | `Refresh -> refresh := true
    | `Wrapper w -> wrapper := Some w
  );
  (* use refresh/wrapper... *)

OCaml will automatically infer the type of this function as:

[< `Refresh | `Wrapper of string ] list -> unit

That is, handle_run takes a list of options, where each option is one of `Refresh or `Wrapper (the < marks this as an upper bound, so a list containing only `Refresh is also acceptable). Notice that the assert is gone.

Now you can call handle_run (parse_run argv), and it's a compile-time error if handle_run doesn't handle every option that parse_run may produce.
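A minimal sketch of that compile-time check, assuming a hypothetical parse_run whose return type is annotated with the closed set of options that run accepts. As in the earlier sketch, the handler returns its settings rather than unit purely so they can be inspected:

```ocaml
(* Hypothetical parser for "run". The return-type annotation closes the set:
   "run" produces only these two kinds of option. *)
let parse_run args : [ `Refresh | `Wrapper of string ] list =
  List.map (fun arg ->
    (* Simplified: splits "--wrapper=echo" at '=', so values containing '='
       are rejected. *)
    match String.split_on_char '=' arg with
    | ["--refresh"] -> `Refresh
    | ["--wrapper"; w] -> `Wrapper w
    | _ -> failwith ("run: unknown option " ^ arg)) args

(* Inferred type: [< `Refresh | `Wrapper of 'a ] list -> bool * 'a option.
   No catch-all case, so no assert. *)
let handle_run options =
  let refresh = ref false in
  let wrapper = ref None in
  List.iter (function
    | `Refresh -> refresh := true
    | `Wrapper w -> wrapper := Some w
  ) options;
  (!refresh, !wrapper)

(* Type-checks only because handle_run covers every tag parse_run can return.
   Adding `Show to parse_run's annotation would turn this line into a
   compile-time error. *)
let run args = handle_run (parse_run args)
```

Here run ["--refresh"; "--wrapper=echo"] returns (true, Some "echo"). With the catch-all gone, the type checker is what stands between a newly accepted option and a handler that ignores it.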

There is, however, a problem when we try to put these functions in the subcommands list. OCaml requires every list item to have the same type, so every (parser, handler) pair would have to work on exactly the same set of options. Compilation then fails because they don't: run deals with `Wrapper, while download deals with `Show.

My first thought to fix this was to declare an existential type. e.g.

type 'a option_parser = string list -> 'a list
type 'a handler = 'a list -> unit
type subcommand = exists 'a. ('a option_parser * 'a handler)

I'm trying to say that each subcommand has a parser and a handler and, while we don't know which subset of the options they process, the two subsets are the same. Sadly, OCaml doesn't let us write an existential type like this.

However, we can get the same effect by wrapping each parser/handler pair in an object (a closure would also work):

let subcommand option_parser handler =
  object
    method parse_and_run args = handler (option_parser args)
  end

let subcommands = [
  ("run", subcommand parse_run handle_run);
  ("download", subcommand parse_download handle_download);
]

This works because the subcommand function has a for-all type: for all types 'a, it accepts an 'a parser and an 'a handler and produces an object that doesn't expose 'a in its interface (parse_and_run just has the type string list -> unit).
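The closure alternative can be sketched as follows. parse_run, parse_download, their handlers and the log used to observe them are hypothetical, simplified stand-ins. subcommand is polymorphic in the option type, but the closure it returns has type string list -> unit, so both entries fit in one list:

```ocaml
(* Hypothetical parsers: each returns exactly the options its subcommand accepts. *)
let parse_run args : [ `Refresh | `Wrapper of string ] list =
  List.map (fun arg ->
    match String.split_on_char '=' arg with
    | ["--refresh"] -> `Refresh
    | ["--wrapper"; w] -> `Wrapper w
    | _ -> failwith ("run: unknown option " ^ arg)) args

let parse_download args : [ `Refresh | `Show ] list =
  List.map (function
    | "--refresh" -> `Refresh
    | "--show" -> `Show
    | arg -> failwith ("download: unknown option " ^ arg)) args

(* Handlers prepend to [log] so that their effect can be checked. *)
let log : string list ref = ref []

let handle_run options =
  List.iter (function
    | `Refresh -> log := "run: refresh" :: !log
    | `Wrapper w -> log := ("run: wrapper=" ^ w) :: !log) options

let handle_download options =
  List.iter (function
    | `Refresh -> log := "download: refresh" :: !log
    | `Show -> log := "download: show" :: !log) options

(* For all 'a: takes an 'a parser and an 'a handler and returns a closure of
   type string list -> unit, which no longer mentions 'a. *)
let subcommand option_parser handler = fun args -> handler (option_parser args)

(* Every element now has type string * (string list -> unit). *)
let subcommands = [
  ("run", subcommand parse_run handle_run);
  ("download", subcommand parse_download handle_download);
]
```

For instance, (List.assoc "download" subcommands) ["--show"] runs handle_download on a [`Show] list. The drawback of a bare closure is that it can only parse-and-run; nothing outside can get at the parser on its own.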

However, if we want to expose the parser on its own (e.g. for tab-completion), we have to coerce its result to a common type first. Here, the parse method simply returns a zi_option list, losing the information about exactly which subset of the options it might return (which is fine for the completion code). This gives all subcommand objects the same interface:

type zi_option =
  [ `Refresh
  | `Show
  | `Wrapper of string ]

let subcommand option_parser handler =
  object
    method parse args = (option_parser args :> zi_option list)
    method parse_and_run args = handler (option_parser args)
  end

let subcommands = [
  ("run", subcommand parse_run handle_run);
  ("download", subcommand parse_download handle_download);
]
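A self-contained version of the object-based table, for experimenting. The parsers are simplified hypothetical stand-ins and the handlers write to a log list so their effect can be observed; only zi_option and subcommand follow the code above:

```ocaml
(* The full set of options, as a closed polymorphic variant type. *)
type zi_option = [ `Refresh | `Show | `Wrapper of string ]

(* Hypothetical parsers (simplified; values containing '=' are rejected). *)
let parse_run args : [ `Refresh | `Wrapper of string ] list =
  List.map (fun arg ->
    match String.split_on_char '=' arg with
    | ["--refresh"] -> `Refresh
    | ["--wrapper"; w] -> `Wrapper w
    | _ -> failwith ("run: unknown option " ^ arg)) args

let parse_download args : [ `Refresh | `Show ] list =
  List.map (fun arg ->
    match arg with
    | "--refresh" -> `Refresh
    | "--show" -> `Show
    | _ -> failwith ("download: unknown option " ^ arg)) args

(* Handlers record what they did in [log] so the effect is observable. *)
let log : string list ref = ref []
let say msg = log := msg :: !log

let handle_run options =
  List.iter (function
    | `Refresh -> say "run: refresh"
    | `Wrapper w -> say ("run: wrapper=" ^ w)) options

let handle_download options =
  List.iter (function
    | `Refresh -> say "download: refresh"
    | `Show -> say "download: show") options

let subcommand option_parser handler =
  object
    (* Coercing to the full option type forgets the exact subset, but gives
       every subcommand object the same interface. *)
    method parse args = (option_parser args :> zi_option list)
    (* The handler still receives the precisely typed list. *)
    method parse_and_run args = handler (option_parser args)
  end

let subcommands = [
  ("run", subcommand parse_run handle_run);
  ("download", subcommand parse_download handle_download);
]
```

Here (List.assoc "download" subcommands)#parse ["--show"] returns [`Show] typed as a zi_option list, which is all a completion routine needs, while parse_and_run keeps the compile-time check between parser and handler.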

So, I think this is rather nice:

Every option displayed in the help for a command is accepted by that command.
We don't need any asserts in the handlers (indeed, adding a catch-all assert false case would destroy the safety, since the handler would then accept any option).

One final trick: when matching polymorphic variants, you can use the #type syntax to match every tag of a named variant type. For example, the real handle_run looks more like this:

let select_opts = ref [] in
ListLabels.iter options ~f:(function
  | #common_option as o -> Common_options.process_common_option o
  | #select_option as o -> select_opts := o :: !select_opts
  | `Wrapper w -> wrapper := Some w
);

That is, it processes the run-specific options itself, while delegating common options (--offline, etc.) to Common_options and storing selection options (--version, etc.) in a separate list to be passed to the selection code. The select_opts list gets the correct sub-type (select_option list).
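A runnable sketch of the #type trick. The option groups common_option and select_option, their tags, and process_common_option (standing in for Common_options.process_common_option) are hypothetical; only the shape of the match follows the fragment above:

```ocaml
(* Hypothetical option groups shared between subcommands. *)
type common_option = [ `Offline | `Verbose ]
type select_option = [ `Version of string | `Before of string ]

(* Hypothetical stand-in for Common_options.process_common_option. *)
let offline = ref false
let verbosity = ref 0
let process_common_option : common_option -> unit = function
  | `Offline -> offline := true
  | `Verbose -> incr verbosity

let handle_run options =
  let select_opts : select_option list ref = ref [] in
  let wrapper = ref None in
  List.iter (function
    (* #common_option matches any tag of that type; o can be passed to a
       function expecting exactly common_option. *)
    | #common_option as o -> process_common_option o
    | #select_option as o -> select_opts := o :: !select_opts
    | `Wrapper w -> wrapper := Some w
  ) options;
  (* The list was built in reverse, so restore command-line order. *)
  (List.rev !select_opts, !wrapper)
```

Because select_opts is annotated as select_option list ref, the compiler checks that only selection options can end up in it, and handle_run's input type is inferred as the union of both groups plus `Wrapper.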

Thomas Leonard's blog
