Since we selected OCaml as the new language for 0install, I've been steadily converting the old Python code across. We now have more than 10,000 lines of OCaml, so I thought it was time to share what I've learnt.
OCaml is actually a pretty small language. Once you've read the short tutorials you know most of the language. However, I did skip one interesting feature during my first attempts, as I wrote at the time:
There are also "polymorphic variants" which allow the same field name to be used in different structures, but I haven't tried using them.
I've since found a good use for these for handling command-line options...
The problem
The 0install command has many subcommands (0install run, 0install download, etc), which accept different, but overlapping, sets of options. Running a command happens in two phases: first we parse the options, then we pass them to a handler function. We split the parsing and handling because the tab-completion and help system also need to know which options go with which command.
Using plain (non-polymorphic) variants I originally implemented it a bit like this (simplified). I had a single type which listed all the possible options:
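A minimal sketch of such a type, assuming just three illustrative options (the real list is much longer, and the constructor names here are my own choice for the example):

```ocaml
(* Illustrative only: a cut-down option type with three cases. *)
type zi_option =
  | Refresh            (* --refresh *)
  | ShowXML            (* --xml *)
  | Wrapper of string  (* --wrapper COMMAND *)
```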
Each command handler takes a list of options and processes them:
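Something along these lines, as a sketch. The option type is the illustrative one from above, and the printed summary stands in for the real work each handler would do:

```ocaml
type zi_option =
  | Refresh
  | ShowXML
  | Wrapper of string

(* Handler for the run subcommand: it only expects Refresh and Wrapper. *)
let handle_run (opts : zi_option list) : unit =
  let refresh = ref false in
  let wrapper = ref None in
  List.iter (function
    | Refresh -> refresh := true
    | Wrapper w -> wrapper := Some w
    | ShowXML -> assert false   (* run should never be given --xml *)
  ) opts;
  Printf.printf "run: refresh=%b wrapper=%s\n" !refresh
    (match !wrapper with Some w -> w | None -> "(none)")

(* Handler for the download subcommand: it only expects Refresh and ShowXML. *)
let handle_download (opts : zi_option list) : unit =
  let refresh = ref false in
  let show_xml = ref false in
  List.iter (function
    | Refresh -> refresh := true
    | ShowXML -> show_xml := true
    | Wrapper _ -> assert false   (* download should never be given --wrapper *)
  ) opts;
  Printf.printf "download: refresh=%b xml=%b\n" !refresh !show_xml
```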
Each handler function has the same type: zi_option list -> unit (they take a list of options and return nothing).
Finally, there is a table of sub-commands, giving the parser and handler for each one:
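Here is a sketch of what such a table might look like. The parsers, the flag spellings they recognise, and the run_subcommand helper are all assumptions made for the example, and the handlers are reduced to stubs:

```ocaml
type zi_option = Refresh | ShowXML | Wrapper of string

(* Hypothetical parsers: each accepts only the flags its subcommand declares. *)
let rec parse_run = function
  | [] -> []
  | "--refresh" :: rest -> Refresh :: parse_run rest
  | "--wrapper" :: w :: rest -> Wrapper w :: parse_run rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

let rec parse_download = function
  | [] -> []
  | "--refresh" :: rest -> Refresh :: parse_download rest
  | "--xml" :: rest -> ShowXML :: parse_download rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

(* Stub handlers: the compiler cannot see that the assert false cases
   are unreachable. *)
let handle_run opts =
  List.iter (function Refresh | Wrapper _ -> () | ShowXML -> assert false) opts

let handle_download opts =
  List.iter (function Refresh | ShowXML -> () | Wrapper _ -> assert false) opts

(* Every entry has the same type, so a plain list works. *)
let subcommands
  : (string * ((string list -> zi_option list) * (zi_option list -> unit))) list =
  [
    "run", (parse_run, handle_run);
    "download", (parse_download, handle_download);
  ]

(* Look up a subcommand by name, parse its arguments and handle them. *)
let run_subcommand name args =
  let (parse, handle) = List.assoc name subcommands in
  handle (parse args)
```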
But those assert false lines are worrying. An assert false means the programmer believes the code can't be executed, but didn't manage to convince the compiler. If we declare that a subcommand accepts a flag, but forget to implement it, the program will crash at runtime (this isn't as unlikely as it sounds, because we declare options in groups, so adding an option to a group affects several subcommands).
Polymorphic variants
Polymorphic variants are written with a back-tick (grave accent) before them, and you don't need to declare them before use. For example, we can declare handle_run like this:
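A sketch of that handler, with the same illustrative options as before (the printed summary is a stand-in for the real work):

```ocaml
(* No type declaration is needed for `Refresh or `Wrapper. *)
let handle_run opts =
  let refresh = ref false in
  let wrapper = ref None in
  List.iter (function
    | `Refresh -> refresh := true
    | `Wrapper w -> wrapper := Some w
  ) opts;
  Printf.printf "run: refresh=%b wrapper=%s\n" !refresh
    (match !wrapper with Some w -> w | None -> "(none)")
```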
OCaml will automatically infer the type of this function as:
[< `Refresh | `Wrapper of string ] list -> unit
That is, handle_run takes a list of options, where the options are a subset of `Refresh and `Wrapper. Notice that the assert is gone.
Now you can call handle_run (parse_run argv), and it's a compile-time error if handle_run doesn't handle every option that parse_run may produce.
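To make that concrete, here is a small sketch with a hypothetical parse_run (the flag spellings are assumptions). OCaml should infer that it returns a list of at least `Refresh and `Wrapper, which fits what handle_run accepts:

```ocaml
(* Hypothetical parser: turns raw arguments into polymorphic variants. *)
let rec parse_run = function
  | [] -> []
  | "--refresh" :: rest -> `Refresh :: parse_run rest
  | "--wrapper" :: w :: rest -> `Wrapper w :: parse_run rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "refresh"
    | `Wrapper w -> print_endline ("wrapper " ^ w)
  ) opts

(* This type-checks because every tag parse_run can produce is handled.
   If parse_run gained a case producing `ShowXML, this line would be
   rejected at compile time until handle_run matched it too. *)
let () = handle_run (parse_run ["--refresh"; "--wrapper"; "gdb"])
```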
There is, however, a problem when we try to put these functions in the subcommands list. OCaml wants every list item to have the same type, and so wants every subcommand to handle every option. Compilation then fails because they don't.
My first thought to fix this was to declare an existential type. e.g.
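In made-up syntax, the idea was roughly the following. The two type abbreviations are real OCaml (their definitions are my assumption about what a parser and a handler look like), but the existential declaration is not, so it is left in a comment:

```ocaml
type 'a parser = string list -> 'a list
type 'a handler = 'a list -> unit

(* Not valid OCaml: there is no exists quantifier.
   type subcommand = exists 'a. ('a parser * 'a handler) *)
```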
I'm trying to say that each subcommand has a parser and a handler and, while we don't know what subset of the options they process, the two subsets are the same. Sadly, OCaml has no syntax for declaring an existential type directly like this.
However, we can get the same effect by declaring a class or closure:
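A sketch of the object version follows. The parsers, handlers and flag spellings are hypothetical, and the parser and handler abbreviations are my assumption about the types involved:

```ocaml
type 'a parser = string list -> 'a list
type 'a handler = 'a list -> unit

class type subcommand =
  object
    method parse_and_run : string list -> unit
  end

(* The option type is picked afresh at each call and is hidden in the result. *)
let subcommand (parse : 'a parser) (handle : 'a handler) : subcommand =
  object
    method parse_and_run argv = handle (parse argv)
  end

(* Hypothetical parsers and handlers for two subcommands. *)
let rec parse_run = function
  | [] -> []
  | "--refresh" :: rest -> `Refresh :: parse_run rest
  | "--wrapper" :: w :: rest -> `Wrapper w :: parse_run rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "run: refresh"
    | `Wrapper w -> print_endline ("run: wrapper " ^ w)
  ) opts

let rec parse_download = function
  | [] -> []
  | "--refresh" :: rest -> `Refresh :: parse_download rest
  | "--xml" :: rest -> `ShowXML :: parse_download rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

let handle_download opts =
  List.iter (function
    | `Refresh -> print_endline "download: refresh"
    | `ShowXML -> print_endline "download: xml"
  ) opts

(* Both entries now have the same type, so the list is accepted. *)
let subcommands = [
  "run", subcommand parse_run handle_run;
  "download", subcommand parse_download handle_download;
]
```

Pairing parse_run with handle_download in this sketch should be rejected by the compiler, since handle_download has no case for `Wrapper.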
This works because the subcommand function has a for-all type: for all types a, it accepts an a parser and an a handler and produces an object that doesn't expose the type a in its interface. parse_and_run just has the type string list -> unit.
However, if we want to expose the parser on its own (e.g. for the tab-completion) we have to cast it first. Here, the parse method simply returns a zi_option list, losing the information about exactly which subset of the options it might return (which is fine for the completion code). This allows all subcommand objects to expose the same interface:
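A sketch of that interface, assuming zi_option is now a closed polymorphic variant type listing every option. The parser, handler and flag_name helper are hypothetical; flag_name just stands in for whatever the completion code does with the options:

```ocaml
type zi_option = [ `Refresh | `ShowXML | `Wrapper of string ]

type 'a parser = string list -> 'a list
type 'a handler = 'a list -> unit

class type subcommand =
  object
    method parse : string list -> zi_option list
    method parse_and_run : string list -> unit
  end

let subcommand (parse : 'a parser) (handle : 'a handler) : subcommand =
  object
    (* The coercion forgets which subset of zi_option this parser produces. *)
    method parse argv = (parse argv :> zi_option list)
    method parse_and_run argv = handle (parse argv)
  end

(* Hypothetical parser and handler for the run subcommand. *)
let rec parse_run = function
  | [] -> []
  | "--refresh" :: rest -> `Refresh :: parse_run rest
  | "--wrapper" :: w :: rest -> `Wrapper w :: parse_run rest
  | arg :: _ -> failwith ("Unknown option: " ^ arg)

let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "run: refresh"
    | `Wrapper w -> print_endline ("run: wrapper " ^ w)
  ) opts

let run = subcommand parse_run handle_run

(* Code that only sees zi_option, as the completion code would. *)
let flag_name : zi_option -> string = function
  | `Refresh -> "--refresh"
  | `ShowXML -> "--xml"
  | `Wrapper _ -> "--wrapper"
```

The coercion also constrains the hidden option type to be a subset of zi_option, so a parser producing some tag outside that type could not be wrapped this way.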
So, I think this is rather nice:
- Every option displayed in the help for a command is accepted by that command.
- We don't need any asserts in the handlers (indeed, adding the assert destroys the safety, since the handler will then accept any option).
One final trick: when matching variants you can use the #type syntax to match a set of options. e.g. the real handle_run looks more like this:
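A simplified sketch of the pattern. The members of common_option and select_option and the process_common_option helper are hypothetical, and this version returns its results as a pair so that they are easy to inspect:

```ocaml
(* Hypothetical option groups. *)
type common_option = [ `Offline | `Verbose ]
type select_option = [ `Version of string | `NotBefore of string ]

(* Stand-in for the shared code that handles options common to all commands. *)
let process_common_option : common_option -> unit = function
  | `Offline -> print_endline "offline mode"
  | `Verbose -> print_endline "verbose output"

let handle_run opts =
  let select_opts : select_option list ref = ref [] in
  let wrapper = ref None in
  List.iter (function
    | #common_option as o -> process_common_option o          (* delegate *)
    | #select_option as o -> select_opts := o :: !select_opts (* store for later *)
    | `Wrapper w -> wrapper := Some w                          (* run-specific *)
  ) opts;
  (List.rev !select_opts, !wrapper)
```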
That is, it processes the run-specific options itself, while delegating common options (--offline, etc) and storing selection options (--version, etc) in a separate list to be passed to the selection code. The select_opts list gets the correct sub-type (select_option list).