Since we selected OCaml as the new language for 0install, I've been steadily converting the old Python code across. We now have more than 10,000 lines of OCaml, so I thought it was time to share what I've learnt.
OCaml is actually a pretty small language. Once you've read the short tutorials you know most of the language. However, I did skip one interesting feature during my first attempts:
There are also "polymorphic variants" which allow the same field name to be used in different structures, but I haven't tried using them.
I've since found a good use for these for handling command-line options...
The problem
The 0install command has many subcommands (0install run, 0install download, etc), which accept different, but overlapping, sets of options. Running a command happens in two phases: first we parse the options, then we pass them to a handler function. We split the parsing and handling because the tab-completion and help system also need to know which options go with which command.
Using plain (non-polymorphic) variants I originally implemented it a bit like this (simplified). I had a single type which listed all the possible options:
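A minimal sketch of such a type. Only Refresh and Wrapper are named in this post; Offline and Version are assumed constructor names standing in for --offline and --version, and the real list is much longer.

```ocaml
(* One closed variant type covering every option of every subcommand.
   Offline and Version are assumed names, used here for illustration. *)
type zi_option =
  | Refresh              (* --refresh *)
  | Wrapper of string    (* --wrapper, with the wrapper command as argument *)
  | Offline              (* --offline *)
  | Version of string    (* --version, with the requested version as argument *)
```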
Each command handler takes a list of options and processes them:
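Roughly, a pair of handlers in this style might look like the sketch below. Which options each subcommand accepts, and what the handlers do with them, are assumptions made for illustration.

```ocaml
type zi_option =
  | Refresh
  | Wrapper of string
  | Offline
  | Version of string

(* "0install run" is assumed to accept only Refresh and Wrapper. *)
let handle_run (opts : zi_option list) : unit =
  let refresh = ref false in
  let wrapper = ref None in
  List.iter (function
    | Refresh -> refresh := true
    | Wrapper w -> wrapper := Some w
    | _ -> assert false   (* the run parser should never produce anything else *)
  ) opts;
  Printf.printf "run: refresh=%b wrapper=%s\n" !refresh
    (match !wrapper with Some w -> w | None -> "(none)")

(* "0install download" is assumed to accept Refresh and Offline. *)
let handle_download (opts : zi_option list) : unit =
  let refresh = ref false in
  let offline = ref false in
  List.iter (function
    | Refresh -> refresh := true
    | Offline -> offline := true
    | _ -> assert false
  ) opts;
  Printf.printf "download: refresh=%b offline=%b\n" !refresh !offline
```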
Each handler function has the same type: zi_option list -> unit (they take a list of options and return nothing).
Finally, there is a table of sub-commands, giving the parser and handler for each one:
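A sketch of what such a table could look like. The parsers, the stub handlers, the flag spellings and the run_subcommand dispatcher are all hypothetical; the point is only that every entry has the same type, so they fit in one list.

```ocaml
type zi_option = Refresh | Wrapper of string | Offline | Version of string

(* Hypothetical parsers; real option parsing is more involved. *)
let parse_run (args : string list) : zi_option list =
  let rec loop = function
    | [] -> []
    | "--refresh" :: rest -> Refresh :: loop rest
    | "--wrapper" :: w :: rest -> Wrapper w :: loop rest
    | arg :: _ -> failwith ("unknown option: " ^ arg)
  in
  loop args

let parse_download (args : string list) : zi_option list =
  List.map (function
    | "--refresh" -> Refresh
    | "--offline" -> Offline
    | arg -> failwith ("unknown option: " ^ arg)
  ) args

(* Stub handlers; real ones would inspect each option. *)
let handle_run (opts : zi_option list) : unit =
  Printf.printf "run with %d option(s)\n" (List.length opts)

let handle_download (opts : zi_option list) : unit =
  Printf.printf "download with %d option(s)\n" (List.length opts)

(* Name -> (parser, handler). All pairs share one type. *)
let subcommands = [
  ("run", (parse_run, handle_run));
  ("download", (parse_download, handle_download));
]

(* Look up a subcommand, parse its arguments, then handle them. *)
let run_subcommand name args =
  match List.assoc_opt name subcommands with
  | Some (parse, handle) -> handle (parse args)
  | None -> failwith ("unknown subcommand: " ^ name)
```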
But those assert false lines are worrying. An assert false means the programmer believes the code can't be executed, but didn't manage to convince the compiler. If we declare that a subcommand accepts a flag, but forget to implement it, the program will crash at runtime (this isn't as unlikely as it sounds, because we declare options in groups, so adding an option to a group affects several subcommands).
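To make the failure mode concrete, here is a small sketch in which an option is added to a shared group but one handler is never updated. The parse_common helper and the flag spellings are hypothetical.

```ocaml
type zi_option = Refresh | Wrapper of string | Offline

(* A shared group of options. Imagine Offline was added here later. *)
let parse_common = function
  | "--refresh" -> Some Refresh
  | "--offline" -> Some Offline
  | _ -> None

(* The run parser accepts its own options plus the common group. *)
let parse_run args =
  let rec loop = function
    | [] -> []
    | "--wrapper" :: w :: rest -> Wrapper w :: loop rest
    | arg :: rest ->
        (match parse_common arg with
         | Some opt -> opt :: loop rest
         | None -> failwith ("unknown option: " ^ arg))
  in
  loop args

(* Nobody updated the handler, and the compiler has no way to notice. *)
let handle_run opts =
  List.iter (function
    | Refresh -> print_endline "refresh"
    | Wrapper w -> print_endline ("wrapper: " ^ w)
    | _ -> assert false   (* reached at runtime for Offline *)
  ) opts
```

Here handle_run (parse_run ["--offline"]) compiles without complaint and then raises Assert_failure when it runs.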
Polymorphic variants
Polymorphic variants are written with a backtick (grave accent) before the tag name, and you don't need to declare them before use. For example, we can declare handle_run like this:
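A sketch of such a handler. What it does with the options once collected (here, just printing them) is an assumption for illustration.

```ocaml
(* No type declaration is needed: the tags are used directly. *)
let handle_run opts =
  let refresh = ref false in
  let wrapper = ref None in
  List.iter (function
    | `Refresh -> refresh := true
    | `Wrapper w -> wrapper := Some w
  ) opts;
  Printf.printf "run: refresh=%b wrapper=%s\n" !refresh
    (match !wrapper with Some w -> w | None -> "(none)")
```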
OCaml will automatically infer the type of this function as:
[< `Refresh | `Wrapper of string ] list -> unit
That is, handle_run takes a list of options, where the options are a subset of Refresh and Wrapper. Notice that the assert is gone.
Now you can call handle_run (parse_run argv), and it's a compile-time error if handle_run doesn't handle every option that parse_run may produce.
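As a sketch of that check, here is a parser and handler pair wired together. The body of parse_run and the flag spellings are assumptions; only the names parse_run and handle_run come from the text.

```ocaml
(* Hypothetical parser. Its result is a list of tags drawn from
   `Refresh and `Wrapper (an open type, [> `Refresh | `Wrapper of string ]). *)
let parse_run args =
  let rec loop = function
    | [] -> []
    | "--refresh" :: rest -> `Refresh :: loop rest
    | "--wrapper" :: w :: rest -> `Wrapper w :: loop rest
    | arg :: _ -> failwith ("unknown option: " ^ arg)
  in
  loop args

(* Accepts any subset of `Refresh and `Wrapper. *)
let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "refresh"
    | `Wrapper w -> print_endline ("wrapper: " ^ w)
  ) opts

(* Type-checks because everything parse_run can produce is handled. *)
let run argv = handle_run (parse_run argv)
```

If parse_run were extended to also produce, say, an `Offline tag, the definition of run would be rejected by the compiler until handle_run gained a matching case.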
There is, however, a problem when we try to put these functions in the subcommands list. OCaml wants every list item to have the same type, and so wants every subcommand to handle every option. Compilation then fails because they don't.
My first thought was to fix this by declaring an existential type. I wanted to say that each subcommand has a parser and a handler and that, while we don't know which subset of the options they process, the subset is the same for both. Sadly, OCaml has no way to declare such an existential type directly in an ordinary type definition.
However, we can get the same effect by declaring a class or closure:
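A sketch of the object-based version. The names subcommand and parse_and_run come from the text; the set of tags in zi_option, the download subcommand's options and the parser and handler bodies are assumptions.

```ocaml
(* The full set of options, now as a polymorphic variant type. *)
type zi_option = [ `Refresh | `Wrapper of string | `Offline ]

let parse_run args =
  let rec loop = function
    | [] -> []
    | "--refresh" :: rest -> `Refresh :: loop rest
    | "--wrapper" :: w :: rest -> `Wrapper w :: loop rest
    | arg :: _ -> failwith ("unknown option: " ^ arg)
  in
  loop args

let parse_download args =
  List.map (function
    | "--refresh" -> `Refresh
    | "--offline" -> `Offline
    | arg -> failwith ("unknown option: " ^ arg)
  ) args

let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "run: refresh"
    | `Wrapper w -> print_endline ("run: wrapper " ^ w)
  ) opts

let handle_download opts =
  List.iter (function
    | `Refresh -> print_endline "download: refresh"
    | `Offline -> print_endline "download: offline"
  ) opts

(* The public interface does not mention the option subset at all. *)
class type subcommand = object
  method parse_and_run : string list -> unit
end

(* 'a is the subset of zi_option shared by one parser/handler pair. *)
let subcommand (parse : string list -> ([< zi_option ] as 'a) list)
               (handle : 'a list -> unit) : subcommand =
  object
    method parse_and_run argv = handle (parse argv)
  end

let subcommands : (string * subcommand) list = [
  ("run", subcommand parse_run handle_run);
  ("download", subcommand parse_download handle_download);
]
```

Pairing parse_download with handle_run here would be a compile-time error, since handle_run has no case for `Offline.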
This works because the subcommand function has a for-all type: for all types a, it accepts an a parser and an a handler, and produces an object that doesn't expose the type a in its interface (parse_and_run just has the type string list -> unit).
However, if we want to expose the parser on its own (e.g. for the tab-completion) we have to cast it first. Here, the parse method simply returns a zi_option list, losing the information about exactly which subset of the options it might return (which is fine for the completion code). This allows all subcommand objects to expose the same interface:
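A sketch of that interface with the extra parse method. As before, the tags in zi_option and the run subcommand's parser and handler bodies are assumptions; the coercion (:>) is the cast described above.

```ocaml
type zi_option = [ `Refresh | `Wrapper of string | `Offline ]

class type subcommand = object
  method parse : string list -> zi_option list     (* for completion and help *)
  method parse_and_run : string list -> unit
end

let subcommand (parse : string list -> ([< zi_option ] as 'a) list)
               (handle : 'a list -> unit) : subcommand =
  object
    (* Upcast: forget which subset of zi_option this parser produces. *)
    method parse argv = (parse argv :> zi_option list)
    (* No cast here, so the precise subset still reaches the handler. *)
    method parse_and_run argv = handle (parse argv)
  end

(* Hypothetical parser and handler for "run". *)
let parse_run args =
  let rec loop = function
    | [] -> []
    | "--refresh" :: rest -> `Refresh :: loop rest
    | "--wrapper" :: w :: rest -> `Wrapper w :: loop rest
    | arg :: _ -> failwith ("unknown option: " ^ arg)
  in
  loop args

let handle_run opts =
  List.iter (function
    | `Refresh -> print_endline "run: refresh"
    | `Wrapper w -> print_endline ("run: wrapper " ^ w)
  ) opts

let subcommands : (string * subcommand) list = [
  ("run", subcommand parse_run handle_run);
]
```

Code such as the completer can then call parse on any entry in subcommands and treat the result as a plain zi_option list.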
So, I think this is rather nice:
- Every option displayed in the help for a command is accepted by that command.
- We don't need any asserts in the handlers (indeed, adding the assert destroys the safety, since the handler will then accept any option).
One final trick: when matching variants you can use the #type syntax to match a set of options. e.g. the real handle_run looks more like this:
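A sketch of the #type pattern in use. The name select_option comes from the text; common_option, the individual tags, handle_common and describe_selection are hypothetical stand-ins for the shared code.

```ocaml
type common_option = [ `Offline | `Verbose ]            (* hypothetical tags *)
type select_option = [ `Version of string | `Source ]   (* hypothetical tags *)

(* Stand-in for the shared handling of common options. *)
let handle_common : common_option -> unit = function
  | `Offline -> print_endline "offline mode"
  | `Verbose -> print_endline "verbose"

(* Stand-in for the selection code, which only sees select_option values. *)
let describe_selection (opts : select_option list) : string =
  String.concat ", " (List.map (function
    | `Version v -> "version " ^ v
    | `Source -> "source") opts)

let handle_run opts =
  let wrapper = ref None in
  let select_opts = ref [] in
  List.iter (function
    | `Wrapper w -> wrapper := Some w
    (* #t matches any tag of type t; "as o" binds it at that narrower type. *)
    | #common_option as o -> handle_common o
    | #select_option as o -> select_opts := o :: !select_opts
  ) opts;
  Printf.printf "wrapper=%s; selection: %s\n"
    (match !wrapper with Some w -> w | None -> "(none)")
    (describe_selection (List.rev !select_opts))
```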
That is, it processes the run-specific options itself, while delegating common options (--offline, etc) and storing selection options (--version, etc) in a separate list to be passed to the selection code. The select_opts list gets the correct sub-type (select_option list).