In today's "things I've learnt about OCaml" I look back at my first OCaml code, and think about how I'd write it differently now.
Removing ;;
Looking back at my code, the most obvious "this is beginner code" clue is the use of ;; everywhere. The OCaml tutorial gives a list of complicated rules for when to use ;;, but in fact it's very simple:
- Never use top-level expressions in an OCaml program.
- Never use ;; (except when tracking down syntax errors).
If you want to run some code at startup (e.g. your "main" function), just put it inside a let () = ... block. That way you'll also get a compile-time error if you miss an argument. I don't know why OCaml even allows top-level expressions. For example:
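A minimal sketch of the difference, assuming a made-up main function that takes one argument (neither the name nor the body comes from the original code):

```ocaml
(* Hypothetical entry point; [started] exists only so the effect is observable. *)
let started = ref false

let main name =
  started := true;
  print_endline ("Hello, " ^ name)

(* As a top-level expression, forgetting the argument still type-checks:
     main;;
   The program compiles and simply does nothing at startup.

   Inside a [let () = ...] block the right-hand side must have type unit,
   so the same mistake is rejected by the compiler:
     let () = main      (has type string -> unit, not unit)
*)
let () = main "world"
```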
In a similar way, I was a bit overcautious about adding parentheses around expressions. For example, I had Str.regexp ("...") and match (...) with. They're not needed in most cases.
Warnings
Always compile with warnings on. I don't know why this isn't the default. Use -w A to enable all warnings.
I actually use -w A-4, which disables the warning when you use a default match case. Default match cases should be avoided when possible, but if you've gone to the trouble of adding one then you probably needed it.
Exhaustive matching
One of the great strengths of OCaml (which I missed at first) is that it always makes you handle every possible case. Providing a catch-all case defeats this check. In my initial code, I needed to process a list of bindings: first all the environment bindings, then all the executable ones. I made a do_env_binding function which applied environment bindings and ignored all others:
I did the same for executable bindings. Then I applied them all like this:
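A rough sketch of that catch-all style, assuming a simplified binding type (the constructors, payloads and the applied log are invented for illustration and are not the original definitions):

```ocaml
(* Hypothetical binding type; the real one has other payloads. *)
type binding =
  | EnvironmentBinding of string * string   (* variable name, value *)
  | ExecutableBinding of string             (* command name *)

(* Log of what was applied, newest first, so the order can be inspected. *)
let applied = ref []

(* Applies environment bindings and silently ignores everything else. *)
let do_env_binding = function
  | EnvironmentBinding (name, value) ->
      applied := ("env " ^ name ^ "=" ^ value) :: !applied
  | _ -> ()

(* Same idea for executable bindings. *)
let do_exec_binding = function
  | ExecutableBinding command -> applied := ("exec " ^ command) :: !applied
  | _ -> ()

(* Two passes over the same list: environment first, then executables. *)
let apply_all bindings =
  List.iter do_env_binding bindings;
  List.iter do_exec_binding bindings
```

If a third constructor were added to binding, both functions would keep compiling and would quietly skip it.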
I now think this is bad style, because if a new binding type is added no compiler warning will appear. It's better to have the functions accept only the single kind of binding they process. Then the code that calls them separates out the two types of binding. If a new type is added later, the code will issue a warning about an unmatched case:
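One way this could look, again with invented types: each kind of binding gets its own payload type, the worker functions accept only that payload, and the caller does the one exhaustive match. The functions return strings here only to keep the sketch easy to check.

```ocaml
(* Hypothetical payload types, one per kind of binding. *)
type env_binding = { var_name : string; value : string }
type exec_binding = { command : string }

type binding =
  | EnvironmentBinding of env_binding
  | ExecutableBinding of exec_binding

(* Each function accepts only the kind of binding it processes. *)
let do_env_binding (b : env_binding) = "env " ^ b.var_name ^ "=" ^ b.value
let do_exec_binding (b : exec_binding) = "exec " ^ b.command

(* The caller separates the two kinds. There is no catch-all case, so adding
   a constructor to [binding] makes this match non-exhaustive and the
   compiler warns about it. *)
let apply_all bindings =
  let envs, execs =
    List.fold_right (fun b (envs, execs) ->
      match b with
      | EnvironmentBinding e -> (e :: envs, execs)
      | ExecutableBinding x -> (envs, x :: execs)
    ) bindings ([], []) in
  List.map do_env_binding envs @ List.map do_exec_binding execs
```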
Handy operators
The recently released OCaml 4.01 adds two new built-in operators, @@ and |>. They're very simple, and you can define them yourself on older versions like this:
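A minimal sketch of such definitions (these are the usual one-liners; the exact layout of the original listing is not preserved). OCaml derives an operator's precedence and associativity from its first character, so no extra declarations are needed:

```ocaml
(* Application: [f @@ x] is [f x]. Right-associative, like [@]. *)
let ( @@ ) f x = f x

(* Reverse application (pipe): [x |> f] is [f x]. Left-associative. *)
let ( |> ) x f = f x
```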
They both simply call a function with an argument. For example print @@ "Hello" is the same as print "Hello". However, they are very low precedence, which means you can use them to avoid parentheses. For example, these two lines are equivalent (we load a file, parse it as XML, parse the resulting document as a 0install selections document and then execute the selections):
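To illustrate the shape of such a pair, here is a sketch with string-based stand-ins for the four steps. The function names are made up for this example and are not the real 0install API:

```ocaml
(* Hypothetical stand-ins for the four steps described above. *)
let load_file path = "contents of " ^ path
let parse_xml text = "xml(" ^ text ^ ")"
let make_selections doc = "selections(" ^ doc ^ ")"
let execute sels = "ran " ^ sels

(* With nested parentheses. *)
let with_parens path =
  execute (make_selections (parse_xml (load_file path)))

(* With [@@]: everything to the right is one argument to the function on the left. *)
let with_apply path =
  execute @@ make_selections @@ parse_xml @@ load_file path
```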
The advantage here is that when you read an (, you have to scan along the rest of the line counting brackets to find the matching one. When you see @@, you know that the rest of the expression is a single argument to the previous function.
The pipe operator |> is similar, but the function and argument go the other way around. These lines are equivalent:
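A sketch of the same idea, using made-up string-based stand-ins for the load, parse and execute steps (hypothetical names, not the real 0install functions):

```ocaml
(* Hypothetical stand-ins for the pipeline steps. *)
let load_file path = "contents of " ^ path
let parse_xml text = "xml(" ^ text ^ ")"
let make_selections doc = "selections(" ^ doc ^ ")"
let execute sels = "ran " ^ sels

(* Right to left with [@@] ... *)
let with_apply path =
  execute @@ make_selections @@ parse_xml @@ load_file path

(* ... and left to right with [|>], in the order the steps actually happen. *)
let with_pipe path =
  path |> load_file |> parse_xml |> make_selections |> execute
```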
Intuitively, the result of each segment of the pipeline becomes the last argument to the next segment.
At first, I couldn't see any reason for preferring one or the other, so I decided to use just @@ initially (which was most familiar, being the same as Haskell's $ operator). That was a mistake. |> is the more useful of the two.
In the original post, I complained that you had to write loops backwards, giving the loop body first and then the list to be looped-over. With |>, that problem is solved:
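A small sketch of a loop written this way. The join_lines function is invented for illustration; it collects its output in a Buffer so the result can be checked:

```ocaml
(* The list comes first, the loop body second. *)
let join_lines items =
  let buf = Buffer.create 16 in
  items |> List.iter (fun item ->
    Buffer.add_string buf item;
    Buffer.add_char buf '\n'
  );
  Buffer.contents buf
```

Without the pipe, the same loop would be List.iter (fun item -> ...) items, with the list trailing after the whole body.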
Using the pipe operator eliminates the mismatch between the desire to make the function the last argument and OCaml's common (but not universal) convention of putting the data structure last. It can also make things look more object-oriented, by putting the object first. Consider this code for setting an attribute on an XML element:
Which is the element, and which are the name and value? Written this way, it's hopefully obvious that c is the element:
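As a sketch of the contrast, assume a hypothetical element type and a set_attribute function that takes the element as its last argument (both invented here; the real XML code may differ):

```ocaml
(* Hypothetical, minimal element type. *)
type element = { tag : string; mutable attrs : (string * string) list }

(* Sets (or replaces) an attribute; the element is the last argument. *)
let set_attribute name value elem =
  elem.attrs <- (name, value) :: List.remove_assoc name elem.attrs

let c = { tag = "command"; attrs = [] }

(* Three arguments in a row: the reader has to know which one is the element. *)
let () = set_attribute "name" "run" c

(* Element first, then the operation applied to it. *)
let () = c |> set_attribute "path" "/bin/sh"
```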
Sequences become clearer. For example, consider adding two items to a collection in order:
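One possible illustration, using a string map from the standard library (the keys and values are made up, and the original example may have used a different collection):

```ocaml
module StringMap = Map.Make (String)

(* Nested calls: the item added first appears last on the line. *)
let nested =
  StringMap.add "second" 2 (StringMap.add "first" 1 StringMap.empty)

(* Piped: the additions read top to bottom in the order they happen. *)
let piped =
  StringMap.empty
  |> StringMap.add "first" 1
  |> StringMap.add "second" 2
```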
I was even considering changing the order of the arguments to my starts_with function to make it work with pipe. Currently, we have:
But does it check that a starts with b or the other way around? They're both strings, so type checking won't catch errors either. Reversing the arguments and using pipe, it would be clear:
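A sketch of both orders, assuming starts_with takes the string first and the prefix second (as extlib's version does). The name starts_with_rev is made up for the reversed variant:

```ocaml
(* Current order: [starts_with str prefix]. *)
let starts_with str prefix =
  let len = String.length prefix in
  String.length str >= len && String.sub str 0 len = prefix

(* Hypothetical reversed order, prefix first, so it composes with [|>]. *)
let starts_with_rev prefix str = starts_with str prefix

let a = "hello world"
let b = "hello"

let plain = starts_with a b            (* a starts with b, or b with a? *)
let piped = a |> starts_with_rev b     (* reads as "a starts with b" *)
```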
However, extlib's version uses the original order, so I decided not to change it. Also, I used it in a lot of places and I couldn't find a semantic patching tool to change them all automatically (like Go's gofmt -r or C's Coccinelle - which, interestingly, is written in OCaml).
Handling option types
I noted the lack of a null coalescing operator in my original code. I've now made some helpers for handling option types (I don't know if OCaml programmers have standard names for these). I find them neater than using match statements.
The first I named |?. It's used to get the value out of an option, or generate some default if it's missing. It's defined like this:
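A definition along these lines would fit the description (a sketch; the parameter names are my own choice here):

```ocaml
(* [maybe |? lazy default] returns the option's value if there is one.
   Otherwise it forces the lazy default, which is only evaluated on demand. *)
let ( |? ) maybe default =
  match maybe with
  | Some v -> v
  | None -> Lazy.force default
```

Because lazy binds as tightly as function application, opt |? lazy x needs no parentheses around lazy x.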
Using OCaml's built-in lazy syntax makes this a bit nicer than having to define an anonymous function each time you use it. It's used like this:
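A short usage sketch, with a made-up settings lookup standing in for whatever the original example did (lookup, port and require are hypothetical names):

```ocaml
(* Sketch of the operator described above. *)
let ( |? ) maybe default =
  match maybe with
  | Some v -> v
  | None -> Lazy.force default

(* Hypothetical helper: look a key up in an association list. *)
let lookup key settings =
  try Some (List.assoc key settings) with Not_found -> None

(* Fall back to a default value ... *)
let port settings = lookup "port" settings |? lazy "8080"

(* ... or fail with a useful message, but only when the value is missing. *)
let require key settings =
  lookup key settings |? lazy (failwith ("Missing setting: " ^ key))
```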
The only slight issue I have is that if you forget the lazy when raising an exception then you don't get a compile-time error. It just throws the exception in all cases. However, you should spot this problem quickly when you test it.
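The pitfall can be reproduced with a few lines (a sketch; with_lazy and without_lazy are names invented for the demonstration). It type-checks because raise can take any type, including a lazy one:

```ocaml
(* Sketch of the operator described above. *)
let ( |? ) maybe default =
  match maybe with
  | Some v -> v
  | None -> Lazy.force default

(* Correct: the exception is raised only if the option is None. *)
let with_lazy opt = opt |? lazy (raise Not_found)

(* Forgotten [lazy]: still compiles, but the argument is evaluated eagerly,
   so Not_found is raised even when a value is present. *)
let without_lazy opt = opt |? raise Not_found
```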
Another common task is to execute some code with the option's value only if it's set. I defined if_some for this. It takes a function to call with the value, but partial application means you usually don't need to define one explicitly. For example, to stop a timer if you have one:
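A sketch of how this might look. The timer type and stop_timer are invented for the example, and the definition of if_some is inferred from the description above:

```ocaml
(* Call [fn] on the value if there is one; otherwise do nothing. *)
let if_some fn = function
  | None -> ()
  | Some x -> fn x

(* Hypothetical timer, just enough to observe the effect. *)
type timer = { mutable running : bool }

let stop_timer t = t.running <- false

(* Partial application: [if_some stop_timer] is itself a function on options. *)
let shutdown current_timer = current_timer |> if_some stop_timer
```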
Finally, there's a pipe_some, which is the same except that it maps None -> None rather than None -> ().
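A sketch of pipe_some under the assumption that the function it is given returns an option itself, so several steps can be chained (parse_int, positive and parse_positive are made-up helpers):

```ocaml
(* Like if_some, but a missing value stays missing instead of becoming unit. *)
let pipe_some fn = function
  | None -> None
  | Some x -> fn x

(* Hypothetical steps that can each fail. *)
let parse_int s = try Some (int_of_string s) with Failure _ -> None
let positive n = if n > 0 then Some n else None

(* Each step runs only if the previous one produced a value. *)
let parse_positive s =
  Some s |> pipe_some parse_int |> pipe_some positive
```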
Conclusions
After spending a few months writing OCaml, my coding style hasn't actually changed much since my first attempts right after reading the tutorials. I'm not sure whether this is good or bad. Like Python, there is a one-obvious-way-to-do-it feeling to OCaml, unlike Haskell and Perl, which somehow seem to encourage clever-but-incomprehensible solutions. When I've read other people's OCaml code (e.g. Lwt), I haven't found anything new or hard to read.
The main changes have been cosmetic: the removal of ;;, fewer brackets, and the |> operator to make the code tidier, plus some common helper functions. I'm also finding more ways to make the type system do more of the work: e.g. avoiding catch-all match cases and using Polymorphic Variants.
The most useful functions I've added (some borrowed from other people) are:
- |? for handling None values (see above)
- if_some and pipe_some (see above)
- finally_do to work around the lack of a try...finally syntax in OCaml
- filter_map (apply a function to each item in a list, filtering out any None replies)
- starts_with (as in Python)
- abspath and realpath (to resolve pathnames; translated from the Python standard library code)
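For readers who haven't met them, two of these helpers could be sketched as follows. The signatures are assumptions, in particular the argument order of finally_do, and the real implementations may differ:

```ocaml
(* Apply [fn] to each item and keep only the [Some] results, in order. *)
let filter_map fn items =
  List.fold_right (fun item acc ->
    match fn item with
    | Some v -> v :: acc
    | None -> acc
  ) items []

(* Run [fn resource], then always run [cleanup resource],
   whether [fn] returned normally or raised an exception. *)
let finally_do cleanup resource fn =
  let result =
    try fn resource
    with ex -> cleanup resource; raise ex in
  cleanup resource;
  result
```

A typical use would be finally_do close_in (open_in path) input_line, which closes the channel even if reading fails.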
If anyone else wants my realpath, it's in Support.Utils.
What other useful tips or utilities do people have?