OCaml tips - Thomas Leonard's blog

In today's "things I've learnt about OCaml" I look back at my first OCaml code, and think about how I'd write it differently now.

Table of Contents

Removing ;;
Warnings
Exhaustive matching
Handy operators
Handling option types
Conclusions

Removing ;;

Looking back at my code, the most obvious "this is beginner code" clue is the use of ;; everywhere. The OCaml tutorial gives a list of complicated rules for when to use ;;, but in fact it's very simple:

Never use top-level expressions in an OCaml program.
Never use ;; (except when tracking down syntax errors).

If you want to run some code at startup (e.g. your "main" function), just put it inside a let () = ... block. That way you'll also get a compile-time error if you miss an argument. I don't know why OCaml even allows top-level expressions. For example:

(* Bad - mistake goes undetected and you need ';;' *)
Printf.printf "Hello %s";;

(* Good - compiler spots missing argument *)
let () =
  Printf.printf "Hello %s"

In a similar way, I was a bit over-cautious about adding parentheses around expressions. For example, I had Str.regexp ("...") and match (...) with. They're not needed in most cases.
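As a minimal sketch of both points, here is a complete program with its startup code in a let () = ... block and no redundant parentheses. The greeting helper is a hypothetical name used only for illustration.

```ocaml
(* Hypothetical helper, named here only for illustration. *)
let greeting name = Printf.sprintf "Hello %s" name

(* Startup code lives in a [let () = ...] block, so no ';;' is needed.
   The unit pattern forces the right-hand side to have type unit, so a
   printf call with a missing argument would be rejected at compile time. *)
let () =
  (* No parentheses are needed around the expression being matched. *)
  match Array.length Sys.argv with
  | 0 | 1 -> print_endline (greeting "world")
  | _ -> print_endline (greeting Sys.argv.(1))
```

The parentheses that remain are the ones that group a function call used as an argument.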

Warnings

Always compile with warnings on. I don't know why this isn't the default. Use -w A to enable all warnings.

I actually use -w A-4, which disables the warning when you use a default match case. Default match cases should be avoided when possible, but if you've gone to the trouble of adding one then you probably needed it.

Exhaustive matching

One of the great strengths of OCaml (which I missed at first) is that it always makes you handle every possible case. Providing a catch-all case defeats this check. In my initial code, I needed to process a list of bindings. First, all the environment bindings, then all the executable ones. I made a do_env_binding function which applied environment bindings and ignored all others:

let do_env_binding env impls = function
| EnvironmentBinding {var_name; mode; source} -> ...
| _ -> ()

I did the same for executable bindings. Then I applied them all like this:

(* Do <environment> bindings *)
List.iter (do_env_binding env impls) bindings;

(* Do <executable-in-*> bindings *)
List.iter (do_exec_binding config env impls) bindings;

I now think this is bad style, because if a new binding type is added no compiler warning will appear. It's better to have the functions accept only the single kind of binding they process. Then the code that calls them separates out the two types of binding. If a new type is added later, the compiler will issue a warning about an unmatched case:

let do_env_binding env impls {var_name; mode; source} = ...

bindings |> List.iter (function
  | EnvironmentBinding b -> do_env_binding env impls b
  | ExecutableBinding b -> Queue.add b exec_bindings
);

exec_bindings |> Queue.iter (do_exec_binding config env impls)
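To make the pattern concrete, here is a self-contained sketch. The record types, their fields, and the apply_bindings wrapper are simplified assumptions for illustration, not the real 0install definitions.

```ocaml
(* Hypothetical, simplified binding types; the real records differ. *)
type env_binding = { var_name : string; value : string }
type exec_binding = { exec_name : string }

type binding =
  | EnvironmentBinding of env_binding
  | ExecutableBinding of exec_binding

(* Each handler accepts only the kind of binding it processes. *)
let do_env_binding env { var_name; value } = Hashtbl.replace env var_name value
let do_exec_binding launched { exec_name } = launched := exec_name :: !launched

let apply_bindings env launched bindings =
  let exec_bindings = Queue.create () in
  (* No catch-all case: adding a constructor to [binding] makes the
     compiler report this match as non-exhaustive. *)
  bindings |> List.iter (function
    | EnvironmentBinding b -> do_env_binding env b
    | ExecutableBinding b -> Queue.add b exec_bindings
  );
  (* Executable bindings run only after every environment binding. *)
  exec_bindings |> Queue.iter (do_exec_binding launched)
```

Adding a third constructor to binding without updating apply_bindings should produce the non-exhaustive match warning at the List.iter call.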

Handy operators

The recently released OCaml 4.01 adds two new built-in operators, @@ and |>. They're very simple, and you can define them yourself on older versions like this:

let (@@) fn x = fn x
let (|>) x fn = fn x

They both simply call a function with an argument. For example print @@ "Hello" is the same as print "Hello". However, they are very low precedence, which means you can use them to avoid parentheses. For example, these two lines are equivalent (we load a file, parse it as XML, parse the resulting document as a 0install selections document and then execute the selections):

execute (parse_selections (parse_xml (load_file path)))
execute @@ parse_selections @@ parse_xml @@ load_file path

The advantage here is that when you read an (, you have to scan along the rest of the line counting brackets to find the matching one. When you see @@, you know that the rest of the expression is a single argument to the previous function.

The pipe operator |> is similar, but the function and argument go the other way around. These lines are equivalent:

execute @@ parse_selections @@ parse_xml @@ load_file path
load_file path |> parse_xml |> parse_selections |> execute

Intuitively, the result of each segment of the pipeline becomes the last argument to the next segment.
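The equivalence of the three styles can be checked with a small runnable sketch. The functions below are hypothetical stand-ins for load_file, parse_xml and friends, chosen only so the example runs without any external files.

```ocaml
(* Hypothetical stand-ins for the stages of the real pipeline. *)
let trim_input s = String.trim s
let split_words s = String.split_on_char ' ' s
let count_words ws = List.length ws

(* The same computation written three ways. *)
let nested  = count_words (split_words (trim_input "  a b c  "))
let applied = count_words @@ split_words @@ trim_input "  a b c  "
let piped   = "  a b c  " |> trim_input |> split_words |> count_words
```

All three bindings hold the same value. Only the reading order differs: @@ reads right to left like the nested calls, while |> reads left to right in the order the steps actually run.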

At first, I couldn't see any reason for preferring one or the other, so I decided to use just @@ initially (which was most familiar, being the same as Haskell's $ operator). That was a mistake. |> is the more useful of the two.

In the original post, I complained that you had to write loops backwards, giving the loop body first and then the list to be looped-over. With |>, that problem is solved:

items |> List.iter (fun item ->
  Printf.printf "Item: %s\n" item
)

Using the pipe operator eliminates the mismatch between the desire to make the function the last argument and OCaml's common (but not universal) convention of putting the data structure last. It can also make things look more object-oriented, by putting the object first. Consider this code for setting an attribute on an XML element:

set_attribute a b c

Which is the element, and which are the name and value? Written this way, it's hopefully obvious that c is the element:

c |> set_attribute a b

Sequences become clearer. For example, consider adding two items to a collection in order:

(* Using () *)
let items = Collection.add "two"
  (Collection.add "one" items)

(* Using |> *)
let items = items
  |> Collection.add "one"
  |> Collection.add "two"

I was even considering changing the order of the arguments to my starts_with function to make it work with pipe. Currently, we have:

if starts_with a b then ...

But does it check that a starts with b or the other way around? They're both strings, so type checking won't catch errors either. Reversing the arguments and using pipe, it would be clear:

if a |> starts_with b then ...
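For illustration, a pipe-friendly starts_with could look like the sketch below. This is the reversed argument order discussed above (prefix first, string under test last), and the implementation is an assumption rather than the version actually used in the code.

```ocaml
(* Hypothetical pipe-friendly variant: the prefix comes first and the
   string being tested comes last, so it reads well after |>. *)
let starts_with prefix str =
  let plen = String.length prefix in
  String.length str >= plen && String.sub str 0 plen = prefix

(* Reads as: does "feed.xml" start with "feed"? *)
let is_feed = "feed.xml" |> starts_with "feed"
```

The length check comes first so that String.sub is never asked for more characters than the string contains.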

However, extlib's version uses the original order, so I decided not to change it. Also, I used it in a lot of places and I couldn't find a semantic patching tool to change them all automatically (like Go's gofmt -r or C's Coccinelle - which, interestingly, is written in OCaml).

Handling option types

I noted the lack of a null coalescing operator in my original code. I've now made some helpers for handling option types (I don't know if OCaml programmers have standard names for these). I find them neater than using match statements.

The first I named |?. It's used to get the value out of an option, or generate some default if it's missing. It's defined like this:

let (|?) maybe default =
  match maybe with
  | Some v -> v
  | None -> Lazy.force default

Using OCaml's built-in lazy syntax makes this a bit nicer than having to define an anonymous function each time you use it. It's used like this:

(* Use config.dir, or $HOME if it's not set *)
let dir =
  match config.dir with
  | None -> Sys.getenv "HOME"
  | Some dir -> dir in

(* Using |? *)
let dir = config.dir |? lazy (Sys.getenv "HOME") in

(* Guess the MIME type if it's not set on the element *)
let mime_type = mime_type |? lazy (Archive.type_from_url url) in

(* Abort if not set *)
let item = lookup name |? lazy (raise_safe "Item '%s' not found" name)

The only slight issue I have is that if you forget the lazy when raising an exception then you don't get a compile-time error. It just throws the exception in all cases. However, you should spot this problem quickly when you test it.
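A small sketch makes the laziness visible: the default is computed only when the option is None. The counter and expensive_default are hypothetical names added for the demonstration.

```ocaml
let (|?) maybe default =
  match maybe with
  | Some v -> v
  | None -> Lazy.force default

(* Hypothetical fallback that records how often it is actually computed. *)
let fallback_calls = ref 0
let expensive_default () = incr fallback_calls; "/tmp"

(* The option has a value, so the lazy default is never forced. *)
let from_some = Some "/home/user" |? lazy (expensive_default ())

(* The option is empty, so the default is forced exactly once. *)
let from_none = None |? lazy (expensive_default ())
```

After both bindings are evaluated, fallback_calls has been incremented only by the second one.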

Another common task is to execute some code with the option's value only if it's set. I defined if_some for this. It takes a function to call with the value, but partial application means you usually don't need to define one explicitly. For example, to stop a timer if you have one:

(* Normal method *)
let () =
  match timeout with
  | None -> ()
  | Some timeout -> Lwt_timeout.stop timeout in

(* Using if_some *)
timeout |> if_some Lwt_timeout.stop;

Finally, there's a pipe_some, which is the same except that it maps None -> None rather than None -> ().
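The definitions of these two helpers are not shown here, so the following is only a plausible sketch of what they might look like. It assumes that the function passed to pipe_some itself returns an option. The stopped list and parse_port are hypothetical names used for the demonstration.

```ocaml
(* Assumed definition: run fn on the value if there is one. *)
let if_some fn = function
  | None -> ()
  | Some x -> fn x

(* Assumed definition: like if_some, but None maps to None.
   Here fn is assumed to return an option itself. *)
let pipe_some fn = function
  | None -> None
  | Some x -> fn x

(* Records which values if_some was called with. *)
let stopped = ref []
let () = Some "timer" |> if_some (fun t -> stopped := t :: !stopped)
let () = None |> if_some (fun t -> stopped := t :: !stopped)

(* Hypothetical parser that may fail, chained with pipe_some. *)
let parse_port s = int_of_string_opt s
let port = Some "8080" |> pipe_some parse_port
let no_port = None |> pipe_some parse_port
```

Thanks to partial application, both helpers slot directly into a |> pipeline without an explicit anonymous function at the call site.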

Conclusions

After spending a few months writing OCaml, my coding style hasn't actually changed much since my first attempts right after reading the tutorials. I'm not sure whether this is good or bad. Like Python, there is a one-obvious-way-to-do-it feeling to OCaml, unlike Haskell and Perl, which somehow seem to encourage clever-but-incomprehensible solutions. When I've read other people's OCaml code (e.g. Lwt), I haven't found anything new or hard to read.

The main changes have been cosmetic: the removal of ;;, fewer brackets, and the |> operator to make the code tidier, plus some common helper functions. I'm also finding more ways to make the type system do more of the work, e.g. avoiding catch-all match cases and using polymorphic variants.

The most useful functions I've added (some borrowed from other people) are:

|? for handling None values (see above)
if_some and pipe_some (see above)
finally_do to work around the lack of a try...finally syntax in OCaml
filter_map (apply a function to each item in a list, filtering out any None replies)
starts_with (as in Python)
abspath and realpath (to resolve pathnames; translated from the Python standard library code)

If anyone else wants my realpath, it's in Support.Utils.
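For readers who want a starting point, here is a rough sketch of how finally_do and filter_map from the list above could be written. The signatures are assumptions and may not match the originals. Newer standard libraries (OCaml 4.08 and later) also provide Fun.protect and List.filter_map, which cover similar ground.

```ocaml
(* Assumed signature: run [f resource], then always run
   [cleanup resource], whether or not [f] raised an exception. *)
let finally_do cleanup resource f =
  let result =
    try f resource
    with ex -> cleanup resource; raise ex
  in
  cleanup resource;
  result

(* Apply [fn] to each item and keep only the [Some] results,
   preserving the order of the input list. *)
let filter_map fn items =
  List.fold_right
    (fun item acc ->
      match fn item with
      | Some v -> v :: acc
      | None -> acc)
    items []
```

This finally_do runs the cleanup exactly once on either path. If the cleanup itself raises, that exception replaces the original one, which a more careful version might want to handle.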

What other useful tips or utilities do people have?