Thomas Leonard's blog

OCaml Binary Compatibility

In the initial language analysis, OCaml did well in most areas except for diagnostics (which turned out to have an easy solution) and shared libraries / binary compatibility. Now it's time to look for a solution to that.

Table of Contents

Introduction

0install 2.3 was released last week with an (optional) OCaml front-end. This code can handle the startup-time-critical operations of running applications and generating shell tab-completions by itself, and will fall back to the Python version for any other case.

Converting the Python to OCaml was mostly straightforward. The only difficulty was getting access to the SHGetFolderPath function on Windows. The standard library doesn't include this function, so I had to write a wrapper for it in C and use the OCaml pre-processor to make the OCaml code use my wrapper only on Windows.
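
The OCaml side of such a wrapper is just an external declaration bound to the C stub. Here's a minimal sketch of what that can look like; the stub name is invented for illustration (it isn't the actual 0install code), and the C function it names has to be compiled and linked in separately:

(* Hypothetical stub name: the C function "win_get_folder_path" must call
   SHGetFolderPath and return the result as an OCaml string. *)
external get_folder_path : int -> string = "win_get_folder_path"

(* CSIDL_APPDATA is 0x001a in the Windows API *)
let appdata_dir () = get_folder_path 0x001a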

However this OCaml front-end means we now have some duplicated code, which must be kept in sync and creates extra opportunities for bugs. So the next step is to eliminate the duplicated Python code and use the OCaml in all cases. This means that the OCaml part of 0install will no longer be optional, which in turn means that it has to work for everyone.

There are three ways people end up running the 0install code:

  • They install the zeroinstall-injector package from their distribution.
  • They install manually using a generic tarball or Windows installer from 0install.net.
  • They run a tool (e.g. 0compile) that depends on 0install (and will often require a newer version than was provided by their distribution).

This last case may seem a little confusing. The user is using their local (probably distribution-provided) 0install to run 0compile with its libraries, one of which happens to be another version of 0install.

Distribution packages

Distribution packages are the simplest from a binary-compatibility point of view. Each distribution runs a build farm, which builds separate binary packages for each supported architecture. Even here, however, compatibility can be an issue. For example, if someone hits a bug in the version of 0install in the stable/LTS version of the distribution, we often tell them to try the package from a newer release.

OCaml provides two options when compiling:

  • ocamlc compiles to bytecode.
  • ocamlopt compiles to native platform-specific code. This is not available on all platforms.

The Debian OCaml packaging guide says that "The bytecode versions are portable. In order to spare the buildds and the Debian archive, bytecode versions should be compiled once for all for big packages (which either take a lot of place on disks or take a lot of time to build)."

However, this turned out not to be the case. Packages compiled on 64-bit systems didn't install on 32-bit systems. I had to change the Debian source package to build a different binary for each architecture (and since I had to do that anyway, I also changed it to compile to native code where possible, since that's slightly faster and more portable between distribution releases).

Upstream packages

For making upstream packages, we don't have the ability to build (or test) native binaries for multiple platforms. It would be far more convenient to release a single package containing bytecode and have it work everywhere, the way we currently do with the Python. However, there are some problems to solve here:

  • The 64-bit issue which affected the Debian packages, as noted above.

  • OCaml bytecode compiled on Linux doesn't run on Windows.

  • Even backwards-compatible changes to OCaml libraries prevent bytecode from linking (see next section). This includes e.g. OCaml adding a new function to its standard library.

I tried to reproduce the 64-bit issue by building the bytecode on a 32-bit Ubuntu Raring VM and then running it on a 64-bit Arch Linux system. It worked fine. So, I'm going to assume for now that this is an unnecessary incompatibility introduced by Debian's OCaml packaging system, and not a genuine problem with the bytecode.

OCaml library compatibility

To understand how OCaml checks bytecode compatibility, let's look at a simple example (based on the one in Enforcing type-safe linking using package dependencies):

Say you have a library providing a function:

lib.ml
let inc x = x + 1

You can compile it to bytecode like this, getting a lib.cmo file:

$ ocamlc -c lib.ml

You can compile a program using the library in the same way (note that module names are capitalised in OCaml code):

prog.ml
Printf.printf "Result: %d\n" (Lib.inc 5)
$ ocamlc -c prog.ml

Then you can link them together and run like this:

$ ocamlc -o prog lib.cmo prog.cmo
$ ./prog
Result: 6

If you change the implementation of the function, it still works:

lib.ml
let inc x = x + 100
$ ocamlc -c lib.ml
$ ocamlc -o prog lib.cmo prog.cmo
$ ./prog
Result: 105

But, if you add a new function to the library then it breaks:

lib.ml
let inc x = x + 1
let dec x = x - 1
$ ocamlc -c lib.ml
$ ocamlc -o prog lib.cmo prog.cmo
File "_none_", line 1:
Error: Files prog.cmo and lib.cmo
   make inconsistent assumptions over interface Lib

The reason is that OCaml calculates a hash over the module's signature. You can see a .cmo file's dependencies (with their hashes) like this:

$ ocamlobjinfo prog.cmo
File prog.cmo
Unit name: Prog
Interfaces imported:
    265928798c0b8a63fa48cf9ac202f0ce        Int32
    10fca44c912c9342cf3d611984d42e34        Printf
    3f6c994721573c9f8b5411e6824249f4        Buffer
    ad977b422bbde52cd6cd3b9d04d71db1        Obj
    5c4b312910d7250e3a67621b317619f0        Prog
    4836c254f0eacad92fbf67abc525fdda        Pervasives
    8ce323e7f6c1a7ba1b604d93cde0af3d        Lib
Uses unsafe features: no
Force link: no

The hash for "Lib" covers the "inc" and "dec" functions together and OCaml refuses to link prog with the new library, even though all the functions it needs are still there, unchanged.

At first, I thought we could just disable the hash checks and use some cleverer tools to check that libraries remained backwards compatible. However, OCaml doesn't use symbol names to find functions in OCaml libraries. A module is just an array of values (the inc and dec closures in this case) and prog locates the function it wants by index. Here's prog and its compiled bytecode (comments added by me):

prog.ml
Printf.printf "Result: %d\n" (Lib.inc 5)
$ dumpobj prog.cmo
## start of ocaml dump of "prog.cmo"
   0  CONSTINT 5
   2  PUSHGETGLOBALFIELD Lib, 0		(* Lib[0] = Lib.inc *)
   5  APPLY1 				(* Call inc with 1 argument *)
   6  PUSHGETGLOBAL "Result: %d\n"
   8  PUSHGETGLOBALFIELD Printf, 1	(* Printf[1] = Printf.printf *)
  11  APPLY2 				(* Call printf with 2 arguments *)
  12  ATOM0 				(* The empty array *)
  13  SETGLOBAL Prog			(* Prog = [] *)
## end of ocaml dump of "prog.cmo"

So, it's as if we'd written:

prog.ml
Printf.[1] "Result: %d\n" (Lib.[0] 5)

Therefore, OCaml cannot cope with any change to the signature of a library. For example, if the inc and dec functions are switched around so that dec is defined first, prog will then call the dec function instead. The hashes allow OCaml to detect such changes and refuse to run the bytecode.
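
For instance, if lib.ml were reordered like this, Lib[0] would now be dec, and the already-compiled prog.cmo would silently call the wrong function if the hash check were skipped:

lib.ml
let dec x = x - 1  (* now Lib[0] *)
let inc x = x + 1  (* now Lib[1] *)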

To allow dynamic linking, there are several options:

  • Disable the hash checks and then ensure that we never make a backwards incompatible change (e.g. we only add new methods at the end of a module, never change signatures, etc). That would require a bit of care, and it wouldn't help with changes to libraries we don't control.

  • Export a series of submodules: ZeroInstall.APIv1, ZeroInstall.APIv2, etc. Then we only ever create new modules; we never change existing ones. That works with OCaml's existing hash scheme (see the sketch after this list), but it also doesn't support third-party libraries (e.g. Xmlm and the standard library).

  • Write a front-end for ocamlrun that dynamically compiles and caches everything on demand. That's rather inefficient for users, though, and requires installing a complete development environment everywhere. Also, it may make the first run of an upgraded program very slow, with potentially no way to display a progress indicator.

  • Make the compiler add a map of symbol names to each module and use that for dynamic linking, based on the Dynlink module. That would be the most useful, but also the most difficult to implement. You'd also need extra code to handle extensions to class interfaces, new tags in variant types, etc. Not impossible (languages like Java and C# do this well), but not simple either.
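
Here's a minimal sketch of how the versioned-API option above could look, assuming each API version is a separate compilation unit that is frozen once released (the module and function names are invented for illustration, not the real 0install API):

zeroInstall_APIv1.ml
(* Compiled once and never modified, so its interface hash stays stable. *)
let run_command args = ignore args               (* placeholder body *)

zeroInstall_APIv2.ml
(* Added later as a brand new unit; programs compiled against APIv1 never
   import it, so their consistency checks still pass. *)
let run_command ?dir args = ignore (dir, args)   (* placeholder body *)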

For now, we can just statically link all bytecode (which OCaml does by default) and build all libraries from source on the build system. That's not a long-term solution, because every time we made a new release of 0install we'd have to make new releases of all the tools that depend on it (0compile, 0test, 0release, etc). But we're not yet trying to provide an OCaml API to other tools, just a portable OCaml binary. We won't get automatic updates to the libraries we use (e.g. Xmlm and Yojson), but we can probably live with that for now.

Windows / Linux compatibility

The cause of the Windows / Linux incompatibility is the "Unix" module in the standard library. Despite the name, this includes general-purpose operating system functions such as rename, create_process, etc, and is used on Windows too.
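
For example, this snippet compiles and behaves the same on Linux and Windows, despite the module's name:

let () =
  Unix.rename "settings.tmp" "settings";
  Printf.printf "Running as process %d\n" (Unix.getpid ())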

However, there are actually two separate unix.ml modules in the OCaml source: ./otherlibs/unix/unix.ml and ./otherlibs/win32unix/unix.ml. When you compile OCaml bytecode, it will statically link one of these versions, which means that the generated bytecode will support only the platform on which it was built.

In the short-term, we could create separate binaries of 0install for Windows and Linux. However, that makes the release process more complicated and error-prone. And if we provide an OCaml API to other tools, everyone developing tools would need to produce separate binaries too.

The two modules implement the same interface (i.e. they have the same hash), so code compiled for one would work with the other if it could find it. I experimented with several approaches here:

Statically-linking both versions

My first attempt was to make a single module that contained code to support both Windows and Linux. The most natural thing to do here would be to create a class type (interface) with two implementing classes. However, the Unix interface is a module, not a class, and I wanted it to be compatible with existing code. Asking on the OCaml Beginners group, Gabriel Scherer recommended the first-class modules extension, which allows treating modules as values (it's called an "extension", but it's supported by the standard compiler). So here's my first attempt, which defines RealUnix and Win32 submodules and then sets Unix to the correct one at runtime (Posix contains the current Unix API):

portable_unix.ml
module type Posix = module type of Posix

module RealUnix : Posix =
  struct
    [ contents of unix/unix.ml ]
  end

module Win32 : Posix =
  struct
    [ contents of win32unix/unix.ml ]
  end

module Unix = (val
  match Sys.os_type with
  | "Win32" -> (module Win32 : Posix)
  | _ -> (module RealUnix : Posix)
  );;

One problem with this is that it needs to link against all the C symbols for both versions, so you need to provide stubs for win_waitpid on Unix and for unix_waitpid on Windows, etc. Only the OCaml code is linked statically into the executable; C libraries are resolved dynamically on the target platform. Turns out, there are quite a lot of stub symbols to define. For testing, I just hacked it to stop complaining about missing C primitives.

It almost worked, except that I got a strange error on Windows trying to resolve the hostname "0.0.0.0" (which the Win32 version does during initialisation). However, I didn't track it down because I got a better suggestion...

Gerd Stolpmann suggested just compiling to a library (not an executable) and then using a script to load the modules dynamically:

launcher.ml
#!/usr/bin/env ocaml
#load "unix.cma";;
#load "myprog.cma";;

The advantage here is that we don't ship copies of unix.ml with our code; we just use the one that comes with the runtime. However, this also has a few problems:

  • It's a bit slower.
  • It depends on the ocaml binary (1.3 MB), not just ocamlrun (170 KB).
  • For other libraries (e.g. Xmlm, Yojson), if we want to link statically, we have to include the whole library archive, not just the modules we need, because the OCaml compiler only knows what we need when it does the final link to generate an executable, which we're not doing here.

Still, if ocaml can link unix.cma dynamically, why can't we?

OCaml comes with the Dynlink module, which allows loading bytecode at runtime. However, it has quite a few limitations. Unlike ocaml, it doesn't search for the library in the default paths (easy enough to fix), doesn't load dependencies recursively, and doesn't let you access the module after you've loaded it (it's intended for plugins, where the plugin knows the API for the main system, not for libraries where the main program knows the API of the library).
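
To illustrate the intended plugin model (my own sketch, not 0install code; the host must be linked against dynlink.cma): the host loads the plugin only for its side-effects, and the plugin registers itself with the host, which is the opposite of the direction we need here:

host.ml
(* The host exposes a registration hook, then loads the plugin. *)
let plugin_main : (unit -> unit) option ref = ref None

let () =
  Dynlink.loadfile "plugin.cmo";       (* runs the plugin's top-level code *)
  match !plugin_main with
  | Some main -> main ()
  | None -> prerr_endline "plugin did not register itself"

plugin.ml
(* The plugin knows the host's API and registers itself. *)
let () = Host.plugin_main := Some (fun () -> print_endline "hello from the plugin")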

I had a dig through the ocaml code to see how it does it. It seems to find the names using the Symtable module. I couldn't find a public API for that, so I hacked the Dynlink module to export a lookup_module function (it needs better error reporting; this is just for testing):

dynlink.ml
let lookup_module name =
  try Symtable.get_global_value (Ident.create_persistent name)
  with Symtable.Error ex ->
    Symtable.report_error Format.err_formatter ex;
    Format.pp_print_flush Format.err_formatter ();
    raise Not_found

Now you can find modules after loading them, and use the first-class modules stuff to treat the result as a regular module:

test_dyn.ml
(* Load Unix module dynamically *)

open Dynlink

let libdir =
  if Sys.os_type = "Win32" then "c:\\ocamlmgw\\lib"
  else "/opt/ocaml/lib/ocaml"

let () =
  try
    let _ = Callback.register in
    allow_unsafe_modules true;
    Dynlink.loadfile (Filename.concat libdir "unix.cma");
  with Error ex -> failwith (error_message ex)

module type UnixType = module type of Unix
module Unix = (val (Obj.obj (lookup_module "Unix")) : UnixType)

(* Use Unix module in the normal way *)

open Unix
let child =
  create_process "gpg" (Array.of_list ["gpg"; "--version"])
                 stdin stdout stderr;;
Printf.printf "Child %d\n" child;;

match snd (Unix.waitpid [] child) with
| WEXITED 0 -> print_endline "Success!"
| _ -> failwith "Failure!"

Yes, there are some hard-coded paths in there. We could fix that easily enough, or get 0install to set them for you. The dummy reference to Callback.register is to ensure Callback gets linked (it's a dependency of Unix, but Dynlink doesn't handle dependencies).

I also had to modify the Dll module to use the correct extension (.so or .dll) based on the current platform. The original version used whatever extension was correct for the platform where the code was compiled.
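
The change amounts to choosing the extension for the platform that is running the bytecode rather than the one baked in at compile time; a sketch of the idea (not the actual patch) is just:

let dll_ext = if Sys.os_type = "Win32" then ".dll" else ".so"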

With that, it's now working: code compiled on Linux runs on Windows and vice versa!

Timings

All approaches are reasonably fast (faster than Python, anyway, and this use isn't speed critical):

Time Method
7 ms Static native code (not portable)
10 ms Static bytecode (not portable)
11 ms Bytecode using Dynlink
20 ms Using #load with ocaml
26 ms Python 2
60 ms Python 3

On balance, I think we should go for the #load trick for now. It's a bit less efficient than using Dynlink, but it doesn't require any modifications to the OCaml libraries and it handles recursive dependencies. Also, it doesn't require any changes to our code.

The times above are for the "gpg --version" test script. This shell script and launcher can be used to test the actual 0install OCaml code:

launch.sh
ocaml `ocamlfind query -r -i-format yojson xmlm` ./launch.ml "$@"
launch.ml
#load "easy_format.cmo";;
#load "biniou.cma";;
#load "yojson_biniou.cmo";;
#load "yojson.cmo";;
#load "xmlm.cmo";;
#load "str.cma";;
#load "unix.cma";;
#load "main.cma";;

Times are around 40 ms, compared to 10 ms for static bytecode and 5 ms for native code. We should be able to get 0install to pass the -I flags itself, if we want to avoid calling ocamlfind and using a shell script.
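
Here's a hypothetical sketch of how 0install could build that command line itself; the paths are invented for illustration and would really come from 0install's own dependency resolution:

(* Hypothetical: directories that would normally be found via ocamlfind. *)
let include_dirs = ["/usr/lib/ocaml/yojson"; "/usr/lib/ocaml/xmlm"]

(* Builds ["ocaml"; "-I"; dir1; "-I"; dir2; ...; "launch.ml"; extra args...] *)
let command extra_args =
  "ocaml"
  :: List.concat (List.map (fun dir -> ["-I"; dir]) include_dirs)
  @ ["/usr/lib/0install/launch.ml"]
  @ extra_args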

Conclusions

I think that dynamically linking the Unix module, as described above, is sufficient for the next step in converting 0install: we should be able to ship cross-platform bytecode that statically links all libraries except Unix and which works everywhere. It will run with an unmodified ocaml and unix.cma, provided the runtime versions exactly match the compile-time ones. Essentially, that means that we ship binaries of ocaml through 0install and just stick with a single version for as long as possible. Fixing that will have to wait for later.

Update (2013-07-14)

Using #load isn't safe. When you do ocaml /path/to/script.ml, it adds the current directory (not the directory containing the script) to the start of the search path. Thus:

$ cd /tmp
$ /usr/bin/myprog

will first try to load myprog's libraries (e.g. unix.cma) from /tmp!

Looks like we will need to do something with Dynlink after all.