I’ve now written 15,000 lines of OCaml while migrating 0install to the language. So here’s another “things I’ve learned” post…
The official objects tutorial offers a good introduction to using objects in OCaml, but it doesn’t explain a number of important issues. Chapter 3 of the OCaml manual does explain everything, but I had to read it a few times to get it.
The manual notes that:
the relation between object, class and type in OCaml is very different from that in mainstream object-oriented languages like Java or C++, so that you should not assume that similar keywords mean the same thing.
Good advice. Coming from a Python/Java background, here are some surprising things about objects in OCaml:
- An object’s type is not the same as its class.
- A class A can inherit from B without being a subclass.
- A class A can be a subclass of B without inheriting from it.
- You don’t need to use classes to create objects.
I’m going to try explaining things in the opposite order to the official tutorial, starting with objects and adding classes later, as I think it’s clearer that way. Classes introduce a number of complications which are not present without them.
Table of Contents
- Creating objects
- Object types
- Creating many objects
In Python, you create a single object by first defining a class and then creating an instance of it. In OCaml, you can just create an instance directly (Java can do this with anonymous classes).
For example, 0install code that interacts with the system (e.g. getting the current time, reading files, etc) does so by calling methods on a `system` object. For unit-testing, we pass mock system objects, while when running normally we pass a singleton object which interacts with the real system. We can define the singleton like this (simplified):
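A minimal sketch of such a singleton, assuming just the two methods (`time` and `exit`) discussed in this section. `Sys.time` stands in for `Unix.time` here only so that the sketch needs no extra library:

```ocaml
(* An object created directly, with no class involved. *)
let real_system =
  object
    (* Assumption: Sys.time (processor time) replaces Unix.time so that
       the sketch compiles without linking the unix library. *)
    method time = Sys.time ()

    (* This calls the built-in exit function, not the method itself. *)
    method exit code = exit code
  end

let () = Printf.printf "time: %f\n" real_system#time
```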
Note that the `exit` method is calling the built-in `exit` function, not recursively calling itself. Calling a method has to be explicit, as in Python.
To call a method, OCaml uses `#` rather than `.`.
Initially, I defined time as `method time () = Unix.time ()`, but this isn’t necessary. Unlike for regular function definitions, the body of a method is evaluated each time it is called, even if it takes no arguments, not once when the object is created.
OCaml will automatically infer the type of `real_system` as:
< exit : int -> 'a; time : float >
`exit` never returns, so it can be used anywhere, which is why it gets the generic return type `'a`.
This is not a class (nor even a class type). It’s just a type.
Any object providing these two methods will be compatible with `real_system`. There is no need to declare that you implement the interface.
You also don’t need to declare the type when using the object. For example:
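As a sketch of what this looks like (the names `elapsed_since` and `fake_system` are hypothetical, and `Sys.time` stands in for `Unix.time`), a function can call methods on its argument without any annotation, and any object with a suitable `time` method is accepted:

```ocaml
let real_system =
  object
    method time = Sys.time ()
    method exit code = exit code
  end

(* A mock with only the method we need; nothing declares an interface. *)
let fake_system =
  object
    method time = 42.0
  end

(* No annotation: OCaml infers that the argument must be an object
   with (at least) a time method returning a float. *)
let elapsed_since system start = system#time -. start

let () =
  Printf.printf "real: %f, fake: %f\n"
    (elapsed_since real_system 0.0)
    (elapsed_since fake_system 40.0)
```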
However, the automatic inference will often fail. In particular, if a method is defined with optional arguments then it will be incompatible:
Error: This expression has type < exit : ?code:int -> string -> 'a >
       but an expression was expected of type < exit : string -> 'b; .. >
       Types for method exit are incompatible
In a similar way, using labelled arguments will fail unless you use them in the same order everywhere. To avoid these problems, it seems best to define the type explicitly:
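A sketch of an explicit definition, assuming an `exit` method with an optional `?code` argument like the one in the error above. The function `check_time` is hypothetical, and `Sys.time` stands in for `Unix.time`. The polymorphic method is annotated in full on the object too, since an immediate object may not pick that up from the constraint on self:

```ocaml
(* The 'a. prefix makes exit polymorphic in its return type,
   since it never returns. *)
class type system =
  object
    method exit : 'a. ?code:int -> string -> 'a
    method time : float
  end

let real_system =
  object (_self : system)
    method exit : 'a. ?code:int -> string -> 'a =
      fun ?(code = 1) msg -> prerr_endline msg; exit code

    method time = Sys.time ()
  end

(* With the argument annotated, OCaml knows about ?code and lets
   callers leave it out. *)
let check_time (system : system) =
  if system#time < 0.0 then system#exit "clock went backwards"

let () = check_time real_system
```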
As in Python, `self` is explicit. However, it’s attached to the object rather than to each method, and you can leave it out if you don’t need it. I added it here in order to constrain its type to `system`. I used `_self` rather than `self` to avoid the compiler warning about unused variables.
It seems to me that some object types can be inferred but not defined. Consider this interactive session:
However, we can’t actually use the type it prints:
You can define this type:
But that’s a different (and less useful) type:
I’m not sure what causes these problems. You can, however, use the cast operator (`:>`) to convert to the required type if it happens.
`x` does have a type, but it’s polymorphic: `'a lengther`. OCaml has cleverly noticed that I don’t actually store any `'a` values in the object, so it allows this single object to handle multiple types. For most objects, this will not be the case (for example, a mutable stack object could be an `int stack` or a `string stack`, but not both). For details, see my later post Polymorphism for beginners.
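A small sketch of this behaviour. The type name `lengther` comes from the text above; the method name `len` and the body of the object are assumptions:

```ocaml
type 'a lengther = < len : 'a list -> int >

(* The object has methods only and stores no 'a values, so the
   inferred type < len : 'a list -> int > is generalised. *)
let x = object method len xs = List.length xs end

let () =
  (* The same object is used at two different element types. *)
  Printf.printf "%d %d\n"
    ((x : int lengther)#len [1; 2; 3])
    ((x : string lengther)#len ["a"; "b"])
```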
Creating many objects
Usually, you’ll want to create many objects, sharing the same code. For example, when one 0install program depends on another, it may specify restrictions on the acceptable versions. Here’s how we make `version_restriction` objects to represent this (simplified):
This is not a class. It’s just a function that creates objects. It’s used like this:
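A minimal sketch of such a constructor function and a call to it. The version handling is a hypothetical stand-in (an expression is just a minimum version number written as a string), not 0install’s real parser:

```ocaml
let make_version_restriction expr =
  (* Assumption: expr is a plain integer such as "3". *)
  let minimum = int_of_string expr in
  (* test is only in scope inside this function. *)
  let test version = version >= minimum in
  object
    method meets_restriction version = test version
    method to_string = "version >= " ^ expr
  end

let () =
  let restriction = make_version_restriction "3" in
  Printf.printf "%s: %b\n"
    restriction#to_string
    (restriction#meets_restriction 5)
```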
Note the `test` variable, which is like a private field in Java. It cannot be used from anywhere else, simply because it is not in scope. You can define functions here in the same way. OCaml does not allow accessing an object’s fields from outside (e.g. `restriction.expr` in Java or Python), but you can make a field readable by writing a trivial getter for it. e.g. to expose `expr`:
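A sketch of such a getter, again using a hypothetical integer-only version check:

```ocaml
let make_version_restriction expr =
  let minimum = int_of_string expr in
  object
    method meets_restriction version = version >= minimum
    method to_string = "version >= " ^ expr

    (* Trivial getter: the only way for outside code to read expr. *)
    method expr = expr
  end

let () = print_endline (make_version_restriction "3")#expr
```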
You can cast to a compatible (more restricted) type using `:>`:
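For instance, assuming a `printable` type with a single `to_string` method (the object itself is a hypothetical example):

```ocaml
type printable = < to_string : string >

let restriction =
  object
    method meets_restriction version = version >= 3
    method to_string = "version >= 3"
  end

(* The cast hides meets_restriction; only to_string remains visible. *)
let p = (restriction :> printable)

let () = print_endline p#to_string
(* p#meets_restriction 5 would now be rejected at compile time. *)
```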
However, OCaml does not store the type information at runtime, so you cannot cast in the other direction. That is, given a `printable` object, you cannot find out whether it really has a `meets_restriction` method. This doesn’t seem to be a problem, since the places where I wanted to check for several possibilities were better handled with variants.
OK, so we can create objects with public methods, constructors, internal functions and state, and define types (interfaces). So what are classes for? The key seems to be this: Classes are all about (implementation) inheritance. If you don’t need inheritance, then you don’t need classes.
Converting `make_version_restriction` to a class would look like this:
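A sketch of the class form, with the same hypothetical integer-only version check standing in for 0install’s real one:

```ocaml
class version_restriction expr =
  (* A class body is a series of let declarations followed by an object. *)
  let minimum = int_of_string expr in
  let test version = version >= minimum in
  object
    method meets_restriction version = test version
    method to_string = "version >= " ^ expr
  end

let () =
  let restriction = new version_restriction "3" in
  print_endline restriction#to_string
```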
We just changed the `let` to `class`, and instances are now created with `new version_restriction` (in fact, there are some syntax restrictions when defining classes: a class body is a series of `let` declarations followed by an object, whereas a function body is an arbitrary expression).
When you define a class (e.g. `version_restriction`), OCaml automatically defines three other things:
- a class type (also called `version_restriction`)
- an object type (also called `version_restriction`), defining the public methods
- an object type for subtypes (`#version_restriction`)
The object type just defines the public methods provided by instances of the class. The class type also defines the API the class provides to its subclasses. Confusingly, OCaml calls this the “private” API (Java uses the term “protected” for this).
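A sketch showing where each of the three names can appear. The class body and the names `describe`, `describe_any`, `restriction_api` and `extended` are hypothetical:

```ocaml
class version_restriction (expr : string) =
  object
    method to_string = "version " ^ expr
  end

(* Class type: the name used where a class type is expected. *)
class type restriction_api = version_restriction

(* Object type: exactly the public methods of the class. *)
let describe (r : version_restriction) = r#to_string

(* #version_restriction: any object with at least those methods. *)
let describe_any (r : #version_restriction) = r#to_string

let extended =
  object
    inherit version_restriction ">= 3"
    method meets_restriction version = version >= 3
  end

let () =
  print_endline (describe (new version_restriction "1"));
  (* extended has an extra method, so describe needs an explicit cast. *)
  print_endline (describe (extended :> version_restriction));
  print_endline (describe_any extended)
```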
You can use `method private` to declare a method that is only available to subclasses, and `val` to declare fields (fields are always private). Methods can be declared as `virtual` if they must be defined in subclasses (this is like `abstract` in Java). A class with virtual methods must itself be virtual.
To inherit from a class, use:
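As a sketch of how these pieces fit together (the `restriction` and `min_version` classes are hypothetical, not 0install’s):

```ocaml
class virtual restriction =
  object (self)
    (* Virtual methods must be defined by subclasses. *)
    method virtual meets_restriction : int -> bool
    method virtual to_string : string

    (* Private: callable from this class and its subclasses only. *)
    method private prefix = "restriction: "

    method describe = self#prefix ^ self#to_string
  end

class min_version (minimum : int) =
  object
    inherit restriction as super

    (* A field; it is never visible from outside the object. *)
    val min_allowed = minimum

    method meets_restriction version = version >= min_allowed
    method to_string = "version >= " ^ string_of_int min_allowed

    (* Override a method and reuse the parent's implementation. *)
    method describe = "[" ^ super#describe ^ "]"
  end

let () = print_endline (new min_version 2)#describe
```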
Here’s an example from 0install: a `distribution` object provides access to the platform-specific package manager, allowing 0install to query the native package database for additional candidates. Each distribution subclasses the base class. Here’s my first (wrong) attempt to do this with classes (simplified):
The Python code in 0install maintains a cache of the dpkg database for quick access. The OCaml can query this cache, but can’t (currently) update it, so if the cache is out-of-date then it must fall back to the Python code.
This code doesn’t compile:
("super" in "super#is_installed package") Error: This expression has no method is_installed
If you’re used to other languages, you may have assumed, like me, that `class python_fallback_distribution : distribution` means that the class inherits from `distribution`. It doesn’t. It means that the class type of `python_fallback_distribution` is identical to that of `distribution`. Code inheriting from `python_fallback_distribution` therefore can’t see the `is_installed` method, since it was virtual in `distribution`.
The solution here is simple: remove the `: distribution` bits.
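A sketch of the class hierarchy without the constraints. The method bodies are hypothetical stand-ins: a fixed package name replaces the call into the Python code, and an association list replaces the dpkg cache:

```ocaml
class virtual distribution =
  object
    method virtual is_installed : string -> bool
  end

(* Hypothetical stand-in for the fallback to the Python code. *)
class python_fallback_distribution =
  object
    inherit distribution
    method is_installed package = (package = "python-gobject")
  end

(* With no ": distribution" constraint on the parent, super keeps its
   concrete is_installed method. *)
class debian_distribution (cache : (string * bool) list) =
  object
    inherit python_fallback_distribution as super

    method is_installed package =
      try List.assoc package cache
      with Not_found -> super#is_installed package
  end

let () =
  let debian = new debian_distribution [("gcc", true)] in
  Printf.printf "%b %b\n"
    (debian#is_installed "gcc")
    (debian#is_installed "python-gobject")
```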
In fact, we don’t need a class for `debian_distribution` at all: a simple object would do (we can still inherit, we just can’t let others inherit from us):
Notice that we declare the type of the object as `#distribution`, ensuring that this is a subtype of it. For a plain object (like this), we could also use just `distribution`, which would prevent us from adding any extra methods. When defining a class, you’d get an error if you did that, because restricting the type to `distribution` would prevent subclassing in some cases (e.g. adding additional methods). For some reason, if you don’t declare a type at all then it defaults to something strange that sometimes causes confusing errors at compile time.
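A sketch of a plain object that inherits from a class and constrains its own type to `#distribution`. The method bodies and the extra `name` method are hypothetical:

```ocaml
class virtual distribution =
  object
    method virtual is_installed : string -> bool
  end

class python_fallback_distribution =
  object
    inherit distribution
    method is_installed (_ : string) = false
  end

let debian_distribution =
  object (_ : #distribution)
    inherit python_fallback_distribution as super

    method is_installed package =
      package = "dpkg" || super#is_installed package

    (* An extra method is fine because the constraint is #distribution. *)
    method name = "Debian"
  end

let () =
  Printf.printf "%s: %b\n"
    debian_distribution#name
    (debian_distribution#is_installed "dpkg")
```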
Problems with classes
Using classes causes a few extra problems. For example, this object
< classify : int -> [> `positive | `zero ] >
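A sketch of an object with this inferred type (the failure message and the `describe` function are assumptions). It also passes the object to code expecting a wider, closed variant type, which the open `[> ...]` return type permits:

```ocaml
let nat_classifier =
  object
    method classify x =
      if x = 0 then `zero
      else if x > 0 then `positive
      else failwith "Not a natural number!"
  end

(* A consumer written against a wider set of tags. *)
let describe (c : < classify : int -> [ `positive | `negative | `zero ] >) n =
  match c#classify n with
  | `positive -> "positive"
  | `negative -> "negative"
  | `zero -> "zero"

let () = print_endline (describe nat_classifier 3)
```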
However, if you try to turn it into a class, you get:
Error: Some type variables are unbound in this type:
         class nat_classifier :
           object method classify : int -> [> `positive | `zero ] end
       The method classify has type int -> ([> `positive | `zero ] as 'a)
       where 'a is unbound
OCaml can see that this method only returns `positive` or `zero`, but that may be too restrictive for subclasses. e.g. an `int_classifier` subclass may wish to return `positive`, `negative` or `zero`. So you’ll need to declare the types explicitly in these cases.
Update: Sorry, the above is nonsense (as pointed out in the comments). You’ll get the same error if you just try to name the type:
# type t = < classify : int -> [> `positive | `zero ] >;;
Error: A type variable is unbound in this type declaration.
In method classify: int -> ([> `positive | `zero ] as 'a) the variable 'a is unbound
The type of the plain object is polymorphic (because it contains a `>`, which indicates a (hidden) type variable). This allows it to adapt in certain ways. For example: if you had some code that expected to be given the type [`positive | `negative | `zero] then our object would be compatible with that too (although it would never actually return `negative`, of course).
To fix it, we can either specify a closed (non-polymorphic) return type:
Or we can list the type variable explicitly (allowing it to remain polymorphic):
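Both fixes can be sketched as follows. The class name `open_nat_classifier` and the failure message are assumptions, used here only to keep the two variants apart:

```ocaml
(* Fix 1: a closed (non-polymorphic) return type. *)
class nat_classifier =
  object
    method classify x : [ `zero | `positive ] =
      if x = 0 then `zero
      else if x > 0 then `positive
      else failwith "Not a natural number!"
  end

(* Fix 2: list the type variable as a class parameter, so the
   return type stays open. *)
class ['a] open_nat_classifier =
  object
    method classify x : 'a =
      if x = 0 then `zero
      else if x > 0 then `positive
      else failwith "Not a natural number!"
  end

let () =
  Printf.printf "%b %b\n"
    ((new nat_classifier)#classify 0 = `zero)
    ((new open_nat_classifier)#classify 7 = `positive)
```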
A similar problem affects polymorphic methods. A plain object can have this type:
< read_with : (in_channel -> 'a) -> 'a >
(i.e. it passes the open file to the given callback function and returns whatever that returns)
But if you try to use a class, you’ll get:
Error: Some type variables are unbound in this type:
         class file :
           object method read_with : (in_channel -> 'a) -> 'a end
       The method read_with has type (in_channel -> 'a) -> 'a
       where 'a is unbound
Again, you need to give the type explicitly in this case. Here, we probably want to use “universal quantification” to make the class non-polymorphic:
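A sketch of a `file` class with a universally quantified method. The constructor argument `path` and the use of `Fun.protect` (available from OCaml 4.08) to close the channel are assumptions:

```ocaml
class file path =
  object
    (* The "'a." makes the method itself polymorphic, so the class
       needs no type parameter. *)
    method read_with : 'a. (in_channel -> 'a) -> 'a =
      fun fn ->
        let ch = open_in path in
        Fun.protect ~finally:(fun () -> close_in ch) (fun () -> fn ch)
  end

let () =
  let path = Filename.temp_file "objects" ".txt" in
  let out = open_out path in
  output_string out "hello\n";
  close_out out;
  let f = new file path in
  (* One object serves callbacks with different result types. *)
  let first_line = f#read_with input_line in
  let size = f#read_with in_channel_length in
  Printf.printf "%s (%d bytes)\n" first_line size;
  Sys.remove path
```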
The answer to Stack Overflow’s When should objects be used in OCaml? starts:
As a general rule of thumb, don’t use objects.
Indeed, the OCaml standard library doesn’t appear to use objects at all.
However, they can be quite useful. In 0install, we use them to abstract over different kinds of restriction (version restrictions, OS restrictions, distribution restrictions), different platform package managers (Arch, Debian, OS X, Windows, etc), and to control access to the system, using `dryrun_system` (which wraps a system, forwarding read operations but just logging writes, for `--dry-run` mode), and `fake_system` for unit-testing.
The main things to remember are that:
- You often need to declare types explicitly, as the automatic type inference often can’t infer the type, or infers an incompatible type.
- Classes and class types are about inheritance (the API exposed to subclasses), while object types are about the public API.
There are still some things I’m not sure about:
- Is there any disadvantage to using plain objects rather than classes (when inheritance isn’t needed)? Is it considered good style to use classes everywhere, as the tutorial does?
- When declaring argument types, whether to use `(system:system)` (I need a `system` object) or `(system:#system)` (the type of objects from subclasses of `system`). In general, I don’t understand why we need separate types for these concepts.