I've now written 15,000 lines of OCaml while migrating 0install to the language. So here's another "things I've learned" post...
The official objects tutorial offers a good introduction to using objects in OCaml, but it doesn't explain a number of important issues. Chapter 3 of the OCaml manual does explain everything, but I had to read it a few times to get it.
The manual notes that:
the relation between object, class and type in OCaml is very different from that in mainstream object-oriented languages like Java or C++, so that you should not assume that similar keywords mean the same thing.
Good advice. Coming from a Python/Java background, here are some surprising things about objects in OCaml:
- An object's type is not the same as its class.
- A class A can inherit from B without being a subclass.
- A class A can be a subclass of B without inheriting from it.
- You don't need to use classes to create objects.
I'm going to try explaining things in the opposite order to the official tutorial, starting with objects and adding classes later, as I think it's clearer that way. Classes introduce a number of complications which are not present without them.
Creating objects
In Python, you create a single object by first defining a class and then creating an instance of it. In OCaml, you can just create an instance directly (Java can do this with anonymous classes).
For example, 0install code that interacts with the system (e.g. getting the current time, reading files, etc) does so by calling methods on a system object. For unit-testing, we pass mock system objects, while when running normally we pass a singleton object which interacts with the real system. We can define the singleton like this (simplified):
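The listing below is a minimal sketch of such a singleton, not 0install's actual code. It assumes only the two methods discussed in this post, and it uses Sys.time (CPU time, from the standard library) as a stand-in for Unix.time so that it builds without linking the unix library.

```ocaml
(* Sketch only: Sys.time stands in for Unix.time (an assumption made so
   that no extra library is needed). *)
let real_system = object
  method time = Sys.time ()
  method exit code = exit code   (* the built-in exit, not this method *)
end
```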
Note that the exit method is calling the built-in exit function, not recursively calling itself. Calling a method has to be explicit, as in Python.
To call a method, OCaml uses # rather than .:
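A small sketch of the call syntax, using a hypothetical counter object (not part of 0install) so that the effect of each call is visible:

```ocaml
(* Hypothetical counter, used only to illustrate object#method *)
let counter = object
  val mutable count = 0
  method next = (count <- count + 1; count)
end

let first = counter#next    (* method call: no parentheses needed *)
let second = counter#next   (* the method body runs again here *)
```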
Initially, I defined time as method time () = Unix.time (), but this isn't necessary. Unlike the right-hand side of an ordinary let binding, the body of a method is evaluated each time the method is called, even if it takes no arguments, rather than once when the object is created.
Object types
OCaml will automatically infer the type of real_system as:
< exit : int -> 'a; time : float >
(note: exit never returns, so it can be used anywhere, which is why it gets the generic return type 'a)
This is not a class (nor even a class type). It's just a type.
Any object providing these two methods will be compatible with real_system. There is no need to declare that you implement the interface.
You also don't need to declare the type when using the object. For example:
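As a sketch of what this looks like in practice, the hypothetical function seconds_since below never names an object type. OCaml infers that its argument needs a time method, so an unrelated mock object is accepted too.

```ocaml
(* Hypothetical helper; inferred as < time : float; .. > -> float -> float *)
let seconds_since system start = system#time -. start

(* A mock that was never declared to implement anything *)
let mock_system = object
  method time = 10.0
end

let elapsed = seconds_since mock_system 4.0
```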
However, the automatic inference will often fail. In particular, if a method is defined with optional arguments then it will be incompatible:
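A sketch that reproduces this kind of mismatch (the names fail and mock are illustrative). From the call site alone, OCaml infers a method type without the optional argument, so an object whose exit takes ?code is rejected with an error like the one quoted next. The offending call is left commented out so that the rest compiles.

```ocaml
let real_system = object
  method exit ?(code = 1) msg = (prerr_endline msg; exit code)
end

(* From this use alone, [system] is inferred as < exit : string -> 'a; .. > *)
let fail system = system#exit "Failed"

(* let () = fail real_system    <- rejected: types for method exit differ *)

(* An object whose method has exactly the inferred shape is accepted *)
let mock = object
  method exit msg = String.length msg
end

let n = fail mock
```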
Error: This expression has type < exit : ?code:int -> string -> 'a >
but an expression was expected of type < exit : string -> 'b; .. >
Types for method exit are incompatible
In a similar way, using labelled arguments will fail unless you use them in the same order everywhere. To avoid these problems, it seems best to define the type explicitly:
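One way to write such a declaration, as a sketch. The method set is an assumption (time plus an exit taking ?code), and fake_system raises a hypothetical Exit_called exception instead of exiting, so the example can be run safely.

```ocaml
(* Assumed interface; 'a. makes exit usable at any return type *)
type system = <
  time : float;
  exit : 'a. ?code:int -> string -> 'a
>

exception Exit_called of int   (* hypothetical, for testing only *)

let fake_system = object (_self : system)
  method time = 0.0
  method exit ?(code = 1) _msg = raise (Exit_called code)
end

(* The annotation tells OCaml about ?code, so it can be omitted here *)
let fail (system : system) = system#exit "Failed"
```

With the type written down, both the object and its users are checked against the same interface, so optional and labelled arguments no longer depend on what each call site happens to mention.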
As in Python, self is explicit. However, it's attached to the object rather than to each method, and you can leave it out if you don't need it. I added it here in order to constrain its type to system. I used _self rather than self to avoid the compiler warning about unused variables.
A puzzle
It seems to me that some object types can be inferred but not defined. Consider this interactive session:
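The session presumably defined something like the following object, whose len method works on lists of any element type. The names x and len are taken from the update further down; the body is an assumption.

```ocaml
(* Assumed definition: inferred as < len : 'a list -> int > *)
let x = object
  method len xs = List.length xs
end
```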
However, we can't actually use the type it prints:
You can define this type:
But that's a different (and less useful) type:
I'm not sure what causes these problems. You can, however, use the cast operator (:>) to convert to the required type if it happens.
Update: x does have a type, but it's polymorphic: 'a lengther. OCaml has cleverly noticed that I don't actually store any 'a values in the object, so it allows this single object to handle multiple types. For most objects, this will not be the case (for example, a mutable stack object could be an int stack or a string stack, but not both). For details, see my later post Polymorphism for beginners.
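A short sketch of what that polymorphism allows, assuming x is defined with a single len method as described above. The same object is used at two different element types.

```ocaml
type 'a lengther = < len : 'a list -> int >

(* Assumed definition of x; it stores no 'a values *)
let x = object
  method len xs = List.length xs
end

(* One object, two instantiations of 'a lengther *)
let n_ints = (x : int lengther)#len [1; 2; 3]
let n_strings = (x : string lengther)#len ["a"; "b"]
```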
Creating many objects
Usually, you'll want to create many objects, sharing the same code. For example, when one 0install program depends on another, it may specify restrictions on the acceptable versions. Here's how we make version_restriction objects to represent this (simplified):
This is not a class. It's just a function that creates objects. It's used like this:
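A sketch of such a constructor function and a call to it. The real code parses 0install version expressions; here parse_expr is a hypothetical stand-in that only understands ">= N" on integer versions.

```ocaml
(* Hypothetical stand-in for the real parser: accepts only ">= N" *)
let parse_expr expr =
  Scanf.sscanf expr ">= %d" (fun min version -> version >= min)

let make_version_restriction expr =
  let test = parse_expr expr in   (* in scope only inside this function *)
  object
    method meets_restriction version = test version
    method to_string = "version " ^ expr
  end

(* Each call creates a new object sharing the same code *)
let restriction = make_version_restriction ">= 3"
let ok = restriction#meets_restriction 5
```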
Notice the test variable, which is like a private field in Java. It cannot be used from anywhere else, simply because it is not in scope. You can define functions here in the same way. OCaml does not allow accessing an object's fields from outside (e.g. restriction.expr in Java or Python), but you can make a field readable by writing a trivial getter for it. e.g. to expose expr:
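A minimal sketch of such a getter, with the rest of the object reduced to a single to_string method for brevity:

```ocaml
let make_version_restriction expr =
  object
    method to_string = "version " ^ expr
    method expr = expr            (* trivial getter *)
  end

let restriction = make_version_restriction ">= 3"
let shown = restriction#expr      (* restriction.expr would not compile *)
```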
Casting
You can cast to a compatible (more restricted) type using :>. e.g.
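A sketch of an upcast, assuming a printable type with a single to_string method. The objects themselves are illustrative.

```ocaml
type printable = < to_string : string >

let restriction = object
  method meets_restriction version = version >= 3
  method to_string = "version >= 3"
end

(* Upcast: the result only exposes to_string *)
let p = (restriction :> printable)

(* Handy for putting different kinds of object in one list *)
let other = object method to_string = "os = Linux" end
let all : printable list = [p; other]
let text = String.concat ", " (List.map (fun o -> o#to_string) all)
```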
However, OCaml does not store the type information at runtime, so you cannot cast in the other direction. That is, given a printable object, you cannot find out whether it really has a meets_restriction method. This doesn't seem to be a problem, since the places where I wanted to check for several possibilities were better handled with variants.
Classes
OK, so we can create objects with public methods, constructors, internal functions and state, and define types (interfaces). So what are classes for? The key seems to be this: Classes are all about (implementation) inheritance. If you don't need inheritance, then you don't need classes.
Changing make_version_restriction to a class would look like this:
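A sketch of the class form, again using a hypothetical parse_expr that only understands ">= N" on integer versions in place of the real parser:

```ocaml
(* Hypothetical stand-in for the real parser: accepts only ">= N" *)
let parse_expr expr =
  Scanf.sscanf expr ">= %d" (fun min version -> version >= min)

class version_restriction expr =
  let test = parse_expr expr in   (* let declarations, then the object *)
  object
    method meets_restriction version = test version
    method to_string = "version " ^ expr
  end

let restriction = new version_restriction ">= 3"
```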
We just changed the let to class and make_version_restriction to new version_restriction (in fact, there are some syntax restrictions when defining classes: a class body is a series of let declarations followed by an object, whereas a function body is an arbitrary expression).
When you define a class (e.g. version_restriction), OCaml automatically defines three other things:
- a class type (version_restriction)
- an object type (also called version_restriction), defining the public methods
- an object type for subtypes (#version_restriction)
The object type just defines the public methods provided by instances of the class. The class type also defines the API the class provides to its subclasses. Confusingly, OCaml calls this the "private" API (Java uses the term "protected" for this).
You can use method private to declare a method that is only available to subclasses, and val to declare fields (fields are always private). Methods can be declared as virtual if they must be defined in subclasses (this is like abstract in Java). A class with virtual methods must itself be virtual.
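The three keywords can be seen together in the sketch below. The classes restriction and os_restriction are hypothetical and only loosely modelled on 0install's restriction objects.

```ocaml
class virtual restriction =
  object (self)
    val prefix = "restriction: "             (* field: never visible outside *)
    method virtual to_string : string         (* must be defined by subclasses *)
    method private decorate s = prefix ^ s    (* callable from subclasses only *)
    method describe = self#decorate self#to_string
  end

class os_restriction os =
  object
    inherit restriction
    method to_string = "os = " ^ os
  end

let description = (new os_restriction "Linux")#describe
```

Calling decorate on the resulting object from outside would be a type error, because private methods are not part of the object type.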
Using inheritance
To inherit from a class, use:
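A minimal sketch of the inherit syntax, using hypothetical greeter classes. The "as super" part is optional and names the parent's implementation so that an overriding method can call it.

```ocaml
class greeter name =
  object
    method greet = "Hello, " ^ name
  end

class loud_greeter name =
  object
    inherit greeter name as super
    (* method! marks an intentional override *)
    method! greet = String.uppercase_ascii super#greet ^ "!"
  end

let greeting = (new loud_greeter "world")#greet
```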
Here's an example from 0install: a distribution object provides access to the platform-specific package manager, allowing 0install to query the native package database for additional candidates. Each distribution subclasses the base class. Here's my first (wrong) attempt to do this with classes (simplified):
The Python code in 0install maintains a cache of the dpkg database for quick access. The OCaml code can query this cache, but can't (currently) update it, so if the cache is out-of-date then it must fall back to the Python code.
This code doesn't compile:
("super" in "super#is_installed package")
Error: This expression has no method is_installed
If you're used to other languages, you may have assumed, like me, that class python_fallback_distribution : distribution means "python_fallback_distribution extends distribution". It doesn't. It means that the class type of python_fallback_distribution is identical to that of distribution. Therefore, python_distribution can't see the is_installed method, since it was virtual in distribution.
The solution here is simple: remove the : distribution bits.
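With the annotations removed, a compiling version might look like the sketch below. The cache is modelled as a hypothetical Hashtbl from package name to installed flag, and the Python fallback is replaced by a stub; neither is 0install's real implementation.

```ocaml
class virtual distribution =
  object
    method virtual is_installed : string -> bool
  end

(* No ": distribution" annotation. The body is a stub standing in for
   the call to the Python code. *)
class python_fallback_distribution =
  object
    method is_installed (package : string) = (package = "python")
  end

class debian_distribution (cache : (string, bool) Hashtbl.t) =
  object
    inherit python_fallback_distribution as super
    method! is_installed package =
      match Hashtbl.find_opt cache package with
      | Some installed -> installed
      | None -> super#is_installed package   (* now visible *)
  end

let cache = Hashtbl.create 8
let () = Hashtbl.replace cache "gcc" true

(* Still usable wherever a distribution is expected *)
let debian = (new debian_distribution cache :> distribution)
```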
In fact, we don't need a class for debian_distribution at all: a simple object would do (we can still inherit, we just can't let others inherit from us):
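A sketch of the plain-object variant, under the same assumptions as before: the cache is a hypothetical Hashtbl and the Python fallback is a stub. Here debian_distribution is an ordinary function returning an object that inherits from a class.

```ocaml
class virtual distribution =
  object
    method virtual is_installed : string -> bool
  end

(* Stub standing in for the call to the Python code *)
class python_fallback_distribution =
  object
    method is_installed (package : string) = (package = "python")
  end

let debian_distribution (cache : (string, bool) Hashtbl.t) =
  object (_ : #distribution)
    inherit python_fallback_distribution as super
    method! is_installed package =
      match Hashtbl.find_opt cache package with
      | Some installed -> installed
      | None -> super#is_installed package
  end

let cache = Hashtbl.create 8
let () = Hashtbl.replace cache "gcc" true
let debian = debian_distribution cache
```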
Notice that we declare the type of the object as #distribution, ensuring that this is a subtype of it. For a plain object (like this), we could also use just distribution, which would prevent us from adding any extra methods. When defining a class, you'd get an error if you did that, because restricting the type to distribution would prevent subclassing in some cases (e.g. adding additional methods). For some reason, if you don't declare a type at all then it defaults to something strange that sometimes causes confusing errors at compile time.
Problems with classes
Using classes causes a few extra problems. For example, this object
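An object of roughly this shape, as a sketch (the handling of negative numbers is an assumption):

```ocaml
let nat_classifier = object
  method classify x =
    if x = 0 then `zero
    else if x > 0 then `positive
    else invalid_arg "classify: negative number"
end

let c = nat_classifier#classify 3
```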
has type
< classify : int -> [> `positive | `zero ] >
However, if you try to turn it into a class, you get:
Error: Some type variables are unbound in this type:
class nat_classifier :
object method classify : int -> [> `positive | `zero ] end
The method classify has type int -> ([> `positive | `zero ] as 'a)
where 'a is unbound
OCaml can see that this method only returns `positive` or `zero`, but that may be too restrictive for subclasses. e.g. an `int_classifier` subclass may wish to return `positive`, `negative` or `zero`. So you'll need to declare the types explicitly in these cases.
Update: Sorry, the above is nonsense (as pointed out in the comments). You'll get the same error if you just try to name the type:
# type t = < classify : int -> [> `positive | `zero ] >;;
Error: A type variable is unbound in this type declaration.
In method classify: int -> ([> `positive | `zero ] as 'a)
the variable 'a is unbound
The type of the plain object is polymorphic (because it contains a >, which indicates a (hidden) type variable). This allows it to adapt in certain ways. For example: if you had some code that expected to be given the type [`positive | `negative | `zero] then our object would be compatible with that too (although it would never actually return `negative, of course).
To fix it, we can either specify a closed (non-polymorphic) return type:
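A sketch of the closed variant: annotating the return type without > leaves no hidden type variable, so the class definition is accepted.

```ocaml
class nat_classifier =
  object
    method classify x : [ `positive | `zero ] =
      if x = 0 then `zero
      else if x > 0 then `positive
      else invalid_arg "classify: negative number"
  end

let c = (new nat_classifier)#classify 0
```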
Or we can list the type variable explicitly (allowing it to remain polymorphic):
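A sketch of the polymorphic variant: the hidden type variable becomes an explicit class parameter 'a, which each user of the class then picks. The wider type chosen for wide is only an illustration.

```ocaml
class ['a] nat_classifier =
  object
    method classify x : ([> `positive | `zero ] as 'a) =
      if x = 0 then `zero
      else if x > 0 then `positive
      else invalid_arg "classify: negative number"
  end

(* A user that also wants to talk about `negative picks a wider 'a *)
let wide : [ `positive | `negative | `zero ] nat_classifier =
  new nat_classifier
```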
Another example:
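A sketch of such an object. The temporary file and its contents are assumptions added so that the example can run anywhere; Fun.protect (OCaml 4.08 or later) closes the channel even if the callback raises.

```ocaml
(* Assumed setup: a scratch file to read from *)
let path = Filename.temp_file "ocaml_objects" ".txt"
let () =
  let oc = open_out path in
  output_string oc "hello\n";
  close_out oc

let file = object
  method read_with fn =
    let ch = open_in path in
    Fun.protect ~finally:(fun () -> close_in ch) (fun () -> fn ch)
end

(* The same object used with callbacks of two different result types *)
let first_line = file#read_with input_line
let size = file#read_with in_channel_length
```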
has type
< read_with : (in_channel -> 'a) -> 'a >
(i.e. it passes the open file to the given callback function and returns whatever that returns)
But if you try to use a class, you'll get:
Error: Some type variables are unbound in this type:
class file : object method read_with : (in_channel -> 'a) -> 'a end
The method read_with has type (in_channel -> 'a) -> 'a where 'a is unbound
Again, you need to give the type explicitly in this case. Here, we probably want to use "universal quantification" to make the class non-polymorphic:
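A sketch of the class form with an explicitly quantified method type. The 'a. prefix says that read_with works for every 'a, so the class itself needs no type parameter. As above, the temporary file is an assumption made so that the example is runnable.

```ocaml
class file path =
  object
    method read_with : 'a. (in_channel -> 'a) -> 'a = fun fn ->
      let ch = open_in path in
      Fun.protect ~finally:(fun () -> close_in ch) (fun () -> fn ch)
  end

(* Assumed setup: a scratch file to read from *)
let path = Filename.temp_file "ocaml_objects" ".txt"
let () =
  let oc = open_out path in
  output_string oc "hello\n";
  close_out oc

let f = new file path
let first_line = f#read_with input_line        (* 'a = string *)
let size = f#read_with in_channel_length       (* 'a = int *)
```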
Conclusions
The answer to Stack Overflow's When should objects be used in OCaml? starts:
As a general rule of thumb, don't use objects.
Indeed, the OCaml standard library doesn't appear to use objects at all.
However, they can be quite useful. In 0install, we use them to abstract over different kinds of restriction (version restrictions, OS restrictions, distribution restrictions), different platform package managers (Arch, Debian, OS X, Windows, etc), and to control access to the system, using real_system, dryrun_system (which wraps a system, forwarding read operations but just logging writes, for --dry-run mode), and fake_system for unit-testing.
The main things to remember are that:
- You often need to declare types explicitly, as the automatic type inference often can't infer the type, or infers an incompatible type.
- Classes and class types are about inheritance (the API exposed to subclasses), while object types are about the public API.
There are still some things I'm not sure about:
- Is there any disadvantage to using plain objects rather than classes (when inheritance isn't needed)? Is it considered good style to use classes everywhere, as the tutorial does?
- When declaring argument types, whether to use (system:system) (I need a system object) or (system:#system) (the type of objects from subclasses of system). In general, I don't understand why we need separate types for these concepts.