OCaml makes heavy use of parametric polymorphism (which you may also know as "generics" in other languages). The OCaml tutorials mention it from time to time, but the information is spread about over many articles and they don't go into much detail. I'm not a type theorist, just a Python/Java/C programmer who finds this stuff interesting. I wanted to write this guide while I still remember the things that confused me. I know several OCaml experts keep an eye on this blog, so hopefully any inaccuracies will be corrected in the comments.
Table of Contents
( This post is part of a series in which I am converting 0install from Python to OCaml, learning OCaml as I go. )
Subtyping
The first thing that confused me was that OCaml tends to use parametric polymorphism where other languages use subtyping ("subtype polymorphism"), so I'll start with a brief summary of subtyping and then show how OCaml uses parametric polymorphism to achieve similar ends.
Note: in the rest of this article I will always use "polymorphism" to mean "parametric polymorphism", which is the way the OCaml documentation uses it.
When you think of object oriented programming, you probably think of the types arranged in a tree. In fact, with multiple-inheritance of interfaces, the types form a lattice, which is easier to draw than to explain:
On the left, we have some example primitive types, int
and unit
.
In the middle, we have some GUI object types. widget
represents "things that can appear on the screen". button
and window
are types of widget
and a dialog
is a type of window
. For example, any function that can operate on windows can also operate on dialogs. A button
is both a widget
and an action
. In Java terms, we might write class Button implements Widget, Action
.
On the right, we have some variant types (enums). The type "yes or no" is a sub-type of "yes, no or maybe". For example, a function that can format a yes/no/maybe value as a string will also work on the simpler yes/no type.
The rule is that you can always safely cast a type to a super type (going upwards). However, casting in Java and OCaml work differently:
In Java:
-
Upcasting (converting to a type higher up in the lattice) is automatic and implicit. e.g. in
Widget w = new Dialog();
-
Downcasting requires an explicit cast, and may throw an exception:
Dialog d = (Dialog) w
.
In OCaml:
-
Upcasting must be explicit, e.g.
let w : widget = (new dialog :> widget)
. -
Downcasting is impossible (types are not recorded at runtime, so it wouldn't be able to check).
Here's an example that might surprise you if you're expecting automatic upcasts:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Error: This expression has type dialog but an expression was expected of type window
The second object type has no method get_response
OCaml won't let you pass a dialog to close_window
because it's a dialog
, not a window
and there are no automatic coercions in OCaml. However, OCaml does know the subtyping relationship and will therefore let you cast it:
1 2 3 |
|
Functions also have types and can be cast too, so here's another way to solve this problem:
1 2 3 |
|
Here, we upcast close_window
to the type dialog -> unit
and then pass it a dialog
.
By the way, notice that dialog
is a subtype of window
but dialog -> unit
is a super-type of window -> unit
. See covariance and contravariance for the details about that.
Top and bottom
The types top and bottom don't seem to exist in OCaml, but I included them for completeness and because they're conceptually interesting and you'll probably run across these terms when reading anything about types.
Top is a super-type of everything, like object
in Python (or Object
in Java, if you ignore unboxed types). Everything is a top and, therefore, knowing that something is a top tells you nothing at all. Because OCaml doesn't allow downcasting, there's not much you could do with a top in OCaml anyway.
Bottom is a sub-type of everything. A value of type bottom can be used as an int, a widget, a yes/no enum, etc. Needless to say, instances of this type don't actually exist.
Although OCaml doesn't have a bottom type, it achieves the same effect with polymorphism. For example, the result of these expressions can be used as any type you like:
exit 1
(exits the program)failwith msg
(throws an exception)let rec loop () = loop () in loop ()
(infinite loop)Obj.magic x
(unsafe cast; program may segfault if you get it wrong!)
So, this code compiles fine, even though we use bot
as an int
and as a string
:
1 2 3 4 5 |
|
Polymorphism
If you treat OCaml like Java then things mostly work fine; you just end up doing a lot of explicit casting. However, where Java uses implicit upcasts, OCaml generally prefers using (parametric) polymorphism.
A polymorphic value doesn't have a single concrete type. Instead, it can take on many different types as needed.
For example, the OCaml function Queue.create
can create queues of ints, queues of strings, etc. Its type is unit -> 'a Queue.t
, where 'a
is a type variable. When you want to use this function, you can use any type (e.g. int
or string
) as the value of 'a
, to get a function that makes queues of ints or queues of strings, as needed.
Note that unlike e.g. C++ templates, using polymorphism does not create any extra code. There is only ever one Queue.create
function compiled into your binary, not one for each type you use. The same generic code works for queues of ints and queues of strings.
In Java, upcasting types is implicit, while using polymorphism (generics) requires extra annotations. In OCaml, it's the other way around. Polymorphism is implicit, while upcasting requires annotations. So, in Java:
1 2 3 |
|
The <String>
parts show where we convert a generic type (list of X) to a concrete type (list of String). In OCaml, this happens implicitly:
1 2 3 |
|
(Note: I used a made-up linked_list
class rather than Queue
to keep this example similar to the Java)
This saves a lot of typing, which is good, but it also makes it far harder to understand what's going on. The way I think of it, OCaml adds type variables at certain points in the code and then uses type inference to work out what they are. So in this case, we have (note: this is not valid OCaml syntax):
1 2 3 |
|
Then OCaml infers that type t = string
.
Polymorphic objects
The problem with our close_window
function in the subtyping section was that we gave it a fixed concrete type (window -> unit
):
1
|
|
If we give it a polymorphic type then there's no problem:
1 2 3 4 5 |
|
Here, #window
means any type with at least the methods of window
(not to be confused with w#close
, which is a method call, not a type). The syntax is a bit confusing because OCaml hides the type variable by default. If you wanted to declare the type of close_window_poly
explicitly, you'd have to make the type variable explicit using as
. For comparison, here are the types of the two versions:
1 2 |
|
Thus the type window closer
is the type of functions that close a window
, while dialog closer
is the type of functions that close a dialog
. OCaml will automatically apply the appropriate type at compile time.
As OCaml will infer polymorphic types automatically, we can define and call the function without any type annotations at all:
1 2 3 |
|
Here is yet more new syntax! OCaml didn't infer that this requires a #window
, only that it requires something with a suitable close
method. Also, it couldn't infer that close
must return unit
, so it left the return type generic as 'a
.
The ..
indicates more polymorphism, with another hidden type variable. If we wanted to define the type for this function, it would be:
1
|
|
This is the polymorphic type for functions which take objects of type 'b
, where 'b
includes a close
method that returns an 'a
, and return an 'a
.
For example, the original close_window
function has the type (unit, window) closer2
.
What all this means is that defining objects and functions without explicit types usually works fine, but if you later try to add type annotations (e.g. by declaring an interface for your module in an .mli
file) then you're likely to remove the polymorphism accidentally unless you're careful. If you don't understand what happened, you'll end up doing a load of explicit casting to make things work again.
A good way around this is to use ocamlc -i
to generate an initial .mli
file with all the inferred types, with all the polymorphism still there.
In another bit of inconsistent syntax, when defining a class type you need to put the type parameter in brackets, but you don't when declaring an object type:
1 2 3 4 5 6 7 8 9 10 |
|
(see my previous Experiences With OCaml Objects post for more on objects and classes)
Polymorphic variants
The situation with variants (enums) is similar. Let's start with some non-polymorphic code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Error: This expression has type [ `black | `blue ]
but an expression was expected of type [ `black | `blue | `red ]
The first variant type does not allow tag(s) `red
Again, if you're expecting subtyping behaviour then this is confusing. If we can draw a balloon blue, black or red, why can't we draw it blue or black? Again, we can use an upcast (which is checked by the compiler and is entirely type-safe):
1 2 3 |
|
As before, OCaml generally expects you to use polymorphism instead. There's more new syntax for this:
1 2 3 4 5 |
|
Here [< `blue | `black | `red ]
means all subtypes of [ `blue | `black | `red ]
. The <
introduces another hidden type variable. If you wanted to define a type for this function explicitly, you could do it with:
1
|
|
Instead of <
, there's also >
, which means the variant must have at least the given elements.
As before, just removing the type annotations and letting OCaml infer the polymorphic type is easy:
1 2 3 4 |
|
Example : a polymorphic dialog box
Here's a neat polymorphic dialog box (based on lablgtk's GTK bindings):
1 2 3 4 5 6 |
|
1 2 3 4 5 6 |
|
OCaml will automatically infer 'a
as [`close | `delete]
- a dialog with close and delete responses.
OCaml will force you to handle every response code, so you can't add a button but forget to handle it!
The constraint 'a = [> `close]
line forces you to handle the close
response in all cases (because the user could always just close the window).
For another example, see my earlier post Option Handling With OCaml Polymorphic Variants.
Other issues
Monomorphic types
There's a subtle but important distinction between polymorphic types (which can be used to generate many concrete types) and monomorphic types (which OCaml uses to mean a single type that is currently undecided). Monomorphic types only occur while OCaml is still working out the types, so you'll only see them in compiler error messages or in the interactive toplevel. They look like regular type variables but start with an underscore. e.g.
1 2 3 4 5 |
|
Here, x
has a polymorphic type (using 'a
). It can be used as an empty list of ints, or as an empty list of strings, or both. Every time you use x
, you get to pick a type for it.
y
has a monomorphic type (using '_a
). It can be used as a mutable container of ints, or of strings, but not both. As soon as OCaml sees it used with a concrete type, it will assign that type for it:
1 2 3 4 5 6 7 8 9 10 11 |
|
Notice that the second time we print y
, OCaml has worked out the concrete type.
Note: The term "weakly polymorphic" seems to be used as an alias for "monomorphic" in some OCaml documentation.
Partial application loses polymorphism
Partially-applied functions lose their polymorphism. Consider:
1 2 3 4 5 6 7 8 |
|
fprintf
is a polymorphic function. It takes an output channel, a format string (with a polymorphic type) and values of the appropriate type. p1
retains the polymorphic type.
However, p2
, which partially applies the function to stdout
, has a monomorphic type ('_a
). You could use p2
to print a string, or to print an int, but you couldn't use it twice to do both.
p3
, which just makes the format argument explicit, is polymorphic again!
1 2 3 4 5 6 7 8 9 10 11 12 |
|
So, what's going on here?
The OCaml FAQ explains what to do about it (use p3
), but doesn't explain why.
As fprintf
has a really complicated type, let's switch to a simpler example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
logged_id msg x
prints the message and then just returns x
.
The polymorphism is lost in i2
because OCaml doesn't know, from the type of logged_id
, whether partially applying it will create any mutable state. Consider:
1 2 3 4 5 6 7 |
|
When logged_mem
is partially applied, it creates a mutable cell and returns a function that remembers the first value it is called with and always returns that.
The result of logged_mem "called"
cannot be polymorphic: it must always be called with the same argument type. Yet logged_mem
has the same type as logged_id
(string -> 'a -> 'a
), so OCaml can't distinguish the two cases.
OCaml assumes that every function call potentially creates mutable state. Therefore, the result of calling a function is never polymorphic. Normally that's what you want, but it can be surprising in the case of partial functions. The solution (i3
) is to avoid partial application and do a complete fresh invocation each time.
There are actually some cases where OCaml can turn a monomorphic result type back into a polymorphic one. For example:
1 2 3 4 5 |
|
Here, x
gets a polymorphic type despite being the result of a function call.
See Relaxing the Value Restriction for how it does that.
Universal qualification
This surprised me at first:
1 2 3 4 5 |
|
OCaml tells us that the identity function id
has the type 'a -> 'a
. OK. Then I declare another function with the same type, and OCaml tells me its type is int -> int
!
OCaml is hiding part of the type! The real type of the identity function is 'a. 'a -> 'a
(given 'a
, the type is 'a -> 'a
). Using this full type, we get the expected error:
1 2 3 4 5 6 |
|
The input to the polymorphic type expression (the 'a.
bit) can only go at the start of a type expression in OCaml. That means that you can't write a function that takes a polymorphic function as an argument. For example:
1 2 3 4 |
|
However, you can put them in object method and record field types:
1 2 3 4 5 6 7 |
|
Or:
1 2 3 4 5 6 7 8 |
|
So, you may have to wrap some things up in objects or records. If you use this in a method type, make sure you declare the type of the object you're creating. Otherwise, OCaml will infer the wrong type (it will assume that 'a
is scoped to the whole object, not a single method):
1 2 3 4 5 6 7 8 9 10 11 |
|
Error: This expression has type < id : 'a -> 'a >
but an expression was expected of type id_class
The universal variable 'a0 would escape its scope
You have to give the type at the point where you define the object:
1 2 3 4 |
|
Polymorphism in module signatures
When you define a module, you can (optionally) write an .mli
file giving it a more limited public interface.
In this interface, you can make types abstract, only expose certain functions, etc.
If you include a value with concrete type, the signature must be the same in the module and in the interface.
But when your module contains polymorphic values, you are allowed to limit the polymorphism in the module signature. For example:
1
|
|
1 2 3 4 5 |
|
Inside the module, close_window
will close anything with a close
method and return whatever it returns.
But outside the module, close_window
can only close subclasses of dialog
(and always returns unit).
Cheat-sheet
Some concrete types:
Type expression | Meaning |
---|---|
int list | a list of integers |
int -> int | a function that takes an int and returns an int |
widget | the concrete type widget exactly |
<close : unit> | an object type with a single close method |
[`red|`green] | the variant type "red or green" |
Some polymorphic types (supplying the type parameter 'a
will produce some concrete type):
Type expression | Can produce types for... |
---|---|
'a list | any list of items, each of the same type |
'a -> 'a | any function that returns something of the same type as its input |
(#widget as 'a) | any type that includes all the methods of widget |
(<close : unit; ..> as 'a) | any type with at least a close : unit method |
([< `red|`green] as 'a) | any variant with some subset of red and green as options |
([> `red|`green] as 'a) | any variant with some superset of red and green as options |
You can omit the as 'a
bits, unless you need to refer to 'a
somewhere else.
Summary
Java uses subtyping by default (automatically), with extra syntax for using generics (polymorphism). OCaml uses polymorphism by default, with extra syntax for using subtypes. Polymorphism is powerful, but can be confusing. If you allow OCaml to infer types, it will infer polymorphic types automatically and everything should work, but you'll want to understand polymorphism when you come to writing module signatures and you need to write out the types.
When the compiler and the interactive OCaml interpreter display polymorphic types, they frequently omit the type variables, which can make learning OCaml more difficult. Polymorphic object types, class types and variants all use different syntax to indicate polymorphism, which can also be confusing.
OCaml uses "monomorphic type" to mean a single (non-polymorphic) type which has not yet been inferred. Monomorphic types occur when you create mutable state or call functions (which may create mutable state internally). This explains why partially applying a function loses its polymorphism.
OCaml does not allow functions that take polymorphic arguments (arguments that remain polymorphic within the function, rather than being resolved to a particular concrete type when the function is called). However, you can work around this using record or object types.
When defining a module's signature (its external API), you can't change concrete types but you can expose less polymorphism in polymorphic types if you want.